## Abstract

This paper studies a sparse reconstruction-based approach to learning time–space fractional differential equations (FDEs), i.e., to identifying parameter values and, in particular, the orders of the fractional derivatives. The approach uses a generalized Taylor series expansion to generate, in every iteration, a feature matrix, which is used to learn the fractional orders of both temporal and spatial derivatives by minimizing the least absolute shrinkage and selection operator (LASSO) functional using the differential evolution (DE) algorithm. To verify the robustness of the method, numerical results for the time–space fractional diffusion equation, wave equation, and Burgers' equation at different noise levels in the data are presented. Finally, the methodology is applied to a realistic example in which the underlying fractional differential equation associated with published experimental data, obtained from an in vitro cell culture assay, is learned.

## 1 Introduction

Fractional differential equations (FDEs) involve fractional derivatives. In contrast to traditional differential equations that model the rate of change of a system based on integer-order derivatives, FDEs generalize this concept. Fractional-order derivatives can provide more accurate descriptions of physical phenomena that exhibit nonlocal and non-Markovian behavior [1,2]. For example, many natural phenomena, such as the diffusion of particles in porous media [3], exhibit subdiffusive or superdiffusive behavior that cannot be accurately captured by traditional integer-order differential equations. FDEs can help to provide more accurate descriptions of these phenomena, which can be useful in a range of scientific and engineering applications. Additionally, FDEs have been used in the modeling of complex systems such as viscoelastic materials [4], non-Newtonian fluids, and fractal structures, among others [5,6]. Multiterm fractional differential equations involve more than one fractional derivative. These equations are commonly used to model complex systems that exhibit multiple time scales and memory effects, such as in mechanics [7]. The Basset equation [8] and the Bagley–Torvik equation [7] are examples of multiterm fractional differential equations. Discovering ("learning") these equations requires a deep understanding of the theory, the experimental data, and the underlying phenomena associated with the dynamics being modeled.

Data-driven discovery of differential equations is a groundbreaking approach that leverages computational techniques and large datasets to uncover hidden mathematical relationships governing complex systems. In the field of learning differential equations, two main approaches are commonly employed: learning with known form and learning with unknown form. For differential equations of known form, Brunton et al. [9] explored nonlinear systems through sparse identification; however, when implementing this approach in higher dimensions, computational limitations arise due to increased costs. To address the challenges of learning differential equations of known form, researchers have also turned to physics-informed neural networks, which have been utilized to learn the parameters of differential equations with known forms. Training these networks, however, incurs a significant increase in computational cost. More recently, the work of Ren et al. [10] has emerged, focusing on learning differential equations with unknown form. This approach allows the discovery of the underlying form of differential equations directly from the data, providing a more flexible and exploratory framework for modeling complex systems.

Motivated by the wide-ranging applications of FDEs in diverse areas, researchers have begun to investigate methods for learning these types of equations too. Data-driven discovery of FDEs poses a challenge due to the complexity of the integral operators defining fractional-order derivatives; selecting an appropriate machine learning technique is therefore crucial. In the learning-with-unknown-form approach, the objective is to identify differential equations when the specific form or structure is not known beforehand. This approach emphasizes exploration to find solutions or approximations without relying on predetermined mathematical expressions. Schaeffer [11] employed a sparse reconstruction technique, utilizing a Taylor expansion, to learn partial differential equations (PDEs) through numerical optimization, specifically the Douglas–Rachford algorithm. However, applying this algorithm to time and space fractional derivatives is challenging due to the variability of the feature matrix and the resulting nonlinearity of the system of equations. To address this issue, Singh et al. [12] introduced the use of differential evolution (DE) for minimizing the least absolute shrinkage and selection operator (LASSO) functional in order to learn time-fractional partial differential equations (fPDEs). Their method effectively learned the temporal Caputo fractional order and the underlying fPDE from given data, even in the presence of different levels of noise. This marked the first application of differential evolution to learning time-fractional partial differential equations.

Differential evolution has gained widespread usage in solving optimization problems across various domains, including image processing [13], energy conservation [14], text classification [15], and electromagnetics [16]. Building upon the work of Singh et al. [12], we extend the application of differential evolution to the realm of time–space fractional differential equations, where the aim is to learn the specific structure of the FDE. In related research on learning FDEs of known form, Gulian et al. [17] employed Gaussian process regression to learn space fractional differential equations, and Pang et al. [18] utilized physics-informed neural networks to learn time–space fractional differential equations, with a specific focus on fractional advection–diffusion equations. Rudy et al. [19] addressed learning equations using finite difference and polynomial approximations; however, this approach yielded poor performance when confronted with noisy data, and the use of neural networks in these works was restricted to specific types of fPDEs. Notably, there is currently no published result on learning time–space fractional differential equations directly from given data when the form of the FDE is not known. This highlights the gap in existing research and motivates the investigation into learning time–space fractional differential equations using differential evolution. By leveraging the power of this optimization algorithm, we aim to overcome the limitations and address the challenges associated with learning these complex equations from data.

In this study, the research conducted by Singh et al. [12] is extended to address the problem of time–space fractional diffusion equation, wave equation, fractional Burgers' equation and an underlying fractional differential equation associated with measured data. One unique aspect of the approach is dynamic behavior of the feature matrix, which undergoes changes after each iteration. Differential evolution together with sparse reconstruction based approach proved to be particularly effective in handling the updates of the feature matrix.

This paper is structured as follows: Sec. 2 provides an overview of the sparse reconstruction technique applied to time–space fractional differential equations, utilizing the Taylor series approach and a feature matrix in every iteration. In Sec. 3, the application of differential evolution is discussed, along with modifications made to the algorithm. Section 4 presents the numerical experiments and simulation results, showcasing the effectiveness of the proposed approach through various examples. Section 5 offers concluding remarks and outlines future directions for research.

## 2 Methods

### 2.1 Sparse Reconstruction of Time-Space Fractional Differential Equations.

The Caputo fractional derivative of order *α* with respect to *t*, where *m* is a positive integer with $m-1<\alpha \le m$, is

$$D_t^{\alpha}u(t)=\frac{1}{\Gamma(m-\alpha)}\int_0^t (t-s)^{m-\alpha-1}\,\frac{d^m u}{ds^m}(s)\,ds\tag{2.1}$$

A general FDE with a Caputo time derivative of order *α* and two fractional orders *β* and *γ* in space ($\beta =\alpha_1$ and $\gamma =\alpha_2$) is taken into consideration, which is given by

$$D_t^{\alpha}u=H\big(u,\,D_x^{\beta}u,\,D_x^{\gamma}u\big)\tag{2.2}$$

The objective is to learn the time-fractional order *α* and the space-fractional orders *β* and *γ* from the given solution $V$. Note that *β* and *γ* are not equal unless explicitly stated in the considered examples. The function $H$ is approximated by its second-order Taylor series expansion about the origin $0=(0,0,0)$, which replaces the right-hand side of (2.2) by a linear combination of ten features of $u$ and its spatial fractional derivatives in *x*,

$$D_t^{\alpha}u\approx a_1+a_2u+a_3D_x^{\beta}u+a_4D_x^{\gamma}u+a_5u^2+a_6(D_x^{\beta}u)^2+a_7(D_x^{\gamma}u)^2+a_8\,uD_x^{\beta}u+a_9\,uD_x^{\gamma}u+a_{10}\,D_x^{\beta}u\,D_x^{\gamma}u\tag{2.4}$$

The solution is assumed to be known at discrete data points $(x_i,t_j)$, $i=1\dots m,\ j=1\dots n$. These data are used to assemble the feature matrix for every $t_j$, namely, the $m\times 10$ matrix $P_j^{\beta,\gamma}$ whose *i*th row collects the ten features of (2.4) evaluated at $(x_i,t_j)$. The task is then to find the coefficient vector $\mathbf{a}=(a_1,\dots,a_{10})^T$ and the fractional orders *α*, *β*, and *γ* such that

$$P_j^{\beta,\gamma}\,\mathbf{a}\approx M_j(\alpha),\qquad j=1\dots n\tag{2.5}$$

where, for every *j*, the vector $M_j(\alpha)$ is defined by

$$M_j(\alpha)=\big(D_t^{\alpha}V(x_1,t_j),\dots,D_t^{\alpha}V(x_m,t_j)\big)^T$$

The LASSO functional of *α*, *β*, *γ*, and **a** is given by

$$J(\alpha,\beta,\gamma,\mathbf{a})=\sum_{j=1}^{n}\big\|P_j^{\beta,\gamma}\,\mathbf{a}-M_j(\alpha)\big\|_2^2+\lambda\,\|\mathbf{a}\|_1\tag{2.6}$$

where $\lambda >0$ is the regularization parameter of the *l*^{1} regularization. Note that *l*^{1} regularization promotes sparsity because the derivative of the regularization term with respect to each parameter has constant magnitude, so small parameter values are pushed all the way to zero; in that sense, feature selection is built into the *l*^{1} regularization technique. In *l*^{2} regularization, on the other hand, that preference for smaller parameter values itself becomes small as the parameters approach zero; *l*^{2} regularization therefore drives parameter values close to 0 but not exactly 0.

When dealing with time-fractional differential equations, the feature matrix depends only on *t*; here, however, it also depends on the space fractional orders, which adds complexity. Differential evolution deals with this efficiently by regenerating the feature matrix after each iteration with the newly sampled space fractional orders.
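To make the objective concrete, the functional (2.6) can be written out in a few lines of code. The sketch below is illustrative only (Python/NumPy, whereas the paper's computations were done in MATLAB): the feature matrices are random stand-ins for $P_j^{\beta,\gamma}$, which in the actual method would be rebuilt for every candidate pair $(\beta,\gamma)$, and all names are hypothetical.

```python
import numpy as np

def lasso_objective(P_list, M_list, a, lam):
    """LASSO functional (2.6): squared residuals summed over all time
    levels t_j plus an l1 penalty that promotes sparsity in a."""
    misfit = sum(np.sum((P @ a - M) ** 2) for P, M in zip(P_list, M_list))
    return misfit + lam * np.sum(np.abs(a))

# Illustrative stand-in data: n time levels, m space points, 10 features.
rng = np.random.default_rng(0)
n, m, k = 5, 40, 10
P_list = [rng.normal(size=(m, k)) for _ in range(n)]   # stand-ins for P_j
a_true = np.zeros(k)
a_true[[3, 7]] = [0.1, 1.0]                            # sparse ground truth
M_list = [P @ a_true for P in P_list]                  # stand-ins for M_j(alpha)

# The sparse ground-truth vector has zero data misfit by construction
assert lasso_objective(P_list, M_list, a_true, lam=0.0) < 1e-20
```

In the full method, `P_list` would be recomputed from the data for every candidate $(\beta,\gamma)$ proposed by DE, which is why the objective cannot be handled by a fixed-dictionary sparse solver.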

### 2.2 Differential Evolution.

To find minimizers of (2.6), DE is used. DE is an iterative optimization algorithm developed by Storn and Price [20] designed to solve nonlinear optimization problems. It has a unique advantage over other optimization methods, such as Gradient Descent, as it does not require the objective function to be differentiable which vastly facilitates the numerical treatment of (2.6). This algorithm is used to find minimizers $(\alpha ,\beta ,\gamma $, **a**) of the optimization problem (2.6) as follows:

Initially, the population matrix $\Omega^0\in\mathbb{R}^{N\times M}$ ($\mathbb{R}^{N\times M}$ denotes the space of $N\times M$ matrices with real entries), sampling $N=50$ sets of parameters $\alpha_i$, $\beta_i$, $\gamma_i$, $\mathbf{a}_i$ ($1\le i\le N$), is assembled. If $\beta =\gamma$, then $M=12$; otherwise, $M=13$. Each parameter is sampled from a uniform distribution on a specific interval, e.g., $(\alpha_1,\alpha_2)$, $(\beta_1,\beta_2)$. This differs from traditional DE, where typically the same admissible interval is used for all parameters. Let $\omega_i^0\in\mathbb{R}^{M}$ (a row vector of $M$ real entries) denote the parameter set $\alpha_i$, $\beta_i$, $\gamma_i$, $\mathbf{a}_i$, i.e., $\omega_i^g$, $1\le i\le N$, are the rows of the population matrix $\Omega^g$, where $g=0,1,\dots$ denotes the generation.

Then, the following steps are iterated until a stopping condition is reached:

1. In the mutation step, for each parameter set $\omega_i^g$, termed the target vector, in the population matrix, three other parameter sets of the same generation *g* are selected randomly. The component-wise difference of the first two, multiplied by a scalar weight, is added to the third one, which yields the so-called donor vector $\nu_i^g$. In every iteration, the weight or scaling factor is sampled from a uniform distribution on the interval $[0.2,0.8]$. Every component of the donor vector is compared to the bounds of its admissible interval: if a component of $\nu_i^g$ is smaller than the lower bound, it is replaced by the lower bound; analogously, if any component is larger than its upper bound, it is replaced by the upper bound.
2. In the crossover step, a trial vector is obtained for each target vector $\omega_i^g$ in the population matrix. For this step, let $Y_{i,j}$ ($1\le i\le N$, $1\le j\le M$) be sampled from the uniform distribution on the unit interval, let $J_r$ be an integer sampled from the uniform distribution of integers between 1 and *M*, and let $C_p$ be the recombination probability (in this study, $C_p=0.9$). Then, the trial vector $u_i^g$ is given by
   $$u_{i,j}^g=\begin{cases}\nu_{i,j}^g & \text{if }Y_{i,j}\le C_p\text{ or }j=J_r\\ \omega_{i,j}^g & \text{otherwise}\end{cases}$$
3. For every target vector $\omega_i^g$ and every trial vector $u_i^g$, the feature matrix $P_j^{\beta,\gamma}$ is calculated using the exact solution at the points $(t_j,x_i)$. This differs from the case of time-fractional differential equations, where the feature matrix does not change between iterations due to the absence of space-fractional derivatives.
4. Using the LASSO operator (2.6), the error values $J(\omega_i^g)$ and $J(u_i^g)$ for all target vectors and all trial vectors are calculated. Based on these error values, the parameter sets $\omega_i^{g+1}$ of the next generation are
   $$\omega_i^{g+1}=\begin{cases}\omega_i^g & \text{if }J(\omega_i^g)<J(u_i^g)\\ u_i^g & \text{if }J(\omega_i^g)\ge J(u_i^g)\end{cases}$$
5. The next iteration is started by going back to step 1 unless a stopping condition is reached. In the case of non-noisy data, iterations are stopped as soon as the error value $J$ is below a given threshold for at least one parameter set; otherwise, with noisy data, the procedure is stopped after 1000 iterations.
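The mutation, crossover, and selection steps above can be sketched in a few dozen lines. The code below (Python/NumPy, an illustration rather than the authors' MATLAB implementation) minimizes a toy quadratic in place of the LASSO objective, but keeps the specific choices described in the text: per-parameter bounds, a scaling factor resampled uniformly from $[0.2,0.8]$, clipping of donor components to their admissible intervals, and $C_p=0.9$.

```python
import numpy as np

def differential_evolution(J, lower, upper, N=50, generations=300, Cp=0.9, seed=1):
    """Minimal DE sketch with per-parameter bounds, as in Sec. 2.2."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    M = lower.size
    pop = lower + rng.random((N, M)) * (upper - lower)      # generation 0
    cost = np.array([J(w) for w in pop])
    for g in range(generations):
        F = rng.uniform(0.2, 0.8)                           # scaling factor
        for i in range(N):
            # Mutation: donor = third + F * (first - second), then clipping
            r1, r2, r3 = rng.choice([k for k in range(N) if k != i], 3, replace=False)
            donor = np.clip(pop[r3] + F * (pop[r1] - pop[r2]), lower, upper)
            # Crossover: take donor components with probability Cp,
            # and always at one random index J_r
            cross = rng.random(M) <= Cp
            cross[rng.integers(M)] = True
            trial = np.where(cross, donor, pop[i])
            # Selection: keep the target unless the trial is at least as good
            J_trial = J(trial)
            if J_trial <= cost[i]:
                pop[i], cost[i] = trial, J_trial
    best = int(np.argmin(cost))
    return pop[best], cost[best]

# Toy check with the admissible intervals used for (alpha, beta, gamma) in Sec. 3.4
target = np.array([0.5, 0.5, 1.5])
w, c = differential_evolution(lambda v: float(np.sum((v - target) ** 2)),
                              lower=[0, 0, 1], upper=[1, 1, 2])
```

In the actual method, `J` would be the LASSO functional (2.6), whose feature matrices are rebuilt for every trial's $(\beta,\gamma)$; no gradient of `J` is ever required, which is precisely why DE is applicable here.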

From the last generation *g*, the parameter set $\omega_i^g$ with the lowest approximation error $J(\omega_i^g)$ is selected. It contains the fractional orders and coefficients determining the shape $H$ of the learned equation. By following these steps, the DE algorithm is utilized to effectively learn the desired fractional orders and coefficients, and to recover the original form of the equation. Figure 1 summarizes the above procedure.

## 3 Results

To illustrate the effectiveness of the proposed method (discussed in Secs. 2.1 and 2.2), numerical examples are presented. For the first example, an equation with spatial derivatives of Riemann–Liouville (RL) type is considered, while in the second example, an equation involving Riesz fractional derivative in space is considered. By considering different types of spatial fractional derivatives in these two examples, the robustness of the proposed method across various noise levels is assessed.

The third example is more intricate since it includes nonlinear terms in the equation. It is used to explore a larger number of spatial fractional orders compared to the preceding examples. In the last example, data describing the Fisher–KPP equation are considered and the underlying differential equation is learned using the approach described in Secs. 2.1 and 2.2. Note that in the first three numerical test cases, the regularization parameter is *λ* = 100, while in the last two examples, the regularization parameter $\lambda =10^{-7}$ is chosen. All simulations are conducted using MATLAB on a 12th Gen Intel(R) Core(TM) i5-1235U processor (1.30 GHz) with 16 GB RAM.

### 3.1 Time–Space Fractional Diffusion Equation.

Consider the time–space fractional diffusion equation

$$D_t^{\alpha}u(x,t)=\nu\,D_x^{\beta}u(x,t)+g(x,t)$$

where $u(x,t)$ is the concentration of the quantity undergoing diffusion, *ν* is the diffusion coefficient, *α* and *β* are the fractional orders of the time and space derivatives such that $0<\alpha <1$ and $1<\beta <2$, respectively, and $g(x,t)$ is the given, known source term that represents any additional external influence or forcing on the system. Consider the following time–space fractional boundary value problem of diffusion type for $t>0$ on the domain $0<x<1$:

$$D_t^{0.5}u(x,t)=D_x^{1.5}u(x,t)+g(x,t)\tag{3.1}$$

with $g$ chosen such that the exact solution is $V(x,t)=t^{10}(x-x^2)$.

The aim is to learn the fractional orders *α*, *β*, and *γ*, where $0<\alpha <1$, $1<\beta <2$, and $1<\gamma <2$, from the exact solution $V(x,t)$ of (3.1). Note that in this example, $\beta =\gamma$. The approximated value of the left side operator in Eq. (3.1), computed with the help of the exact solution $V(x,t)$, is given by

$$D_t^{\alpha}V\approx a_1+a_2V+a_3V_{0,x}^{\beta}+a_4V_{0,x}^{\gamma}+a_5V^2+a_6(V_{0,x}^{\beta})^2+a_7(V_{0,x}^{\gamma})^2+a_8VV_{0,x}^{\beta}+a_9VV_{0,x}^{\gamma}+a_{10}V_{0,x}^{\beta}V_{0,x}^{\gamma}+R(x,t)\tag{3.2}$$

where $R(x,t)=g(x,t)$ and the coefficients $a_i$ are calculated as in Eq. (2.4). The quantities

$$V_{0,x}^{\beta}(x,t)=t^{10}\frac{\Gamma(2)}{\Gamma(2-\beta)}x^{1-\beta}-t^{10}\frac{\Gamma(3)}{\Gamma(3-\beta)}x^{2-\beta},\qquad V_{0,x}^{\gamma}(x,t)=t^{10}\frac{\Gamma(2)}{\Gamma(2-\gamma)}x^{1-\gamma}-t^{10}\frac{\Gamma(3)}{\Gamma(3-\gamma)}x^{2-\gamma}$$

in Eq. (3.2) are RL fractional derivatives of the exact solution $V(x,t)$ of Eq. (3.1). In the numerical discretization on the uniform grid $(t_j,x_i)$, the values $R(x_i,t_j)=g(x_i,t_j)+\xi_i$ are calculated, where $\xi_i$ is normally distributed with mean 0 and variance $\sigma^2$. It represents random noise in observing discrete values of the solution $V$ on the grid $(x_i,t_j)$ and in numerically evaluating fractional derivatives to set up the feature matrix $P_j^{\beta,\gamma}$. It is evident from Sec. 2.1 that the feature matrix $P_j^{\beta,\gamma}$ changes in every iteration. To test the method, the exact solution is used to generate data points on a uniform grid of 200 × 200 points. By applying the method to the generated data at different noise levels, the algorithm succeeds in determining the values of the time and space fractional orders, as well as the form of the diffusion equation represented by the coefficients **a**. Table 1 provides a summary of the learned values at different noise levels. At all noise levels, the learning algorithm correctly identifies the form of (3.1), namely, the order of the time derivative being 0.5 and the spatial derivative being of order 1.5.

| σ (noise) | α | β | γ | a (rounded to five decimal places) |
|---|---|---|---|---|
| 0 | 0.49018 | 1.50170 | 1.50170 | $[0.00206, 0, 0.46910, 0.50508, 0.01842, 0.00179, 0.00737, 0.00156, 0, 0]$ |
| 0.05 | 0.45720 | 1.52330 | 1.52330 | $[0, 0, 0.19792, 0.75160, 0, 0, 0, 0, 0, 0]$ |
| 0.10 | 0.45699 | 1.52350 | 1.52350 | $[0, 0, 0.13174, 0.93610, 0, 0, 0, 0, 0, 0]$ |

Note the identical learned values of *β* and *γ* in Table 1. The sum of the specific coefficients $a_3$ and $a_4$, which multiply $V_{0,x}^{\beta}$ and $V_{0,x}^{\gamma}$, is close to 1 at all noise levels, which is the coefficient of the spatial derivative in (3.1), yet the individual values differ between noise levels. This is due to the absence of strict convexity in the *l*^{1} regularization term, which potentially admits nonunique minimizers of (2.6). The learned equation in case of noise level 0.05 is given by

$$D_t^{0.45720}u=(0.19792+0.75160)\,D_x^{1.52330}u+g(x,t)$$

The obtained results illustrate the effectiveness and robustness of the method for learning the time–space fractional diffusion equation, even in the presence of noise, where the noise is incorporated on the left side of (3.2).
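The RL derivatives quoted above follow from the power-function rule $D_x^{\beta}x^{p}=\frac{\Gamma(p+1)}{\Gamma(p+1-\beta)}x^{p-\beta}$, applied term by term to the exact solution. A quick sanity check (Python, purely illustrative; function names are hypothetical) confirms that the quoted expression for $V_{0,x}^{\beta}$ collapses to the classical first derivative as $\beta\to1$.

```python
from math import gamma

def rl_derivative_V(x, t, beta):
    """RL x-derivative of order beta of V(x,t) = t**10 * (x - x**2),
    as quoted in Sec. 3.1 (term-by-term power rule)."""
    return t**10 * (gamma(2) / gamma(2 - beta) * x**(1 - beta)
                    - gamma(3) / gamma(3 - beta) * x**(2 - beta))

x, t = 0.4, 0.9
# At beta = 1 the expression must equal the classical dV/dx = t**10 * (1 - 2x)
assert abs(rl_derivative_V(x, t, 1.0) - t**10 * (1 - 2 * x)) < 1e-12
```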

### 3.2 Time–Space Fractional Wave Equation.

Consider the time–space fractional wave equation for $u(x,t)$ given by

$$D_t^{\alpha}u(x,t)=c^2\,\frac{\partial^{\beta}u}{\partial|x|^{\beta}}(x,t)+g(x,t)\tag{3.4}$$

where $t\ge0$ represents time and $x\in\Omega \subset \mathbb{R}$ represents the space coordinate, and the spatial derivative is of Riesz type. The constant *c* represents the wave speed. Here, $1<\alpha <2$ and $1<\beta <2$. The right-hand side of the equation includes the term $g(x,t)$, which represents a nonhomogeneous forcing function. The specific form of the forcing function depends on the particular problem or system under consideration. Some applications lie in symmetry breaking [24] and in anomalous transport [25].

Note that here $\beta =\gamma$. Similar to the approach in Sec. 3.1, the aim is to learn the structure of the PDE within the class of Eq. (2.4), where the derivatives multiplied by $a_3$ and $a_4$ in the feature matrix $P_j^{\beta,\gamma}$ are interpreted as Riesz derivatives. Precise Riesz-type fractional derivatives of order $1<\beta <2$ in Eq. (2.4) of the exact solution $V$ of Eq. (3.4) are given by

$$V_{0,x}^{\beta}(x,t)=-\frac{t^{3.5}+t+2}{2\cos(\beta\pi/2)}\left(\frac{\Gamma(2)}{\Gamma(0.5)}x^{-0.5}-\frac{\Gamma(3)}{\Gamma(1.5)}x^{0.5}\right)-\frac{t^{3.5}+t+2}{2\cos(\beta\pi/2)}\left(\frac{\Gamma(2)}{\Gamma(0.5)}(1-x)^{-0.5}-\frac{\Gamma(3)}{\Gamma(1.5)}(1-x)^{0.5}\right)$$

which are used to generate the feature matrix in every iteration of the differential evolution algorithm.

Table 2 lists the learned fractional orders and coefficients **a** at various noise levels. The learned value of *α* is close to 1.5, and the values of both *β* and *γ* are also close to the exact value 1.5, the sum of their weights $a_3+a_4$ being close to 1 at all noise levels. The individual values $a_3$ and $a_4$, however, differ due to the lack of uniqueness of minimizers. This indicates that the learning algorithm correctly identifies the presence of a single spatial derivative of order 1.5. The results, summarized in Table 2, illustrate the robustness of the approach in handling noise, which is incorporated on the left side of Eq. (2.2) in the same way as in Sec. 3.1. The learned equation in case of noise level 0.05 (ignoring the coefficients of the terms not involved in the wave equation) is given as

$$D_t^{1.40290}u=(0.03928+0.94716)\,\frac{\partial^{1.50620}u}{\partial|x|^{1.50620}}+g(x,t)\tag{3.5}$$

| Noise | α | β | γ | a (rounded to five decimal places) |
|---|---|---|---|---|
| 0 | 1.45780 | 1.50220 | 1.50220 | $[0, 0.00071, 0.00422, 0.98068, 0.00592, 0, 0.00068, 0.024813, 0.00124, 0.00007]$ |
| 0.05 | 1.40290 | 1.50620 | 1.50620 | $[0, 0, 0.03928, 0.94716, 0, 0, 0, 0.01991, 0.01775, 0]$ |
| 0.10 | 1.40440 | 1.50620 | 1.50620 | $[0, 0, 0.93227, 0.05431, 0, 0, 0, 0.01962, 0.017312, 0]$ |

Figure 2 shows the contour plots of the exact solution and of the solution of the learned time–space fractional wave equation (3.5). This numerical example demonstrates that the algorithm is effective at learning the symmetric Riesz fractional derivative in addition to the nonsymmetric fractional derivatives of Riemann–Liouville and Caputo type.

### 3.3 Time-Space Fractional Burgers' Equation.

Consider the time–space fractional Burgers' equation

$$D_t^{\alpha}u=\lambda\,u\,D_x^{\beta}u+\nu\,D_x^{\gamma}u+g(x,t)\tag{3.7}$$

Here, the equation incorporates fractional derivatives with respect to both time *t* and space *x*, where $t\ge0$, $x\in\Omega \subset \mathbb{R}$, $0<\alpha <1$, $0<\beta <1$, and $1<\gamma <2$. The parameters *ν* and *λ* represent the coefficients for diffusion and flux, respectively. The function $g(x,t)$ denotes a source or forcing term. The time–space fractional Burgers' equation finds applications in various fields where the dynamics of fractional diffusion processes are involved.

where $g(x,t)=10\pi x^2\frac{\Gamma(2)}{\Gamma(1.5)}t^{0.5}-\pi t\frac{\Gamma(3)}{\Gamma(1.5)}x^{0.5}-100\pi^2t^2x^{3.5}\frac{\Gamma(3)}{\Gamma(2.5)}$. An exact solution of (3.7) is $V(x,t)=10\pi x^2t$ when $\alpha =\beta =0.5$ and $\gamma =1.5$.
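This can be verified directly: all fractional derivatives of $V(x,t)=10\pi x^2t$ are available in closed form via the power rule, so the residual of (3.7) can be checked numerically. The sketch below (Python, illustrative) assumes $\nu=0.1$ and $\lambda=1$, the values suggested by the learned coefficients $a_4\approx0.1$ and $a_8\approx1$ in Table 3.

```python
from math import gamma, pi

# Closed-form fractional derivatives of V(x,t) = 10*pi*x**2*t,
# via D^mu x**p = Gamma(p+1)/Gamma(p+1-mu) * x**(p-mu)
def Dt_half(x, t):    # Caputo derivative of order alpha = 0.5 in t
    return 10 * pi * x**2 * gamma(2) / gamma(1.5) * t**0.5

def Dx_beta(x, t):    # RL derivative of order beta = 0.5 in x
    return 10 * pi * t * gamma(3) / gamma(2.5) * x**1.5

def Dx_gamma(x, t):   # RL derivative of order gamma = 1.5 in x
    return 10 * pi * t * gamma(3) / gamma(1.5) * x**0.5

def g(x, t):          # forcing term of (3.7)
    return (10 * pi * x**2 * gamma(2) / gamma(1.5) * t**0.5
            - pi * t * gamma(3) / gamma(1.5) * x**0.5
            - 100 * pi**2 * t**2 * x**3.5 * gamma(3) / gamma(2.5))

lam, nu = 1.0, 0.1    # assumed flux and diffusion coefficients
x, t = 0.7, 0.3
V = 10 * pi * x**2 * t
residual = Dt_half(x, t) - lam * V * Dx_beta(x, t) - nu * Dx_gamma(x, t) - g(x, t)
assert abs(residual) < 1e-9   # V satisfies (3.7) at this point
```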

To test the method, the exact solution is sampled along the *x* and *t* axes using a uniform grid size in every iteration of the differential evolution algorithm. Using the learning algorithm, the values of the time and space fractional orders *α*, *β*, and *γ*, as well as the form of the Burgers' equation, represented by $a_4\approx 0.1$ and $a_8\approx 1$, are successfully learned, with all other coefficients close to zero. We incorporate the effect of noise in the same way as in Secs. 3.1 and 3.2. Table 3 provides a summary of the learned values at different noise levels. The learned equation (ignoring the terms not involved in Burgers' equation) in the noise-free case is given as

$$D_t^{0.49985}u=0.96799\,u\,D_x^{0.51122}u+0.10043\,D_x^{1.48790}u+g(x,t)$$

| Noise | α | β | γ | a (rounded to five decimal places) |
|---|---|---|---|---|
| 0 | 0.49985 | 0.51122 | 1.48790 | $[0, 0, 0, 0.10043, 0.036685, 0.00002, 0, 0.96799, 0, 0]$ |
| 0.05 | 0.45320 | 0.52300 | 1.46299 | $[0, 0, 0, 0.10089, 0.03764, 0, 0, 0.95299, 0, 0]$ |
| 0.10 | 0.45098 | 0.52486 | 1.46010 | $[0, 0, 0, 0.10120, 0.03781, 0, 0, 0.95105, 0, 0]$ |

The obtained results strongly indicate the robustness of the method for learning even nonlinear time–space fractional equations.

### 3.4 Time–Space Fractional Fisher–Kolmogorov–Petrovsky–Piskunov Equation.

In the Fisher–Kolmogorov–Petrovsky–Piskunov (Fisher–KPP) equation, *D* is the diffusion constant, *r* is the growth rate of the chosen species, and *K* is the carrying capacity. For a numerical test, the one-dimensional Fisher–KPP equation for the function $u(x,t)$ as a model for tumor growth on the domain $-50\le x\le 50$, $t\ge 0$ is given by

$$\frac{\partial u}{\partial t}=D\,\frac{\partial^2 u}{\partial x^2}+r\,u\left(1-\frac{u}{K}\right)\tag{3.9}$$

where *t* refers to time (in days, "d") and *x* refers to position (in "cm"). The parameters *D* and *r* have units $\mathrm{cm^2\,d^{-1}}$ and $\mathrm{d^{-1}}$, respectively, and a suitable initial condition for (3.9) is prescribed. The upper and lower bounds for the coefficients **a** are 1 and −1, respectively. Also, the upper and lower bounds for $\{\alpha,\beta,\gamma\}$ are $\{1,1,2\}$ and $\{0,0,1\}$, respectively. The learning algorithm successfully learned the time and space fractional orders together with the particular form of the Fisher–KPP equation from the simulated data, given (ignoring the almost zero coefficients) as

$$D_t^{0.99999}u=0.10582\,u+0.01708\,D_x^{1.99999}u-0.10584\,u^2\tag{3.10}$$

The solution of Eq. (3.10) is referred to as the "learned simulated solution." Table 4 summarizes the learned parameters of the Fisher–KPP equation for different levels of noise. The identified fractional orders *α* and *γ* are very close to 1 and 2 (the classical orders), although the algorithm searches over a range of values $\alpha \in(0,1)$ and $\gamma \in(1,2)$. The learned value of *β* is close to 1, indicating that the fractional derivative of order *β* with respect to *x* exists, but since the coefficient of *u _{x}* is 0, this derivative does not play any role. The values of *D* and *r* are correctly identified, and noninvolved terms have coefficient 0 when rounded to two decimal places. Figure 3 shows that the simulated solution of (3.9) and the solution of the Fisher–KPP equation with the learned parameters (the learned simulated solution) are very close. These results show that the learning method is very accurate in determining the Fisher–KPP equation (3.9), even when the exact solution is not known and a discrete solution is provided on the grid.

| Noise | α | β | γ | a (rounded to five decimal places) |
|---|---|---|---|---|
| 0 | 0.99999 | 0.99999 | 1.99999 | $[0, 0.10582, 0, 0.01708, -0.10584, 0, -0.00755, 0, -0.00198, 0]$ |
| 0.05 | 0.96453 | 0.97775 | 1.96988 | $[0, 0.09980, 0, 0.01698, -0.10776, 0, 0, 0, 0, 0]$ |
| 0.10 | 0.96200 | 0.97639 | 1.96380 | $[0, 0.09899, 0, 0.01652, -0.10834, 0, 0, 0, 0, 0]$ |
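For reference, simulated data for a classical Fisher–KPP equation of the type (3.9) can be generated with a standard explicit finite-difference scheme. The sketch below (Python, with placeholder values for $D$, $r$, $K$, the grid, and the initial condition, none of which are taken from the paper) illustrates the kind of forward simulation that produces such data.

```python
import numpy as np

def simulate_fisher_kpp(D=0.02, r=0.1, K=1.0, L=50.0, nx=201, dt=0.05, nt=2000):
    """Forward-Euler / central-difference solver for
    u_t = D u_xx + r u (1 - u/K) on [-L, L]."""
    x = np.linspace(-L, L, nx)
    dx = x[1] - x[0]
    assert D * dt / dx**2 < 0.5          # explicit stability condition
    u = np.exp(-x**2)                    # placeholder initial bump
    for _ in range(nt):
        uxx = np.zeros_like(u)
        uxx[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
        uxx[0], uxx[-1] = uxx[1], uxx[-2]   # crude no-flux boundaries
        u = u + dt * (D * uxx + r * u * (1 - u / K))
    return x, u

x, u = simulate_fisher_kpp()
# Logistic growth keeps the solution between 0 and the carrying capacity K
assert np.all(u >= 0) and np.all(u <= 1.0 + 1e-9)
```

A grid of such snapshots of `u` over time plays the role of the "simulated data" from which the fractional orders and coefficients are then learned.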

### 3.5 Learning a Fractional Fisher–Kolmogorov–Petrovsky–Piskunov Equation From Data.

The experimental data are reported with step sizes of 12 h in time and 50 *μ*m in space. Note that for numerical convenience, we convert the units to mm and days, respectively. The new step sizes are $0.5\,\mathrm{d}$ and $0.05\,\mathrm{mm}$. The data at two different times are shown in Fig. 4. To facilitate taking numerical derivatives, the data are smoothened using kernel smoothing and evaluated on a much finer grid with step sizes of $0.0091667\,\mathrm{d}$ and $0.0159\,\mathrm{mm}$. The smoothened data are shown in Fig. 4. In this application, we assume that only one spatial derivative may be fractional, and we keep the order of the time derivative at 1. To this end, we impose $a_3=a_6=a_8=a_{10}=0$ in (2.4) to avoid the presence of fractional derivatives of order *β*. The value of the space fractional derivative of order between 1 and 2 is calculated by taking a classical derivative of a spatial fractional derivative of order between 0 and 1. Table 5 summarizes the learned parameters in this case. The learned equation is

$$u_t=417.71330+1.65177\,u+0.02151\,D_x^{1.67585}u-0.00126\,u^2\tag{3.11}$$

| α | γ | a (rounded to five decimal places) |
|---|---|---|
| 1 | 1.67585 | $[417.71330, 1.65177, 0, 0.02151, -0.00126, 0, 0, 0, 0, 0]$ |

Equation (3.11) is numerically simulated using the finite difference method, where the spatial fractional derivative is approximated using the fractional trapezoidal method due to its higher order of accuracy compared with the L2 scheme [31]. The (classical) time derivative is approximated using the forward difference approach. As initial and boundary values, we use the values given by the smoothened data. The resulting solution of Eq. (3.11) is what we call the "learned solution from the smoothened data," which is shown in Fig. 4. It compares well with the data, and we therefore conclude that the algorithm is successful in learning the equation underlying this phenomenon.
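For intuition on how such space-fractional derivatives are discretized, the sketch below uses the first-order Grünwald–Letnikov approximation, a simpler scheme than the fractional trapezoidal method of [31], chosen here purely for illustration, and checks it against the closed-form power rule.

```python
from math import gamma

def gl_weights(alpha, n):
    """Grunwald-Letnikov weights w_k = (-1)**k * binom(alpha, k),
    via the recurrence w_0 = 1, w_k = w_{k-1} * (k - 1 - alpha) / k."""
    w = [1.0]
    for k in range(1, n + 1):
        w.append(w[-1] * (k - 1 - alpha) / k)
    return w

def gl_derivative(f, x, alpha, n=4000):
    """First-order GL approximation of the RL derivative of order alpha
    of f over [0, x], evaluated at x."""
    h = x / n
    w = gl_weights(alpha, n)
    return sum(w[k] * f(x - k * h) for k in range(n + 1)) / h**alpha

# Check against the closed form D^0.5 x**2 = Gamma(3)/Gamma(2.5) * x**1.5 at x = 1
exact = gamma(3) / gamma(2.5)
approx = gl_derivative(lambda s: s * s, 1.0, 0.5)
assert abs(approx - exact) < 1e-2
```

Higher-order schemes such as the fractional trapezoidal method follow the same pattern but use weights with a smaller truncation error.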

## 4 Discussion and Conclusion

In this study, we have proposed a novel approach for learning time–space fractional differential equations using a combination of sparse reconstruction and differential evolution. The objective of the research is to accurately capture the dynamics of time–space fractional diffusion, wave, and Burgers' equations, both in the presence and absence of noise. Through numerical experiments, the effectiveness and accuracy of the proposed approach is demonstrated. The combination of sparse reconstruction techniques helped to efficiently extract relevant features from the given data, while the application of differential evolution facilitated the optimization process for determining the parameters of the fractional differential equations. The learned models are able to accurately capture the intricate dynamics of time–space fractional systems, even in the presence of noisy input data. A real-world example is also included and the underlying differential equation is learned. The solution to this equation approximates the data well although we do not attempt to minimize the error, but the residual of the underlying equation when evaluated at the (smoothened) data. The advantage of this approach is that we never have to solve numerically the forward problem. This suggests that the proposed approach is well-suited for real-world scenarios where data may be subject to noise and uncertainty.

Looking ahead, there are several promising directions for future research. First, the approach can be extended to higher dimensions, allowing for the modeling and analysis of complex systems in multiple spatial dimensions. Second, the integration of additional constraints and regularization techniques could further enhance the accuracy and stability of the learned models. Third, the applicability of the methodology can be extended to other types of real-world data. Fourth, the concept of learning the structure of fractional models underlying given data should also be evaluated through classical error minimization in combination with inverse-problem regularization techniques. Finally, in the presence of high noise in real-world data, proper smoothing techniques need to be implemented in both dimensions to calculate the time and space fractional orders of the derivatives, which is a notable challenge that we wish to overcome in the near future.

In conclusion, the research demonstrates the effectiveness of the sparse reconstruction and differential evolution approach in learning time–space fractional differential equations. The ability to accurately learn and analyze time–space fractional diffusion, wave, and Burgers' equations opens up new avenues for applications in diverse fields, such as physics, engineering, and finance. The findings presented in this study contribute to the field of fractional calculus and provide a solid foundation for further advancements in the understanding and modeling of fractional systems.

## Funding Data

Department of Science and Technology, India (Grant No. SR/FST/MS-1/2019/45; Funder ID: 10.13039/501100001409).

## Data Availability Statement

The authors attest that all data for this study are included in the paper.

## References
