
On Data-Driven Equation Discovery
Motivation and historical perspectives
Sparse system identification
Physics informed deep learning identification

Photo by ThisisEngineering RAEng on Unsplash

Describing nature with the help of analytical expressions verified through experiments has been a hallmark of the success of science, especially in physics, from the fundamental law of gravitation to quantum mechanics and beyond. As challenges such as climate change, fusion, and computational biology push us toward ever more compute, there is a growing need for concise yet robust reduced models that maintain physical consistency at a lower cost. Scientific machine learning is an emerging field which promises to provide such solutions. This text is a brief review of recent data-driven equation discovery methods, targeting scientists and engineers familiar with the very basics of machine learning or statistics.

Simply fitting the data well has proven to be a short-sighted endeavour, as demonstrated by Ptolemy's model of geocentrism, which was the most observationally accurate model until Kepler's heliocentric one. Thus, combining observations with fundamental physical principles plays a major role in science. Nonetheless, we often forget the extent to which our models of the world are already data driven. Take the standard model of particle physics, with its 19 parameters whose numerical values are established by experiment. Earth system models used for meteorology and climate, while running on a physically consistent core based on fluid dynamics, also require careful calibration to observations of many of their sensitive parameters. Finally, reduced order modelling is gaining traction in the fusion and space weather communities and will likely remain relevant in the future. In fields such as biology and social sciences, where first-principles approaches are less effective, statistical system identification already plays a significant role.

There are many methods in machine learning that allow predicting the evolution of a system directly from data. Recently, deep neural networks have achieved significant advances in the field of weather forecasting, as demonstrated by the team at Google DeepMind and others. This is partly due to the large resources available to them, as well as the general availability of meteorological data and of physical numerical weather prediction models, which have interpolated this data over the entire globe thanks to data assimilation. Nonetheless, if the conditions under which the data was generated change (as under climate change), there is a risk that such fully data-driven models would generalise poorly. For this reason, applying such black-box approaches to climate modelling, and to other situations where we lack data, could be suspect. Thus, in this text I will emphasise methods which extract equations from data, since equations are more interpretable and suffer less from overfitting. In machine learning speak, we can refer to such paradigms as high bias, low variance.

The first method which deserves a mention is the seminal work by Schmidt and Lipson, which used Genetic Programming (GP) for symbolic regression and extracted equations from trajectory data of simple dynamical systems such as the double pendulum. The procedure consists of generating candidate symbolic functions, deriving the partial derivatives involved in these expressions, and comparing them with derivatives numerically estimated from the data. The procedure is repeated until sufficient accuracy is reached. Importantly, as there is a very large number of candidate expressions that are relatively accurate, one chooses those which satisfy the principle of "parsimony". Parsimony is measured as the inverse of the number of terms in the expression, whereas predictive accuracy is measured as the error on withheld experimental data used only for validation. This principle of parsimonious modelling forms the bedrock of equation discovery.

The idea of Genetic Programming (GP) consists of exploring the space of possible analytical expressions by trying a family of potential terms. An expression is encoded in the tree above, whose structure can be represented as a kind of "gene". New trees are obtained by mutating sequences of these genes and by selecting and crossing over the best candidates. For instance, to obtain the equation in the box, just follow the arrows in the hierarchy of the tree.

This method has the advantage of exploring a wide variety of possible combinations of analytical expressions. It has been tried on various systems; in particular, I will highlight AI Feynman, which with the help of GP and neural networks allowed the discovery from data of 100 equations from the Feynman Lectures on Physics. Another interesting application of GP is discovering ocean parameterisations in climate modelling, where essentially a higher-fidelity model is run to produce the training data, while a correction for a cheaper lower-fidelity model is discovered from that data. With that being said, GP is not without its faults, and a human-in-the-loop was indispensable to make sure that the parameterisations work well. In addition, it can be very inefficient, since it follows the recipe of evolution: trial and error. Are there other possibilities? This brings us to the method which has dominated the field of equation discovery in recent years.

Sparse Identification of Nonlinear Dynamics (SINDy) belongs to the family of conceptually simple yet powerful methods. It was introduced by the group of Steven L. Brunton, alongside other groups, and comes with a well-documented, well-supported repository and YouTube tutorials. To get some practical hands-on experience, just try out their Jupyter notebooks.

I will describe the method following the original SINDy paper. Typically, one has trajectory data, which consists of coordinates such as x(t), y(t), z(t), etc. The goal is to reconstruct first-order Ordinary Differential Equations (ODEs) from the data:
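In standard notation, for the state vector x(t) the target system has the form:

```latex
\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}(\mathbf{x}(t))
```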

Typically, x(t) (sometimes called the response function) is obtained from either observations or modelled data. The goal is then to estimate the optimal choice of f = f(x), the right hand side of the ODE. Often, a library of monomials is tried and the algorithm proceeds by finding a sparse coefficient vector. Each element of the coefficient vector controls how important the contribution of that monomial is to the whole expression. The function f = f(x) is thus represented as the product of the library of monomials and the sparsity vector. See the graphic below for further explanation.

The finite difference method, for example, is often used to compute the derivatives on the left hand side of the ODE. Because derivative estimation is vulnerable to error, this introduces noise into the data, which is generally unwanted. In some cases, filtering may help to deal with these problems. Next, a library of monomials (basis functions) is chosen to fit the right hand side of the ODEs, as described in the graphic:

Sparse Identification of Nonlinear Dynamics (SINDy) as depicted in [1]. The idea is to extract a small set of basis functions (e.g. monomials), a subset of the full basis library, which satisfy the equation when the data is plugged in. On the left hand side the time derivatives are collected (each column corresponds to a different variable and each row to a data sample, which could be a time instant), while on the right hand side there is the basis library matrix (each of whose columns spans a basis function) multiplied by the sparsity vector, which is the object being learned by the algorithm. Promoting sparsity means that we want to end up with most entries of this vector set to zero, which corresponds to the principle of parsimony.

The problem is that unless we were in possession of astronomical amounts of data, this task would be hopeless, since many different polynomials would fit just fine, resulting in spectacular overfitting. Fortunately, this is where sparse regression comes to the rescue: the point is to penalise having too many active basis functions on the right hand side. This can be done in many ways. One of the methods, on which the original SINDy relied, is called Sequential Threshold Least Squares (STLS), which can be summarised as follows:

The sparse representation code in Matlab from the supplementary material of the SINDy paper.
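The same procedure can be sketched in Python with NumPy. This is my own minimal translation of the idea, not the authors' code; the function name `stls` and the threshold value are illustrative:

```python
import numpy as np

def stls(Theta, dXdt, threshold=0.1, n_iter=10):
    """Sequential Threshold Least Squares, a sketch of the SINDy sparse step.

    Theta : (n_samples, n_features) library matrix (basis functions on data)
    dXdt  : (n_samples, n_states)   estimated time derivatives
    """
    # initial guess: ordinary least squares over the full library
    Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold      # coefficients below the tolerance
        Xi[small] = 0.0                     # eliminate them
        for k in range(dXdt.shape[1]):      # refit the surviving terms per state
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k],
                                             rcond=None)[0]
    return Xi
```

For clean data generated by a sparse model, the surviving columns of `Xi` recover the generating coefficients while the spurious library terms are zeroed out.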

In other words, solve for the coefficients using the standard least squares method, then eliminate the small coefficients sequentially, applying least squares each time. The procedure relies on a hyperparameter which controls the tolerance for the smallness of the coefficients. This parameter may appear arbitrary; however, one may perform what is known as Pareto analysis: determine the sparsification hyperparameter by holding out some data and testing how well the learned model performs on the test set. A reasonable value for this coefficient corresponds to the "elbow" in the curve of accuracy vs. complexity of the learned model (complexity = how many terms it includes), the so-called Pareto front. Alternatively, some other publications have promoted sparsity using information criteria instead of performing the Pareto analysis described above.

As the simplest application of SINDy, consider how STLS can be used to successfully identify the Lorenz 63 model from data:

Example of SINDy applied to the identification of the Lorenz 63 model. The coefficients (colour) roughly correspond to those used to generate the training data, which was produced by solving the associated ODE with those parameters.
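To make this concrete, here is a self-contained sketch of the whole pipeline (my own illustration using NumPy only, rather than the PySINDy package): generate Lorenz 63 data, estimate derivatives with finite differences, build a degree-2 monomial library, and apply a single thresholded least-squares pass.

```python
import numpy as np

sigma, rho, beta = 10.0, 28.0, 8.0 / 3.0  # classic Lorenz 63 parameters

def lorenz(s):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

# integrate with classic fourth-order Runge-Kutta
dt, n = 0.002, 10000
X = np.empty((n, 3))
X[0] = [-8.0, 7.0, 27.0]
for i in range(n - 1):
    s = X[i]
    k1 = lorenz(s)
    k2 = lorenz(s + 0.5 * dt * k1)
    k3 = lorenz(s + 0.5 * dt * k2)
    k4 = lorenz(s + dt * k3)
    X[i + 1] = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

dXdt = np.gradient(X, dt, axis=0)  # finite-difference time derivatives

# library of monomials up to degree 2:
# columns = [1, x, y, z, x^2, xy, xz, y^2, yz, z^2]
x, y, z = X.T
Theta = np.column_stack([np.ones(n), x, y, z,
                         x * x, x * y, x * z, y * y, y * z, z * z])

# least squares followed by one thresholding pass
Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
Xi[np.abs(Xi) < 0.5] = 0.0
```

The surviving entries of `Xi` land close to the generating parameters: roughly -10 and 10 on the x and y columns of the first equation, 28, -1 and -1 on the x, y and xz columns of the second, and 1 and -8/3 on the xy and z columns of the third.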

STLS has limitations when applied to systems with a large number of degrees of freedom, such as Partial Differential Equations (PDEs); in this case one may consider dimensionality reduction through Principal Component Analysis (PCA), nonlinear autoencoders, etc. Later, the SINDy algorithm was further improved by the PDE-FIND paper, which introduced Sequential Threshold Ridge (STRidge). In the latter, ridge regression (regression with an L2 penalty) is alternated with the elimination of small coefficients, as in STLS. This allowed the discovery of various standard PDEs from simulation data, such as Burgers' equation, Korteweg-De Vries (KdV), Navier-Stokes, reaction-diffusion, and even a rather peculiar equation one often encounters in scientific machine learning called Kuramoto-Sivashinsky, which is typically tricky due to the need to estimate its fourth derivative term directly from data:

The Kuramoto-Sivashinsky equation describes diffusive-thermal instabilities in a laminar flame front.
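In its common one-dimensional normalisation, the equation reads:

```latex
\partial_t u + u\,\partial_x u + \partial_x^2 u + \partial_x^4 u = 0
```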

Identification of this equation happens directly from the following input data (obtained by numerically solving the same equation):

Solution of the Kuramoto-Sivashinsky equation. The left panel displays the field, while the right panel shows its time derivative.

This is not to say that the method is foolproof. In fact, one of the big challenges of applying SINDy to realistic observational data is that such data tends to be sparse and noisy, and identification typically suffers in these circumstances. The same issue also affects methods based on symbolic regression, such as Genetic Programming (GP).
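To see why noise is such a problem, recall that the pipeline differentiates the data numerically, and finite differences amplify measurement noise by roughly a factor of 1/dt. A quick sketch (my own illustration, not from any of the cited papers) on a known signal:

```python
import numpy as np

t = np.linspace(0.0, 10.0, 1001)
dt = t[1] - t[0]
x_clean = np.sin(t)                      # known signal with derivative cos(t)
rng = np.random.default_rng(0)
x_noisy = x_clean + 1e-3 * rng.standard_normal(t.size)  # tiny measurement noise

# np.gradient uses second-order central differences in the interior
err_clean = np.max(np.abs(np.gradient(x_clean, dt) - np.cos(t)))
err_noisy = np.max(np.abs(np.gradient(x_noisy, dt) - np.cos(t)))
# the noise term is amplified by ~1/dt, so err_noisy is far larger than err_clean
```

Even noise at the 0.1% level inflates the derivative error by more than an order of magnitude here, which is exactly the regime where weak-form methods (below) help.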

Weak SINDy is a more recent development which significantly improves the robustness of the algorithm with respect to noise. This approach has been implemented independently by several authors, most notably by Daniel Messenger, Daniel R. Gurevich and Patrick Reinbold. The main idea is, rather than discovering the differential form of the PDE, to discover its weak (integral) form by multiplying the PDE by test functions and integrating over a set of domains. This permits integration by parts, removing tricky derivatives from the response function (the unknown solution of the PDE) and instead applying those derivatives to the test functions, which are known. The method was subsequently applied to equation discovery in plasma physics by Alves and Fiuza, where the Vlasov equation and plasma fluid models were recovered from simulation data.
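Schematically, for a test function φ that vanishes on the boundary of an integration domain Ω, integration by parts trades a derivative on the unknown solution u for one on the known test function:

```latex
\int_\Omega \varphi\, \partial_x u \,\mathrm{d}x = -\int_\Omega u\, \partial_x \varphi \,\mathrm{d}x
```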

Another, rather obvious, limitation of the SINDy approach is that identification is always restricted by the library of terms which form the basis, e.g. monomials. While other types of basis functions can be used, such as trigonometric functions, this is still not general enough. Suppose the PDE takes the form of a rational function, where both numerator and denominator could be polynomials:
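Schematically, with N and D polynomial in the solution and its derivatives:

```latex
\partial_t u = \frac{N(u, \partial_x u, \dots)}{D(u, \partial_x u, \dots)}
```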

A situation which makes the application of algorithms such as PDE-FIND complicated.

This is the kind of situation that could, in principle, be treated easily with Genetic Programming (GP). However, SINDy was also extended to such situations with SINDy-PI (parallel-implicit), which was successfully used to identify the PDE describing the Belousov-Zhabotinsky reaction.

Finally, other sparsity-promoting methods, such as sparse Bayesian regression, also known as the Relevance Vector Machine (RVM), have likewise been used to identify equations from data using exactly the same library-fitting approach, but benefiting from marginalisation and the "Occam's razor" principle highly regarded by statisticians. I am not covering these approaches here, but it suffices to say that some authors such as Zhang and Lin have claimed more robust system identification of ODEs, and this approach has even been tried for learning closures for simple baroclinic ocean models, where the authors argued RVM appeared more robust than STRidge. In addition, these methods provide natural Uncertainty Quantification (UQ) for the estimated coefficients of the identified equation. With that being said, more recent developments of ensemble SINDy are more robust and provide UQ, but rely instead on the statistical technique of bootstrap aggregating (bagging), widely applied in statistics and machine learning.

An alternative approach to both solving PDEs and identifying their coefficients, which has attracted enormous attention in the literature, concerns Physics Informed Neural Networks (PINNs). The main idea is to parametrise the solution of the PDE using a neural network and introduce the equation of motion, or other types of physics-based inductive biases, into the loss function. The loss function is evaluated on a pre-defined set of so-called "collocation points". When performing gradient descent, the weights of the neural network are adjusted and the solution is "learned". The only data that needs to be provided includes initial and boundary conditions, which are penalised in a separate loss term. The method actually borrows from older collocation methods of solving PDEs that were not based on neural networks. The fact that neural networks provide a natural way of automatic differentiation makes this approach very attractive; however, as it turns out, PINNs are generally not competitive with standard numerical methods such as finite volumes/elements. Thus, as a tool for solving the forward problem (solving the PDE numerically), PINNs are not so interesting.
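Schematically, for a network u_θ and a PDE residual operator 𝒩, the total loss combines a data/boundary term with the equation residual evaluated at collocation points (the symbols here are generic, not tied to any one paper):

```latex
\mathcal{L}(\theta) = \frac{1}{N_d}\sum_{i=1}^{N_d} \left| u_\theta(x_i, t_i) - u_i \right|^2
                    + \frac{1}{N_c}\sum_{j=1}^{N_c} \left| \mathcal{N}[u_\theta](x_j, t_j) \right|^2
```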

They become interesting as a tool for solving inverse problems: estimating the model from the data, rather than generating data via a known model. In the original PINNs paper, two unknown coefficients of the Navier-Stokes equation are estimated from data.

The assumed form of the Navier-Stokes equation that is fed into the loss function of the PINN. As a result of identification, the two unknown parameters are obtained (inside the yellow boxes). For a TensorFlow implementation of PINNs, see the authors' repository.

Looking back, when compared to algorithms such as PDE-FIND, this seems rather naive, as the general form of the equation has already been assumed. Nevertheless, one interesting aspect of this work is that the algorithm is not fed the pressure; instead, incompressible flow is assumed and the solution for the pressure is recovered directly by the PINN.

PINNs have been applied in all kinds of situations; one particular application I would like to highlight is space weather, where it was shown that they allow estimation of the electron density in the radiation belts by solving the inverse problem for the Fokker-Planck equation. Here, an ensemble method (of re-training the neural network) proves handy for estimating the uncertainty. Finally, to achieve interpretability, a polynomial expansion of the learned diffusion coefficients is performed. It would certainly be interesting to compare this approach with directly using something like SINDy, which would also provide a polynomial expansion.

The term physics-informed has been taken up by other teams, who sometimes invented their own version of putting physics priors into neural nets and followed the formula of calling their approach something catchy like physics-based or physics-inspired. These approaches can generally be categorised as soft constraints (penalising failure to satisfy some equation or symmetry inside the loss) or hard constraints (embedding the constraints into the architecture of the neural network). Examples of such approaches can be found in climate science, for instance, among other disciplines.

Given that the backpropagation machinery of neural nets provides an alternative for estimating temporal and spatial derivatives, it seemed inevitable that Sparse Regression (SR) or Genetic Programming (GP) would be coupled with these neural-net collocation methods. While there are many such studies, I will highlight one of them, DeePyMoD, for its relatively well-documented and supported repository. Understanding how this method works is sufficient to understand all the other studies that came around the same time or later and improved upon it in various ways.

DeePyMoD framework: the solution of the PDE is parameterised via a feed-forward Neural Network (NN). In the most recent version, the loss function consists of two terms: a Mean Square Error (MSE) term between the data and the NN prediction, and a regularisation loss which penalises deviation from the functional form of the PDE built from the active library terms. As in the STLS step of SINDy, when the network converges to the solution, the small terms of the sparsity vector are eliminated, promoting only the largest coefficients of the library. The training of the NN is then repeated until a convergence criterion is satisfied.

The loss function consists of a Mean Square Error (MSE) term:
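Schematically (my notation, matching the SINDy library conventions above, with û_θ the network output):

```latex
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N} \left| u_i - \hat{u}_\theta(x_i, t_i) \right|^2
```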

and a regularisation term which promotes the functional form of the PDE:
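Schematically, with Θ the library evaluated on the network output and ξ the sparsity vector (again my notation, consistent with the SINDy section above):

```latex
\mathcal{L}_{\mathrm{reg}} = \frac{1}{N}\sum_{i=1}^{N} \left| \partial_t \hat{u}_\theta(x_i, t_i) - \Theta\!\left(\hat{u}_\theta(x_i, t_i)\right) \xi \right|^2
```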

DeePyMoD is significantly more robust to noise even compared to weak SINDy, and requires only a fraction of the observation points on the spatio-temporal domain, which is great news for discovering equations from observational data. For instance, many of the standard PDEs that PDE-FIND gets right can also be identified by DeePyMoD by sampling only on the order of a few thousand points in space, even with noise-dominated data. However, using neural networks for this task comes at the cost of longer convergence. Another issue is that some PDEs are problematic for vanilla collocation methods, for instance the Kuramoto-Sivashinsky (KS) equation, due to its high-order derivatives. KS is generally hard to identify from data without weak-form approaches, especially in the presence of noise. More recent developments address this problem by coupling the weak SINDy approach with neural network collocation methods. Another interesting, practically unexplored question is how such methods are affected by non-Gaussian noise.

To summarise, equation discovery is a natural candidate for physics-based machine learning and is being actively developed by various groups around the world. It has found applications in many fields such as fluid dynamics, plasma physics, climate and beyond. For a broader overview with emphasis on some other methods, see the review article [1]. Hopefully the reader got a flavour of the different methodologies that exist in the field, but I have only scratched the surface by avoiding getting too technical. It is also important to mention many recent approaches to physics-based machine learning, such as neural Ordinary Differential Equations (ODEs).


  1. Camps-Valls, G. et al. Discovering causal relations and equations from data. Physics Reports 1044, 1–68 (2023).
  2. Lam, R. et al. Learning skillful medium-range global weather forecasting. Science 0, eadi2336 (2023).
  3. Mehta, P. et al. A high-bias, low-variance introduction to Machine Learning for physicists. Physics Reports 810, 1–124 (2019).
  4. Schmidt, M. & Lipson, H. Distilling Free-Form Natural Laws from Experimental Data. Science 324, 81–85 (2009).
  5. Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Sci Adv 6, eaay2631 (2020).
  6. Ross, A., Li, Z., Perezhogin, P., Fernandez-Granda, C. & Zanna, L. Benchmarking of Machine Learning Ocean Subgrid Parameterizations in an Idealized Model. Journal of Advances in Modeling Earth Systems 15, e2022MS003258 (2023).
  7. Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences 113, 3932–3937 (2016).
  8. Mangan, N. M., Kutz, J. N., Brunton, S. L. & Proctor, J. L. Model selection for dynamical systems via sparse regression and information criteria. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473, 20170009 (2017).
  9. Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations. Science Advances 3, e1602614 (2017).
  10. Messenger, D. A. & Bortz, D. M. Weak SINDy for partial differential equations. Journal of Computational Physics 443, 110525 (2021).
  11. Gurevich, D. R., Reinbold, P. A. K. & Grigoriev, R. O. Robust and optimal sparse regression for nonlinear PDE models. Chaos: An Interdisciplinary Journal of Nonlinear Science 29, 103113 (2019).
  12. Reinbold, P. A. K., Kageorge, L. M., Schatz, M. F. & Grigoriev, R. O. Robust learning from noisy, incomplete, high-dimensional experimental data via physically constrained symbolic regression. Nat Commun 12, 3219 (2021).
  13. Alves, E. P. & Fiuza, F. Data-driven discovery of reduced plasma physics models from fully kinetic simulations. Phys. Rev. Res. 4, 033192 (2022).
  14. Zhang, S. & Lin, G. Robust data-driven discovery of governing physical laws with error bars. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474, 20180305 (2018).
  15. Zanna, L. & Bolton, T. Data-Driven Equation Discovery of Ocean Mesoscale Closures. Geophysical Research Letters 47, e2020GL088376 (2020).
  16. Fasel, U., Kutz, J. N., Brunton, B. W. & Brunton, S. L. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 478, 20210904 (2022).
  17. Raissi, M., Perdikaris, P. & Karniadakis, G. E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707 (2019).
  18. Markidis, S. The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers? Frontiers in Big Data 4, (2021).
  19. Camporeale, E., Wilkie, G. J., Drozdov, A. Y. & Bortnik, J. Data-Driven Discovery of Fokker-Planck Equation for the Earth’s Radiation Belts Electrons Using Physics-Informed Neural Networks. Journal of Geophysical Research: Space Physics 127, e2022JA030377 (2022).
  20. Beucler, T. et al. Enforcing Analytic Constraints in Neural Networks Emulating Physical Systems. Phys. Rev. Lett. 126, 098302 (2021).
  21. Both, G.-J., Choudhury, S., Sens, P. & Kusters, R. DeepMoD: Deep learning for model discovery in noisy data. Journal of Computational Physics 428, 109985 (2021).
  22. Stephany, R. & Earls, C. PDE-READ: Human-readable partial differential equation discovery using deep learning. Neural Networks 154, 360–382 (2022).
  23. Both, G.-J., Vermarien, G. & Kusters, R. Sparsely constrained neural networks for model discovery of PDEs. Preprint at https://doi.org/10.48550/arXiv.2011.04336 (2021).
  24. Stephany, R. & Earls, C. Weak-PDE-LEARN: A Weak Form Based Approach to Discovering PDEs From Noisy, Limited Data. Preprint at https://doi.org/10.48550/arXiv.2309.04699 (2023).

