## 2.1 Problem

When applying Physics-Informed Neural Networks (PINNs), it comes as no surprise that the network hyperparameters, such as depth, width, and the choice of activation function, all have a significant impact on the PINN's efficiency and accuracy.

Naturally, people would resort to **AutoML** (more specifically, neural architecture search) to automatically discover the optimal network hyperparameters. But before we can do this, two questions need to be addressed:

- How can we effectively navigate the vast search space?
- How can we define a proper search objective?

The latter point stems from the fact that a PINN is usually seen as an "unsupervised" problem: no labeled data is required because the training is guided by minimizing the ODE/PDE residuals.
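As a reminder of what "minimizing the residuals" means, a typical PINN training loss can be written roughly as follows. The notation is an assumption made here for illustration and is not taken from the paper: $u_\theta$ is the network, $\mathcal{N}$ the differential operator of the PDE $\mathcal{N}[u] = 0$, $\mathcal{B}[u] = g$ the boundary/initial conditions, and $x_i$, $x_j$ the residual and boundary collocation points. Any weighting between the two terms is omitted.

$$
\mathcal{L}(\theta) = \frac{1}{N_r}\sum_{i=1}^{N_r} \left| \mathcal{N}[u_\theta](x_i) \right|^2 + \frac{1}{N_b}\sum_{j=1}^{N_b} \left| \mathcal{B}[u_\theta](x_j) - g(x_j) \right|^2
$$

Neither term compares the prediction against a known solution, which is why a low loss is not automatically a guarantee of an accurate prediction.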

To better understand those two issues, the authors conducted extensive experiments to analyze how sensitive PINN performance is to the network structure. Let's now take a look at what they found.

## 2.2 Solution

The first idea proposed in the paper is that **the training loss can be used as the surrogate for the search objective**, as it highly correlates with the final prediction accuracy of the PINN. This addresses the issue of defining a proper optimization objective for hyperparameter search.

The second idea is that **there is no need to optimize all network hyperparameters simultaneously**. Instead, we can adopt a **step-by-step decoupling strategy**: for example, first search for the optimal activation function, then fix that choice and find the optimal network width, then fix the previous decisions and optimize the network depth, and so on. In their experiments, the authors demonstrated that this strategy is very effective.

With those two ideas in mind, let's see how we can execute the search in detail.

First of all, which network hyperparameters are considered? In the paper, the recommended search space is:

- **Width**: number of neurons in each hidden layer. The considered range is [8, 512] with a step of 4 or 8.
- **Depth**: number of hidden layers. The considered range is [3, 10] with a step of 1.
- **Activation function**: Tanh, Sigmoid, ReLU, and Swish.
- **Changing point**: the proportion of the total training epochs that use Adam. The considered values are [0.1, 0.2, 0.3, 0.4, 0.5]. In PINN, it's common practice to first train with Adam for a certain number of epochs and then switch to L-BFGS for the remaining epochs. This changing point hyperparameter determines the timing of the switch.
- **Learning rate**: a fixed value of 1e-5, as it has a small effect on the final architecture search results.
- **Training epochs**: a fixed value of 10000, as it has a small effect on the final architecture search results.
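To make the search space concrete, here is one way it could be written down in plain Python. This is only a sketch: the dictionary layout, the names `SEARCH_SPACE`, `FIXED`, and `split_epochs`, and the choice of a width step of 8 are assumptions made for illustration, not something prescribed by the paper.

```python
from __future__ import annotations

# Tunable hyperparameters, following the ranges listed above.
SEARCH_SPACE = {
    "width": list(range(8, 513, 8)),   # [8, 512] with a step of 8 (a step of 4 is also possible)
    "depth": list(range(3, 11)),       # [3, 10] with a step of 1
    "activation": ["tanh", "sigmoid", "relu", "swish"],
    "changing_point": [0.1, 0.2, 0.3, 0.4, 0.5],
}

# Hyperparameters that are kept fixed during the search.
FIXED = {"learning_rate": 1e-5, "epochs": 10_000}


def split_epochs(changing_point: float, total_epochs: int = FIXED["epochs"]) -> tuple[int, int]:
    """Return (adam_epochs, lbfgs_epochs) for a given changing point."""
    adam_epochs = round(changing_point * total_epochs)
    return adam_epochs, total_epochs - adam_epochs
```

For example, `split_epochs(0.3)` splits the 10000 training epochs into 3000 Adam epochs followed by 7000 L-BFGS epochs.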

Secondly, let's examine the proposed procedure in detail:

- The first search target is the *activation function*. To do that, we sample the width-depth parameter space and calculate the losses for all width-depth samples under the different activation functions. These results indicate which activation function is the dominant one. Once decided, we fix the activation function for the subsequent steps.

- The second search target is the *width*. More specifically, we are looking for a few width intervals where the PINN performs well.

- The third search target is the *depth*. Here, we only consider widths within the best-performing intervals determined in the previous step, and we look for the top-K width-depth combinations where the PINN performs well.

- The final search target is the *changing point*. We simply search for the best changing point for each of the top-K configurations identified in the previous step.

The outcome of this search procedure is **K different PINN structures**. We can either select the best-performing one out of those K candidates or simply use all of them to form a K-ensemble PINN model.

Note that several tuning parameters need to be specified in the procedure above (e.g., the number of width intervals, the value of K, etc.), and these depend on the available tuning budget.
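The four steps above can be sketched in plain Python as follows. This is a simplified illustration, not the authors' implementation: `toy_training_loss` is a made-up stand-in for "train a PINN and return its final training loss", and the parameters `n_intervals`, `interval_size`, `k`, and `n_samples` are hypothetical names for the tuning knobs mentioned above. The way each step samples and ranks candidates is also an assumption.

```python
import itertools
import random
from statistics import mean

SPACE = {
    "width": list(range(8, 513, 8)),
    "depth": list(range(3, 11)),
    "activation": ["tanh", "sigmoid", "relu", "swish"],
    "changing_point": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def toy_training_loss(activation, width, depth, changing_point=0.3):
    """Hypothetical objective; a real one would train a PINN and return its loss."""
    act_penalty = {"tanh": 0.3, "sigmoid": 0.6, "relu": 1.0, "swish": 0.0}[activation]
    return (act_penalty + abs(width - 256) / 256
            + abs(depth - 6) / 6 + abs(changing_point - 0.4))


def decoupled_search(loss_fn, space, n_intervals=2, interval_size=8, k=3,
                     n_samples=10, seed=0):
    rng = random.Random(seed)
    widths, depths = space["width"], space["depth"]

    # Step 1: activation. Compare all activations on the same width-depth samples.
    samples = [(rng.choice(widths), rng.choice(depths)) for _ in range(n_samples)]
    activation = min(space["activation"],
                     key=lambda a: mean(loss_fn(a, w, d) for w, d in samples))

    # Step 2: width. Split the widths into contiguous intervals and keep the
    # few with the lowest mean loss (evaluated at a mid-range depth).
    mid_depth = depths[len(depths) // 2]
    intervals = [widths[i:i + interval_size]
                 for i in range(0, len(widths), interval_size)]
    intervals.sort(key=lambda iv: mean(loss_fn(activation, w, mid_depth) for w in iv))
    good_widths = [w for iv in intervals[:n_intervals] for w in iv]

    # Step 3: depth. Rank width-depth pairs inside the good intervals, keep top K.
    pairs = sorted(itertools.product(good_widths, depths),
                   key=lambda wd: loss_fn(activation, *wd))[:k]

    # Step 4: changing point, tuned separately for each top-K configuration.
    results = []
    for w, d in pairs:
        cp = min(space["changing_point"], key=lambda c: loss_fn(activation, w, d, c))
        results.append({"activation": activation, "width": w,
                        "depth": d, "changing_point": cp})
    return results


candidates = decoupled_search(toy_training_loss, SPACE)
```

Here `candidates` holds the K configurations, ordered from lowest to highest loss as ranked in step 3. In practice, each call to the loss function is a full training run, so the exhaustive loops in steps 2 and 3 would usually be replaced by a sampler or search algorithm from an AutoML library.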

As for the specific optimization algorithms used in the individual steps, off-the-shelf AutoML libraries can be employed to complete the task. For example, the authors of the paper used the Tune package for the hyperparameter tuning.

## 2.3 Why the solution might work

By decoupling the search of different hyperparameters, the size of the search space can be greatly reduced. This not only substantially decreases the search complexity, but also significantly increases the chance of locating a (near) optimal network architecture for the physical problems under investigation.
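A quick back-of-the-envelope calculation shows the scale of this reduction. The sketch below assumes a width step of 8 and compares a full joint grid against an idealized search that tunes one hyperparameter at a time. The actual Auto-PINN procedure needs more trials than this idealized count, because step 1 samples width-depth pairs and step 3 ranks width-depth combinations, so the second number should be read as a lower bound rather than a figure from the paper.

```python
from math import prod

# Number of candidate values per hyperparameter (width: 8..512 in steps of 8).
sizes = {"activation": 4, "width": 64, "depth": 8, "changing_point": 5}

joint_grid = prod(sizes.values())     # every combination searched at once
one_at_a_time = sum(sizes.values())   # idealized: each hyperparameter tuned alone

print(joint_grid, one_at_a_time)      # 10240 81
```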

Also, using the training loss as the search objective is both simple to implement and desirable. Since the training loss (mainly made up of the PDE residual loss) highly correlates with the PINN accuracy during inference (according to the experiments conducted in the paper), identifying an architecture that delivers the minimum training loss will also likely lead to a model with high prediction accuracy.

## 2.4 Benchmark

The paper considered a total of seven different benchmark problems. All of them are forward problems where a PINN is used to solve the PDEs.

- Heat equation with Dirichlet boundary conditions. This type of equation describes the heat or temperature distribution in a given domain over time.

- Heat equation with Neumann boundary conditions.

- Wave equation, which describes the propagation of oscillations in a space, such as mechanical and electromagnetic waves. Both Dirichlet and Neumann conditions are considered here.

- Burgers equation, which has been leveraged to model shock flows, wave propagation in combustion chambers, vehicular traffic movement, and more.

- Advection equation, which describes the motion of a scalar field as it is advected by a known velocity vector field.

- Advection equation, with different boundary conditions.

- Reaction equation, which describes chemical reactions.

The benchmark studies showed that:

- The proposed Auto-PINN shows stable performance for various PDEs.
- In many cases, Auto-PINN is able to identify the neural network architecture with the smallest error values.
- Fewer search trials are needed with the Auto-PINN approach.

## 2.5 Strengths and Weaknesses

**Strengths**

- Significantly reduced computational cost for performing neural architecture search for PINN applications.
- Improved likelihood of identifying a (near) optimal neural network architecture for various PDE problems.

**Weaknesses**

- The effectiveness of using the training loss value as the search objective might depend on the specific characteristics of the PDE problem at hand, since the benchmarks cover only a specific set of PDEs.
- The data sampling strategy influences Auto-PINN performance. While the paper discusses the impact of different data sampling strategies, it does not provide a clear guideline on how to choose the best strategy for a given PDE problem. This could add another layer of complexity to the use of Auto-PINN.

## 2.6 Alternatives

Conventional out-of-the-box AutoML algorithms can also be employed to tackle the problem of hyperparameter optimization in Physics-Informed Neural Networks (PINNs). Those algorithms include *Random Search*, *Genetic Algorithms*, *Bayesian optimization*, etc.

Compared to those alternative algorithms, the newly proposed Auto-PINN is specifically designed for PINN. This makes it a unique and effective solution for optimizing PINN hyperparameters.

There are several possibilities to further improve the proposed strategy:

- Incorporating more sophisticated data sampling strategies, such as adaptive and residual-based sampling methods, to improve the search accuracy and the model performance.

To learn more about how to optimize the distribution of residual points, check out this blog in the PINN design pattern series.

- More benchmarking of the search objective, to assess whether the training loss value is indeed a good surrogate for different types of PDEs.
- Incorporating other types of neural networks. The current version of Auto-PINN is designed for multilayer perceptron (MLP) architectures only. Future work could explore convolutional neural networks (CNNs) or recurrent neural networks (RNNs), which could potentially enhance the capability of PINNs in solving more complex PDE problems.
- Transfer learning in Auto-PINN. For instance, architectures that perform well on certain types of PDE problems could be used as starting points for the search process on similar types of PDE problems. This could potentially speed up the search process and improve the performance of the model.