## 2.1 Problem

Physics-Informed Neural Networks (PINNs) offer a distinct advantage over conventional neural networks by explicitly integrating the known governing ordinary or partial differential equations (ODEs/PDEs) of physical processes. The enforcement of these governing equations in PINNs relies on a set of points often called residual points. These points are strategically chosen inside the simulation domain, and the corresponding network outputs are substituted into the governing equations to evaluate the residuals. The residuals indicate the extent to which the network outputs align with the underlying physical processes, thereby serving as an important physical loss term that guides the training of the neural network.

It is clear that the distribution of these residual points plays a pivotal role in the accuracy and efficiency of PINN training. Nevertheless, the prevailing approach often relies on simple uniform sampling, which leaves ample room for improvement.

Consequently, a pressing question arises: how can we optimize the distribution of residual points to improve the accuracy and training efficiency of PINNs?

## 2.2 Solution

Promising ways of distributing the residual points are the **adaptive strategy** and the **refinement strategy**:

- The adaptive strategy means that after a certain number of training iterations, a new batch of residual points is generated to replace the previous residual points;
- The refinement strategy means that extra residual points are added to the existing ones, thus “refining” the set of residual points.

Based on these two foundational strategies, the paper proposed two novel sampling methods: *Residual-based Adaptive Distribution* (RAD) and *Residual-based Adaptive Refinement with Distribution* (RAR-D):

1. RAD: **R**esidual-based **A**daptive **D**istribution

The key idea is to draw new residual samples based on a custom probability density function over the spatial domain **x**.

The probability density function *p*(**x**) is designed such that it is proportional to the PDE residual ε(**x**) at **x**:

$$
p(\mathbf{x}) \;\propto\; \frac{\varepsilon^{k}(\mathbf{x})}{\mathbb{E}\big[\varepsilon^{k}(\mathbf{x})\big]} + c
$$

Here, *k* and *c* are two hyperparameters, and the expectation term in the denominator can be approximated by, e.g., Monte Carlo integration.

In total, there are three hyperparameters for the RAD approach: *k*, *c*, and the resampling period *N*. Although the optimal hyperparameter values are problem-dependent, the suggested default values are 1, 1, and 2000, respectively.
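To make the sampling step concrete, here is a minimal sketch of one RAD resampling in NumPy. It assumes a 1-D unit domain and a standalone `residual_fn` standing in for the PINN's residual evaluation; the function name and the dense-candidate approach are illustrative choices, not the paper's reference implementation.

```python
import numpy as np

def rad_sample(residual_fn, n_points, k=1.0, c=1.0, n_candidates=10_000, rng=None):
    """Draw n_points residual points with probability proportional to
    eps(x)^k / E[eps(x)^k] + c, following the RAD density.

    residual_fn maps an array of points to |PDE residual|; in an actual
    PINN it would evaluate the network's residual at those points."""
    rng = rng if rng is not None else np.random.default_rng(0)
    # Dense uniform candidate set over the (here 1-D, unit) domain.
    candidates = rng.uniform(0.0, 1.0, size=n_candidates)
    eps_k = np.abs(residual_fn(candidates)) ** k
    # Monte Carlo estimate of the expectation in the denominator.
    density = eps_k / eps_k.mean() + c
    probs = density / density.sum()
    idx = rng.choice(n_candidates, size=n_points, replace=False, p=probs)
    return candidates[idx]

# Toy residual: large near x = 0.8, small elsewhere.
toy_residual = lambda x: np.exp(-200.0 * (x - 0.8) ** 2)
pts = rad_sample(toy_residual, n_points=1000)
```

With the default *k* = 1 and *c* = 1, the sampled points cluster around the high-residual region near x = 0.8 while the `+ c` term keeps some uniform coverage elsewhere.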

2. RAR-D: **R**esidual-based **A**daptive **R**efinement with **D**istribution

Essentially, RAR-D adds the element of refinement on top of the proposed RAD approach: after a certain number of training iterations, instead of entirely replacing the old residual points with new ones, RAR-D keeps the old residual points and draws new residual points based on the same custom probability density function shown above.

For RAR-D, the suggested default values for *k* and *c* are 2 and 0, respectively.
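A single RAR-D refinement step can then be sketched as "RAD sampling plus concatenation." As above, this is a toy 1-D illustration with an assumed `residual_fn`, not the authors' implementation.

```python
import numpy as np

def rard_refine(old_points, residual_fn, n_new, k=2.0, c=0.0,
                n_candidates=10_000, rng=None):
    """One RAR-D refinement step: keep the existing residual points and
    append n_new points drawn from the residual-based density
    (suggested RAR-D defaults: k=2, c=0)."""
    rng = rng if rng is not None else np.random.default_rng(1)
    candidates = rng.uniform(0.0, 1.0, size=n_candidates)
    eps_k = np.abs(residual_fn(candidates)) ** k
    density = eps_k / eps_k.mean() + c
    probs = density / density.sum()
    idx = rng.choice(n_candidates, size=n_new, replace=False, p=probs)
    # Refinement: old points are kept, new ones are appended.
    return np.concatenate([old_points, candidates[idx]])

old = np.linspace(0.0, 1.0, 50)  # initial uniform residual points
toy_residual = lambda x: np.exp(-200.0 * (x - 0.3) ** 2)
refined = rard_refine(old, toy_residual, n_new=20)
```

Note that with *c* = 0 the density is purely residual-driven, so the appended points concentrate almost entirely where the residual is large; the retained old points preserve coverage of the rest of the domain.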

## 2.3 Why the solution might work

The key lies in the designed sampling probability density function: this density function tends to place more points in regions where the PDE residuals are large and fewer points in regions where the residuals are small. This strategic distribution of points enables a more detailed evaluation of the PDE in regions where the residuals are higher, potentially leading to enhanced accuracy in PINN predictions. Moreover, the optimized distribution allows for more efficient use of computational resources, thus reducing the total number of points required to accurately resolve the governing PDE.

## 2.4 Benchmark

The paper benchmarked the performance of the two proposed approaches against 8 other sampling strategies, in terms of addressing forward and inverse problems. The considered physical equations include:

- Diffusion-reaction equation (inverse problem, calibrating the reaction rate *k*(*x*))
- Korteweg-de Vries equation (inverse problem, calibrating λ₁ and λ₂)

The comparison studies showed that:

- RAD consistently performed the best, making it a good default strategy;
- If computational cost is a concern, RAR-D can be a strong alternative, as it tends to produce adequate accuracy at a lower cost than RAD;
- RAD & RAR-D are especially effective for complicated PDEs;
- The advantage of RAD & RAR-D shrinks when the simulated PDEs have smooth solutions.

## 2.5 Strengths and Weaknesses

👍 **Strengths**

- dynamically improves the distribution of residual points based on the PDE residuals during training;
- leads to improved PINN accuracy;
- achieves comparable accuracy to existing methods with fewer residual points.

👎 **Weaknesses**

- can be more computationally expensive than non-adaptive uniform sampling methods; however, this is the price to pay for higher accuracy;
- for PDEs with smooth solutions, e.g., the diffusion equation and the diffusion-reaction equation, some simple uniform sampling methods may already produce sufficiently low errors, making the proposed solution potentially less suitable in those cases;
- introduces two new hyperparameters *k* and *c* that need to be tuned, as their optimal values are problem-dependent.

## 2.6 Alternatives

Other adaptive sampling approaches had been proposed prior to this paper. Among those methods, two heavily influenced the approaches proposed in the current paper:

- Residual-based adaptive refinement (Lu et al.), which is a special case of the proposed RAR-D with a large value of *k*;
- Importance sampling (Nabian et al.), which is a special case of RAD obtained by setting *k* = 1 and *c* = 0.