The estimation of parameters from measured data plays a central role in physics and many other scientific disciplines. A key tool in the analysis of such estimation problems is the Fisher information (FI), which quantifies how much information data carry about an unknown parameter and sets a fundamental bound on estimation precision. When the data follow a complex or unknown distribution, solving the estimation problem with conventional methods, such as maximum likelihood estimation, can become prohibitively difficult. Artificial neural networks (ANNs), however, have proven highly effective at tackling such problems when sufficient training data are available. At the same time, the FI sets an ultimate limit on the performance of ANNs that are trained to solve physical parameter estimation tasks [1].
We present here a method to track the flow of FI through an ANN performing a parameter estimation task, from the input to the output layer [2,3]. Optimal performance corresponds to maximal transmission of FI through the network, while further training results in overfitting. This observation yields a model-free early stopping criterion based solely on the training data.
Figure 1. An example of the flow of FI through an ANN at different epochs of training. The FI is plotted against the layer index (0 is the input layer). At random initialization (epoch 0), FI is lost as the data are compressed by the network. As training proceeds, the ANN transmits more and more FI.
Our approach is based on a pessimistic lower bound on the FI, the so-called linear FI. Rather than treating it as a crude approximation, we use it as the objective function of an optimization problem, seeking a data transformation for which the linear FI matches the true FI [2,3]. This strategy enables reliable extraction of the FI directly from data, even for highly complex distributions.
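The bound can be illustrated with a toy example. For a scalar parameter θ, the linear FI is commonly written as J_lin = (∂μ/∂θ)ᵀ Σ⁻¹ (∂μ/∂θ), where μ and Σ are the mean and covariance of the data, and it never exceeds the true FI. The following Python sketch is not the implementation of [2,3]. It assumes NumPy, a hypothetical helper `linear_fisher_information`, a finite-difference estimate of ∂μ/∂θ from two sample sets, and a hand-picked transformation x → x² in place of a learned one. For zero-mean Gaussian data with standard deviation θ, the true FI is 2/θ²; the linear FI of the raw data is close to zero, while that of the squared data is close to the true value.

```python
import numpy as np


def linear_fisher_information(x_minus, x_plus, dtheta):
    """Estimate the linear FI of a scalar parameter from two sample sets.

    x_minus, x_plus: samples drawn at theta - dtheta/2 and theta + dtheta/2,
    shape (n_samples,) or (n_samples, n_features). This sampling scheme is
    an assumption of the sketch, not taken from the source.
    """
    x_minus = np.asarray(x_minus, dtype=float)
    x_plus = np.asarray(x_plus, dtype=float)
    if x_minus.ndim == 1:
        x_minus = x_minus[:, None]
    if x_plus.ndim == 1:
        x_plus = x_plus[:, None]

    # Finite-difference estimate of the derivative of the mean.
    dmu = (x_plus.mean(axis=0) - x_minus.mean(axis=0)) / dtheta

    # Covariance at theta, approximated by averaging both sample sets.
    cov_minus = np.atleast_2d(np.cov(x_minus, rowvar=False))
    cov_plus = np.atleast_2d(np.cov(x_plus, rowvar=False))
    cov = 0.5 * (cov_minus + cov_plus)

    # J_lin = dmu^T cov^{-1} dmu
    return float(dmu @ np.linalg.solve(cov, dmu))


# Toy data: x ~ N(0, theta^2), so the mean carries no information on theta.
rng = np.random.default_rng(0)
theta, dtheta, n = 1.0, 0.1, 200_000
x_minus = rng.normal(0.0, theta - dtheta / 2, size=n)
x_plus = rng.normal(0.0, theta + dtheta / 2, size=n)

fi_raw = linear_fisher_information(x_minus, x_plus, dtheta)
fi_squared = linear_fisher_information(x_minus**2, x_plus**2, dtheta)

print(f"linear FI of x:   {fi_raw:.3f}")      # close to 0
print(f"linear FI of x^2: {fi_squared:.3f}")  # close to 2 / theta^2
```

The finite sample size and the finite step `dtheta` both bias the estimate, so the printed numbers only approximate the analytic values.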
Beyond enabling FI calculations for the distributions generated by the nonlinear layers of an ANN, a linear-FI-based objective function is itself useful for machine-learning tasks. For example, it allows layer-by-layer training of parameter-estimating networks, enforcing conservation of relevant information. Another application, which we present here, is the detection of weak signals embedded in complex, non-Gaussian noise. We show that our objective function learns an optimal data transformation that enhances signal detectability using only noise data and without requiring a noise model [3].
References
[1] I. Starshynov, M. Weimar et al., Nature Photonics 19.6, 593-600 (2025).
[2] M. Weimar et al., Physical Review X 15.3, 031072 (2025).
[3] J. Zschetzsche, M. Weimar et al., arXiv:2603.01737 (2026).