Extraction of fissile isotope antineutrino spectra using feedforward neural network

ACCELERATOR, RAY TECHNOLOGY AND APPLICATIONS

Jian Chen, Jun Wang, Wei Wang, Yue-Huan Wei

Nuclear Science and Techniques

Vol.36, No.10

Article number 177

Published in print Oct 2025

Available online 16 Jul 2025

DOI: 10.1007/s41365-025-01746-9

CSTR: 32136.14.NST.2025.10177

The precise measurement of the antineutrino spectra produced by isotope fission in reactors is of great significance for studying neutrino oscillations, refining nuclear databases, and addressing the reactor antineutrino anomaly. In this paper, we report a method that uses a feedforward neural network (FNN) model to decompose the prompt energy spectrum observed in a short-baseline reactor neutrino experiment and extract the antineutrino spectra produced by the fission of the four major isotopes ²³⁵U, ²³⁸U, ²³⁹Pu, and ²⁴¹Pu in the nuclear reactor. We present two training strategies for the model and compare them with the traditional χ² minimization method by applying them to the same set of pseudo-data corresponding to a total exposure of $(2.9 \times 5 \times 1800) {GW}_{th} \cdot tonnes \cdot days$. The results show that the FNN model not only converges faster and to a lower loss during the fitting process but also achieves relative errors of less than 1% in the 2-8 MeV range in the extracted spectra, outperforming the χ² minimization method. The feasibility and superiority of this method were validated in the study.

Reactor neutrinos; Isotope antineutrino spectra; Feedforward neural network

Introduction

Since the direct discovery of neutrinos by Cowan and Reines at the Savannah River reactor power plant in 1956 [1], reactor neutrino experiments have played a pivotal role in the advancement of neutrino physics. Reactor neutrinos are also known as reactor antineutrinos because they are composed exclusively of electron antineutrinos (${\bar{ν}}_{e}$). In commercial pressurized water reactors (PWRs), more than 99.7% of the reactor neutrinos are emitted from the beta decay branches of neutron-rich fission products generated by four isotopes: ²³⁵U, ²³⁸U, ²³⁹Pu, and ²⁴¹Pu. In research reactors utilizing 93% ²³⁵U-enriched fuel, 99.3% of the reactor neutrinos result from the fission of ²³⁵U. Each isotope releases approximately six ${\bar{ν}}_{e}$ per fission along with a corresponding antineutrino flux and spectrum. Precise fissile isotope antineutrino spectra are required for reactor monitoring and safeguarding applications [2-4] and serve as valuable inputs to reactor neutrino experiments utilizing the inverse beta decay (IBD) reaction [5-7] or coherent elastic neutrino-nucleus scattering (CEνNS) [8, 9].

Fissile isotope antineutrino spectra and fluxes have been evaluated several times in the past decades. The methodologies employed can be classified into three major categories comprising summation [10], conversion [11, 12], and extraction methods [13-15]. The summation method, i.e., the ab initio approach, utilizes information on fission products and decays from nuclear databases to calculate and sum the contributions of all possible beta decay chains to ${\bar{ν}}_{e}$ [16]. However, the presence of incomplete or inaccurate information in nuclear databases introduces complexities and challenges in constructing reliable spectral models, ultimately leading to potentially large and unknown uncertainties in model predictions. The conversion method relies on the measured beta spectra of uranium and plutonium. The beta spectra for thermal-neutron-induced fissions of ²³⁵U, ²³⁹Pu, and ²⁴¹Pu have been measured at the Institut Laue-Langevin High Flux Reactor in the 1980s [11, 17-19], while those for the fast-neutron-induced fission of ²³⁸U were measured at the Heinz Maier-Leibnitz (FRM II) research neutron source in 2013 [20]. The measured beta spectra for each isotope are fitted by a set of virtual beta decay branches based on the allowed beta decay transitions, which are then converted into antineutrino branches and summed to the corresponding isotope antineutrino spectra [16, 21]. Although the reduced dependence on nuclear databases in this method provides spectral shapes with typical relative uncertainties of a few percent, the fine structural information in the spectral shapes is not as rich as that obtained using the summation method. To address these shortcomings, several antineutrino spectrum models have been developed based on the conversion method or a combination of both methods. One example is the Huber-Mueller model [16, 21], which provides predictions that roughly agree with earlier experimental data and is widely accepted in reactor neutrino experiments. 
However, measurements from short-baseline reactor neutrino experiments such as Double Chooz [22], RENO [23], Daya Bay [13], and NEOS [6] confirmed a ~6% deficit in the measured reactor antineutrino flux and an excess in the 4-6 MeV prompt energy range compared to the predictions of the Huber-Mueller model. These discrepancies, known respectively as the “reactor antineutrino anomaly (RAA)” [12] and the “5 MeV excess” or “5 MeV bump” [24, 25], cannot be ignored in the era of precision measurements. The extraction method, in which the fission isotope antineutrino spectrum is inferred from the reconstructed prompt energy spectrum measured by the detector independently of nuclear databases, has become a common approach for testing various hypotheses on the origin of the RAA, including sterile neutrino explanations. Using this method, the Daya Bay experiment [13, 14] extracted the ²³⁵U and ²³⁹Pu antineutrino spectra from PWRs, while the PROSPECT [15, 26] and STEREO [27, 28] experiments extracted the ²³⁵U antineutrino spectrum from highly enriched uranium research reactors. Moreover, these measurements revealed that the flux deficit was primarily carried by ²³⁵U and that the 5 MeV bump had shared contributions from uranium and plutonium. However, the extraction of the ²³⁸U and ²⁴¹Pu antineutrino spectra was not satisfactory owing to statistical limitations [14].

The current general practice in experiments for extracting fissile isotope antineutrino spectra involves first unfolding the reconstructed prompt energy spectrum to obtain an antineutrino energy spectrum weighted by the IBD cross section, and then further fitting the unfolded spectrum with the χ² minimization method to extract individual or combined isotope antineutrino spectra [14, 15, 26, 27]. Unfolding is a common technique used in high-energy physics (HEP) to disentangle detector effects, correct migration effects, suppress fluctuations, and reconstruct approximate distributions of quantities. Common methods for unfolding include singular value decomposition (SVD) [29], Wiener SVD [30], and Bayesian iterations [31]. In the Daya Bay experiment, these methods were used to yield consistent extraction results. Although the Wiener-SVD method produces the smallest unfolded spectrum mean square error (MSE) within the energy range of 3-6 MeV, it does not perform as well as the other methods outside this energy range because of the large statistical fluctuations in the intrinsic neutrino energy spectrum [14]. To obtain more precise solutions, the number of bins for the unfolded spectrum in experiments is typically limited to that of the intrinsic spectrum [32]. Although this simplifies the subsequent fitting process for extracting the specific fission isotope antineutrino spectrum, it also suppresses the fine structure of the spectrum shape.

In our previous study [33], we proposed a machine learning method in which a convolutional neural network (CNN) model is employed to extract fission isotope antineutrino spectra from the unfolded prompt energy spectrum in a virtual short-baseline reactor neutrino experiment. The results demonstrated that the CNN model can achieve sub-percent uncertainties in the extracted ²³⁵U and ²³⁹Pu antineutrino spectra, whereas the ²³⁸U and ²⁴¹Pu antineutrino spectra need to be constrained via prior knowledge during the fitting process. In this study, we extend the method and establish a feedforward neural network (FNN) model to resolve this extraction problem. The new method is designed to extract the antineutrino spectra of the four fission isotopes directly from the reconstructed prompt energy spectrum, without relying on a separate unfolding step or any constraints on the spectra, while better preserving the fine structure of the extracted spectra.

The remainder of this paper is organized as follows: In Sect. 2, we present the antineutrino spectra of the IBD reactions and the generation of the simulation dataset for this study. In Sect. 3, we introduce the conceptual and technical details of the proposed FNN model and its training strategies. In Sect. 4, we compare the performance of this new method in extracting fission isotope antineutrino spectra with that of the benchmark traditional method, that is, the χ² minimization method, and discuss the obtained results. Finally, a summary and future outlook are presented in Sect. 5.

Dataset generation for FNN model

In this study, we constructed a virtual reactor neutrino experiment in a layout comprising a PWR and a detector. To verify the feasibility of the virtual experiment, we referred to the Daya Bay [14] and Taishan Antineutrino Observatory (TAO, also known as JUNO-TAO) [34, 35] experiments, and made the following assumptions about the experimental parameters: The reactor is operated for 1800 days at a full thermal power of 2.9 ${GW}_{th}$ with an initial uranium fuel mass of 72 tonnes. The detector is loaded with 5 tonnes of liquid scintillator (LS) with 12% hydrogen by mass, has an energy resolution of 8% at 1 MeV and a detection efficiency of 50%, and is situated at a baseline distance of 30 m. We adopted the Huber-Mueller model as the foundational theory for the phenomenological prediction of the IBD yield to generate the simulated sample dataset for this study. The model selection did not significantly affect the analysis. We disregarded the contributions of the spent nuclear fuel and the non-equilibrium effect on the IBD yield [16, 32].

2.1 IBD yield prediction

The Huber-Mueller model is a theoretical framework for predicting the antineutrino spectra produced by the fission reactions of four isotopes in reactors. Each of these isotopic antineutrino spectra can be parameterized as the exponential of a fifth-order polynomial: $s_{l} (E_{ν}) = \exp \left( \sum_{p = 1}^{6} α_{l p} E_{ν}^{p - 1} \right),$ (1) where $l \in \{^{235} U, ^{238} U, ^{239} Pu, ^{241} Pu\}$, $E_{ν}$ is the ${\bar{ν}}_{e}$ energy, and the $α_{l p}$ are the polynomial coefficients for the isotope $l$. The $α_{l p}$ coefficients for ²³⁵U, ²³⁹Pu, and ²⁴¹Pu were derived using the conversion method by Huber [21], whereas the $α_{l p}$ coefficients for ²³⁸U were obtained using the summation method of Mueller et al. [16]. To incorporate the RAA in this study, we modified the isotopic antineutrino spectrum in Eq. (1) as follows: $S_{l} (E_{ν}) = s_{l} (E_{ν}) r_{RAA} (E_{ν}),$ (2) where $r_{RAA} (E_{ν})$ is the ratio of the spectrum measured in the Daya Bay experiment [32] to the Huber-Mueller model prediction. To evaluate $r_{RAA} (E_{ν})$, we performed cubic spline interpolation within the provided energy range of 1.8-8 MeV and set it uniformly to 1 for energy values above 8 MeV.
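As a minimal sketch of Eq. (1), the snippet below evaluates the exponential-of-polynomial parameterization for a single isotope. The function name `isotope_spectrum` is hypothetical, and the coefficients in `ALPHA_DEMO` are placeholders chosen only to give a falling spectrum; they are not the published Huber or Mueller coefficients, and the RAA ratio of Eq. (2) is not included.

```python
import math

def isotope_spectrum(e_nu, alpha):
    """Eq. (1): s_l(E) = exp(sum_{p=1}^{6} alpha_p * E^(p-1)), with E in MeV."""
    if len(alpha) != 6:
        raise ValueError("expected six polynomial coefficients")
    # enumerate() yields powers 0..5, i.e. (p - 1) for p = 1..6
    exponent = sum(a * e_nu ** power for power, a in enumerate(alpha))
    return math.exp(exponent)

# Placeholder coefficients (NOT the published values), for illustration only.
ALPHA_DEMO = [1.0, -0.5, 0.0, 0.0, 0.0, 0.0]

# Evaluate the illustrative spectrum at a few antineutrino energies.
spectrum = [isotope_spectrum(e, ALPHA_DEMO) for e in (2.0, 4.0, 6.0, 8.0)]
```

With real coefficients substituted for `ALPHA_DEMO`, the same function could be evaluated on the antineutrino energy grid to fill one row of the spectrum matrix.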

The antineutrino yield per fission can be expressed as $ϕ (E_{ν}, t) = \sum_{l} f_{l} (t) S_{l} (E_{ν}),$ (3) where the fission fraction $f_{l} (t)$ represents the relative contribution of the isotope $l$ to the fission reaction at time $t$. The event rate of antineutrinos emitted from the reactor core can be calculated as $\frac{d N}{d E_{ν}} = \frac{W (t)}{\sum_{l} f_{l} (t) ϵ_{l}} ϕ (E_{ν}, t),$ (4) where $W (t)$ is the thermal power of the reactor at time $t$ and $ϵ_{l}$ is the mean energy released per fission of the isotope $l$, with values taken from Ref. [36].
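The prefactor in Eq. (4) is simply the number of fissions per unit time. The sketch below works through that arithmetic for a 2.9 GW_th core. The fission fractions and energies per fission are illustrative assumptions (typical PWR-like numbers), not the values used in the paper, which takes the energies from Ref. [36]; the function names are hypothetical.

```python
MEV_TO_JOULE = 1.602176634e-13  # exact, from the SI definition of the electronvolt

# Illustrative inputs (assumptions for this sketch, not the paper's tables).
FISSION_FRACTIONS = {"U235": 0.564, "U238": 0.076, "Pu239": 0.304, "Pu241": 0.056}
ENERGY_PER_FISSION_MEV = {"U235": 202.36, "U238": 205.99, "Pu239": 211.12, "Pu241": 214.26}

def fissions_per_second(thermal_power_gw, fractions, energies_mev):
    """W / sum_l f_l * eps_l, the prefactor of Eq. (4)."""
    mean_energy_mev = sum(fractions[l] * energies_mev[l] for l in fractions)
    power_mev_per_s = thermal_power_gw * 1e9 / MEV_TO_JOULE
    return power_mev_per_s / mean_energy_mev

def yield_per_fission(e_nu, fractions, spectra):
    """Eq. (3): phi(E) = sum_l f_l * S_l(E); `spectra` maps isotope -> callable."""
    return sum(fractions[l] * spectra[l](e_nu) for l in fractions)

rate = fissions_per_second(2.9, FISSION_FRACTIONS, ENERGY_PER_FISSION_MEV)
```

Under these assumed inputs the core undergoes slightly below 10^20 fissions per second, which sets the overall normalization of the emitted antineutrino rate.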

In the standard three-flavor neutrino oscillation framework, the survival probability $P_{ee}$ of ${\bar{ν}}_{e}$ after propagating a distance $L$ is given by [37] $\begin{aligned} P_{ee} (L, E_{ν}) &= P ({\bar{ν}}_{e} \to {\bar{ν}}_{e}; L, E_{ν}) \\ &= 1 - \cos^{4} θ_{13} \sin^{2} (2 θ_{12}) \sin^{2} (Δ_{21}) \\ &\quad - \cos^{2} θ_{12} \sin^{2} (2 θ_{13}) \sin^{2} (Δ_{31}) \\ &\quad - \sin^{2} θ_{12} \sin^{2} (2 θ_{13}) \sin^{2} (Δ_{32}), \end{aligned}$ (5) where the $θ_{i j}$ represent the neutrino mixing angles. The oscillation phases $Δ_{i j}$ are given by $Δ_{i j} = \frac{Δ m_{i j}^{2} L}{4 E_{ν}} ≃ \frac{1.267 Δ m_{i j}^{2} [{eV}^{2}] L [m]}{E_{ν} [MeV]},$ (6) where $Δ m_{i j}^{2}$ denotes the mass-squared difference between the two mass eigenstates $m_{i}$ and $m_{j}$, i.e., $Δ m_{i j}^{2} \equiv m_{i}^{2} - m_{j}^{2}$.

For short-baseline reactor neutrino experiments, the term involving $Δ_{21}$ is negligible and $Δ m_{31}^{2} \approx Δ m_{32}^{2}$, so Eq. (5) can be simplified to $P_{ee} (L, E_{ν}) \approx 1 - \sin^{2} (2 θ_{13}) \sin^{2} \left( \frac{Δ m_{31}^{2} L}{4 E_{ν}} \right).$ (7) Unless otherwise specified, we use $\sin^{2} θ_{13} = (2.20 \pm 0.07) \times 10^{- 2}$, $Δ m_{32}^{2} = (2.437 \pm 0.033) \times 10^{- 3} {eV}^{2}$, and $Δ m_{21}^{2} = (7.53 \pm 0.18) \times 10^{- 5} {eV}^{2}$ in this study, based on the values from the Particle Data Group (PDG) 2022 [37].
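To give a feeling for the size of the effect, the following sketch evaluates Eq. (7) with the phase written in the practical units of Eq. (6). It uses the central values quoted above and approximates $Δ m_{31}^{2}$ by $Δ m_{32}^{2}$; the function name and the chosen energy of 4 MeV are illustrative assumptions.

```python
import math

SIN2_THETA13 = 2.20e-2   # central value quoted in the text
DM2_31_EV2 = 2.437e-3    # eV^2, approximating dm2_31 by dm2_32

def survival_probability(baseline_m, e_nu_mev):
    """Eq. (7): two-flavor-like approximation valid at short baselines."""
    # sin^2(2*theta) = 4 * sin^2(theta) * (1 - sin^2(theta))
    sin2_2theta13 = 4.0 * SIN2_THETA13 * (1.0 - SIN2_THETA13)
    phase = 1.267 * DM2_31_EV2 * baseline_m / e_nu_mev   # Eq. (6)
    return 1.0 - sin2_2theta13 * math.sin(phase) ** 2

# Survival probability at the 30 m baseline for a 4 MeV antineutrino.
p_ee = survival_probability(30.0, 4.0)
```

At a 30 m baseline the oscillation phase is only a few hundredths of a radian, so the disappearance probability is far below the percent level over the whole spectrum.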

As the ${\bar{ν}}_{e}$ emitted by the reactor propagate to the LS detector, some of them undergo IBD reactions with the free target protons in the LS, denoted as ${\bar{ν}}_{e} + p \to e^{+} + n$. In this process, the positron $e^{+}$ rapidly deposits its energy and annihilates with a surrounding electron $e^{-}$ to form two 0.511 MeV gammas, generating a prompt signal. The neutron $n$ scatters within the detector until it is thermalized and subsequently captured by hydrogen (99%) or carbon (1%) within ~200 μs, thereby releasing a 2.22 or 4.95 MeV gamma, respectively, and yielding a delayed signal [38]. An IBD event is identified by the coincidence of the prompt and delayed signals within this brief interval. The measured IBD event number $M_{k}$ in the $k$-th reconstructed prompt energy $E_{rec}$ bin observed at a detector within the data acquisition time $T_{DAQ}$ is therefore given by [5] $\begin{aligned} M_{k} = {} & \frac{N_{p} ε}{4 π L^{2}} \int_{E_{rec}^{k}}^{E_{rec}^{k + 1}} d E_{rec} \int_{T_{DAQ}} d t \int_{E_{thr}} d E_{ν} \\ & \times \frac{d N}{d E_{ν}} P_{ee} (L, E_{ν}) σ_{IBD} (E_{ν}) G (E_{ν}, E_{rec}), \end{aligned}$ (8) where $N_{p}$ is the number of free target protons in the LS, $ε$ is the detection efficiency of the detector, the IBD threshold energy is $E_{thr} \approx m_{n} - m_{p} + m_{e} \approx 1.8$ MeV, $σ_{IBD} (E_{ν})$ is the cross-section of the IBD taken from Ref. [39], and $G (E_{ν}, E_{rec})$ is a normalized Gaussian smearing function, which includes the energy resolution effect.
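The quoted threshold of about 1.8 MeV follows from the particle masses. As a short worked check, the sketch below compares the simple mass difference $m_{n} - m_{p} + m_{e}$ with the exact kinematic threshold for a proton at rest, $[(m_{n} + m_{e})^{2} - m_{p}^{2}] / (2 m_{p})$. The rounded mass values and the function names are assumptions of this sketch rather than inputs taken from the paper.

```python
# Rounded particle masses in MeV (standard values, quoted to ~1e-5 MeV).
M_NEUTRON = 939.56542
M_PROTON = 938.27209
M_ELECTRON = 0.51100

def ibd_threshold_simple():
    """Mass-difference estimate: m_n - m_p + m_e."""
    return M_NEUTRON - M_PROTON + M_ELECTRON

def ibd_threshold_exact():
    """Kinematic threshold for a proton at rest, including recoil."""
    return ((M_NEUTRON + M_ELECTRON) ** 2 - M_PROTON ** 2) / (2.0 * M_PROTON)

simple = ibd_threshold_simple()
exact = ibd_threshold_exact()
```

The two estimates differ only at the keV level (the nucleon recoil correction), so either rounds to the 1.8 MeV lower limit of the antineutrino energy range.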

To simplify the calculation, we assumed that the detector has no energy leakage or LS nonlinearity [32]. Thus, the prompt energy is $E_{pro} \approx E_{ν} - 0.78$ MeV, and $E_{rec}$ is expected to obey the distribution $G (E_{ν}, E_{rec})$ defined as follows [5]: $G (E_{ν}, E_{rec}) ≃ \frac{1}{\sqrt{2 π} δ_{E_{pro}}} \exp \left\{ - \frac{{(E_{pro} - E_{rec})}^{2}}{2 {(δ_{E_{pro}})}^{2}} \right\}.$ (9) The energy resolution $δ_{E_{pro}}$ is parameterized as $\frac{δ_{E_{pro}}}{E_{pro}} = \sqrt{{\left( \frac{p_{0}}{\sqrt{E_{pro}}} \right)}^{2} + p_{1}^{2} + {\left( \frac{p_{2}}{E_{pro}} \right)}^{2}},$ (10) where $p_{0}$ quantifies the statistical fluctuations in the photons detected by the detector, $p_{1}$ is predominantly influenced by residual effects resulting from the spatial nonuniformity and temporal instability correction of the detector, and $p_{2}$ quantifies the effects associated with the photomultiplier tube (PMT), notably the PMT dark noise [5, 38].

For simplicity, we set p₀ = 0.08, p₁ = 0, and p₂ = 0 in this study, which corresponds to a detector energy resolution of 8% at 1 MeV. Under full reactor power and typical fission fractions [32], the reconstructed prompt energy spectrum of the IBD events observed by the detector in one day, distorted by the RAA, is shown in Fig. 1; approximately 7473 IBD events are recorded.
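The resolution model of Eqs. (9) and (10) with these parameter values can be sketched in a few lines. The function names are hypothetical, and the fixed 0.78 MeV shift between antineutrino and prompt energy follows the simplification stated above.

```python
import math

def resolution_sigma(e_pro, p0=0.08, p1=0.0, p2=0.0):
    """Eq. (10): absolute resolution delta_E (MeV) at prompt energy e_pro (MeV)."""
    relative = math.sqrt((p0 / math.sqrt(e_pro)) ** 2 + p1 ** 2 + (p2 / e_pro) ** 2)
    return relative * e_pro

def smearing(e_nu, e_rec, shift=0.78):
    """Eq. (9): Gaussian density of e_rec for a given antineutrino energy e_nu."""
    e_pro = e_nu - shift
    sigma = resolution_sigma(e_pro)
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * math.exp(-0.5 * ((e_pro - e_rec) / sigma) ** 2)

sigma_at_1mev = resolution_sigma(1.0)   # 8% of 1 MeV
sigma_at_4mev = resolution_sigma(4.0)   # relative resolution improves as 1/sqrt(E)
```

With only the stochastic term active, the absolute width grows as the square root of the prompt energy while the relative width shrinks, which is why the 8% figure is always quoted at a reference energy of 1 MeV.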

Fig. 1

The reconstructed prompt energy spectrum observed by the virtual detector, distorted by the RAA, with an exposure of $(2.9 \times 5 \times 1) {GW}_{th} \cdot tonnes \cdot day$. The inset shows the cross-section of the IBD reaction [39] and the four isotope antineutrino spectra obtained by modifying the Huber-Mueller model according to Eq. (2)

2.2 Simulated samples and targets in dataset

Because evaluating the integrals in Eq. (8) requires significant computational resources and time, Eq. (8) is typically converted into an equivalent discrete summation or matrix multiplication in practical computations. In this study, the integral form of the reconstructed prompt energy spectrum is rewritten as the row matrix $M_{1 \times N_{E_{rec}}}$ given in Eq. (11), each element of which represents the measured IBD event number in the corresponding energy bin. $\begin{aligned} M_{1 \times N_{E_{rec}}} &= X_{1 \times 4} \cdot S_{4 \times N_{E_{ν}}} \cdot P_{N_{E_{ν}} \times N_{E_{ν}}} \cdot σ_{N_{E_{ν}} \times N_{E_{ν}}} \cdot R_{N_{E_{ν}} \times N_{E_{rec}}} \\ &= X_{1 \times 4} \cdot S_{4 \times N_{E_{ν}}} \cdot P σ R_{N_{E_{ν}} \times N_{E_{rec}}}, \end{aligned}$ (11) where the subscripts denote the dimensions of the corresponding matrices and the number 4 indicates the four isotopes. $N_{E_{rec}}$ is the number of bins in the reconstructed prompt energy spectrum while $N_{E_{ν}}$ is the number of terms in the discretized sum for integration over $E_{ν}$, which is also the number of bins in the extracted isotopic antineutrino spectrum. The ranges of $E_{ν}$ and $E_{rec}$ are 1.8-10 MeV and 0.8-10 MeV, respectively. In this study, $N_{E_{rec}}$ was set to 80 based on the limit of the virtual detector energy resolution whereas $N_{E_{ν}}$ was optimized to 401 after balancing model performance and computational cost. The element $X_{l}$ in $X_{1 \times 4}$ can be expressed as $X_{l} = \sum_{u = 1}^{N_{t}} \frac{N_{p} ε W (t^{u}) f_{l} (t^{u})}{4 π L^{2} \sum_{l'} f_{l'} (t^{u}) ϵ_{l'}} Δ t Δ E_{ν} Δ E_{rec},$ (12) where $T_{DAQ}$ in Eq. (8) is divided into $N_{t}$ time units of $Δ t$, $u$ is the time unit index, and $Δ E_{ν}$ and $Δ E_{rec}$ are the bin widths of the extracted isotopic antineutrino spectrum and reconstructed prompt energy spectrum, respectively. In Eq. (12), $W$, $f_{l}$, and $ϵ_{l}$ are reactor-related parameters. $W$ and $f_{l}$ vary as the reactor evolves while $ϵ_{l}$ and the remaining parameters are constants. $X_{l}$ is therefore referred to as the reactor dynamic evolution information.
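A hedged sketch of how Eq. (12) could be evaluated for one sample is given below. It first estimates the number of free protons from the stated 5 tonnes of LS with 12% hydrogen by mass, then accumulates the per-isotope inputs over daily time units. The function names, the use of centimetres for the baseline (so that the result pairs with a cross-section in cm²), the bin-width conventions, and the fission fractions and energies per fission are all assumptions of this sketch.

```python
import math

AVOGADRO = 6.02214076e23          # 1/mol, exact by SI definition
MEV_TO_JOULE = 1.602176634e-13    # exact by SI definition
HYDROGEN_MOLAR_MASS_G = 1.008     # g/mol

def free_protons(ls_mass_tonnes=5.0, hydrogen_mass_fraction=0.12):
    """Number of hydrogen nuclei in the liquid scintillator target."""
    grams_hydrogen = ls_mass_tonnes * 1e6 * hydrogen_mass_fraction
    return grams_hydrogen / HYDROGEN_MOLAR_MASS_G * AVOGADRO

def reactor_evolution_input(daily_power_gw, daily_fractions, energies_mev,
                            baseline_cm=3000.0, efficiency=0.5,
                            d_e_nu=8.2 / 401, d_e_rec=9.2 / 80):
    """Eq. (12) with a time unit of one day; returns one X_l per isotope."""
    seconds_per_day = 86400.0
    n_p = free_protons()
    x = {isotope: 0.0 for isotope in energies_mev}
    for power_gw, fractions in zip(daily_power_gw, daily_fractions):
        mean_energy = sum(fractions[l] * energies_mev[l] for l in fractions)
        fission_rate = power_gw * 1e9 / MEV_TO_JOULE / mean_energy   # fissions per second
        common = n_p * efficiency * fission_rate / (4.0 * math.pi * baseline_cm ** 2)
        for l in x:
            x[l] += common * fractions[l] * seconds_per_day * d_e_nu * d_e_rec
    return x

# Illustrative reactor state (assumed values), held constant over a 3-day sample.
FRACTIONS = {"U235": 0.564, "U238": 0.076, "Pu239": 0.304, "Pu241": 0.056}
ENERGIES = {"U235": 202.36, "U238": 205.99, "Pu239": 211.12, "Pu241": 214.26}
x_sample = reactor_evolution_input([2.9] * 3, [FRACTIONS] * 3, ENERGIES)
```

The four returned numbers play the role of the row matrix $X_{1 \times 4}$: they carry the exposure and the fission fractions, while all spectral information stays in the matrices to their right.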

Each row of $S_{4 \times N_{E_{ν}}}$ represents the binned antineutrino spectrum for isotope $l$, as given by $S_{l} (E_{ν})$. Both $P_{N_{E_{ν}} \times N_{E_{ν}}}$ and $σ_{N_{E_{ν}} \times N_{E_{ν}}}$ are diagonal matrices whose diagonal elements are given by $P_{ee} (L, E_{ν})$ and $σ_{IBD} (E_{ν})$, respectively. The role of $R_{N_{E_{ν}} \times N_{E_{rec}}}$ is to map each $E_{ν}$ to a spectrum of $E_{rec}$. $R_{N_{E_{ν}} \times N_{E_{rec}}}$ is therefore also referred to as the detector response matrix $[R_{q k}]$, which is defined as follows: $\begin{aligned} R_{q k} = R (E_{ν}^{q}, E_{rec}^{k}) &= G \left( E_{ν}^{q}, \frac{E_{rec}^{k} + E_{rec}^{k + 1}}{2} \right) \\ &= G \left( E_{ν}^{q}, E_{rec}^{k} + \frac{Δ E_{rec}}{2} \right), \end{aligned}$ (13) where $q$ is the index for binning $E_{ν}$, $q \in [1, 2, \dots, N_{E_{ν}}]$, and $k \in [1, 2, \dots, N_{E_{rec}}]$. In contexts that do not involve the oscillation parameters or unfolding, $P_{N_{E_{ν}} \times N_{E_{ν}}}$, $σ_{N_{E_{ν}} \times N_{E_{ν}}}$, and $R_{N_{E_{ν}} \times N_{E_{rec}}}$ can be pre-multiplied to obtain the matrix $P σ R_{N_{E_{ν}} \times N_{E_{rec}}}$.
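The matrix chain of Eq. (11) and the response matrix of Eq. (13) can be sketched with NumPy as follows. Several ingredients are stand-ins and should be read as assumptions: the antineutrino energy grid is taken as 401 equally spaced points, the IBD cross-section uses a commonly quoted zeroth-order approximation instead of the cross-section of Ref. [39], and the spectra `S` and inputs `X` are placeholders rather than physical values.

```python
import numpy as np

N_E_NU, N_E_REC = 401, 80
e_nu = np.linspace(1.8, 10.0, N_E_NU)             # antineutrino energies (MeV), assumed grid
rec_edges = np.linspace(0.8, 10.0, N_E_REC + 1)   # reconstructed prompt energy bin edges (MeV)
rec_centers = 0.5 * (rec_edges[:-1] + rec_edges[1:])
d_e_rec = rec_edges[1] - rec_edges[0]

# Eq. (13): R_qk = G(E_nu^q, centre of bin k), with E_pro = E_nu - 0.78 MeV
e_pro = e_nu - 0.78
delta = 0.08 * np.sqrt(e_pro)                     # Eq. (10) with p0 = 0.08, p1 = p2 = 0
R = np.exp(-0.5 * ((e_pro[:, None] - rec_centers[None, :]) / delta[:, None]) ** 2)
R /= np.sqrt(2.0 * np.pi) * delta[:, None]

# Diagonal of P: short-baseline survival probability at L = 30 m
sin2_2theta13 = 4.0 * 0.0220 * (1.0 - 0.0220)
p_ee = 1.0 - sin2_2theta13 * np.sin(1.267 * 2.437e-3 * 30.0 / e_nu) ** 2

# Diagonal of sigma: zeroth-order stand-in (cm^2), NOT the cross-section of Ref. [39]
e_pos = e_nu - 1.293                              # positron total energy (MeV)
p_pos = np.sqrt(np.clip(e_pos ** 2 - 0.511 ** 2, 0.0, None))
sigma_ibd = 0.0952e-42 * e_pos * p_pos

# Pre-multiplied matrix: scaling row q of R by P_qq * sigma_qq
PsR = (p_ee * sigma_ibd)[:, None] * R

# Placeholder spectra (4 x N_E_NU) and reactor inputs (1 x 4), for shape checking only
S = np.tile(np.exp(1.0 - 0.5 * e_nu), (4, 1))
X = np.ones((1, 4))

M = X @ S @ PsR                                   # Eq. (11): shape (1, N_E_REC)
```

Because the oscillation and cross-section matrices are diagonal, their product with `R` reduces to a row-wise scaling, which is why the pre-multiplied matrix can be built once and reused for every sample.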

The matrix multiplication relation in Eq. (11) provides the mathematical foundation for constructing the FNN architecture presented in Sect. 3. Furthermore, $X_{1 \times 4}$ and $M_{1 \times N_{E_{rec}}}$ respectively constitute a sample and its associated target in our dataset, which serve as a feature-label pair for supervised learning in the FNN model implemented in this study.

As described in Eq. (12), the fission fraction varies dynamically with burn-up as the reactor operates. In each reactor core refueling cycle, the cycle burn-up can be calculated as [32] $\text{Burn-up} = \frac{W \cdot D}{M_{U^{ini}}},$ (14) where $W$, $D$, and $M_{U^{ini}}$ represent the total thermal power of the reactor, the number of days since the refueling cycle started, and the mass of the initial uranium fuel loaded into the reactor, respectively. The unit for burn-up is ${GW}_{th} \cdot day \cdot {tonne}^{- 1}$. Given that the real-time power output of the reactor is dynamic and cannot exceed its maximum capacity of 2.9 ${GW}_{th}$ for safe operation, we drew the daily average power output of the virtual reactor from a normal distribution with a mean of 2.9 ${GW}_{th}$ and a downward fluctuation of 0.5% [33]. By incorporating the fission fraction evolution data of the isotopes during a complete burn-up cycle from Ref. [32], we obtained the evolution of the fission fractions for the four main isotopes as a function of the operation day, as shown in Fig. 2.
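The burn-up bookkeeping of Eq. (14) and the daily power sampling can be sketched as below. The text does not spell out how the "downward fluctuation" is implemented, so the half-normal draw below full power used here is only one possible reading; the function names and the seed are likewise hypothetical.

```python
import random

def cycle_burnup(thermal_power_gw, days, initial_uranium_tonnes):
    """Eq. (14): burn-up in GW_th * day / tonne."""
    return thermal_power_gw * days / initial_uranium_tonnes

def sample_daily_power(n_days, full_power_gw=2.9, rel_sigma=0.005, seed=0):
    """Assumed reading of the text: fluctuate only downward from full power."""
    rng = random.Random(seed)
    return [full_power_gw - abs(rng.gauss(0.0, rel_sigma * full_power_gw))
            for _ in range(n_days)]

powers = sample_daily_power(1800)

# Cumulative burn-up after each day, for the 72 tonnes of initial uranium.
burnup_by_day = []
accumulated = 0.0
for power in powers:
    accumulated += cycle_burnup(power, 1.0, 72.0)
    burnup_by_day.append(accumulated)
```

The accumulated burn-up is the quantity against which fission-fraction evolution curves are usually tabulated, so a lookup from `burnup_by_day` into such a table would give the daily fission fractions.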

Fig. 2

Evolution of the fission fractions for the four main isotopes in the reactor core as a function of operation day, which includes one complete refueling cycle [32]. The cumulative fission fraction of the four main isotopes used in this experiment is normalized to unity, and other isotopes contributing less than 0.3% are excluded from our analysis

Under the assumption that the thermal power and fission fractions for the four main isotopes of the reactor are constant within each day, we accumulated the exposure over each 3-day interval as one sample, which yields a dataset of 600 simulated samples and their corresponding targets for the 1800 days of operation.

Implementation of FNN model

Machine learning algorithms such as neural network (NN) models have attracted increasing attention from high-energy and nuclear physics researchers [33, 40-44]. However, most of these applications are characterized by black-box models in which the meaning of the model parameters is challenging to understand or interpret. In this section, we present an FNN-based white-box model where each layer and parameter has a clear physical or mathematical meaning, thereby ensuring the interpretability of the model.

3.1 Mathematical foundations of FNN model

The NN is a powerful machine learning model that has been widely explored and applied across various fields. The universal approximation theorem [45, 46] implies that any continuous function can be approximated with arbitrary precision using an appropriate NN, even if the NN is an FNN with only one hidden layer containing a sufficient number of neurons. However, the internal structure and parameters of the NN in such scenarios often lack physical meaning or interpretability. This results in black-box models, which are not fully trusted by high-energy physicists. Therefore, we designed and implemented a white-box NN model in this study by converting the mathematical mapping function in Eq. (11) to an FNN model.

An FNN is typically composed of one to several single-layer perceptrons, which are considered the fundamental building units of the FNN and play a vital role in its overall functionality [47]. Each perceptron in the FNN follows the computational flow shown in Fig. 3 to process data. Forward and backward propagation are two phases in the NN training process that interact to optimize network performance.

Fig. 3

An example illustration of the structure of a single-layer perceptron along with forward (black flow arrows) and backward (red flow arrows) propagation

During the forward propagation phase, the perceptron computes the dot product of the input vector $\vec{x} = {[x_{1}, x_{2}, \dots, x_{N}]}^{T}$ with the weight coefficient vector $\vec{w} = {[w_{1}, w_{2}, \dots, w_{N}]}^{T}$, adds the bias $b$, and applies the activation function $ψ$ to yield the output $y$. The discrepancy between the output $y$ and target $\hat{y}$ is then calculated using the loss function $L (y, \hat{y})$. Forward propagation provides the foundation for evaluating network performance. Backward propagation in turn determines how the network parameters (weights and bias) are updated to reduce the loss. It can be described as $w'_{μ} = w_{μ} - η \times [\nabla_{w_{μ}} L (y, \hat{y}) + λ w_{μ}],$ (15) $b' = b - η \times \nabla_{b} L (y, \hat{y}),$ (16) where $w_{μ}$ and $w'_{μ}$ represent the $μ$-th weight coefficient of the current and subsequent steps, respectively; $b$ and $b'$ are the biases of the current and subsequent steps, respectively; and $η$ and $λ$ are the learning rate and weight decay rate, respectively. This iterative update of the parameters based on the computed gradients allows the NN to learn and improve its predictions over time.
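One update step of Eqs. (15) and (16) can be written out by hand for a single perceptron. The sketch below assumes a ReLU activation and, purely for illustration, a squared-error loss $L = (y - \hat{y})^{2}$, which is not the loss used later in the paper; the function names and numerical values are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def forward(x, w, b):
    """Forward pass: activation of the weighted sum plus bias."""
    return relu(sum(wi * xi for wi, xi in zip(w, x)) + b)

def update(x, w, b, target, lr, weight_decay):
    """One step of Eqs. (15)-(16) for the assumed loss L = (y - target)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = relu(z)
    dloss_dy = 2.0 * (y - target)
    dy_dz = 1.0 if z > 0.0 else 0.0            # derivative of ReLU
    grad_w = [dloss_dy * dy_dz * xi for xi in x]
    grad_b = dloss_dy * dy_dz
    new_w = [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b

x, w, b, target = [1.0, 2.0], [0.5, 0.5], 0.0, 1.0
loss_before = (forward(x, w, b) - target) ** 2
w_new, b_new = update(x, w, b, target, lr=0.1, weight_decay=0.0)
loss_after = (forward(x, w_new, b_new) - target) ** 2
```

In this toy case the output moves from 1.5 to 0.9 after one step, so the loss drops from 0.25 to 0.01; a non-zero `weight_decay` would additionally shrink each weight toward zero, as the $λ w_{μ}$ term in Eq. (15) indicates.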

To allow matrix multiplication in the perceptrons, the bias $b$ must be eliminated, i.e., set to zero. The absence of negative values in our data flow justifies the use of the default rectified linear unit (ReLU) activation function, which is defined as $ψ (z) = \max \{0, z\}$. This setup also permits the perceptrons to be chained to perform successive matrix dot product operations, which is integral to the development of our FNN model.

As shown in Fig. 4, the architecture of the FNN model consists of three layers comprising, from left to right, the input, hidden, and output layers with four, $N_{E_{ν}}$, and $N_{E_{rec}}$ neurons, respectively. The neurons in adjacent layers are fully connected; that is, each neuron in one layer is connected to every neuron in the subsequent layer, with no connections between neurons within the same layer. The training process of the model starts from the input layer, at which each neuron receives the reactor dynamic evolution information corresponding to its fission isotope. The output of the hidden layer is the scaled total spectrum $[h_{q}]$ of antineutrinos emitted by the reactor. The output layer then provides a predicted reconstructed prompt energy spectrum $[y_{k}]$. The two weight coefficient matrices W⁽¹⁾ and W⁽²⁾ correspond to the transposes of the matrices $S_{4 \times N_{E_{ν}}}$ and $P σ R_{N_{E_{ν}} \times N_{E_{rec}}}$, respectively. The matrix W⁽¹⁾ contains the fission isotope antineutrino spectra to be extracted, which are learned during training. In contrast, the matrix W⁽²⁾ is fixed as $P σ R_{N_{E_{ν}} \times N_{E_{rec}}}^{T}$ because it is assumed to be a constant matrix without uncertainties in this study. The FNN is therefore a supervised learning model that iteratively refines W⁽¹⁾ to minimize the discrepancies between its outputs and corresponding targets.
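The architecture described above maps naturally onto two bias-free linear layers. The following PyTorch sketch is one way it might be written, not the authors' implementation: the class name `SpectrumFNN`, the helper `extracted_spectra`, and the random placeholder matrices are assumptions. It relies on the fact that `nn.Linear` stores its weight with shape (outputs, inputs), so the stored weights are exactly the transposes named in the text.

```python
import torch
from torch import nn

class SpectrumFNN(nn.Module):
    """Hypothetical white-box FNN: y = ReLU(ReLU(x @ S) @ PsR), without biases."""

    def __init__(self, spectra_init, psr):
        super().__init__()
        n_isotopes, n_e_nu = spectra_init.shape
        n_e_rec = psr.shape[1]
        self.hidden = nn.Linear(n_isotopes, n_e_nu, bias=False)   # weight = W1 = S^T
        self.output = nn.Linear(n_e_nu, n_e_rec, bias=False)      # weight = W2 = (PsR)^T
        with torch.no_grad():
            self.hidden.weight.copy_(spectra_init.T)
            self.output.weight.copy_(psr.T)
        self.output.weight.requires_grad_(False)                  # W2 stays fixed
        self.activation = nn.ReLU()

    def forward(self, x):
        h = self.activation(self.hidden(x))       # scaled total antineutrino spectrum
        return self.activation(self.output(h))    # predicted prompt energy spectrum

    def extracted_spectra(self):
        """Current estimate of the 4 x N_E_nu isotope spectra."""
        return self.hidden.weight.detach().T

torch.manual_seed(0)
S0 = torch.rand(4, 401)      # placeholder for the initial isotope spectra
PsR = torch.rand(401, 80)    # placeholder for the pre-multiplied P*sigma*R matrix
model = SpectrumFNN(S0, PsR)
x = torch.rand(3, 4)         # three samples of reactor evolution inputs
y = model(x)
```

Since every quantity in the data flow is non-negative, the ReLU activations act as the identity and the forward pass reproduces the matrix product of Eq. (11); after training, the learned spectra are read directly from the hidden-layer weights.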

Fig. 4

The FNN is a white-box model that describes the mapping relation between the reactor dynamic evolution information and reconstructed prompt energy spectrum. The architecture of the FNN model includes an input layer, hidden layer, output layer, and two sets of weight coefficient matrices W⁽¹⁾ and W⁽²⁾. The weight values between neurons associated with connections of the same color form the rows of the weight coefficient matrix

3.2 Training strategy

All the samples generated in Sect. 2.2 were used solely to train the FNN model; the validation and testing processes were omitted. This approach was chosen because our aim is to minimize the loss function during training to determine the optimal W⁽¹⁾ for extracting the four main isotopic antineutrino spectra. Our focus is on optimizing the spectrum extraction performance rather than on evaluating how the model generalizes across datasets, which also simplifies the procedure.

The loss function is a fundamental component in deep-learning models. It serves as the criterion for evaluating how well the model predictions match the actual outcomes and provides a numerical indicator of model accuracy. The Combined Neyman-Pearson (CNP) chi-square model is a statistical model frequently employed in HEP experiments to quantify the error between predicted and measured values [48]. Based on this model, we define the loss function for the FNN model as $χ_{CNP}^{2} = \sum_{k = 1}^{N_{E_{rec}}} \frac{{[M_{k} - y_{k} (W^{(1)})]}^{2}}{3 / \left[ \frac{1}{M_{k}} + \frac{2}{y_{k} (W^{(1)})} \right]},$ (17) where $M_{k}$ is the IBD event number in the $k$-th bin of the measured reconstructed prompt energy spectrum given by Eq. (8) and $y_{k}$ is the corresponding value predicted by the model. We used this loss function to guide the optimization of W⁽¹⁾ during training so that the FNN was driven towards increasingly precise predictions.
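Equation (17) translates into only a few lines of code. The sketch below is a plain-Python version for a single spectrum; the function name `chi2_cnp` is hypothetical, and it assumes that all measured and predicted bin contents are strictly positive so that the reciprocals are defined.

```python
def chi2_cnp(measured, predicted):
    """Eq. (17): combined Neyman-Pearson chi-square for one binned spectrum."""
    total = 0.0
    for m_k, y_k in zip(measured, predicted):
        # Effective variance 3 / (1/M_k + 2/y_k); requires M_k > 0 and y_k > 0
        variance = 3.0 / (1.0 / m_k + 2.0 / y_k)
        total += (m_k - y_k) ** 2 / variance
    return total

perfect = chi2_cnp([100.0, 50.0], [100.0, 50.0])
one_bin = chi2_cnp([100.0], [110.0])   # (100 - 110)^2 * (1/100 + 2/110) / 3
```

For a single bin with 100 measured and 110 predicted events, the contribution works out to 31/33, slightly below the Pearson value of 100/110 and above the Neyman value of 100/100 would not hold here, which illustrates that the CNP variance is a weighted combination of the two rather than either one alone.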

After defining the loss function, it is essential to select a suitable optimizer, learning rate schedule, batch size, and number of epochs, among other hyperparameters. Following hyperparameter tuning using the Optuna framework [49] and extensive testing, we developed two training strategies, denoted as the short- and long-epoch strategies, to investigate the performance of the FNN model in extracting the antineutrino spectra of the four fission isotopes from the reconstructed prompt energy spectrum [50]. As shown in Table 1, a critical commonality between these two strategies is the segmentation of the hidden layer in the FNN model into multiple partitions or parallel hidden layers. This setup allows distinct learning and weight decay rates to be assigned to each partition. Because this study focuses neither on the isotope antineutrino spectra above 8 MeV, i.e., the (303, 401] partition, nor on the matrix $P σ R_{N_{E_{ν}} \times N_{E_{rec}}}$, we fixed their learning and weight decay rates to zero and disabled the gradient calculations for the corresponding weight coefficients. Additionally, we set the initial value of W⁽¹⁾ based on the Huber-Mueller model.

Table 1

Configurations of the two training strategies for FNN model

Strategy	Short-epoch	Long-epoch
Epoch	2×10⁶	2×10⁶
Optimizer	AdamW	Adam
Hidden layer partitions	[1], (1, 180], (180, 225], (225, 303], (303, 401]	[1], (1, 180], (180, 225], (225, 303], (303, 401]
Learning rates for hidden layer	[3.4892×10⁻⁴, 9.9485×10⁻⁴, 2.754×10⁻⁴, 1.8272×10⁻⁴, 0]	[3.4892×10⁻⁴, 9.9485×10⁻⁴, 2.754×10⁻⁴, 1.8272×10⁻⁴, 0]
Weight decay rates for hidden layer	[7.418×10⁻³, 7.748×10⁻³, 4.155×10⁻³, 9.999×10⁻³, 0]	[0, 0, 0, 0, 0]
Learning rate for output layer	0	0
Weight decay for output layer	0	0
Learning rate scheduler	ReduceLROnPlateau (factor=0.32, patience=1×10²)	ReduceLROnPlateau (factor=0.32, patience=1×10⁴) & epoch≥2×10⁵
Batch size	30	30

The configurations were derived based on our empirical knowledge and optimized using Optuna [49]. The partition numbers correspond to neuron indices in the hidden layer of the FNN model. ReduceLROnPlateau is a PyTorch learning rate scheduler class that reduces the learning rate when the monitored metric stops improving during training, which helps convergence [50].
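One way to realize per-partition learning and weight decay rates in PyTorch is to store each partition of the hidden-layer weights as its own tensor and give each tensor its own optimizer parameter group. The sketch below does this for the short-epoch column of Table 1. It is an assumption-laden illustration rather than the authors' code: the helper names are hypothetical, the 1-based interval (a, b] is assumed to map to the 0-based slice [a:b], the frozen partition is simply left out of the optimizer, and the dummy loss exists only to exercise one update.

```python
import torch

# Assumed 0-based slices for the partitions [1], (1,180], (180,225], (225,303], (303,401]
PARTITIONS = [(0, 1), (1, 180), (180, 225), (225, 303), (303, 401)]
LEARNING_RATES = [3.4892e-4, 9.9485e-4, 2.754e-4, 1.8272e-4, 0.0]
WEIGHT_DECAYS = [7.418e-3, 7.748e-3, 4.155e-3, 9.999e-3, 0.0]   # short-epoch strategy

def build_partitioned_weights(spectra_init):
    """Split W1 = S^T (N_E_nu x 4) into one tensor per hidden-layer partition."""
    w1 = spectra_init.T.clone()
    return [w1[lo:hi].clone().requires_grad_(lr > 0.0)
            for (lo, hi), lr in zip(PARTITIONS, LEARNING_RATES)]

def build_optimizer(blocks):
    """AdamW with one parameter group per trainable partition, plus the scheduler."""
    groups = [{"params": [blk], "lr": lr, "weight_decay": wd}
              for blk, lr, wd in zip(blocks, LEARNING_RATES, WEIGHT_DECAYS)
              if blk.requires_grad]
    optimizer = torch.optim.AdamW(groups)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.32, patience=100)
    return optimizer, scheduler

def hidden_output(x, blocks):
    """Hidden-layer output using the re-assembled weight matrix."""
    w1 = torch.cat(blocks, dim=0)                 # (N_E_nu, 4)
    return torch.relu(x @ w1.T)

torch.manual_seed(0)
spectra_init = torch.rand(4, 401)                 # placeholder for the initial spectra
blocks = build_partitioned_weights(spectra_init)
optimizer, scheduler = build_optimizer(blocks)

x = torch.rand(30, 4)                             # one batch of 30 samples
loss = hidden_output(x, blocks).sum()             # dummy loss for demonstration only
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step(loss.item())                       # the scheduler monitors the loss value
```

For the long-epoch column, the same structure would apply with `torch.optim.Adam`, zero weight decay in every group, and a scheduler patience of 10⁴.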

As indicated by their names, the main distinction between the short- and long-epoch strategies lies in the number of epochs. The short-epoch strategy leverages the AdamW [51] optimizer with non-zero weight decay rates for faster loss reduction. In contrast, in the long-epoch strategy, the Adam [52] optimizer is applied without weight decay, i.e., the weight decay rates are set to zero. Superior convergence results were obtained using the long-epoch strategy. The results are presented and discussed in Sect. 4. As shown in Table 1, these differences also led to minor differences in the configurations of the learning rate schedulers. Nonetheless, the same metric, i.e., the sum of the losses for all samples denoted as $χ_{\sum CNP}^{2}$, was monitored in both schedulers.

We also extracted the antineutrino spectra of the four fission isotopes using the χ² minimization method to provide a comparison and benchmark for the FNN model. We employed the Minuit2 minimization library from ROOT [53] to implement this method. $χ_{\sum CNP}^{2}$ was used as the objective function to be minimized to find the best fit. The same dataset as that for the FNN model was used as the measured value in this fitting process. In contrast, the predicted value was derived from Eq. (11), where the $S_{4 \times N_{E_{ν}}}$ matrix elements corresponding to energies $\leq 8$ MeV are the parameters to be fitted and the remaining elements are treated as fixed parameters in the fitting procedure. We adopted the “Combined” minimizer algorithm to minimize the objective function with initial fitting values from the Huber-Mueller model and fitting step sizes of 1% of the order of magnitude of these values. We set the tolerance for the fitting procedure to 1×10^-30. The fitting stopped automatically only when the improvement in the $χ_{\sum CNP}^{2}$ value between consecutive iterations fell below this threshold.
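The fit parameter setup described above can be illustrated with a short sketch. It assumes one particular reading of "1% of the order of magnitude", namely a step of 0.01 × 10^floor(log10|value|). The function names, the toy energy grid, and the placeholder spectrum values are hypothetical and are not taken from the Huber-Mueller model; the resulting specifications would still have to be passed to a minimizer such as Minuit2.

```python
import math


def fit_step(initial_value, fraction=0.01):
    """Step size as a fraction of the order of magnitude of the initial value."""
    magnitude = 10.0 ** math.floor(math.log10(abs(initial_value)))
    return fraction * magnitude


def parameter_specs(energies_mev, initial_values, e_max_free=8.0):
    """Build one fit-parameter record per energy bin of one isotope spectrum."""
    specs = []
    for energy, value in zip(energies_mev, initial_values):
        specs.append({
            "energy_mev": energy,
            "value": value,
            "step": fit_step(value),
            # Bins above 8 MeV are held fixed, bins at or below 8 MeV are free.
            "fixed": energy > e_max_free,
        })
    return specs


# Toy inputs for illustration only (placeholder numbers).
toy_energies = [2.0, 5.0, 8.0, 9.0]
toy_values = [1.3, 0.21, 0.0074, 0.0012]
specs = parameter_specs(toy_energies, toy_values)
```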

The FNN model was implemented using PyTorch [54], a Python-based deep learning library that supports both CPU and GPU platforms and is one of the mainstream tools for developing and training NN models. An NVIDIA GeForce RTX 3060 Ti GPU platform was used to deploy the FNN model, whereas tasks involving Optuna and ROOT were performed on two identical servers, each of which was equipped with two 28-core Intel(R) Xeon(R) Gold 6330 CPUs @ 2.00 GHz.

Results and discussions

To facilitate the discussion and comparative analysis of the short- and long-epoch strategies of our FNN model and the χ² minimization method, we first consider their performance in fitting all the samples and reducing the losses. As shown in Fig. 5, the loss $χ_{\sum CNP}^{2}$ decreased more rapidly in both FNN strategies, and lower final $χ_{\sum CNP}^{2}$ values were obtained compared to those obtained by the χ² minimization method. The $χ_{\sum CNP}^{2}$ values at the conclusion of the epochs are 5.51×10^-6, 5.42×10^-10, and 9.34×10^-6 for the short-epoch strategy, long-epoch strategy, and χ² minimization method, respectively.

Fig. 5

Evolution of the loss function across epochs for the short- and long-epoch strategies and the χ² minimization method. The numbers of epochs for the first two were manually specified to be 2×10³ and 2×10⁶, respectively, while that of the χ² minimization method was automatically determined as approximately 4.39×10⁵

The short-epoch strategy can rapidly reduce the loss in the early stages of training mainly because of the regularization effects and optimization efficiency due to the combination of nonzero weight decay rates and the AdamW optimizer. However, in the later stages of training, the model must be able to respond to small changes in the loss function for fine adjustments of the parameters. Weight decay may interfere with this process and make it challenging for the model to determine the optimal solution within regions of small loss function gradients.
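The interference of weight decay with late-stage fine adjustment can be seen in a one-parameter toy problem. The sketch below is an illustration only: it uses plain gradient descent with a decoupled decay term on the loss (w − 1)², not the Adam or AdamW update rules or the FNN loss, and the function name descend is hypothetical. With a decay rate λ, the iteration settles at w = 2/(2 + λ) instead of the true minimum w = 1, so the loss cannot fall below a nonzero floor.

```python
def descend(weight_decay, lr=0.1, steps=2000, w=0.0):
    """Gradient descent on (w - 1)^2 with decoupled weight decay."""
    for _ in range(steps):
        grad = 2.0 * (w - 1.0)  # derivative of (w - 1)^2
        # Decay shrinks w toward zero, independently of the gradient step.
        w = w * (1.0 - lr * weight_decay) - lr * grad
    return w, (w - 1.0) ** 2


w_plain, loss_plain = descend(weight_decay=0.0)
w_decay, loss_decay = descend(weight_decay=0.01)
```

Without decay the toy loss keeps shrinking toward zero, whereas with decay it stalls near (λ/(2 + λ))², which mirrors the qualitative behavior described for the two strategies.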

Figure 6 shows a comparison of the performance in extracting the antineutrino spectra of the four isotopes using these three approaches. The extraction performance decreases in the order of the long-epoch strategy, short-epoch strategy, and χ² minimization method. The FNN model accurately extracted the antineutrino spectra of ²³⁵U, ²³⁹Pu, and ²⁴¹Pu in the energy range of 2-5 MeV. The FNN model with the short-epoch strategy achieved relative errors of less than 2% in the 5-8 MeV range, which decreased to less than 1% with the long-epoch strategy. In comparison, the χ² minimization method achieved relative extraction errors of less than 2% and 3% for these three isotopes in the 2-5 MeV and 5-8 MeV ranges, respectively. For the isotope ²³⁸U, both the short-epoch strategy and χ² minimization method showed relatively poor extraction performance compared to that for the other isotopes. Their maximum relative extraction errors in the 2-8 MeV range are approximately 4% and 8%, respectively, whereas only the long-epoch strategy maintained relative errors of less than 1%.

Fig. 6

Comparison of the ratios between the four isotope antineutrino spectra extracted using the short- and long-epoch strategies in our FNN model and the χ² minimization method, and the assumed true spectra described by Eq. (2)
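The relative errors quoted for Fig. 6 correspond to the deviation of the extracted-to-true spectrum ratio from unity within a given energy window. A minimal sketch of this calculation is given below; the function name and the toy arrays are hypothetical and do not reproduce the spectra or results of this study.

```python
import numpy as np


def max_relative_error(energy, extracted, true, e_min, e_max):
    """Largest |extracted/true - 1| for bins with e_min <= energy <= e_max."""
    energy = np.asarray(energy, dtype=float)
    extracted = np.asarray(extracted, dtype=float)
    true = np.asarray(true, dtype=float)
    mask = (energy >= e_min) & (energy <= e_max)
    ratio = extracted[mask] / true[mask]
    return float(np.max(np.abs(ratio - 1.0)))


# Toy spectrum for illustration only (placeholder numbers).
toy_energy = [2.0, 4.0, 6.0, 8.0, 9.0]
toy_true = [10.0, 8.0, 4.0, 2.0, 1.0]
toy_extracted = [10.05, 8.0, 4.02, 1.99, 1.5]

err_2_8 = max_relative_error(toy_energy, toy_extracted, toy_true, 2.0, 8.0)
err_2_9 = max_relative_error(toy_energy, toy_extracted, toy_true, 2.0, 9.0)
```

In this toy case the 2-8 MeV window gives a maximum relative error of 0.5%, while including the 9 MeV bin, which lies in the region excluded from the fit, raises it to 50%.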

It is worth noting that although ²⁴¹Pu has a lower average fission fraction throughout the entire refueling cycle compared to ²³⁸U, the extraction performance for the former is better in all the extraction approaches. This indicates that in addition to large fission fractions, significant variations in the fission fractions are also crucial for extracting isotopic antineutrino spectra accurately. Greater variations produce better extraction results. This is further confirmed by the extraction performance for the ²³⁵U and ²³⁹Pu antineutrino spectra. Therefore, the large number of epochs in the long-epoch strategy is employed primarily to enhance the extraction performance for ²³⁸U. Overall, regardless of the extraction approach used, the extraction performance for the isotopic antineutrino spectra in descending order is as follows: ²³⁵U, ²³⁹Pu, ²⁴¹Pu, and ²³⁸U.

The above results and discussion reveal that because of the exceptional capability of NNs in optimizing large-scale parameters, the FNN model achieved faster and more effective convergence than the traditional χ² minimization method. Based on PyTorch’s extensive array of optimization algorithms [55], various model training strategies can be designed to satisfy the practical requirements for extracting isotope antineutrino spectra. Moreover, executing spectrum extraction algorithms on GPU platforms can significantly increase the inference speed of the process, thereby improving extraction efficiency.

Summary and outlook

In this study, we presented an FNN model designed to infer and extract the corresponding antineutrino spectra generated by the fission of ²³⁵U, ²³⁸U, ²³⁹Pu, and ²⁴¹Pu from the reconstructed prompt energy spectrum measured by the detector in a reactor neutrino experiment. Using a simulated short-baseline reactor neutrino experiment with an exposure of $(2.9 \times 5 \times 1800) {GW}_{th} \cdot tonnes \cdot days$ , we demonstrated how this FNN model establishes a mapping from reactor evolution information to the reconstructed prompt energy spectrum and enables the extraction of antineutrino spectra for the four isotopes through its training process.

By comparing the extraction effects of the short- and long-epoch training strategies for our FNN model with the traditional χ² minimization method, as shown in Fig. 6, we found that the FNN model converged faster and better, and the performance of the three approaches for extracting the isotope antineutrino spectra in descending order is as follows: long-epoch strategy, short-epoch strategy, and χ² minimization method. Furthermore, the relative extraction errors of the antineutrino spectra for the four isotopes are reduced to less than 1% in the 2-8 MeV range of interest by the FNN model with the long-epoch strategy, which is better than the error of 8% or less obtained using the χ² minimization method in the control group. These results show that the FNN model has considerable potential for extracting fission isotope antineutrino spectra.

In the near future, TAO will serve as a satellite experiment of JUNO and achieve an energy resolution better than 2% at 1 MeV in measuring reactor antineutrinos [34]. Its primary physics goals include constraining the fine structures of isotope antineutrino spectra and providing a model-independent reference spectrum for JUNO and a benchmark measurement to test nuclear databases. Employing the FNN model in high-precision experiments such as TAO would therefore be an excellent match. In addition, depending on the research objectives, new NN models can be developed using the methodologies outlined in this study to further investigate a broader range of physics topics such as unfolding, neutrino oscillation parameter measurements, sterile neutrino searches, and reactor monitoring. For example, the unfolded neutrino energy spectrum is represented by the output of the hidden layers in our FNN model, which can achieve a relative error of less than 1% in the 2-8 MeV range.

References

F. Reines, C. L. Cowan, F. B. Harrison et al., Detection of the free antineutrino. Phys. Rev. 117, 159-173 (1960). https://doi.org/10.1103/PhysRev.117.159