Introduction
Significant deviations between the Huber-Mueller model and measured isotope antineutrino spectra have been demonstrated: a ∼6% deficit in the measured reactor antineutrino flux relative to the model prediction (the so-called Reactor Antineutrino Anomaly, RAA) and an excess of reconstructed positron signal events in the 4-6 MeV region (the so-called 5-MeV bump) [1-6]. Determining the origin of the reactor antineutrino rate and shape anomalies is critical, especially for understanding the underlying nuclear physics and improving nuclear databases for fundamental and applied research. Considerable experimental and theoretical effort has been devoted to this problem, including attempts to determine the individual isotope contributions to the reactor antineutrino flux and spectrum.
Determining individual isotope antineutrino spectra can also play an important role in nuclear safeguards. The International Atomic Energy Agency (IAEA) cooperates with neutrino physicists to develop new approaches to reactor monitoring based on observing the antineutrinos emitted by the reactor core.
To date, only the Daya Bay experiment has published extracted reactor isotope antineutrino spectra, using two methods, the minimum χ2 method and the Markov Chain Monte Carlo (MCMC) method, which give consistent results. The minimum χ2 method is a statistical inference technique that minimizes a χ2 statistic constructed from the difference between the prediction and the measurement, weighted by the associated uncertainties.
Although the extraction of isotope antineutrino spectra has been studied in reactor neutrino physics, there is still no convincing explanation of the RAA; we therefore consider it worthwhile to explore the application of new methods. Here, we propose a new method that uses a convolutional neural network (CNN) to decompose the primary fissile isotope antineutrino spectra by fitting the weekly detected antineutrino spectrum as a function of the individual isotope fission fractions. A CNN is a machine learning network model that provides an effective architecture for detecting key features in images and time-series data. It has broad applications in, for example, computer vision and natural language processing [14-17], and it has also been used in certain physics research fields to extract information from experimental data and fit model parameters [18]. Notably, the established decomposition methods, such as the minimum χ2 and MCMC methods, are offline algorithms. First, their analysis results must be updated from scratch as new data arrive, which is time-consuming, especially for long-term experiments. Second, both methods must load the entire dataset into computer memory, requiring a large amount of memory when dealing with big data, for example, with many reactor burning cycles and detailed reactor information, which can make them impractical. To mitigate this, analyses usually resample the original data to reduce the dataset size; however, this processing may lose information and bias the analysis. By contrast, the CNN approach is an online algorithm [19]. The advantage of online updating is that the analysis can be performed without access to the historical data; thus, the storage and computation limitations can be overcome in some cases. In addition, the proposed method makes full use of the data without excessive information loss. It provides an additional machine learning technique for the decomposition of reactor fissile isotope spectra and can be used for antineutrino spectrum analysis in future reactor antineutrino experiments.
Setup of the virtual experiment
In this section, we describe a virtual reactor antineutrino experiment that produces the simulation dataset used to train and test the proposed CNN method.
Suppose there is a virtual experiment with a one-reactor one-detector layout, where the reactor is a pressurized water reactor (PWR) and is the sole source of the antineutrinos observed by the detector.
For the virtual experiment, the isotope antineutrino spectra Si(Eν) are assumed to be the same as those of the Huber-Mueller model.
In addition to Table 1, the fission fraction evolution over a fuel cycle is presented in the top panel of Fig. 1, where the fission fractions of the four major fissile isotopes are shown as a function of burn-up. For a PWR, the reactor core usually consists of three batches of fuel assemblies with different ages, and typically about one-third of the assemblies (the oldest batch) are replaced by fresh fuel at the end of a refueling cycle. During reactor burning, the fissile isotopes are mainly depleted by fission, decay, and neutron-capture processes. Some of them, such as the plutonium isotopes, are also produced by neutron capture on, and decay of, mother nuclei in the reaction chains. The depletion and production of the fissile isotopes govern the evolution of the reactor fuel.
[Figure 1 about here: fission fraction evolution of the four major fissile isotopes as a function of burn-up (top panel).]
Burn-up in the top panel of Fig. 1 is defined as the cumulative thermal energy released per unit mass of the initial uranium fuel, commonly expressed in MW·day per ton of uranium.
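For concreteness, a commonly used form of this definition (a sketch assuming the conventional MW·day/ton(U) convention; the symbols below are illustrative and not necessarily those of the original equation) is:

```latex
% Burn-up as cumulative thermal energy released per unit mass of initial uranium
\mathrm{Burn\text{-}up} = \frac{W_{\mathrm{th}}\, T}{M_{\mathrm{U}}},
% where W_th is the reactor thermal power (MW), T the burning time (days),
% and M_U the initial mass of uranium in the core (tons).
```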
The uncertainties of the fission fractions of the four major fissile isotopes are assumed to be 5%, as in the Daya Bay experiment, and the correlation matrix of the uncertainties is from Ref. [20], which was extracted from the simulations of a typical PWR. The energies released per fission are from Ref. [21]. All the uncertainties are assumed to be time-correlated in this study.
Due to the fuel evolution of the four major fissile isotopes, the antineutrino flux and spectrum emitted by the reactor change gradually over the fuel cycle.
Notably, the IBD cross section σ(Eν) and the isotope antineutrino spectrum Si(Eν) both depend on the antineutrino energy in Eq. (3). The IBD yield per fission from an individual isotope can then be defined as the isotope antineutrino spectrum weighted by the IBD cross section.
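As a sketch (assuming the conventional definition in which the isotope spectrum is weighted by the IBD cross section within each energy bin; the binning follows the 24-bin, 2-8 MeV scheme used later), the per-bin IBD yield of isotope i can be written as:

```latex
% Per-bin IBD yield of isotope i (assumed conventional form)
\sigma_i(E_j) = \int_{E_j}^{E_{j+1}} S_i(E_\nu)\,\sigma_{\mathrm{IBD}}(E_\nu)\,\mathrm{d}E_\nu ,
% where S_i(E_nu) is the Huber-Mueller spectrum of isotope i,
% sigma_IBD(E_nu) is the inverse beta-decay cross section,
% and [E_j, E_{j+1}] is one of the 24 bins between 2 and 8 MeV.
```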
Thus, the predicted antineutrino spectrum detected in a given week can be expressed as a linear combination of the individual isotope IBD yields, with coefficients that encode the reactor power, fission fractions, energies released per fission, baseline, and detector efficiency.
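Schematically, assuming the linear decomposition usually adopted in reactor antineutrino analyses (a sketch only; the exact forms of Eqs. (6) and (7) are not reproduced here), the weekly prediction can be written as:

```latex
% Predicted spectrum in week t as a linear combination of isotope IBD yields
N^{t}(E_j) = \sum_{i \in \{235\mathrm{U},\,238\mathrm{U},\,239\mathrm{Pu},\,241\mathrm{Pu}\}} k^{t}_{i}\,\sigma_i(E_j),
% with the coefficient k^t_i built from the weekly thermal power W_th^t, the
% fission fractions f_i^t, the energies released per fission e_i, the baseline L,
% the number of target protons N_p, and the detection efficiency eps:
k^{t}_{i} \propto \frac{N_p\,\varepsilon}{4\pi L^{2}}\;
              \frac{W_{\mathrm{th}}^{t}\, f_i^{t}}{\sum_{i'} f_{i'}^{t}\, e_{i'}} .
```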
Configuration of the Convolutional Neural Network
Among the many machine learning methods, the CNN is commonly applied to extract shift-invariant features of data with its specialized convolutional layer. In the reactor antineutrino spectrum decomposition study, the isotope antineutrino spectra are time-invariant in the reactor evolution data; thus, the CNN method might be a suitable approach for extracting the isotope antineutrino spectra. To extract the isotope spectra from the simulation dataset, we constructed a one-dimensional CNN. Before describing the CNN model itself, we introduce the data structures it requires, the operation performed on the data, and some key concepts, which are summarized in Table 2. The dataset of the virtual experiment is organized sample by sample, each sample tagged with a time t1, t2, ..., tn corresponding to one week, as shown in Table 2. The CNN model thus treats each periodic experimental measurement (one week) as one training sample. The 'Coefficient' columns of Table 2 are the key input of the CNN; the coefficient kti is calculated week by week using Eq. (7) from the virtual experiment for week t and isotope i. The central part of the CNN is the convolutional kernel, a small matrix for feature extraction, defined as (σ235, σ238, σ239, σ241) and shown in the second row of Table 2, representing the respective isotope spectra in Eq. (5). A linear operation, called convolution in a CNN, is performed on the convolutional kernel and the input data to generate the output in Table 2, shown in the second row of the 'Expectation' column; the output is the expected antineutrino spectrum of Eq. (6). The convolutional operation is performed sample by sample across the entire dataset; in other words, the convolutional kernel (σ235, σ238, σ239, σ241) in the table slides along the timeline and combines with each row of coefficients to predict the expected antineutrino spectrum of the corresponding week.
| Sample | Coefficient | | | | Expectation | Observation |
|---|---|---|---|---|---|---|
| t1 | k15×σ235 | k18×σ238 | k19×σ239 | k11×σ241 | expected spectrum (week 1) | measured spectrum (week 1) |
| t2 | k25×σ235 | k28×σ238 | k29×σ239 | k21×σ241 | expected spectrum (week 2) | measured spectrum (week 2) |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |
| tn | kn5×σ235 | kn8×σ238 | kn9×σ239 | kn1×σ241 | expected spectrum (week n) | measured spectrum (week n) |
| | Input | | | | Output | Label |
The CNN aims to learn from reactor antineutrino experimental data to fit the isotope spectra by updating its convolutional kernel. Because this study divides the energy range into 24 bins from 2 to 8 MeV, a corresponding number of convolutional kernels are employed.
The architecture of the constructed CNN model is shown in Fig. 2. The model comprises three layers: a convolutional layer, a flatten layer, and a fully connected layer. The convolutional layer is where most of the computation occurs; it requires input data (rectangles on the left side of Fig. 2) and convolutional kernels (the shaded patch at the bottom left). The input data are the simulation coefficients, as shown in Table 2. For each energy bin, the coefficient table and the respective convolutional kernel perform the convolutional operation, and the outcomes, representing the expected antineutrino spectrum in each energy bin, are passed through the flatten layer and the fully connected layer to form the output spectrum, which is compared with the observed (label) spectrum.
[Figure 2 about here: architecture of the constructed CNN model.]
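To make the three-layer structure concrete, the following is a minimal Keras sketch of such a model (layer sizes and options such as use_bias are illustrative assumptions, not the exact configuration used in the paper):

```python
# Minimal sketch of a 1D CNN with the three described layers:
# convolutional, flatten, and fully connected.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_ISO = 4    # 235U, 238U, 239Pu, 241Pu
N_BINS = 24  # energy bins from 2 to 8 MeV

model = models.Sequential([
    # One length-4 kernel per energy bin: the kernel weights play the role of
    # the per-bin IBD yields (sigma_235, sigma_238, sigma_239, sigma_241).
    layers.Conv1D(filters=N_BINS, kernel_size=N_ISO, use_bias=False,
                  input_shape=(N_ISO, 1)),
    layers.Flatten(),
    # Fully connected layer producing the 24-bin expected spectrum.
    layers.Dense(N_BINS),
])

# One training sample: the four weekly coefficients k^t_i, shaped (1, 4, 1).
coeffs_week = np.random.rand(1, N_ISO, 1).astype("float32")
expected_spectrum = model(coeffs_week)  # shape (1, 24)
```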
Once the architecture of the CNN has been built, the next step is to tune the hyperparameters of the model. Hyperparameters are configurations used to control the training process, for example, the objective function, the optimizer, and the learning rate. Hyperparameters are usually set before training; therefore, we should find appropriate configurations before the actual decomposition work. This hyperparameter tuning process is called pre-training here to distinguish it from the subsequent training procedure of the actual decomposition, in which the hyperparameters are already fixed. One of the main difficulties is that hyperparameters cannot be estimated directly from the data and must be specified manually. Generally, there is no golden rule, and the search for the best hyperparameters is conducted by trial and error.
During the pre-training process, the simulation dataset fed into the CNN is noiseless, and systematic uncertainties of parameters in Table 1 are assumed to be zero. In other words, measurements of the virtual experimental parameters are regarded as being sufficiently precise to suppress the noise effects. Such efforts enable the CNN model to determine the most suitable hyperparameters.
Our computation is conducted on a server cluster consisting of a group of computers with 16-core CPUs. The cluster supports up to 500 multi-core jobs for our study; thus, we are able to decompose 500 Monte Carlo datasets simultaneously [22]. The pre-training of the CNN is implemented in Keras 2.3, a user-friendly framework that provides a Python frontend and uses the TensorFlow platform as its backend. These two tools provide sufficient standard modules for users to build and train neural network models; thus, our code is mainly based on the standard modules of the two packages. However, we needed to develop a new objective function for this study, which is explained in detail below. With this cluster and the two packages, the computation requires ∼300 MB of memory and ∼5 hours for each decomposition task.
To decompose the individual isotope spectra from the data, the CNN requires an objective function that optimizes the network parameters σi by reducing the difference between the output and the label data. For general CNN regression problems, the mean squared error (MSE) is the conventional choice, in which no uncertainties are considered. In this study, however, an objective function in the form of a χ2 function, as commonly used in high-energy physics analyses, is constructed by considering the statistical uncertainty and the uncertainties introduced by 238U and 241Pu. The χ2 function quantifies the difference between the CNN output and the observed spectrum, weighted by the statistical uncertainty, with additional terms constraining the 238U and 241Pu contributions within their prior uncertainties.
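A minimal sketch of such an objective function in Keras (assuming a simple Poisson-statistics weighting; the terms constraining the 238U and 241Pu spectra are omitted here and would be added in a full implementation) is:

```python
# Sketch of a chi2-like custom loss for Keras: squared residuals between the
# predicted and observed weekly spectra, weighted by the statistical variance.
import tensorflow as tf

def chi2_loss(y_true, y_pred):
    # y_true: observed weekly spectrum (counts per energy bin, the label)
    # y_pred: CNN-predicted spectrum for the same week
    variance = tf.maximum(y_true, 1.0)  # Poisson variance approximated by counts
    return tf.reduce_sum(tf.square(y_pred - y_true) / variance, axis=-1)
```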
During training, the neural network uses an iterative algorithm (called the optimizer) to minimize the objective function and adjust its internal network parameters. In this study, the CNN uses the adaptive moment estimation (Adam) method as its optimizer, which adapts the learning rates using the first and second moments of the gradient [23, 24].
Initially, the CNN parameter σi is as follows:
Based on the objective function and the optimizer, the neural network follows the specified algorithm to iteratively update its parameters. Controlling the speed of the parameter updates (the learning rate) is important: a learning rate that is too large may cause the model to converge prematurely to a suboptimal solution, whereas a rate that is too small may cause the training process to stall. In this study, the learning rates of the CNN parameters follow the schedule shown in the top panel of Fig. 3, where the learning rates are functions of the epoch. Parameters of the high-energy bins are configured with smaller learning rates than those of the low-energy bins, mainly because the isotope spectra have small values at high energy and therefore require finer control there.
[Figure 3 about here: learning rate schedule versus epoch (top panel) and verification factor versus epoch (bottom panel).]
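As an illustration of an epoch-dependent schedule (a sketch only: the decay steps below are placeholders, and the smaller per-parameter rates used for the high-energy bins would require a custom optimizer not shown here), a Keras callback can drive the schedule:

```python
# Sketch of an epoch-dependent learning-rate schedule using a Keras callback.
import tensorflow as tf

def lr_schedule(epoch, lr):
    # Placeholder step-wise decay: halve the learning rate every 500 epochs.
    if epoch > 0 and epoch % 500 == 0:
        return lr * 0.5
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# Passed later to model.fit(..., callbacks=[lr_callback]).
```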
An epoch refers to training the neural network on all training data for one cycle. Each epoch consists of one or more batches, in each of which part of the dataset is used as the input. The number of samples in a batch is called the batch size. In this study, the batch size is set to four samples; hence, four weeks of data are passed through the CNN for each parameter update.
Before training starts, the number of training cycles (epochs) must be set. However, determining the optimal number of epochs for the model is difficult; depending on the network model and the dataset, we must judge when the parameters have converged and when the training process should stop. On the one hand, an excessive number of training cycles can lead to over-fitting, in which the model fits the training data well but generalizes poorly to new data. On the other hand, too few training cycles can lead to under-fitting, in which the model has not learned the data sufficiently. The common practice for deciding whether a model has converged is to examine how the training results vary with the number of epochs: if the number of epochs is too low, training terminates before the model converges; if it is too high, the model is likely to over-fit. Thus, the number of epochs must be chosen carefully.
To evaluate and visualize the effectiveness of the CNN decomposition, a verification factor is defined that compares the decomposed spectra with the corresponding truth spectra.
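One plausible form of such a factor (an illustrative assumption, since the exact definition is not reproduced here) compares the decomposed spectrum with the truth spectrum bin by bin and approaches 100% as the decomposition converges:

```latex
% Illustrative (assumed) verification factor for isotope i
V_i = \frac{1}{N_{\mathrm{bins}}}\sum_{j=1}^{N_{\mathrm{bins}}}
      \frac{\sigma_i(E_j)}{\sigma_i^{\mathrm{truth}}(E_j)} \times 100\% .
```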
In this study, we evaluate the influence of different epoch settings by conducting thousands of training processes and superposing their results in one plot, as shown in the bottom panel of Fig. 3, where the X-axis is the number of training cycles, the Y-axis is the verification factor, and the color represents the frequency of the training results. When the number of epochs reaches ∼1500, the verification factor stably converges to nearly 100%. Conservatively, the number of epochs is set to 2000.
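Putting the pieces together, the training stage with the hyperparameters quoted above (batch size 4, 2000 epochs) might look like the following sketch, where model, chi2_loss, and lr_callback refer to the earlier snippets and the input arrays are toy placeholders for the weekly coefficients and measured spectra:

```python
# Sketch of the training call with batch size 4 and 2000 epochs.
import numpy as np

n_weeks = 100                                             # toy number of weekly samples
coeffs = np.random.rand(n_weeks, 4, 1).astype("float32")  # weekly coefficients k^t_i
observed = np.random.rand(n_weeks, 24).astype("float32")  # measured weekly spectra (labels)

model.compile(optimizer="adam", loss=chi2_loss)
history = model.fit(coeffs, observed,
                    batch_size=4, epochs=2000,
                    callbacks=[lr_callback], verbose=0)
```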
After the hyperparameters have been determined, we complete the pre-training process and establish the entire CNN model. Maintaining the same configurations, we prepare to test the decomposition performance of the CNN with the experimental data. In this study, the simulation data are used instead.
Results of decomposition
Using the aforementioned hyperparameter configurations, the CNN decomposes the individual isotope spectra from both noiseless and noisy simulation datasets. In this study, we mainly examine the unbiasedness and uncertainties of the decomposition results by using the CNN method.
Using noiseless datasets, in which both the systematic uncertainties and the statistical fluctuations are ignored, the decomposition is performed 1000 times, and the extracted spectra are compared with the truth values to evaluate the bias and uncertainties. As shown in Fig. 4, the ratios of the mean values of the extracted spectra to the truth spectra are presented as data points; the deviations are less than 0.1%, which is negligible, so the decomposed isotope spectra can be regarded as unbiased. The tiny error bars represent the uncertainties introduced by the CNN model itself, obtained from the standard deviations of the ratios of the extracted spectra to the truth spectra.
[Figure 4 about here: ratios of the mean extracted spectra to the truth spectra for the noiseless datasets.]
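As a sketch of this evaluation (the array names are hypothetical: extracted holds the 1000 decomposed spectra of one isotope and truth its Huber-Mueller truth spectrum), the bias and the CNN-induced uncertainty per bin can be computed as:

```python
# Sketch of the bias/uncertainty evaluation over repeated decompositions.
import numpy as np

extracted = np.random.normal(loc=1.0, scale=1e-3, size=(1000, 24))  # toy decomposed spectra
truth = np.ones(24)                                                 # toy truth spectrum

ratios = extracted / truth             # per-run, per-bin ratio to the truth
bias = ratios.mean(axis=0) - 1.0       # deviation of the mean ratio from unity
cnn_uncertainty = ratios.std(axis=0)   # spread introduced by the CNN model
```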
When the noise effects are considered, the statistical fluctuations and the systematic uncertainties are assigned to the experimental measurements by applying Poisson fluctuations and the systematic uncertainties in Table 1, respectively. One thousand different noisy datasets are generated with these uncertainties, and the individual isotope spectra are extracted from each; the decomposition results vary under the noise. The mean values and standard deviations of the ensemble of decomposition results are shown in Fig. 5.
[Figure 5 about here: decomposition results for the noisy datasets (mean values and standard deviations).]
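A sketch of how such noisy datasets could be generated (assuming Poisson fluctuations of the weekly counts and a single correlated Gaussian variation of the fission fractions per dataset; variable names are illustrative) is:

```python
# Sketch of noisy-dataset generation: Poisson-fluctuate the weekly spectra and
# draw one correlated fission-fraction shift per dataset (time-correlated).
import numpy as np

rng = np.random.default_rng()

def make_noisy_dataset(expected_counts, fission_fractions, ff_cov):
    # expected_counts: (weeks x bins) expected weekly spectra
    # fission_fractions: (weeks x 4) nominal fission fractions
    # ff_cov: (4 x 4) covariance matrix of the 5% fission-fraction uncertainties
    noisy_counts = rng.poisson(expected_counts)
    shift = rng.multivariate_normal(np.zeros(4), ff_cov)
    noisy_fractions = fission_fractions * (1.0 + shift)
    return noisy_counts, noisy_fractions
```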
Because the 238U and 241Pu spectra are treated as prior knowledge, this study presents the decomposition results for 235U and 239Pu, whose fits are principally driven by the simulated experimental data. As shown in the bottom panel of Fig. 5, the decomposition results of both isotopes deviate from the truth spectra by less than 0.3% and are therefore practically unbiased. The decomposed 235U spectrum has a smaller uncertainty than the 239Pu spectrum, mainly because 235U is the primary contributor to the reactor antineutrino flux.
Conclusion and Discussion
In summary, we propose a machine learning approach to decompose the 235U and 239Pu isotope antineutrino spectra from the evolution data of a simulated reactor antineutrino experiment. The CNN decomposition method is applied to noiseless and noisy datasets that account for the main uncertainties of a reactor antineutrino experiment, and the validation tests show that the deviations of the decomposed spectra are less than 0.1% and 0.3%, respectively; thus, the spectra can be regarded as unbiased. The uncertainty introduced by the CNN method itself is less than 0.1%, and the statistical and systematic uncertainties can be evaluated using the Monte Carlo method.
The CNN decomposition method is also applicable to realistic commercial reactor antineutrino experiments, because the physical principles of antineutrino production and detection are the same.
Due to the various experimental operation times and baseline scales ranging from ∼10 m to ∼1000 m, the number of observed antineutrino events differs considerably from experiment to experiment.
In addition, the decomposition in this study is applied directly to the antineutrino spectrum. In realistic reactor antineutrino experiments, however, the energy spectrum of the prompt positron signal is measured, so the detector energy response would need to be taken into account in the analysis.
In the near future, very-short-baseline reactor antineutrino experiments are expected to measure the reactor antineutrino spectrum with higher precision and better energy resolution. The decomposition approach introduced and demonstrated in this paper could be applied in these experiments to provide up-to-date individual isotope antineutrino spectra.
References
Improved predictions of reactor antineutrino spectra. Phys. Rev. C 83, 054615 (2011). doi: 10.1103/PhysRevC.83.054615
Determination of antineutrino spectra from nuclear reactors. Phys. Rev. C 84, 024617 (2011). doi: 10.1103/PhysRevC.84.024617
Reactor antineutrino anomaly. Phys. Rev. D 83, 073006 (2011). doi: 10.1103/PhysRevD.83.073006
Improved measurements of the neutrino mixing angle θ13 with the Double Chooz detector. J. High Energy Phys. 2014, 86 (2014). doi: 10.1007/JHEP10(2014)086
(RENO Collaboration), New results from RENO and the 5 MeV excess. AIP Conf. Proc. 1666, 080002 (2015). doi: 10.1063/1.4915563
(Daya Bay Collaboration), Measurement of the reactor antineutrino flux and spectrum at Daya Bay. Phys. Rev. Lett. 116, 061801 (2016). doi: 10.1103/PhysRevLett.116.061801
(Daya Bay Collaboration), Evolution of the reactor antineutrino flux and spectrum at Daya Bay. Phys. Rev. Lett. 118, 251801 (2017). doi: 10.1103/PhysRevLett.118.251801
(PROSPECT Collaboration), Measurement of the antineutrino spectrum from 235U fission at HFIR with PROSPECT. Phys. Rev. Lett. 122, 251801 (2019). doi: 10.1103/PhysRevLett.122.251801
Updated summation model: An improved agreement with the Daya Bay antineutrino fluxes. Phys. Rev. Lett. 123, 022502 (2019). doi: 10.1103/PhysRevLett.123.022502
(Daya Bay Collaboration), Extraction of the 235U and 239Pu antineutrino spectra at Daya Bay. Phys. Rev. Lett. 123, 111801 (2019). doi: 10.1103/PhysRevLett.123.111801
Technical Meeting on Nuclear Data for Antineutrino Spectra and their Applications.
Report of the Topical Group on Neutrino Applications for Snowmass 2021. doi: 10.48550/arXiv.2209.07483
Nu tools: Exploring practical roles for neutrinos in nuclear energy and security. doi: 10.48550/arXiv.2112.12593
CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 10, 2470 (2021). doi: 10.3390/electronics10202470
MeshCNN-based BREP to CSG conversion algorithm for 3D CAD models and its application. Nucl. Sci. Tech. 33, 74 (2022). doi: 10.1007/s41365-022-01063-5
Study on analytical noise propagation in convolutional neural network methods used in computed tomography imaging. Nucl. Sci. Tech. 33, 77 (2022). doi: 10.1007/s41365-022-01057-3
A non-invasive diagnostic method of cavity detuning based on a convolutional neural network. Nucl. Sci. Tech. 33, 94 (2022). doi: 10.1007/s41365-022-01069-z
Weak lensing cosmology with convolutional neural networks on noisy data. Mon. Not. R. Astron. Soc. 490, 1843 (2019). doi: 10.1093/mnras/stz2610
(Daya Bay Collaboration), Improved measurement of the reactor antineutrino flux and spectrum at Daya Bay. Chin. Phys. C 41, 013002 (2017). doi: 10.1088/1674-1137/41/1/013002
Improved calculation of the energy release in neutron-induced fission. Phys. Rev. C 88, 014605 (2013). doi: 10.1103/PhysRevC.88.014605
Distributed data processing platform of national high energy physics data center. Frontiers of Data and Computing 4, 97 (2022). doi: 10.11871/jfdc.issn.2096-742X.2022.01.008 (in Chinese)
Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). https://www.iclr.cc/archive/www/2015.html
An overview of gradient descent optimization algorithms. doi: 10.48550/arXiv.1609.04747
The authors declare that they have no competing interests.