Introduction
Significant deviations between the Huber-Mueller model and measured isotope antineutrino spectra have been demonstrated: a ∼6% deficit in the measured reactor antineutrino flux relative to the model prediction (the so-called Reactor Antineutrino Anomaly, RAA) and an excess of reconstructed positron signal events in the 4-6 MeV region (the so-called 5-MeV bump) [1-6]. Determining the origin of the reactor antineutrino rate and shape anomalies is critical, especially for understanding the underlying nuclear physics and improving nuclear databases for fundamental and applied research. Considerable experimental and theoretical effort has been devoted to this problem, including attempts to determine the individual isotope contributions to the reactor antineutrino flux and spectrum.
Determining individual isotope antineutrino spectra can also play an important role in nuclear safeguards. The International Atomic Energy Agency (IAEA) cooperates with neutrino physicists to develop new approaches to reactor monitoring based on observing the antineutrinos emitted by the reactor core.
To date, only the Daya Bay experiment has published extracted reactor isotope antineutrino spectra, using two methods, the minimum χ2 method and the Markov Chain Monte Carlo (MCMC) method, which give consistent results. The minimum χ2 method is a statistical inference technique that minimizes a χ2 statistic constructed from the difference between the prediction and the measurement, weighted by the associated uncertainties.
Although the extraction of isotope antineutrino spectra has been studied in reactor neutrino physics, there is still no convincing explanation of the RAA; we therefore consider it worthwhile to explore the application of new methods. Here, we propose a new method that uses a convolutional neural network (CNN) to decompose the primary fissile isotope antineutrino spectra by fitting the weekly detected antineutrino spectrum as a function of the individual isotope fission fractions. A CNN is a machine learning network model that provides an effective architecture for detecting key features in images and time-series data. It has broad applications in, for example, computer vision and natural language processing [14-17], and it has also been used in certain physics research fields to extract information from experimental data and fit model parameters [18]. Notably, the established decomposition methods, such as the minimum χ2 and MCMC methods, are offline algorithms. First, their analysis results must be updated from scratch as new data arrive, which is time-consuming, especially for long-term experiments. Second, both methods must load the entire dataset into computer memory, requiring a large amount of memory when dealing with big data, for example, with many reactor burning cycles and detailed reactor information, which can make them impractical. To mitigate this, analyses usually resample the original data to reduce the dataset size; however, this processing may lose information and bias the analysis. By contrast, the CNN approach is an online algorithm [19]. The advantage of online updating is that the analysis can be performed without access to the historical data; thus, the storage and computation limitations can be overcome in some cases. In addition, the proposed method makes full use of the data without excessive information loss. It provides an additional machine learning technique for the decomposition of reactor fissile isotope spectra and can be used for antineutrino spectrum analysis in future reactor antineutrino experiments.
Setup of the virtual experiment
In this section, we describe a virtual reactor antineutrino experiment that produces the simulation dataset used to train and test the proposed CNN method.
Suppose there is a virtual experiment with a one-reactor one-detector layout, where the reactor is a pressurized water reactor (PWR) and is the sole source of the antineutrinos observed by the detector.
For the virtual experiment, the isotope antineutrino spectra Si(Eν) are assumed to be the same as those of the Huber-Mueller model.
In addition to Table 1, the fission fraction evolution over a fuel cycle is presented in the top panel of Fig. 1, where the fission fractions of the four major fissile isotopes are shown as a function of burn-up. For a PWR, the reactor core usually consists of three batches of fuel assemblies with different ages, and typically about one-third of the assemblies (the oldest batch) are replaced by fresh fuel at the end of a refueling cycle. During reactor burning, the fissile isotopes are mainly depleted by fission, decay, and neutron-capture processes. Some of them, such as the plutonium isotopes, are also produced by neutron capture on, and decay of, mother nuclei in the reaction chains. The depletion and production of the fissile isotopes govern the evolution of the reactor fuel.
[Figure 1 about here: fission fraction evolution of the four major fissile isotopes as a function of burn-up (top panel).]
Burn-up in the top panel of Fig. 1 is defined as the cumulative thermal energy released per unit mass of the initial uranium fuel, commonly expressed in MW·day per ton of uranium.
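For concreteness, a commonly used form of this definition (a sketch assuming the conventional MW·day/ton(U) convention; the symbols below are illustrative and not necessarily those of the original equation) is:

```latex
% Burn-up as cumulative thermal energy released per unit mass of initial uranium
\mathrm{Burn\text{-}up} = \frac{W_{\mathrm{th}}\, T}{M_{\mathrm{U}}},
% where W_th is the reactor thermal power (MW), T the burning time (days),
% and M_U the initial mass of uranium in the core (tons).
```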
The uncertainties of the fission fractions of the four major fissile isotopes are assumed to be 5%, as in the Daya Bay experiment, and the correlation matrix of the uncertainties is from Ref. [20], which was extracted from the simulations of a typical PWR. The energies released per fission are from Ref. [21]. All the uncertainties are assumed to be time-correlated in this study.
Due to the fuel evolution of the four major fissile isotopes, the antineutrino flux and spectrum emitted by the reactor change gradually over the fuel cycle.
Notably, the IBD cross section σ(Eν) and the isotope antineutrino spectrum Si(Eν) both depend on the antineutrino energy in Eq. (3). The IBD yield per fission from an individual isotope can then be defined as the isotope antineutrino spectrum weighted by the IBD cross section.
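As a sketch (assuming the conventional definition in which the isotope spectrum is weighted by the IBD cross section within each energy bin; the binning follows the 24-bin, 2-8 MeV scheme used later), the per-bin IBD yield of isotope i can be written as:

```latex
% Per-bin IBD yield of isotope i (assumed conventional form)
\sigma_i(E_j) = \int_{E_j}^{E_{j+1}} S_i(E_\nu)\,\sigma_{\mathrm{IBD}}(E_\nu)\,\mathrm{d}E_\nu ,
% where S_i(E_nu) is the Huber-Mueller spectrum of isotope i,
% sigma_IBD(E_nu) is the inverse beta-decay cross section,
% and [E_j, E_{j+1}] is one of the 24 bins between 2 and 8 MeV.
```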
Thus, the predicted antineutrino spectrum detected in a given week can be expressed as a linear combination of the individual isotope IBD yields, with coefficients that encode the reactor power, fission fractions, energies released per fission, baseline, and detector efficiency.
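Schematically, assuming the linear decomposition usually adopted in reactor antineutrino analyses (a sketch only; the exact forms of Eqs. (6) and (7) are not reproduced here), the weekly prediction can be written as:

```latex
% Predicted spectrum in week t as a linear combination of isotope IBD yields
N^{t}(E_j) = \sum_{i \in \{235\mathrm{U},\,238\mathrm{U},\,239\mathrm{Pu},\,241\mathrm{Pu}\}} k^{t}_{i}\,\sigma_i(E_j),
% with the coefficient k^t_i built from the weekly thermal power W_th^t, the
% fission fractions f_i^t, the energies released per fission e_i, the baseline L,
% the number of target protons N_p, and the detection efficiency eps:
k^{t}_{i} \propto \frac{N_p\,\varepsilon}{4\pi L^{2}}\;
              \frac{W_{\mathrm{th}}^{t}\, f_i^{t}}{\sum_{i'} f_{i'}^{t}\, e_{i'}} .
```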
Configuration of the Convolutional Neural Network
Among the many machine learning methods, the CNN is commonly applied to extract shift-invariant features of data with its specialized convolutional layer. In the reactor antineutrino spectrum decomposition study, the isotope antineutrino spectra are time-invariant in the reactor evolution data; thus, the CNN method might be a suitable approach for extracting the isotope antineutrino spectra. To extract the isotope spectra from the simulation dataset, we constructed a one-dimensional CNN. Before describing the CNN model itself, we introduce the data structures it requires, the operation performed on the data, and some key concepts, which are summarized in Table 2. The dataset of the virtual experiment is organized sample by sample, each sample tagged with a time t1, t2, ..., tn corresponding to one week, as shown in Table 2. The CNN model thus treats each periodic experimental measurement (one week) as one training sample. The 'Coefficient' columns of Table 2 are the key input of the CNN; the coefficient kti is calculated week by week using Eq. (7) from the virtual experiment for week t and isotope i. The central part of the CNN is the convolutional kernel, a small matrix for feature extraction, defined as (σ235, σ238, σ239, σ241) and shown in the second row of Table 2, representing the respective isotope spectra in Eq. (5). A linear operation, called convolution in a CNN, is performed on the convolutional kernel and the input data to generate the output in Table 2, shown in the second row of the 'Expectation' column; the output is the expected antineutrino spectrum of Eq. (6). The convolutional operation is performed sample by sample across the entire dataset; in other words, the convolutional kernel (σ235, σ238, σ239, σ241) in the table slides along the timeline and combines with each row of coefficients to predict the expected antineutrino spectrum of the corresponding week.
| Sample | Coefficient | | | | Expectation | Observation |
|---|---|---|---|---|---|---|
| t1 | k15×σ235 | k18×σ238 | k19×σ239 | k11×σ241 | expected spectrum (week 1) | measured spectrum (week 1) |
| t2 | k25×σ235 | k28×σ238 | k29×σ239 | k21×σ241 | expected spectrum (week 2) | measured spectrum (week 2) |
| ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ | ⋯ |
| tn | kn5×σ235 | kn8×σ238 | kn9×σ239 | kn1×σ241 | expected spectrum (week n) | measured spectrum (week n) |
| | Input | | | | Output | Label |
The CNN aims to learn from reactor antineutrino experimental data to fit the isotope spectra by updating its convolutional kernel. Because this study divides the energy range into 24 bins from 2 to 8 MeV, a corresponding number of convolutional kernels are employed.
The architecture of the constructed CNN model is shown in Fig. 2. The model comprises three layers: a convolutional layer, a flatten layer, and a fully connected layer. The convolutional layer is where most of the computation occurs; it requires input data (rectangles on the left side of Fig. 2) and convolutional kernels (the shaded patch at the bottom left). The input data are the simulation coefficients, as shown in Table 2. For each energy bin, the coefficient table and the respective convolutional kernel perform the convolutional operation, and the outcomes, representing the expected antineutrino spectrum in each energy bin, are passed through the flatten layer and the fully connected layer to form the output spectrum, which is compared with the observed (label) spectrum.
[Figure 2 about here: architecture of the constructed CNN model.]
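To make the three-layer structure concrete, the following is a minimal Keras sketch of such a model (layer sizes and options such as use_bias are illustrative assumptions, not the exact configuration used in the paper):

```python
# Minimal sketch of a 1D CNN with the three described layers:
# convolutional, flatten, and fully connected.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

N_ISO = 4    # 235U, 238U, 239Pu, 241Pu
N_BINS = 24  # energy bins from 2 to 8 MeV

model = models.Sequential([
    # One length-4 kernel per energy bin: the kernel weights play the role of
    # the per-bin IBD yields (sigma_235, sigma_238, sigma_239, sigma_241).
    layers.Conv1D(filters=N_BINS, kernel_size=N_ISO, use_bias=False,
                  input_shape=(N_ISO, 1)),
    layers.Flatten(),
    # Fully connected layer producing the 24-bin expected spectrum.
    layers.Dense(N_BINS),
])

# One training sample: the four weekly coefficients k^t_i, shaped (1, 4, 1).
coeffs_week = np.random.rand(1, N_ISO, 1).astype("float32")
expected_spectrum = model(coeffs_week)  # shape (1, 24)
```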
Once the architecture of the CNN has been built, the next step is to tune the hyperparameters of the model. Hyperparameters are configurations used to control the training process, for example, the objective function, the optimizer, and the learning rate. Hyperparameters are usually set before training; therefore, we should find appropriate configurations before the actual decomposition work. This hyperparameter tuning process is called pre-training here to distinguish it from the subsequent training procedure of the actual decomposition, in which the hyperparameters are already fixed. One of the main difficulties is that hyperparameters cannot be estimated directly from the data and must be specified manually. Generally, there is no golden rule, and the search for the best hyperparameters is conducted by trial and error.
During the pre-training process, the simulation dataset fed into the CNN is noiseless, and systematic uncertainties of parameters in Table 1 are assumed to be zero. In other words, measurements of the virtual experimental parameters are regarded as being sufficiently precise to suppress the noise effects. Such efforts enable the CNN model to determine the most suitable hyperparameters.
Our computation is conducted on a server cluster consisting of a group of computers with 16-core CPUs. The cluster supports up to 500 multi-core jobs for our study; thus, we are able to decompose 500 Monte Carlo datasets simultaneously [22]. The pre-training of the CNN is implemented in Keras 2.3, a user-friendly framework that provides a Python frontend and uses the TensorFlow platform as its backend. These two tools provide sufficient standard modules for users to build and train neural network models; thus, our code is mainly based on the standard modules of the two packages. However, we needed to develop a new objective function for this study, which is explained in detail below. With this cluster and the two packages, the computation requires ∼300 MB of memory and ∼5 hours for each decomposition task.
To decompose the individual isotope spectra from the data, the CNN requires an objective function that optimizes the network parameters σi by reducing the difference between the output and the label data. For general CNN regression problems, the mean squared error (MSE) is the conventional choice, in which no uncertainties are considered. In this study, however, an objective function in the form of a χ2 function, as commonly used in high-energy physics analyses, is constructed by considering the statistical uncertainty and the uncertainties introduced by 238U and 241Pu. The χ2 function quantifies the difference between the CNN output and the observed spectrum, weighted by the statistical uncertainty, with additional terms constraining the 238U and 241Pu contributions within their prior uncertainties.
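A minimal sketch of such an objective function in Keras (assuming a simple Poisson-statistics weighting; the terms constraining the 238U and 241Pu spectra are omitted here and would be added in a full implementation) is:

```python
# Sketch of a chi2-like custom loss for Keras: squared residuals between the
# predicted and observed weekly spectra, weighted by the statistical variance.
import tensorflow as tf

def chi2_loss(y_true, y_pred):
    # y_true: observed weekly spectrum (counts per energy bin, the label)
    # y_pred: CNN-predicted spectrum for the same week
    variance = tf.maximum(y_true, 1.0)  # Poisson variance approximated by counts
    return tf.reduce_sum(tf.square(y_pred - y_true) / variance, axis=-1)
```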
During training, the neural network uses an iterative algorithm (called the optimizer) to minimize the objective function and adjust its internal network parameters. In this study, the CNN uses the adaptive moment estimation (Adam) method as its optimizer, which adapts the learning rates using the first and second moments of the gradient [23, 24].
Initially, the CNN parameter σi is as follows:
Based on the objective function and the optimizer, the neural network follows the specified algorithm to iteratively update its parameters. Controlling the speed of the parameter updates (the learning rate) is important: a learning rate that is too large may cause the model to converge prematurely to a suboptimal solution, whereas a rate that is too small may cause the training process to stall. In this study, the learning rates of the CNN parameters follow the schedule shown in the top panel of Fig. 3, where the learning rates are functions of the epoch. Parameters of the high-energy bins are configured with smaller learning rates than those of the low-energy bins, mainly because the isotope spectra have small values at high energy and therefore require finer control there.
[Figure 3 about here: learning rate schedule versus epoch (top panel) and verification factor versus epoch (bottom panel).]
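As an illustration of an epoch-dependent schedule (a sketch only: the decay steps below are placeholders, and the smaller per-parameter rates used for the high-energy bins would require a custom optimizer not shown here), a Keras callback can drive the schedule:

```python
# Sketch of an epoch-dependent learning-rate schedule using a Keras callback.
import tensorflow as tf

def lr_schedule(epoch, lr):
    # Placeholder step-wise decay: halve the learning rate every 500 epochs.
    if epoch > 0 and epoch % 500 == 0:
        return lr * 0.5
    return lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(lr_schedule)
# Passed later to model.fit(..., callbacks=[lr_callback]).
```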
An epoch refers to training the neural network on all training data for one cycle. Each epoch consists of one or more batches, in each of which part of the dataset is used as the input. The number of samples in a batch is called the batch size. In this study, the batch size is set to four samples; hence, four weeks of data are passed through the CNN for each parameter update.
Before training starts, the number of training cycles (epochs) must be set. However, determining the optimal number of epochs for the model is difficult; depending on the network model and the dataset, we must judge when the parameters have converged and when the training process should stop. On the one hand, an excessive number of training cycles can lead to over-fitting, in which the model fits the training data well but generalizes poorly to new data. On the other hand, too few training cycles can lead to under-fitting, in which the model has not learned the data sufficiently. The common practice for deciding whether a model has converged is to examine how the training results vary with the number of epochs: if the number of epochs is too low, training terminates before the model converges; if it is too high, the model is likely to over-fit. Thus, the number of epochs must be chosen carefully.
To evaluate and visualize the effectiveness of the CNN decomposition, a verification factor is defined that compares the decomposed spectra with the corresponding truth spectra.
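One plausible form of such a factor (an illustrative assumption, since the exact definition is not reproduced here) compares the decomposed spectrum with the truth spectrum bin by bin and approaches 100% as the decomposition converges:

```latex
% Illustrative (assumed) verification factor for isotope i
V_i = \frac{1}{N_{\mathrm{bins}}}\sum_{j=1}^{N_{\mathrm{bins}}}
      \frac{\sigma_i(E_j)}{\sigma_i^{\mathrm{truth}}(E_j)} \times 100\% .
```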
In this study, we evaluate the influence of different epoch settings by conducting thousands of training processes and superposing their results in one plot, as shown in the bottom panel of Fig. 3, where the X-axis is the number of training cycles, the Y-axis is the verification factor, and the color represents the frequency of the training results. When the number of epochs reaches ∼1500, the verification factor stably converges to nearly 100%. Conservatively, the number of epochs is set to 2000.
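Putting the pieces together, the training stage with the hyperparameters quoted above (batch size 4, 2000 epochs) might look like the following sketch, where model, chi2_loss, and lr_callback refer to the earlier snippets and the input arrays are toy placeholders for the weekly coefficients and measured spectra:

```python
# Sketch of the training call with batch size 4 and 2000 epochs.
import numpy as np

n_weeks = 100                                             # toy number of weekly samples
coeffs = np.random.rand(n_weeks, 4, 1).astype("float32")  # weekly coefficients k^t_i
observed = np.random.rand(n_weeks, 24).astype("float32")  # measured weekly spectra (labels)

model.compile(optimizer="adam", loss=chi2_loss)
history = model.fit(coeffs, observed,
                    batch_size=4, epochs=2000,
                    callbacks=[lr_callback], verbose=0)
```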
After the hyperparameters have been determined, we complete the pre-training process and establish the entire CNN model. Maintaining the same configurations, we prepare to test the decomposition performance of the CNN with the experimental data. In this study, the simulation data are used instead.
Results of decomposition
Using the aforementioned hyperparameter configurations, the CNN decomposes the individual isotope spectra from both noiseless and noisy simulation datasets. In this study, we mainly examine the unbiasedness and uncertainties of the decomposition results by using the CNN method.
Using noiseless datasets, in which both the systematic uncertainties and the statistical fluctuations are ignored, the decomposition is performed 1000 times, and the extracted spectra are compared with the truth values to evaluate the bias and uncertainties. As shown in Fig. 4, the ratios of the mean values of the extracted spectra to the truth spectra are presented as data points; the deviations are less than 0.1%, which is negligible, so the decomposed isotope spectra can be regarded as unbiased. The tiny error bars represent the uncertainties introduced by the CNN model itself, obtained from the standard deviations of the ratios of the extracted spectra to the truth spectra.
[Figure 4 about here: ratios of the mean extracted spectra to the truth spectra for the noiseless datasets.]
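As a sketch of this evaluation (the array names are hypothetical: extracted holds the 1000 decomposed spectra of one isotope and truth its Huber-Mueller truth spectrum), the bias and the CNN-induced uncertainty per bin can be computed as:

```python
# Sketch of the bias/uncertainty evaluation over repeated decompositions.
import numpy as np

extracted = np.random.normal(loc=1.0, scale=1e-3, size=(1000, 24))  # toy decomposed spectra
truth = np.ones(24)                                                 # toy truth spectrum

ratios = extracted / truth             # per-run, per-bin ratio to the truth
bias = ratios.mean(axis=0) - 1.0       # deviation of the mean ratio from unity
cnn_uncertainty = ratios.std(axis=0)   # spread introduced by the CNN model
```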
When the noise effects are considered, the statistical fluctuations and the systematic uncertainties are assigned to the experimental measurements by applying Poisson fluctuations and the systematic uncertainties in Table 1, respectively. One thousand different noisy datasets are generated with these uncertainties, and the individual isotope spectra are extracted from each; the decomposition results vary under the noise. The mean values and standard deviations of the ensemble of decomposition results are shown in Fig. 5.
[Figure 5 about here: decomposition results for the noisy datasets (mean values and standard deviations).]
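A sketch of how such noisy datasets could be generated (assuming Poisson fluctuations of the weekly counts and a single correlated Gaussian variation of the fission fractions per dataset; variable names are illustrative) is:

```python
# Sketch of noisy-dataset generation: Poisson-fluctuate the weekly spectra and
# draw one correlated fission-fraction shift per dataset (time-correlated).
import numpy as np

rng = np.random.default_rng()

def make_noisy_dataset(expected_counts, fission_fractions, ff_cov):
    # expected_counts: (weeks x bins) expected weekly spectra
    # fission_fractions: (weeks x 4) nominal fission fractions
    # ff_cov: (4 x 4) covariance matrix of the 5% fission-fraction uncertainties
    noisy_counts = rng.poisson(expected_counts)
    shift = rng.multivariate_normal(np.zeros(4), ff_cov)
    noisy_fractions = fission_fractions * (1.0 + shift)
    return noisy_counts, noisy_fractions
```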
Because the 238U and 241Pu spectra are treated as prior knowledge, this study presents the decomposition results for 235U and 239Pu, whose fits are principally driven by the simulated experimental data. As shown in the bottom panel of Fig. 5, the decomposition results of both isotopes deviate from the truth spectra by less than 0.3% and are therefore practically unbiased. The decomposed 235U spectrum has a smaller uncertainty than the 239Pu spectrum, mainly because 235U is the primary contributor to the reactor antineutrino flux.
Conclusion and Discussion
In summary, we propose a machine learning approach to decompose the 235U and 239Pu isotope antineutrino spectra from the evolution data of a simulated reactor antineutrino experiment. The CNN decomposition method is applied to noiseless and noisy datasets that account for the main uncertainties of a reactor antineutrino experiment, and the validation tests show that the deviations of the decomposed spectra are less than 0.1% and 0.3%, respectively; thus, the spectra can be regarded as unbiased. The uncertainty introduced by the CNN method itself is less than 0.1%, and the statistical and systematic uncertainties can be evaluated using the Monte Carlo method.
The CNN decomposition method is also applicable to realistic commercial reactor antineutrino experiments, because the physical principles of antineutrino production and detection are the same.
Due to the various experimental operation times and baseline scales ranging from ∼10 m to ∼1000 m, the number of observed antineutrino events differs considerably from experiment to experiment.
In addition, the decomposition in this study is applied directly to the antineutrino spectrum. In realistic reactor antineutrino experiments, however, the energy spectrum of the prompt positron signal is measured, so the detector energy response would need to be taken into account in the analysis.
In the near future, very-short-baseline reactor antineutrino experiments are expected to measure the reactor antineutrino spectrum with higher precision and better energy resolution. The decomposition approach introduced and demonstrated in this paper could be applied in these experiments to provide up-to-date individual isotope antineutrino spectra.
References
Improved predictions of reactor antineutrino spectra. Phys. Rev. C 83, 054615 (2011). doi: 10.1103/PhysRevC.83.054615
Determination of antineutrino spectra from nuclear reactors. Phys. Rev. C 84, 024617 (2011). doi: 10.1103/PhysRevC.84.024617
Reactor antineutrino anomaly. Phys. Rev. D 83, 073006 (2011). doi: 10.1103/PhysRevD.83.073006
Improved measurements of the neutrino mixing angle θ13 with the Double Chooz detector. J. High Energy Phys. 2014, 86 (2014). doi: 10.1007/JHEP10(2014)086
(RENO Collaboration), New results from RENO and the 5 MeV excess. AIP Conf. Proc. 1666, 080002 (2015). doi: 10.1063/1.4915563
(Daya Bay Collaboration), Measurement of the reactor antineutrino flux and spectrum at Daya Bay. Phys. Rev. Lett. 116, 061801 (2016). doi: 10.1103/PhysRevLett.116.061801
(Daya Bay Collaboration), Evolution of the reactor antineutrino flux and spectrum at Daya Bay. Phys. Rev. Lett. 118, 251801 (2017). doi: 10.1103/PhysRevLett.118.251801
(PROSPECT Collaboration), Measurement of the antineutrino spectrum from 235U fission at HFIR with PROSPECT. Phys. Rev. Lett. 122, 251801 (2019). doi: 10.1103/PhysRevLett.122.251801
Updated summation model: An improved agreement with the Daya Bay antineutrino fluxes. Phys. Rev. Lett. 123, 022502 (2019). doi: 10.1103/PhysRevLett.123.022502
(Daya Bay Collaboration), Extraction of the 235U and 239Pu antineutrino spectra at Daya Bay. Phys. Rev. Lett. 123, 111801 (2019). doi: 10.1103/PhysRevLett.123.111801
Technical Meeting on Nuclear Data for Antineutrino Spectra and their Applications.
Report of the Topical Group on Neutrino Applications for Snowmass 2021. doi: 10.48550/arXiv.2209.07483
Nu tools: Exploring practical roles for neutrinos in nuclear energy and security. doi: 10.48550/arXiv.2112.12593
CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 10, 2470 (2021). doi: 10.3390/electronics10202470
MeshCNN-based BREP to CSG conversion algorithm for 3D CAD models and its application. Nucl. Sci. Tech. 33, 74 (2022). doi: 10.1007/s41365-022-01063-5
Study on analytical noise propagation in convolutional neural network methods used in computed tomography imaging. Nucl. Sci. Tech. 33, 77 (2022). doi: 10.1007/s41365-022-01057-3
A non-invasive diagnostic method of cavity detuning based on a convolutional neural network. Nucl. Sci. Tech. 33, 94 (2022). doi: 10.1007/s41365-022-01069-z
Weak lensing cosmology with convolutional neural networks on noisy data. Mon. Not. R. Astron. Soc. 490, 1843 (2019). doi: 10.1093/mnras/stz2610
(Daya Bay Collaboration), Improved measurement of the reactor antineutrino flux and spectrum at Daya Bay. Chin. Phys. C 41, 013002 (2017). doi: 10.1088/1674-1137/41/1/013002
Improved calculation of the energy release in neutron-induced fission. Phys. Rev. C 88, 014605 (2013). doi: 10.1103/PhysRevC.88.014605
Distributed data processing platform of national high energy physics data center. Frontiers of Data and Computing 4, 97 (2022). doi: 10.11871/jfdc.issn.2096-742X.2022.01.008 (in Chinese)
Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015). https://www.iclr.cc/archive/www/2015.html
An overview of gradient descent optimization algorithms. doi: 10.48550/arXiv.1609.04747
The authors declare that they have no competing interests.