Introduction
Nuclear mass is a fundamental quantity involved in many areas of nuclear science and engineering. Accurate masses are crucial not only for deriving nuclear shell information of great current interest but also for quantifying nuclear reaction processes [1-4]. Thus, much effort has been devoted over the past several decades to obtaining and improving nuclear mass values to meet the requirements of contemporary nuclear studies.
Nuclear researchers have been active in this field since the 1950s, and international cooperation established the well-known atomic mass evaluation (AME), which aims to provide a reliable public database built on direct measurements and empirical extrapolation [5-7]. The AME has achieved significant success, with over 3500 nuclei evaluated; nevertheless, a gap remains between the number of evaluated nuclei and the actual requirements of high-fidelity simulations of complex nuclear physics environments. Moreover, the uncertainties of the AME still deserve attention for further improvement.
Therefore, many theoretical calculations based on microscopic mean-field models [8-11] and macroscopic-microscopic models [12-21] have been developed to obtain global nuclear masses. The macroscopic-microscopic models start from the liquid drop model (LDM) and add correction energy terms based on a specific single-particle potential, which makes the calculation relatively simple compared with microscopic mean-field models. Meanwhile, the macroscopic-microscopic models show better performance for the global nuclear mass [5, 22]; therefore, they are normally considered more applicable in practical evaluations.
In the scheme of macroscopic-microscopic models, the theoretical determination of the shell correction energy from the single-particle potential is complicated [23]. Normally, in practical calculations, smoothing methods are needed to treat the single-particle spectrum, which influences the final results [22, 24]. To simplify this problem, a so-called "simple nuclear mass formula" was proposed in Ref. [25], where linear polynomial functions replace the residual correction energy; the global root mean square (RMS) deviation reaches 0.266 MeV, compared with 2.456 MeV for the original LDM. Artificial neural networks (ANNs) have proven to be excellent tools in many research areas [24]. They appear to be a better choice here than simple mathematical functions because of their powerful capability to deal with complex problems.
The application of neural networks to predicting nuclear masses can be traced back to the 1990s [5, 26]. The input layer of such networks is designed around basic nuclear properties, such as the proton and neutron numbers Z and N of the target nucleus and the corresponding nearest magic numbers Z0 and N0, and the application of ANNs to mass model calculations has been validated by many excellent results. Most previous ANN nuclear mass studies were constructed with the single-task-learning (STL) technique at the output layer, where the fluctuation of the nuclear binding energy (δLDM) was the only task guiding the entire training and testing procedure. Consequently, a better global RMS is obtained via STL; for example, the RMS was reduced to 0.235 MeV in Ref. [24]. Other nuclear properties, such as the single proton separation energy (Sp), single neutron separation energy (Sn), nuclear charge radii, and β-decay half-lives, have also been studied with various artificial intelligence (AI) tools [27-31]. In most of the above-mentioned studies, the AI methods are trained to learn a single nuclear property. These properties are naturally correlated; therefore, novel AI methods that can study more than one property simultaneously should be developed. Accordingly, we attempted to involve more tasks in a deep neural network and further reduce the global RMS.
In this study, an improved multi-task-learning (MTL) technique was created to integrate more crucial nuclear physics knowledge into the neural network. A total of 2095 nuclei with fully evaluated nuclear properties in the AME were adopted in the new MTL-ANN. To include more tasks, Sn and Sp are added to provide information on the nuclear shell. In the input layer, we adopt neurons representing the proton number, the mass number, and the numbers of residual particles or holes relative to the closest magic shell for protons and for neutrons, as applied in our previous study [24]. Moreover, we expand the input layer by adding a pairing term with the expression of Ref. [18].
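Under these definitions, the five input features can be sketched as follows. This is a minimal illustration: the pairing expression of Ref. [18] is not reproduced here, and the simple (-1)^Z + (-1)^N form is an illustrative stand-in.

```python
import numpy as np

MAGIC = np.array([8, 20, 28, 50, 82, 126])  # conventional magic numbers

def features(Z, N):
    """Five input features for the network: Z, A, the distances of Z and N
    from their nearest magic numbers, and a simple pairing indicator.
    The pairing term here is a stand-in, not the expression of Ref. [18]."""
    A = Z + N
    nu_p = np.min(np.abs(MAGIC - Z))   # particles/holes to the closest proton shell
    nu_n = np.min(np.abs(MAGIC - N))   # same for neutrons
    pairing = (-1) ** Z + (-1) ** N    # +2 even-even, -2 odd-odd, 0 otherwise
    return np.array([Z, A, nu_p, nu_n, pairing], dtype=float)
```

For example, the doubly magic nucleus (Z = 50, N = 82) yields zero shell distances and a pairing indicator of +2.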
The remainder of this paper is organized as follows. In Sect. 2, a general LDM formula is first introduced; the formalism and structure of the applied neural network with the MTL technique, called MTL-ANN, are then described, and a new mass method incorporating MTL-ANN is proposed. The results of the global analysis of the binding energies of 2095 nuclei with the novel MTL-ANN are presented in Sect. 3, together with detailed discussions of the MTL-ANN parameters and optimization procedures. Finally, a summary of this study is provided in Sect. 4.
Macroscopic-microscopic model with Multi-Task-Learning technique
In the macroscopic-microscopic model scheme, the binding energy E(Z, A) of a nucleus with mass number A and proton number Z can be written as the sum of the macroscopic LDM binding energy ELDM(Z, A) and a fluctuating part δLDM(Z, A) [32]:

E(Z, A) = ELDM(Z, A) + δLDM(Z, A).
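As an illustration of this decomposition, the following sketch evaluates δLDM as the difference between an experimental binding energy and a textbook Bethe-Weizsäcker LDM. The coefficients shown are common literature values, not the parameters fitted in this work.

```python
def e_ldm(Z, A):
    """Liquid-drop (Bethe-Weizsacker) binding energy in MeV.
    Coefficients are textbook values, not the LDM fit used in the paper."""
    N = A - Z
    a_v, a_s, a_c, a_a, a_p = 15.75, 17.8, 0.711, 23.7, 11.18
    if Z % 2 == 0 and N % 2 == 0:          # even-even: pairing bonus
        delta = a_p / A ** 0.5
    elif Z % 2 == 1 and N % 2 == 1:        # odd-odd: pairing penalty
        delta = -a_p / A ** 0.5
    else:
        delta = 0.0
    return (a_v * A - a_s * A ** (2.0 / 3.0)
            - a_c * Z * (Z - 1) / A ** (1.0 / 3.0)
            - a_a * (N - Z) ** 2 / A + delta)

def delta_ldm(Z, A, E_exp):
    """Fluctuating part delta_LDM(Z, A) = E_exp(Z, A) - E_LDM(Z, A),
    the quantity the MTL-ANN is trained to reproduce."""
    return E_exp - e_ldm(Z, A)
```

With these coefficients, e_ldm(28, 56) lies within a few MeV of the measured binding energy of 56Ni, and the residual is what the network learns.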
Correspondingly, the fluctuating part of the binding energy (δLDM) is a necessary compensation in the macroscopic-microscopic model. Multitask learning has a strong ability to improve generalization by using the domain information contained in the training signals of related tasks, and the knowledge learned for one task can help other tasks be learned better [33]. Therefore, a novel MTL artificial neural network (MTL-ANN) was designed in this study to mimic δLDM more accurately using additional related nuclear properties.
The MTL-ANN is a feedforward neural network. Its structure, as used in this study, is illustrated in Fig. 1; the architecture consists of three layers: input, hidden, and output.
[Fig. 1: Structure of the MTL-ANN, with five features in the input layer]
As shown in Fig. 1, two hidden layers were defined to share information adequately, with 20 neurons in each hidden layer. The multitask outputs are obtained through training iterations that pass information back and forth between the hidden and output layers. Generally, the input vector is denoted x = (x1, x2, …, xn) and the output vector y = (y1, y2, …, ym), where n and m denote the total numbers of inputs and tasks, respectively. The l-th task yl can be written as

yl = f(Σj wlj^(3) hj^(2) + bl^(3)), with h^(2) = f(W^(2) h^(1) + b^(2)) and h^(1) = f(W^(1) x + b^(1)),

where the h^(k) are the hidden-layer activations, the W^(k) and b^(k) are the trainable weights and biases, and f is the activation function.
In addition, it should be noted that a hard-sharing approach is employed for all network neurons to build full connections between the input, hidden, and output layers, which is believed to incorporate more nuclear mass physics into the training process and to efficiently avoid overfitting [34]. Moreover, the limited-memory BFGS (L-BFGS), a quasi-Newton method suited to training with large numbers of parameters [35], is used in the backpropagation learning procedure to efficiently minimize the loss.
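The shared-trunk architecture and L-BFGS training described above can be approximated with off-the-shelf tools. The following sketch uses scikit-learn's MLPRegressor (an assumption for illustration, not the authors' implementation) with two 20-neuron hidden layers shared by three output tasks, and synthetic data standing in for the AME-derived training set.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hard parameter sharing: one network whose two 20-neuron hidden layers are
# shared by all output tasks (delta_LDM, Sn, Sp), trained with L-BFGS.
# The data below are synthetic stand-ins for the real AME-derived features.
rng = np.random.default_rng(0)
X = rng.uniform(8, 120, size=(200, 5))        # five input features per nucleus
Y = np.column_stack([X.sum(axis=1) * 0.01,    # task 1: stand-in for delta_LDM
                     X[:, 0] * 0.02,          # task 2: stand-in for Sn
                     X[:, 1] * 0.015])        # task 3: stand-in for Sp

net = MLPRegressor(hidden_layer_sizes=(20, 20), solver="lbfgs",
                   max_iter=500, random_state=0)
net.fit(X, Y)                  # multi-output fit: all hidden weights are shared
pred = net.predict(X)          # shape (200, 3): one column per task
```

Because the hidden weights are common to all outputs, each task's error signal updates the same shared representation, which is exactly the hard-sharing idea.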
Results and Discussion
The newly designed MTL-ANN for the nuclear mass was used to analyze 2095 nuclei (Z ≥ 8, N ≥ 8) from the AME2020 database. In our calculation, the 2095 nuclei were divided into three datasets for training, validation, and testing. All data were sampled from a uniform distribution. In practice, 95 nuclei were first sampled from the data pool as a test set that did not participate in network training. Subsequently, 1400 training nuclei were sampled stochastically from the remaining 2000 nuclei, and the remaining 600 nuclei were used for validation.
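The sampling scheme described above can be sketched as follows; the seed and shuffling routine are illustrative assumptions, since the paper specifies only the set sizes and uniform sampling.

```python
import numpy as np

rng = np.random.default_rng(42)      # seed chosen arbitrarily for reproducibility
idx = rng.permutation(2095)          # uniform random ordering of the 2095 nuclei

test_idx = idx[:95]                  # 95 nuclei held out, never seen in training
train_idx = idx[95:95 + 1400]        # 1400 nuclei for training
valid_idx = idx[95 + 1400:]          # remaining 600 nuclei for validation

assert len(test_idx) == 95 and len(train_idx) == 1400 and len(valid_idx) == 600
```

A single permutation guarantees the three sets are disjoint and together cover all 2095 nuclei.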
In our training process, the loss normally reaches a stable value after several hundred iterations. The convergence of the training is illustrated in Fig. 2. The training loss reached a minimum after 200 iterations, and the validation loss decreased at a similar rate. The stability of these two curves supports the soundness of the network.
[Fig. 2: Convergence of the training and validation loss]
To examine the validity of the proposed model, four types of networks were designed according to the tasks in the output layer. The RMS deviations of the binding energy (EMTL), neutron separation energy (Sn), and proton separation energy (Sp) in the training, validation, and testing processes are listed in Table 1. The RMS is calculated as

σRMS = [ (1/N) Σi (xi,calc − xi,exp)^2 ]^(1/2),

where N is the number of nuclei considered and x stands for the quantity being evaluated.
Table 1. RMS deviations of EMTL, Sn, and Sp for the four task groups (* marks a quantity that was not a training task for that network).

Training and Validation

| Task group | Tasks | RMS of EMTL (MeV) | RMS of Sn (MeV) | RMS of Sp (MeV) |
|---|---|---|---|---|
| TYPE 1 | TB | 0.23 | 0.28* | 0.32* |
| TYPE 2 | TB and TSn | 0.21 | 0.21 | 0.24* |
| TYPE 3 | TB and TSp | 0.21 | 0.23* | 0.23 |
| TYPE 4 | TB, TSn, and TSp | 0.22 | 0.22 | 0.24 |

Testing

| Task group | Tasks | RMS of EMTL (MeV) | RMS of Sn (MeV) | RMS of Sp (MeV) |
|---|---|---|---|---|
| TYPE 1 | TB | 0.31 | 0.34* | 0.33* |
| TYPE 2 | TB and TSn | 0.23 | 0.20 | 0.24* |
| TYPE 3 | TB and TSp | 0.21 | 0.22* | 0.23 |
| TYPE 4 | TB, TSn, and TSp | 0.25 | 0.23 | 0.23 |
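The RMS deviation reported in Table 1 can be computed as in the following minimal sketch:

```python
import numpy as np

def rms(calc, exp):
    """Root-mean-square deviation between calculated and experimental values,
    sigma_RMS = sqrt( (1/N) * sum_i (x_calc,i - x_exp,i)^2 )."""
    calc = np.asarray(calc, dtype=float)
    exp = np.asarray(exp, dtype=float)
    return float(np.sqrt(np.mean((calc - exp) ** 2)))
```

For instance, rms applied to identical arrays returns 0, and larger systematic deviations increase it monotonically.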
Compared with the simple LDM model, the RMS deviation of the binding energy between calculation and experiment is reduced sharply, from 2.4027 MeV to the current 0.20-0.24 MeV. Moreover, the multi-task networks (TYPE 2: TB and TSn; TYPE 3: TB and TSp; TYPE 4: TB, TSn, and TSp) all outperform the single-task network TYPE 1 (TB only), which demonstrates the stronger capability of the MTL approach to improve the mass model; TYPE 3 obtains the best RMS for the binding energy EMTL not only in training and validation but also in testing. However, adding more tasks can also worsen the learning performance: in TYPE 4, the RMS values of the network with tasks TB, TSn, and TSp are even larger than those of TYPE 2 and TYPE 3. This phenomenon is called "negative transfer" in neural network training and may be caused by internal contradictions among the experimental information of TSn, TSp, and TB.
The prediction power of the MTL model is also verified by the testing results in Table 1. For the randomly selected 95 nuclei, the predictions of the multitask networks performed similarly to training and validation. Moreover, when we repeated the experiment with different 95-nucleus test sets, the changes in Table 1 were negligible. In conclusion, TYPE 3 improves the current mass model more than the other variants.
To investigate further, we compare Sn and Sp with the related experimental data in Figs. 3-10 for the selected nuclear chains Z = 8, 22, 61, 84 and N = 8, 22, 61, 84; the absolute deviations δSn and δSp between the calculations and experimental data are plotted for each nucleus of interest. These figures show that the model description of Sn and Sp is satisfactory for each nucleus, and they confirm that all current MTL networks describe the nuclear mass better than STL. The four types of MTL fit the experimental data almost equally well, although the global RMS values of EMTL, Sn, and Sp in testing show that the TYPE 3 task group (TB and TSp) is the best choice.
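The separation energies compared in these figures follow the standard definitions as binding-energy differences; the sketch below assumes B is simply a table mapping (Z, N) to binding energies in MeV.

```python
def sn(B, Z, N):
    """Single-neutron separation energy Sn(Z, N) = B(Z, N) - B(Z, N-1),
    with B a dict mapping (Z, N) to the binding energy in MeV."""
    return B[(Z, N)] - B[(Z, N - 1)]

def sp(B, Z, N):
    """Single-proton separation energy Sp(Z, N) = B(Z, N) - B(Z-1, N)."""
    return B[(Z, N)] - B[(Z - 1, N)]
```

Because Sn and Sp are finite differences of the binding energy, they probe the same shell structure that the MTL tasks share, which is why they are natural auxiliary tasks.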
[Figs. 3-10: Calculated versus experimental Sn and Sp, and the deviations δSn and δSp, for the isotopic and isotonic chains Z = 8, 22, 61, 84 and N = 8, 22, 61, 84]
Across the nuclear mass regions, it can be observed that the prediction ability improves significantly with increasing Z and N. The absolute values of δSn and δSp vary from approximately 1.0 MeV for the very light nuclei (Z = 8, N = 8) to approximately 0.2 MeV for the heavy nuclei, illustrating the clearly better fits in the heavier mass region.
In addition, the current network predictions are significantly influenced by the status of the data reported in AME2020. For example, in the case of Z = 61, unusually large fluctuations occur within the N = 90-93 range because the pattern of the experimental data there visibly deviates from that of the neighboring nuclei. We also investigated the reported errors in this mass region. The current MTL predictions fall well outside the experimental error band: the data for Z = 61 and N = 90-93 are recommended in AME2020 as 5.604 ± 0.02 MeV, 7.860 ± 0.02 MeV, 5.939 ± 0.03 MeV, and 7.465 ± 0.03 MeV, whereas the deviations of our MTL-based predictions all reach approximately 0.4 MeV, as shown in Fig. 5. These large inconsistencies between the experimental data and the model predictions deserve further attention to examine the correctness of both the measured points and our models.
Conclusion
In summary, a newly designed MTL-ANN method was introduced into the global macroscopic-microscopic mass model. This method has been shown to increase the accuracy of mass models and to effectively reduce the risk of network overfitting.
Five essential nuclear properties related to the neutron number, mass number, nearest magic numbers, and pairing were adopted as inputs to the network.
All of these results verify the impressive predictive capability of the MTL-ANN mass model within the known nuclear region. Moreover, the model can provide important hints for examining the correctness of experimental data available in the future.
References

Improved macroscopic-microscopic mass formula. Chinese Physics C 45(7), 074108 (2021). doi: 10.1088/1674-1137/abfaf2
Density dependence of the symmetry energy probed by β-decay energies of odd-A nuclei. Phys. Rev. C 88, 014302 (2013). doi: 10.1103/PhysRevC.88.014302
A new view of nuclear shells. Phys. Scr. T152, 014002 (2013). doi: 10.1088/0031-8949/2013/t152/014002
X-ray binaries. Nucl. Phys. A 777, 601-622 (2006). Special Issue on Nuclear Astrophysics. doi: 10.1016/j.nuclphysa.2005.05.200
Nuclear mass predictions based on Bayesian neural network approach with pairing and shell effects. Phys. Lett. B 778, 48-53 (2018). doi: 10.1016/j.physletb.2018.01.002
The AME 2020 atomic mass evaluation (I). Chinese Physics C 45(3), 030002 (2021). doi: 10.1088/1674-1137/abddb0
The AME 2020 atomic mass evaluation (II). Chinese Physics C 45(3), 030003 (2021). doi: 10.1088/1674-1137/abddaf
Further explorations of Skyrme-Hartree-Fock-Bogoliubov mass formulas. XIII. The 2012 atomic mass evaluation and the symmetry coefficient. Phys. Rev. C 88, 024308 (2013). doi: 10.1103/PhysRevC.88.024308
Further explorations of Skyrme-Hartree-Fock-Bogoliubov mass formulas. IX. Constraint of pairing force to 1S0 neutron-matter gap. Nucl. Phys. A 812(1), 72-98 (2008). doi: 10.1016/j.nuclphysa.2008.08.015
New parametrization for the nuclear covariant energy density functional with a point-coupling interaction. Phys. Rev. C 82, 054319 (2010). doi: 10.1103/PhysRevC.82.054319
Ground-state and pairing properties of Pr isotopes in relativistic mean-field theory. Phys. Rev. C 65, 064305 (2002). doi: 10.1103/PhysRevC.65.064305
Deformation and shell effects in nuclear mass formulas. Nucl. Phys. A 874, 81-97 (2012). doi: 10.1016/j.nuclphysa.2011.11.005
Modification of nuclear mass formula by considering isospin effects. Phys. Rev. C 81, 044322 (2010). doi: 10.1103/PhysRevC.81.044322
Mirror nuclei constraint in nuclear mass formula. Phys. Rev. C 82, 044304 (2010). doi: 10.1103/PhysRevC.82.044304
New finite-range droplet mass model and equation-of-state parameters. Phys. Rev. Lett. 108, 052501 (2012). doi: 10.1103/PhysRevLett.108.052501
Nuclear ground-state masses and deformations: FRDM(2012). At. Data Nucl. Data Tables 109-110, 1-204 (2016). doi: 10.1016/j.adt.2015.10.002
Microscopic mass formulas. Phys. Rev. C 52, R23-R27 (1995). doi: 10.1103/PhysRevC.52.R23
The anatomy of the simplest Duflo-Zuker mass formula. Nucl. Phys. A 843(1), 14-36 (2010). doi: 10.1016/j.nuclphysa.2010.05.055
Macro-microscopic mass formulae and nuclear mass predictions. Nucl. Phys. A 847(1), 24-41 (2010). doi: 10.1016/j.nuclphysa.2010.06.014
Coefficients of different macro-microscopic mass formulae from the AME2012 atomic mass evaluation. Nucl. Phys. A 917, 1-14 (2013). doi: 10.1016/j.nuclphysa.2013.09.003
Nuclear properties according to the Thomas-Fermi model. Nucl. Phys. A 601(2), 141-167 (1996). doi: 10.1016/0375-9474(95)00509-9
Theoretical studies of nuclear mass formula and nuclear spontaneous fission.
Shell effects in nuclear masses and deformation energies. Nucl. Phys. A 95(2), 420-442 (1967). doi: 10.1016/0375-9474(67)90510-6
Performance of the Levenberg-Marquardt neural network approach in nuclear mass prediction. J. Phys. G: Nucl. Part. Phys. 44(4), 045110.
Simple nuclear mass formula. Phys. Rev. C 90, 064306.
Learning and prediction of nuclear stability by neural networks. Nucl. Phys. A 540 (1992). doi: 10.1016/0375-9474(92)90191-L
Machine learning the nuclear mass. Nucl. Sci. Tech. 32, 109 (2021). doi: 10.1007/s41365-021-00956-1
Nuclear mass predictions for the crustal composition of neutron stars: A Bayesian neural network approach. Phys. Rev. C 93, 014311 (2016). doi: 10.1103/PhysRevC.93.014311
Refining mass formulas for astrophysical applications: A Bayesian neural network approach. Phys. Rev. C 96, 044308 (2017). doi: 10.1103/PhysRevC.96.044308
Nuclear charge radii: Density functional theory meets Bayesian neural networks. J. Phys. G 43, 114002 (2016). doi: 10.1088/0954-3899/43/11/114002
Predictions of nuclear β-decay half-lives with machine learning and their impact on r-process nucleosynthesis. Phys. Rev. C 99, 064307 (2019). doi: 10.1103/PhysRevC.99.064307
An improved nuclear mass formula with a unified prescription for the shell and pairing corrections. Nucl. Phys. A 929, 38-53 (2014). doi: 10.1016/j.nuclphysa.2014.05.019
Multitask learning. Machine Learning 28, 41-75 (1997). doi: 10.1023/A:1007379606734
Updating quasi-Newton matrices with limited storage. Math. Comput. 35, 773-782 (1980). doi: 10.1090/S0025-5718-1980-0572855-7