
NUCLEAR ELECTRONICS AND INSTRUMENTATION

Estimation of Gaussian overlapping nuclear pulse parameters based on a deep learning LSTM model

Xing-Ke Ma
Hong-Quan Huang
Qian-Cheng Wang
Jing Zhao
Fei Yang
Kai-Ming Jiang
Wei-Cheng Ding
Wei Zhou
Nuclear Science and Techniques, Vol. 30, No. 11, Article number 171. Published in print: 01 Nov 2019. Available online: 29 Oct 2019.

A long short-term memory (LSTM) neural network has excellent learning ability for the time series of nuclear pulse signals. It can accurately estimate parameters such as amplitude and time in digitally shaped nuclear pulse signals, especially signals from overlapping pulses. By learning the mapping between Gaussian overlapping pulses after digital shaping and the exponential pulses before shaping, the shaping parameters of overlapping exponential nuclear pulses can be estimated with an LSTM model. First, the Gaussian overlapping nuclear pulses (ONP) whose parameters are to be estimated were produced by superposing multiple exponential nuclear pulses and applying Gaussian digital shaping. Second, a dataset containing multiple samples was produced, each sample consisting of the sequence of sampled values of a Gaussian ONP after digital shaping together with the set of shaping parameters of the exponential pulses before shaping. Third, the Training Set of the dataset was used to train the LSTM model: the sampled values of the Gaussian ONP served as the input data of the LSTM model, and the pulse parameters estimated by the current model were computed by forward propagation. Next, a loss function was used to calculate the loss value between the network-estimated pulse parameters and the true pulse parameters. A gradient-based optimization algorithm then fed the loss value and the gradient of the loss function back to the neural network to update the weights of the LSTM model, thereby training the network. Finally, the sampled values of a Gaussian ONP whose shaping parameters needed to be estimated were used as input data for the trained LSTM model, which then output the required set of nuclear pulse parameters. Experimental results showed that the proposed method overcomes the local-convergence defect of traditional methods and can accurately extract parameters from multiple, severely overlapping Gaussian pulses, achieving optimal estimation of nuclear pulse parameters in the global sense. These results support the conclusion that this is a good method for estimating nuclear pulse parameters.

Keywords: Nuclear pulses; S-K digital shaping; Deep learning; LSTM

1. Introduction

The parameter estimation problem associated with nuclear pulse signals is an important aspect of radioactivity measurement. Goulding [1], Gerardi [2], Noulis [3] et al. pointed out that Gaussian pulse signals perform well in improving the signal-to-noise ratio and the energy resolution. The Gaussian pulse is therefore usually taken as the target shape for the nuclear pulse signal; however, a true Gaussian pulse has an anti-causal part, making it difficult to realize in analog systems [4]. With the development of digital signal processing methods and technologies, digital nuclear pulse shaping has become widely used. Chen Shi-Guo et al. [5] proposed a recursive implementation of Gaussian pulse shaping based on wavelet analysis, Kai-Ming Jiang et al. [6] proposed a pulse parameter extraction method based on Sallen–Key (S–K) digital shaping and a population technique, and Hong-Quan Huang et al. [7] used a genetic algorithm to estimate the parameters of S–K Gaussian-shaped overlapping pulse signals.

However, pulse signals have complex and varied characteristics, so very large amounts of data are required for pulse parameter extraction. In addition, because traditional search algorithms rely on simple mathematical models, their extraction efficiency and accuracy fall rapidly once the number of overlapping pulses increases or the overlap deepens.

In recent years, deep learning technology has developed continuously [8]. Deep networks contain hidden layers with many nonlinear transformation structures, so their ability to fit complex models improves as more data are used in training [9-12]. As part of this development, the recurrent neural network (RNN) has proven effective for time series problems [13-15]. Unfortunately, an RNN may suffer from vanishing or exploding gradients during training. This can be solved by the long short-term memory (LSTM) neural network, which replaces each hidden unit of the RNN with a memory cell controlled by three gate structures. Following the principles of deep learning, a multi-layer LSTM model composed of several stacked LSTM layers can map abstract features of the data into higher-dimensional network layers, giving it more powerful learning and expressive abilities for nonlinear sequences [16-18].

At present, research on introducing deep learning into nuclear pulse parameter extraction is still at a preliminary stage, so there is a pressing need to bring this new technology into the field. In this paper, continuous pulse signals were discretized, and classical S-K Gaussian shaping was applied to the discrete exponential pulses to produce a dataset with the characteristics of a time series. Then, based on the characteristics of the shaped Gaussian overlapping pulses, an efficient and stable LSTM model for parameter extraction of Gaussian overlapping nuclear pulses (ONP) was explored.

Our experimental results showed that the proposed method could effectively overcome the difficulty of extracting parameters from noise-containing overlapping pulses. The extracted parameters have shown high precision and demonstrate the good performance of the proposed method for estimating pulse parameters.

2. Principles and Algorithms

2.1 A shaping model for nuclear pulses

The transmission of nuclear signals through the detection channel is characterized by pulses of exponential, double exponential, triangular, step, trapezoidal, or Gaussian form. The following sections take the extraction of overlapping pulse parameters after S-K Gaussian shaping as an example; the input to the S-K shaper was a superposition of multiple exponential pulses.

2.1.1 Superposition model of multiple exponential pulses

For overlapping pulses formed by the superposition of N exponentially decaying nuclear pulses, the mathematical model is as follows:

$V_e(t)=\sum_{i=1}^{N}\left[u(t-T_i)\,A_i\,e^{-(t-T_i)/\tau}\right]+v(t).$ (1)

In Eq. (1), $u(t)$ represents the step signal, $A_i$ is the amplitude coefficient of the ith nuclear pulse, $T_i$ represents the occurrence time of the ith nuclear pulse, $\tau$ is the time constant, and $v(t)$ represents noise. Discretization is performed with the sampling period $T_s$, and the discretized exponential pulse is described by Eq. (2):

$V_e(mT_s)=\sum_{i=1}^{N}\left[u(mT_s-T_i)\,A_i\,e^{-(mT_s-T_i)/\tau}\right]+v(mT_s).$ (2)
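As a concrete illustration, the following minimal NumPy sketch samples Eq. (2) for given amplitudes, occurrence times, and decay constant. It is an assumption of this rewrite rather than code from the original study; the helper name overlapping_exp_pulses and the convention that the times $T_i$ share the units of $T_s$ are hypothetical.

```python
import numpy as np

def overlapping_exp_pulses(amps, times, tau, Ts, M, noise_sigma=0.0, rng=None):
    """Sample Eq. (2): N superposed exponential pulses plus white noise.

    amps, times : A_i (Counts) and T_i, with T_i in the same units as Ts
    tau, Ts     : decay time constant and sampling period
    M           : number of sampled values
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, M + 1) * Ts                      # sampling instants m*Ts
    Ve = np.zeros(M)
    for A, T in zip(amps, times):
        dt = np.maximum(t - T, 0.0)                   # clamp so exp() never overflows
        Ve += (t >= T) * A * np.exp(-dt / tau)        # u(t - T_i) * A_i * e^{-(t-T_i)/tau}
    return Ve + rng.normal(0.0, noise_sigma, M)       # + v(m*Ts)
```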
2.1.2 S-K Gaussian shaping

S–K is a common Gaussian shaping circuit. Assume that the resistance of the circuit is R and the capacitance is C; the shaping parameter is then $K=RC/T_s$, and the digital Gaussian shaping of exponentially overlapping nuclear pulses can be described by Eq. (3):

$V_o(mT_s)=\dfrac{(K+2K^2)\,V_o((m-1)T_s)-K^2\,V_o((m-2)T_s)+2V_e(mT_s)}{1+K+K^2}.$ (3)

From the above, Eq. (4) can be obtained:

$V_o(mT_s)=\dfrac{(K+2K^2)\,V_o((m-1)T_s)-K^2\,V_o((m-2)T_s)+2\sum_{i=1}^{N}\left[u(mT_s-T_i)\,A_i\,e^{-(mT_s-T_i)/\tau}\right]+2v(mT_s)}{1+K+K^2}$ (4)

In this formula, $V_o(mT_s)$ is the signal of the overlapping pulses after S-K digital Gaussian shaping, and $v(mT_s)$ is noise. The index m runs over the sampled values, of which there are M in total: m = 1, 2, 3, ..., M.
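The recursion of Eq. (3) can be transcribed directly; the sketch below is a hypothetical helper (not the authors' code) that assumes zero initial conditions, $V_o(-T_s)=V_o(0)=0$.

```python
def sk_gaussian_shape(Ve, K):
    """Apply the recursive S-K digital Gaussian shaper of Eq. (3)."""
    M = len(Ve)
    Vo = np.zeros(M)
    denom = 1.0 + K + K**2
    for m in range(M):
        vo1 = Vo[m - 1] if m >= 1 else 0.0            # V_o((m-1)Ts)
        vo2 = Vo[m - 2] if m >= 2 else 0.0            # V_o((m-2)Ts)
        Vo[m] = ((K + 2 * K**2) * vo1 - K**2 * vo2 + 2.0 * Ve[m]) / denom
    return Vo
```

With K = RC/Ts = 50, as in the examples of Sect. 3, this recursion turns each exponential input into a near-Gaussian output.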

2.2 Theories and techniques relevant to the LSTM model

The LSTM model includes forward propagation, back propagation through time (BPTT), and the Adam parameter optimization algorithm [19,20]. For a given sequence, a standard LSTM model is applied, and hidden layer and output sequences can be iterated through structures such as a forget gate, input gate, candidate information gate, and output gate. The mathematical models for these structures are shown in Eqs. (5)–(10):

$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)$, (5)
$g_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)$, (6)
$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)$, (7)
$C_t=f_t*C_{t-1}+g_t*\tilde{C}_t$, (8)
$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)$, (9)
$h_t=o_t*\tanh(C_t)$, (10)

where $g_t$, $f_t$, $C_t$, and $o_t$ represent the input gate, forget gate, cell state, and output gate at the current time, respectively; $h_t$ is the hidden state information, and $\tilde{C}_t$ represents the candidate information vector. $C_{t-1}$ represents the cell state at the previous moment, and $h_{t-1}$ is the hidden state information of the previous moment. W and b represent the weights and biases of the different gate functions, respectively.
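To make the gate algebra explicit, one time step of Eqs. (5)-(10) can be written out as below. This NumPy sketch is illustrative only; the dictionaries W and b of per-gate weight matrices and biases acting on the concatenation $[h_{t-1}, x_t]$ are an assumed representation, not part of the original paper.

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM time step, Eqs. (5)-(10)."""
    z = np.concatenate([h_prev, x_t])                 # [h_{t-1}, x_t]
    f_t = sigmoid(W['f'] @ z + b['f'])                # forget gate, Eq. (5)
    g_t = sigmoid(W['i'] @ z + b['i'])                # input gate, Eq. (6)
    C_tilde = np.tanh(W['C'] @ z + b['C'])            # candidate vector, Eq. (7)
    C_t = f_t * C_prev + g_t * C_tilde                # cell state update, Eq. (8)
    o_t = sigmoid(W['o'] @ z + b['o'])                # output gate, Eq. (9)
    h_t = o_t * np.tanh(C_t)                          # hidden state, Eq. (10)
    return h_t, C_t
```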

The LSTM training process uses the BPTT algorithm, which consists of four main steps. First, the output value of each LSTM memory cell is calculated by forward propagation; secondly, the error term of the memory cell is calculated backwards; then, the gradient of each weight is calculated according to the error term, before, finally, the gradient-based optimization algorithm is applied to update the weight. Common gradient optimization algorithms include the stochastic gradient descent (SGD) [21], adaptive gradient (AdaGrad) [22], root mean square propagation (RMSProp) [23], and adaptive momentum estimation (Adam) [24] algorithms.

2.3 Parameter estimation for overlapping pulses

Overlapping pulse parameter estimation, after Gaussian shaping, mainly involves production of the data set, forward and back propagation training, based on the BPTT algorithm, and preservation of the model after training completion.

2.3.1 Data set production

A data set with n samples is constructed. The matrix representation of the data set is as follows:

$\begin{bmatrix} [V_o(T_s)]_1 & [V_o(2T_s)]_1 & \cdots & [V_o(MT_s)]_1 & \theta_1 \\ [V_o(T_s)]_2 & [V_o(2T_s)]_2 & \cdots & [V_o(MT_s)]_2 & \theta_2 \\ \vdots & \vdots & & \vdots & \vdots \\ [V_o(T_s)]_n & [V_o(2T_s)]_n & \cdots & [V_o(MT_s)]_n & \theta_n \end{bmatrix}$ (11)

Each row in Eq. (11) represents the data of one sample; the first M entries are the sampled values of $V_o(mT_s)$, the signal of the overlapping pulses after S–K digital Gaussian shaping for that sample, which is therefore also called the Gaussian overlapping pulse signal. The parameters of the input signal $V_e(mT_s)$ are $A_i$ $(i=1,2,\ldots,N)$, $T_i$ $(i=1,2,\ldots,N)$, $K$, and $\tau$; together they constitute the parameter set of the sample, $\theta_i=[A_1,A_2,\ldots,A_N,T_1,T_2,\ldots,T_N,\tau,K]$. For example, the parameter set of the ith sample is $\theta_i$, and the sampled values of its Gaussian ONP are $[V_o(T_s)]_i$, $[V_o(2T_s)]_i$, $[V_o(3T_s)]_i$, ..., $[V_o(MT_s)]_i$.
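A sketch of how such a data set could be generated with the helpers above is given below; make_dataset, the parameter ranges, and the float32 layout are assumptions of this rewrite, not the authors' code.

```python
def make_dataset(n, N, tau, Ts, K, M, noise_sigma, amp_range, time_range, rng):
    """Build the n rows of Eq. (11): M shaped samples [Vo(mTs)]_i plus theta_i."""
    X = np.zeros((n, M), dtype=np.float32)
    Y = np.zeros((n, 2 * N + 2), dtype=np.float32)    # theta_i = [A_1..A_N, T_1..T_N, tau, K]
    for i in range(n):
        amps = rng.uniform(*amp_range, size=N)
        times = np.sort(rng.uniform(*time_range, size=N))
        Ve = overlapping_exp_pulses(amps, times, tau, Ts, M, noise_sigma, rng)
        X[i] = sk_gaussian_shape(Ve, K)               # Gaussian ONP sampled values
        Y[i] = np.concatenate([amps, times, [tau, K]])
    return X, Y
```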

Next, the data set was divided into a Training Set, a Test Set, and a Validation Set, according to a certain ratio. The Training Set was used to train the LSTM model, the Test Set was used to verify the generalization ability of the model after training was complete, and the Validation Set was used to check whether the trained model suffered from over-fitting. In an over-fitting situation, the loss value on the training data is small and the prediction accuracy there is high, but the loss value on the validation data is large and the prediction accuracy there is low, which means the trained model has lost the ability to generalize.

Traditional machine learning models have often used L1 and L2 regularization to modify the loss function. For large deep neural networks, however, modifying the loss function alone cannot meet actual needs, so the Dropout algorithm was used in our study to address this difficulty [25-27]. When the Dropout algorithm is used during forward propagation, a certain number of memory cells in a complex neural network stop processing sequence information with a certain probability, so the training process is carried out on LSTM architectures of different combinations. This method reduces the dependence of the neural network on particular local features, enhances the generalization ability of the LSTM model, and ultimately improves the performance of the neural network. The mathematical model of the Dropout algorithm is shown in Eqs. (12) and (13):

$r_j^{(l)}\sim \mathrm{Bernoulli}(p)$ (12)
$\tilde{y}^{(l+1)}=r^{(l)}*y^{(l)}$ (13)

where p is the Bernoulli parameter governing whether an LSTM memory cell stops propagating. The mask $r_j^{(l)}$ determines the retention of state information for the jth LSTM memory cell of the lth network layer, and obeys the Bernoulli distribution. $y^{(l)}$ is the output information of the lth network layer, and $\tilde{y}^{(l+1)}$ is the input information of the (l+1)th network layer.
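Purely to illustrate Eqs. (12)-(13) themselves, a minimal masking sketch follows; it is hypothetical, with p_keep playing the role of the Bernoulli parameter. (In Keras-style frameworks, which the HDF5 model files of Sect. 2.3.4 suggest were used here, dropout is instead available as a layer argument.)

```python
def dropout_forward(y, p_keep, rng, training=True):
    """Eqs. (12)-(13): mask a layer's outputs with Bernoulli variables."""
    if not training:
        return y                                      # dropout is a training-time device
    r = rng.binomial(1, p_keep, size=y.shape)         # r_j^(l) ~ Bernoulli(p), Eq. (12)
    return r * y                                      # y~^(l+1) = r^(l) * y^(l), Eq. (13)
```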

2.3.2 Forward propagation calculation of the pulse sampling value sequence

The forward propagation calculation for the pulse sampling value sequence uses the sampled sequence $[V_o(mT_s)]_i$ from the Training Set as input data; after iteration through the multi-layer LSTM model, the process finally delivers the nuclear pulse parameter set $\theta_i'$ estimated by the current neural network. To ensure that the sequence information of each time unit can be fully utilized by the network, the number of first-layer memory cells of the LSTM model was made equal to the number M of nuclear pulse sampling values $[V_o(mT_s)]_i$. The pulse signal $[V_o(mT_s)]_i$ in the Training Set was then substituted into Eqs. (14)–(19) to obtain the hidden state information $h_m$ and the cell state information $C_m$. The mathematical models of the forget gate, input gate, output gate, and memory cell involved in the forward propagation of single-layer LSTM nuclear pulse parameter extraction are given in the following sub-sections.

Calculation of the forget gate structure

The forget gate structure could determine the retention probability of memory cell state information, and was calculated as shown in Eq. (14):

$f_m=\sigma\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{f}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m}^{f}h_{m-1}+b_i^{f}\right)=\dfrac{1}{1+e^{-\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{f}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m}^{f}h_{m-1}+b_i^{f}\right)}}$ (14)

In Eq. (14), $h_{m-1}$ is the hidden state information of the previous memory cell. $U_{i,m}^{f}$ and $W_{i,m}^{f}$ are the input weights and recurrent weights, respectively, of the mth sampled value $[V_o(mT_s)]_i$ of the ith sample in the forget gate structure, and $b_i^{f}$ is the bias of the ith sample in the forget gate structure. The gate function $\sigma$ is a sigmoid function, which outputs a value between 0 and 1 that determines the retention probability of the state information.

Calculation of the input gate structure

The input gate structure is used to calculate new state information inside the memory cell, and its structure is similar to that of the forget gate. Its weight and bias parameters are $U^g$, $W^g$, and $b^g$, and its mathematical model is described by Eq. (15):

$g_m=\sigma\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{g}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{g}h_{m-1}+b_i^{g}\right)=\dfrac{1}{1+e^{-\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{g}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{g}h_{m-1}+b_i^{g}\right)}}$ (15)

In Eq. (15), $U_{i,m}^{g}$ and $W_{i,m}^{g}$ are the input weights and recurrent weights, respectively, of the mth sampled value $[V_o(mT_s)]_i$ of the ith sample in the input gate structure, and $b_i^{g}$ is the bias of the ith sample in the input gate structure.

Status update of the memory cell

First, the candidate information vector $\tilde{C}_m$ is created using the tanh function. Then, the forget gate information, the previous memory cell state, the input gate information, and the candidate information vector are used as update elements for the current memory cell state. The mathematical model of the status update is given by Eqs. (16) and (17):

$\tilde{C}_m=\tanh\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{C}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{C}h_{m-1}+b_i^{C}\right)=\dfrac{1-e^{-2\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{C}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{C}h_{m-1}+b_i^{C}\right)}}{1+e^{-2\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{C}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{C}h_{m-1}+b_i^{C}\right)}}$ (16)

$C_m=f_m*C_{m-1}+g_m*\tilde{C}_m$ (17)

In these equations, $C_m$ represents the state of the memory cell at the current moment, $f_m$ the output value of the forget gate, $C_{m-1}$ the state of the memory cell at the previous moment, $g_m$ the output value of the input gate, and $\tilde{C}_m$ the candidate vector. $U_{i,m}^{C}$ and $W_{i,m}^{C}$ are the input weights and recurrent weights, respectively, of the mth sampled value $[V_o(mT_s)]_i$ of the ith sample in the cell state update structure, and $b_i^{C}$ is the corresponding bias of the ith sample.

Calculation of the output gate structure

The output gate structure determines the hidden state information $h_m$. First, the vector containing the hidden state information $h_{m-1}$ from the previous memory cell and the vector containing the current pulse sequence information $[V_o(mT_s)]_i$ are processed by the sigmoid function. Then the cell state information $C_m$ of the memory cell is passed through the tanh function. Next, the output of the tanh function is multiplied by the output value $o_m$ of the sigmoid function to determine the hidden state information $h_m$. Finally, the hidden state information $h_m$ is transmitted to the next layer of the network, and both $h_m$ and the memory cell state $C_m$ are transmitted to the next memory cell of the same layer.

The mathematical model of the output gate is therefore given by Eqs. (18) and (19):

$o_m=\sigma\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{o}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{o}h_{m-1}+b_i^{o}\right)=\dfrac{1}{1+e^{-\left(\sum_{m=1}^{K+2+n_c}U_{i,m}^{o}[V_o(mT_s)]_i+\sum_{m=2}^{K+2+n_c}W_{i,m-1}^{o}h_{m-1}+b_i^{o}\right)}}$ (18)

$h_m=o_m\cdot\tanh(C_m)=o_m\cdot\dfrac{1-e^{-2C_m}}{1+e^{-2C_m}}$ (19)

In this mathematical model, $U_{i,m}^{o}$ and $W_{i,m}^{o}$ are the input weights and recurrent weights, respectively, of the mth sampled value $[V_o(mT_s)]_i$ of the ith sample in the output gate structure, and $b_i^{o}$ is the bias of the ith sample in the output gate structure.

The ONP parameter estimation model for the multi-layer LSTM neural network could be considered as a stack of multiple, single-layer LSTM models. The hidden state information, hm, was used as the transmission information between each neural network layer. Therefore, the extracted abstract information was passed between each network layer until the last layer of the LSTM network estimated the parameter set for the ONP, and forward propagation finished.
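A possible realization of this stacked architecture in tf.keras is sketched below. It rests on the assumption, suggested by the HDF5 model files of Sect. 2.3.4, that Keras-style tooling was used; build_onp_model and its arguments are hypothetical names, not the authors' code.

```python
import tensorflow as tf

def build_onp_model(M, n_params, layer_units, p_drop=0.0):
    """Stacked LSTM: M shaped samples in, the parameter set theta out."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(M, 1)))           # one sampled value per time step
    for j, units in enumerate(layer_units):           # e.g. [1024, 2048, 3072] from Eq. (24)
        last = (j == len(layer_units) - 1)
        model.add(tf.keras.layers.LSTM(units,
                                       return_sequences=not last,
                                       dropout=p_drop))
    model.add(tf.keras.layers.Dense(n_params))        # linear regression head for theta
    return model
```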

2.3.3 Back propagation training process for nuclear pulse sequences

Because the weights and biases of each LSTM memory cell are randomly assigned when the neural network is defined, a loss function also needs to be designed. With this loss function, the error between the pulse parameters $\theta_i'$ estimated in forward propagation and the true pulse parameters $\theta_i$ in the data set can be calculated. Because this error is computed from the loss function, it is also called the loss value.

The Gaussian ONP parameter estimation model proposed in this paper estimates specific numerical values, a problem that belongs to the category of regression, so the mean square error (MSE) function is an appropriate loss function. For a Training Set with q samples, the error between the estimated parameter set $\theta_i'$ and the true parameter set $\theta_i$ was calculated with the MSE function and taken as the loss value $Loss_{MSE}$. The loss function is therefore given by Eq. (20):

$Loss_{MSE}=\dfrac{1}{q}\sum_{i=1}^{q}\left(\theta_i-\theta_i'\right)^2.$ (20)

Then, using the gradient-based Adam algorithm [24], the loss value and the gradient of the loss function were fed back to the network to update the weights; as the weights were updated by backpropagation, the set of nuclear pulse parameters estimated in the next forward propagation became more accurate. Iterating in this way, estimating the nuclear pulse parameter set by forward propagation and correcting the weights of the LSTM model by backpropagation, the loss value between the ONP parameter set estimated by the LSTM model and the true ONP parameter set gradually decreased, thereby achieving the purpose of training the network.
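Continuing the sketch, training with the MSE loss of Eq. (20) and the Adam settings used in Sect. 3 (β1 = 0.9, β2 = 0.999) could look as follows; X_train, Y_train, X_val, and Y_val stand for the assumed 8:1:1 splits of the data set, and the sizes match Example 1 (M = 586; 10 parameters, since θ holds four amplitudes, four times, τ, and K).

```python
model = build_onp_model(M=586, n_params=10, layer_units=[1024, 2048, 3072])
model.compile(optimizer=tf.keras.optimizers.Adam(beta_1=0.9, beta_2=0.999),
              loss='mse',                             # Loss_MSE of Eq. (20)
              metrics=['mae'])                        # MAE of Eq. (21), monitored below
history = model.fit(X_train[..., None], Y_train,      # trailing axis: one feature per step
                    batch_size=10, epochs=20,
                    validation_data=(X_val[..., None], Y_val))
```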

Finally, to improve training efficiency and avoid oscillation of the LSTM model loss value in the later stages of training, a criterion was needed for deciding after how many rounds the model should stop training. Because the mean absolute error (MAE) function, shown in Eq. (21), avoids mutual cancellation between deviations, it was used to determine the number of training rounds.

$MAE=\dfrac{1}{q}\sum_{i=1}^{q}\left|\theta_i-\theta_i'\right|$ (21)

In practice, a threshold was set according to the actual situation, and training ended once the MAE fell below that threshold. At that point, the Test Set data were input into the model to test the generalization ability of the ONP parameter estimation model.
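This stopping rule is easy to express as a Keras callback; the sketch below (a hypothetical MAEThresholdStopping class, not from the original study) halts training once the epoch MAE drops below the chosen threshold, e.g. three in Example 1.

```python
class MAEThresholdStopping(tf.keras.callbacks.Callback):
    """Stop training once the epoch-end MAE falls below a threshold."""
    def __init__(self, threshold):
        super().__init__()
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        if logs and logs.get('mae', float('inf')) < self.threshold:
            self.model.stop_training = True           # end training after this epoch

# Usage: model.fit(..., callbacks=[MAEThresholdStopping(3.0)])
```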

2.3.4 Saving the model after training

After the LSTM model with the ability to estimate the Gaussian ONP parameter set θ was trained, important information such as the model structure, weights, training configuration, and optimizer state was saved as a Hierarchical Data Format 5 (HDF5) file. When ONP need to be estimated later, the program first loads the trained model from the HDF5 file; next, the sampled values of the Gaussian ONP whose shaping parameters need to be estimated are used as the input data of the LSTM model; finally, the LSTM model outputs the required set of nuclear pulse parameters. The structure of the Gaussian ONP parameter estimation algorithm based on the deep learning LSTM model is shown in Fig. 1.
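In Keras-style terms this save/load/predict cycle reduces to a few calls; the file name and the Vo_samples array below are placeholders.

```python
model.save('onp_lstm.h5')                             # HDF5: structure, weights, optimizer state

loaded = tf.keras.models.load_model('onp_lstm.h5')    # reload the trained estimator
theta_hat = loaded.predict(Vo_samples[None, :, None]) # estimated parameter set for one ONP
```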

Fig. 1
Overlapping nuclear pulses parameter extraction framework, based on multi-layer LSTM

3. Experimental verification and discussion

In order to verify the feasibility of the proposed method for estimating ONP parameters, three examples were prepared and reviewed in this paper. First, Example 1 is a contrast experiment in which the proposed method is compared with the estimation of nuclear pulse parameters by the traditional optimization algorithm proposed by Hong-Quan Huang, Xiao-Feng Yang et al. [7].

Second, as the time interval between adjacent exponential pulses becomes smaller, their overlap becomes more severe, which further increases the difficulty of estimating nuclear pulse parameters. Example 2 was therefore used to verify the effect of the proposed algorithm on ONP parameter estimation when the time interval between adjacent nuclear pulses was short.

Finally, considering the difficulty of estimating nuclear pulse parameters caused by the overlap of many pulses, Example 3 was a parameter extraction experiment with nine overlapping nuclear pulses, used to verify the parameter extraction performance of the algorithm under multiple-pulse overlap conditions.

In addition, to ensure calculation accuracy while maximizing computational efficiency, an adaptive algorithm was designed to determine the number M of sampled values of the ONP $V_o(mT_s)$ and the number of LSTM network layers. For the ONP $V_o(mT_s)$, if the kth sampled value $V_o(kT_s)$ satisfies the condition in Eq. (22), the number M of sampled values can be obtained from Eq. (23).

$V_o(kT_s)\le 0<V_o[(k-1)T_s]$ (22)

$M=1.2k$ (23)

Combined with the above conditions, the number of LSTM model neural network layers could be determined through applying Eq. (24):

$\begin{cases}2^{s-1}\le M<2^{s}=C_{num1}\\ C_{num\xi}=\xi\,C_{num1}\end{cases}$ (24)

In Eq. (24), s is a positive integer, $C_{num1}$ is the number of memory cells in the first layer, $C_{num\xi}$ is the number of memory cells in the ξth layer, and ξ is the number of network layers. Subject to hardware performance constraints, however, the number of network layers in this article did not exceed three. The advantage of this structure is that it retains the ONP sequence information to the maximum extent while minimizing the network scale, so the adaptive algorithm above effectively improves hardware computing efficiency.
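Under the reading of Eqs. (22)-(24) given above, the adaptive sizing can be sketched as follows. The helper adaptive_sizes and the ceiling rounding of 1.2k are assumptions of this rewrite (the printed examples appear to round M slightly differently in one case), but the sketch reproduces Example 1: k = 488 gives M = 586, s = 10, and layer sizes 1024, 2048, 3072.

```python
import math

def adaptive_sizes(Vo, n_layers=3):
    """Choose M from the decay point k (Eqs. (22)-(23)), then size the layers (Eq. (24))."""
    k = next(m for m in range(1, len(Vo))
             if Vo[m] <= 0 < Vo[m - 1])               # first sample at/below zero, Eq. (22)
    M = math.ceil(1.2 * k)                            # Eq. (23)
    s = next(s for s in range(1, 64) if M < 2 ** s)   # smallest s with 2^(s-1) <= M < 2^s
    Cnum1 = 2 ** s                                    # memory cells in the first layer
    return M, [xi * Cnum1 for xi in range(1, n_layers + 1)]
```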

Example 1

Exponential pulses Input 1, Input 2, Input 3, and Input 4 were input into an S–K shaping circuit, with a characteristic time of τ = 100 ns. The amplitude parameters $A_i$ were 300, 150, 200, and 250 Counts, and the time parameters $T_i$ were 1, 100, 200, and 300, respectively. The white noise standard deviation was 5 Counts, and the sampling period $T_s$ was 5 ns. Taking RC = 250 ns, the parameter K of the S-K digital Gaussian pulse shaping algorithm was K = RC/$T_s$ = 50. Substituting these conditions into Eqs. (1)–(4) gave k = 488; applying this k to Eqs. (22)–(24) then gave the number of network layers ξ and the number of first-layer memory cells: M = 586, s = 10, $C_{num1}$ = 1024, $C_{num2}$ = 2048, and $C_{num3}$ = 3072. The Adam algorithm was used to update the weights, with optimizer parameters β1 = 0.9 and β2 = 0.999. The number of pre-training rounds (epochs) was set to 20, and training stopped at the end of a round once the MAE value fell below three. Each training round randomly generated 1000 samples, which were divided into the Training Set, Test Set, and Validation Set in the ratio 8:1:1, and every 10 samples formed one input batch (batch size = 10). The changes in the network loss and MAE values during training, for the Training Set and the Validation Set, are shown in Fig. 2.

Fig. 2
Changes in loss and MAE on training and validation data achieved through training

As shown in Fig. 2, the loss value of the model decreased monotonically on both the Training Set and the Validation Set, showing that there was no over-fitting; to save computational cost, this example therefore did not use the Dropout algorithm. Finally, the parameters and errors of the Gaussian ONP estimated by the deep learning LSTM model and by the traditional optimization algorithm used by Huang et al. [7] are shown in Table 1.

Table 1
Comparison between values obtained using the population search method, the deep learning LSTM model, and true values (τ = 100 ns; K = 50)
Items A1 A2 A3 A4 T1 T2 T3 T4
True value 300 150 200 250 1 100 200 300
Calculated value (Huang) 305.62 150.85 189.33 256.69 1.17 99.52 198.63 299.9
Calculated value (LSTM) 294.44 150.06 199.95 248.92 0.9995 100.1 199.89 294.29
Error (Huang) 5.62(1.87%) 0.85(0.57%) 10.67(5.34%) 6.69(2.68%) 0.17 0.48 1.37 0.1
Error (LSTM) 5.56(1.85%) 0.06(0.04%) 0.05(0.03%) 1.08(0.43%) 0.0005 0.1 0.11 5.71
Bold values indicate the smaller error in the comparison experiment, and are used to compare the estimation accuracy of the two methods.

Figure 3 shows the results for nuclear pulse parameter estimation based on the LSTM model. Here, Figure 3a shows the exponential pulses before shaping, the Gaussian pulses after shaping, and the Gaussian ONP obtained using the LSTM parameter estimation method. Figure 3b shows the true exponential pulses and the exponential pulses obtained using the LSTM model, while Fig. 3c shows both the true Gaussian pulses and the Gaussian pulses whose parameters were estimated using the LSTM model.

Fig. 3
Results of pulse parameter estimation based on deep learning LSTM model. a: Calculated values: exponential signal (Input x’), Gaussian-shaped signal (Output x’) and overlapping pulses (Output 1’+ Output 2’+ Output 3’+ Output 4’), b: Exponential signal true (Input x) and calculated (Input x’) values, c: Gaussian-shaped signal true (Output x) and calculated (Output x’) values

From the experimental results, the relative errors of the amplitude parameters $A_i$ were 1.85%, 0.04%, 0.03%, and 0.43%, and the absolute errors of the time parameters $T_i$ were 0.0005, 0.1, 0.11, and 5.71. This shows that the deep learning LSTM model presented in this paper has good parameter estimation ability for the pulse parameter estimation problem, and that the accuracy of the estimated parameters is improved to some extent compared with the traditional optimization algorithm.

Example 2

When the time interval of the adjacent exponential nuclear pulses is short, the overlap between the pulses will be more serious, making it very difficult to estimate ONP parameters. The purpose of this example was to verify the ability of the proposed method to estimate ONP parameters when the time interval between adjacent exponential nuclear pulses was small.

Exponential pulses Input 1, Input 2, Input 3, and Input 4 were input into an S–K shaping circuit, with a characteristic time of τ = 100 ns. The amplitude parameters $A_i$ were 300, 150, 200, and 250 Counts, and the time parameters $T_i$ were 1, 100, 150, and 200, respectively. The white noise standard deviation was 5 Counts, and the sampling period $T_s$ was 5 ns. Taking RC = 250 ns, the parameter K of the S-K digital Gaussian pulse shaping algorithm was K = RC/$T_s$ = 50. Substituting these conditions into Eqs. (1)–(4) gave k = 382, which was then applied to Eqs. (22)–(24) to calculate the number of network layers ξ and the number of first-layer memory cells: M = 457, s = 9, $C_{num1}$ = 512, $C_{num2}$ = 1024, and $C_{num3}$ = 1536. The Adam algorithm was used to update the weights, with optimizer parameters β1 = 0.9 and β2 = 0.999. The number of pre-training rounds (epochs) was set to 10, and training stopped once the MAE value fell below three. In each training round, 1000 samples were randomly generated and divided into a Training Set, Test Set, and Validation Set in the ratio 8:1:1, and every two samples formed one input batch (batch size = 2). The final parameter estimation results and related images are shown in Table 2 and Fig. 4, respectively.

Table 2
Comparison between values calculated with the deep learning LSTM model and true values (τ=100 ns; K = 50)
Items A1 A2 A3 A4 T1 T2 T3 T4
True value 300 150 200 250 1 100 150 200
Calculated value 300.13 150.72 199.29 250.23 0.997 98.95 149.29 199.34
Error 0.13(0.043%) 0.72(0.48%) 0.71(0.36%) 0.23(0.09%) 0.003 1.05 0.71 0.66
Fig. 4
Results of pulse parameter estimation based on the deep learning LSTM model. a: Calculated values for the exponential signal (Input x’), Gaussian-shaped signal (Output x’), and overlapping pulses (Output 1’+ Output 2’+ Output 3’+ Output 4’), b: Exponential signal true (Input x) and calculated (Input x’) values, c: Gaussian-shaped signal true (Output x) and calculated (Output x’) values

According to the experimental results, the relative errors of the amplitude parameters $A_i$ were 0.043%, 0.48%, 0.36%, and 0.09%, and the absolute errors of the time parameters $T_i$ were 0.003, 1.05, 0.71, and 0.66, respectively. Therefore, even when the time interval between adjacent exponential pulses was short, the ONP shaping parameters estimated by the LSTM model still showed high precision.

Example 3

To test the LSTM model's ability to estimate the parameters of many overlapping pulses, this example investigated LSTM parameter estimation for nine overlapping pulses. Exponential pulses Input 1 through Input 9 were input into an S–K shaping circuit, with a characteristic time of τ = 100 ns. The amplitude parameters $A_i$ were 300, 150, 200, 250, 650, 100, 550, 350, and 50 Counts, and the time parameters $T_i$ were 1, 100, 200, 300, 400, 500, 600, 700, and 800, respectively. The white noise standard deviation was 5 Counts, and the sampling period $T_s$ was 5 ns.

Taking RC = 250 ns, the parameter K of the S–K digital Gaussian pulse shaping algorithm was K = RC/$T_s$ = 50. Substituting the above conditions into Eqs. (1)–(4) gave k = 382; applying this to Eqs. (22)–(24), the number of network layers ξ and the number of first-layer memory cells were calculated as M = 902, s = 10, $C_{num1}$ = 1024, $C_{num2}$ = 2048, and $C_{num3}$ = 3072.

The Adam algorithm was used to update the weights, with parameters β1 = 0.9 and β2 = 0.999. The number of pre-training rounds (epochs) was set to 30, and training stopped once the MAE was less than five. In each training round, 1000 samples were randomly generated and divided into the Training Set, Test Set, and Validation Set in the ratio 8:1:1. Because of the large amount of training data, the network was allowed to learn sample by sample (batch size = 1) to ensure accuracy. Owing to the large number of pulses involved in this example, a visual display of the results was not effective; therefore only the final parameter estimates from the LSTM model are shown, in Table 3.

Table 3
Comparison between values calculated using the deep learning LSTM model and true values (τ = 100 ns; K = 50; 9 pulses)
Items A1 A2 A3 A4 A5 A6 A7 A8 A9 T1 T2 T3 T4 T5 T6 T7 T8 T9
True value 300 150 200 250 650 100 550 350 50 1 100 200 300 400 500 600 700 800
Calculated value 300.44 150.86 200.54 250.62 649.9 98.07 549.72 350.14 49.11 1 100.61 199.57 300.04 400.26 500.55 600.05 700.3 799.78
Absolute error 0.44 0.86 0.54 0.62 0.1 1.93 0.28 0.14 0.89 0 0.61 0.43 0.04 0.26 0.55 0.05 0.3 0.22
Relative error % 0.15 0.57 0.27 0.248 0.015 1.93 0.051 0.04 1.78

Examining the experimental results, the relative errors of the amplitude parameters $A_i$ were 0.15%, 0.57%, 0.27%, 0.248%, 0.015%, 1.93%, 0.051%, 0.04%, and 1.78%, and the absolute errors of the time parameters $T_i$ were 0, 0.61, 0.43, 0.04, 0.26, 0.55, 0.05, 0.3, and 0.22. Even with nine overlapping pulses, the ONP shaping parameters estimated by the LSTM model still showed high precision, demonstrating that estimating nuclear pulse shaping parameters with the LSTM model remains effective under conditions of multiple overlapping pulses.

4. Conclusion

The LSTM deep learning model has been used to estimate the parameters of ONP signals after S–K Gaussian shaping, and the problems caused by the pre-shaping exponential signal and by noise were satisfactorily overcome. Taking the measured ONP signal as a sample, each sampled value of the ONP signal was fed into the LSTM model at a separate time step, and the processing and transfer of the sampled-value information relied on the memory cell structure peculiar to the LSTM model. After sufficient training, when sampled ONP values were input into the LSTM model, it was able to estimate the shaping parameters of these nuclear pulses quickly and accurately. Because the network learned from the entire sample, all features in the sample were recorded by the network, indicating that this method can overcome the local-convergence defect of traditional methods and achieve optimal estimation of nuclear pulse parameters in the global sense. The model proposed herein has thus been demonstrated to be a good method for estimating ONP parameters.

Full-sample learning, however, sharply increased the amount of computation required during model training compared with traditional methods, so the training time of the current LSTM model was longer than that of the traditional method. In addition, the increased model size caused by the increased amount of data meant that some further explorations could not be carried out on the hardware currently available.

Network structure optimization and computational efficiency improvement will therefore be the primary focus of our future research; for example, using several types of deep neural networks and re-optimizing the memory cells inside the LSTM could be explored. In addition, the examples presented here only considered parameter estimation for ONP based on S-K digital shaping. Future research could address pulse parameter extraction for digital trapezoidal (triangular) shaping, and could also take baseline estimation into account.

References
[1] F.S. Goulding, Pulse-shaping in low-noise nuclear amplifiers: A physical approach to noise analysis. Nucl. Instrum. Meth. 100, 493-504 (1972). doi: 10.1016/0029-554X(72)90828-2
[2] G. Gerardi, L. Abbene, A. La Manna et al., Digital filtering and analysis for a semiconductor X-ray detector data acquisition. Nucl. Instrum. Meth. A 571, 378-380 (2007). doi: 10.1016/j.nima.2006.10.113
[3] T. Noulis, C. Deradonis, S. Siskos et al., Particle detector tunable monolithic semi-Gaussian shaping filter based on transconductance amplifiers. Nucl. Instrum. Meth. A 589, 330-337 (2008). doi: 10.1016/j.nima.2008.02.048
[4] S.G. Chen, S.Y. Ji, W.S. Liu, Gaussian pulse shaping of exponential decay signal based on wavelet analysis. Acta Phys. Sin. 57, 2882-2887 (2008). doi: 10.3321/j.issn:1000-3290.2008.05.041 (in Chinese)
[5] S.G. Chen, S.Y. Ji, W.S. Liu et al., Recursive implementation of Gaussian pulse shaping based on wavelet analysis. Acta Phys. Sin. 58(5), 3041-3046 (2009). doi: 10.7498/aps.58.3041 (in Chinese)
[6] K.M. Jiang, H.Q. Huang, X.F. Yang et al., Pulse parameter extraction method based on S-K digital shaping and population technique. Nuclear Electronics & Detection Technology 37, 2 (2017). (in Chinese)
[7] H.Q. Huang, X.F. Yang, W.C. Ding et al., Estimation method for parameters of overlapping nuclear pulse signal. Nucl. Sci. Tech. 28, 12 (2017). doi: 10.1007/s41365-016-0161-z
[8] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006). doi: 10.1126/science.1127647
[9] G. Dorffner, Neural networks for time series processing. Neural Netw. World 6, 1 (1996).
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 7553 (2015). doi: 10.1038/nature14539
[11] Y. LeCun, L. Bottou, Y. Bengio et al., Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 2278-2324 (1998). doi: 10.1109/5.726791
[12] J. Du, B.L. Hu, Y.Z. Liu et al., Study on quality identification of macadamia nut based on convolutional neural networks and spectral features. Spectrosc. Spect. Anal. 38, 1514 (2018). doi: 10.3964/j.issn.1000-0593(2018)05-1514-06
[13] A. Graves, Generating sequences with recurrent neural networks (2013). arXiv:1308.0850
[14] A. Graves, A. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks (2013). arXiv:1303.5778
[15] R. Pascanu, C. Gulcehre, K. Cho et al., How to construct deep recurrent neural networks (2013). arXiv:1312.6026
[16] S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9, 1735-1780 (1997). doi: 10.1162/neco.1997.9.8.1735
[17] A. Graves, M. Liwicki, S. Fernandez et al., A novel connectionist system for unconstrained handwriting recognition. IEEE Trans. Pattern Anal. 31, 855-868 (2009). doi: 10.1109/tpami.2008.137
[18] F.A. Gers, D. Eck, J. Schmidhuber, Artificial Neural Networks - ICANN 2001 (Vienna University of Technology, Vienna, 2001), pp. 669-676
[19] P.J. Werbos, Backpropagation through time: what it does and how to do it. Proceedings of the IEEE 78, 1550-1560 (1990). doi: 10.1109/5.58337
[20] A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18, 602-610 (2005). doi: 10.1016/j.neunet.2005.06.042
[21] S. Amari, Backpropagation and stochastic gradient descent method. Neurocomputing 5, 185-196 (1993). doi: 10.1016/0925-2312(93)90006-O
[22] J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121-2159 (2011).
[23] S. Yeung, O. Russakovsky, N. Jin et al., Every moment counts: Dense detailed labeling of actions in complex videos. Int. J. Comput. Vision 126, 375-389 (2018). doi: 10.1007/s11263-017-1013-y
[24] D.P. Kingma, J. Ba, Adam: A method for stochastic optimization (2014). arXiv:1412.6980
[25] G.E. Hinton, N. Srivastava, A. Krizhevsky et al., Improving neural networks by preventing co-adaptation of feature detectors (2012). arXiv:1207.0580
[26] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84-90 (2017). doi: 10.1145/3065386
[27] X. Bouthillier, K. Konda, P. Vincent et al., Dropout as data augmentation (2015). arXiv:1506.08700