Introduction
Scintillation detectors are typically used for neutron detection [1-3]. In scintillation detectors, neutron and gamma-ray signals have different decay time constants, with neutron signals decaying more slowly than gamma-ray signals [4]. Thus, neutrons and gamma rays produce different pulse shapes, and pulse shape discrimination (PSD) [5] exploits this difference to distinguish the two categories. Among conventional n-γ discrimination algorithms, the charge comparison method (CCM) uses the PSD factor as the discriminating index [6], which is the ratio of the tail integral of the pulse (Qtail) to the total integral (Qtotal) [7, 8]. In general, the PSD factor of neutrons is larger than that of gamma rays. The CCM is simple and easy to use; however, its discrimination performance degrades in the low-energy range.
Currently, machine learning algorithms are widely used for n-γ discrimination [9, 10]. Machine learning algorithms include unsupervised and supervised learning algorithms. Unsupervised learning algorithms cluster data according to their distribution in the feature space, which does not rely on pre-labeled samples and can identify abnormal pulse events [11, 12]. The Gaussian mixture model (GMM), which is commonly used in unsupervised learning, has demonstrated good performance in n-γ discrimination [13, 14]. However, the direct clustering of complete pulse data by the GMM still faces challenges. First, clustering high-dimensional data directly leads to the "curse of dimensionality". Second, when performing direct clustering on massive amounts of data using the GMM, a large number of significant errors occur, with neutrons incorrectly identified as γ rays within clusters with clear boundaries. Furthermore, GMM clustering is suitable only for fixed data [15, 16].
To overcome the limitations of the GMM in n-γ discrimination, Liu et al. (2023) [14] proposed a method combining principal component analysis (PCA) and GMM clustering. They extracted three features using PCA and then applied the GMM clustering algorithm. The results indicated that the PCA-GMM provided a higher figure of merit (FOM) for n-γ discrimination than the CCM. However, this method has strict data requirements, necessitating close pulse peak positions and including only pulse tails while disregarding the differences in the overall pulse.
Wang et al. (2022) [17] proposed a method for identifying neutrons and γ rays using a small-batch GMM clustering algorithm. This method yields a higher FOM than the CCM. However, in case of a large mismatch between the test pulse and the trained model, the exponent yields a large negative value, indicating that this method is susceptible to outliers.
Supervised learning algorithms can classify or regress unknown signals; however, they depend on prior knowledge. Common supervised learning algorithms include K-nearest neighbors (KNN) [18-20], support vector machines (SVM) [21, 22], and linear discriminant analysis (LDA) [23]. The KNN algorithm performs classification or regression prediction by calculating the distance between the test data and training samples. It is a simple and portable algorithm that accurately performs classification and regression tasks based on existing samples. However, the real-time performance of the KNN algorithm remains debatable, and its stability under different conditions requires further exploration.
Durbin et al. (2021) [18] proposed a new method that uses KNN regression to improve the PSD performance. This approach enabled direct comparison with conventional PSD methods using the FOM. However, this study did not consider the runtime of the algorithm, which is a critical indicator of real-time performance [24-27].
All the studies mentioned above concentrated on using machine learning algorithms to improve the n-γ discrimination capabilities. However, they did not address the algorithmic portability or real-time performance issues. Thus, this study proposed a combined method of GMM and KNN algorithms (GMM-KNN) to overcome the limitations of a single machine learning algorithm in n-γ discrimination. The proposed method constructs a training set from unlabeled data to discriminate unknown pulses.
Method
In this study, the GMM-KNN algorithm combines an unsupervised learning algorithm (GMM clustering) with a supervised learning algorithm (KNN classification and regression). Further, GMM-KNN achieves pulse discrimination using a LabVIEW program.
GMM Clustering
The GMM is a probabilistic model that describes a dataset comprising multiple Gaussian distributions [28]. For each Gaussian component, the probability density function is expressed as:
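For a d-dimensional feature vector x, the probability density of the i-th Gaussian component with mean μi and covariance Σi is the standard multivariate normal density:

```latex
p(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)
= \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}_i|^{1/2}}
\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\mathsf{T}}
\boldsymbol{\Sigma}_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\right)
```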
Neglecting pulse stacking in the n-γ discrimination, the GMM has only two components: neutrons and gamma rays. For a Gaussian mixture distribution with two mixed components, the probability density is expressed as:
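With mixing coefficients α1 and α2, the two-component mixture density is:

```latex
p(\mathbf{x}) = \sum_{i=1}^{2} \alpha_i \,
p(\mathbf{x} \mid \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),
\qquad \alpha_1 + \alpha_2 = 1,\; \alpha_i \ge 0
```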
The model parameters αi, μi, and Σi are estimated by iterative optimization with the expectation-maximization (EM) algorithm [29, 30]. Each iteration of the EM algorithm comprises two steps: the E-step, which estimates the expectations of the hidden variables based on the current parameters; and the M-step, which uses the results of the E-step to update the model parameters by maximum likelihood estimation.
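As an illustration of the E and M steps (not the paper's implementation), a minimal two-component EM can be sketched in one dimension, e.g., on the scalar PSD ratio rather than the full (Qtail, Qtotal) pair; all names here are illustrative:

```python
import math
import random

def gmm_em_1d(xs, iters=200):
    """Minimal two-component 1-D EM sketch.

    Returns (weights, means, stds, responsibilities of component 1);
    the responsibilities play the role of the soft-clustering
    probabilities used for the >50% class assignment."""
    # Crude initialization: split the sorted data at the median.
    xs_sorted = sorted(xs)
    mid = len(xs) // 2
    mu = [sum(xs_sorted[:mid]) / mid,
          sum(xs_sorted[mid:]) / (len(xs) - mid)]
    sd = [0.1 * (max(xs) - min(xs)) + 1e-9] * 2
    w = [0.5, 0.5]

    def pdf(x, m, s):
        return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))

    r = []
    for _ in range(iters):
        # E-step: responsibilities from the current parameters.
        r = []
        for x in xs:
            p = [w[k] * pdf(x, mu[k], sd[k]) for k in range(2)]
            tot = p[0] + p[1]
            r.append([p[0] / tot, p[1] / tot])
        # M-step: maximum-likelihood updates of weights, means, stds.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(xs)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, xs)) / nk
            var = sum(ri[k] * (x - mu[k]) ** 2 for ri, x in zip(r, xs)) / nk
            sd[k] = math.sqrt(var) + 1e-12
    return w, mu, sd, [ri[1] for ri in r]
```

On synthetic data with two well-separated clusters of PSD ratios, the fitted means recover the cluster centers, and a responsibility above 0.5 assigns a pulse to the corresponding component.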
A large number of errors occurred when GMM clustering was performed directly on the entire training dataset. To enhance the accuracy of GMM clustering, we divided the data into three energy ranges [17]: 0–25 keV, 25–100 keV, and 100–2100 keV. The GMM soft clustering output provides the probability that a pulse belongs to either neutrons or gamma rays. If the probability of a pulse belonging to neutrons exceeded 50%, it was classified as a neutron; otherwise, it was classified as a gamma ray. The primary objective of GMM clustering was to produce a dependable training set that is subsequently utilized by the KNN algorithm. However, this classification method may encounter ambiguous pulses, which can reduce the accuracy of KNN classification.
KNN Classification and Regression Algorithm
GMM clustering can only cluster a fixed dataset; therefore, a supervised learning algorithm must be used for real-time discrimination. This study used supervised learning to accurately represent the distribution of the training set samples, which followed a Gaussian mixture distribution comprising two components. The goal of supervised learning was to ensure that the test set data exhibited classification results similar to the clustering results of the training set. Among the candidate supervised learning algorithms, such as SVM and LDA, KNN was selected for its simplicity and ease of implementation in LabVIEW. The KNN algorithm is known for its simplicity and accuracy, with a generalization error that asymptotically does not exceed twice the error rate of the Bayes optimal classifier. To optimize the KNN algorithm, it is important to determine the optimal value of K, select an appropriate distance metric, and specify the decision rule.
First, the key to the KNN algorithm is the determination of the optimal K value. When K is excessively large, distant points affect the prediction results, resulting in underfitting; conversely, when K is excessively small, the model is less tolerant of noise and prone to overfitting. In this study, the optimal K value was determined by 10-fold cross-validation [31]: the training dataset D was divided into ten equal parts, nine of which were used to train the model, while the remaining part was used to compute the test accuracy. This process was repeated for each part, and the accuracies were averaged. The cross-validation accuracy first increases with increasing K, reaches a maximum, and then decreases; the K value corresponding to the highest average accuracy is the optimal K. Dataset D must contain a sufficient number of samples, and the GMM clustering results provide such a reliable training set. Therefore, GMM-KNN uses the training set obtained from GMM clustering as dataset D.
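A minimal sketch of this selection procedure (illustrative names; leftover samples after the integer fold split are ignored for brevity):

```python
import random

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points; each training
    item is (Qtail, Qtotal, label) and the query is (Qtail, Qtotal)."""
    nearest = sorted(train, key=lambda p: (p[0] - query[0]) ** 2
                                          + (p[1] - query[1]) ** 2)[:k]
    votes = [label for _, _, label in nearest]
    return max(set(votes), key=votes.count)

def best_k_by_cv(data, k_candidates, folds=10):
    """10-fold cross-validation: return the K with the highest mean accuracy."""
    data = data[:]              # avoid mutating the caller's list
    random.shuffle(data)
    fold_size = len(data) // folds
    scores = {}
    for k in k_candidates:
        correct = 0
        for f in range(folds):
            test = data[f * fold_size:(f + 1) * fold_size]
            train = data[:f * fold_size] + data[(f + 1) * fold_size:]
            correct += sum(knn_predict(train, (x, y), k) == lab
                           for x, y, lab in test)
        scores[k] = correct / (fold_size * folds)
    return max(scores, key=scores.get)
```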
In addition, the KNN algorithm must calculate the distance between an unknown pulse and each pulse in the training set (in this study, the distance is the Euclidean distance). For example, for points X and Y with n features, the distance is calculated as
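For X = (x1, …, xn) and Y = (y1, …, yn), the Euclidean distance is:

```latex
d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```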
Finally, the KNN algorithm outputs the classification or regression results for the pulses in the test set. For the KNN classification task, the result is the category that occurs most frequently in the nearest neighboring K instances [32]. For the KNN regression task, the result is the average value of each feature over the nearest neighboring K instances [33].
GMM-KNN Classification and Regression
Both GMM and KNN are effective methods for n-γ discrimination. GMM clustering performs well in PSD analysis. However, the results of GMM clustering are limited to the current dataset, and prior knowledge obtained from the training set cannot be directly applied to the test set. In contrast, KNN accurately captures the sample distribution and can be easily implemented on hardware, providing strong real-time performance and the potential for real-time n-γ discrimination. However, the drawback of KNN is that it requires prestoring the sample set.
In this study, the goal of GMM clustering was to construct a reliable training set, and KNN utilized this training set to discriminate unknown pulses. To integrate the GMM and KNN, we proposed improvements to both methods. When GMM clusters the data across the entire energy range, a significant number of misclassifications are obtained. Therefore, in GMM-KNN, the data were divided into three energy partitions to enhance the clustering performance. In addition, the GMM clustering results often include confusing pulses (low-probability events). Hence, we selected only the pulses with classification probabilities greater than 99% as the training set. The test set of GMM-KNN comprised a large number of unknown pulses, and each pulse must calculate its distance from all samples in the training set. Using the complete training set yielded the most accurate classification results but significantly increased the computational cost of the algorithm.
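The probability cut can be sketched as follows (a simplified stand-in for the LabVIEW/GMM pipeline; the function name, the neutron-probability input, and the 0 = gamma / 1 = neutron label convention are assumptions):

```python
def build_training_set(pulses, neutron_probs, threshold=0.99):
    """Keep only pulses whose GMM class probability exceeds the threshold.

    `pulses` are (Qtail, Qtotal) pairs; `neutron_probs` are the neutron
    probabilities from GMM soft clustering. A confident gamma ray has a
    neutron probability below 1 - threshold, so both tails are kept."""
    train = []
    for (qtail, qtotal), p in zip(pulses, neutron_probs):
        if p > threshold or p < 1.0 - threshold:   # confident either way
            label = 1 if p > 0.5 else 0            # 1 = neutron, 0 = gamma
            train.append((qtail, qtotal, label))
    return train
```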
In the context of GMM-KNN classification, reducing the size of the training dataset has a minimal impact on the classification results, but significantly decreases the algorithm complexity. This facilitates a flexible selection of the sample quantity within the training set for the GMM-KNN classification. However, for GMM-KNN regression, a complete training set must be used to ensure the accuracy of the regression predictions. To assess the real-time performance of our method, a unified programming language or first-in, first-out (FIFO) data transfer with other devices must be employed. The LabVIEW program can call or directly read the acquired data, which facilitates the real-time implementation of the GMM-KNN for pulse discrimination. Using parallel computing, the LabVIEW program can concurrently calculate two decision values, resulting in simpler decision logic and faster computation. In addition, the real-time performance of the algorithm must consider the runtime and memory footprint of the trained model. Thus, this study proposed the GMM-KNN algorithm that employed only two features, Qtail and Qtotal, for both GMM clustering and KNN classification and regression. These features reduced data dimensionality in GMM clustering and significantly decreased the computational cost and memory footprint of the trained model.
A block diagram of the GMM-KNN algorithm is shown in Fig. 1. First, the method divides the preprocessed training and test data separately into three parts, corresponding to the energy ranges 0–25 keV, 25–100 keV, and 100–2100 keV. Second, the GMM takes Qtail and Qtotal as pulse features and performs small-batch clustering in the three energy ranges. Subsequently, the method selects a portion of the clustering results (probability > 99%) as the KNN training set. Finally, GMM-KNN implements the classification and regression algorithms in LabVIEW and outputs the category and regression prediction values.
[Fig. 1]
Evaluation Metrics
The output of the GMM-KNN classification was binary, with zero representing gamma rays and one representing neutrons. Both pulse types exhibited an elliptical distribution in the feature space. Comparing the difference between the output and ground truth facilitated the qualitative assessment of the effectiveness of the n-γ discrimination. The outputs of the GMM-KNN regression comprised the average values of the nearest K pulses for Qtail and Qtotal, as well as their ratios. We used this ratio to calculate the FOM, where a higher FOM indicated better n-γ discrimination performance.
There is a difference in the ratio of slow charge to total charge between neutron and gamma-ray pulses; therefore, the CCM selects the ratio of the tail integral (Qtail) to the total integral (Qtotal) as the PSD factor [34] and then calculates the FOM as the discrimination metric. In this study, the integration windows were tuned to obtain the best FOM: the tail integration window for Qtail was set to 68 ns and the total integration window for Qtotal to 124 ns (Fig. 2). For the GMM-KNN regression, the pulse features are the regressed values; thus, PSD = (regressed Qtail) / (regressed Qtotal). After calculating the PSD values for all pulses, the GMM-KNN regression used the FOM computed from these values as the evaluation metric for n-γ discrimination effectiveness.
[Fig. 2]
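The charge-comparison computation under the stated windows (68 ns tail, 124 ns total) can be sketched as follows. The exact placement of the gates relative to the pulse peak is an assumption here; the tail gate is taken to start 124 − 68 = 56 ns after the integration start:

```python
def psd_factor(samples, dt_ns, tail_start_ns=56, total_ns=124):
    """Charge-comparison PSD factor: Qtail / Qtotal.

    `samples` is the digitized pulse amplitude and `dt_ns` the digitizer
    sampling period in ns. The window lengths follow the text (68 ns tail,
    124 ns total)."""
    n_total = int(total_ns / dt_ns)            # samples in the total gate
    n_tail_start = int(tail_start_ns / dt_ns)  # first sample of the tail gate
    q_total = sum(samples[:n_total]) * dt_ns
    q_tail = sum(samples[n_tail_start:n_total]) * dt_ns
    return q_tail / q_total
```

For a sampling period of 4 ns, the 124 ns total gate spans 31 samples and the tail gate spans the last 17 of them.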
Two Gaussian peaks were observed in the PSD histogram after Gaussian fitting (Fig. 3), and the FOM is calculated as follows:
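The FOM takes its usual PSD-literature definition, where S is the separation between the centroids of the neutron and gamma-ray peaks and FWHM denotes the full width at half maximum of each fitted peak:

```latex
\mathrm{FOM} = \frac{S}{\mathrm{FWHM}_{\gamma} + \mathrm{FWHM}_{n}}
```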
[Fig. 3]
LabVIEW Implementation of GMM-KNN
LabVIEW is a graphical programming language used mainly in the fields of control, measurement, and data acquisition [35]. The nuclear signal output from the detector was digitized and stored using the LabVIEW host computer program, and the signals were processed using the discrimination algorithms. To evaluate the real-time performance of the algorithm for data acquisition, we used the unified LabVIEW programming language. The LabVIEW implementation of KNN classification first presorted the training set, so that the distance values were completely independent of the index values, which simplified the judgment logic. The array operations in KNN regression allowed the regression values of all features to be calculated simultaneously, greatly simplifying the LabVIEW program. In the LabVIEW program for real-time n-γ discrimination, the unknown pulses were processed as arrays. In this study, the GMM-KNN classification and GMM-KNN regression algorithms were applied to the same test set for discrimination, and the output results were saved as a .csv file.
Energy division
The pulses in the test set were randomly collected in real experiments. The pulses were divided into three parts based on three energy ranges: 0–25 keV, 25–100 keV, and 100–2100 keV. Figure 4 shows the LabVIEW program diagram for pulse division based on energy. The case structure uses energy as the determinant, and the program counts the pulses in each energy range. Thereafter, by sorting the energies in ascending order and setting the index of the "Array Subset" VI based on the count values, we obtain three sub-arrays.
[Fig. 4]
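A Python stand-in for this partition (illustrative names; the LabVIEW program performs the same bookkeeping with a case structure, counts, and Array Subset):

```python
def split_by_energy(pulses, energies, edges=(25.0, 100.0, 2100.0)):
    """Partition pulses into the three energy ranges used in the text
    (0-25, 25-100, and 100-2100 keV); pulses above the last edge are
    dropped."""
    subsets = [[], [], []]
    for pulse, e in zip(pulses, energies):
        if e < edges[0]:
            subsets[0].append(pulse)
        elif e < edges[1]:
            subsets[1].append(pulse)
        elif e <= edges[2]:
            subsets[2].append(pulse)
    return subsets
```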
GMM-KNN classification algorithm
A program block diagram of the GMM-KNN classification is shown in Fig. 5. The GMM-KNN classification determines the category of a pulse in three steps:
[Fig. 5]
Step 1 - Training set composition: In the GMM-KNN algorithm, the training set produced by GMM clustering contained 26172 pulses (probability > 99%), and each test pulse calculated its distance to every pulse in the training set. Using the full training set directly is computationally intensive; therefore, we sorted the data within the training set by probability value and took one sample every three values. These samples formed a 2-D array training set Mc. The columns of Mc represent the two pulse features, Qtail and Qtotal, and each row represents a pulse. The first m rows of the 2-D array are gamma rays and the remaining k rows are neutrons. The expression for Mc is as follows:
When γ3 < n3, the 5 nearest neighbors contain at least three gamma rays, and the pulse is judged to be a gamma ray.
When n3 < γ3, the 5 nearest neighbors contain at least three neutrons, and the pulse is judged to be a neutron.
In summary, the third value of each sorted distance subset (γ3 and n3) served as the judgment value. When γ3 < n3, the pulse was classified as a gamma ray; conversely, when n3 < γ3, the pulse was classified as a neutron. In classification, the traditional KNN algorithm must perform five judgments and five counts, whereas the GMM-KNN classification requires only one judgment, which improves the algorithm efficiency and reduces its complexity.
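The single-judgment rule can be sketched as follows. With K = 5 and the distances to each class presorted, whichever class owns the smaller third-smallest distance necessarily supplies at least three of the five nearest neighbors, so one comparison replaces the five votes of conventional KNN (function name is illustrative):

```python
def classify_by_third_distance(gamma_dists, neutron_dists):
    """GMM-KNN classification with K = 5: compare the 3rd-smallest distance
    to the gamma subset (gamma_3) with the 3rd-smallest distance to the
    neutron subset (n_3); the class with the smaller value holds the
    majority of the 5 nearest neighbors."""
    g3 = sorted(gamma_dists)[2]    # gamma_3
    n3 = sorted(neutron_dists)[2]  # n_3
    return "gamma" if g3 < n3 else "neutron"
```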
GMM-KNN regression algorithm
GMM-KNN regression requires determining the K (K = 5) nearest neighbors and calculating the average of these five values. The algorithm does not divide the training set into two groups but requires the complete training set. Similar to the GMM-KNN classification, the implementation of the GMM-KNN regression in LabVIEW involves three steps (Fig. 6):
[Fig. 6]
Step 1: The distance values between the test pulse and every pulse in the training set were calculated.
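Assuming the remaining steps are the nearest-neighbor averaging described above, the regression can be sketched as follows (illustrative names; training pulses are (Qtail, Qtotal) pairs and K = 5):

```python
def knn_regress(train, query, k=5):
    """GMM-KNN regression sketch: average Qtail and Qtotal over the k
    nearest training pulses, then form the regressed PSD ratio."""
    nearest = sorted(train, key=lambda p: (p[0] - query[0]) ** 2
                                          + (p[1] - query[1]) ** 2)[:k]
    q_tail = sum(p[0] for p in nearest) / k
    q_total = sum(p[1] for p in nearest) / k
    return q_tail, q_total, q_tail / q_total
```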
Results and discussion
The neutron source used in this experiment was an 241Am-Be source, the detector was an organic liquid scintillation detector (EJ-301) [36, 37], and the digitizer was a DT5730B. The detector collected current pulses, which were digitized to obtain raw data. After preprocessing steps, such as smoothing, filtering, normalization, and baseline recovery, the raw data were transformed into initial data, which were stored in the computer. The preprocessed dataset of 60,000 pulses was divided into two parts: 30,000 pulses were used for GMM clustering to obtain a reliable training set, and the remaining 30,000 pulses were reserved to test the feasibility of the GMM-KNN algorithm.
GMM Clustering
Depending on whether the probability value exceeded 50%, we classified the GMM clustering results into two categories. Figure 7(a) shows the results of direct clustering, where the red squares and blue circles represent neutrons and gamma rays, respectively. There were significant errors in the pulse discrimination, with numerous neutrons misclassified as gamma rays. Figure 7(b) shows the results of small-batch clustering in three energy ranges, where only a few ambiguous pulses were observed. Small-batch clustering reduced the misclassification rate and improved the effectiveness of n-γ discrimination.
[Fig. 7]
Although small-batch clustering yielded better results than direct clustering, Fig. 7(b) exhibits red downward protrusions at the energy segmentation boundaries (25 keV and 100 keV). To obtain a reliable training set, these ambiguous pulses must be removed. Within 0–25 keV, we progressively raised the probability threshold for pulses admitted into the training set (Fig. 8). Figures 8(a)-(c) correspond to all pulses, pulses with a probability above 90%, and pulses with a probability above 99%, respectively. Figure 8 demonstrates that a higher probability threshold results in better separation of the pulses in the training set; however, it also decreases the number of pulses in the 0–25 keV range.
[Fig. 8]
Table 1 presents the quantity distributions in the training set for different probability thresholds. At 25–2100 keV, there was minimal variation in the number of pulses; most of the ambiguous pulses were concentrated at 0–25 keV. As the probability threshold increased, the number of pulses in 0–25 keV decreased significantly. When the training set comprised pulses with a probability exceeding 99.9%, the number of pulses in the 0–25 keV range was even lower than that in the 25–2100 keV range. In addition, at this threshold, the PSD histogram of the test set exhibited three distinct peaks and poor fitting performance (Fig. 9). Therefore, this study used a training set composed of pulses with a probability exceeding 99%.
| Energy (keV) | Original dataset | Probability > 90% | > 95% | > 99% | > 99.9% |
|---|---|---|---|---|---|
| 0–25 | 12570 | 11002 | 10442 | 9177 | 7074 |
| 25–100 | 9576 | 9422 | 9357 | 9170 | 8867 |
| 100–2100 | 7854 | 7826 | 7810 | 7755 | 7671 |
[Fig. 9]
Figure 10 shows the training set comprising pulses with a probability above 99%. In this case, the ambiguous samples at the energy segmentation boundaries with PSD values below the threshold (PSDthreshold) were removed. At this stage, the neutrons and gamma rays were completely separated, and the GMM-KNN algorithm only needed to select the remaining pulses from this portion to construct the training set. Once the training set was constructed, GMM-KNN could flexibly select a subset of data from it and implement n-γ discrimination using the LabVIEW program.
[Fig. 10]
GMM-KNN Classification
In the context of real-time n-γ discrimination, the algorithmic efficiency is a crucial factor that is influenced by both the size of the training set and the number of pulse features.
This study investigated the time consumption and error rates of the GMM-KNN algorithm by employing average sampling to select subsets of the complete training set with proportions of 1/2, 1/3, 1/4, 1/5, and 1/6 (Table 2). By comparing the time consumption and error rates of differently sized training sets, it was observed that when only 1/4 of the complete dataset was used, the classifier's time consumption was reduced to approximately 1/4 of the original, resulting in an average execution time per pulse of only 67 μs compared with 294.27 μs for the complete training set. The discrimination results of the reduced dataset differed from those of the complete training set by only 0.13% (41 pulses, 21 of which fell in the 0–25 keV range). This sampling method significantly reduced the computational cost while ensuring reliable discrimination results, facilitating flexible selection of the training set based on the experimental latency requirements.
| Sample size | 1/1 | 1/2 | 1/3 | 1/4 | 1/5 | 1/6 |
|---|---|---|---|---|---|---|
| Total time (ms) | 8828 | 4131 | 2712 | 2021 | 1648 | 1291 |
| Error rate (%) | 0.000 | 0.080 | 0.126 | 0.130 | 0.196 | 0.240 |
KNN algorithms typically use the nonzero-amplitude portion of the pulse as feature points, which in this study encompassed 64 points. Table 3 shows that the time consumption of this 64-point KNN algorithm was 4325 ms in the 0–25 keV range. In contrast, GMM-KNN employed only two pulse features, Qtail and Qtotal, reducing the execution time to 2021 ms for the same test volume. The program was parallelized, and the selection of Qtail and Qtotal significantly improved the processing speed. Notably, even with 64 feature points, the execution time remained at 4325 ms, indicating that the algorithm is not very demanding in terms of the number of pulse features and can adapt to different pulse types and experimental requirements.
| Method | Energy (keV) | Time (ms) |
|---|---|---|
| GMM-KNN | 0–25 | 2021 |
| GMM-KNN | 25–100 | 1791 |
| GMM-KNN | 100–2100 | 1446 |
| 64-point KNN | 0–25 | 4325 |
Imbalanced classification is a significant challenge for classification algorithms. In this study, we applied GMM clustering to partition a test dataset containing 10,000 gamma rays and 10,000 neutrons into 20 subsets, each comprising 1,000 pulses. We varied the gamma/neutron (γ/n) ratio from 10:1 to 10:10 and from 10:10 to 1:10, yielding 19 different ratios. We compared the results of the GMM-KNN classifier with the prior knowledge to obtain the error rate and average execution time per pulse for each ratio, as shown in Fig. 11. All seven ratios from 10:8 to 6:10 exhibited error rates below 5%. The lowest error rate of 1.01% (1.23%) and an average execution time per pulse of 301.84 μs (289.60 μs) were achieved at a ratio of 9:10 (10:9), which is consistent with the γ/n ratio of approximately 9:10 (12,542 gamma rays to 13,619 neutrons) observed in the training set. The GMM-KNN classifier demonstrated excellent real-time performance, strong adaptability, and robustness. Thus, it exhibits great potential for real-time discrimination and can be utilized for onsite analysis, serving as a reference for offline n-γ discrimination analysis.
[Fig. 11]
Using the complete training set, the GMM-KNN performed the classification task simultaneously in three different energy ranges (0–25 keV, 25–100 keV, and 100–2100 keV). The discrimination results in these three energy ranges were concatenated to obtain a discrimination result across the complete energy range (as shown in Fig. 12). The classification exhibited a small error at the energy boundaries, which is consistent with the GMM clustering results. This indicated that the test results accurately reflected the distribution of the data in the training set. In contrast to the CCM, which judges the pulse categories using the threshold PSDthreshold, the results of the GMM-KNN classification were more consistent with the Gaussian mixture distributions of the two components. In the low-energy range, an overlap between neutrons and gamma rays was observed, and we could not directly observe the classification results of these two pulse types in the low-energy region in Fig. 12. Only by comparing the classification effects of CCM and GMM-KNN in the feature space containing Qtail and Qtotal can we effectively evaluate their classification performances.
[Fig. 12]
Using Qtail, Qtotal, and energy as the X, Y, and Z axes, respectively, we obtained a three-dimensional (3-D) plot of the classification results in the feature space (Fig. 13). The red cubes and blue spheres represent neutrons and gamma rays, respectively. Figure 13(a) shows the classification results of the CCM, and Fig. 13(b) shows those of GMM-KNN. For the majority of pulses, the neutrons and gamma rays exhibited distinct cone-shaped distributions, which rendered them easy to distinguish. However, for certain lower-energy pulses, there was no clear boundary between the neutrons and gamma rays (indicated by the flattened red region in Fig. 13), resulting in their mixture and making differentiation difficult. To further compare the classification results of the CCM and GMM-KNN in the low-energy range, we focused on the performance of the two classification algorithms in the feature space.
[Fig. 13]
Figure 14 shows the projection of the 3-D visualization in Fig. 13 onto the X-Y plane. In the 2-D feature space, we can observe the classification results of the CCM (left) and GMM-KNN (right). The two types of pulses exhibited approximately elliptical distributions, with blue representing gamma rays and red representing neutrons. The CCM determined the pulse categories based on the threshold PSDthreshold: it constructed a histogram of the pulse PSD values and fitted it with Gaussian distributions, thereby fitting separate peaks for neutrons and gamma rays. The midpoint between the two peaks was taken as the threshold PSDthreshold, which served as the criterion for distinguishing between neutrons and gamma rays. In Fig. 14, PSDthreshold is represented by a simple straight line (the equation of the line is Qtail = PSDthreshold × Qtotal).
[Fig. 14]
For the portion of pulses that is difficult to distinguish for both methods (the flattened red region in the 3-D visualization), both the CCM and GMM-KNN tended to classify these ambiguous pulses as neutron pulses. This is because gamma rays exhibit a more concentrated distribution, resulting in a higher peak in the PSD histogram corresponding to gamma rays. As the PSDthreshold moved towards the gamma peak, more pulses are classified as neutrons. In the case of the GMM-KNN classification method, the training set was also generated by the GMM, resulting in classification preferences similar to those of the CCM.
A quantitative comparison of the classification results of CCM and GMM-KNN revealed that GMM-KNN improved the n-γ discrimination. For quantitative analysis, we used the GMM-KNN regression algorithm.
GMM-KNN regression
To obtain a more generalizable metric, GMM-KNN employs regression to compute a quantifiable FOM value. Because the difference between neutron and gamma-ray pulses increases with energy, the FOM values vary across the three energy ranges.
For the CCM, we created histograms of the PSD (Fig. 15). Figures 15(a), (b), and (c) correspond to the test results within 0–25 keV, 25–100 keV, and 100–2100 keV, respectively. Figure 15(a) corresponds to the lowest range, 0–25 keV, where the two Gaussian-fitted peaks were least separated; in this energy range, neutrons and gamma rays were the most difficult to discriminate, and the number of misclassified pulses was the highest. As the energy increased, the difference between the distributions of the two types of pulses increased, and the separation of the two Gaussian-fitted peaks increased, as shown in Fig. 15(b) and (c); pulses in these ranges can be easily distinguished. Thus, the CCM is a simple and effective discrimination method in the energy range of 25–2100 keV and can discriminate high-energy pulses well. However, at lower energies, there are numerous ambiguous pulses, and the ambiguity increases as the energy decreases.
[Fig. 15]
For the GMM-KNN regression, we generated PSD histograms (Fig. 16). Figures 16(a), (b), and (c) correspond to the test results within 0–25 keV, 25–100 keV, and 100–2100 keV, respectively. As shown in Fig. 16(a), for GMM-KNN, the separation of the two Gaussian-fitted peaks increased, and the FOM value greatly improved compared with that of the CCM; the discrimination ability of GMM-KNN was significantly better than that of the CCM in the 0–25 keV range. In the higher energy ranges (25–100 keV and 100–2100 keV), however, the FOM values improved less, and the separation of the two Gaussian-fitted peaks became only slightly larger. As the effectiveness of n-γ discrimination improved, we observed that the PSD values were highly concentrated in certain bins. This is because the training set did not follow a standard Gaussian distribution but contained several small protrusions of varying heights; after the KNN regression, pulses in certain intervals became more concentrated, leading to more pronounced peaks in the histogram.
[Fig. 16]
The results of the CCM and GMM-KNN regression are presented in Table 4. In the 0–25 keV range, neither method could completely separate the two types of pulses. However, the FOM of the GMM-KNN method improved by 32.08%, thereby significantly enhancing the discrimination between neutrons and gamma rays. This facilitates a further reduction in the energy threshold for discriminable pulses, implying that neutrons and gamma rays can be distinguished over a larger energy range. In addition, the improvement in FOM by GMM-KNN decreased as the energy increased; at 25–2100 keV, the CCM could already achieve basic separation between neutrons and gamma rays, diminishing the additional benefit of the machine learning method.
Table 4. FOM values of the CCM and GMM-KNN in the three energy ranges

| Method | 0–25 keV | 25–100 keV | 100–2100 keV |
|---|---|---|---|
| CCM | 0.664 | 1.053 | 0.967 |
| GMM-KNN | 0.877 | 1.262 | 1.020 |
| Increase rate | 32.08% | 19.85% | 5.48% |
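The FOM values above follow the standard definition: the separation of the neutron and gamma Gaussian-fit peak centroids divided by the sum of the two FWHMs. A minimal sketch, using illustrative fit parameters rather than the actual fits behind Table 4:

```python
import math

def figure_of_merit(mu_gamma, sigma_gamma, mu_neutron, sigma_neutron):
    """FOM = |peak separation| / (FWHM_gamma + FWHM_neutron).

    For a Gaussian, FWHM = 2*sqrt(2*ln 2)*sigma ≈ 2.355*sigma.
    """
    fwhm_factor = 2.0 * math.sqrt(2.0 * math.log(2.0))
    separation = abs(mu_neutron - mu_gamma)
    return separation / (fwhm_factor * (sigma_gamma + sigma_neutron))

# Hypothetical double-Gaussian fit parameters for a PSD histogram
# (not the paper's values):
print(round(figure_of_merit(0.20, 0.03, 0.35, 0.04), 3))  # 0.91
```

A FOM above roughly 1 is conventionally taken to indicate usable separation, which is consistent with the CCM performing adequately at 25–2100 keV but poorly below 25 keV.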
Conclusion
We designed a new intelligent discrimination algorithm called GMM-KNN. This method constructs a training set from unlabeled data and achieves n-γ discrimination of unknown pulses. GMM-KNN selects Qtail and Qtotal as pulse features, which reduces the dimensionality and allows samples to be selected flexibly from the training set, significantly reducing the algorithm complexity. KNN classification and regression were implemented in LabVIEW; the LabVIEW program can execute in parallel, although the memory consumption of the arrays must be strictly controlled.

We improved the KNN algorithm, particularly the KNN classification, which significantly increased the running speed: compared with the conventional KNN algorithm, the GMM-KNN classifier required only half the time to test the same dataset. Moreover, GMM-KNN can flexibly choose the number of training samples based on the specific experimental delay requirements. When only a quarter of the sample set was used, the GMM-KNN classifier required only approximately one quarter of the time (2021 ms) needed for the full dataset, whereas the discrimination results differed by only 0.13% (41 pulses). The method also maintained stable performance over a wide range of gamma/neutron ratios, rendering it suitable for different experimental data.

Before running the GMM-KNN classifier, we presorted the two types of pulse samples in the training set and executed the LabVIEW calculation in parallel, reducing the judgment logic to a comparison of two judgment values to determine the pulse category. The GMM-KNN regression first calculates the distance values and then combines them with the original pulse features into a 2-D array, which allows averaging over only the K nearest values rather than over all pulse regression values.
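The regression step described above can be sketched as follows. This is a schematic NumPy re-implementation under stated assumptions (two features per pulse, Euclidean distance, a simple average of the K nearest training labels), not the authors' LabVIEW code.

```python
import numpy as np

def knn_regress(train_feats, train_labels, query, k=5):
    """Average the labels of the K nearest training pulses.

    `train_feats`: (N, 2) array of (Qtail, Qtotal) features;
    `train_labels`: (N,) array (e.g. 0 for gamma, 1 for neutron);
    `query`: (2,) feature vector of an unknown pulse.
    """
    d = np.linalg.norm(train_feats - query, axis=1)  # Euclidean distances
    nearest = np.argsort(d)[:k]                      # indices of the K nearest
    return float(np.mean(train_labels[nearest]))

rng = np.random.default_rng(0)
# Two synthetic clusters in (Qtail, Qtotal) space, standing in for the
# gamma and neutron populations found by the GMM:
gammas = rng.normal([0.2, 1.0], 0.02, size=(50, 2))
neutrons = rng.normal([0.4, 1.0], 0.02, size=(50, 2))
feats = np.vstack([gammas, neutrons])
labels = np.r_[np.zeros(50), np.ones(50)]
print(knn_regress(feats, labels, np.array([0.39, 1.0])))  # near 1.0 (neutron-like)
```

Presorting the training samples by class, as described above, lets an implementation compare two per-class counts (or partial sums) instead of sorting the full distance array, which is the source of the reported speed-up.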
To evaluate the n-γ discrimination effect of the GMM-KNN algorithm, we qualitatively analyzed the scatter distribution and quantitatively calculated the FOM values. In the feature space, the GMM-KNN classification could better fit near-elliptical distributions, correctly classifying approximately 5.52% of the gamma rays. The FOM values for the GMM-KNN regression in the three energy ranges were 0.877, 1.262, and 1.020, respectively. Compared with the CCM, this method exhibited higher discrimination factors in each energy range, particularly in the low-energy domain (<25 keV), where the FOM improved by 32.08%.
References

Distance metrics for digital pulse-shape discrimination of scintillator detectors. Radiat. Phys. Chem. 156, 205-209 (2019). https://doi.org/10.1016/j.radphyschem.2018.11.014
Pulse shape discrimination in inorganic and organic scintillators. I. Nucl. Instrum. Methods 95, 141-153 (1971). https://doi.org/10.1016/0029-554X(71)90054-1
An investigation of the digital discrimination of neutrons and γ rays with organic scintillation detectors using an artificial neural network. Nucl. Instrum. Methods Phys. Res. A 607 (2009). https://doi.org/10.1016/j.nima.2009.06.027
Digital n/γ discrimination measurement of low intensity pulsed neutron. Nucl. Tech. 38,
Characterization of the new scintillator Cs2LiYCl6:Ce3+. Nucl. Sci. Tech. 29, 11 (2018). https://doi.org/10.1007/s41365-017-0342-4
Fast pulse sampling module for real-time neutron–gamma discrimination. Nucl. Sci. Tech. 30, 84 (2019). https://doi.org/10.1007/s41365-019-0595-1
A versatile pulse shape discriminator for charged particle separation and its application to fast neutron time-of-flight spectroscopy. Nucl. Instrum. Methods 156, 459-476 (1978). https://doi.org/10.1016/0029-554X(78)90746-2
Particle identification via pulse-shape discrimination with a charge-integrating ADC. Nucl. Instrum. Methods A 263, 441-445 (1988). https://doi.org/10.1016/0168-9002(88)90984-9
Machine learning for digital pulse shape discrimination. IEEE Nucl. Sci. Symp. Med. Imaging Conf. Rec. 89, 1-4 (2012). https://doi.org/10.1109/NSSMIC.2012.6551092
Artificial neural network algorithms for pulse shape discrimination and recovery of piled-up pulses in organic scintillators. Ann. Nucl. Energy 120, 410-421 (2018). https://doi.org/10.1016/j.anucene.2018.05.054
An artificial neural network based neutron–gamma discrimination and pile-up rejection framework for the BC-501 liquid scintillation detector. Nucl. Instrum. Methods A 610, 534-539 (2009). https://doi.org/10.1016/j.nima.2009.08.064
Pulse pileup rejection methods using a two-component Gaussian Mixture Model for fast neutron detection with pulse shape discriminating scintillator. Nucl. Instrum. Methods A 988,
Study on neutron–gamma discrimination method based on the KPCA-GMM. Nucl. Instrum. Methods A 1056,
Gaussian mixture models as automated particle classifiers for fast neutron detectors. Stat. Anal. Data Min. 12, 479-488 (2019). https://doi.org/10.1002/sam.11432
Study on neutron–gamma discrimination method based on the KPCA-GMM-ANN. Radiat. Phys. Chem. 203,
A comparison of small-batch clustering and charge-comparison methods for n/γ discrimination using a liquid scintillation detector. Nucl. Instrum. Methods A 1028,
K-Nearest Neighbors regression for the discrimination of gamma rays and neutrons in organic scintillators. Nucl. Instrum. Methods A 987,
Comparing Machine Learning Classifiers for Object-Based Land Cover Classification Using Very High Resolution Imagery. Remote Sens. 7, 153-168 (2015). https://doi.org/10.3390/rs70100153
Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recogn. 39, 417-423 (2006). https://doi.org/10.1016/j.patcog.2005.08.009
Neutron-gamma discrimination method based on blind source separation and machine learning. Nucl. Sci. Tech. 32, 18 (2021). https://doi.org/10.1007/s41365-021-00850-w
Neutron/gamma discrimination based on the support vector machine method. Nucl. Instrum. Methods A 777, 80-84 (2015). https://doi.org/10.1016/j.nima.2014.12.087
Performance of linear classification algorithms on n/γ discrimination for LaBr3:Ce scintillation detectors with various pulse digitizer properties. J. Instrum. 15,
FPGA Code for the Data Acquisition and Real-Time Processing Prototype of the ITER Radial Neutron Camera. IEEE Trans. Nucl. Sci. 66, 1318-1323 (2019). https://doi.org/10.1109/TNS.2019.2903646
The Design and Performance of the Real-Time Software Architecture for the ITER Radial Neutron Camera. IEEE Trans. Nucl. Sci. 32, 1310-1317 (2019). https://doi.org/10.1109/TNS.2019.2907056
New FPGA based hardware implementation for JET gamma-ray camera upgrade. Fusion Eng. Des. 128, 188-192 (2018). https://doi.org/10.1016/j.fusengdes.2018.02.038
Pulse discrimination with a Gaussian mixture model on an FPGA. Nucl. Instrum. Methods A 900, 1-7 (2018). https://doi.org/10.1016/j.nima.2018.05.039
Gaussian mixture models as automated particle classifiers for fast neutron detectors. Stat. Anal. Data Min. 12, 479-488 (2019). https://doi.org/10.1002/sam.11432
Studies on unfolding energy spectra of neutrons using maximum-likelihood expectation–maximization method. Nucl. Sci. Tech. 30, 134 (2019). https://doi.org/10.1007/s41365-019-0662-7
Maximum Likelihood from Incomplete Data Via the EM Algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 39, 1-22 (1977). https://doi.org/10.1111/j.2517-6161.1977.tb01600.x
Simulation of n-γ pulse signal discrimination based on KNN classification algorithm. Electronic Measurement Technology 45, 164-170 (2022). https://doi.org/10.19651/j.cnki.emt.2209025
Efficient kNN Classification With Different Numbers of Nearest Neighbors. IEEE Trans. Neural Netw. Learn. Syst. 29, 1774-1785 (2018). https://doi.org/10.1109/TNNLS.2017.2673241
Neighbourhood components analysis. Adv. Neural Inf. Process. Syst. 17, 513-520 (2005). https://doi.org/10.1109/TCSVT.2013.2242640
Development of a high-speed digital pulse signal acquisition and processing system based on MTCA for liquid scintillator neutron detector on EAST. Nucl. Sci. Tech. 34, 150 (2023). https://doi.org/10.1007/s41365-023-01318-9
Pulse shape discrimination and energy calibration of EJ301 liquid scintillation detector. Nucl. Tech. 38,
Neutron detection in a high gamma-ray background with EJ-301 and EJ-309 liquid scintillators. Nucl. Instrum. Methods A 690, 96-101 (2012). https://doi.org/10.1016/j.nima.2012.06.047
Investigation of fusion neutron diagnostic technology on EAST device.

The authors declare that they have no competing interests.