The study of intelligent algorithm in particle identification of heavy-ion collisions at low and intermediate energies

NUCLEAR PHYSICS AND INTERDISCIPLINARY RESEARCH


Gao-Yi Cheng, Qian-Min Su, Xi-Guang Cao, Guo-Qiang Zhang

Nuclear Science and Techniques

Vol.35, No.2

Article number 33

Published in print Feb 2024

Available online 28 Mar 2024

DOI: 10.1007/s41365-024-01388-3


Traditional particle identification methods for heavy-ion collisions at low and intermediate energies are time-consuming, experience-dependent, and poorly repeatable, and researchers urgently need alternatives. This study explores the possibility of applying intelligent learning algorithms to particle identification in heavy-ion collisions at low and intermediate energies. Multiple intelligent algorithms, including XgBoost and TabNet, were tested on datasets from the neutron ion multi-detector for reaction-oriented dynamics (NIMROD–ISiS) and from Geant4 simulation. Tree-based machine learning algorithms and deep learning algorithms such as TabNet show excellent performance and generalization ability. Adding data features besides energy deposition can improve the algorithm’s identification ability when the data distribution is nonuniform. Intelligent learning algorithms can be applied to solve the particle identification problem in heavy-ion collisions at low and intermediate energies.

Keywords: Heavy-ion collisions at low and intermediate energies; Machine learning; Ensemble learning algorithm; Particle identification; Data imbalance

Introduction

Intelligent algorithms play crucial roles in nuclear physics. Challenges in nuclear physics experiments include high complexity, extensive data, time-consuming experiments, and intricate models. Taking particle collision experiments as an example, millions of terabytes of data are generated daily for heavy-ion collisions at high energies. Therefore, the extraction of useful information from complex experimental data has become an enormous challenge.

Large-scale experiments such as ATLAS, ALICE, and CMS have already applied machine-learning and deep-learning algorithms [1-4] to analyze and process experimental data. Typical examples include research on the particle–track reconstruction problem [5-8] in high-energy physics experiments, and on data analysis and pattern recognition for the Higgs boson [9-13]. Applications of machine learning in particle physics are collected in a large-scale living review [https://iml-wg.github.io/HEPML-LivingReview/] and on the website maintained by the ML Physics Portal [14-17].

Currently, research on intelligent algorithms in nuclear physics experiments [18-21] focuses on data analysis for quantities such as the masses of atomic nuclei [22-26], nuclear charge radii [27-31], decay half-lives [32-37], critical reaction thresholds [38], and spallation reaction cross-sections [39]. In addition to using machine learning algorithms to investigate various physical issues [40-42], researchers have used these algorithms to analyze experimental data [43-45]. This involves tasks such as particle trajectory reconstruction, vertex reconstruction [46], and particle identification in nuclear reactions. Advancements in experimental equipment and related technologies have facilitated the integration of machine learning and nuclear physics.

Current research on particle identification is concentrated in high-energy particle physics, where it mainly addresses identifying particle types [47] and separating rare particles from background signals. The data and algorithms used for particle identification depend on the type of detector. For example, the output data of a calorimeter can be processed and converted into matrix data, so algorithms such as CNN and GNN can be used for processing. The research and applications of machine learning in particle identification have mainly focused on LHC detectors, such as calorimeters [48-50] and Cherenkov detectors [51]. Moreover, a recent research focus in LHC experiments has been the development of new detector software and hardware based on machine-learning and deep-learning algorithms [52].

Compared with other nuclear reactions, the particles generated in heavy-ion collisions at low and intermediate energies are of various types and have complex energy distributions. Numerous fragments have similar charges and masses. Experiments on heavy-ion collisions depend on the energy resolution of the detector and require a detection array with large solid-angle coverage. Therefore, the identification of dozens or even hundreds of reaction products from independent detection units is challenging. Traditional particle identification methods include telescope [53], time-of-flight [54], magnetic spectrometer, Bragg spectroscopy, and pulse shape discrimination methods. These methods are often combined to improve identification ability, especially for heavy fragments with minor differences in charge and mass numbers between adjacent fragments. The performance of the traditional methods for heavier particles is hindered by their dependence on experience, poor repeatability, and time consumption. The precise identification of charge and mass numbers is fundamental to all research related to heavy-ion collisions, and is a very powerful method for studying exotic nuclear configurations [55-60]. Compared with particle identification in particle physics, the wide variety and slight differences in the charge and mass numbers of charged particles produced in heavy-ion reactions pose significant challenges for existing particle identification methods. Therefore, the development of a universal, efficient, and high-precision particle identification method based on machine learning techniques will significantly boost the study of heavy-ion collisions.

Parker et al. [61] devised a 5-layer neural network and evaluated its performance on the 22nd and 23rd detectors of the neutron ion multi-detector for reaction-oriented dynamics (NIMROD–ISiS). We also used a dataset from the NIMROD–ISiS detector array. This study aimed to identify the particle charge and mass numbers in heavy-ion collisions at low and intermediate energies. Supervised learning algorithms were used to train particle identification models based on ΔE-E energy deposits from telescope (or super telescope) detectors in heavy-ion collisions. Machine learning and deep learning algorithms were applied to identify the particles’ charge and mass numbers, and their capabilities were compared.

Dataset and methods

Real-world data (RWD) come from experiments on heavy-ion collisions at low and intermediate energies carried out at the Cyclotron Institute of Texas A&M University and consist of reaction products detected by the NIMROD–ISiS array [62, 63]. The NIMROD–ISiS detector array comprises 14 rings. Experimental data were obtained from 143 detectors, including 124 telescope detectors and 19 super telescope detectors, with ring numbers ranging from 2 to 15. The detection system includes Si detectors and CsI(Tl) scintillators covering angles from 3.6° to 167.0°. The back half of NIMROD (90.0°–167.0°) consists of half of the Indiana Silicon Sphere. Si detectors are combined with CsI detectors as ‘telescopes’, while some modules are equipped with two Si detectors in tandem, known as ‘super telescopes’ (3.6°–45°), which enhances the ability to identify the mass numbers of heavier fragments. Ionization chambers can also be placed in front of the Si detectors. Figure 1 shows the structure of NIMROD–ISiS.

Fig. 1

Schematic diagram of the NIMROD–ISiS detector array layout, from Texas A&M NIMROD–ISiS official website (https://cyclotron.tamu.edu/nimrod/)

In addition to the dataset from Texas A&M University, Geant4 [64] was used to simulate heavy-ion collisions at intermediate energies. The QMD model with G4IonQMDPhysics was used as an event generator to simulate the reaction process of a beam incident on a target. The detection processes in Geant4 include electromagnetic interactions (G4EmStandardPhysics), energy transfer and loss (G4EmExtraPhysics and G4StoppingPhysics), decay processes (G4DecayPhysics and G4RadioactiveDecayPhysics), and elastic and inelastic scattering (G4HadronHElasticPhysics, G4HadronPhysicsINCLXX, and G4IonElasticPhysics). The simulations involved collisions of ²⁸Si with an energy of 50 MeV/u and ¹²C particles in vacuum. The detector system consisted of four super telescope detectors. The simulation generated a dataset with more than four million particles. Figure 2 shows the structure of the detector system and the ΔE-E two-dimensional histogram.

Fig. 2

(Color online) The structure of the super telescope detector used in the Geant4 simulation and the ΔE-E two-dimensional histogram from Geant4 simulation

This study covers several common machine learning algorithms, such as Support Vector Machines (SVM), Logistic Regression (LR), and Bayesian classifiers. Ensemble learning algorithms based on tree structures and TabNet, a deep-learning algorithm, were also used. The algorithms used in this study are briefly described below.

(a) MLP

The multi-layer perceptron (MLP) is a feed-forward neural network composed of multiple layers of neurons, and is the basis and prototype of many artificial and deep learning neural networks.

(b) Random Forest

Random forest is an early tree-based ensemble learning algorithm that combines multiple decision trees [65]. It offers the advantages of both decision trees and ensemble learning, including strong robustness and predictive ability.

(c) XgBoost

XgBoost is a tree-based ensemble learning algorithm proposed in 2016 [66] that is widely used in data mining, natural language processing, image recognition, and other fields. In general, XgBoost is a machine-learning algorithm with high efficiency, accuracy, flexibility, explainability, and scalability.

(d) LightGBM

LightGBM, a tree-based gradient boosting framework for ensemble learning, has been widely used in various applications [67]. Built on the gradient-boosting decision tree (GBDT) algorithm, LightGBM incorporates advanced techniques such as gradient-based one-side sampling (GOSS) and histogram-based acceleration. These optimizations enable faster training and lower memory consumption, making LightGBM an efficient and practical choice for machine learning tasks.

(e) CatBoost

CatBoost is a tree-based ensemble-learning algorithm developed by Yandex [68]. Compared with XgBoost and LightGBM, CatBoost handles categorical features and feature scaling automatically when building decision trees, without additional data processing. CatBoost adopts the same gradient-based splitting and greedy feature selection strategies as XgBoost. It also handles missing values automatically, without additional data padding, and has a certain robustness to noise and outliers.

Boosting-based ensemble learning algorithms such as XgBoost, LightGBM, and CatBoost are widely used in various fields. The basic process of these algorithms involves training multiple weak learners, assigning weights to training samples, and iteratively adjusting these weights based on the learner’s performance. This iterative process aims to create a powerful ensemble model that is capable of accurate classification. Figure 3 depicts the underlying structure of ensemble learning algorithms that employ the boosting method.

Fig. 3

(Color online) The structure of ensemble learning and boosting method
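The sample-reweighting loop described above can be written compactly for the simplest case. The block below is a sketch of the classical binary AdaBoost update, assuming labels $y_i \in \{-1, +1\}$, weak learners $h_m$, and sample weights $w_i$ (all symbols are introduced here for illustration and do not appear in the original text). Gradient-boosting libraries such as XgBoost, LightGBM, and CatBoost instead fit each new tree to the gradient of a loss function, so this is a generic illustration of boosting, not the update used by those libraries.

```latex
% Weighted error and vote of the m-th weak learner
\epsilon_m = \frac{\sum_i w_i \,\mathbb{1}[h_m(x_i) \neq y_i]}{\sum_i w_i},
\qquad
\alpha_m = \tfrac{1}{2} \ln\frac{1 - \epsilon_m}{\epsilon_m}

% Misclassified samples gain weight, correctly classified samples lose weight
w_i \leftarrow w_i \exp\!\bigl(-\alpha_m \, y_i \, h_m(x_i)\bigr)

% Final ensemble: weighted vote of all weak learners
F(x) = \operatorname{sign}\Bigl(\sum_m \alpha_m h_m(x)\Bigr)
```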

(f) TabNet

TabNet, which was introduced by Google in 2019, is a neural network structure explicitly designed for classification, prediction, and regression tasks involving tabular data [69]. Unlike traditional decision-tree-based machine learning algorithms, TabNet minimizes the need for preprocessing input data and can automatically learn the interdependencies among input features. It incorporates an attentive transformer that uses an attention mechanism to select relevant features dynamically. Since its inception, TabNet has been widely adopted in various applications involving tabular data [70, 71].

Figure 4 illustrates the procedure for applying the intelligent algorithms in this study. Training a classification model typically involves several steps.

Fig. 4

(Color online) The procedure of applying intelligent algorithms in this paper. The process is divided into three parts: datasets and algorithms, model training and testing, and evaluation of test results. Model training and testing is the most important part

(a) Data acquisition: Obtaining a dataset containing information about particle charge and mass, which can be from experimental or simulated data.

(b) Data preprocessing: Ensuring the quality and consistency of the data through noise removal, addressing missing data, and normalizing features.

(c) Data splitting: Dividing the dataset into training and testing sets. The training set is used to train the model and the test set is used to evaluate the trained model. Random and stratified sampling are commonly used methods.

(d) Feature engineering: Raw data is transformed, extracted, and selected to create informative and expressive feature sets.

(e) Algorithm selection: Suitable algorithms are chosen based on specific task requirements. The main task here is multi-class classification.

(f) Training and tuning of parameters: The algorithm’s parameters can be tuned to improve the performance of the model further. Generally, each algorithm has a unique set of parameters that can be adjusted.

(g) Model evaluation: The performance of the trained model is assessed using the testing dataset. Appropriate evaluation metrics are selected to evaluate the performance of the model for particle identification.
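Steps (a) to (g) can be sketched end to end with scikit-learn. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the toy data (three species with different mean ΔE and E deposits), the parameter values, and all variable names are hypothetical, and a random forest stands in for the other algorithms.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# (a) Hypothetical toy data: three species with different mean (dE, E) deposits.
centers = np.array([[2.0, 20.0], [6.0, 40.0], [12.0, 60.0]])
y = np.repeat(np.arange(3), 200)
X = centers[y] + rng.normal(scale=1.0, size=(600, 2))

# (c) Stratified split keeps the class proportions in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# (e)-(f) Choose an algorithm and set a few of its tunable parameters.
model = RandomForestClassifier(n_estimators=50, max_depth=6, random_state=0)
model.fit(X_train, y_train)

# (g) Evaluate with a macro-averaged metric on the held-out set.
y_pred = model.predict(X_test)
macro_f1 = f1_score(y_test, y_pred, average="macro")
```

Steps (b) and (d) are omitted here because the toy features need no cleaning or transformation; with experimental data they would sit between data acquisition and the split.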

Based on the structure of NIMROD–ISiS, the dataset was initially split based on the ring number determined by the forward angle of the detector. Subsequently, the data were divided into two categories: telescope and supertelescope detectors. The Geant4 dataset was used for training and testing with machine learning and deep learning algorithms. After identifying the optimal algorithms, a subset of the detector data was used to evaluate the generalization ability of the algorithms.

Two classification strategies are adopted in this study.

(a) Training and testing separate models for the charge number and the mass number, independently of each other.

(b) In classifying particle mass numbers, the particle’s charge number was included as a part of the data features. From a logical perspective, this strategy is similar to traditional particle identification methods.
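The two strategies can be contrasted in a short sketch. The example assumes a toy energy-loss feature, a scikit-learn decision tree as a stand-in classifier, and that the predicted charge number is fed to the mass model at prediction time; none of these choices is confirmed by the text, and all names are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300
Z = rng.integers(1, 4, size=n)            # hypothetical charge numbers 1..3
A = 2 * Z + rng.integers(0, 2, size=n)    # hypothetical mass numbers
E = rng.uniform(20.0, 60.0, size=n)       # residual energy, arbitrary units
dE = A * Z**2 / E + rng.normal(scale=0.01, size=n)  # toy energy-loss feature
X = np.column_stack([dE, E])

# Strategy (a): independent models for Z and A on the same features.
z_model = DecisionTreeClassifier(random_state=0).fit(X, Z)
a_model_independent = DecisionTreeClassifier(random_state=0).fit(X, A)

# Strategy (b): the mass model receives the charge number as an extra feature.
a_model_chained = DecisionTreeClassifier(random_state=0).fit(
    np.column_stack([X, Z]), A
)

# At prediction time, strategy (b) first predicts Z and then passes it on.
X_new = X[:5]
z_pred = z_model.predict(X_new)
a_pred_independent = a_model_independent.predict(X_new)
a_pred_chained = a_model_chained.predict(np.column_stack([X_new, z_pred]))
```

In this chained form, an error in the predicted charge number propagates into the mass prediction, which is the main design trade-off of strategy (b).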

In practice, the experimental data exhibited a highly unbalanced distribution. Randomly extracting data can lead to disparate category distributions among the training, validation, and test sets, for example when a rare category is missing from one of the sets. This can lead to critical code errors and poor performance. Therefore, stratified sampling was employed instead of random sampling.
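A stratified split can be sketched without external libraries as follows. The function name, the species labels, and the sample counts are hypothetical; scikit-learn's `train_test_split` offers the same behaviour through its `stratify` argument.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with each class split in the same ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Classes with a single sample stay in the training set; all other
        # classes contribute at least one sample to the test set.
        n_test = max(1, round(len(indices) * test_fraction)) if len(indices) > 1 else 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical, strongly imbalanced label list.
labels = ["p"] * 1000 + ["alpha"] * 100 + ["7Li"] * 5
train_idx, test_idx = stratified_split(labels)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

With a purely random split of the same list, the five rare samples could all land in one set, which is the failure mode that stratification avoids.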

Results and discussion

As the core task of particle identification is multi-class classification, the use of suitable evaluation metrics for multi-class classification algorithms is crucial. Common evaluation metrics include the accuracy, recall, precision, and f1-score [72-76]. These metrics help assess the performance of the algorithm from different aspects. The results of the classification task can be categorized into the following four types:

(a) Predict positive samples as positive. (TP)

(b) Predict negative samples as negative. (TN)

(c) Predict negative samples as positive. (FP)

(d) Predict positive samples as negative. (FN)

When evaluating the algorithm, the corresponding evaluation metrics were calculated using the classification results. The equations are as follows:

$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (1)

$\mathrm{Precision} = \frac{TP}{TP + FP}$ (2)

$\mathrm{Recall} = \frac{TP}{TP + FN}$ (3)

$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (4)

Accuracy is the fraction of all samples that are classified correctly. Precision is the fraction of samples predicted as positive that are truly positive. Recall is the fraction of truly positive samples that are predicted as positive. The f1-score is the harmonic mean of precision and recall.
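As a worked example, Eqs. (1)-(4) can be evaluated directly from the four counts. The function name and the counts below are hypothetical and describe one particle species treated as the positive class.

```python
def binary_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (1)-(4) from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts: 1000 samples, 100 of which belong to the positive class.
metrics = binary_metrics(tp=80, tn=880, fp=20, fn=20)
```

Here the accuracy is 0.96 while precision, recall, and f1-score are all 0.8: the large number of true negatives raises the accuracy but does not enter the other three metrics.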

Because the distinction between positive and negative samples is extended to multiple categories in multi-class tasks, a method for aggregating the per-class metrics is required. Commonly used strategies are the micro average, macro average, and weighted average.

The macro-average computes each metric separately for every class and then takes the unweighted mean over the classes.

The micro-average ignores category differences and computes the metrics from the TP, FP, TN, and FN counts pooled over all classes.

The weighted average is similar to the macro average, but uses category proportions as weights to calculate performance metrics.

In particle identification, all generated particles have equal significance. Therefore, the macro average was chosen as the calculation method for the evaluation metrics. The macro-average provides a balanced assessment across all classes and facilitates a comprehensive understanding of model performance. Because the mass and charge determined the particle category, the charge and mass numbers were merged into a binary data format to calculate the evaluation metric.
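The difference between the averaging strategies can be sketched on a small imbalanced example. The sketch assumes that each particle category is represented as a combined (Z, A) pair and uses recall as the metric; the helper names and the sample counts are hypothetical.

```python
from collections import Counter

def per_class_counts(y_true, y_pred):
    """One-vs-rest (TP, FP, FN) for every label that occurs in y_true."""
    counts = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        counts[label] = (tp, fp, fn)
    return counts

def recall_averages(y_true, y_pred):
    """Return (macro, micro, weighted) recall."""
    counts = per_class_counts(y_true, y_pred)
    support = Counter(y_true)
    n = len(y_true)
    recalls = {k: tp / (tp + fn) for k, (tp, _, fn) in counts.items()}
    macro = sum(recalls.values()) / len(recalls)
    weighted = sum(recalls[k] * support[k] / n for k in recalls)
    pooled_tp = sum(tp for tp, _, _ in counts.values())
    pooled_pos = sum(tp + fn for tp, _, fn in counts.values())
    micro = pooled_tp / pooled_pos
    return macro, micro, weighted

# Labels are (Z, A) pairs: 90 protons, 8 alpha particles, 2 7Li fragments.
y_true = [(1, 1)] * 90 + [(2, 4)] * 8 + [(3, 7)] * 2
# The rare classes are partly or fully misidentified.
y_pred = [(1, 1)] * 90 + [(2, 4)] * 6 + [(1, 1)] * 2 + [(2, 4)] * 2
macro, micro, weighted = recall_averages(y_true, y_pred)
```

In this example the micro and weighted recall are both 0.96, whereas the macro recall is about 0.58, because the two missed 7Li samples count as much as the 90 correctly identified protons.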

The particles detected by the NIMROD–ISiS detector array can be categorized as light ions (with proton numbers ranging from one to four) or heavy ions. Most heavy ions cannot penetrate the Si detector, whereas most light particles pass through it. The disparity in the production yield between light particles and heavy ions during the reaction process leads to imbalances in data distribution. The dataset was split based on whether the particles hit the CsI detector. This approach solves the data imbalance problem in particle analysis and improves the algorithm performance. The XgBoost ensemble-learning algorithm was selected for testing. The input data features were the total energy, energy deposition in the Si and CsI detectors, and the detector position. The charge and mass numbers of the particles were used as data labels. Figure 5 shows the results of XgBoost on the telescope data. Table 1 shows the results of XgBoost on the super telescope data.

Table 1

Test results on NIMROD–ISiS super telescope data

CsI energy	Accuracy	Precision	Recall	F1-score	Label	Strategy
Zero	0.996	0.996	0.957	0.969	Z	Independence
Zero	0.934	0.874	0.844	0.856	A	Independence
Zero	0.932	0.893	0.877	0.883	Z+A	Independence
Zero	0.997	0.997	0.958	0.969	Z	FirstZ,SecondA
Zero	0.966	0.908	0.893	0.9	A	FirstZ,SecondA
Zero	0.964	0.924	0.916	0.919	Z+A	FirstZ,SecondA
Non-Zero	0.974	0.473	0.402	0.425	Z	Independence
Non-Zero	0.892	0.3	0.244	0.261	A	Independence
Non-Zero	0.87	0.316	0.247	0.266	Z+A	Independence
Non-Zero	0.974	0.468	0.401	0.425	Z	FirstZ,SecondA
Non-Zero	0.903	0.326	0.274	0.289	A	FirstZ,SecondA
Non-Zero	0.881	0.335	0.272	0.295	Z+A	FirstZ,SecondA

Fig. 5

(Color online) Test results of XgBoost on NIMROD–ISiS telescope data. Panels (a) and (b) show the results of XgBoost on particles with and without registration on the CsI detector, respectively. The latter results are better than the former

The model performed well when tested on particles that were not registered on a CsI detector: the evaluation metrics for each ring generally exceeded 0.85, and the model identified charge numbers better than mass numbers. For particles registered on a CsI detector, the model also achieved high accuracy, but its precision, recall, and f1-score were low. This discrepancy is attributed to extreme data imbalance. Figure 6 shows the mass distribution of ring 2. The mass distribution of the particles registered on a CsI detector was highly non-uniform, with a large difference between the categories with the highest and lowest counts, and the precise identification of rare categories is challenging for the model. Because the evaluation uses a macro average, the metrics of the classifier are unweighted averages across all categories, so the performance on categories with small sample fractions strongly affects the overall values. The two training strategies for charge and mass numbers did not show any significant differences. Including the charge number as an additional data feature did not effectively improve the identification ability of the model for sparsely populated categories. If the model fails to predict the charge number of a particle precisely, the accuracy of the mass number identification is also affected.

Fig. 6

Mass number distribution for events with and without registration on CsI detectors in ring 2. The mass number distribution of particles without registration on the CsI detector is well balanced, with sample sizes exceeding 1000 in most categories. Among particles with registration on the CsI detector, most heavy-ion categories contain only about 100 samples

To address this problem, the following methods have been proposed:

(a) Algorithm parameter optimization: Refining algorithm parameters (reducing the learning rate, increasing iteration numbers, expanding tree depth, etc.) to improve accuracy, precision, and recall. However, adjusting the parameters alone had a limited impact on the categories with limited samples, even when distinct weights were assigned to each category.

(b) Data category adjustment: The imbalance ratio can be reduced by eliminating data categories that comprise only a few or a few dozen samples.

(c) Exploration of data pre-processing methods: Trying out different approaches, including normalization, standardization, or no data pre-processing.

The most effective solution to the severe shortage of samples in specific categories is to include additional data. This reduces the imbalance ratio and thus improves the accuracy. For instance, in ring 10, each category had over 20,000 samples and the imbalance ratio was only 5:1. XgBoost performed excellently, with the evaluation metrics for each type exceeding 0.9.
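The quantities discussed above can be made concrete with a short sketch: the imbalance ratio, the removal of rare categories from method (b), and per-class weights as mentioned in method (a). The weight formula is the common "balanced" heuristic, n_samples / (n_classes × class_count), and is an assumption here; the text does not state which weights were used. All names and counts are hypothetical.

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio between the largest and the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def drop_rare_classes(samples, labels, min_count):
    """Keep only samples whose class has at least min_count members."""
    counts = Counter(labels)
    kept = [(s, l) for s, l in zip(samples, labels) if counts[l] >= min_count]
    return [s for s, _ in kept], [l for _, l in kept]

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}

# Hypothetical mass-number labels with one very rare category.
labels = [4] * 1000 + [7] * 200 + [9] * 10
samples = list(range(len(labels)))

ratio_before = imbalance_ratio(labels)
kept_samples, kept_labels = drop_rare_classes(samples, labels, min_count=50)
ratio_after = imbalance_ratio(kept_labels)
weights = balanced_class_weights(kept_labels)
```

Dropping the rarest category lowers the imbalance ratio from 100:1 to 5:1 in this toy case, at the cost of no longer being able to identify that category at all.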

Other factors, including detector position and hardware conditions such as temperature and electronic signal drift, can cause scaling issues and thus affect algorithm accuracy. To study the imbalance issue separately from these effects, Geant4 was used to simulate the experiment and detector performance.

In Geant4, the total particle energy, time of flight (ToF), kinetic energy before entering the detector, detector position, and particle deposition energy (E_abs) were selected as input data features. Testing with XgBoost demonstrated that the additional data features alleviated the data imbalance problem, resulting in excellent performance.

To confirm that this is not limited to XgBoost alone, a comparison test was conducted with other machine-learning and deep-learning algorithms. The test results (Fig. 7) confirmed the earlier findings. Tree-based machine learning algorithms such as XgBoost and the deep learning algorithm TabNet demonstrated excellent performance, whereas traditional machine learning algorithms, such as LR, SVM, and Bayesian classifiers, exhibited poor performance.

Fig. 7

(Color online) Test results of machine learning and deep learning algorithms on the Geant4 dataset. SVM, MNB, GNB, and LR perform poorly. MLP has relatively low precision and recall. Ensemble learning algorithms such as XgBoost and deep learning algorithm TabNet perform well

These results validate the effectiveness of the proposed approach in mitigating data imbalances and highlight the superiority of tree-based machine learning and deep learning algorithms in addressing this challenge.

Subsequently, the algorithms were evaluated using only the energy deposition as the data feature. XgBoost, LightGBM, CatBoost, and TabNet, which exhibited promising performances in prior tests, were selected for this assessment. The results demonstrated that each algorithm showed decreased accuracy in predicting the particle mass number.

Based on these observations, a series of additional data features were selected for comparative analysis. Numerous tests have shown that particle flight time is important for improving the accuracy of the algorithm. Although this feature alone proved insufficient for charge and mass number identification, its combination with the energy deposition feature enhanced the identification capabilities of the algorithm. Detailed descriptions of the corresponding test results are presented in Table 2 and Table 3.
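One way to see why the flight time complements the energy deposition is the non-relativistic relation between kinetic energy, flight time, and mass. The sketch below assumes a flight path $L$, a flight time $t$, a kinetic energy $E_k$ approximated by the deposited energy of a stopped particle, and the atomic mass unit $u$; these symbols are introduced here for illustration, and relativistic corrections, which are not negligible at tens of MeV/u, are ignored.

```latex
% Non-relativistic kinematics along a flight path L
E_k \approx \tfrac{1}{2} m v^2, \qquad v = \frac{L}{t}
\quad\Longrightarrow\quad
m \approx \frac{2 E_k t^2}{L^2}, \qquad A \approx \frac{m}{u}

% The flight time alone fixes only the energy per unit mass
t \approx L \sqrt{\frac{m}{2 E_k}}
```

Under these assumptions the flight time depends only on the ratio $E_k / m$, which is consistent with the observation that ToF alone is insufficient, while ToF combined with the energy constrains the mass.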

Table 2

Classification results from Geant4 simulation data, with independent training on charge and mass numbers

Algorithm	Accuracy	Precision	Recall	F1-score	Feature
XgBoost	0.127	0.045	0.079	0.05	ToF
XgBoost	0.862	0.863	0.827	0.839	E_abs
LightGBM	0.828	0.821	0.795	0.804	E_abs
CatBoost	0.836	0.804	0.765	0.771	E_abs
TabNet	0.813	0.837	0.762	0.791	E_abs
XgBoost	0.97	0.986	0.963	0.971	E_abs,ToF
LightGBM	0.947	0.95	0.936	0.943	E_abs,ToF
CatBoost	0.948	0.948	0.914	0.926	E_abs,ToF
TabNet	0.971	0.99	0.976	0.983	E_abs,ToF

Table 3

Classification results from Geant4 simulation data, where the charge number is used as an additional data feature for classifying the mass number

Algorithm	Accuracy	Precision	Recall	F1-score	Feature
XgBoost	0.127	0.056	0.079	0.051	ToF
XgBoost	0.87	0.878	0.839	0.852	E_abs
LightGBM	0.85	0.848	0.812	0.82	E_abs
CatBoost	0.83	0.798	0.752	0.757	E_abs
TabNet	0.828	0.854	0.794	0.821	E_abs
XgBoost	0.971	0.987	0.965	0.972	E_abs,ToF
LightGBM	0.952	0.949	0.943	0.945	E_abs,ToF
CatBoost	0.948	0.947	0.906	0.918	E_abs,ToF
TabNet	0.952	0.985	0.948	0.963	E_abs,ToF

The final phase of the study involved a comprehensive investigation of the generalization ability of the algorithms. Unlike previous tests involving data from all detectors, this phase focused on training models using a specific subset of detectors and reserving the remaining data for testing. The data features included the time-of-flight (ToF) and energy deposition. Various data preprocessing techniques, including normalization and standardization, were explored during the testing phase. Before model training, datasets from specific detectors were normalized and standardized using the MinMaxScaler and StandardScaler methods from the sklearn.preprocessing package in Python. These methods were also applied to the test data from the other detectors before the trained model was evaluated. However, they had a significantly negative impact on generalization ability (Fig. 8). Therefore, no data preprocessing was performed in the subsequent tests.
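The detector-wise hold-out and the min-max normalization can be sketched as follows. The example assumes that the scaling bounds are fitted on the training detectors only and then reused for the held-out detectors; the text does not specify this detail, and the row layout, feature values, and function names are hypothetical.

```python
def split_by_detector(rows, train_detectors):
    """Train on the listed detectors, hold out all others for testing."""
    train = [r for r in rows if r["detector"] in train_detectors]
    test = [r for r in rows if r["detector"] not in train_detectors]
    return train, test

def fit_min_max(feature_rows):
    """Per-column (min, max) learned from the training detectors only."""
    return [(min(col), max(col)) for col in zip(*feature_rows)]

def apply_min_max(feature_rows, bounds):
    """Scale each column with previously fitted bounds."""
    return [
        [(x - lo) / (hi - lo) for x, (lo, hi) in zip(row, bounds)]
        for row in feature_rows
    ]

# Hypothetical rows: features are (deposited energy, time of flight).
rows = [
    {"detector": 0, "features": [10.0, 5.0], "label": (2, 4)},
    {"detector": 0, "features": [20.0, 7.0], "label": (3, 7)},
    {"detector": 1, "features": [30.0, 9.0], "label": (6, 12)},
    {"detector": 2, "features": [40.0, 6.0], "label": (6, 12)},
]
train, test = split_by_detector(rows, train_detectors={0, 1})
bounds = fit_min_max([r["features"] for r in train])
scaled_test = apply_min_max([r["features"] for r in test], bounds)
```

In this toy case the held-out detector yields a scaled energy of 1.5, outside the [0, 1] range seen during training. This only illustrates that statistics fitted on one set of detectors need not match another; whether it explains the degradation reported above is not established here.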

Fig. 8

(Color online) Generalization ability test results of the XgBoost algorithm on the Geant4 dataset using different data preprocessing methods. Both normalization and standardization severely reduce the model’s generalization ability

The performance of the algorithms was excellent. The evaluation metrics of TabNet and XgBoost are mostly over 0.9 for all detector data (Fig. 9). These findings establish the efficacy of training models with robust generalization abilities even in situations with limited data availability. Overall, these results highlight the advantages and effectiveness of machine learning and deep learning algorithms, and demonstrate their potential for practical applications.

Fig. 9

(Color online) Results of the generalization ability test of XgBoost, CatBoost, LightGBM, and TabNet. Panels (a), (b), (c), and (d) show their accuracy, precision, recall, and f1-score curves, respectively. The evaluation metrics of all algorithms exceed 0.8. The evaluation metrics of XgBoost and TabNet are mostly over 0.9. TabNet shows better generalization ability than ensemble learning algorithms

Inspired by these findings, a similar study of data similarity was conducted on specific rings of NIMROD–ISiS. The input data features included total energy, energy deposition, and detector position. The tests showed that the detectors of NIMROD–ISiS can be divided into groups according to the similarity of their data. Taking the data (particles registered on the CsI detector) from ring 9 as an example, ring 9 can be further divided into two groups of detectors (Fig. 10(a) and Fig. 10(b)). The results depicted in Fig. 10 provide valuable information on the patterns and characteristics of the NIMROD–ISiS detector. Moreover, they contribute to the optimization of algorithms and the improvement of data-processing methods. These findings also have significant implications for the study and enhancement of the detector array design and performance.

Fig. 10

(Color online) In the tests carried out on the NIMROD–ISiS ring 9, the predictions for the other detector data can be categorized into two scenarios, (a) and (b). A high degree of similarity can be observed between the detectors on ring 9

Conclusion

Particle identification with machine learning (ML) is a multifaceted problem. Researchers must consider various factors including data selection, partitioning, feature engineering, preprocessing, algorithm selection, and parameter tuning. Traditional particle identification methods require significant manual effort and are limited by researchers’ experience and available time. Our study aims to develop a universal and adaptable particle identification model that assists manual processes. Although achieving 100% accuracy may not be possible, ensemble learning algorithms, especially XgBoost, yield meaningful results. The conclusions are as follows:

First, intelligent algorithms, particularly tree-based ensemble learning algorithms, can effectively identify particles in heavy-ion collisions at low and intermediate energies. This offers a viable alternative to traditional methods.

Second, addressing data imbalances is crucial for particle identification. Severe data imbalances significantly affected the results. The solutions include ensuring sufficient data for a balanced distribution, adding data features beyond particle energy deposition, and constructing different identification models based on the detector structure.

Third, training a specialized particle identification model using the existing data reduces the time and resources required for traditional particle identification. Laboratories conducting long-term, large-scale heavy-ion collision experiments can benefit from this. It also paves the way for the development of professional particle identification software.

Finally, machine-learning algorithms can be used to study detector similarity, particularly in large-scale detector arrays with complex structures.

Combinations of supervised and unsupervised learning approaches should be explored in future studies. Other physics software such as NpTool [77] will also be used to simulate the experiments. NpTool is known for its efficient project management and simulation of various sophisticated detector arrays.

Because Geant4 simulations are time consuming and resource intensive, there is a need to explore alternative approaches for generating particle collision data. Generative Adversarial Networks (GAN) [78] and Variational Autoencoders (VAE) [79] have shown promise in generating simulated data for detectors in the field of high-energy physics [80-83]. Utilizing GAN and VAE can reduce the time and resources required for massive amounts of simulated data, thereby making the process more efficient and accessible.

Building on the excellent performance of TabNet, further investigations will include exploring additional deep-learning algorithms, such as DeepGBM [84] and GrowNet [85]. Moreover, we will attempt to convert the existing ensemble learning algorithm into a multi-output algorithm that classifies the mass and charge numbers simultaneously. Our research aims to enhance the understanding of the detector system in sophisticated experiments, which can be used to explore interesting clustering phenomena in nuclei [86-89].

References

A. Kalweit, Particle identification in the ALICE experiment. J. Phys. G. Nucl. Partic. 38, 124073 (2011). https://doi.org/10.1088/0954-3899/38/12/124073