NUCLEAR PHYSICS AND INTERDISCIPLINARY RESEARCH

Application of deep learning methods combined with physical background in wide field of view imaging atmospheric Cherenkov telescopes

Ao-Yan Cheng, Hao Cai, Shi Chen, Tian-Lu Chen, Xiang Dong, You-Liang Feng, Qi Gao, Quan-Bu Gou, Yi-Qing Guo, Hong-Bo Hu, Ming-Ming Kang, Hai-Jin Li, Chen Liu, Mao-Yuan Liu, Wei Liu, Fang-Sheng Min, Chu-Cheng Pan, Bing-Qiang Qiao, Xiang-Li Qian, Hui-Ying Sun, Yu-Chang Sun, Ao-Bo Wang, Xu Wang, Zhen Wang, Guang-Guang Xin, Yu-Hua Yao, Qiang Yuan, Yi Zhang
Nuclear Science and Techniques, Vol. 35, No. 4, Article number 84. Published in print Apr 2024; available online 24 May 2024

The High Altitude Detection of Astronomical Radiation (HADAR) experiment, which was constructed in Tibet, China, combines the wide-angle advantages of traditional EAS array detectors with the high-sensitivity advantages of focused Cherenkov detectors. Its objective is to observe transient sources such as gamma-ray bursts and the counterparts of gravitational waves. This study aims to utilize the latest AI technology to enhance the sensitivity of the HADAR experiment. For each application, training datasets and network models were constructed by incorporating the relevant physical principles. After careful design, these models can determine the type, energy, and incident direction of the primary particles. On a test dataset at 10 TeV, we obtained a background identification accuracy of 98.6%, a relative energy reconstruction error of 10.0%, and an angular resolution of 0.22°. These findings demonstrate the significant potential for enhancing the precision and dependability of detector data analysis in astrophysical research. By using deep learning techniques, the HADAR experiment's observational sensitivity to the Crab Nebula has surpassed that of MAGIC and H.E.S.S. at energies below 0.5 TeV and remains competitive with conventional narrow-field Cherenkov telescopes at higher energies. In addition, our approach offers a new way of dealing with strongly correlated, spatially scattered data.

Keywords: VHE gamma-ray astronomy; HADAR; Deep learning; Convolutional neural networks
1 Introduction

The investigation of ultrahigh-energy gamma-ray astronomy [1] is pivotal for understanding a range of extreme astrophysical phenomena. These ultrahigh-energy gamma rays predominantly originate from highly active galactic nuclei, cataclysmic supernova explosions, neutron stars, and black holes. Importantly, these observations form an empirical cornerstone for addressing some of the most compelling scientific questions, including the detection of dark matter and the identification of cosmic-ray sources.

Space observatories such as the Fermi Gamma-ray Space Telescope [2, 3] require long observation times to gather statistically relevant data because the high-energy gamma-ray flux falls steeply with energy, following a power-law distribution [4]. Consequently, ground-based detectors with large effective areas are required for detecting high-energy gamma rays.

The Earth's atmosphere acts as a substantial interaction medium for high-energy photons in energy ranges exceeding gigaelectronvolts [5], effectively preventing their direct transmission to the ground. In this regime, high-energy γ-rays interact with the Earth's upper atmospheric layers [6], leading to the generation of electron-positron (e⁻e⁺) pairs, which in turn instigate electromagnetic cascades. Within these cascades, relativistic electrons and positrons produce concentrated beams of Cherenkov radiation. These beams serve as the principal observational targets for specialized ground-based detection systems, enabling nuanced studies of the high-energy universe.

Over the last few decades, gamma-ray astronomy has become a frontier discipline for studying the universe's highest-energy astrophysical phenomena [7, 8]. The Cherenkov Telescope Array (CTA) is an international initiative that seeks to better understand high-energy gamma rays in the universe by detecting atmospheric Cherenkov radiation [9]; its coverage of the 20 GeV–300 TeV energy range aims to close the observational gaps currently present. The High Energy Stereoscopic System (H.E.S.S.), a group of five Cherenkov telescopes, is now operational in Namibia and is used to observe cosmic TeV gamma rays, offering crucial information for understanding gamma-ray sources [10] such as pulsars and supernova remnants [11]. The Major Atmospheric Gamma Imaging Cherenkov (MAGIC) telescopes, two telescopes situated at La Palma in the Canary Islands, are another notable project. By detecting Cherenkov radiation from cosmic gamma-ray sources [12], the physical characteristics and origins of cosmic rays can be investigated [13]. These telescopes, which have small fields of view, are primarily employed to make accurate observations of known high-energy astrophysical objects, providing important knowledge and observational data for a comprehensive understanding of these extreme cosmological occurrences.

However, measurable signals from many extreme celestial occurrences last only for brief periods because of their intrinsically transient nature. There is thus an urgent demand for wide-field detectors that can promptly capture high-energy particle signals over a large region of the sky, because narrow-field telescope arrays require time to repoint toward new targets. Wide-field Cherenkov imaging telescopes are particularly useful in such situations.

The High Altitude Detection of Astronomical Radiation (HADAR) experiment is one such instrument [14, 15]: an array of refractive imaging atmospheric Cherenkov telescopes. Positioned at an altitude of 4300 m in Yangbajing, Tibet (30.0848°N, 90.5522°E), the experiment incorporates four water lenses, each spanning a diameter of 5 m, placed at the vertices of a square with a side length of 100 m [16]. Each lens, ensconced within a hemispherical glass shell 5 m in diameter, is housed in a steel-structured tank with a diameter of 8 m. This tank, rising to a height of 7 m, is filled with high-purity water, and cameras outfitted with photomultiplier tubes are attached to its base. Upon the interception of Cherenkov radiation, the four detectors log the charge deposited on the photomultiplier tubes within a predefined time window. All the collected data are subsequently archived in an external storage apparatus.

To effectively detect very high-energy (VHE) γ-ray sources, analytical methodologies must execute several critical tasks.

Background Suppression The identification of distinct shape features within detector images is pivotal. This allows the isolation of target γ-rays from the overwhelmingly abundant background of cosmic rays, which are predominantly composed of protons.

Energy Reconstruction The accurate estimation of the original energy of incident particles is achieved by correlating variables such as deposited charge and the relative spatial coordinates within the detectors.

Direction Reconstruction Leveraging the stereoscopic images acquired from the detectors, the axis direction of the resultant particle shower must be reconstructed. This facilitates the estimation of the original direction from which the incident particles originated.

Deep learning techniques are becoming increasingly integral to the data analysis frameworks employed in Cherenkov telescope experiments. A variety of computational algorithms, particularly those based on convolutional neural networks (CNNs), have shown unparalleled success in addressing the multifaceted analytical challenges endemic to this scientific domain. Prominent initiatives, such as H.E.S.S. [17] and CTA [18], have harnessed the robust capabilities of CNNs to analyze their observational datasets, yielding significant advancements. These technological strides corroborate the findings of the present study, reinforcing the compelling case for the widespread utility and robustness of deep learning methodologies within this specialized field of research.

This study is structured as follows: The use of deep learning is briefly introduced in Section 2, along with our data production process. Sections 3–5 provide a more thorough overview of the background suppression, energy reconstruction, and incident direction models, together with the data used to train them. The modeling calculations of the flux and observational sensitivity to the Crab Nebula at various energies are presented in Section 6. Finally, Section 7 summarizes this work and provides an outlook.

2 Deep Learning Approaches for HADAR Data

Deep learning is an important branch of artificial intelligence (AI) that has emerged as a rapidly growing field in recent years. With the significant increase in computational power, especially from GPU chips, the real-time computation of large parallel data (e.g., high-dimensional matrices) has become possible. This has led to the adoption of deep learning across various data analysis domains, simplifying tasks that were previously difficult for humans to perform. As a type of AI, deep learning methods simulate the multi-layer neural network structure of the human brain to solve problems. Similar to human learning, deep learning methods continuously update network parameters through a step-by-step understanding of the data. This process ultimately allows the network to extract as many useful features as possible from the inputs and obtain model predictions through neural network computations.

In this study, we adopted CNNs. As excellent models for image recognition, CNNs can spontaneously extract local relationships at different positions within the same dimension and across different dimensions based on the input signals, ultimately yielding optimal results. Fundamentally, a CNN is composed of a series of convolutional and pooling layers connected to fully connected layers. Owing to this layered structure, errors from one layer propagate to the next, potentially leading to an exponential growth in errors. To mitigate this problem, we used a specialized CNN known as a residual CNN. The advantage of this network is that each layer receives not only the processed signals from the previous layer but also that layer's original, unprocessed signals through a skip connection. This significantly enhances the model's ability to fit the data.
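As a minimal illustration of the residual idea described above (a sketch only, not the exact ParticleNet or AngleNet layers; the channel count and image size here are hypothetical):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: the input is added back onto the
    convolved output, so each layer sees both the processed and
    the original signals, mitigating error accumulation."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                           # skip connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)       # residual addition

# Example: a batch of 64-channel feature maps from 128x128 inputs
x = torch.randn(8, 64, 128, 128)
block = ResidualBlock(64)
print(block(x).shape)  # torch.Size([8, 64, 128, 128])
```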

By leveraging the power of CNNs, specifically residual CNNs, we aim to address the three main challenges in detecting VHE gamma-ray sources: background suppression, energy reconstruction, and direction reconstruction. Our experimental results, which are detailed in the subsequent sections, demonstrate the effectiveness of deep learning methods in these high-stakes, complex analytical tasks.

Throughout the research process, all the datasets used for training the neural networks were generated using the widely adopted CORSIKA Monte Carlo program [19]. In the simulations, the altitude was set to 4300 m, corresponding to an atmospheric depth of 606 g/cm². The geomagnetic coordinates were those of Yangbajing, Tibet. The simulated primary cosmic-ray particles included both gamma rays (serving as the signal) and protons (serving as the background), with energies ranging from 20 GeV to 10 TeV. The incident zenith angle was set to 20°, and the azimuthal angle ranged from 0° to 360°. All events were uniformly scattered within a circle centered on the HADAR array with a radius of 400 m. Following the shower simulation, dedicated software packages were used to simulate the response of each HADAR detector, placed at the four vertices of a square with a side length of 100 m. The camera size is determined by the field of view; to preserve the integrity of each recorded shower image, we stored it as a 400 × 400 pixel array. Each photomultiplier tube had a diameter of 5.1 cm [20]. The stored images were subsequently analyzed for event reconstruction.
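As a small illustration of the event geometry described above, shower-core positions uniform over the 400 m circle can be sampled as follows (a sketch of the sampling step only; shower development and detector response are handled by CORSIKA and the HADAR response packages):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cores(n: int, radius_m: float = 400.0) -> np.ndarray:
    """Sample n shower-core positions uniformly over a disc centered
    on the HADAR array. Using r = R*sqrt(u) gives uniform density in
    area, not in radius."""
    u = rng.uniform(0.0, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)   # azimuth spans 0-360 deg
    r = radius_m * np.sqrt(u)
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)

cores = sample_cores(10000)
print(cores.mean(axis=0))        # ~ (0, 0): centered on the array
print(np.hypot(*cores.T).max())  # <= 400 m
```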

To implement the model incorporating the associated algorithms, we utilized a deep-learning framework based on PyTorch. Training and testing were performed on a machine equipped with two NVIDIA GeForce RTX 3090 GPUs and on a high-performance computing cluster furnished with four NVIDIA V100 tensor-core GPUs, achieving nearly identical results on both platforms.

For diverse research tasks, specialized datasets were curated, a range of neural network architectures were adopted, and various loss functions were employed. In each of these combinations, images from the HADAR detector serve as the input (the model accepts an image matrix whose form varies slightly under the different preprocessing schemes), and the neural network computations produce the relevant parameters of interest. Detailed descriptions of these components are given in the following three sections, where we articulate the underlying rationale for these choices. Our codebase, called GPLearn,¹ is publicly available on GitHub. This package features modular functionalities, enabling users to tailor the module parameters to specific needs. With GPLearn, readers can not only reproduce the results presented in this paper using our supplied datasets but also adapt the code for actual detector data analysis. We anticipate that these contributions will be broadly applicable to the field of astrophysics.

3 Background Suppression

The flux of cosmic rays typically exceeds that of high-energy gamma rays by several orders of magnitude, thereby posing a significant challenge for detection. Consequently, the Cherenkov radiation generated by these high-energy cosmic rays acts as the predominant source of background noise. Given that approximately 90% of cosmic rays are protons, this study primarily focuses on evaluating the influence of protons on photon detection. Proton showers with energies roughly three times those of photon showers produce comparable Cherenkov images and are especially difficult to distinguish. To address this challenge, we conducted simulations involving particles with various energies to effectively segregate the target photon signals from the interfering proton background.

The prevailing methodology for discriminating between high-energy gamma rays and background cosmic rays leverages the intrinsic differences between electromagnetic and hadronic showers. When photons impinge upon the Earth's atmosphere, they predominantly induce electromagnetic showers, which, in contrast to the hadronic showers triggered by protons, yield more uniform detector images. Specifically, in the data recorded by our detectors, the images are represented as pixel arrays (Fig. 1). Conventional methods compute the image moments of the pixels to obtain a fitted ellipse; whether a shower was initiated by a photon or a proton can then be determined by comparing the major and minor axes of this ellipse. Because photon-generated electromagnetic showers have a more concentrated energy distribution, their pixel patterns are compact and elongated along the image axis. In contrast, hadronic showers caused by protons exhibit more substructure, resulting in significantly spread and uneven pixel patterns. The main goal of background suppression is to accurately distinguish between these two categories of image structures. Among deep learning approaches, CNNs in particular have produced exceptional results in image identification; consequently, they offer more accurate discrimination capabilities than conventional curve-fitting techniques.
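For reference, the moment-based ellipse fit used by conventional methods can be sketched as follows (a simplified, Hillas-style calculation on a 2-D charge image; this is not the exact algorithm behind the comparison curves shown later):

```python
import numpy as np

def ellipse_axes(img: np.ndarray):
    """Major/minor axes of a charge image from its second central
    moments; assumes the image contains at least one hit pixel."""
    ys, xs = np.nonzero(img)
    w = img[ys, xs].astype(float)
    w /= w.sum()
    mx, my = (w * xs).sum(), (w * ys).sum()        # charge centroid
    cxx = (w * (xs - mx) ** 2).sum()
    cyy = (w * (ys - my) ** 2).sum()
    cxy = (w * (xs - mx) * (ys - my)).sum()
    # Eigenvalues of the covariance matrix give the squared axes
    eigvals = np.linalg.eigvalsh([[cxx, cxy], [cxy, cyy]])
    minor, major = np.sqrt(np.maximum(eigvals, 0.0))
    return major, minor

# A photon-like image is compact and elongated (large major/minor
# ratio with little scatter); hadronic images are broader and patchier.
```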

Fig. 1
(Color online) Left: Imaging of γ-ray Cherenkov light in a telescope (3 TeV); Right: Imaging of proton Cherenkov radiation in a telescope (9 TeV). Each image is captured by a 400×400 PMT array, and all have been cropped and enlarged to 64×64, as shown in Fig. 3b

For the background suppression task, we adopted the ResNet-18 residual CNN model, as depicted in Fig. 2, to meet our specific objectives. Unlike the canonical ResNet-18 model, our customized version employs smaller convolutional kernels with designated strides for initial data convolution. This choice was motivated by the fact that the region of interest in our detector images was generally confined to a narrow central zone. The use of smaller kernels was advantageous for capturing the critical shape and edge features within this focal region, as substantiated by the improved performance in subsequent evaluations.

Fig. 2
(Color online) The modified ResNet-18 model used in the background suppression work, which we have renamed ParticleNet. Through four residual blocks, ParticleNet increases the input data's feature dimension to 512; the model then processes the result through a global pooling layer

The output of our model is formulated as a two-dimensional vector that quantifies the probability of a particle being categorized as either a signal (a photon) or background noise (a proton). This vector undergoes automatic normalization to ensure that its components sum to one, thereby enhancing interpretability. The normalized probabilities $s_k$ are computed using the softmax function:
$$s_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}. \qquad (1)$$
Here, $z_k$ denotes the raw output value corresponding to the $k$th class, and $K$ denotes the total number of classes. In our application, $K=2$, representing the signal and noise categories. This normalization ensures that the output probabilities are mutually exclusive and exhaustive, summing to one, which aids in yielding more interpretable outcomes.

To evaluate the divergence between the model's predicted and actual probability distributions, we employed cross-entropy as our loss function, which is expressed as follows:
$$\mathrm{CE}(y,p) = -\frac{1}{N}\sum_{i=1}^{N}\left[ y_{g}\ln(p_{gi}) + y_{p}\ln(p_{pi}) \right] \qquad (2)$$
where $y_g$ and $y_p$ are the one-hot labels marking whether the $i$th event is a photon or a proton, and $p_{gi}$ and $p_{pi}$ are the corresponding probabilities predicted by the model for that event. Cross-entropy is the most widely used loss function for binary classification tasks in deep learning; it is highly sensitive to changes in the predicted probabilities and helps the model train quickly.
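In PyTorch, the two steps of Eqs. (1) and (2) are conveniently combined, since nn.CrossEntropyLoss applies the softmax normalization internally; a minimal sketch with hypothetical logits:

```python
import torch
import torch.nn as nn

logits = torch.tensor([[ 2.0, -1.0],   # raw outputs z_k for 3 events
                       [ 0.3,  0.8],
                       [-1.5,  2.2]])
labels = torch.tensor([0, 1, 1])        # 0 = photon, 1 = proton

probs = torch.softmax(logits, dim=1)    # Eq. (1): rows sum to 1
loss = nn.CrossEntropyLoss()(logits, labels)  # Eq. (2), averaged over N

print(probs.sum(dim=1))  # tensor([1., 1., 1.])
print(loss.item())
```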

The physical context was considered during the data preparation. Because electromagnetic and hadronic showers differ primarily in the regularity of their images rather than in where those images fall on the camera, the absolute positions of the different detectors are of little importance for this task. Therefore, we retained only a small portion of each image (Fig. 3a) centered on the actual signal area (Fig. 3b) and stitched the four crops together (Fig. 3c); the resulting 128×128 matrix was taken as the model input. The model then classifies events accurately and precisely based only on the signal intensities and their relative positions, providing a satisfactory answer to this physical problem.
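A sketch of this crop-and-stitch preprocessing (array sizes follow the text; the centroid-centering and zero-padding details are simplifying assumptions):

```python
import numpy as np

def crop_around_centroid(img: np.ndarray, size: int = 64) -> np.ndarray:
    """Crop a size x size window centered on the charge-weighted
    centroid of a 400x400 detector image (zero-padded at the edges)."""
    pad = size // 2
    padded = np.pad(img, pad)
    total = img.sum()
    if total == 0:
        cy, cx = img.shape[0] // 2, img.shape[1] // 2
    else:
        ys, xs = np.indices(img.shape)
        cy = int((ys * img).sum() / total)
        cx = int((xs * img).sum() / total)
    cy, cx = cy + pad, cx + pad
    return padded[cy - pad:cy + pad, cx - pad:cx + pad]

def stitch(images) -> np.ndarray:
    """Stitch the four 64x64 crops into one 128x128 model input."""
    crops = [crop_around_centroid(im) for im in images]
    return np.vstack([np.hstack(crops[:2]), np.hstack(crops[2:])])

four_views = [np.random.poisson(0.01, (400, 400)).astype(float) for _ in range(4)]
print(stitch(four_views).shape)  # (128, 128)
```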

Fig. 3
(Color online) Data preprocessing work involving the cropping and stitching of pixel images. Every point in the picture is a PMT, and the pixel value of each point represents the signal recorded by the associated PMT. (a) Image captured by 400×400 PMTs: a photon image captured by a single water lens. Most of the PMTs (in purple) are not triggered, and only a few PMTs within the white frame captured signals; this region is cropped and enlarged in Fig. 3b. (b) Image captured by 64×64 PMTs: the cropped pixel image centered on the signal location, corresponding to the part enclosed by the white frame in Fig. 3a. (c) Image captured by 128×128 PMTs: the four detector images after cropping, enlargement, and stitching. The white box in the upper-left corner of the stitched image contains the image depicted in Fig. 3a and 3b

To demonstrate the effectiveness and performance of the model, we introduced a threshold factor, denoted by ζ₀, and set its value to 0.5 as the criterion for classifying incoming particles. In this context, ζ refers to the second component of the two-dimensional output vector of the model. In an ideal scenario, the photon signal output should be aligned at ζ_γ = 0, whereas the proton signal output should converge at ζ_proton = 1. In practice, any intersection between the distribution curves of ζ_γ and ζ_proton signifies the error rate of the model in the classification, as illustrated in Fig. 4.

Fig. 4
Distribution of ζ values for photons and protons in simulated signals. The distribution of photons is concentrated at approximately ζ = 0, while the distribution of protons is centered at approximately ζ = 1

We retained the events identified as photons and excluded those identified as protons. Simultaneously, we recorded the correct identification rates for both photon and proton events. The quality factor Q is defined as follows:
$$Q = \frac{\varepsilon_s^{\gamma}}{\sqrt{\varepsilon_s^{p}}}. \qquad (3)$$
Here, $\varepsilon_s^{\gamma}$ represents the survival rate of photons after discrimination, and $\varepsilon_s^{p}$ represents the survival rate of protons after discrimination. The higher the correct identification rate of the model, the higher the quality factor, indicating better background suppression.
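A sketch of the selection and quality-factor calculation, assuming ζ is the proton-probability component of the model output and ζ₀ = 0.5 (the ζ distributions below are hypothetical stand-ins for real model outputs):

```python
import numpy as np

def quality_factor(zeta_gamma: np.ndarray, zeta_proton: np.ndarray,
                   zeta0: float = 0.5) -> float:
    """Eq. (3): Q = eps_gamma / sqrt(eps_proton), where the epsilons
    are the survival rates after the photon-selection cut zeta < zeta0."""
    eps_gamma = np.mean(zeta_gamma < zeta0)    # photons kept (correct)
    eps_proton = np.mean(zeta_proton < zeta0)  # protons surviving (wrong)
    return eps_gamma / np.sqrt(eps_proton)

# Toy test-set outputs: photons cluster near 0, protons near 1
rng = np.random.default_rng(1)
zg = rng.beta(1, 5, 100000)
zp = rng.beta(5, 1, 100000)
print(quality_factor(zg, zp))  # > 1 indicates useful suppression
```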

Figure 5 presents the accuracy of the model for particle identification across various energy thresholds. The yellow curve, which corresponds to protons, consistently demonstrates high identification rates across the entire energy spectrum. Conversely, the blue curve, representing photons, exhibits a more variable identification rate with a marked improvement at elevated energies. Previous research indicated that the threshold energy for the production of detectable Cherenkov radiation is approximately 50 GeV. Near this threshold, electromagnetic showers are prone to irregular development and hence to anomalous images. As the energy increases, these showers develop more regularly, leading to a consequent enhancement in the model's identification accuracy.

Fig. 5
(Color online) The identification accuracy for photons/protons at different energy points in the test dataset. (The training process and results are from GPLearn, and variations may occur with different datasets.)

The results from the test set indicated that the identification rates for both photons and protons exhibited a marked increase as the particle energy increased. At lower energy levels, particles generate smaller atmospheric showers, resulting in a limited number of pixels being captured by ground-based detectors. This scarcity of data hampers the ability of the model to render accurate identification. Conversely, as the particle energy increases, the quality of the images captured by the detectors improves significantly. This augmentation in data quality, coupled with an adequate number of parameters, led to a significant increase in the identification accuracy of the model.

The yellow curve in Fig. 6 illustrates the performance gains achieved through the utilization of deep learning techniques, in stark contrast to the blue curve, which represents the efficacy of traditional methods. In the domain of low-energy particles, the quality of identification markedly increased with increasing energy levels, leading to a swift incline in the quality factor curve. Conversely, at higher energy thresholds, the model has ample raw parameters available for making accurate judgments, resulting in a plateauing of the result curve. Notably, the model’s identification rate has already approached a robust 98% at this stage.

Fig. 6
(Color online) Comparison of identification performance between deep learning and traditional methods for particle discrimination. (The deep learning results are provided by GPLearn; the traditional methods are calculated using the maximum ellipse fitting technique, which may vary slightly from current mainstream algorithms.)
4 Energy Reconstruction

Simply relabeling the model's output as particle energy, without any changes to the model's inputs or structure, would not yield satisfactory results. After accounting for the pertinent physical background, the reconstructed particle energy is related to the incident zenith angle, the distance from the shower core, and the quantity of charge deposited in the detector. Traditional approaches employ simulated data to determine the energy distribution function at each location and fit actual observational data with the function [21]:
$$E_{\rm rec} = f(q, R, \theta). \qquad (4)$$
Calculating the distance and direction of a signal relative to the actual shower core requires the absolute positional information of the detector array. Because this information was removed when constructing the background suppression dataset, data retaining only the relative positional information are insufficient for reconstructing the final energy.

As a result, building on the data preprocessing employed for background suppression, we added the relative position of the pixel centroid and the absolute position of the detector array as new inputs to the model. That is, we supplemented the 128×128 matrix used for background suppression with a 4×4 matrix representing the absolute detector coordinates and the relative coordinates of the image centroid for each of the four detectors. The model was designed as depicted in Fig. 7, considering that energy reconstruction does not require classification-style feature extraction. The four detector images are processed independently and combined after a series of fully connected layers, with the final result represented by a one-dimensional vector. This model, which does not include convolutional layers, not only retains a high level of reconstruction accuracy but also significantly accelerates the calculations.

Fig. 7
The GoogLeNet model used in the energy reconstruction work. The charge images and relative location data from the four detectors are first processed independently with fully connected layers; the four branches are then combined for further processing

These additional inputs (the relative positions of the pixel centroids and the absolute coordinates of the detector array) reflect the fact that the reconstructed energy is fundamentally correlated with the amount of deposited charge and with where, relative to the shower core, that charge was recorded.
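A minimal sketch of this four-branch, fully connected design is given below (layer widths and the exact fusion scheme are hypothetical; only the input shapes, the parallel per-detector branches, and the one-dimensional energy output follow the text):

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Sketch of the four-branch energy regressor: each detector's
    flattened 64x64 crop plus its 4 coordinate values (relative
    centroid position and absolute detector position) passes through
    shared fully connected layers; the four branch outputs are fused
    into a single energy estimate."""
    def __init__(self):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(64 * 64 + 4, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Linear(4 * 64, 64), nn.ReLU(),
            nn.Linear(64, 1),              # one-dimensional output: energy
        )

    def forward(self, imgs: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # imgs: (batch, 4, 64, 64); coords: (batch, 4, 4)
        x = torch.cat([imgs.flatten(2), coords], dim=2)  # (batch, 4, 4100)
        feats = self.branch(x)                           # (batch, 4, 64)
        return self.head(feats.flatten(1)).squeeze(1)    # (batch,)

model = EnergyNet()
out = model(torch.randn(2, 4, 64, 64), torch.randn(2, 4, 4))
print(out.shape)  # torch.Size([2])
```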

In the test dataset, we employed both the model-predicted energy and the actual particle energy to compute the associated energy resolution, denoted as Δ E/E. As illustrated in Fig. 8, our findings reveal a trend of improving energy resolution with increasing particle energy.

Fig. 8
Relative error distribution of the energy reconstruction. The relative error decreases rapidly with increasing particle energy, reaching 10.0% at 10 TeV. By comparison, the relative error of the conventional technique is approximately 20% at 1 TeV

To provide an intuitive assessment of the efficacy of our model, we analyzed 2,500 samples from the test dataset, each representing a different energy level. We generated a distribution curve to illustrate the energy prediction performance of the model. The predicted energy values are plotted on the x-axis, and the y-axis represents the probability associated with each specific energy prediction. To facilitate easier interpretation, we normalized the data for each discrete energy level, ensuring that the integral of each curve across all energy points was equal to unity.

Figure 9 illustrates the distribution curves for the energy prediction at 500 GeV intervals. The areas of overlap between these curves signify regions where the model’s estimations may deviate from the actual conditions, constituting a principal constraint on the accuracy of the energy resolution. More importantly, our model achieved a relative error of only 13.0% at an energy level of 1 TeV. Remarkably, at higher energy levels, such as 10 TeV, the relative error narrowed further to 10.0%. This value approaches the theoretical limits of the energy resolution achievable with Cherenkov telescopes, underscoring the significance of our research.

Fig. 9
(Color online) Energy prediction results using deep learning methods. Different color curves barely overlap one another, showing that the model can almost entirely differentiate gamma rays at the appropriate energy levels
5 Direction Reconstruction

After the atmospheric Cherenkov radiation reaches the ground-level detectors, it is refracted through specialized water lenses before being captured on the image plane. Calculating the path of the refracted light is complex; our initial efforts therefore focused on estimating the projected position of the shower core on this image plane. Conventional methods (such as the Hillas reconstruction [22]) fit an ellipse to each of the four detector images and locate the shower core from the weighted intersection of the ellipses' major axes, ultimately determining the incident direction of the γ-rays [23, 24].

Unlike in the energy reconstruction task, here both the intensity and the absolute position of each pixel are intricately linked to, and serve as indirect indicators of, the incident direction of the gamma rays. However, given the detector array described in Sect. 2, a single two-dimensional input matrix covering the full array at pixel resolution would be impractically large, and building pixel matrices based solely on absolute positions is infeasible. We therefore continued to use the cropping method. Unlike in the background suppression task, each cropped pixel is treated as a three-dimensional data point, so the full input is a 4×64×64×3 matrix. The first dimension of this matrix indicates the detector's serial number; the second and third dimensions indicate the relative position of the corresponding signal on the detector (with respect to a single detector); and the fourth dimension holds the signal intensity detected by the photomultiplier tube at this position together with the point's absolute position (with respect to the ground). Our tests demonstrate that the performance of the model is significantly enhanced when the absolute positioning information of the pixels is retained.
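A sketch of how such an input tensor can be assembled (the coordinate conventions, crop origins, and PMT pitch used for the absolute positions are simplifying assumptions):

```python
import numpy as np

def build_direction_input(crops, origins, pmt_pitch_cm: float = 5.1):
    """Assemble the 4 x 64 x 64 x 3 direction-reconstruction input:
    channel 0 holds the PMT signal; channels 1-2 hold each pixel's
    absolute ground coordinates, recovered from the crop origin (the
    corner of the 64x64 window inside the 400x400 camera) plus the
    in-window pixel index."""
    out = np.zeros((4, 64, 64, 3), dtype=np.float32)
    iy, ix = np.indices((64, 64))
    for d, (crop, (ox, oy)) in enumerate(zip(crops, origins)):
        out[d, :, :, 0] = crop                      # signal intensity
        out[d, :, :, 1] = (ox + ix) * pmt_pitch_cm  # absolute x (cm)
        out[d, :, :, 2] = (oy + iy) * pmt_pitch_cm  # absolute y (cm)
    return out

crops = [np.random.rand(64, 64) for _ in range(4)]
origins = [(100, 180), (210, 90), (160, 160), (60, 220)]  # hypothetical
print(build_direction_input(crops, origins).shape)  # (4, 64, 64, 3)
```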

To accommodate this specialized input dataset, we employed a CNN model featuring three-dimensional convolutional layers, as shown in Fig. 10. The raw data pass through a sequence of six convolutional layers that progressively expand the feature dimension, each followed by a pooling layer. The processed data are then fed into a linear regression layer, culminating in a two-dimensional vector output. After normalization, this vector serves as an azimuthal direction vector in the Earth's plane, from which the model computes the corresponding incident angle.

Fig. 10
Model used for direction reconstruction: the AngleNet model. Through six convolutions, AngleNet increases the input's original dimension to 512 and then processes it through a fully connected layer after a global pooling layer. Like ParticleNet, AngleNet makes use of residual blocks, here adapted to three dimensions. This alleviates under-fitting to some extent and considerably improves the model's predictive performance

The error angular distance Ω was calculated from the azimuthal angles using a spherical coordinate transformation:
$$\cos\Omega = \sin\theta\sin\phi\,\sin\theta\sin\phi' + \sin\theta\cos\phi\,\sin\theta\cos\phi' + \cos^{2}\theta = \sin^{2}\theta\cos(\phi-\phi') + \cos^{2}\theta \qquad (5)$$
Here, ϕ represents the true incident azimuthal angle, ϕ′ represents the azimuthal angle reconstructed by the model, and θ is the common zenith angle of the two directions.
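A short sketch of Eq. (5) for the fixed zenith angle θ = 20° used in our simulations:

```python
import numpy as np

def angular_distance_deg(phi_true_deg: float, phi_rec_deg: float,
                         theta_deg: float = 20.0) -> float:
    """Angular separation (Eq. (5)) between two directions that share
    the same zenith angle theta but differ in azimuth."""
    theta = np.radians(theta_deg)
    dphi = np.radians(phi_true_deg - phi_rec_deg)
    cos_omega = np.sin(theta) ** 2 * np.cos(dphi) + np.cos(theta) ** 2
    return np.degrees(np.arccos(np.clip(cos_omega, -1.0, 1.0)))

# A 0.5 deg azimuth error at theta = 20 deg is only ~0.17 deg on the sky
print(angular_distance_deg(120.0, 120.5))
```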

All our simulations were based on an incident zenith angle of θ=20°. The results plotted in Fig. 11 reveal that the deep learning method we employed yields an approximately 60% improvement in accuracy over traditional methods, as cited in [25]. Notably, the reconstruction error decreases substantially with increasing energy levels. This enhanced accuracy is attributable to the higher number of photons collected by the telescope at higher energy levels, which allows for a more precise reconstruction of the event direction. In the energy range exceeding 1 TeV, our model reduced the angular error to less than 0.2°, providing a robust foundation for subsequent research.

Fig. 11
(Color online) Angular distance between the reconstructed and true incident directions at different energies, in degrees. Deep learning techniques yield better angle reconstruction than conventional ellipse-fitting techniques because they rebuild this physical process by exploiting relationships among nearly all of the original data. (Deep learning results are provided by GPLearn; traditional-method data are sourced from the literature [25])
6 Simulated measurements of the Crab Nebula

6.1 Sensitivity measurements

The Crab Nebula serves as a prototypical supernova remnant, emitting across a wide range of wavelengths, including the radio, X-ray, and gamma-ray bands. The study of this celestial object is pivotal for gaining insight into the physical mechanisms underlying supernova explosions and the generation and acceleration of cosmic rays, and for addressing important scientific queries such as the nature of dark matter. Among the numerous physical attributes associated with the Crab Nebula that require precise measurement, sensitivity is a crucial parameter: the level of sensitivity directly impacts the accuracy and precision of our estimates concerning the various physical phenomena related to the Crab Nebula.

In the preceding sections, we delved into the deployment of deep learning methodologies for sophisticated analysis of detector data, achieving a marked improvement in accuracy. In this section, our focus shifts to the specific application of deep learning to enhance the sensitivity measurements related to the Crab Nebula. We illustrate how these advanced computational techniques outperform traditional methods and offer a more precise and reliable assessment of this crucial parameter. Additionally, we juxtapose our findings with those obtained from other Cherenkov imaging telescope experiments to contextualize the efficacy and innovation of our approach.

To calculate the sensitivity, we conducted a simulation that accounted for both gamma rays and scattered protons across an energy spectrum ranging from 100 GeV to 10 TeV. The simulated sampling area was set to $S_{\rm sample} = 800\ {\rm m} \times 800\ {\rm m}$, and a fixed zenith angle of 20° was used. The azimuthal angle in the simulation was varied between 0° and 180°. Additionally, we integrated the geographical factors specific to the Yangbajing site and the celestial trajectory of the Crab Nebula. Based on these considerations, we estimated the annual observation time dedicated to the Crab Nebula to be approximately 320 h, which translates to $T_{\rm crab,obs} = 1.152\times10^{6}$ s.

For gamma rays, the effective number of events from the Crab Nebula received by the HADAR experiment in one year above energy $i$ can be calculated as
$$N_{\gamma}^{\rm 1yr}[i] = T_{\rm crab,obs}\, S_{\rm sample}\, F_{\rm crab}\, \frac{N_{\gamma,\rm sim}[i]}{N_{\gamma,\rm sim,all}[i]}\, \varepsilon_{\gamma} \qquad (6)$$
Here, $F_{\rm crab} = \int_{i}^{\infty} 2.83\times10^{-11}\ {\rm ph\ cm^{-2}\ s^{-1}\ TeV^{-1}}\,(E/{\rm TeV})^{-2.62}\, {\rm d}E$ represents the integrated flux of the Crab Nebula for energies greater than $i$. $N_{\gamma,\rm sim}[i]$ is the number of effective simulated photon events in energy range $i$, $N_{\gamma,\rm sim,all}[i]$ is the total number of simulated photon events, and $\varepsilon_{\gamma}$ represents the fraction of events contained within the 68% angular resolution.

For scattered protons, the effective number of observed events in one year can be calculated as
$$N_{\rm CR}^{\rm 1yr}[i] = T_{\rm crab,obs}\, S_{\rm sample}\, F_{\rm CR}\, \frac{N_{\rm CR,sim}[i]}{N_{\rm CR,sim,all}[i]}\, \Omega_{i}. \qquad (7)$$
Here, $F_{\rm CR}$ represents the integrated flux of protons in the cosmic rays, $N_{\rm CR,sim}[i]$ is the number of effective simulated proton events in energy range $i$, $N_{\rm CR,sim,all}[i]$ is the total number of simulated proton events, and $\Omega_{i}$ represents the solid angle within the photon angular resolution range.

After the implementation of background suppression techniques and the exclusion of proton-like events, we can derive the integrated significance across the designated energy spectrum following the method of Li and Ma [26]:
$$S[i] = \frac{N_{\gamma}^{\rm 1yr}[i]}{\sqrt{N_{\rm CR}^{\rm 1yr}[i]}}\; Q \qquad (8)$$
Utilizing the framework of Gaussian statistics, sensitivity is defined as the minimal flux from the Crab Nebula required for the detector to register a signal at a 5-sigma significance level. Consequently, the sensitivity of the HADAR experiment to the Crab Nebula was calculated using the following formula:
$$F_{\rm sensitivity}[i] = F_{\rm crab}\,\frac{5}{S[i]} \qquad (9)$$
To offer a comprehensive perspective, we graphically delineated the calculated sensitivity both prior to and following the incorporation of the deep learning algorithms, juxtaposed with the corresponding data from other pertinent experiments [27-29], as depicted in Fig. 12.
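The chain of Eqs. (6)-(9) can be sketched as follows (a simplified illustration: the simulated event counts, efficiencies, quality factor, and the already-evaluated proton term of Eq. (7) are passed in as hypothetical inputs):

```python
import numpy as np
from scipy.integrate import quad

T_OBS = 1.152e6                   # s, yearly Crab observation time
S_SAMPLE = 800.0e2 * 800.0e2      # cm^2 (800 m x 800 m sampling area)

def f_crab(e_tev: float) -> float:
    """Differential Crab flux, ph cm^-2 s^-1 TeV^-1."""
    return 2.83e-11 * e_tev ** -2.62

def sensitivity(e_thr_tev, n_g_pass, n_g_all, eff_gamma,
                n_cr_expected, q_factor):
    """Eqs. (6)-(9): expected yearly photon counts above e_thr,
    significance against the surviving proton background, and the
    5-sigma flux scaled from the Crab flux."""
    f_int, _ = quad(f_crab, e_thr_tev, np.inf)          # integrated flux
    n_gamma = T_OBS * S_SAMPLE * f_int * (n_g_pass / n_g_all) * eff_gamma
    s = n_gamma / np.sqrt(n_cr_expected) * q_factor     # Eq. (8)
    return f_int * 5.0 / s                              # Eq. (9)

# Hypothetical numbers, for illustration only:
print(sensitivity(1.0, 3.0e4, 1.0e5, 0.68, 5.0e4, 5.0))
```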

Fig. 12
(Color online) Comparison of sensitivity between the HADAR experiment and other experiments. The figure shows the HADAR experiment's sensitivity curve for the Crab Nebula at the 5σ significance level over one year (320 h of useful observation time). The Crab Nebula sensitivity curves from other experiments are shown for comparison: the Fermi satellite (one year of effective observation time), MAGIC and H.E.S.S. (50 h of effective observation time), and ARGO-YBJ and HAWC (one year of effective observation time)

Following the integration of advanced deep learning algorithms, the HADAR experiment exhibited a substantial enhancement in sensitivity within the low-energy domain. Although it has yet to attain the level of sensitivity exhibited by CTA, the revamped HADAR experiment now stands as a formidable contender to established initiatives such as MAGIC and H.E.S.S. Importantly, HADAR boasts a unique attribute: an expansive field of view not typically afforded by conventional reflective Cherenkov telescopes. This distinct advantage facilitates real-time capture of transient celestial sources within its observational purview, rendering the outcomes particularly compelling.

6.2 Observation of the Pulsar in the Crab Nebula

The primary objective of this section is to perform a rigorous analysis of the observational data pertaining to the Crab Nebula pulsar in the VHE regime, as acquired through the MAGIC telescope [30]. The significance of the observations from the HADAR experiment was calculated to lay the groundwork for future inquiries into pulsar emissions at high-energy wavelengths.

We determined the spectral properties of the main pulse (P1) and interpulse (P2) of the Crab pulsar using data obtained from the MAGIC telescope [31], as shown graphically in Fig. 13. The integral flux of each component is given by
$$F_{\rm P1} = 1.1\times10^{-11}\,(E/0.15\ {\rm TeV})^{-3.2}\ {\rm cm^{-2}\ s^{-1}} \qquad (10)$$
$$F_{\rm P2} = 2\times10^{-11}\,(E/0.15\ {\rm TeV})^{-2.9}\ {\rm cm^{-2}\ s^{-1}} \qquad (11)$$
Following the implementation of the background filtering procedures, only events characterized by gamma-like profiles were retained for the subsequent analyses of the pulsed signals of P1 and P2. To quantify the observational significance S of the HADAR experiment, we employed the following equation:
$$S = \frac{N_{\rm on} - \alpha N_{\rm off}}{\sqrt{\alpha\,(N_{\rm on} + N_{\rm off})}}. \qquad (12)$$
Here, $N_{\rm on}$ signifies the photon count emanating from the pulsar, and $N_{\rm off}$ represents the background photon count. Remarkably, above the 100 GeV energy threshold, the dominant background source in observations of the Crab pulsar transitions from protons to gamma rays emanating from the Crab Nebula itself; consequently, $N_{\rm off}$ encapsulates both the proton and the non-pulsed gamma-ray backgrounds. The observation time ratio α is defined as $\alpha = t_{\rm on}/t_{\rm off}$, where $t_{\rm on}$ and $t_{\rm off}$ denote the durations of the HADAR detector observations for the source and background, respectively.
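Eq. (12) translates directly into a one-line calculation (counts below are hypothetical, for illustration only):

```python
import numpy as np

def significance(n_on: float, n_off: float, t_on: float, t_off: float) -> float:
    """Eq. (12): S = (N_on - alpha*N_off) / sqrt(alpha*(N_on + N_off)),
    with alpha = t_on / t_off the on/off observation-time ratio."""
    alpha = t_on / t_off
    return (n_on - alpha * n_off) / np.sqrt(alpha * (n_on + n_off))

# Example: 12000 on-source counts vs 55000 off-source counts taken
# over five times the exposure (alpha = 0.2) gives S ~ 8.6
print(significance(n_on=12000, n_off=55000, t_on=1.0, t_off=5.0))
```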

Fig. 13
The energy spectra of the Crab pulsar's main pulse P1 (black circles) and interpulse P2 (blue circles) in the 70 GeV–1.5 TeV energy range, as measured by MAGIC. The energy spectrum of the Crab Nebula itself (hollow squares) is also shown for comparison. (Data sourced from [31])

Tables 1 and 2 list the observational datasets acquired from the MAGIC and HADAR experiments, respectively. These tables encapsulate the derived significances in distinct energy bands for both the main pulse (P1) and the interpulse (P2) of the Crab pulsar, as corroborated by references [32] and [33].

Table 1
Five years of observational data (320 h of effective observation time) from the MAGIC experiment, showing the corresponding significances in different energy ranges for P1 and P2. (Data sourced from [31])
Energy range (GeV) P1 P2
  Nex Significance Nex Significance
100-400 1252±442 2.8σ 2537±454 5.6σ
>400 188±88 2.2σ 544±92 6.0σ
>680 130±66 2.0σ 293±69 4.3σ
>950 119±54 2.2σ 190±56 3.5σ
Table 2
Significances in the same energy ranges, determined with deep learning algorithms from the simulated HADAR observations of the P1 and P2 pulses over one year (320 h of effective observation time)
Energy range (GeV) P1 P2
  Nex Significance Nex Significance
100-400 3698 4.65σ 6943 7.31σ
>400 500 0.708σ 1432 2.046σ
>680 175 0.390σ 584 1.306σ
>950 87 0.261σ 321 0.963σ

Capitalizing on the robust data processing capabilities afforded by deep learning methodologies, the HADAR experiment demonstrated a notably superior level of significance within the low-energy domain compared with the MAGIC experiment. Nevertheless, as the energy spectrum increases, certain constraints, including equipment specifications and geographical considerations, result in a discernible performance disparity compared with MAGIC. Moving forward, we aim to augment the sensitivity of our apparatus through a multi-faceted approach, encompassing the expansion of the detector’s effective observational area as well as the refinement of computational models to enhance the angular resolution.

7 Outlook

With the emergence of large-scale AI language models, AI has been progressively incorporated into many industrial applications. In physics, data analytics are a burgeoning arena for the expansion of AI technologies. By leveraging its computational advantages, AI has the potential to uncover latent relationships within complex datasets, accelerate data-fitting procedures, and simulate realistic models of experimental measurements.

In high-energy physics, the synergy between AI and big data allows for the precise reconstruction of energy, momentum, and mass metrics for particles, which is a critical step towards elucidating the underlying properties and behaviors of subatomic particles. Additionally, exact spatial reconstructions are indispensable for tracking the decay products of heavy particles and capturing rare events that could signal novel particles or interaction mechanisms. In the field of astronomy, AI technologies facilitate the real-time processing of astronomical signals, enabling more efficient studies of distant cosmic events and contributing to our understanding of the origin and composition of the universe.

In this study, we effectively addressed complicated physical issues by integrating real-world physical contexts using appropriate data preparation and models. Additionally, the excellent results from deep learning also serve to confirm the accuracy of the corresponding physical theories. The training process, which was built in accordance with real-world physical theories, both theoretically supported and verified the deep learning results.

During the actual research process, we demonstrated the following:

1. The relative position and shape information of Cherenkov radiation can be used to discriminate particle types; at the same time, information contained in data that is not sensitive to absolute position will not be lost after slicing and cropping.

2. The energy information from the Cherenkov radiation reaction is not sensitive to the relative position information of pixels; moreover, energy can be approximately represented by a linear function of charge amount, incidence angle, and relative center distance.

3. The directional information of Cherenkov radiation is sensitive to the absolute position of pixels; therefore, retaining the absolute position information of pixels is crucial for correctly inferring the initial direction of particles when processing image data.

Deep learning techniques have simplified and improved labor-intensive physical reconstruction procedures. Because physical data features are typically confined to a limited range of scales, adopting small convolutional kernels helps neural networks recognize these features acutely, while expanding the network's dimensions allows it to gather global feature information more efficiently. Adding the necessary positional dimensions made it possible to crop data from spatially separated detectors, which also suggests a new approach to data analysis in large detector arrays.

References
1. G.G. Fazio, High-energy gamma-ray astronomy. Nature 225, 905-911 (1970). https://doi.org/10.1038/225905a0
2. A.A. Abdo, M. Ackermann, M. Ajello, et al., The first Fermi Large Area Telescope catalog of gamma-ray pulsars. Astrophys. J. Suppl. Ser. 187, 460-494 (2010). https://doi.org/10.1088/0067-0049/187/2/460
3. J.A.R. Cembranos, A. de la Cruz-Dombriz, A.L. Maroto, et al., Detection of branon dark matter with gamma ray telescopes. Phys. Rev. D 85, 043505 (2012). https://doi.org/10.1103/PhysRevD.85.043505
4. E.P. Mazets, S.V. Golenetskii, V.N. Il'Inskii, et al., Catalog of cosmic gamma-ray bursts from the KONUS experiment data. Astrophys. Space Sci. 80, 3-83 (1981). https://doi.org/10.1007/BF00649140
5. L. Anchordoqui, M.T. Dova, A. Mariazzi, et al., High energy physics in the atmosphere: phenomenology of cosmic ray air showers. Ann. Phys. 314, 145-207 (2004). https://doi.org/10.1016/j.aop.2004.07.003
6. B. Degrange, G. Fontaine, Introduction to high-energy gamma-ray astronomy. C. R. Phys. 16, 587-599 (2015). https://doi.org/10.1016/j.crhy.2015.07.003
7. C. Fichtel, Future prospects for gamma-ray astronomy. Philos. Trans. R. Soc. A-Math. Phys. Eng. Sci. 301, 693-701 (1981). https://doi.org/10.1098/rsta.1981.0153
8. Z.H. Fan, J.C. Liang, H.R. Liu et al., Influence of wall effect on detection efficiency of beta-emitters in TDCR-Cerenkov method. Nuclear Techniques 46, 060504 (2023). https://doi.org/10.11889/j.0253-3219.2023.hjs.46.060504 (in Chinese)
9. M. Actis, G. Agnetta, F. Aharonian et al., Design concepts for the Cherenkov Telescope Array CTA: an advanced facility for ground-based high-energy γ-ray astronomy. Exp. Astron. 32, 193-316 (2011). https://doi.org/10.1007/s10686-011-9247-0
10. F. Aharonian, A.G. Akhperjanian, A.R. Bazer-Bachi et al., The H.E.S.S. survey of the inner galaxy in very high energy gamma rays. Astrophys. J. 636, 777 (2006). https://doi.org/10.1086/498013
11. F. Aharonian, A.G. Akhperjanian, A.R. Bazer-Bachi et al., Observations of the Crab Nebula with HESS. Astron. Astrophys. 457, 899-915 (2006). https://doi.org/10.1051/0004-6361:20065351
12. J. Albert, E. Aliu, H. Anderhub et al., VHE gamma-ray observation of the Crab Nebula and its pulsar with the MAGIC telescope. Astrophys. J. 674, 1037-1055 (2008). https://doi.org/10.1086/525270
13. J. Albert, E. Aliu, H. Anderhub, et al., Discovery of very high energy gamma radiation from IC 443 with the MAGIC telescope. Astrophys. J. 664, L87 (2007). https://doi.org/10.1086/520957
14. G.G. Xin, H. Cai, Y.-Q. Guo et al., A novel trigger algorithm for wide-field-of-view imaging atmospheric Cherenkov technique experiments. Nucl. Sci. Tech. 33, 25 (2022). https://doi.org/10.1007/s41365-022-01003-3
15. X.L. Qian, H.Y. Sun, T.L. Chen et al., Prospective study on observations of gamma-ray emission from active galactic nuclei using the HADAR experiment. Acta Phys. Sin. 72, 049501 (2023). https://doi.org/10.7498/aps.72.20221976
16. H. Cai, Y. Zhang, C. Liu, et al., Wide field-of-view atmospheric Cherenkov telescope based on refractive lens. J. Instrum. 12, P09023 (2017). https://doi.org/10.1088/1748-0221/12/09/P09023
17. I. Shilon, M. Kraus, M. Büchele, et al., Application of deep learning methods to analysis of imaging atmospheric Cherenkov telescopes data. Astropart. Phys. 105, 44-53 (2019). https://doi.org/10.1016/j.astropartphys.2018.10.003
18. S. Mangano, C. Delgado, M.I. Bernardos et al., Extracting gamma-ray information from images with convolutional neural network methods on simulated Cherenkov Telescope Array data. Artificial Neural Networks in Pattern Recognition, pp. 243-254. https://doi.org/10.1007/978-3-319-99978-4_19
19. T. Wibig, Testing the superposition model in small CORSIKA shower simulations. J. Phys. G-Nucl. Part. Phys. 49, 035201 (2022). https://doi.org/10.1088/1361-6471/ac4da7
20. H. Cai, T.-L. Chen, Q. Gao et al., Performance of a wide field-of-view atmospheric Cherenkov telescope prototype based on a refractive lens. Nucl. Instrum. Methods Phys. Res. Sect. A 927, 46-53 (2019). https://doi.org/10.1016/j.nima.2019.02.020
21. V.A. Acciari, M. Beilicke, G. Blaylock et al., VERITAS observations of the gamma-ray binary LS I +61 303. Astrophys. J. 679, 1427-1432 (2008). https://doi.org/10.1086/587736
22. A.M. Hillas, The origin of ultra-high-energy cosmic rays. Annu. Rev. Astron. Astrophys. 22, 425-444 (1984). https://doi.org/10.1146/annurev.aa.22.090184.002233
23. H. Abdalla, R. Adam, F. Aharonian et al., Resolving acceleration to very high energies along the jet of Centaurus A. Nature 582, 356-359 (2020). https://doi.org/10.1038/s41586-020-2354-1
24. F. Aharonian, Q. An, Axikegu, et al., Observation of the Crab Nebula with LHAASO-KM2A - a performance study. Chin. Phys. C 45, 025002 (2021). https://doi.org/10.1088/1674-1137/abd01b
25. G.-G. Xin, Y.-H. Yao, X.-L. Qian et al., Prospects for the detection of the prompt very-high-energy emission from gamma-ray bursts with the High Altitude Detection of Astronomical Radiation experiment. Astrophys. J. 923, 112 (2021). https://doi.org/10.3847/1538-4357/ac2df7
26. T.-P. Li, Y.Q. Ma, Analysis methods for results in gamma-ray astronomy. Astrophys. J. 272, 317-324 (1983). https://doi.org/10.1086/161295
27. LHAASO Collaboration, Z. Cao, F. Aharonian et al., Peta-electron volt γ-ray emission from the Crab Nebula. Science 373, 425-430 (2021). https://doi.org/10.1126/science.abg5137
28. F.A. Aharonian, S.V. Bogovalov, D. Khangulyan, Abrupt acceleration of a 'cold' ultrarelativistic wind from the Crab pulsar. Nature 482, 507-509 (2012). https://doi.org/10.1038/nature10793
29. A.K. Harding, C. Kalapotharakos, Synchrotron self-Compton emission from the Crab and other pulsars. Astrophys. J. 811, 63 (2015). https://doi.org/10.1088/0004-637X/811/1/63
30. J. Aleksić, E.A. Alvarez, L.A. Antonelli et al., Observations of the Crab pulsar between 25 and 100 GeV with the MAGIC I telescope. Astrophys. J. 742, 43 (2011). https://doi.org/10.1088/0004-637X/742/1/43
31. S. Ansoldi, L.A. Antonelli, P. Antoranz et al., Teraelectronvolt pulsed emission from the Crab pulsar detected by MAGIC. Astron. Astrophys. 585, A133 (2015). https://doi.org/10.1051/0004-6361/201526853
32. J. Aleksić, E.A. Alvarez, L.A. Antonelli et al., Phase-resolved energy spectra of the Crab pulsar in the range of 50-400 GeV measured with the MAGIC telescopes. Astron. Astrophys. 540, A69 (2012). https://doi.org/10.1051/0004-6361/201118166
33. J. Aleksić, E.A. Alvarez, L.A. Antonelli et al., Performance of the MAGIC stereo system obtained with Crab Nebula data. Astropart. Phys. 35, 435-448 (2012). https://doi.org/10.1016/j.astropartphys.2011.11.007
The authors declare that they have no competing interests.

Footnote
1. The software can be downloaded, or more information can be found, at https://github.com/caihao/gpLearn