A novel method for EPID transmission dose generation using Monte Carlo simulation and deep learning

ACCELERATOR, RAY AND APPLICATIONS

Tao Qiu, Ning Gao, Yan-Kui Chang, Xi Pei, Huan-Li Luo, Fu Jin

Nuclear Science and Techniques

Vol.37, No.4

Article number 65

Published in print Apr 2026

Available online 29 Jan 2026

DOI: 10.1007/s41365-026-01898-2

CSTR: 32136.14.NST.2026.0465

This study aimed to integrate Monte Carlo (MC) simulation with deep learning (DL)-based denoising to achieve fast and accurate prediction of high-quality electronic portal imaging device (EPID) transmission dose (TD) for patient-specific quality assurance (PSQA). For 100 lung cases, noisy EPID TD was generated with the ARCHER MC code at four particle numbers (1×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹), and the original EPID TD was denoised with the SUNet neural network. The denoised EPID TD was assessed qualitatively and quantitatively using the structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and gamma passing rate (GPR), with the 1×10⁹ TD as the reference. The computation times of both the MC simulation and the DL-based denoising were recorded. As the number of particles increased, both the quality of the noisy EPID TD and the MC computation time increased substantially (1×10⁶: 1.12 s, 1×10⁷: 1.72 s, 1×10⁸: 8.62 s, and 1×10⁹: 73.89 s), whereas the DL-based denoising time remained at 0.13–0.16 s. The denoised EPID TD showed a smoother visual appearance and smoother profiles, although differences between 1×10⁶ and 1×10⁹ remained. SSIM improved from 0.61 to 0.95 for 1×10⁶, from 0.70 to 0.96 for 1×10⁷, and from 0.90 to 0.97 for 1×10⁸. PSNR increased by >20% for 1×10⁶ and 1×10⁷ and by >10% for 1×10⁸. GPR improved from 48.47% to 89.10% for 1×10⁶, from 61.04% to 94.35% for 1×10⁷, and from 91.88% to 99.55% for 1×10⁸. Combining MC simulation with DL-based denoising for EPID TD generation can accelerate TD prediction while maintaining high accuracy, offering a promising solution for efficient PSQA.

Keywords: PSQA; EPID; Monte Carlo; Deep learning

Introduction

With the growing adoption of advanced techniques such as intensity-modulated radiation therapy (IMRT), volumetric-modulated arc therapy (VMAT), and adaptive radiation therapy (ART), patient-specific quality assurance (PSQA) is routinely implemented to ensure the accuracy of both treatment planning and dose delivery [1-4]. Numerous studies have explored the role of PSQA and reported promising developments [5-8]. However, most existing studies have focused on pretreatment PSQA, which is limited in its ability to capture inter- and intra-fractional variations. In recent years, the emphasis has therefore shifted toward real-time in vivo dose verification. Among the available tools, the electronic portal imaging device (EPID) has emerged as a practical and efficient alternative to traditional dosimeters such as films and ion chambers owing to its operational efficiency and ease of use; the AAPM TG-58 report recognizes the ability of EPIDs to provide quantitative data and real-time feedback [9-13]. However, when the EPID transmission dose (TD) is generated by Monte Carlo (MC) simulation with a limited number of particles to reduce computation time, it is affected by statistical noise. This noise can substantially degrade image quality, making it difficult to extract reliable dose information for PSQA. Deep learning (DL) techniques, which are increasingly applied in radiotherapy, offer a promising means of denoising MC-based dose calculations. Combining MC simulation with DL-based denoising therefore presents an effective approach for enabling accurate dose verification and supporting efficient online ART workflows.

The forward-projection approach is a representative EPID-based PSQA method that compares measured 2D images or TD with those predicted at the EPID level [14]. It is widely used in clinical practice because it effectively identifies errors related to data transfer, treatment delivery, and anatomical variations, thereby substantially enhancing radiotherapy accuracy. The approach relies on both measured and predicted EPID images and TD. The measured data, which represent the actual beam delivery, are acquired as 2D images or dose distributions in calibration units (CU) and can be converted into Gy through appropriate calibration procedures. The predicted data are generated by dose calculation methods, which encompass analytical algorithms and MC simulations. Analytical algorithms have limited capability to model tissue heterogeneities accurately, which may lead to dose calculation inaccuracies in complex anatomical regions. MC, widely recognized as the “gold standard” for dose calculation, simulates particle transport to closely approximate the actual physical interactions and can therefore provide highly accurate EPID TD predictions. Currently, two types of MC methods are used to predict the EPID TD. The first is the kernel convolution method, which simulates a dose kernel on the EPID plane and convolves it with the fluence map to obtain the deposited dose [15, 16]. The second is the full MC method, which simulates the entire process of particle transport through the accelerator and dose deposition in the EPID [17]. Both methods rely on precise modeling of the accelerator and EPID; however, EPID design details are typically proprietary, and constructing an EPID model requires specialized expertise. Therefore, choosing an appropriate MC simulation tool is crucial. The MC simulation tool used in this study is described in the next section.
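The kernel convolution method can be illustrated in a few lines. The sketch below is not the implementation of [15, 16]: it assumes a hypothetical isotropic Gaussian kernel (`gaussian_kernel`) in place of an MC-simulated EPID dose kernel, a toy square fluence map on a 160×160 grid, and a zero-padded FFT convolution written with NumPy only.

```python
import numpy as np

def gaussian_kernel(size: int, sigma_px: float) -> np.ndarray:
    """Hypothetical isotropic kernel standing in for an MC-simulated EPID dose kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_px**2))
    return k / k.sum()  # unit sum, so the convolution conserves the integral of the fluence

def convolve_same(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Linear 2D convolution via zero-padded FFTs, cropped to the input size (odd-sized kernel)."""
    kh, kw = kernel.shape
    h, w = image.shape
    shape = (h + kh - 1, w + kw - 1)  # full linear-convolution size avoids circular wrap-around
    full = np.fft.irfft2(np.fft.rfft2(image, shape) * np.fft.rfft2(kernel, shape), shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]

def predict_epid_dose(fluence: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Kernel-convolution estimate of the dose on the EPID plane: fluence convolved with the kernel."""
    return convolve_same(fluence, kernel)

# Toy example: a 40 x 40 pixel open field centred on a 160 x 160 EPID grid
fluence = np.zeros((160, 160))
fluence[60:100, 60:100] = 1.0
dose = predict_epid_dose(fluence, gaussian_kernel(size=21, sigma_px=3.0))
print(round(float(dose[80, 80]), 3), round(float(dose[60, 80]), 3))  # plateau centre vs. field edge
```

The kernel blurs the sharp field edge into a penumbra while leaving the plateau unchanged; a realistic kernel would be depth-, energy-, and detector-specific.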

Although MC simulations can yield accurate results, achieving clinically acceptable statistical precision requires simulating a large number of particles. This consumes substantial time and computational resources, making MC difficult to apply in clinical practice, particularly in online ART, which relies on specialized software and hardware platforms with streamlined, automated workflows. Online ART uses updated anatomical images and contours to reconstruct the delivered dose based on the original treatment plan. After the reconstructed dose on the new anatomy is evaluated, a decision is made regarding whether adaptation is required; if so, a new treatment plan is generated on the new anatomy. The entire online ART process therefore includes image acquisition (obtaining the new anatomy), contour generation (targets and organs at risk), dose reconstruction, and replanning on the new anatomy. Before the adapted plan is delivered, comprehensive QA must be performed. Because the total online ART workflow takes from several to tens of minutes, the time allocated for QA should be minimized. Excessive delays during this phase can lead to patient discomfort or movement, potentially causing intra-fractional anatomical variations (e.g., bladder filling or respiratory motion), which may compromise the dosimetric benefits of ART and reduce clinical efficiency [18-20]. For QA in online ART, the EPID is a critical clinical tool because it enables rapid, noninvasive verification of delivered dose distributions and thus efficient, time-sensitive PSQA. Therefore, rapid and accurate generation of the EPID TD is essential to support the implementation of online ART.

Simulating fewer particles shortens the computation time but increases the statistical uncertainty, which appears as noise in the EPID TD. If this noise can be effectively mitigated while retaining the reduced computation time of a low particle number, the MC-based forward-projection approach could become clinically viable. Traditional denoising methods are typically algorithm-driven and are often referred to as prior- or model-based approaches, such as self-similarity, sparse coding, and total variation. Although these methods are effective for ill-posed problems, they often incur substantial computational costs and may struggle in complex or highly noisy scenarios [21-23]. With the advancement of DL, its applications in radiotherapy have expanded to include automatic delineation, automated planning, and image registration. Zhou et al. developed and tested a 3D DL model for predicting 3D voxel-wise dose distributions for IMRT [24]. Xing et al. explored DL for dose calculation to resolve the dilemma that fast algorithms are generally less accurate, whereas accurate dose engines are often time-consuming [25]. Zhang et al. developed a slice-classification-model-facilitated 3D encoder–decoder network for segmenting organs at risk in head and neck cancer [26]. Zhen et al. proposed a deep convolutional neural network with transfer learning for rectal toxicity prediction in cervical cancer radiotherapy [27]. However, few studies have applied DL to denoising the EPID TD. To address the noise that arises at low particle numbers in MC simulations, DL can learn the mapping from low-particle-number EPID TD to high-particle-number EPID TD. The goal is to establish a function F that corrects a noisy, low-quality TD_l into a high-quality TD_h [28]:

$$TD_h = F(TD_l)$$

U-Net is one of the most widely used DL architectures, and numerous variants have been developed to meet various demands, including Dense-UNet, U-Net++, UNet3+, and 3D-UNet [29-31]. Swin-Unet was proposed as a Transformer-based image processing network that replaces the convolutional layers of the classic U-Net with Swin Transformer blocks. Swin-Unet was recently adapted for denoising tasks, resulting in SUNet, which incorporates several denoising-specific enhancements, including a dual upsampling module designed to mitigate checkerboard artifacts and strengthen spatial detail reconstruction [32-34]. These improvements enable SUNet to suppress noise effectively while preserving structural fidelity. Although SUNet was originally developed for natural image denoising, its direct application to EPID TD denoising demonstrates meaningful potential for extension into the medical imaging domain. The design and implementation of SUNet are detailed in the next section.

This study applies SUNet to denoise the EPID TD generated by MC simulation with a low particle number, converting low-quality TD into high-quality TD and thereby reducing the computation time while maintaining accuracy. First, the EPID TD was generated for all fields at four particle numbers: 1×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹. Existing research indicates that although particle numbers beyond approximately 1×10⁹ continue to improve simulation accuracy, the improvement becomes clinically insignificant relative to the substantial increase in computational cost [28]. Therefore, 1×10⁹ was selected as a reasonable upper limit for the MC simulation and used as the ground truth. Three separate SUNet models were then trained for 1×10⁶, 1×10⁷, and 1×10⁸. After training, the denoising results were assessed qualitatively and quantitatively to identify the optimal trade-off between computational efficiency and dosimetric accuracy across particle numbers, providing a suitable particle number for future research and evidence-based guidance for clinical application.
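The link between particle number and noise can be made concrete with a toy model. Purely for illustration, the sketch below assumes that each pixel of a 160×160 TD map scores an independent Poisson number of events proportional to its noiseless dose (this is not how ARCHER scores dose). Under that assumption the relative noise falls as 1/√N, so each tenfold increase in particles reduces the noise by a factor of about √10 ≈ 3.2. The helpers `simulate_noisy_td` and `relative_noise` are hypothetical.

```python
import numpy as np

def simulate_noisy_td(ideal: np.ndarray, n_particles: float, rng: np.random.Generator) -> np.ndarray:
    """Toy MC tally: Poisson counts per pixel, rescaled back to the units of `ideal`."""
    expected = ideal / ideal.sum() * n_particles  # expected number of scored events per pixel
    counts = rng.poisson(expected)
    return counts / n_particles * ideal.sum()

def relative_noise(noisy: np.ndarray, ideal: np.ndarray) -> float:
    """Standard deviation of the pixel-wise relative error."""
    return float(np.std((noisy - ideal) / ideal))

rng = np.random.default_rng(42)
ideal = np.ones((160, 160))  # hypothetical flat field: every pixel has the same expected dose
noise = {n: relative_noise(simulate_noisy_td(ideal, n, rng), ideal) for n in (1e6, 1e7, 1e8, 1e9)}
for n, s in noise.items():
    print(f"N = {n:.0e}: relative noise {s:.4f} (1/sqrt(N per pixel) = {1 / np.sqrt(n / ideal.size):.4f})")
```

In this model, going from 1×10⁶ to 1×10⁹ particles lowers the noise by about √1000 ≈ 32 times, at a correspondingly higher simulation cost.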

Material and Methods

We collected data from 100 patients with lung cancer who underwent IMRT with a five-field beam arrangement, yielding 500 fields in the dataset. Treatment plans were completed using the Eclipse 15.6 treatment-planning system (Varian Medical Systems, Palo Alto, CA, USA). The corresponding RT files, including RT-Plan, RT-Structure, and RT-Dose, along with CT images, were exported as input data for MC simulation. The employed accelerator model was the Varian TrueBeam operating in a 6 MV flattened photon beam mode.

2.1

Monte Carlo Simulation for EPID TD

ARCHER, a GPU-accelerated fast MC code, was employed to generate the EPID TD for training and testing the neural network model. ARCHER has been validated in various radiotherapy dose calculation studies [35-38]. It has recently been extended to EPID dosimetry, providing an efficient and accurate platform for simulating radiation transport through the treatment head, patient phantom, and EPID model [39]. The framework of ARCHER for the TrueBeam accelerator is illustrated in Fig. 1.

Fig. 1

(Color online) The framework of ARCHER for the TrueBeam accelerator, including particle transport in the treatment head, patient, and EPID. The EPID model was built with realistic material compositions

ARCHER includes a treatment head model of the Varian TrueBeam linear accelerator for radiation transport simulations. The model incorporates a High-Definition Multi-Leaf Collimator (HDMLC) with 60 leaf pairs: 32 central pairs with a width of 2.5 mm and 14 pairs on each side with a width of 5 mm (at the isocenter), providing a total field width of 22 cm. In the treatment head, electrons are accelerated to high energy in the accelerating waveguide and strike a metal target, generating photons and secondary electrons. The photons are used for irradiation, whereas the secondary electrons contribute to the buildup dose near the surface. The generated photons are collimated and shaped into the desired treatment beam, which is delivered to the patient. Because the upper accelerator components are independent of the patient and depend solely on the accelerator’s mechanical structure, the same dataset can be used for all simulations. To save computation time, the electron-target interaction was replaced by a phase space file for the radiation source, which was generated using the MC code BEAMnrc. The BEAMnrc component modules SLABS, CONS3R, FLATFILT, CHAMBER, and MIRROR were used to define the target, primary collimator, flattening filter, ion chamber, and mirror, respectively [35, 36, 39]. ARCHER then simulates radiation transport through the treatment head and the secondary collimation system, including the jaws and MLC, using an explicit approximate transport method. The particles transmitted through the MLC are used for the subsequent radiation transport simulations in the patient phantom and EPID models.
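The HDMLC leaf-width arithmetic can be checked by building the leaf boundary positions directly. This minimal sketch assumes central leaves 2.5 mm wide and outer leaves 5 mm wide, both projected at the isocenter (the widths consistent with a 22 cm total), and a leaf bank symmetric about the central axis; `hdmlc_leaf_boundaries` is a hypothetical helper, not part of ARCHER.

```python
import numpy as np

def hdmlc_leaf_boundaries(n_outer_per_side: int = 14, n_central: int = 32,
                          outer_mm: float = 5.0, central_mm: float = 2.5) -> np.ndarray:
    """Leaf-pair boundary positions (mm at isocenter) perpendicular to the direction of leaf travel."""
    widths = np.concatenate([
        np.full(n_outer_per_side, outer_mm),  # outer leaves on one side
        np.full(n_central, central_mm),       # high-definition central leaves
        np.full(n_outer_per_side, outer_mm),  # outer leaves on the other side
    ])
    edges = np.concatenate([[0.0], np.cumsum(widths)])
    return edges - edges[-1] / 2.0  # centre the bank on the beam axis

edges = hdmlc_leaf_boundaries()
print(len(edges) - 1, "leaf pairs")                         # 60 leaf pairs
print("total width:", edges[-1] - edges[0], "mm")           # total width: 220.0 mm
print("central region:", edges[14], "to", edges[46], "mm")  # central region: -40.0 to 40.0 mm
```

The 32 central leaves therefore cover the central 8 cm, and the 28 outer leaves the remaining 14 cm.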

Three phase space planes are defined in ARCHER: (1) phase space 1 is located downstream of the treatment head but upstream of the secondary collimators; (2) phase space 2 is positioned after the secondary collimators but before the patient’s body; and (3) phase space 3 is located after the patient. Their spatial locations are illustrated in Fig. 1. In this study, the reported particle numbers refer to the particles recorded in phase space 2. The dose calculation on the EPID is performed in two main steps. First, particle transport within the patient is simulated following the same procedure as in conventional dose calculations. Second, the transmitted particles are mapped back to a gantry angle of 0° through inverse translation and rotation and collected in phase space 3, and the dose deposited in the EPID by these exiting particles is calculated to obtain the TD of each field on a 160 × 160 grid. This resolution was chosen to improve computational efficiency while maintaining sufficient accuracy in the calculated dose distributions, as supported by a previous study [39].
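The second step (mapping exiting particles back to gantry 0° and scoring them on a 160 × 160 grid) can be sketched as follows. This is an illustration under stated assumptions, not ARCHER's code: the gantry is assumed to rotate about the y axis through the isocenter in a right-handed frame (clinical IEC conventions may differ in sign), and the EPID response is replaced by simple energy binning with `np.histogram2d` over a hypothetical 400 mm field of view, ignoring transport through the detector layers.

```python
import numpy as np

def rotation_about_y(angle_deg: float) -> np.ndarray:
    """Right-handed rotation matrix about the (assumed) gantry axis y."""
    a = np.deg2rad(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def map_to_gantry_zero(pos, direction, gantry_deg, isocenter):
    """Inverse translation and rotation: express particles in the gantry-0 frame centred on the isocenter."""
    r_inv = rotation_about_y(-gantry_deg)
    pos0 = (pos - isocenter) @ r_inv.T  # particles are row vectors, so multiply by the transpose
    dir0 = direction @ r_inv.T
    return pos0, dir0

def score_td(x_mm, y_mm, energy, grid=160, fov_mm=400.0):
    """Bin deposited energy on a grid x grid map (a stand-in for the EPID dose calculation)."""
    edges = np.linspace(-fov_mm / 2.0, fov_mm / 2.0, grid + 1)
    td, _, _ = np.histogram2d(y_mm, x_mm, bins=[edges, edges], weights=energy)
    return td  # rows follow y, columns follow x

# Usage: a direction along -z at gantry 0, rotated to gantry 90, is mapped back to -z
iso = np.zeros(3)
d90 = np.array([[0.0, 0.0, -1.0]]) @ rotation_about_y(90.0).T
_, d0 = map_to_gantry_zero(np.zeros((1, 3)), d90, 90.0, iso)
print(np.round(d0, 6))  # approximately (0, 0, -1)
```

Working in a single gantry-0 frame lets one EPID geometry serve every beam angle.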

2.2

Denoising using SUNet for EPID TD

We used SUNet [33] to denoise the EPID TD generated by ARCHER with a low particle number. As shown in Fig. 2, the network can be roughly divided into three modules: 1) a shallow feature extraction module, 2) a Unet feature extraction module, and 3) a reconstruction module. The Unet feature extraction module is the most critical of the three, and its architecture is similar to that of Swin-Unet, as detailed below. A noisy EPID TD with a resolution of H×W is input into the network, where a single 3×3 convolutional layer extracts shallow features such as edges, textures, and basic dose distribution information. Although simple, these shallow features are vital for subsequent processing because they provide the network with contextual information. They are then passed to the Unet feature extraction module, which hierarchically extracts multi-scale features from the image. First, the input is divided into non-overlapping 4×4 patches, each treated as an independent unit to reduce the computational load and better capture local information, resulting in a feature resolution of H/4×W/4. Subsequently, a patch merging layer merges each 2×2 group of adjacent patches into a larger patch, halving the feature resolution. During merging, the features of the four patches are concatenated, which increases the feature dimension to four times the original; a linear layer then projects the concatenated features to twice the original dimension. This process is repeated in subsequent stages, gradually reducing the resolution of the feature maps while increasing their dimensionality, thereby extracting deeper and more abstract information.
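The patch partition and patch merging steps reduce to array reshapes plus a linear projection. The NumPy sketch below shows only the shape bookkeeping. The weights are random placeholders, the LayerNorm that Swin-style patch merging applies before the projection is omitted, and the feature sizes are hypothetical rather than the SUNet configuration.

```python
import numpy as np

def patch_partition(x: np.ndarray, patch: int = 4) -> np.ndarray:
    """(H, W, C) -> (H/patch, W/patch, patch*patch*C): flatten each non-overlapping patch into one token."""
    h, w, c = x.shape
    x = x.reshape(h // patch, patch, w // patch, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h // patch, w // patch, patch * patch * c)

def patch_merging(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """(H, W, C) -> (H/2, W/2, C_out): concatenate each 2x2 neighbourhood (4C) and project linearly."""
    x0 = x[0::2, 0::2]  # top-left token of every 2x2 group
    x1 = x[1::2, 0::2]  # bottom-left
    x2 = x[0::2, 1::2]  # top-right
    x3 = x[1::2, 1::2]  # bottom-right
    merged = np.concatenate([x0, x1, x2, x3], axis=-1)  # feature dimension 4C
    return merged @ weight  # in Swin-style merging, weight has shape (4C, 2C)

rng = np.random.default_rng(0)
shallow = rng.normal(size=(64, 64, 8))                          # hypothetical shallow features, 8 channels
tokens = patch_partition(shallow) @ rng.normal(size=(128, 16))  # linear embedding to C = 16
stage2 = patch_merging(tokens, rng.normal(size=(64, 32)))       # 4C = 64 -> 2C = 32
print(tokens.shape, stage2.shape)  # (16, 16, 16) (8, 8, 32)
```

Each merging stage halves both spatial dimensions and doubles the channel count, matching the H/4×W/4, H/8×W/8, … hierarchy described above.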

Fig. 2

(Color online) The architecture of SUNet. It includes the following: 1) shallow feature extraction module. 2) Unet feature extraction module. 3) Reconstruction module

The Swin Transformer Block (STB), shown in Fig. 3, is a key component of SUNet and replaces the convolutional layers of the original UNet model. The network uses five STB layers, each containing eight Swin Transformer Layers (STLs). Each STL consists of Layer Normalization (LN), Multi-Head Self Attention (MSA), residual connections, and a Multi-Layer Perceptron (MLP) with Gaussian Error Linear Unit (GELU) activation. The STB uses two attention mechanisms: Window-based Multi-Head Self Attention (W-MSA) and Shifted Window-based Multi-Head Self Attention (SW-MSA). W-MSA reduces computational complexity by computing self-attention within non-overlapping local windows, whereas SW-MSA shifts the window partition between consecutive layers so that information is exchanged across window boundaries, enhancing the model’s ability to capture long-range dependencies and global information. The entire process is illustrated as follows:

Fig. 3

The architecture of the Swin Transformer Block. It includes: Layer Normalization (LN), Multi-Head Self Attention (MSA), Residual Connection, and a Multi-Layer Perceptron (MLP) with Gaussian Error Linear Unit (GELU) activation

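$$f^{L} = \text{W-MSA}\left(\text{LN}\left(F^{L-1}\right)\right) + F^{L-1}$$
$$F^{L} = \text{MLP}\left(\text{LN}\left(f^{L}\right)\right) + f^{L}$$
$$f^{L+1} = \text{SW-MSA}\left(\text{LN}\left(F^{L}\right)\right) + F^{L}$$
$$F^{L+1} = \text{MLP}\left(\text{LN}\left(f^{L+1}\right)\right) + f^{L+1}$$

where f^L and F^L represent the outputs of the (S)W-MSA module and the MLP module in the L-th block, respectively.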
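The difference between W-MSA and SW-MSA lies mainly in how tokens are grouped into windows. The sketch below assumes a single attention head, random projection weights, and no relative position bias or attention mask (the real SW-MSA masks tokens that the cyclic shift wraps around the border). It shows only the window partition, the cyclic shift by half a window, and attention restricted to each window; all sizes are hypothetical.

```python
import numpy as np

def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """(H, W, C) -> (num_windows, m*m, C) using non-overlapping m x m windows."""
    h, w, c = x.shape
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)

def cyclic_shift(x: np.ndarray, m: int, reverse: bool = False) -> np.ndarray:
    """Roll the map by half a window so that the next partition straddles the previous window borders."""
    s = m // 2 if reverse else -(m // 2)
    return np.roll(x, shift=(s, s), axis=(0, 1))

def window_attention(windows: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Single-head scaled dot-product attention computed independently inside each window."""
    q, k, v = windows @ wq, windows @ wk, windows @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 16, 8))  # hypothetical 16 x 16 token map with 8 channels
wq, wk, wv = (rng.normal(size=(8, 8)) for _ in range(3))
w_msa = window_attention(window_partition(x, 8), wq, wk, wv)                    # regular windows
sw_msa = window_attention(window_partition(cyclic_shift(x, 8), 8), wq, wk, wv)  # shifted windows
print(w_msa.shape, sw_msa.shape)  # (4, 64, 8) (4, 64, 8)
```

Attention cost grows with the window size rather than the full image size, which is why alternating regular and shifted windows is cheaper than global self-attention.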

To restore the feature map to its original resolution, SUNet introduces a dual upsampling method that combines two existing upsampling methods: bilinear interpolation and PixelShuffle. Compared with the original upsampling method, it effectively mitigates checkerboard artifacts. Skip connections fuse high-resolution, multi-scale features from the encoder with the progressively upsampled features in the decoder, helping the decoder recover fine details and minimizing the spatial information loss caused by downsampling.
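A dual upsampling step of this kind can be sketched as two parallel ×2 branches whose outputs are fused. The NumPy sketch below is an assumption-laden illustration, not the SUNet code. It uses 1×1 channel projections with random weights in place of each branch's convolutions, bilinear interpolation with half-pixel sampling and edge clamping, and a concatenation of the two branch outputs fused by a further 1×1 projection. `pixel_shuffle` follows the channel-to-space ordering of PyTorch's `torch.nn.PixelShuffle`.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """(C*r*r, H, W) -> (C, H*r, W*r); channel c*r*r + i*r + j fills sub-pixel (i, j)."""
    crr, h, w = x.shape
    c = crr // (r * r)
    return x.reshape(c, r, r, h, w).transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)

def bilinear_up2(x: np.ndarray) -> np.ndarray:
    """x2 bilinear upsampling of a (C, H, W) array with half-pixel centres and edge clamping."""
    def interp_axis(a: np.ndarray, axis: int) -> np.ndarray:
        n = a.shape[axis]
        src = np.clip((np.arange(2 * n) + 0.5) / 2.0 - 0.5, 0, n - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        shape = [1] * a.ndim
        shape[axis] = -1
        wgt = (src - lo).reshape(shape)
        return np.take(a, lo, axis=axis) * (1 - wgt) + np.take(a, hi, axis=axis) * wgt
    return interp_axis(interp_axis(x, 1), 2)

def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution = per-pixel linear map over channels; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)

def dual_upsample(x, w_ps, w_bi, w_fuse, r=2):
    """Hypothetical dual upsampling: PixelShuffle branch + bilinear branch, concatenated and fused."""
    ps = pixel_shuffle(conv1x1(x, w_ps), r)                   # w_ps: (C_out*r*r, C_in)
    bi = conv1x1(bilinear_up2(x), w_bi)                       # w_bi: (C_out, C_in)
    return conv1x1(np.concatenate([ps, bi], axis=0), w_fuse)  # w_fuse: (C_out, 2*C_out)

rng = np.random.default_rng(0)
feat = rng.normal(size=(32, 8, 8))  # hypothetical decoder features: 32 channels, 8 x 8
out = dual_upsample(feat, rng.normal(size=(64, 32)), rng.normal(size=(16, 32)), rng.normal(size=(16, 32)))
print(out.shape)  # (16, 16, 16)
```

The bilinear branch yields smooth, artifact-free interpolation, while the PixelShuffle branch learns sub-pixel detail; fusing both is the intuition behind reducing checkerboard patterns.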

The EPID TD generated by ARCHER had a resolution of 160×160 and was resized to 512×512 using the OpenCV library in Python. Of the 100 cases (500 fields), 90 cases (450 fields) were used as the training set and the remaining 10 cases (50 fields) as the test set. Three separate models were trained for particle numbers of 1×10⁶, 1×10⁷, and 1×10⁸. For each model, the resampled low-particle-number EPID TD was used as the input, and the corresponding 1×10⁹ EPID TD was used as the target. All models were trained for 200 epochs using the L1 loss function and the Adam optimizer with an initial learning rate of 0.0002. A cosine annealing scheduler with a 5-epoch warm-up phase was used to adjust the learning rate during training. The batch size was set to 4, and both the training and validation patch sizes were fixed at 512×512. The training loss decreased consistently and stabilized after approximately 150 epochs, indicating good convergence. Training was performed in PyTorch 1.7.0 on an NVIDIA GeForce RTX 3090 GPU (24 GB memory).
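Two details of this training setup can be illustrated in plain Python. The first is splitting by patient rather than by field, so that the five fields of one patient never appear in both sets. The second is a linear warm-up followed by cosine annealing. The function names, the random seed, the warm-up shape, and the minimum learning rate `eta_min` are assumptions, since the text does not specify them, and this is not the PyTorch scheduler used in the study.

```python
import math
import random

def split_by_patient(field_ids, n_train_patients=90, seed=0):
    """Split (patient_id, beam_index) pairs so that all fields of a patient land in the same set."""
    patients = sorted({pid for pid, _ in field_ids})
    random.Random(seed).shuffle(patients)
    train_patients = set(patients[:n_train_patients])
    train = [f for f in field_ids if f[0] in train_patients]
    test = [f for f in field_ids if f[0] not in train_patients]
    return train, test

def lr_at_epoch(epoch, base_lr=2e-4, warmup=5, total=200, eta_min=1e-6):
    """Linear warm-up for `warmup` epochs, then cosine annealing from base_lr down towards eta_min."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    t = (epoch - warmup) / (total - warmup)  # progress through the cosine phase, in [0, 1)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * t))

fields = [(p, b) for p in range(100) for b in range(5)]  # 100 patients x 5 beams
train, test = split_by_patient(fields)
print(len(train), len(test))  # 450 50
print([f"{lr_at_epoch(e):.2e}" for e in (0, 4, 5, 100, 199)])
```

Splitting at the patient level avoids optimistic test scores caused by near-duplicate anatomy across fields of the same patient.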

To improve model generalization and emphasize clinically relevant features, several preprocessing and data augmentation strategies were applied. First, the central 50% region along each spatial dimension was extracted from both the input and target EPID TD images. This cropping focused the network on the clinically relevant high-dose area and helped suppress low-dose background noise. The cropped images were then resized back to 512×512 using bilinear interpolation to keep the input dimensions consistent across all samples. To further expand the dataset and simulate variations in patient setup and beam geometry, two geometric augmentations were applied: 1) horizontal (left–right) flipping using NumPy’s np.fliplr() and 2) vertical (top–bottom) flipping using np.flipud(). Each operation was applied identically to the input and target images of a pair to maintain spatial alignment for supervised learning.
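The preprocessing and augmentation steps can be sketched with NumPy alone. In this sketch the bilinear resize is a hand-written half-pixel implementation standing in for the OpenCV call used in the study. The flips are applied offline to produce extra pairs; whether the authors flipped offline or on the fly is not stated, so that choice and the helper names are assumptions.

```python
import numpy as np

def center_crop(img: np.ndarray, frac: float = 0.5) -> np.ndarray:
    """Keep the central `frac` of each spatial dimension."""
    h, w = img.shape
    ch, cw = int(round(h * frac)), int(round(w * frac))
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize with half-pixel sampling and edge clamping."""
    def coords(n_out, n_in):
        src = np.clip((np.arange(n_out) + 0.5) * (n_in / n_out) - 0.5, 0, n_in - 1)
        lo = np.floor(src).astype(int)
        return lo, np.minimum(lo + 1, n_in - 1), src - lo
    y0, y1, wy = coords(out_h, img.shape[0])
    x0, x1, wx = coords(out_w, img.shape[1])
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy)[:, None] + bottom * wy[:, None]

def preprocess_and_augment(noisy: np.ndarray, target: np.ndarray, size: int = 512):
    """Crop and resize both images, then return the original pair plus left-right and up-down flips."""
    x = resize_bilinear(center_crop(noisy), size, size)
    y = resize_bilinear(center_crop(target), size, size)
    return [(x, y),
            (np.fliplr(x), np.fliplr(y)),  # the same flip on input and target keeps them aligned
            (np.flipud(x), np.flipud(y))]

rng = np.random.default_rng(0)
target = rng.random((512, 512))                      # placeholder for a 1e9 TD resized to 512 x 512
noisy = target + 0.05 * rng.normal(size=(512, 512))  # placeholder for a low-particle-number TD
pairs = preprocess_and_augment(noisy, target)
print(len(pairs), pairs[0][0].shape)  # 3 (512, 512)
```

Because both images of a pair pass through identical operations, the pixel-wise L1 loss still compares corresponding locations.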

2.3

Quantitative Evaluation Metrics

Structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) are commonly used to evaluate image quality. SSIM measures the structural similarity between two TD images, typically on a scale of 0 to 1, with higher values indicating greater similarity. It is defined as:

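$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where μ_x and μ_y are the mean intensities, σ_x² and σ_y² are the variances, σ_xy is the covariance, and C₁ and C₂ are small constants to avoid division by zero.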
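The SSIM formula can also be evaluated directly. The function below computes a single global SSIM over the whole image, using the conventional constants C₁ = (0.01·L)² and C₂ = (0.03·L)² for a data range L. This is a simplification: `skimage.metrics.structural_similarity` evaluates the same expression in local sliding windows and averages the results, so its values will generally differ from this global version.

```python
import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float) -> float:
    """Single-window SSIM over the whole image (no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)

rng = np.random.default_rng(0)
ref = np.outer(np.hanning(160), np.hanning(160))  # smooth placeholder for a reference TD
noisy = ref + 0.05 * rng.normal(size=ref.shape)   # placeholder for a low-particle-number TD
print(ssim_global(ref, ref, 1.0), ssim_global(ref, noisy, 1.0))
```

Identical images give an SSIM of 1, and added noise lowers it mainly through the variance term in the denominator.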

PSNR quantifies the fidelity of TD images based on the mean squared error (MSE), with higher PSNR values indicating better image quality. Generally, a PSNR above 30 dB suggests high fidelity, whereas values below 20 dB indicate significant distortion.

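$$\mathrm{PSNR} = 10\log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$

where MAX is the maximum pixel value. MSE is computed as:

$$\mathrm{MSE} = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[I(i, j) - K(i, j)\right]^2$$

where I(i, j) and K(i, j) are the pixel intensities of the reference (1×10⁹) and evaluated (original or denoised) TD images, respectively, and M×N is the image size.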
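The PSNR and MSE definitions translate directly into code. The sketch below assumes that MAX is taken as the maximum of the reference (1×10⁹) image. The text does not say which maximum or data range was used, and the result depends on that choice (compare the `data_range` argument of `skimage.metrics.peak_signal_noise_ratio`).

```python
import numpy as np

def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean squared error over all M x N pixels."""
    return float(np.mean((reference - test) ** 2))

def psnr(reference: np.ndarray, test: np.ndarray, max_value=None) -> float:
    """PSNR in dB; MAX defaults to the maximum of the reference image (an assumption)."""
    peak = reference.max() if max_value is None else max_value
    return float(10.0 * np.log10(peak**2 / mse(reference, test)))

ref = np.zeros((4, 4))
ref[0, 0] = 1.0   # so MAX = 1
test = ref + 0.1  # a uniform error of 0.1 gives MSE = 0.01
print(round(psnr(ref, test), 2))  # 20.0
```

Because PSNR is logarithmic, halving the root-mean-square error raises it by about 6 dB.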

The gamma passing rate (GPR) quantitatively evaluates the dosimetric and spatial agreement between two dose distributions, making it particularly suitable for detecting discrepancies. The GPR evaluates the agreement between the predicted and reference TD based on the gamma index γ:

$$\gamma(\mathbf{r}_r) = \min_{\mathbf{r}_e}\sqrt{\frac{\Delta D^2(\mathbf{r}_e, \mathbf{r}_r)}{\Delta D_M^2} + \frac{\Delta r^2(\mathbf{r}_e, \mathbf{r}_r)}{\Delta d_M^2}}$$

where ΔD and Δr are the dose and spatial differences between an evaluated point r_e and a reference point r_r, respectively, and ΔD_M and Δd_M are the dose and distance criteria, respectively. The GPR is the percentage of points satisfying γ ≤ 1; it was evaluated using the 3%/2 mm criterion with a 10% low-dose threshold, based on AAPM TG-307. For 2D relative-mode transit gamma analysis, a passing rate greater than 95% with the 3%/2 mm criterion is typically recommended, whereas for 2D absolute-dose mode the tolerance may be looser owing to the uncertainties in converting EPID image pixel data to dose [14]. SSIM and PSNR were computed using the skimage and sklearn libraries in Python, and the GPR was calculated using PTW software.
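To make the gamma criterion concrete, the sketch below computes a global 2D gamma passing rate by brute-force search. It is a simplified stand-in, not PTW's algorithm. It assumes global normalization to the reference maximum, a hypothetical pixel spacing, no sub-pixel interpolation of the evaluated distribution, and a finite search radius. Tools that interpolate the evaluated distribution search a finer set of points and can therefore yield slightly higher passing rates.

```python
import numpy as np

def gamma_pass_rate(ref, evl, spacing_mm, dose_pct=3.0, dta_mm=2.0, threshold_pct=10.0, search_mm=6.0):
    """Global gamma passing rate (%) of `evl` against `ref` on a common pixel grid."""
    norm = ref.max()
    delta_d = dose_pct / 100.0 * norm         # dose criterion in absolute units (global normalization)
    r = int(np.ceil(search_mm / spacing_mm))  # search radius in pixels
    h, w = ref.shape
    padded = np.pad(evl, r, mode="constant", constant_values=np.inf)  # points outside the grid never match
    gamma_sq = np.full(ref.shape, np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            dist_sq = (dy * spacing_mm) ** 2 + (dx * spacing_mm) ** 2
            if dist_sq > search_mm ** 2:
                continue
            shifted = padded[r + dy:r + dy + h, r + dx:r + dx + w]  # evl at (i + dy, j + dx)
            g_sq = (shifted - ref) ** 2 / delta_d ** 2 + dist_sq / dta_mm ** 2
            gamma_sq = np.minimum(gamma_sq, g_sq)
    mask = ref >= threshold_pct / 100.0 * norm  # low-dose threshold
    return 100.0 * float(np.mean(np.sqrt(gamma_sq[mask]) <= 1.0))

ref = np.ones((40, 40))  # hypothetical flat field with 1 mm pixels
print(gamma_pass_rate(ref, 1.02 * ref, 1.0), gamma_pass_rate(ref, 1.10 * ref, 1.0))  # 100.0 0.0
```

A uniform 2% dose error passes everywhere under 3%/2 mm, whereas a 10% error fails everywhere on a flat field, where no nearby point can compensate.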

Results

To evaluate the impact of particle number on the EPID TD, Fig. 4(a) displays the MC-simulated TD images of five randomly selected fields. Each row corresponds to one field and each column to one particle number; from left to right: 1×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹. The TD images were plotted using Python’s Matplotlib library with the “jet” colormap, in which blue indicates low-dose regions and yellow to red indicates high-dose regions. The TD images at 1×10⁶ exhibit severe graininess caused by the insufficient particle number, with heavily blurred structural details that make dose boundaries difficult to discern. As the number of particles increased, the noise progressively diminished, revealing clearer internal structures. Notably, the TD images at 1×10⁸ are already comparable to those at 1×10⁹. These results highlight a direct relationship between particle number and TD image quality: a higher particle number yields a higher signal-to-noise ratio (SNR) and thus clearer, smoother TD images. Figure 4(b) shows the denoised TD images of the same five fields, with the same layout as Fig. 4(a); the 1×10⁹ TD images, regarded as the ground truth, are unchanged and serve as the reference. Compared with the original TD images, the denoised images for 1×10⁶, 1×10⁷, and 1×10⁸ show substantial improvements, with noticeable noise reduction and a smoother visual appearance. However, some distortion remains at the lowest particle number, 1×10⁶. For instance, compared with 1×10⁹, high-dose regions appear overly smoothed, suggesting that under extremely low SNR conditions, DL-based denoising may sacrifice certain details of the TD images. Differences are also present at 1×10⁷, but they are smaller and less pronounced.

Fig. 4

(Color online) Comparison of EPID TD for different particle numbers before and after DL-based denoising. (a) Original, (b) denoised

Figure 5 presents the central horizontal profiles of the EPID TD, illustrating the effect of DL-based denoising. The selected cases correspond to those shown in Fig. 4; the first row shows the original profiles and the second row the profiles after denoising. The blue, orange, green, and red solid lines represent particle numbers of 1×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹, respectively. The 1×10⁹ profile was not denoised in either row and is presented as a reference. At low particle numbers, the profiles exhibited large fluctuations and a noticeable jagged pattern. After DL-based denoising, the profiles became smoother, and the amplitude of the noise fluctuations decreased considerably. However, differences remained between low and high particle numbers. The agreement between the 1×10⁶ and 1×10⁹ profiles was relatively limited, with noticeable deviations across the field. In contrast, the 1×10⁷ profiles showed smaller discrepancies, primarily localized to high-gradient regions such as dose peaks and valleys. This is consistent with the slight distortion observed in the denoised low-particle-number TD images in Fig. 4. To quantitatively evaluate the denoising performance, SSIM and PSNR were calculated. Figure 6 illustrates the improvement in the EPID TD across particle numbers after applying the DL-based denoising model. Tables 1 and 2 present the SSIM, PSNR, and relative improvement ratio of the DL-based denoising model for the 10 cases in the test set. The relative improvement ratio was calculated as follows:

Fig. 5

(Color online) Comparison of profile curves for different particle numbers before and after DL-based denoising. (a) Original, (b) denoised

Fig. 6

The comparison of SSIM and PSNR before and after denoising. (a) SSIM, (b) PSNR

Table 1

Mean SSIM for the 10 cases in the test set

ID	Original (1×10⁶)	Denoised (1×10⁶)	Ratio (1×10⁶)	Original (1×10⁷)	Denoised (1×10⁷)	Ratio (1×10⁷)	Original (1×10⁸)	Denoised (1×10⁸)	Ratio (1×10⁸)
#1	0.65	0.95	45.71%	0.75	0.96	28.75%	0.93	0.98	5.37%
#2	0.74	0.98	33.26%	0.87	0.99	13.60%	0.98	0.99	1.55%
#3	0.57	0.95	68.12%	0.65	0.96	46.57%	0.88	0.97	9.42%
#4	0.56	0.94	68.72%	0.66	0.95	44.26%	0.85	0.95	11.77%
#5	0.61	0.96	57.51%	0.70	0.97	37.95%	0.91	0.97	6.94%
#6	0.67	0.97	44.79%	0.79	0.97	24.07%	0.94	0.98	4.17%
#7	0.58	0.92	60.60%	0.62	0.93	50.16%	0.84	0.94	11.75%
#8	0.56	0.95	69.97%	0.65	0.95	46.89%	0.86	0.96	11.89%
#9	0.63	0.97	52.81%	0.76	0.97	27.84%	0.95	0.98	3.97%
#10	0.54	0.93	73.88%	0.57	0.94	64.05%	0.83	0.95	14.23%
Mean	0.61	0.95	57.54%	0.70	0.96	38.41%	0.90	0.97	8.11%

Table 2

Mean PSNR (dB) for the 10 cases in the test set

ID	Original (1×10⁶)	Denoised (1×10⁶)	Ratio (1×10⁶)	Original (1×10⁷)	Denoised (1×10⁷)	Ratio (1×10⁷)	Original (1×10⁸)	Denoised (1×10⁸)	Ratio (1×10⁸)
#1	27.46	33.50	21.99%	29.82	35.27	18.29%	36.87	40.57	10.01%
#2	29.05	36.30	24.94%	34.08	40.82	19.76%	43.03	45.96	6.80%
#3	25.56	33.07	29.36%	27.38	34.96	27.70%	34.72	39.41	13.51%
#4	24.93	32.07	28.66%	26.94	33.62	24.80%	32.93	37.53	13.97%
#5	26.45	34.20	29.31%	28.32	36.09	27.47%	35.66	40.31	13.04%
#6	27.31	35.00	28.16%	30.81	37.64	22.19%	38.63	42.58	10.21%
#7	25.93	33.19	28.01%	27.04	33.89	25.35%	33.12	37.10	12.02%
#8	25.20	32.73	29.86%	27.25	34.42	26.29%	33.45	38.38	14.73%
#9	26.65	33.97	27.45%	29.99	37.01	23.38%	38.53	42.27	9.72%
#10	24.79	31.94	28.84%	25.34	32.70	29.03%	31.98	36.99	15.67%
Mean	26.33	33.60	27.66%	28.70	35.64	24.43%	35.89	40.11	11.97%

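$$R = \frac{M_{\text{denoised}} - M_{\text{original}}}{M_{\text{original}}} \times 100\%$$

where R denotes the relative improvement ratio, and M_original and M_denoised represent the metric values before and after denoising, respectively.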
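As a quick check of the ratio definition, the helper below reproduces two entries of the GPR table (Table 3) from the listed original and denoised values. It is a minimal sketch. Note that averaging per-case ratios and computing the ratio of averaged metrics generally give slightly different values.

```python
def relative_improvement(original: float, denoised: float) -> float:
    """Relative improvement ratio R (%) of a metric after denoising."""
    return (denoised - original) / original * 100.0

# Case #1 at 1e6 and case #2 at 1e7 from the GPR table (values in %)
print(f"{relative_improvement(51.14, 87.44):.2f}%")  # 70.98%
print(f"{relative_improvement(77.76, 98.32):.2f}%")  # 26.44%
```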

In Fig. 6, the x-axis groups the EPID TD by particle number (1×10⁶, 1×10⁷, and 1×10⁸), with each group divided into “original” (before denoising) and “denoised” (after denoising) subgroups. The y-axis shows the SSIM and PSNR values computed with respect to the 1×10⁹ reference. For the original EPID TD, the 1×10⁶ cases had the lowest SSIM (0.61) and PSNR (26.33 dB), indicating the highest noise level and poorest accuracy. As the number of particles increased, the quality of the EPID TD improved, with 1×10⁷ reaching an SSIM of 0.70 and a PSNR of 28.70 dB; however, this was still below 1×10⁸, which had the highest SSIM (0.90) and PSNR (35.89 dB). The boxplot for 1×10⁷ shows the greatest variation, indicating higher inconsistency and room for further improvement. After DL-based denoising, the SSIM and PSNR values increased substantially for all cases. The 1×10⁶ cases showed the largest improvement, with the SSIM rising to 0.95 and even surpassing the original 1×10⁸ cases, although their PSNR remained lower than that of the original 1×10⁸ cases (33.60 vs. 35.89 dB), indicating residual noise. The 1×10⁷ and 1×10⁸ cases also showed marked enhancement, with 1×10⁸ achieving the highest quality (SSIM 0.97, PSNR 40.11 dB). Because the original 1×10⁸ cases were already of high quality, their relative improvement was less pronounced.

For SSIM, the improvement is most notable at the low particle numbers: the mean improvement is 57.54% for 1×10⁶ and 38.41% for 1×10⁷, reaching up to 73.88% and 64.05%, respectively, for individual cases. Although the original 1×10⁸ cases already have a high SSIM, denoising still yields a mean improvement of 8.11%. After denoising, the mean SSIM over the test set is at least 0.95 for every particle number. PSNR follows a trend similar to that of SSIM, and the DL-based denoising method improved the PSNR under all conditions. For 1×10⁶ and 1×10⁷, the PSNR improvement ratio was mostly above 20%, and for 1×10⁸ the mean improvement ratio was still above 10%. Notably, although the PSNR improves substantially for 1×10⁶ and 1×10⁷, the denoised values remain lower than those for 1×10⁸. This suggests that the information lost at lower simulated particle numbers may still limit the maximum achievable quality. Nevertheless, the improvements in SSIM and PSNR demonstrate that the DL-based denoising method can substantially enhance the quality of the EPID TD at low particle numbers.

Table 3 presents the original and denoised GPR and the relative improvement ratio for the 10 cases in the test set. The GPR depends strongly on the particle number because a higher particle number reduces statistical noise, as is evident in the original GPR values. As the number of particles increased from 1×10⁶ to 1×10⁷, the mean GPR increased from 48.47% to 61.04%, and it reached 91.88% at 1×10⁸. This confirms that low-particle-number simulations suffer from noise-induced degradation, which reduces the reliability of dose verification. After DL-based denoising, the GPR improved markedly for all particle numbers. For 1×10⁶, the mean GPR increased from 48.47% to 89.10%, an 83.83% relative improvement. For 1×10⁷, the mean GPR increased from 61.04% to 94.35%, a 54.57% relative improvement. For 1×10⁸, the increase was more modest, from 91.88% to 99.55%, an 8.34% gain. Notably, the denoised GPR for 1×10⁷ exceeds the original GPR for 1×10⁸, highlighting the model’s ability to recover accuracy even from computationally cheaper simulations. This ability to raise the accuracy of low-particle-number simulations above that of undenoised simulations with ten times more particles underscores the method’s potential to reduce computational cost while maintaining high accuracy, offering a practical and efficient solution for PSQA.

Table 3 Mean GPR (original and denoised) and relative improvement ratio for the 10 cases in the test set

Particle number	1×10⁶			1×10⁷			1×10⁸
ID	Original	Denoised	Ratio	Original	Denoised	Ratio	Original	Denoised	Ratio
#1	51.14%	87.44%	70.98%	63.50%	91.60%	44.25%	94.00%	99.62%	5.98%
#2	50.82%	89.42%	75.95%	77.76%	98.32%	26.44%	98.96%	99.98%	1.03%
#3	46.08%	89.56%	94.36%	57.78%	94.88%	64.21%	90.78%	99.48%	9.58%
#4	46.18%	86.68%	87.70%	50.48%	92.40%	83.04%	86.70%	99.08%	14.28%
#5	45.20%	89.22%	97.39%	58.74%	94.34%	60.61%	90.90%	99.48%	9.44%
#6	49.54%	89.44%	80.54%	66.90%	96.16%	43.74%	96.26%	99.90%	3.78%
#7	56.46%	95.04%	68.33%	62.44%	96.20%	54.07%	91.60%	99.62%	8.76%
#8	44.34%	88.86%	100.41%	55.36%	93.36%	68.64%	87.40%	99.48%	13.82%
#9	47.80%	87.14%	82.30%	65.88%	95.88%	45.54%	96.66%	99.84%	3.29%
#10	47.12%	88.18%	87.14%	51.60%	90.40%	75.19%	85.54%	98.98%	15.71%
Mean	48.47%	89.10%	83.83%	61.04%	94.35%	54.57%	91.88%	99.55%	8.34%
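As a worked check of Table 3, the short Python snippet below recomputes the 1×10⁶ columns. Each per-case ratio is (Denoised − Original)/Original, and the Mean-row ratio of 83.83% is reproduced by applying that formula to the two column means (a ratio of means); averaging the ten per-case ratios instead would give about 84.51%. This is our reading of the table arithmetic, not a statement of the authors' procedure.

```python
# 1e6 GPR values (%) for cases #1-#10, copied from Table 3
original = [51.14, 50.82, 46.08, 46.18, 45.20, 49.54, 56.46, 44.34, 47.80, 47.12]
denoised = [87.44, 89.42, 89.56, 86.68, 89.22, 89.44, 95.04, 88.86, 87.14, 88.18]

def ratio(before, after):
    """Relative improvement (%) of `after` over `before`."""
    return (after - before) / before * 100.0

per_case = [ratio(o, d) for o, d in zip(original, denoised)]
mean_original = sum(original) / len(original)  # 48.47 in Table 3
mean_denoised = sum(denoised) / len(denoised)  # 89.10 in Table 3

ratio_of_means = ratio(mean_original, mean_denoised)
mean_of_ratios = sum(per_case) / len(per_case)

print(f"{ratio_of_means:.2f}")  # 83.83, matches the Mean row
print(f"{mean_of_ratios:.2f}")  # 84.51
```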

To assess computational efficiency, Table 4 summarizes the MC simulation time, the DL-based denoising time, and the total computation time for one field at each particle number. As expected, the MC simulation time increased with the particle number, whereas the denoising time remained stable at 0.13–0.16 s, with minor fluctuations depending on system performance. Because the denoising time was far lower than the MC simulation time and largely unaffected by the particle number, the total computation time was driven primarily by the MC simulation. Consequently, replacing a high-particle-number simulation with a low-particle-number simulation followed by DL-based denoising dramatically reduces the computation time while preserving the quality of the EPID TD. For 1×10⁶ and 1×10⁷, the total computation time was reduced to 1.24 s and 1.88 s, respectively, approximately 60-fold and 40-fold faster than the 73.89 s required for 1×10⁹. At 1×10⁸ particles, the gain was more modest: the total of 8.76 s is approximately 8.4-fold faster than 1×10⁹. The computation times for 1×10⁶ and 1×10⁷ both remain under 2 s and differ little. However, as discussed earlier, the denoised TD images and profile curves for 1×10⁶ exhibit distortions, and their GPR is relatively low. Considering both quality and computational efficiency, 1×10⁷ offers a reasonable trade-off that ensures a high-quality EPID TD; if higher accuracy is required, the particle number can be increased to 1×10⁸.

Table 4 Computation time per field for different particle numbers and methods

Time	1×10⁶	1×10⁷	1×10⁸	1×10⁹
MC	1.12 s	1.72 s	8.62 s	73.89 s
DL denoising	0.13 s	0.16 s	0.15 s	–
Total	1.24 s	1.88 s	8.76 s	73.89 s
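The speed-ups quoted in the text follow directly from Table 4. The Python sketch below also encodes, as a hypothetical decision rule, the trade-off discussed above: pick the cheapest particle number whose mean denoised GPR from Table 3 meets a user-chosen threshold. The function names and thresholds are illustrative assumptions, and a clinical rule would also need to consider per-case rather than only mean GPR.

```python
# Total time per field (s) from Table 4; 1e9 has no denoising step.
total_time = {1e6: 1.24, 1e7: 1.88, 1e8: 8.76, 1e9: 73.89}
# Mean denoised GPR (%) from Table 3; 1e9 is the reference itself.
mean_gpr = {1e6: 89.10, 1e7: 94.35, 1e8: 99.55}

def speedup(n):
    """Speed-up of particle number n relative to the 1e9 simulation."""
    return total_time[1e9] / total_time[n]

def choose_particle_number(min_gpr):
    """Hypothetical rule: fastest setting whose mean denoised GPR >= min_gpr."""
    candidates = [n for n, g in mean_gpr.items() if g >= min_gpr]
    if not candidates:
        return 1e9  # fall back to the full-statistics simulation
    return min(candidates, key=lambda n: total_time[n])

for n in (1e6, 1e7, 1e8):
    print(f"{n:.0e}: {speedup(n):.1f}x")       # 59.6x, 39.3x, 8.4x
print(f"{choose_particle_number(90.0):.0e}")   # 1e+07
print(f"{choose_particle_number(95.0):.0e}")   # 1e+08
```

With a 90% threshold the rule returns 1×10⁷, and with 95% it returns 1×10⁸, mirroring the recommendation in the text.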

Discussion

PSQA plays a crucial role in radiotherapy by ensuring treatment accuracy and safety. The forward-projection approach compares the measured and predicted EPID TD, so its clinical applicability depends largely on the accuracy of the predicted data. MC simulation, recognized as the “gold standard” for dose calculation, can predict the EPID TD with high accuracy. However, precise modeling of the linear accelerator and the EPID remains a substantial challenge. Zhang et al. used the PRIMO MC code to compute the EPID TD, substituting a homogeneous water phantom for the complex EPID [40]. Li et al. performed EPID dosimetry studies using DL models with water-equivalent materials [41]. Such approximations of the EPID model may introduce inaccuracies; therefore, finding an appropriate MC simulation tool is crucial. Moreover, although MC-based EPID TD prediction can provide accurate results, it relies on statistical sampling and requires a sufficient number of particles for an accurate simulation. Martins et al. spent 14 h using the Geant4 MC code to simulate accurate dose distributions and the corresponding EPID signals for only one subfield of a treatment plan, which limits its usefulness for clinical applications [42]. Lazaro et al. developed an EPID model using an MC code; simulating a 1024×1024 image on a Linux cluster with 100 processors (2.26 GHz) took 30 min for over 100 million photons from the PSF, and increasing the photon number to 500 million extended the computation time to 2 h 30 min [43]. Such computation times are impractical for clinical use, particularly in time-critical scenarios. This study employed ARCHER, a GPU-accelerated MC code that incorporates detailed models of the linear accelerator and the EPID, to predict the EPID TD accurately. Although ARCHER offers faster simulations, it still cannot meet the demands of every clinical scenario, such as online ART, where rapid calculations are required.

Based on lung IMRT cases, this study showed the impact of different particle numbers on the quality of the EPID TD in ARCHER. As the number of particles increased, the quality of the TD images improved progressively, from noticeable graininess at 1×10⁶ to clear structural details at 1×10⁹. However, a high particle number incurs substantial computational cost, reducing its feasibility for clinical applications, whereas a low particle number often produces significant noise that makes the EPID TD clinically unacceptable. An effective denoising approach is therefore crucial for EPID TD prediction. Traditional image processing methods, such as Gaussian filtering and non-local means filtering, can suppress noise to some extent but often sacrifice detail and may even distort the dose distribution. This study used DL to denoise the EPID TD generated by MC simulation at low particle numbers, aiming to suppress noise while preserving quality and reducing computational cost. The SUNet neural network learns the characteristics of low-particle-number and high-particle-number TD images, enabling noise removal. The results indicate that DL-based denoising significantly enhanced the SSIM and PSNR. However, for 1×10⁶, the extremely low original SNR may still lead to detail loss and distortion after denoising. A computational efficiency analysis showed that the DL-based denoising time remained stable at 0.13–0.16 s, substantially reducing the total computation time compared with the full MC simulation. At 1×10⁷, the total computation time was only 1.88 s, an approximately 40-fold speedup, and at 1×10⁸ it was 8.76 s, an approximately 8.4-fold speedup compared with 1×10⁹. These timing results should be interpreted in the context of the hardware used: they were obtained on a single RTX-3090 GPU, which reflects the hardware available in many clinical environments. Although multi-GPU or cloud-based platforms may further accelerate denoising, such resources are not universally available. Our study therefore provides a practical and accessible solution that balances performance and broad applicability.
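The trade-off between particle number, noise and run time described above follows the standard behaviour of MC statistics. As an idealization, assume that the per-pixel relative uncertainty is dominated by counting statistics and that run time grows linearly with the number of simulated particles N:

$$
\frac{\sigma_D}{D} \propto \frac{1}{\sqrt{N}}, \qquad T_{\mathrm{MC}} \propto N \quad\Longrightarrow\quad T_{\mathrm{MC}} \propto \left(\frac{\sigma_D}{D}\right)^{-2}.
$$

Each tenfold increase in N therefore reduces the relative noise only by a factor of $\sqrt{10} \approx 3.16$, and halving the noise by brute force costs about four times the ideal simulation time. In Table 4 the scaling approaches this ideal only at the high end (8.62 s to 73.89 s, roughly 8.6-fold, from 1×10⁸ to 1×10⁹), presumably because fixed overheads dominate at low particle numbers. This unfavourable scaling is what makes denoising a low-N result attractive.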

This approach, which integrates MC simulation with a DL-based denoising technique, has substantial potential for clinical applications, particularly online ART, in which treatment plans must be updated and verified quickly during each treatment session to account for changes in tumor shape and position. Such rapid plan adaptation demands fast processing and analysis to ensure precision and efficacy, and EPID TD serves as a critical tool for dose verification in this process. Applying the trained DL model to low-particle-number EPID TD effectively suppresses noise and provides reliable support for PSQA, which can facilitate rapid clinical decision-making in radiotherapy. Beyond serving as an independent PSQA tool based on EPID TD, this work also lays a foundation for subsequent 3D dose reconstruction on patient CT images, on which DVH-based dosimetric evaluation of targets and organs at risk relies. Despite these promising results, this study has several limitations. First, the dataset comprises only 100 lung IMRT cases from a single anatomical site. Models trained solely on lung data may not generalize well to other regions, such as the pelvis, owing to differences in anatomical complexity and beam configurations. The applicability of the proposed method to other treatment sites therefore requires further investigation, and future studies will evaluate the model’s performance across different anatomical regions to assess its generalizability. Second, although the SUNet model substantially improves the quality of low-particle-number simulations, recovering accurate detail at extremely low particle numbers, such as 1×10⁶ particles or fewer, remains a challenge. Further enhancements to the model or integration of complementary techniques may be necessary to overcome this limitation.
Finally, future research should focus on systematically comparing a range of state-of-the-art neural network architectures to identify models with superior denoising capabilities for EPID TD images. This includes exploring hybrid models, ensemble learning approaches, and transfer learning strategies that could potentially improve robustness and generalization. Concurrently, iterative refinements to the existing SUNet architecture will be pursued to optimize its performance and computational efficiency for practical clinical implementation.

Conclusion

This study introduced an integrated approach that combines MC simulation with DL-based denoising for EPID TD generation and provides a recommended particle number for the MC simulation. EPID TD at different particle numbers (1×10⁶, 1×10⁷, 1×10⁸, and 1×10⁹) was obtained using the ARCHER MC code for lung IMRT cases and used to train the SUNet neural network model. The trained model denoises TD images obtained at low particle numbers, significantly enhancing their quality while reducing the computational cost. Considering both image quality and computational efficiency, 1×10⁷ offers a reasonable trade-off, particularly for time-constrained scenarios in which expedited dose verification is critical. When higher dosimetric precision is required, 1×10⁸ may be more appropriate. This flexibility allows the method to be adapted to varying clinical demands, providing a practical and scalable solution for PSQA in online ART. Future work will focus on further optimizing model performance, expanding clinical applications, and integrating complementary computational techniques to enhance clinical utility.

References

G.A. Ezzell, J.M. Galvin, D. Low et al., Guidance document on delivery, treatment planning, and clinical implementation of IMRT: Report of the IMRT subcommittee of the AAPM radiation therapy committee. Med. Phys. 30, 2089–2115 (2003). https://doi.org/10.1118/1.1591194