Cone-beam computed tomography noise reduction method based on U-Net with convolutional block attention module in proton therapy

ACCELERATOR, RAY TECHNOLOGY AND APPLICATIONS

Xing-Yue Ruan, Xiu-Fang Li, Meng-Ya Guo, Mei Chen, Ming Lv, Rui Li, Zhi-Ling Chen

Nuclear Science and Techniques

Vol.35, No.7

Article number 122

Published in print Jul 2024

Available online 12 Jul 2024

DOI: 10.1007/s41365-024-01495-1

Cone-beam computed tomography (CBCT) is mostly used for position verification during the treatment process. However, severe image artifacts in CBCT hinder its direct use in dose calculation and adaptive radiation therapy re-planning for proton therapy. In this study, an improved U-Net neural network named CBAM-U-Net, a CBCT denoising U-Net with convolutional block attention modules, was proposed for CBCT noise reduction in proton therapy. The datasets contained 20 groups of head and neck images. The CT images were registered to CBCT images as ground truth. The original CBCT denoising U-Net network, sCTU-Net, was trained for model performance comparison. The synthetic CT (SCT) images generated by CBAM-U-Net and the original sCTU-Net are called CBAM-SCT and U-Net-SCT images, respectively. The HU accuracies of the CT, CBCT, and SCT images were compared using four metrics: mean absolute error (MAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM). The mean values of the MAE, RMSE, PSNR, and SSIM of CBAM-SCT images were 23.80 HU, 64.63 HU, 52.27 dB, and 0.9919, respectively, which were superior to those of the U-Net-SCT images. To evaluate dosimetric accuracy, the range accuracy was compared for a single-energy proton beam. The γ-index pass rates of a 4 cm × 4 cm scanned field and a simple plan were calculated to compare the effects of the noise reduction capabilities of the original U-Net and CBAM-U-Net on the dose calculation results. CBAM-U-Net reduced noise more effectively than sCTU-Net, particularly in high-density tissues. We proposed a CBAM-U-Net model for CBCT noise reduction in proton therapy. Owing to the noise reduction capabilities of CBAM-U-Net, the proposed model provided relatively explicit information regarding patient tissues. Moreover, it may be used in dose calculation and adaptive treatment planning in the future.

Keywords: Proton therapy; Cone beam CT; CBAM-U-Net; γ-index

Introduction

Radiotherapy is among the most efficient cancer treatments, and approximately 80% of cancer patients require radiotherapy during treatment [1]. In recent years, proton therapy has gained increasing interest owing to its superior physical properties compared to conventional radiotherapy [2-7]. The dose of a proton beam increases slowly with depth within the entrance region and rises sharply at the end of the range, forming a Bragg peak. Thereafter, it declines rapidly in the distal fall-off region [8-10]. Because of the Bragg peak, a proton beam can accurately deliver a dose to the target volume and improve dose distribution and target volume conformality. This further protects patients and reduces damage to adjacent normal tissue caused by radiotherapy. Therefore, proton therapy is superior to traditional photon therapy for the treatment of tumors located close to organs at risk (OAR) [11-13]. Owing to the distinctive dose distribution of proton beams and the high sensitivity of the proton beam range to Hounsfield unit (HU) values, proton therapy requires more frequent imaging information to improve the effect of therapy and reduce physical uncertainties [14, 15].

Owing to the rapid development of radiation and medical imaging technologies, image-guided therapy continues to advance in the field of radiotherapy. Utilizing a vast array of applicable and effective technologies ensures that the radiation dose corresponds to the anatomical structure of the radiation target to the greatest extent possible, thereby improving the treatment quality [16-18]. Cone-beam computed tomography (CBCT) has become one of the most important components of image guidance equipment because of its multiple advantages, such as short scanning time, high spatial resolution, low exposure dose, and the ability to be located at the treatment site [19, 20]. CBCT is typically used either daily or weekly to verify patient position and monitor changes in the patient’s anatomical structure. Because the patient need not be moved, positioning errors over the course of treatment are avoided. However, CBCT is a three-dimensional image reconstructed from two-dimensional projection images, and its scattering noise artifacts are more severe than those produced by conventional fan-beam CT (FBCT). Consequently, uncorrected CBCT images are not well suited to clinical dose calculations. Nevertheless, because CBCT is routinely acquired during radiation treatment, more accurate dose calculations based on CBCT images would provide additional medical information about anatomical changes in patients [21-24].

In recent years, many studies have been conducted to reduce the noise in CBCT images. Theoretically, the HU values in CBCT images can be recovered by deforming the planning CT (pCT) images using deformable image registration (DIR) [25-28]. However, the DIR-based methods are complicated and require careful evaluation. Once the anatomical structure of the patient changes significantly, the accuracy of DIR for CBCT image improvement is limited [29]. The process of generating an image using DIR can take a few minutes to complete. The other frequently used method is the intensity correction method, which scales CBCT image intensities to the HU range of pCT images using population-based lookup tables. However, the limitations of this method are intrinsic to CBCT, specifically the shadowing effect artifacts, which cannot be corrected by the intensity correction method [30]. The application of the Monte Carlo (MC) method for CBCT noise reduction has also been investigated. MC-based correction techniques, which may require hours for calculation, are not suitable for clinical applications in adaptive radiotherapy [31], although this method has good performance, and certain GPU-based MC dose calculation methods have reduced computational cost [32]. With the development of computer technology and artificial intelligence, the application of artificial intelligence technologies for CBCT noise reduction has garnered attention. The use of deep learning as a method for CBCT noise reduction offers various benefits, including excellent image quality, continual learning, and quick computation after the model has been trained. CBCT and CT image noise-reduction technologies have been improved using techniques based on convolutional neural networks (CNNs) [33, 34]. Ronneberger et al. developed a U-Net network architecture in 2015 [35] that demonstrated exceptional performance in applications involving biological segmentation. In 2020, based on U-Net, Chen et al. created sCTU-Net for CBCT noise reduction and realized CBCT-to-CT conversion. The sCTU-Net may enable improved CBCT technology to be employed in adaptive treatment planning [36].

This study developed a CBAM-U-Net network that integrates convolutional block attention modules, also known as CBAM blocks, into the sCTU-Net down-sampling and up-sampling modules [37]. In this small module, the feature maps are assigned different weights to increase the accuracy of the output results. The noise reduction performance of the proposed network was evaluated using the mean absolute error (MAE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM). In addition to the traditional image evaluation, the integrated depth dose (IDD) curve of a single-energy proton beam and the γ-index pass rates of a proton beam square field and a simple plan were used to evaluate the performance of the network in noise reduction. This technology will be used in the CBCT imaging system of the proton therapy center [38] at Ruijin Hospital after verification.

Materials and Methods

2.1 Dataset and image preprocessing

The original data comprised 1477 head and neck CT slices and 1477 CBCT slices obtained from a cohort of 20 patients. The original CBCT images were processed using the Varian iCBCT technology for noise reduction during image acquisition. The tube voltage was 120 kVp for CT and 100 kVp for CBCT. The voxel size of the CT images was 0.511 mm × 0.511 mm × 1.989 mm with an image resolution of 512 × 512. The voxel size of the CBCT images was 0.976 mm × 0.976 mm × 3 mm with an image resolution of 512 × 512.

The CBCT images were acquired approximately one month after the CT images. Owing to the different acquisition times of the CT and CBCT images, there were minor anatomical deformations between them. Therefore, image registration between the CT and CBCT images was required. The anatomical deformation registration technique was selected in the 3D Slicer software and optimized using an adaptive stochastic gradient descent algorithm [39]. The CT images were registered to the CBCT images to keep the anatomical geometry consistent, and the registered CT images were resampled to a resolution of 0.976 mm × 0.976 mm × 3 mm, the same as that of the CBCT images.

In this study, we strictly followed the registration results to produce CT-CBCT image pairs for training and testing. We randomly selected 1267 CT-CBCT image pairs from 17 patients as the training datasets and 210 CT-CBCT image pairs from the remaining three patients as the testing datasets, with each patient dataset containing approximately 70 pairs.
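A patient-level split of this kind, in which no patient contributes slices to both subsets, might be sketched as follows. The patient identifiers, the seed, and the helper name `split_by_patient` are hypothetical; the source does not describe how the random selection was implemented.

```python
import random


def split_by_patient(patient_ids, n_test, seed=0):
    """Split patient IDs so that no patient appears in both subsets."""
    ids = sorted(patient_ids)
    rng = random.Random(seed)      # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    return ids[n_test:], ids[:n_test]   # (training patients, testing patients)


# Hypothetical identifiers for the 20-patient cohort.
patients = [f"P{i:02d}" for i in range(1, 21)]
train_ids, test_ids = split_by_patient(patients, n_test=3)
print(len(train_ids), len(test_ids))
```

Splitting on patients rather than on individual slices keeps neighbouring slices of the same anatomy from appearing on both sides of the split.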

2.2 Network structure

Figure 1 shows the network structure of CBAM-U-Net. The input data for the CBAM-U-Net network were CBCT images, and the output data were SCT images produced by the neural network. The first layer of the network expanded the original CBCT data to 32 channels. Each down-sampling module halved the height and width of its input and doubled the number of channels. After six down-sampling modules, 8 × 8 × 1024 feature images were created. Conversely, each up-sampling module doubled the height and width of the feature images and halved the number of channels. The 8 × 8 × 1024 feature images were up-sampled by six up-sampling modules, bringing the feature image dimensions to 512 × 512 × 32. Finally, an SCT image with dimensions 512 × 512 × 1 was produced by the output module. The attention mechanism modules were integrated into both the down-sampling and up-sampling modules. These modules enabled the network to allocate calculation weights to the sampling modules in each layer, thereby enhancing the accuracy of the neural network outputs.
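The feature-map sizes quoted above can be traced with a few lines of arithmetic. This is only a bookkeeping sketch: halving 512 six times gives 8, but doubling 32 channels six times would give 2048 rather than the stated 1024, so the sketch assumes the channel count is capped at 1024. That cap, and the function name `trace_shapes`, are assumptions and are not taken from the source.

```python
def trace_shapes(size=512, channels=32, stages=6, max_channels=1024):
    """Return (height, width, channels) after the first layer and each down-sampling stage."""
    shapes = [(size, size, channels)]
    for _ in range(stages):
        size //= 2                                   # halve height and width
        channels = min(channels * 2, max_channels)   # double channels (cap is an assumption)
        shapes.append((size, size, channels))
    return shapes


encoder = trace_shapes()
for shape in encoder:
    print(shape)
```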

Fig. 1 (Color online) Overall structure of the CBAM-U-Net network

The weight calculation of the feature images was executed in the sampling module of each layer to increase the output accuracy of the neural network. The convolutional block attention module is illustrated in Fig. 2. It comprised two submodules: the channel attention module and the spatial attention module. The channel attention module assigned a weight parameter to each feature image. The spatial attention module then applied weight parameters to each pixel of the feature images. This process occurred during feature-image generation. To increase the accuracy of the HU values in the output results, the convolutional block attention module provided additional weight parameters for certain feature images and for certain areas within all feature images.

Fig. 2 Convolutional block attention module structure

A description of the channel attention module is presented in Fig. 3. The input feature images were initially processed using an adaptive average pool, with the output having dimensions of 1 × 1 × C, where C is the total number of channels. The weight parameters were generated after the input data passed through two linear layers and a Sigmoid layer. The input image was multiplied by the weight parameters to produce the output at the end of the module for the channel-attention mechanism. During the training of a neural network, certain feature images significantly affected the quality of the ideal output SCT image. After passing through the channel-attention mechanism module, these significant feature images were assigned larger weights.
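The channel attention steps described above (global average pooling to a 1 × 1 × C descriptor, two linear layers, a sigmoid, and channel-wise rescaling) can be illustrated with a minimal, dependency-free sketch. The layer sizes, the toy weights, and the ReLU between the two linear layers are assumptions made for illustration; in the trained network the weights are learned, and an array library would normally replace the nested lists.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def channel_attention(fmap, w1, b1, w2, b2):
    """fmap: list of C channels, each a list of rows (H x W)."""
    # Global average pooling: one scalar per channel (the 1 x 1 x C descriptor).
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in fmap]
    # Two linear layers; the ReLU in between is an assumption, not stated in the text.
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]
    weights = [sigmoid(z) for z in linear(hidden, w2, b2)]
    # Rescale every pixel of channel c by that channel's weight.
    scaled = [[[weights[c] * px for px in row] for row in ch] for c, ch in enumerate(fmap)]
    return scaled, weights


# Toy input: two 2 x 2 channels and hand-picked (hypothetical) layer weights.
fmap = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1, b1 = [[1.0, -1.0]], [0.0]
w2, b2 = [[1.0], [-1.0]], [0.0, 0.0]
scaled, weights = channel_attention(fmap, w1, b1, w2, b2)
print(weights)
```

With these toy weights the first channel, which carries all of the signal, receives a weight close to 1 and the empty second channel a weight close to 0.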

Fig. 3 Channel attention mechanism module structure

Figure 4 shows the spatial attention mechanism module. Subsequent to the channel attention mechanism module, the spatial attention mechanism module accepted the output of the channel attention mechanism module as its input. The spatial attention mechanism module compressed the feature images into a one-channel image and processed the feature image using an average pooling layer and a maximum pooling layer. The image maps of the average and maximum pooling layers were concatenated and processed using a convolution layer. The feature images were multiplied by the spatial weight to obtain the final output of the spatial attention mechanism module. Therefore, the spatial attention mechanism module can assign higher weights to key areas of the feature images and lower weights to areas outside the key areas to improve the capability of the network. The channel and spatial attention mechanism modules both assigned calculation weights to the feature images of the network. Throughout the learning process, the calculation weights were continuously modified to generate improved SCT images.
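The spatial attention steps can be sketched in the same dependency-free style: average and maximum pooling across channels, a convolution over the two concatenated maps, and pixel-wise rescaling. For brevity the convolution here is a 1 × 1 kernel with hand-picked weights, and a sigmoid is applied to obtain the weight map; the kernel size, the sigmoid, and the parameter names `w_avg`, `w_max`, and `bias` are assumptions rather than details given in the text.

```python
import math


def spatial_attention(fmap, w_avg, w_max, bias):
    """fmap: list of C channels (H x W). Returns (rescaled fmap, H x W weight map)."""
    n_ch, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            column = [fmap[c][i][j] for c in range(n_ch)]
            avg_px = sum(column) / n_ch   # average pooling across channels
            max_px = max(column)          # maximum pooling across channels
            # 1 x 1 convolution over the two concatenated maps (kernel size assumed).
            z = w_avg * avg_px + w_max * max_px + bias
            row.append(1.0 / (1.0 + math.exp(-z)))   # sigmoid (assumed)
        mask.append(row)
    # Multiply every channel by the shared spatial weight map.
    out = [[[mask[i][j] * fmap[c][i][j] for j in range(width)]
            for i in range(height)] for c in range(n_ch)]
    return out, mask


# Toy input: two channels of size 1 x 2.
fmap = [[[0.0, 2.0]], [[0.0, 4.0]]]
out, mask = spatial_attention(fmap, w_avg=1.0, w_max=1.0, bias=0.0)
print(mask)
```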

Fig. 4 Spatial attention mechanism module structure

2.3 Network training parameter

PyTorch was used as the computing framework, and the L1 loss (Eq. 1) was chosen as the loss function. The models were trained on two NVIDIA RTX 2080Ti GPUs with 24 GB of memory. The batch size was set to 8, and each GPU received 4 images as the input. The initial learning rate was set to $1 \times 10^{-4}$, and there were 180 training epochs. Adam was used as the optimizer and StepLR was used as the learning rate scheduler. Both the sCTU-Net and CBAM-U-Net models were trained with the same training parameters, and the trained weight parameters were saved for each network.

$\mathrm{L1Loss}(S(x_{i}), y_{i}) = \frac{1}{n} \sum_{i=1}^{n} |S(x_{i}) - y_{i}|$ (1)

where $S(x_{i})$ is the pixel value of the generated SCT image, $y_{i}$ is the pixel value of the corresponding CT image, and $n$ is the total number of pixels in the image.
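A step-decay schedule of the kind StepLR implements multiplies the learning rate by a factor `gamma` every `step_size` epochs. The sketch below shows the resulting schedule over the 180 epochs starting from $1 \times 10^{-4}$. The values `step_size=60` and `gamma=0.5` are purely hypothetical, since the source does not report the scheduler settings.

```python
def step_lr(epoch, base_lr=1e-4, step_size=60, gamma=0.5):
    """Learning rate at a given epoch under a step-decay schedule.

    step_size and gamma are hypothetical; only base_lr comes from the text.
    """
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(epoch) for epoch in range(180)]
print(schedule[0], schedule[60], schedule[179])
```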

2.4 Image evaluation

To evaluate the noise reduction performance, the MAE, RMSE, PSNR, and SSIM were used. The registered CT images served as the ground truth. The three testing datasets were analyzed using these four evaluation parameters, and the average values over the three testing datasets were calculated as the final assessment parameters.

The MAE and RMSE are expressed in Eqs. 2 and 3, respectively. Both are among the most prevalent assessment criteria used in image processing and offer the advantages of being easily understood and quantifiable. The smaller the MAE and RMSE between two images, the higher the similarity between them. The corresponding formulas are as follows:

$\mathrm{MAE}(S(x_{i}), y_{i}) = \frac{1}{m} \sum_{i=1}^{m} |S(x_{i}) - y_{i}|$ (2)

$\mathrm{RMSE}(S(x_{i}), y_{i}) = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (S(x_{i}) - y_{i})^{2}}$ (3)

where $S(x_{i})$ is the pixel value of the generated SCT image, $y_{i}$ is the pixel value of the corresponding CT image, and $m$ denotes the total number of image pixels.
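The two error metrics in Eqs. 2 and 3 can be computed directly from flattened pixel lists, as in the sketch below. The HU values are made up for illustration, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def mae(pred, ref):
    """Mean absolute error between two equally long pixel sequences (Eq. 2)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def rmse(pred, ref):
    """Root mean square error between two equally long pixel sequences (Eq. 3)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


# Hypothetical HU values for four pixels of an SCT slice and the matching CT slice.
sct = [0.0, 40.0, 1000.0, -990.0]
ct = [10.0, 40.0, 980.0, -1000.0]
print(mae(sct, ct), rmse(sct, ct))
```

Because the RMSE squares each difference before averaging, the single 20 HU error dominates it more than it dominates the MAE.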

PSNR is defined as the ratio between the maximum possible power of a signal and the power of the distorting noise that affects the quality of its representation. The larger the PSNR, the higher the similarity between the two images. The formula for PSNR is expressed as Eq. 4:

$\mathrm{PSNR} = 10 \log_{10} \left( \frac{Max_{i}^{2}}{\mathrm{MSE}} \right)$ (4)

where $Max_{i}$ is the maximum value of the image pixel sampling points and MSE is the mean square error.
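Eq. 4 can be evaluated as in the following sketch. The value passed as `max_value` is an assumption chosen to make the arithmetic easy to follow; the source does not state which maximum pixel value was used.

```python
import math


def psnr(pred, ref, max_value):
    """Peak signal-to-noise ratio in dB (Eq. 4)."""
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf            # identical images: no noise power
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values: every pixel is off by 10, so MSE = 100.
ref = [0.0, 100.0, 200.0, 300.0]
pred = [10.0, 90.0, 210.0, 290.0]
value_db = psnr(pred, ref, max_value=1000.0)   # 10 * log10(1e6 / 100)
print(value_db)
```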

SSIM is a statistic used to determine the overall similarity between two images and is commonly used to assess image generation and processing. SSIM ranges from -1 to 1. The closer the SSIM is to 1, the higher the similarity in the total information content between the two images. SSIM was calculated using Eq. 5:

$\mathrm{SSIM}(x, y) = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{xy} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})}$ (5)

where $μ_{x}$ is the mean of x, $μ_{y}$ is the mean of y, $σ_{x}^{2}$ is the variance of x, $σ_{y}^{2}$ is the variance of y, $σ_{xy}$ is the covariance of x and y, $C_{1} = (k_{1} L)^{2}$ and $C_{2} = (k_{2} L)^{2}$ are constants used to maintain stability, and $L$ is the maximum value of the image pixels. Further, $k_{1}$ and $k_{2}$ were 0.01 and 0.03, respectively.
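A direct evaluation of Eq. 5 over a whole image, treated as a single window, might look like the sketch below. Practical SSIM implementations usually average the index over local windows, and whether the study did so is not stated; the single-window form, the population (1/n) variance, and the toy pixel values are assumptions.

```python
def ssim_global(x, y, data_range, k1=0.01, k2=0.03):
    """Single-window SSIM (Eq. 5) over two equally long pixel sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n      # sigma_x squared
    var_y = sum((b - mu_y) ** 2 for b in y) / n      # sigma_y squared
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Toy pixel values: an image compared with itself and with its reversed copy.
image = [0.0, 1.0, 2.0, 3.0]
same = ssim_global(image, image, data_range=3.0)
reversed_copy = ssim_global(image, image[::-1], data_range=3.0)
print(same, reversed_copy)
```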

To show the degree of matching between the SCT, CBCT, and CT images in different HU value ranges, Q-Q (quantile-quantile) plots [40] between SCT and CT images and between CBCT and CT images from samples of the testing data were also plotted for detailed evaluation. Q-Q plots are commonly used in mathematical statistics to test the consistency of the distributions of two groups of data. The closer the distributions of the two sets of data, the closer the Q-Q plot lies to the 45° reference line.
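The points of a Q-Q plot are matched quantiles of the two samples, as the sketch below illustrates. The linear-interpolation quantile rule, the number of quantiles, and the HU values are assumptions; the source does not describe how its plots were generated.

```python
def quantiles(values, probs):
    """Empirical quantiles by linear interpolation between sorted samples."""
    data = sorted(values)
    result = []
    for p in probs:
        pos = p * (len(data) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)
        result.append(data[lo] + (pos - lo) * (data[hi] - data[lo]))
    return result


def qq_points(reference, test, n_points=5):
    """Pairs (reference quantile, test quantile) at evenly spaced probabilities."""
    probs = [i / (n_points - 1) for i in range(n_points)]
    return list(zip(quantiles(reference, probs), quantiles(test, probs)))


# Hypothetical HU samples: the second image is the first shifted by +50 HU.
ct_hu = [-1000, -500, 0, 500, 1000]
shifted_hu = [v + 50 for v in ct_hu]
on_line = qq_points(ct_hu, ct_hu)        # identical distributions: y = x
off_line = qq_points(ct_hu, shifted_hu)  # constant offset from the 45° line
print(off_line)
```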

2.5 Evaluation of dose calculation accuracy

The matRad dose calculation engine, an open-source software developed by the German Cancer Research Center (DKFZ, Heidelberg) for scientific purposes of radiotherapy planning, was used to calculate the dose distribution of proton beams, where the ray-tracing algorithm was used for dose calculation [41].

First, for the depth dose analysis, the IDD curve of a proton beam with a nominal energy of 114.5 MeV was analyzed in matRad. The dose grid was set to 1 mm × 1 mm × 1.5 mm. A proton beam with a full width at half maximum (FWHM) of 5 mm in air was applied to the dose calculation model of the head and neck CT, CBCT, CBAM-SCT, and U-Net-SCT images of the patient testing datasets. Second, for the lateral dose profile analysis, the CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using a square-field γ-index pass rate to compare the calculation results. The scanned field size was set to 4 cm × 4 cm, the spot spacing was 2 mm, and the square-field dose calculation employed proton beams with a nominal energy of 114.5 MeV. Third, for further lateral dose analysis, a simple plan was also implemented, and under the same radiation conditions, the dose distribution of CT was used as a reference to compare the γ-index pass rates.

Results

3.1 Comparative analysis of CT images

The network weight parameters of CBAM-U-Net and sCTU-Net were obtained under the same training conditions. The weight parameters produced by the sCTU-Net and CBAM-U-Net models were used to generate the SCT images of the three patient datasets. The generated SCT images were evaluated using the four indicators mentioned above. Figure 5 shows representative transverse images of CT, CBCT, and SCT from a patient. Both the sCTU-Net and CBAM-U-Net networks effectively reduced image noise and restored the original CT image features in regions with CBCT image artifacts. The mean values of the quantitative analysis are presented in Table 1.

Fig. 5 Typical transverse planes of (a) CT, (b) CBCT, (c) U-Net-SCT, and (d) CBAM-SCT images of a patient

Table 1 Comparison of four image indicators between CBCT, SCT, and CT images of three patients

Patient	Image dataset	MAE (HU)	RMSE (HU)	PSNR (dB)	SSIM (a.u.)
Patient 1	CBCT	39.3256	104.8787	48.3045	0.9809
Patient 1	U-Net-SCT	24.5496	65.5988	52.1678	0.9911
Patient 1	CBAM-SCT	22.6519	57.1462	53.1665	0.9936
Patient 2	CBCT	41.2931	111.9404	47.6491	0.9780
Patient 2	U-Net-SCT	25.4210	73.6275	51.0182	0.9892
Patient 2	CBAM-SCT	23.1739	68.6911	51.6913	0.9909
Patient 3	CBCT	47.7311	115.2558	47.6525	0.9783
Patient 3	U-Net-SCT	26.9864	76.3210	51.1504	0.9885
Patient 3	CBAM-SCT	25.4442	67.2830	52.0463	0.9915

Table 1 demonstrates that the U-Net-SCT and CBAM-SCT images were significantly superior in the four evaluation metrics compared to the CBCT images. Moreover, CBAM-SCT images exhibited better image quality than the U-Net-SCT images.

To demonstrate the ability of CBAM-U-Net to reduce noise, error distribution maps were obtained by subtracting the CT images from the CBCT and SCT images, as shown on the left side of Fig. 6. The CBCT, U-Net-SCT, and CBAM-SCT images had MAE values of 23.68, 17.75, and 13.02 HU, respectively. The errors in the CBCT images were larger than those in the SCT images and appeared mainly around bone borders. The CBCT images also contained severe scattered noise. Both CBAM-U-Net and the original sCTU-Net reduced scatter artifacts and controlled noise to an acceptable level. Moreover, CBAM-U-Net exhibited a better noise reduction ability than sCTU-Net, particularly in high-density areas such as the skull.

Fig. 6 (Color online) Absolute HU value differences between the CT images and the CBCT and SCT images (left column). Q-Q plots between CT and CBCT images, between CT and U-Net-SCT images, and between CT and CBAM-SCT images (right column)

Q-Q plots between CBCT and CT images and between SCT and CT images from one test patient are shown in Fig. 6. The range of the horizontal coordinates of the Q-Q plots was set to the HU value range of the CT image, whereas that of the vertical coordinates was set to the HU value range of the CBCT or SCT image. The Q-Q plot of CT versus CBCT deviated from the 45° reference line, particularly in high-density tissue. In contrast, sCTU-Net corrected most of the errors, although errors remained in high-density tissue areas. The Q-Q plot of CT versus CBAM-SCT followed the 45° reference line in both soft tissue and bone areas, demonstrating that the CBAM-SCT images had the same data distribution as the CT images. The Q-Q plots show that the CBAM-U-Net network exhibited a better noise reduction ability in high-density tissue areas than the original sCTU-Net network.

3.2 Single-beam depth dose analysis

The IDD curve of the proton beam was analyzed using matRad. The high-density and soft-tissue areas of the three tested patients were selected as the analysis areas. The selection of these two areas reflected the actual clinical treatment conditions. Moreover, in dose calculations, the artifacts of high-density tissue areas are commonly the cause of large dose calculation errors. Figure 7 shows the dose distribution map for Patient 1 along the beam.

Fig. 7 (Color online) Dose distributions of patient 1 in high-density areas for (a) CT, (b) CBCT, (c) U-Net-SCT, and (d) CBAM-SCT images, and in soft tissue areas for (e) CT, (f) CBCT, (g) U-Net-SCT, and (h) CBAM-SCT images

The corresponding IDD curves are presented in Fig. 8. The IDD curves of the CT images were used as the reference curves. The IDD curves for the high-density and soft-tissue areas were compared.

Fig. 8 (Color online) (a) Complete IDD curve in high-density areas; (b) partial IDD curve in high-density areas; (c) complete IDD curve in soft tissue areas; and (d) partial IDD curve in soft tissue areas

To show more details of the IDD curves in the Bragg peak region, the depth range of 100 mm to 145 mm was selected for analysis in the high-density areas, and the depth range of 120 mm to 145 mm was selected in the soft tissue areas. Both sCTU-Net and CBAM-U-Net exhibited effective noise reduction in soft tissue areas, as shown in Fig. 8, and their IDD profiles were comparable. However, the noise reduction effect of CBAM-U-Net was better than that of the original sCTU-Net in high-density areas: the IDD curve of CBAM-U-Net was closer to the IDD curve of CT than that of sCTU-Net was. To quantify this difference, Table 2 presents the depth at the peak of the IDD curves for the three patients.

Table 2 Depth at the peak of the IDD curve in soft and high-density tissue of three patients

Patient	Tissue	CT	CBCT	U-Net-SCT	CBAM-SCT
Patient 1	High density tissue	122.6 mm	113.4 mm	126.7 mm	125.2 mm
Patient 1	Soft tissue	133.4 mm	129.3 mm	134.4 mm	133.9 mm
Patient 2	High density tissue	122.6 mm	113.4 mm	124.7 mm	124.7 mm
Patient 2	Soft tissue	150.7 mm	147.7 mm	151.8 mm	151.8 mm
Patient 3	High density tissue	123.7 mm	119.1 mm	126.7 mm	126.2 mm
Patient 3	Soft tissue	147.7 mm	143.6 mm	149.2 mm	147.7 mm

Because of the image artifacts in the CBCT images, the IDD curves of CBCT exhibited considerable inaccuracy compared to those of CT. Moreover, the IDD curves of CBCT cannot reflect the anatomical structure information of patients from the perspective of proton radiotherapy. The IDD curves of CBAM-SCT and U-Net-SCT matched the CT curves better than those of CBCT and had profiles similar to those of the CT IDD curves. According to the IDD curves, the CBAM-U-Net network was more effective than the original sCTU-Net network in reducing noise in high-density tissues. In addition, the results of CBAM-SCT images with a single-energy beam from the treatment angle and other key angles reflected tissue changes at the radiation site and the degree of anatomical deformation of the related tissue.
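The peak-depth differences relative to CT can be read off Table 2 with simple subtraction. The sketch below does this for Patient 1 using the tabulated values; the helper name `range_shift` and the rounding to 0.1 mm are illustrative choices.

```python
# Peak depths in mm for Patient 1, copied from Table 2.
peak_depth = {
    "High density tissue": {"CT": 122.6, "CBCT": 113.4, "U-Net-SCT": 126.7, "CBAM-SCT": 125.2},
    "Soft tissue": {"CT": 133.4, "CBCT": 129.3, "U-Net-SCT": 134.4, "CBAM-SCT": 133.9},
}


def range_shift(depths):
    """Signed difference in peak depth relative to CT, rounded to 0.1 mm."""
    return {name: round(d - depths["CT"], 1) for name, d in depths.items() if name != "CT"}


shifts = {region: range_shift(depths) for region, depths in peak_depth.items()}
for region, values in shifts.items():
    print(region, values)
```

For this patient the CBCT peak lies 9.2 mm short of the CT peak in high-density tissue, whereas the U-Net-SCT and CBAM-SCT peaks lie 4.1 mm and 2.6 mm beyond it, respectively.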

3.3 Lateral dose comparison analysis

In matRad, the dose results of CT were used as a reference. The CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using the 1%/1 mm and 3%/3 mm γ-index criteria in a square field to compare the calculation results. The square field size was set to 4 cm × 4 cm, and the spot spacing was 2 mm. The experiment employed proton beams with a nominal energy of 114.5 MeV. The high-density boundary was selected as the radiation analysis area because it allows the difference in dose distribution between the high-density and soft tissue areas to be assessed. Figure 9 shows the scanned field lateral dose distribution and absolute dose difference of Patient 1 compared with the dose distribution in the CT images. The γ-index pass rates of CBCT, U-Net-SCT, and CBAM-SCT relative to CT for the three patients are listed in Table 3.
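The idea behind a γ-index pass rate can be illustrated with a brute-force one-dimensional sketch: for each reference point, γ is the minimum over evaluated points of the combined distance and dose deviation, each normalized by its tolerance, and the point passes when γ ≤ 1. This is not the matRad implementation. Global normalization to the maximum reference dose, the absence of a low-dose threshold and of interpolation between grid points, and the toy dose profiles are all assumptions.

```python
def gamma_pass_rate(ref_dose, eval_dose, spacing_mm, dose_pct, dta_mm):
    """Percentage of reference points with gamma <= 1 (1D, global normalization)."""
    dose_tol = dose_pct / 100.0 * max(ref_dose)   # dose tolerance in absolute units
    passed = 0
    for i, d_ref in enumerate(ref_dose):
        # Squared gamma: minimum over all evaluated points of the normalized
        # squared distance plus the normalized squared dose difference.
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_eval - d_ref) / dose_tol) ** 2
            for j, d_eval in enumerate(eval_dose)
        )
        passed += gamma_sq <= 1.0
    return 100.0 * passed / len(ref_dose)


# Toy profiles on a 1 mm grid: the evaluated profile is the reference shifted by 1 mm.
reference = [0.0, 0.5, 1.0, 0.5, 0.0]
shifted = [0.0, 0.0, 0.5, 1.0, 0.5]
identical_rate = gamma_pass_rate(reference, reference, 1.0, dose_pct=1.0, dta_mm=1.0)
shifted_rate = gamma_pass_rate(reference, shifted, 1.0, dose_pct=1.0, dta_mm=1.0)
print(identical_rate, shifted_rate)
```

In the shifted case, four of the five reference points find a matching dose exactly 1 mm away and pass; the last point has no match within tolerance and fails.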

Fig. 9 (Color online) (a) Dose distribution in CT image square fields of patient 1; (b) dose distribution in CBCT image square fields of patient 1; (c) dose distribution in U-Net-SCT image square fields of patient 1; (d) dose distribution in CBAM-SCT image square fields of patient 1; (e) absolute dose difference between CBCT and CT image square fields of patient 1; (f) absolute dose difference between U-Net-SCT and CT image square fields of patient 1; and (g) absolute dose difference between CBAM-SCT and CT image square fields of patient 1

Table 3 γ-index pass rates of CBCT, U-Net-SCT, and CBAM-SCT of three patients

Patient	Criterion	CBCT	U-Net-SCT	CBAM-SCT
Patient 1	1%/1 mm	70.67%	79.03%	82.23%
Patient 1	3%/3 mm	93.89%	97.04%	98.09%
Patient 2	1%/1 mm	85.05%	85.23%	85.73%
Patient 2	3%/3 mm	98.53%	98.57%	98.62%
Patient 3	1%/1 mm	80.52%	86.88%	88.94%
Patient 3	3%/3 mm	97.12%	98.77%	99.25%

The absolute dose difference between the CBAM-SCT and CT image square fields was smaller than those between the CBCT and CT images and between the U-Net-SCT and CT images. In addition, Figure 9 shows that the absolute dose difference of the CBAM-SCT images in the bone region was lower. The γ-index pass rate of CBAM-SCT was superior to that of U-Net-SCT under both the 1%/1 mm and 3%/3 mm criteria. In particular, under the 1%/1 mm criterion for patient 1, the γ-index pass rate of CBAM-SCT was 82.23%, which is higher than the 79.03% of U-Net-SCT. Noise in higher-density tissues with a higher stopping power ratio leads to a greater dose error in radiotherapy dose calculation. CBAM-U-Net has a more robust noise reduction function in high-density areas; therefore, CBAM-SCT images are more suitable for clinical dose calculation requirements than U-Net-SCT images.

According to the results of the square-field calculation, the CBAM-U-Net network can improve the accuracy of the dose calculation and eliminate the image artifacts existing in the original CBCT images. The γ-index pass rate of the SCT images generated by the CBAM-U-Net network was higher than that produced by the original sCTU-Net network. The application of square-field dose calculations can partially consider the influence of anatomical structural changes on the original treatment plan and provide additional data support for subsequent treatment.

To further evaluate the effect of image quality on the dose calculation results, a simple planning target volume (PTV) and OAR were delineated. The CT, CBCT, U-Net-SCT, and CBAM-SCT images of Patient 1 used the same PTV and OAR contours to eliminate errors caused by manual delineation. A single-field plan was adopted with a 180° gantry angle and 0° couch angle, and single-field inverse planning was used to optimize the dose distribution. The squared deviation function was used for dose optimization in the PTV area and the squared overdosing function was used for dose optimization in the OAR. The results of the CT images were selected as references, and the CBCT, CBAM-SCT, and U-Net-SCT images were analyzed using the 2%/2 mm γ-index criterion to compare the dose distribution. The absolute dose differences are shown in Fig. 10.

Fig. 10 (Color online) (a) Dose distribution in CT images of patient 1; (b) dose distribution in CBCT images of patient 1; (c) dose distribution in U-Net-SCT images of patient 1; (d) dose distribution in CBAM-SCT images of patient 1; (e) absolute dose difference between CBCT and CT images of patient 1; (f) absolute dose difference between U-Net-SCT and CT images of patient 1; and (g) absolute dose difference between CBAM-SCT and CT images of patient 1

The γ-index pass rate of CBAM-SCT was 88.97%, which is better than that of U-Net-SCT and CBCT (88.06% and 77.01%, respectively). Using the CBAM-U-Net network to eliminate CBCT image artifacts may provide a foundation for proton-adaptive therapy.

Discussion

The proposed deep learning method synthesizes CT images from CBCT images; the synthetic images have the clarity of CT images while retaining the anatomical structure of the CBCT images. By adding CBAM blocks to sCTU-Net, CBAM-U-Net performed better than the original sCTU-Net in terms of noise reduction. The CT, CBCT, and SCT images were compared using four image metrics. The image assessment parameters (Table 1) of CBAM-SCT were superior to those of U-Net-SCT and the original CBCT. The image difference error maps and Q-Q plots showed that CBAM-U-Net exhibits a better noise reduction capacity and image correction performance in high-density tissues. Both high-density and soft tissue areas are commonly treated with radiotherapy. Particularly in high-density areas, noise is likely to lead to serious dose-calculation errors. Therefore, CBAM-U-Net offers better accuracy for potential CBCT-based dose calculations.

The IDD curves of a single-energy proton beam were compared. Although the IDD curve of CBAM-SCT was closer to that of CT, differences remained between the IDD curve peaks of CBAM-SCT and CT in both soft tissue and high-density tissue. Therefore, the image correction requires further improvement for accurate dose calculation. The results of the lateral dose profile analysis showed that the CBAM-SCT images had a higher γ-index pass rate than the U-Net-SCT images under the 1%/1 mm and 3%/3 mm criteria. Because the scanned field was placed at the junction of the soft tissue and bone regions, Figure 9 shows that the absolute dose difference of CBAM-SCT was lower than those of U-Net-SCT and CBCT in both the soft tissue and bone areas. The γ-index pass rate of the simple plan further demonstrated that CBAM-U-Net exhibited a certain level of practical therapeutic performance. Compared with the original sCTU-Net network, the CBAM-U-Net proposed in this study performed better in image denoising for potential dose calculation. Because this study did not use real treatment plans for dose calculation comparison, it may not directly reflect the impact of image quality on real treatment plans. However, the monoenergetic scanned field and simple plan calculation results demonstrated, to a certain extent, that the improved image quality increases dose calculation accuracy.

In this study, CBAM-U-Net was still a supervised learning model, and the registered CT images used as ground truths, which were assumed to have identical anatomical structures, inevitably contained registration errors that can degrade network learning. However, ideal CBCT-CT pairs are not easily available because they are not acquired simultaneously. The process of obtaining raw data is complicated; therefore, an unsupervised learning model should be developed for more flexible training and faster training data acquisition [42-44]. The proposed model was two-dimensional, and the input and output data were processed slice by slice. A 3D generative model will be researched in the future to enhance the performance of noise reduction in CBCT images [45]. Because of the strong flexibility of U-Net, noise reduction in the CBCT projection domain is also a promising method [46, 47]. In addition, the acquisition of real CT data is not straightforward, and only the CT datasets of 17 patients were used as the training datasets, which may limit the generalization of the network. However, even with small training datasets, CBAM-U-Net showed better noise reduction performance and robustness. Additional data will be used in the future to improve network robustness.

Conclusion

This study proposed the CBAM-U-Net model, an sCTU-Net neural network with convolutional block attention modules, to generate synthetic CT images from original CBCT images. The synthetic CT images preserved the anatomical and structural information of the original CBCT images with relatively accurate HU values. In addition, CBAM-U-Net reduced noise in high-density areas and restored the original CBCT data better than the original sCTU-Net network. Our approach improved the accuracy of the HU values of CBCT images, thus facilitating further quantitative applications of CBCT, such as potential dose calculation and adaptive treatment planning in the future.

References

[1] R.L. Siegel, K.D. Miller, A. Jemal, Cancer statistics, 2018. Ca-Cancer. J. Clin. 68, 7-30 (2018). https://doi.org/10.3322/caac.21442