NUCLEAR CHEMISTRY, RADIOCHEMISTRY, AND NUCLEAR MEDICINE

Material decomposition of spectral CT images via attention-based global convolutional generative adversarial network

Xiao-Dong Guo
Peng He
Xiao-Jie Lv
Xue-Zhi Ren
Yong-Hui Li
Yuan-Feng Liu
Xiao-Hua Lei
Peng Feng
Hong-Ming Shan
Nuclear Science and Techniques, Vol. 34, No. 3, Article number 45. Published in print Mar 2023. Available online 27 Mar 2023.

Spectral computed tomography (CT) based on photon-counting detectors can resolve the energy of every single photon interacting with the sensor layer and can be used to analyze material attenuation information in different energy ranges, which is helpful for material decomposition studies. However, there is a considerable amount of inherent quantum noise in narrow energy bins, resulting in a low signal-to-noise ratio, which can consequently degrade the material decomposition performance in the image domain. Deep learning technology is currently widely used in medical image segmentation, denoising, and recognition. To improve the results of material decomposition, we propose an attention-based global convolutional generative adversarial network (AGC-GAN) to decompose different materials in spectral CT. Specifically, our network combines a global convolutional neural network based on an attention mechanism with a generative adversarial network: the attention-based global convolutional network serves as the generator, and a PatchGAN discriminator network serves as the discriminator. Meanwhile, a clinical spectral CT image dataset is used to verify the feasibility of our proposed approach. Extensive experimental results demonstrate that AGC-GAN achieves better material decomposition performance than the vanilla U-Net, a fully convolutional network, and a fully convolutional DenseNet. Remarkably, the mean intersection over union, structural similarity, mean precision, pixel accuracy (PAcc), and mean F1-score of our method reach up to 87.31%, 94.83%, 93.22%, 97.39%, and 93.05%, respectively.

Keywords: Photon-counting CT; Material decomposition; Attention mechanism; GAN
1 Introduction

Traditional X-ray computed tomography (CT) employs an energy-integrating detector whose output is proportional to the energy integrated over the entire incident spectrum, so the spectral responses of different materials are ignored [1,2]. To address this issue, photon-counting CT uses an energy-resolving detector to acquire energy-dependent information, which can be applied to multi-material detection and decomposition [3,4]. In classical material decomposition methods, the attenuation of an object is usually expressed as a linear combination, for example of photoelectric absorption, incoherent (Compton) scattering, and coherent (Rayleigh) scattering [5,6], or of several basis materials [7]. However, owing to the low signal-to-noise ratio of spectral CT images in narrow energy bins, the material decomposition performance is degraded by artifacts and noise, particularly for material components with similar X-ray absorption characteristics.

Several methods have been proposed to solve these problems; they can be classified into three categories: one-step inversion, projection domain-based, and image domain-based methods. One-step inversion algorithms combine the reconstruction and decomposition processes to realize material decomposition directly from projection data. Mory et al. compared the convergence speed of five one-step algorithms; one-step inversion methods are typically slower than their two-step counterparts because of their high computational cost [8]. Tilley et al. presented a model-based material decomposition algorithm in which reconstruction and decomposition are performed simultaneously using a multi-energy forward model [9]. Meanwhile, Fang et al. proposed an iterative one-step inversion material decomposition algorithm that estimates material images directly from projection data and uses a Noise2Noise prior for denoising [10].

Projection domain-based methods first decompose the projection data into material sinograms and then perform reconstruction to obtain material images. Noh et al. proposed two novel sinogram restoration methods based on statistical models to suppress noise [11]. Petrongolo et al. used convex optimization to reduce noise in decomposed images in the projection domain [12]. Image domain-based methods first reconstruct images and then perform decomposition in the image domain. Zhao et al. developed an image-domain material decomposition algorithm that incorporates an edge-preserving filter into a local highly constrained backprojection reconstruction framework [13]. Niu et al. formulated an algorithm that includes the inverse of the estimated variance-covariance matrix of the decomposed images as a penalty weight in the least-squares term [14]. Chen et al. attempted to use the end-to-end AUTOMAP for fitting solutions in the image domain [15]. Using simulated data, Clark et al. trained CNNs with a U-Net structure [16] and then tested them on experimentally acquired dual-energy micro-CT data of mice from both energy-integrating and photon-counting detectors; their U-Net performed a more robust spectral micro-CT decomposition than sensitivity-based decomposition [17]. Chen et al. proposed a convolutional material decomposition algorithm that remained robust during multi-material decomposition of spectral CT images even in the presence of obvious ring artifacts [18]. Zimmerman et al. used machine learning and transfer learning to separate the soft tissue, bone, and gadolinium nanoparticles of an ex vivo rat leg specimen into different basis images [19].

Theoretically, the purpose of spectral CT material decomposition in the image domain is to separate the different material components in spectral CT images, which is the same goal as that of image segmentation in certain clinical applications; here, the lungs, bones, and soft tissue are treated as the basis materials. Recently, deep learning has been widely used for image classification [20], segmentation [21], object detection [22], image denoising [23, 24], and artifact reduction [25, 26]. Deep learning can perform feature learning and hierarchical feature extraction to solve image segmentation problems; thus, it is feasible to use it to improve the accuracy of material decomposition of spectral CT images. Several studies have already applied deep learning to material decomposition. Feng et al. designed an end-to-end network model to obtain a virtual monochromatic attenuation map from a multicolor reconstruction map for material decomposition [27]. Xu et al. proposed a deep neural network composed of a fully convolutional network (FCN) and a fully connected network to solve a material decomposition problem [28]. Wu et al. presented a fully convolutional DenseNet (FC-DenseNet) based on the densely connected blocks of DenseNet to classify and quantify different materials [29]. Zhang et al. studied a model-based butterfly network to perform image-domain material decomposition for dual-energy CT [30]. Gong et al. developed a neural network called Incept-net, with an encoder-decoder framework, for material decomposition of spectral CT images [31]. Inspired by the generative adversarial network, Geng et al. proposed a parallel multi-stream generative adversarial network (PMS-GAN) to perform projection-based multi-material decomposition in spectral CT, in which multiple sub-generators perform multi-material decomposition simultaneously [32]. Popescu et al. applied Pix2Pix GAN to retinal images and achieved satisfactory blood-vessel segmentation; their network comprised a U-Net-type generator and a PatchGAN-type discriminator [33]. However, these two GAN-based studies did not apply an attention mechanism and thus ignored some key feature information. Ma et al. proposed a residual dense convolutional neural network with an attention mechanism for low-dose CT denoising, which achieved excellent performance [34]. Lan et al. introduced a self-attention mechanism into a conditional generative adversarial network, enabling robust deep learning-based neuroimaging synthesis; their self-attention module calculates the similarity among pixels of the image [35], which is computationally expensive. Unlike these methods, our global attention mechanism can extract global information with considerably fewer parameters.

To effectively improve the accuracy of material identification based on deep learning, we introduce adversarial learning into material decomposition research [36,37] and propose an attention-based global convolutional generative adversarial network (AGC-GAN) for material decomposition, which is capable of extracting feature information at multiple scales of spectral CT images. AGC-GAN employs large convolutional kernels to extract global feature information from multi-scale feature maps and embeds an attention mechanism [38] in the fusion of feature maps across layers. A spectral CT image dataset was constructed by scanning a mouse specimen with a photon-counting CT system, and we performed experiments to evaluate the performance of AGC-GAN on this dataset. Compared to U-Net, FCN [39], and FCD [40], our proposed method achieved better discrimination of three materials: lung, bone, and soft tissue.

2 Methodology

2.1 Overview of AGC-GAN

Although spectral CT images in different energy bins share similar structural features, they contain different gray-level information. Therefore, it is possible to construct a sufficiently large dataset from the gray-level differences and structural correlations of spectral CT images and to improve the accuracy of material decomposition using deep learning technology. Deep learning can be used for feature extraction and learning, allowing the inherent patterns and deep features of the available dataset to be learned.

In this study, we use deep learning technology to identify lung, bone, and soft tissue from the chest spectral CT images of a mouse. Generative adversarial networks (GANs) have recently become a powerful model in the deep learning field, opening a new avenue for deep learning frameworks. Based on the GAN algorithm, we designed AGC-GAN for material decomposition of spectral CT images. The structure of AGC-GAN is shown in Fig. 1. Our network consists of two sub-modules: a generator module and a discriminator module. First, spectral CT images are fed into the generator network to produce the corresponding material decomposition prediction maps. Then, the discriminator network uses the reference image and the prediction map to determine the authenticity. The generator produces material decomposition images with the help of feedback from the discriminator and deceives the discriminator, while the discriminator attempts to distinguish between the generative images and the reference images. Functionally, the adversarial learning process corrects inconsistencies between predicted decomposition maps and reference images.
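As a minimal sketch of this adversarial scheme in PyTorch (the generator and discriminator handles are placeholders for the modules detailed in Sect. 2.2, and the one-hot label_map is assumed to match the generator's output channels), one alternating training step might look as follows:

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial loss on the discriminator's logits

def train_step(generator, discriminator, opt_g, opt_d, spectral_img, label_map):
    """One alternating update: D learns to tell reference pairs from generated
    pairs, then G learns to make its decomposition maps pass as 'real'."""
    # --- discriminator update ---
    pred_map = generator(spectral_img)
    d_real = discriminator(torch.cat([spectral_img, label_map], dim=1))
    d_fake = discriminator(torch.cat([spectral_img, pred_map.detach()], dim=1))
    loss_d = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- generator update: try to fool the discriminator ---
    d_fake = discriminator(torch.cat([spectral_img, pred_map], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

In the full objective (Sect. 2.2.3), the generator loss additionally carries a weighted pixel-wise cross-entropy term.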

Fig. 1
(Color online) Architecture of AGC-GAN. It consists of two modules: a generator G and a discriminator D. The generator attempts to generate the results of material decomposition at the pixel level, while the discriminator attempts to discriminate whether inputs are reference images or images generated by the generator.
2.2 Network structure
2.2.1 Generator network

As shown in Fig. 1, AGC-GAN uses an attention-based global convolutional network (AGCN), a U-type network, as the generator. ResNet-152 [41] was employed as the backbone of the AGCN in this study, and the details of the backbone are presented in Table 1. Initially, multi-scale feature maps of the inputs were extracted from different stages of the backbone network. Subsequently, a global convolutional block (GCB) was employed to extract the global information of the feature map, and a boundary refinement block (BRB) was used to help the model learn the boundary information of the feature map [42]. The GCB uses a large convolution kernel to achieve global convolution, enabling the feature information at each level to be closely related to the pixel categories. However, a large-kernel convolution entails excessive computation and storage during learning. Therefore, the GCB adopts parallel asymmetric convolutions to realize the large-kernel convolution, effectively reducing the number of parameters and the computational burden of the network.

Table 1
Structure of the ResNet-152 layers

Layer | Conv2d (kernel_size, out_channels)
ResNet152-layer0 | 7×7, 64, stride 2
ResNet152-layer1 | 3×3 max pool, stride 2; [1×1, 64; 3×3, 64; 1×1, 256] ×3
ResNet152-layer2 | [1×1, 128; 3×3, 128; 1×1, 512] ×8
ResNet152-layer3 | [1×1, 256; 3×3, 256; 1×1, 1024] ×36
ResNet152-layer4 | [1×1, 512; 3×3, 512; 1×1, 2048] ×3

Low-resolution feature maps were then up-sampled by using an up-sampling block (USB) and merged with higher-resolution feature maps to generate new feature maps. The final prediction maps were generated after the last BRB, which was used to output the decomposition results of AGCN. The compositions of the GCB, BRB, and USB are shown in Table 2.

Table 2
Structure of the GCB, BRB, and USB

GCB | BRB | USB
1×15 Conv, 15×1 Conv | 3×3 Conv | 3×3 Transposed Conv
15×1 Conv, 1×15 Conv | BN + ReLU | 3×3 Conv
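To make the GCB and BRB concrete, the following is a minimal PyTorch sketch consistent with Table 2. Summing the two asymmetric branches and using a residual connection in the BRB follow the global convolutional network of Ref. [42]; these are our reading of the table rather than the authors' exact code.

```python
import torch.nn as nn

class GCB(nn.Module):
    """Global convolutional block: two parallel asymmetric-convolution branches
    emulate a large 15x15 kernel with far fewer parameters (Table 2)."""
    def __init__(self, in_ch, out_ch, k=15):
        super().__init__()
        p = k // 2
        self.branch_a = nn.Sequential(                       # 1xk then kx1
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))
        self.branch_b = nn.Sequential(                       # kx1 then 1xk
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)           # fuse both branches

class BRB(nn.Module):
    """Boundary refinement block: a small residual branch sharpens boundaries."""
    def __init__(self, ch):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.refine(x)                            # residual connection
```

For equal input and output widths C, a dense 15×15 kernel would need 225·C² weights per layer, whereas the two asymmetric branches use about 60·C², which is the parameter saving described above.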

To enhance the feature extraction performance of the generator network, we added an attention mechanism to the fusion of feature maps for attention mapping. The attention mechanism can help the network assign different weights to various parts of feature maps and extract critical information, enabling it to make more accurate judgments in the learning process. Recently, the attention mechanism has become a powerful tool for performing deep learning tasks. Woo et al. proposed the convolutional block attention module (CBAM), which improves the accuracy of networks by concatenating spatial and channel attention [43]. Inspired by the CBAM, we designed a channel attention block (CAB) and a spatial attention block (SAB) and incorporated them into the generator network. Figure 2 shows the structure of the CAB and SAB.

Fig. 2
Specific structure of the attention modules (CAB and SAB)

The CAB assigns different weights to convolutional feature maps; these weights indicate the importance of the channel-wise feature maps. Specifically, feature maps from two different layers provide the inputs of the CAB. First, the feature maps of the two scales are concatenated along the channel dimension, subjected to average pooling (Avg-Pool), and then fed sequentially into 1×1 convolution (Conv), batch normalization (BN), and rectified linear unit (ReLU) layers. Second, the channel attention map is obtained using the sigmoid activation function. Finally, the input feature map of the CAB is dot multiplied (DM) with the channel attention map and summed with the next-level feature map to produce the CAB output. The SAB helps the network assign different weights to spatial regions according to their feature content. Specifically, the SAB takes the output of the CAB as its input. First, the input feature maps are subjected to maximum pooling (Max-Pool) and Avg-Pool, and the results are concatenated along the channel dimension. The concatenated map is then fed sequentially into 3×3 Conv, BN, and ReLU layers and activated with the sigmoid function to obtain the spatial attention map. Eventually, the input feature map is dot multiplied with this spatial attention map to obtain the final output feature map. More informative and discriminative features can thus be obtained using the CAB and SAB, effectively enhancing the decomposition performance of the network.
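Based on this description and Fig. 2, a minimal PyTorch sketch of the two attention blocks might look as follows. Equal channel widths for the two fused maps, and the assumption that the lower-resolution map has already been upsampled to a common resolution, are ours; the published implementation may differ in such details.

```python
import torch
import torch.nn as nn

class CAB(nn.Module):
    """Channel attention block: pools the concatenated two-scale maps into a
    channel descriptor, maps it to per-channel weights, reweights, then fuses."""
    def __init__(self, ch):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Conv2d(2 * ch, ch, kernel_size=1),   # 1x1 Conv
            nn.BatchNorm2d(ch),                     # BN
            nn.ReLU(inplace=True))                  # ReLU
    def forward(self, low, high):                   # maps at the same resolution
        ctx = torch.cat([low, high], 1).mean(dim=(2, 3), keepdim=True)  # Avg-Pool
        weight = torch.sigmoid(self.fc(ctx))        # channel attention map
        return low * weight + high                  # DM, then sum with next level

class SAB(nn.Module):
    """Spatial attention block: channel-wise Max-/Avg-Pool, 3x3 Conv + BN + ReLU,
    sigmoid, then position-wise reweighting of the input."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=3, padding=1),
            nn.BatchNorm2d(1),
            nn.ReLU(inplace=True))
    def forward(self, x):
        pooled = torch.cat([x.max(1, keepdim=True).values,
                            x.mean(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.conv(pooled))  # spatial attention map
```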

2.2.2 Discriminator network

The discriminator network of AGC-GAN is PatchGAN, also called a Markovian discriminator [44]. As shown in Fig. 1, the original spectral CT image and the corresponding decomposition result are concatenated as the input of the discriminator. Specifically, the reference image is considered "real" by the discriminator, while the decomposition result produced by the generator is considered "fake". The discriminator mainly consists of four down-sampling blocks (DBs) and one output convolutional layer, which finally outputs a 30×30 confidence map. The structural parameters of the discriminator network are listed in Table 3.

Table 3
Structure of the down-sampling blocks

Down-sampling block 1 | 4×4 Conv, LeakyReLU
Down-sampling blocks 2-4 | [4×4 Conv, BN, LeakyReLU] ×3
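A sketch of this discriminator is given below. The channel widths (64-128-256-512) and stride pattern are the standard pix2pix PatchGAN choices, which Table 3 does not specify; with them, a 256×256 input yields the 30×30 confidence map described above.

```python
import torch.nn as nn

def down_block(in_ch, out_ch, stride, use_bn=True):
    """4x4 Conv (+ BN) + LeakyReLU, following Table 3."""
    layers = [nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1)]
    if use_bn:
        layers.append(nn.BatchNorm2d(out_ch))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

class PatchDiscriminator(nn.Module):
    """PatchGAN: input is the spectral CT image concatenated with a (reference
    or generated) decomposition map; output is a map of per-patch logits."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            down_block(in_ch, 64, stride=2, use_bn=False),  # down block 1
            down_block(64, 128, stride=2),                  # down blocks 2-4
            down_block(128, 256, stride=2),
            down_block(256, 512, stride=1),
            nn.Conv2d(512, 1, 4, padding=1))                # output conv layer
    def forward(self, x):
        return self.net(x)  # 30x30 confidence map for a 256x256 input
```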
2.2.3 Loss function

The GAN loss, based on the binary cross-entropy computed at the discriminator, can be defined as
$$\min_G \max_D L_{\mathrm{GAN}} = \mathbb{E}_{x}[\log(D(x))] + \mathbb{E}_{z}[\log(1 - D(G(z)))], \quad (1)$$
where $G$ and $D$ represent the generator and discriminator of the GAN, respectively, $x$ denotes the corresponding label maps after decomposition, and $z$ denotes the spectral CT images. Training a GAN is a process of confrontation between $G$ and $D$. $G$ seeks to deceive its opponent $D$ with the images it produces, so the optimization goal of $G$ is to minimize $L_{\mathrm{GAN}}$; in contrast, $D$ attempts to distinguish the authenticity of its inputs, so the optimization goal of $D$ is to maximize $L_{\mathrm{GAN}}$. The training of $D$ and $G$ thus resembles two players in a max-min game, an adversarial process, which is why $L_{\mathrm{GAN}}$ is also referred to as the generative adversarial loss.

In this study, the objective loss function of AGC-GAN consists of two components, the generative adversarial loss and the cross-entropy loss of the generator network, as defined in Eq. (2):
$$L_{\mathrm{AGC\text{-}GAN}} = L_{\mathrm{GAN}}(G, D) + \lambda_{\mathrm{CE}} L_{\mathrm{CE}}(G), \quad (2)$$
where $L_{\mathrm{CE}}(G)$ denotes the cross-entropy loss of the generator, which acts as a regularization term, and the hyperparameter $\lambda_{\mathrm{CE}}$ is its weight. The cross-entropy loss of the generator can be formulated as
$$L_{\mathrm{CE}}(G) = -\sum_{m=1}^{M} \sum_{n=1}^{N} y_{m,n} \log(p_{m,n}), \quad (3)$$
where the label image is one-hot encoded when computing this loss, $M$ and $N$ are the total number of pixels and the number of material categories in the decomposition image, respectively, $y_{m,n}$ is the label value of category $n$ at the $m$-th pixel, and $p_{m,n}$ is the predicted probability of the $n$-th material output by the decomposition network at that pixel.
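In PyTorch terms, the generator's side of Eq. (2) can be sketched as follows (the value assigned to lambda_ce below is a placeholder for illustration, not the paper's setting):

```python
import torch
import torch.nn as nn

adv_loss = nn.BCEWithLogitsLoss()   # generative adversarial term, Eq. (1)
ce_loss = nn.CrossEntropyLoss()     # pixel-wise cross-entropy term, Eq. (3)
lambda_ce = 100.0                   # placeholder weight for the CE regularizer

def generator_loss(d_fake_logits, pred_logits, label_map):
    """L_AGC-GAN for the generator, Eq. (2): fool the discriminator while
    matching the reference decomposition pixel-wise."""
    l_gan = adv_loss(d_fake_logits, torch.ones_like(d_fake_logits))
    l_ce = ce_loss(pred_logits, label_map)  # label_map: (B, H, W) class indices
    return l_gan + lambda_ce * l_ce
```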

2.3 Spectral CT dataset

To obtain our spectral CT image dataset, we used a spectral CT imaging system based on a photon-counting detector to scan a mouse specimen. The photon-counting detector (SANTIS 0804, produced by DECTRIS) has four fully calibrated and independently adjustable energy gating thresholds, and its effective detection area is 1024×256 pixels with a pixel size of 150 μm. The SANTIS 0804 is a hybrid photon counting (HPC) detector with two configurations: high-resolution mode and multi-energy mode. We set the detector to multi-energy mode, in which the maximum input photon count rate is 4.0×10⁸ photons/s/mm².

We scanned one mouse approximately 10 cm in length and 160-180 g in weight; the specimen was treated with the required reagents before scanning. The scanning parameters used in the experiment are listed in Table 4. The spectral CT system scanned the mouse three times, with the SANTIS 0804 detector set to four different energy thresholds for each scan, yielding spectral data of the mouse specimen in 12 energy bins. From these 12 energy channels, we selected 10 distinct energy ranges, namely 25–90, 30–90, 35–90, 40–90, 45–90, 50–90, 55–90, 60–90, 65–90, and 70–90 keV. In each scan, 250 projection views were acquired for each energy bin. We selected 100 slices from the projection data to reconstruct CT images, each of which mainly contains three materials: lung, bone, and soft tissue. The selected projection data were reconstructed with the Split-Bregman algorithm, and a dataset containing 1000 spectral CT images was finally constructed.

Table 4
Parameters of the spectral CT imaging system

Tube voltage (kVp) | 90
Tube current (μA) | 200
Source-to-detector distance (mm) | 350
Source-to-specimen distance (mm) | 210
Total number of angles | 250
Length of detector (pixels) | 1024
Number of projections | 256

In addition, we manually created labels for each spectral CT image using the annotation tool LabelMe. Each label image has the same resolution as the original spectral CT image. Some of the original spectral CT images and labeled reference images in the final dataset are shown in Fig. 3. Because all the data came from one mouse, we performed 5-fold cross-validation experiments to avoid overfitting. We divided the 100 slices into five equal parts from head to tail; each part contains 20 slices, and each slice contributes 10 images from the 10 energy bins. We used three parts for training, one part for validation, and the remaining part for testing. We repeated this process five times, as shown in Fig. 4, and report the average results over the five test parts to evaluate the performance of the model.
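A minimal sketch of this slice-wise five-fold split is given below. Slice-major ordering of the 1000 images, and the choice that the validation part immediately follows the test part in each rotation, are our assumptions based on Fig. 4.

```python
import numpy as np

# 100 slices x 10 energy bins = 1000 images; split by slice so that all energy
# bins of a slice stay in the same fold (slices ordered head to tail).
n_slices, n_bins, n_folds = 100, 10, 5
folds = np.split(np.arange(n_slices), n_folds)   # five parts of 20 slices each

def rotation(k):
    """Fold k: one part for testing, one for validation, three for training."""
    test = folds[k]
    val = folds[(k + 1) % n_folds]               # assumed validation assignment
    train = np.concatenate([folds[i] for i in range(n_folds)
                            if i not in (k, (k + 1) % n_folds)])
    # expand slice indices to image indices across the 10 energy bins
    expand = lambda s: (s[:, None] * n_bins + np.arange(n_bins)).ravel()
    return expand(train), expand(val), expand(test)

train_idx, val_idx, test_idx = rotation(0)       # rotation 1 of 5
```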

Fig. 3
(Color online) Samples of the original images and the corresponding labeled reference images in the dataset; the first row presents the original images, and the second row contains the manually labeled reference images corresponding to the original images. The corresponding energy ranges from left to right are as follows: 25–90, 35–90, 45–90, 55–90, and 65–90 keV.
Fig. 4
(Color online) Data set of cross-validation experiments
3 Experimental results

To verify the feasibility of AGC-GAN for material decomposition of spectral CT images, U-Net, FCN, and FCD were selected for comparison. The experiments were programmed in Python 3.7, and each network was implemented using the deep learning framework PyTorch 1.5.1. The hardware platform was as follows: Intel i5-9600KF CPU, Nvidia TITAN V GPU (12 GB), and 16 GB DDR4 RAM. The initial learning rate of each network was set to 0.001 and decayed by a factor of 0.95 every 2 epochs, and the parameters of each layer were optimized using the adaptive moment estimation (Adam) algorithm. In every rotation, we trained AGC-GAN and the comparison networks on the same training dataset and then tested the decomposition performance of each network on the same test dataset. The differences among the results of the networks are shown in Fig. 5; the three slices shown are from rotations 1, 3, and 4.
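This training configuration can be sketched in PyTorch as follows (the placeholder modules merely stand in for AGC-GAN's generator and discriminator, and the epoch count is illustrative):

```python
import torch
import torch.nn as nn

# placeholder modules standing in for the AGC-GAN generator and discriminator
generator = nn.Conv2d(10, 4, 1)
discriminator = nn.Conv2d(14, 1, 4)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
# decay the learning rate by a factor of 0.95 every 2 epochs, as described above
sched_g = torch.optim.lr_scheduler.StepLR(opt_g, step_size=2, gamma=0.95)
sched_d = torch.optim.lr_scheduler.StepLR(opt_d, step_size=2, gamma=0.95)

for epoch in range(100):        # placeholder number of epochs
    ...                         # alternate D/G updates as in Sect. 2.1
    sched_g.step()
    sched_d.step()
```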

Fig. 5
(Color online) Comparison of qualitative results of different networks on the test dataset.

From Fig. 5, we can see that the decomposition results of AGC-GAN are closer to the ground truth than those of the other networks; AGC-GAN achieves superior material decomposition results, particularly for the lung and bones. In addition, to further evaluate the material decomposition performance of each network, we calculated the intersection over union (IoU), structural similarity (SSIM) index, precision, pixel accuracy (PAcc), and F1-score (F1) metrics for each network. IoU is a standard measure of the overlap between the output and the ground truth:
$$\mathrm{IoU} = \frac{|\mathrm{ground\ truth} \cap \mathrm{prediction}|}{|\mathrm{ground\ truth} \cup \mathrm{prediction}|}. \quad (4)$$

The SSIM index is used as a metric to measure the similarity between two given images:
$$\mathrm{SSIM} = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \quad (5)$$
$$c_1 = (k_1 L)^2, \quad c_2 = (k_2 L)^2. \quad (6)$$
In Eqs. (5) and (6), $\mu_x$ and $\mu_y$ are the mean pixel values of the two images, $\sigma_{xy}$ is their covariance, $\sigma_x^2$ and $\sigma_y^2$ are their variances, $k_1$ and $k_2$ are constants, and $L$ is the dynamic range of the pixel values. Precision indicates how many of the pixels predicted positive by the model are actually positive:
$$\mathrm{Precision} = \frac{TP}{TP + FP}. \quad (7)$$
PAcc measures the ratio of correctly predicted pixels to total pixels; together with recall and the F1-score, it is given by Eqs. (8)-(10):
$$\mathrm{PAcc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad (8)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN}, \quad (9)$$
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \quad (10)$$
Recall is the ratio of correctly predicted positive pixels to all actually positive pixels, and the F1-score is the weighted average of precision and recall. $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively. The class-averaged metrics are
$$\mathrm{MIoU} = \frac{1}{k} \sum \mathrm{IoU}, \quad (11)$$
$$\mathrm{MPre} = \frac{1}{k} \sum \mathrm{Precision}, \quad (12)$$
where $k$ is the number of material categories. All of these evaluation indexes are positively correlated with the decomposition performance of the network: a higher value indicates better material decomposition.

To evaluate the performance of our network, we ran U-Net, FCN, and FCD on the same dataset. We also performed an ablation experiment to assess the attention mechanism: GCN-GAN denotes AGC-GAN without the attention modules. The quantitative results of AGC-GAN and the other networks in terms of IoU, SSIM, precision, PAcc, and F1-score are shown in Tables 5, 6, and 7, respectively. Compared with U-Net, FCN, and FCD, AGC-GAN improved the IoU, SSIM, precision, PAcc, and F1-score for all three materials: lung, bone, and soft tissue. By incorporating attention and adversarial learning, AGC-GAN raised the IoU to 87.00%, 80.37%, and 94.54% for the lung, bone, and soft tissue, respectively. In terms of the SSIM index, AGC-GAN achieved the best performance (94.83%). AGC-GAN also improved the precision on bone and tissue and performed comparably on the lung. For the PAcc index, which counts all correctly predicted pixels, AGC-GAN again achieved the best performance, up to 97.39%. As a balanced measure of precision and recall, the F1-score summarizes the overall performance of a network; AGC-GAN achieved the highest F1-score at 93.05%. These quantitative results show that AGC-GAN attained the highest MPre, MIoU, PAcc, and F1-score among all compared networks, indicating that AGC-GAN outperforms the other networks in material decomposition.
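As a concrete companion to Eqs. (4) and (7)-(12), the following sketch computes the per-material metrics from predicted and reference label maps; treating a background class alongside the three materials is our assumption.

```python
import numpy as np

def decomposition_metrics(pred, gt, n_classes=4, eps=1e-12):
    """Per-material IoU, precision, recall, and F1, plus overall PAcc, from
    integer label maps (assumed classes: background, lung, bone, tissue)."""
    iou, prec, rec = [], [], []
    for c in range(n_classes):
        tp = np.sum((pred == c) & (gt == c))   # true positives for class c
        fp = np.sum((pred == c) & (gt != c))   # false positives
        fn = np.sum((pred != c) & (gt == c))   # false negatives
        iou.append(tp / (tp + fp + fn + eps))  # Eq. (4)
        prec.append(tp / (tp + fp + eps))      # Eq. (7)
        rec.append(tp / (tp + fn + eps))       # Eq. (9)
    iou, prec, rec = map(np.array, (iou, prec, rec))
    f1 = 2 * prec * rec / (prec + rec + eps)   # Eq. (10)
    pacc = np.mean(pred == gt)                 # Eq. (8)
    return {"MIoU": iou.mean(),                # Eq. (11)
            "MPre": prec.mean(),               # Eq. (12)
            "PAcc": pacc, "F1": f1.mean()}
```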

Table 5
IoU metrics of the cross-validation experiments

Networks | Lung IoU (%) | Bone IoU (%) | Tissue IoU (%) | MIoU (%)
U-Net | 86.84±3.32 | 80.27±4.28 | 94.09±0.60 | 87.20±2.48
FCN | 86.57±3.88 | 79.00±4.46 | 93.65±0.82 | 86.54±2.86
FCD | 79.45±8.89 | 58.59±6.98 | 90.98±1.76 | 76.46±5.38
GCN-GAN | 85.52±3.60 | 75.30±3.99 | 93.71±1.09 | 84.84±2.64
AGC-GAN | 87.00±3.01 | 80.37±3.21 | 94.54±1.08 | 87.31±2.16
Table 6
SSIM and precision metrics of the cross-validation experiments

Networks | SSIM (%) | Lung precision (%) | Bone precision (%) | Tissue precision (%) | MPre (%)
U-Net | 94.21±0.53 | 93.37±2.29 | 88.04±5.35 | 97.04±0.91 | 92.95±1.77
FCN | 93.96±0.62 | 93.80±1.72 | 87.83±4.91 | 96.94±1.00 | 92.89±2.00
FCD | 92.37±1.11 | 89.01±3.64 | 66.24±8.93 | 95.71±0.63 | 83.65±4.20
GCN-GAN | 93.32±0.74 | 93.39±3.09 | 82.39±3.61 | 96.74±1.21 | 90.84±1.69
AGC-GAN | 94.83±0.81 | 93.85±1.19 | 88.68±1.87 | 97.13±0.99 | 93.22±0.85
Table 7
PAcc and F1-score metrics of the cross-validation experiments

Networks | PAcc (%) | Lung F1 (%) | Bone F1 (%) | Tissue F1 (%) | Mean F1 (%)
U-Net | 97.21±0.41 | 92.48±0.19 | 88.05±2.75 | 96.48±0.32 | 92.47±1.53
FCN | 97.18±0.42 | 92.31±2.25 | 87.34±2.83 | 96.25±0.43 | 92.10±1.73
FCD | 96.32±1.76 | 88.21±1.31 | 73.54±5.59 | 95.26±0.97 | 86.42±5.11
GCN-GAN | 97.35±0.87 | 92.12±2.13 | 85.72±2.67 | 96.75±0.58 | 91.53±1.65
AGC-GAN | 97.39±0.47 | 93.00±1.75 | 88.97±2.00 | 97.19±0.58 | 93.05±1.29
4 Discussion and conclusion

Photon-counting CT is a promising new technology for CT systems. It can provide spectral information that existing technologies cannot, thereby improving the accuracy of material identification. However, the noise in images in narrow energy bins is very strong, making the gray values of the images inaccurate, and traditional material decomposition methods do not provide satisfactory decomposition results under these conditions. Although spectral CT images in different energy bins have different gray values, they share the same structure. We therefore proposed a novel methodology to improve the accuracy of material decomposition, replacing traditional material decomposition with deep learning-based segmentation; this improves the accuracy and can aid radiologists in diagnosis. Our global attention mechanism contains channel attention and spatial attention. For our photon-counting CT data, every slice has ten images under ten different energy bins. The channel attention mechanism assigns different weights to every channel, while the spatial attention mechanism assigns different weights to each position across channels; thus, we can extract feature information from the different energy bins. We also utilized the global convolutional block and boundary refinement block to realize global convolution. By contrast, a self-attention mechanism usually uses three feature spaces (query, key, and value) to calculate the similarity among image pixels, which is computationally expensive and requires more samples to avoid overfitting. We performed five-fold cross-validation experiments to test our algorithm. The proposed method achieved the best precision, IoU, PAcc, and F1-score in comparison with the three other networks, providing better performance for the material decomposition of spectral CT images. Note that the proposed AGC-GAN performs material decomposition of two-dimensional tomographic images. Because spectral CT images in different energy ranges exhibit high structural similarity and strong spatial correlation, we will use volume data to train a three-dimensional network model in follow-up research; such a model could learn more local feature information in the spatial dimension, further improving the accuracy of material decomposition.

In summary, we proposed a material decomposition method, named AGC-GAN, for spectral CT images. In our network, the generator generates the material decomposition results, while the discriminator attempts to distinguish whether its inputs are reference or generated images. By learning against each other in this confrontational manner, the generator and discriminator jointly improve the material decomposition accuracy of AGC-GAN. The experimental results demonstrated that, compared with U-Net, FCN, and FCD, our network achieves excellent results in the material decomposition of spectral CT. We performed cross-validation experiments to avoid overfitting. In the future, a large amount of photon-counting CT data will be collected to improve the generalization ability of our network.

References
[1] M.J. Willemink, M. Persson, A. Pourmorteza et al., Photon-counting CT: technical principles and clinical prospects. Radiology 289, 293 (2018). doi: 10.1148/radiol.2018172656
[2] P. He, B. Wei, W. Cong et al., Optimization of K-edge imaging with spectral CT. Med. Phys. 39, 6572 (2012). doi: 10.1118/1.4754587
[3] L. Ren, B. Zheng, H. Liu, Tutorial on X-ray photon counting detector characterization. J. X-Ray Sci. Technol. 26, 1-28 (2018). doi: 10.3233/XST-16210
[4] P.R. Mendonça, R. Bhotika, M. Maddah et al., Multi-material decomposition of spectral CT images, in Proc. SPIE 7622, Medical Imaging 2010: Physics of Medical Imaging, 7622w (2010). doi: 10.1117/12.844531
[5] R.E. Alvarez, A. Macovski, Energy-selective reconstructions in X-ray computerised tomography. Phys. Med. Biol. 21, 733 (1976). doi: 10.1088/0031-9155/21/5/002
[6] W. Cong, Y. Xi, P. Fitzgerald et al., Virtual monoenergetic CT imaging via deep learning. Patterns 1, 100128 (2020). doi: 10.1016/j.patter.2020.100128
[7] L. Lehmann, R. Alvarez, A. Macovski et al., Generalized image combinations in dual KVP digital radiography. Med. Phys. 8, 659 (1981). doi: 10.1118/1.595025
[8] C. Mory, B. Sixou, S. Si-Mohamed et al., Comparison of five one-step reconstruction algorithms for spectral CT. Phys. Med. Biol. 63, 235001 (2018). doi: 10.1088/1361-6560/aaeaf2
[9] S. Tilley, W. Zbijewski, J.W. Stayman, Model-based material decomposition with a penalized nonlinear least-squares CT reconstruction algorithm. Phys. Med. Biol. 64, 035005 (2019). doi: 10.1088/1361-6560/aaf973
[10] W. Fang, D. Wu, K. Kim et al., Iterative material decomposition for spectral CT using self-supervised Noise2Noise prior. Phys. Med. Biol. 66, 155013 (2021). doi: 10.1088/1361-6560/ac0afd
[11] J. Noh, J.A. Fessler, P.E. Kinahan, Statistical sinogram restoration in dual-energy CT for PET attenuation correction. IEEE Trans. Med. Imaging 28, 1688 (2009). doi: 10.1109/TMI.2009.2018283
[12] M. Petrongolo, X. Dong, L. Zhu, A general framework of noise suppression in material decomposition for dual-energy CT. Med. Phys. 42, 4848 (2015). doi: 10.1118/1.4926780
[13] W. Zhao, T. Niu, L. Xing et al., Using edge-preserving algorithm with non-local mean for significantly improved image-domain material decomposition in dual-energy CT. Phys. Med. Biol. 61, 1332 (2016). doi: 10.1088/0031-9155/61/3/1332
[14] T. Niu, X. Dong, M. Petrongolo et al., Iterative image-domain decomposition for dual-energy CT. Med. Phys. 41, 041901 (2014). doi: 10.1118/1.4866386
[15] Z. Chen, L. Li, Material decomposition of energy spectral CT by AUTOMAP. IEEE Nucl. Sci. Conf. Proc. (2018). doi: 10.1109/NSSMIC.2018.8824439
[16] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in Lect. Notes Comput. Sci. (2015), pp. 234-241. doi: 10.1007/978-3-319-24574-4_28
[17] D.P. Clark, M. Holbrook, C.T. Badea, Multi-energy CT decomposition using convolutional neural networks. Proc. SPIE 10573, 1057310 (2018). doi: 10.1117/12.2293728
[18] Z. Chen, L. Li, Robust multimaterial decomposition of spectral CT using convolutional neural networks. Opt. Eng. 58, 013104 (2019). doi: 10.1117/1.OE.58.1.013104
[19] K.C. Zimmerman, G. Sharma, A.K. Parchur et al., Experimental investigation of neural network estimator and transfer learning techniques for K-edge spectral CT imaging. Med. Phys. 47, 541 (2020). doi: 10.1002/mp.13946
[20] T.-H. Chan, K. Jia, S. Gao et al., PCANet: a simple deep learning baseline for image classification? IEEE Trans. Image Process. 24, 5017 (2015). doi: 10.1109/TIP.2015.2475625
[21] S. Minaee, Y.Y. Boykov, F. Porikli et al., Image segmentation using deep learning: a survey. IEEE Trans. Pattern Anal. (2021). doi: 10.1109/TPAMI.2021.3059968
[22] Z.-Q. Zhao, P. Zheng, S.-T. Xu et al., Object detection with deep learning: a review. IEEE Trans. Neural Netw. Learn. Syst. 30, 3212 (2019). doi: 10.1109/TNNLS.2018.2876865
[23] K. Chen, L.-B. Zhang, J.-S. Liu et al., Robust restoration of low-dose cerebral perfusion CT images using NCS-Unet. Nucl. Sci. Tech. 33, 1-15 (2022). doi: 10.1007/s41365-022-01014-0
[24] Z. Huang, J. Zhang, Y. Zhang et al., DU-GAN: generative adversarial networks with dual-domain U-net based discriminators for low-dose CT denoising. IEEE Trans. Instrum. Meas. 71, 4500512 (2022). doi: 10.1109/TIM.2021.3128703
[25] X.-Y. Guo, L. Zhang, Y.-X. Xing et al., Study on analytical noise propagation in convolutional neural network methods used in computed tomography imaging. Nucl. Sci. Tech. 33, 1-14 (2022). doi: 10.1007/s41365-022-01057-3
[26] J.-S. Liu, Y.-K. Zhang, H. Tang et al., Rollback reconstruction for TDC enhanced perfusion imaging. Nucl. Sci. Tech. 32, 1-11 (2021). doi: 10.1007/s41365-021-00918-7
[27] C. Feng, K. Kang, Y. Xing, A multi-energy material decomposition method for spectral CT using neural network. Proc. SPIE 10573, 105734J (2018). doi: 10.1117/12.2294611
[28] Y. Xu, B. Yan, J. Zhang et al., Image decomposition algorithm for dual-energy computed tomography via fully convolutional network. Comput. Math. Method Model. (2018). doi: 10.1155/2018/2527516
[29] X. Wu, P. He, Z. Long et al., Multi-material decomposition of spectral CT images via Fully Convolutional DenseNets. J. X-Ray Sci. Technol. 27, 461 (2019). doi: 10.3233/XST-190500
[30] W. Zhang, H. Zhang, L. Wang et al., Image domain dual material decomposition for dual-energy CT using butterfly network. Med. Phys. 46, 2037 (2019). doi: 10.1002/mp.13489
[31] H. Gong, S. Tao, K. Rajendran et al., Deep-learning-based direct inversion for material decomposition. Med. Phys. 47, 6294 (2020). doi: 10.1002/mp.14523
[32] M. Geng, Z. Tian, Z. Jiang et al., PMS-GAN: parallel multi-stream generative adversarial network for multi-material decomposition in spectral computed tomography. IEEE Trans. Med. Imaging 49, 571-584 (2020). doi: 10.1109/TMI.2020.3031617
[33] D. Popescu, M. Deaconu, L. Ichim et al., Retinal blood vessel segmentation using Pix2Pix GAN, in Proc. MED (2021), pp. 1173-1178. doi: 10.1109/MED51440.2021.9480169
[34] Y.-J. Ma, Y. Ren, P. Feng et al., Sinogram denoising via attention residual dense convolutional neural network for low-dose computed tomography. Nucl. Sci. Tech. 32, 41 (2021). doi: 10.1007/s41365-021-00874-2
[35] H. Lan, A.W. Toga, F. Sepehrband et al., SC-GAN: 3D self-attention conditional GAN with spectral normalization for multi-modal neuroimaging synthesis. bioRxiv (2020). doi: 10.1101/2020.06.09.143297
[36] I. Goodfellow, J. Pouget-Abadie, M. Mirza et al., Generative adversarial networks. Adv. Neural Inf. Process. Syst. (2014). doi: 10.1145/3422622
[37] C. Jin, T. Wang, X. Li et al., A transformer generative adversarial network for multi-track music generation. CAAI Trans. Intell. Technol. 7, 369-380 (2022). doi: 10.1049/cit2.12065
[38] X. Li, H. Yu, Y. Xie et al., Attention-based novel neural network for mixed frequency data. CAAI Trans. Intell. Technol. 6, 301-311 (2021). doi: 10.1049/cit2.12013
[39] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in Proc. CVPR (IEEE, 2015), pp. 3431-3440. doi: 10.1109/TPAMI.2016.2572683
[40] Z. Long, P. He, X. Wu et al., Fully convolutional pyramidal residual network for material discrimination of spectral CT. IEEE Access 7, 167187 (2019). doi: 10.1109/ACCESS.2019.2953942
[41] K. He, X. Zhang, S. Ren et al., Deep residual learning for image recognition, in Proc. CVPR (IEEE, 2016), pp. 770-778. doi: 10.48550/arXiv.1512.03385
[42] C. Peng, X. Zhang, G. Yu et al., Large kernel matters - improve semantic segmentation by global convolutional network, in Proc. CVPR (IEEE, 2017), pp. 4353-4361. doi: 10.48550/arXiv.1703.02719
[43] S. Woo, J. Park, J.-Y. Lee et al., CBAM: convolutional block attention module, in Proc. ECCV (2018), pp. 3-19. doi: 10.1007/978-3-030-01234-2_1
[44] P. Isola, J.-Y. Zhu, T. Zhou et al., Image-to-image translation with conditional adversarial networks, in Proc. CVPR (IEEE, 2017), pp. 1125-1134. doi: 10.1109/CVPR.2017.632