
Sparse-view phase-contrast and attenuation-based CT reconstruction utilizing model-driven deep learning

ACCELERATOR, RAY TECHNOLOGY AND APPLICATIONS


Xia-Yu Tao
Qi-Si Lin
Zhao Wu
Yong Guan
Yang-Chao Tian
Gang Liu
Nuclear Science and Techniques, Vol. 36, No. 4, Article number 71. Published in print: Apr 2025. Available online: 08 Mar 2025

Grating-based X-ray phase-contrast imaging enhances the contrast of imaged objects, particularly soft tissues. However, the radiation dose in computed tomography (CT) is generally excessive owing to the complex collection scheme. Sparse-view CT collection reduces the radiation dose, but at the cost of reduced resolution and reconstruction artifacts, particularly in analytical reconstruction methods. Recently, deep learning has been employed in sparse-view CT reconstruction and has achieved state-of-the-art results. Nevertheless, its low generalization performance and requirement for abundant training datasets have hindered its practical application in phase-contrast CT. In this study, a CT model was used to generate a substantial number of simulated training datasets, thereby circumventing the need for experimental datasets. By training a network with simulated datasets, the proposed method achieves high generalization performance in both attenuation-based CT and phase-contrast CT, despite the lack of sufficient experimental datasets. In experiments utilizing only half of the CT data, the proposed method obtained an image quality comparable to that of the filtered back-projection algorithm with full-view projection. The proposed method simultaneously addresses two challenges in phase-contrast three-dimensional imaging, namely the lack of experimental datasets and the high exposure dose, through model-driven deep learning. This method significantly accelerates the practical application of phase-contrast CT.

Keywords: Sparse-view CT; Phase-contrast CT; Attenuation-based CT; Deep learning network; Frequency loss function
1 Introduction

Computed tomography (CT) has become an indispensable imaging tool in clinical practice [1]. CT contributes to the noninvasive and painless diagnosis of human organs of interest and is crucial in preoperative evaluation and treatment planning [2]. Medical CT is generally based on the absorption principle; however, the low contrast of soft tissues hinders the early diagnosis of cancer and other diseases [3]. Grating-based X-ray phase-contrast CT offers multicontrast and enhanced contrast for low-Z soft tissues, providing the possibility of early diagnosis [4-7]. Regrettably, CT requires an extra X-ray dose, which can be damaging to patients [8], particularly their DNA [9-11]. In phase-contrast CT, the radiation dose is several times higher than that in conventional CT because multiple projections are required at each tomographic viewing angle to retrieve the multicontrast information [12]. Reducing the dose generally leads to lower image quality and potential misdiagnosis [11], although low-dose imaging methods have been proposed that maintain the signal-to-noise ratio (SNR) by leveraging prior knowledge. Therefore, achieving a balance between effective medical examinations and minimizing radiation damage is crucial [10].

There are two approaches to decreasing X-ray radiation damage: low-dose CT and sparse-view CT. In low-dose CT, the X-ray exposure in each view is reduced, and a photon-counting detector can be utilized to maintain the SNR of the projections. In sparse-view CT, the number of projection views is reduced; the resulting violation of the Shannon/Nyquist sampling theorem leads to reduced resolution, artifacts, and distortions in the reconstructed image [13]. This study focuses on sparse-view CT and introduces a new reconstruction algorithm aimed at suppressing these artifacts and distortions, thereby enhancing image quality.

Filtered back projection (FBP) is widely used in modern CT systems for high-dose full-view CT because it provides rapid, high-quality results with minimal computational resources [14, 15]. However, when applied to sparse-view CT, FBP often generates significant stripe artifacts. Iterative reconstruction (IR) algorithms, such as the simultaneous algebraic reconstruction technique (SART) [16] and the simultaneous iterative reconstruction technique (SIRT) [17], can partially suppress artifacts through iterative forward projection and backward correction based on convex optimization theory [18]. With the advancement of compressed sensing (CS), which enables signal reconstruction from undersampled data [19, 20], numerous studies have focused on the total variation (TV) model, which penalizes the L1 norm of the image gradient to smooth images [21, 22]. TV methods serve as regularization terms in the cost functions of IR algorithms by incorporating prior knowledge [23]. In certain ideal scenarios, TV-based models, such as TV-projection onto convex sets (TV-POCS) [24], SART-TV [21], and total generalized variation (TGV) regularization [23], can effectively eliminate stripe artifacts in the reconstructed images [25-27].

Recently, deep learning (DL) has been widely adopted for various image-processing tasks, including image denoising [28, 29], image recognition [30], image segmentation [31], image inpainting [32], and image super-resolution [33]. DL-based algorithms have shown remarkable performance improvements over traditional methods, particularly in handling noisy images and enhancing image quality [34-37]. Researchers have also applied DL to sparse-view CT and investigated the significance of datasets. Jin et al. proposed an end-to-end deep convolutional network based on U-Net (FBPConvNet), trained with reconstructed slices from sparse-view and full-view CT scans as the input and output, respectively [38]; trained on a biomedical dataset, it achieved superior results compared with the TV method. Han et al. demonstrated that DL networks can effectively distinguish streaking artifacts from artifact-free images [39]; they employed a deep residual-learning architecture trained on data from nine patients to suppress streaking artifacts. Han and Ye highlighted a limitation of the U-Net architecture, which excessively emphasizes the low-frequency components of the signal, resulting in blurred image edges [40]; to address this issue, they proposed a new multiresolution DL framework to recover high-frequency edges in sparse-view CT. Guan et al. introduced the fully dense U-Net (FD-UNet) to remove artifacts in 2D photoacoustic tomography (PAT) images reconstructed from sparse data [41], but observed that its performance deteriorated when the training and testing data were not well matched. Asif et al. utilized a GAN to generate cardiac images and suppress cardiac motion artifacts [42], and Lyu and Wang proposed diffusion and score-matching models for generating CT images from MRI images [43]. Nevertheless, current DL-based sparse-view CT reconstruction algorithms rely heavily on experimental datasets. Acquiring sample datasets, such as medical datasets, is a laborious, time-consuming, and expensive process. Moreover, limited data do not guarantee the reliability of DL algorithms. In phase-contrast imaging experiments, the test samples are varied, and obtaining numerous full-view datasets in advance is not always feasible. Consequently, alternative datasets are required for training.

Previous studies have shown that natural and medical images share common low-level features and similarities in terms of edges, points, and textures [44]. The transfer of prior knowledge from natural image processing to medical image processing has been validated in several studies. For instance, Zhong et al. synthesized the noise of natural images for low-dose CT (LDCT) denoising and transferred the learned knowledge to medical images to prevent overfitting during training [45]. In another study, Zhen et al. pretrained a classification network on ImageNet and fine-tuned a convolutional neural network (CNN) via transfer learning to predict rectum toxicity in cervical cancer radiotherapy [46]. These studies demonstrate similarities between natural and medical images in terms of pixel correlation and low-level features. Consequently, natural-image datasets are excellent candidates for the deep learning reconstruction of phase-contrast images when experimental training data are unavailable.

Motivated by these studies, we previously proposed a physical model of limited-angle CT that utilizes natural images to generate abundant high-quality training data [47]. Excellent reconstruction results were achieved by incorporating an optimized network structure and loss function.

In this study, model-driven DL was introduced to solve the issues of limited experimental training datasets and high exposure doses in sparse-view phase-contrast CT. The CT device was parameterized for both attenuation-based and phase-contrast CT procedures, allowing the generation of simulation datasets. The reconstruction results of sparse-view attenuation-based CT and phase-contrast CT demonstrate that the proposed method substantially suppresses artifacts.

The main contributions of this study are summarized as follows.

1. We propose a novel DL CT reconstruction method that integrates an X-ray phase-contrast imaging model and achieves superior generalization. By eliminating the network's dependence on experimental data, our method improves the accuracy and robustness of sparse-view CT reconstruction.

2. Furthermore, a new frequency loss function is introduced based on the Fourier slice theorem. This loss function transforms the projection data into a fidelity term for the network via the Fourier transform, resulting in enhanced image-generation quality.

3. Superior performance compared to traditional algorithms was realized using experimental data from both attenuation-based CT and phase-contrast CT. This improved performance highlights the potential of our method for advanced applications of phase-contrast CT.

The remainder of this paper is organized as follows: Section 2 introduces the proposed algorithm and its detailed framework. Section 3 presents experimental results obtained by applying the proposed method to a laboratory phase-contrast CT system. Section 4 discusses the strengths and limitations of the proposed algorithm. Finally, Section 5 summarizes the findings.

2 Methods

Figure 1 illustrates the architecture of the proposed sparse-view reconstruction framework based on model-driven simulated big data. Natural images were acquired from the Common Objects in Context (COCO) 2017 dataset [48], which consists of various animals, scenery, architecture, food, and more. The first step of the proposed method is image batch preprocessing, which standardizes the sizes of all images and eliminates grayscale variations. Subsequently, sparse-view CT data were simulated from the preprocessed, normalized images using a forward-projection algorithm matched to the grating-based X-ray phase-contrast CT equipment. Images with artifacts were then reconstructed using the FBP algorithm and fed into an end-to-end DL network based on U-Net, with the normalized images serving as the ground truth (GT) for training the network parameters under the loss function. Finally, the trained DL network was applied to the FBP reconstructions with artifacts to obtain artifact-free images.

Fig. 1
(Color online) The framework of the proposed sparse-view reconstruction based on model-driven simulated big data. The process is as follows: (a) The dataset consists of raw data from natural images. (b) The images undergo pre-processing and normalization. (c) The normalized images are forward-projected and reconstructed using the FBP algorithm. (d) The network architecture and its training are based on simulated data. (e) The experimental artifact-free images are obtained by feeding the FBP-reconstructed images into a DL network trained on the simulated dataset
2.1 Dataset

In this study, a CT model was used to generate a simulated natural-image dataset rather than experimental CT projection data. The natural data were obtained from the COCO 2017 dataset, which contains a diverse collection of 118,287 images featuring animals, scenery, architecture, and food. The natural dataset exhibits a rich image distribution, facilitating generalization to unknown sample data. In the Results section, we use the training data from the natural dataset for the proposed method. However, the final training data for attenuation-based CT and phase-contrast CT differ because of disparities in their respective CT projection models; consequently, the sparse-view reconstruction images obtained through CT model simulation also differ between the two modalities.

A medical dataset was used for comparison. The medical dataset was sourced from the American Association of Physicists in Medicine (AAPM) Low Dose CT Grand Challenge and the Cancer Imaging Archive (TCIA) "Low Dose CT Image and Projection Data" dataset [49]. The simulated medical dataset was derived from full-dose lung CT images in the AAPM dataset: a total of 5,623 images were obtained from 30 patients, and through data augmentation techniques such as image rotation, a final dataset of 118,287 images was generated, matching the size of the natural dataset. The calculation methods employed for the medical dataset were consistent with those used for the proposed natural dataset, except for the difference in image content.

2.2 Image batch pre-processing

Owing to the input size limitation of the DL network, the images were resized to a uniform size of 512 × 512 pixels. The resizing process involves interpolation, scaling, and cropping. Subsequently, the images were converted to grayscale and normalized. During training, the GT images for the network were the circular artifact-free images obtained via image batch preprocessing.
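As an illustration, this preprocessing step can be sketched as follows. The bilinear resizing, min-max normalization, and circular field-of-view mask are assumptions consistent with the description above, not the authors' exact code; the function name is illustrative.

```python
# A minimal preprocessing sketch; bilinear resizing, min-max normalization,
# and the circular field-of-view mask are assumptions, not the authors' exact code.
import numpy as np
from PIL import Image

def preprocess(path, size=512):
    """Resize to size x size, convert to grayscale, and normalize to [0, 1]."""
    img = Image.open(path).convert("L")                # grayscale conversion
    img = img.resize((size, size), Image.BILINEAR)     # interpolation/scaling
    arr = np.asarray(img, dtype=np.float32)
    arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8)  # remove grayscale variations
    yy, xx = np.mgrid[:size, :size]
    fov = (xx - size / 2) ** 2 + (yy - size / 2) ** 2 <= (size / 2) ** 2
    return arr * fov                                   # circular artifact-free GT image
```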

2.3 Attenuation-based projection and CT reconstruction

In the calculation of attenuation-based CT projections, the relationship between the object image and the projection data at each view can be expressed as
$$P_\theta = A_\theta X. \qquad (1)$$
In the discrete model, $P_\theta = (p_i) \in \mathbb{R}^M$ is a one-dimensional column vector that captures the line integrals of the object image at projection angle $\theta$, where $M$ is the number of detector elements. Because each projection value may be influenced by every value in the object image, the two-dimensional image is arranged into a one-dimensional column vector $X = (x_j) \in \mathbb{R}^N$, where $x_j$ represents each pixel in $X$ and $N$ is the total number of pixels. $A_\theta = (a_{ij}) \in \mathbb{R}^{M \times N}$ is the projection system matrix of size $M \times N$, where $a_{ij}$ is the contribution coefficient of pixel $x_j$ to the line-integral projection value $p_i$.

To reduce errors in the discrete calculation, we adopted an area-projection model, in which $a_{ij}$ is the ratio of the area of overlap between ray $i$ and pixel $j$ to the area of the pixel [50]. This study primarily considers parallel-beam geometry; the 'fan2para' function in MATLAB was employed to convert the experimental data from fan-beam to parallel-beam geometry.

Sparse-view CT reconstructions were generated from 90 and 45 projection views via the FBP algorithm with a ramp filter kernel. During network training, the sparse-view CT reconstructions served as inputs and the object images as outputs, and the network parameters were adjusted using the loss function.
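A sketch of this simulated data generation under parallel-beam geometry is shown below. scikit-image's `radon`/`iradon` stand in here for the authors' MATLAB area-projection model and 'fan2para' conversion; the view counts follow the text (90 and 45).

```python
# A sketch of simulated sparse-view data generation under parallel-beam geometry.
import numpy as np
from skimage.transform import radon, iradon

def sparse_view_fbp(image, n_views):
    """Forward-project at n_views angles over 180 degrees, then reconstruct
    with FBP (ramp filter) to produce the artifact-laden network input."""
    theta = np.linspace(0.0, 180.0, n_views, endpoint=False)
    sinogram = radon(image, theta=theta)                      # sparse-view projections
    return iradon(sinogram, theta=theta, filter_name="ramp")  # streaky FBP slice

# Training pair: (sparse-view FBP slice, preprocessed GT image)
# x_90, x_45 = sparse_view_fbp(gt, 90), sparse_view_fbp(gt, 45)
```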

2.4 Phase-contrast projection and CT reconstruction

Compared with attenuation-based projection, phase-contrast projection requires an additional differential calculation in the direction perpendicular to the optical and rotation axes. In CT reconstruction, the filter kernel is a Hilbert function that directly retrieves the refractive-index decrement without requiring integration [51, 52].
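A hedged sketch of this pipeline, under the stated assumptions, is given below: the sinogram is differentiated along the detector axis, then filtered with a Hilbert kernel ($-i\,\mathrm{sign}(f)$ in Fourier space) before unfiltered backprojection. Normalization constants are omitted, so this is illustrative only, not the authors' exact implementation.

```python
# Illustrative sketch of differential projection and Hilbert-filtered reconstruction.
import numpy as np
from skimage.transform import radon, iradon

def differential_sinogram(image, theta):
    sino = radon(image, theta=theta)
    return np.gradient(sino, axis=0)   # derivative perpendicular to the rotation axis

def hilbert_fbp(dsino, theta):
    n = dsino.shape[0]
    kernel = -1j * np.sign(np.fft.fftfreq(n))   # Hilbert filter kernel
    filtered = np.real(np.fft.ifft(np.fft.fft(dsino, axis=0) * kernel[:, None], axis=0))
    return iradon(filtered, theta=theta, filter_name=None)   # backprojection, no ramp
```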

2.5 Deep learning network structure

The proposed DL network was adapted from the U-Net architecture and incorporates convolutional layers (Conv2d), batch normalization (BN) layers, Leaky ReLU activation layers, deconvolutional layers (ConvTranspose2d), residual blocks (ResBlock) [53], and a tanh layer that outputs the result. To enhance performance, additional downsampling layers were introduced to expand the receptive field and capture more high-level information. To prevent information loss, the pooling layers were replaced with convolutional layers with larger strides. Moreover, residual blocks with expandable depths were utilized to further increase the network capacity and improve feature extraction. In the skip connections, features are concatenated directly with the corresponding features in the upsampling path to retain crucial low-level information. The network layers can be designed from the dimensions of the input and output images using the following calculations.

Convolutional layers extract information from the input images, and their weights are shared across the network. Different feature-sampling behaviors can be achieved using convolutional kernels of different sizes and strides. The output size of a convolution is calculated as
$$N = \frac{W - F + 2P}{S} + 1, \qquad (2)$$
where $W$ is the size of the input feature map, $N$ is the size of the output feature map, $F$ is the size of the convolution kernel, $P$ is the padding size that pads the feature map with zeros, and $S$ is the convolutional stride. The network does not include any pooling layers, as they may discard significant information during training. Instead, we employed convolutional kernels with a stride of 2 to achieve the same downsampling effect. The effective receptive field of the network can be increased by incorporating feature maps of different resolutions.

Batch normalization layers are effective in accelerating convergence and preventing gradient explosion; they also serve as a regularization technique that mitigates network overfitting. Leaky ReLU layers are nonlinear activation functions that enable the network to perform nonlinear computations:
$$y = \max(0, x) + \mathrm{leak} \times \min(0, x), \qquad (3)$$
where leak is a small constant that retains information from the negative axis.

Deconvolutional layers restore the feature-map size and facilitate feature generation; they are primarily employed in image generation. The output size of a deconvolution is calculated as
$$N = (W - 1)S - 2P + F. \qquad (4)$$
Shortcut connections, also referred to as skip connections, transfer features directly from earlier layers to subsequent layers, thus preventing the loss of information carried by deeper features.
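Both size formulas can be checked directly against PyTorch layer shapes; the kernel, stride, and padding values in this sketch are illustrative choices, not the paper's exact configuration.

```python
# A quick check of Eqs. (2) and (4) against PyTorch layer output shapes.
import torch
import torch.nn as nn

def conv_out(W, F, P, S):        # Eq. (2)
    return (W - F + 2 * P) // S + 1

def deconv_out(W, F, P, S):      # Eq. (4)
    return (W - 1) * S - 2 * P + F

x = torch.randn(1, 1, 512, 512)
down = nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1)     # stride-2 conv replaces pooling
up = nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1)
assert down(x).shape[-1] == conv_out(512, F=4, P=1, S=2)        # 256
assert up(down(x)).shape[-1] == deconv_out(256, F=4, P=1, S=2)  # 512
```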

ResBlock is a convolutional block that maintains a consistent feature-map size; it increases the number of network parameters without unduly increasing the susceptibility to overfitting, thereby improving overall performance.
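A minimal residual-block sketch consistent with this description (Conv2d + BN + Leaky ReLU around an identity shortcut) follows; the channel count and leak value are assumptions.

```python
# A minimal ResBlock sketch; hyperparameters are illustrative.
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels, leak=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # size-preserving
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(leak, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.act = nn.LeakyReLU(leak, inplace=True)

    def forward(self, x):
        return self.act(x + self.body(x))   # identity shortcut keeps feature-map size
```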

The tanh layer serves as the output layer, generating images with pixel values ranging from 0 to 1:
$$y = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}. \qquad (5)$$

2.6 Loss function

A mixed loss function was designed by combining the mean absolute error (MAE), perceptual loss [54], and frequency loss:
$$Loss = \alpha \times \|Y - f(X)\|_1 + \beta \times PerceptualLoss + \gamma \times FrequencyLoss, \qquad (6)$$
where $Y$ denotes the GT images, $X$ denotes the images input to the network, $f(X)$ denotes the network output, and $\|Y - f(X)\|_1$ is the MAE term. The scale factors $\alpha$, $\beta$, and $\gamma$ adjust the contribution of each loss component.

The MAE loss primarily aims to prevent image distortion, making the network output resemble the GT images. Compared with the mean squared error (MSE) loss $\|Y - f(X)\|_2^2$, the MAE produces DL-generated images with sharper edges. However, the MAE alone may leave isolated pixels with large errors, necessitating additional loss terms to mitigate such pixel-generation errors.

The perceptual loss enhances the accuracy of the generated images by considering both low- and high-level features. For this purpose, a pretrained VGG19 network was utilized as part of the loss function. The VGG19 network was trained on the ImageNet dataset, and its parameters were fixed thereafter. VGG19 is a CNN capable of extracting high- and low-level features at different resolutions: low-level features are typically found in the early convolutional layers, whereas high-level features appear in the later convolutional layers. The perceptual loss is expressed as
$$PerceptualLoss_{feature}^{\phi, j} = \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{y}) - \phi_j(y) \right\|_2^2, \qquad (7)$$
where $j$ indexes the layers of VGG19; $C_j H_j W_j$ is the size of the feature map in layer $j$; $y$ and $\hat{y}$ are the GT and network output, respectively; and $\phi_j(\cdot)$ is the feature map output by layer $j$.
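A sketch of Eq. (7) with a frozen torchvision VGG19 is shown below. Truncating the feature extractor at index 18 (through relu3_4) is an assumption, as the paper does not specify which layers are used.

```python
# A perceptual-loss sketch; the truncation point (cut=18) is an assumption.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

class PerceptualLoss(torch.nn.Module):
    def __init__(self, cut=18):
        super().__init__()
        self.phi = vgg19(pretrained=True).features[:cut].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)          # ImageNet weights stay fixed

    def forward(self, y_hat, y):
        # replicate single-channel CT slices to the 3 channels VGG expects
        phi_hat = self.phi(y_hat.repeat(1, 3, 1, 1))
        phi_gt = self.phi(y.repeat(1, 3, 1, 1))
        return F.mse_loss(phi_hat, phi_gt)   # (1/C_j H_j W_j) ||phi_j(y_hat) - phi_j(y)||^2
```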

According to the Fourier slice theorem, the presence of stripe artifacts can be attributed to missing information in the frequency-spectrum domain. The Fourier transform separates the high- and low-frequency information within an image. According to Parseval's theorem, the MAE loss in the spatial domain is equivalent to the MAE loss in the frequency domain. In the spectral domain, the low-frequency coefficients are an order of magnitude larger than the high-frequency coefficients, which explains why images trained with the MAE loss alone appear smoother. The frequency-loss component also helps reduce checkerboard artifacts.

To mitigate the dominance of low-frequency components and focus network learning on high-frequency information, an adaptive frequency-domain loss based on the MAE was employed. Assume that the DL network generates an image $f(x, y)$ with two-dimensional Fourier transform $F(u, v) = a + bi$, and that the GT image is $f_0(x, y)$ with Fourier transform $F_0(u, v) = a_0 + b_0 i$. The frequency-domain loss is then
$$FrequencyLoss = \mathrm{mean}\big(w\,|a - a_0| + w\,|b - b_0|\big), \qquad (8)$$
$$w = \log\big(|a - a_0| + |b - b_0| + 1\big), \qquad (9)$$
where $w$ denotes the dynamic update factor. During training, $w$ reduces the overlearning of low-frequency components and emphasizes the learning of high-frequency information.
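A sketch of Eqs. (8)-(9) with `torch.fft` follows; treating $w$ as a detached (non-differentiated) weight is an assumption consistent with its role as a dynamic update factor.

```python
# A frequency-loss sketch; detaching w is an assumption.
import torch

def frequency_loss(y_hat, y):
    F_hat, F_gt = torch.fft.fft2(y_hat), torch.fft.fft2(y)  # F = a + bi, F0 = a0 + b0i
    da = (F_hat.real - F_gt.real).abs()                     # |a - a0|
    db = (F_hat.imag - F_gt.imag).abs()                     # |b - b0|
    w = torch.log(da + db + 1.0).detach()                   # Eq. (9)
    return torch.mean(w * da + w * db)                      # Eq. (8)
```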

2.7 Experimental data testing

For experimental data testing, the FBP reconstruction results were normalized because the network input and output have a limited range of values. After obtaining the de-artifacted output from the network, inverse normalization must be applied to the image data. The final computed image is the reconstruction result of the proposed method.
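A small sketch of this test-time normalization and its inverse is given below; min-max scaling to [0, 1] is an assumption consistent with the network's output range.

```python
# Test-time normalization sketch; min-max scaling is an assumption.
import numpy as np

def normalize(fbp_slice):
    lo, hi = float(fbp_slice.min()), float(fbp_slice.max())
    return (fbp_slice - lo) / (hi - lo + 1e-8), (lo, hi)

def denormalize(net_output, stats):
    lo, hi = stats
    return net_output * (hi - lo) + lo   # inverse normalization after de-artifacting
```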

2.8 Image evaluation criteria

We evaluated the performance of the proposed algorithm using two metrics: the peak SNR (PSNR) and the structural similarity index (SSIM). The PSNR measures the ratio between the maximum possible power of an image and the power of the corrupting noise that affects the quality of its representation:
$$PSNR = 10 \log_{10} \frac{(L-1)^2}{MSE}, \qquad (10)$$
where $L$ denotes the number of maximum possible intensity levels in the image (the minimum intensity level is assumed to be zero).

The MSE is defined as
$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \big(O(i, j) - D(i, j)\big)^2, \qquad (11)$$
where $O$ is the original image, $D$ is the reconstructed image, $m$ and $n$ are the numbers of pixel rows and columns, and $i$ and $j$ index the rows and columns, respectively.

The SSIM, in contrast, measures the structural similarity between the original and reconstructed images:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \qquad (12)$$
where $y$ is the original image, $x$ is the reconstructed image, $\mu_x$ and $\mu_y$ are the mean values of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1$ and $c_2$ are constants. Specifically, $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where $L$ is the dynamic range of the pixel values, $k_1 = 0.01$, and $k_2 = 0.03$.
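Both metrics can be computed with scikit-image's reference implementations, as sketched below; `data_range` corresponds to $L$ in the formulas.

```python
# Computing Eqs. (10)-(12) with scikit-image.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(original, reconstructed, data_range=1.0):
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=data_range)
    ssim = structural_similarity(original, reconstructed, data_range=data_range,
                                 K1=0.01, K2=0.03)   # k1, k2 as in Eq. (12)
    return psnr, ssim
```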

3 Results

In this section, we first outline the key steps involved in training the proposed DL network and provide details of the computing environment. Subsequently, the reconstruction performance of the proposed algorithm is demonstrated on attenuation-based CT and phase-contrast CT. Finally, the generalization performance of training on natural data is compared with that of training on medical data.

3.1 Network training details

The proposed DL network was trained using the Adam algorithm [55]. The learning rate was set to 0.0001 and the mini-batch size to 8. In the loss function, $\alpha$ and $\beta$ were both set to 1 and $\gamma$ to 0 for the first 10 epochs; this configuration allowed the network to learn the main features and converge faster. After 10 epochs, $\alpha$ and $\beta$ were set to 0 and $\gamma$ to 1, ensuring that the network focused on high-frequency information. The network was implemented in PyTorch [56] on a personal workstation equipped with an Intel(R) Core(TM) i9-10940X CPU and 128 GB of RAM; an NVIDIA GeForce RTX 3090Ti GPU was used to accelerate training.
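A sketch of this training schedule follows; `net`, `loader`, and `n_epochs` are placeholders, and the loss terms refer to the sketches in Sect. 2.6.

```python
# Training-loop sketch with the described loss-weight schedule.
import torch

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)   # mini-batch size 8 in loader

for epoch in range(n_epochs):
    # first 10 epochs: MAE + perceptual loss; afterwards: frequency loss only
    alpha, beta, gamma = (1.0, 1.0, 0.0) if epoch < 10 else (0.0, 0.0, 1.0)
    for x, y in loader:                    # sparse-view FBP input, GT target
        y_hat = net(x)
        loss = (alpha * (y - y_hat).abs().mean()        # MAE term
                + beta * perceptual_loss(y_hat, y)
                + gamma * frequency_loss(y_hat, y))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```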

A grating-based imaging system was employed for the X-ray phase-contrast CT of mice, and a phase-stepping method was used to acquire both attenuation-based and phase-contrast sinograms. Experimental results for both modalities are presented below to evaluate the performance of the proposed method.

3.2 Experimental results of attenuation-based CT in phase-contrast imaging

Figure 2 illustrates the reconstructed lung slices of mice obtained using various methods: FBP, SART, SART-TV, and the proposed method. The GT was defined as a 180-view SART-TV reconstruction. Additionally, a phase-contrast slice with enhanced contrast was included for structural comparison. Each method was used to reconstruct slices from projections at 2° intervals (90 views) and 4° intervals (45 views). The residual images show the errors between the reconstruction results of the different algorithms and the GT.

Fig. 2
(Color online) Comparison of attenuation-based slices reconstructed by different methods in phase-contrast imaging of a mouse. The proposed algorithm is compared with the conventional FBP, SART, and SART-TV algorithms in sparse-view CT with 90 and 45 views. The GT images were reconstructed using the SART-TV algorithm from a 180-view projection. The SSIM and PSNR metrics with respect to the GT are provided in the top left corner of each panel. The regions of interest are magnified and indicated by the red boxes. The results also include residual images between the different algorithms and the GT, where white pixels represent larger errors. The display range for all results is normalized to [0, 255] HU

The FBP and SART algorithms performed poorly in both cases. In the 90-view scenario, the SART-TV algorithm effectively suppressed the stripe artifacts caused by sparse-view projection; however, noticeable oversmoothing and blocky artifacts were observed at the zoomed-in inner edges. With only 45 views, the SART-TV algorithm performed poorly, displaying excessively smooth local details and prominent streaking artifacts across the image. In contrast, the proposed algorithm successfully removed the artifacts and maintained consistency with the GT slices in both cases; it also avoided the blocky artifacts observed with SART-TV. Among the residual images, the proposed algorithm exhibited the smallest error. When the phase-contrast results are used as references, the local zoomed-in structure is even clearer in the 90-view case using the proposed method than in the 180-view case using SART-TV. Quantitative assessments of image quality also showed that the proposed algorithm outperformed the existing methods.

3.3 Experimental results of phase-contrast CT in phase-contrast imaging

Phase-contrast reconstructions from the laboratory phase-contrast CT equipment are shown in Figure 3. Because current phase-contrast imaging algorithms rely primarily on the FBP algorithm, we did not include a comparison with iterative algorithms. The phase-contrast projection images were calculated from the attenuation-based projections using the information-separation method described in Sect. 2 (Methods). The GT results were reconstructed using the FBP algorithm with 180 projection views. In both the 90-view and 45-view scenarios, the proposed method outperformed the FBP algorithm in removing the streak artifacts caused by sparse-view projection and maintained consistency with the 180-view GT image. The residual images in Figure 3 show that the proposed method exhibits smaller errors than the FBP algorithm in the reconstructions from the 90-view and 45-view projection data. The image-quality metrics also demonstrate the superior performance of our algorithm in reconstruction from sparse-view phase-contrast projections.

Fig. 3
(Color online) Comparison of the slices of biological samples reconstructed by the FBP algorithm and the proposed method in phase-contrast CT. The proposed algorithm is compared with the conventional FBP algorithm in sparse-view CT scenarios with 90 and 45 views. The GT images are reconstructed using the FBP algorithm with 180-view and 360-view projections, respectively. The SSIM and PSNR metrics with respect to the GT are displayed in the top left corner of each panel. The results also include residual images between the different algorithms and the GT, where white pixels represent larger errors. The display range for all results is normalized to [0, 255] HU

To validate the broader applicability of our method, we performed additional calculations using synchrotron-radiation phase-contrast experimental data. Figure 3 also shows a phase-contrast reconstruction obtained from the BL13W1 beamline at the Shanghai Synchrotron Radiation Facility (SSRF), China [57]. Because a synchrotron light source was used, the projections are already in parallel-beam geometry and require no additional interpolation. The image shows a phase-contrast slice of a bee immersed in a microcentrifuge tube filled with formalin. The GT results were reconstructed using the FBP algorithm with 360 projection views. Given the relatively simple structure of the bee, the proposed method achieved results comparable to those obtained with 360 views even with only 45 views. The residual-image comparison shows that the proposed method effectively removed the stripe artifacts.

The two sets of data presented in Figure 3 demonstrate the excellent performance of the proposed method in the field of grating-based phase-contrast imaging.

3.4 Experimental results under natural dataset and medical dataset

To verify the high generalization performance conferred by the natural dataset, we compared the results generated by networks trained on different datasets for both attenuation-based CT and phase-contrast CT; this comparison is shown in Figure 4. For attenuation-based CT, the GT image was the reconstruction obtained using the SART-TV algorithm from a 180-view projection, whereas for phase-contrast CT, the GT image was reconstructed using the FBP algorithm with 180-view projections. This choice reflects the current prominence of the FBP algorithm in phase-contrast CT reconstruction, whereas SART-TV is considered optimal for attenuation-based CT.

Fig. 4
(Color online) Comparison of reconstructed slices using different methods in mouse attenuation-based CT and phase-contrast CT. The results shown are derived from the same mouse slice for both attenuation-based CT and phase-contrast CT. In attenuation-based CT, the GT image is reconstructed using the SART-TV algorithm from 180 views; in phase-contrast CT, the GT image is reconstructed using the FBP algorithm from 180 views. The GT images are compared with the results of the traditional algorithms, the proposed algorithm trained on the natural dataset, and the proposed algorithm trained on the medical dataset, for both 90-view and 45-view projection data. The SSIM and PSNR metrics with respect to the GT are displayed in the lower-left corner of each panel. Regions of interest are magnified and indicated by red boxes. The display range for all results is normalized to [0, 255] HU

The attenuation-based CT and phase-contrast CT results presented in Fig. 4 were obtained from the same mouse slice. Additionally, reconstructions were performed using 90 and 45 views to investigate the impact of data sparsity. The results in Figure 4 demonstrate that the network trained on the natural dataset outperformed that trained on the medical dataset for both attenuation-based CT and phase-contrast CT. The results obtained from the medical dataset exhibited notably poorer performance, with unclear and blurred local details. This discrepancy can be attributed to the limited image distribution of the medical dataset compared with the more diverse distribution of the natural dataset; the greater diversity of the natural dataset underlies its superior generalization to unknown samples. Furthermore, we conducted a simulation verification using a sufficient amount of clinical medical CT data; the results are provided in the Supplementary Information (SI). They demonstrate that natural data can achieve outcomes comparable to those of a sufficient amount of medical data, even without prior medical knowledge.

In attenuation-based CT, soft tissue contrast is low, and sparse sampling leads to a significant decrease in image resolution, resulting in blurred structures. Conversely, in phase-contrast CT, the soft-tissue contrast is high, and the structures remain remarkably clear, even with only 45 views. These findings further validate the promising prospects of the proposed algorithm for phase-contrast CT.

4 Discussion

Although the generalization performance of DL is currently not well understood, our experiments provide compelling evidence of the effectiveness of the proposed method in terms of generalization. Natural images and phase-contrast sample images share common characteristics, such as low-rank properties and similar low-level features such as points, lines, and edges. Among traditional methods, the TV model is likewise based on connections between pixels that are common to natural images. This explains the excellent generalization performance of natural-image datasets in sparse-view CT reconstruction. These relationships warrant further investigation.

By modeling the imaging procedure, we can apply DL techniques with simulated training datasets. This approach is particularly valuable in scientific research and practical applications. Furthermore, the results obtained from our experiments demonstrate the effectiveness of the proposed method in both attenuation-based CT and phase-contrast CT. Notably, the proposed method maintains a high contrast in phase-contrast imaging and preserves clear structures even with a reduced dosage in sparse-view CT. Moreover, this physical model-based approach can be extended to other fields by adapting the parametric representation of the experimental equipment. In addition, the proposed method can be applied to other tasks involving artifact removal.

The current approach considers only samples within the field of view; reconstructing samples outside the field of view may require reconsidering the model design. The experimental results show that the proposed method can produce results comparable to the GT image in 90-view CT, but further optimization is required for complex samples in 45-view CT. Therefore, additional improvements are necessary to enhance the reconstruction quality under severely sparse conditions, which may involve incorporating other inference models into the algorithm.

5 Conclusion

This study introduced a novel and promising approach that integrates a model-driven DL reconstruction algorithm into sparse-view phase-contrast three-dimensional imaging, thereby overcoming the limited availability of experimental training datasets. The experimental results demonstrate the superiority of the proposed method over conventional algorithms in terms of reconstruction quality, effectively enhancing the accuracy and fidelity of reconstructions in both sparse-view attenuation-based CT and phase-contrast CT. The reduced imaging time of the proposed method may enable in vivo phase-contrast imaging of biological specimens. This advancement opens new possibilities for applications in biological medicine, where the ability to capture high-resolution, real-time images of living tissues and organs can provide valuable insights for diagnosis, treatment planning, and research. Furthermore, the proposed method offers a potential avenue for future research and development, as well as potential clinical translation, in the pursuit of more accurate and efficient imaging techniques.

References
1. T.M. Buzug, Computed tomography, in Springer Handbook of Medical Technology, ed. by R. Kramme, K.-P. Hoffmann, R.S. Pozos (Springer, Berlin, Heidelberg, 2011), pp. 311–342. https://doi.org/10.1007/978-3-540-74658-4_16
2. P. Rajiah, P. Schoenhagen, The role of computed tomography in pre-procedural planning of cardiovascular surgery and intervention. Insights Imaging 4, 671–689 (2013). https://doi.org/10.1007/s13244-013-0270-8
3. Z. Wu, W.B. Wei, K. Gao et al., Prototype system of noninterferometric phase-contrast computed tomography utilizing medical imaging components. J. Appl. Phys. 129, 074901 (2021). https://doi.org/10.1063/5.0031392
4. Z. Wang, N. Hauser, G. Singer et al., Non-invasive classification of microcalcifications with phase-contrast X-ray mammography. Nat. Commun. 5, 3797 (2014). https://doi.org/10.1038/ncomms4797
5. X. Li, H. Gao, Z. Chen et al., Diagnosis of breast cancer based on microcalcifications using grating-based phase contrast CT. Eur. Radiol. 28, 3742–3750 (2018). https://doi.org/10.1007/s00330-017-5158-4
6. A. Aminzadeh, B.D. Arhatari, A. Maksimenko et al., Imaging breast microcalcifications using dark-field signal in propagation-based phase-contrast tomography. IEEE Trans. Med. Imaging 41, 2980–2990 (2022). https://doi.org/10.1109/TMI.2022.3175924
7. P. Baran, S. Mayo, M. McCormack et al., High-resolution X-ray phase-contrast 3-D imaging of breast tissue specimens as a possible adjunct to histopathology. IEEE Trans. Med. Imaging 37, 2642–2650 (2018). https://doi.org/10.1109/TMI.2018.2845905
8. M. Mercuri, T. Sheth, M.K. Natarajan, Radiation exposure from medical imaging: a silent harm? CMAJ 183, 413–414 (2011). https://doi.org/10.1503/cmaj.101885
9. E. Burgio, P. Piscitelli, L. Migliore, Ionizing radiation and human health: reviewing models of exposure and mechanisms of cellular damage. Epigenetic perspectives. Int. J. Environ. Res. Public Health 15, 1971 (2018). https://doi.org/10.3390/ijerph15091971
10. H.M. Shi, Z.C. Sun, F.H. Ju, Understanding the harm of low-dose computed tomography radiation to the body (Review). Exp. Ther. Med. 24, 534 (2022). https://doi.org/10.3892/etm.2022.11461
11. M.D. Cohen, CT radiation dose reduction: can we do harm by doing good? Pediatr. Radiol. 42, 397–398 (2012). https://doi.org/10.1007/s00247-011-2315-9
12. D. Chapman, W. Thomlinson, R.E. Johnston et al., Diffraction enhanced x-ray imaging. Phys. Med. Biol. 42, 2015–2025 (1997). https://doi.org/10.1088/0031-9155/42/11/001
13. Z. Zhu, K. Wahid, P. Babyn et al., Improved compressed sensing-based algorithm for sparse-view CT image reconstruction. Comput. Math. Methods Med. 2013, 185750 (2013). https://doi.org/10.1155/2013/185750
14. D. Fleischmann, F.E. Boas, Computed tomography – old ideas and new technology. Eur. Radiol. 21, 510–517 (2011). https://doi.org/10.1007/s00330-011-2056-z
15. M.J. Willemink, P.B. Noël, The evolution of image reconstruction for CT – from filtered back projection to artificial intelligence. Eur. Radiol. 29, 2185–2195 (2019). https://doi.org/10.1007/s00330-018-5810-7
16. A.H. Andersen, A.C. Kak, Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm. Ultrason. Imaging 6, 81–94 (1984). https://doi.org/10.1016/0161-7346(84)90008-7
17. J. Gregor, T. Benson, Computational analysis and improvement of SIRT. IEEE Trans. Med. Imaging 27, 918–924 (2008). https://doi.org/10.1109/tmi.2008.923696
18. G. Zang, M. Aly, R. Idoughi et al., Super-resolution and sparse view CT reconstruction, in Proceedings of the European Conference on Computer Vision (ECCV) (2018). https://doi.org/10.1007/978-3-030-01270-0_9
19. D.L. Donoho, Compressed sensing. IEEE Trans. Inform. Theory 52, 1289–1306 (2006). https://doi.org/10.1109/TIT.2006.871582
20. Q. Xu, H. Yu, X. Mou et al., Low-dose X-ray CT reconstruction via dictionary learning. IEEE Trans. Med. Imaging 31, 1682–1697 (2012). https://doi.org/10.1109/tmi.2012.2195669
21. E.Y. Sidky, X. Pan, Accurate image reconstruction in circular cone-beam computed tomography by total variation minimization: a preliminary investigation, in 2006 IEEE Nuclear Science Symposium Conference Record (2006), pp. 2904–2907. https://doi.org/10.1109/NSSMIC.2006.356484
22. L.-Z. Deng, P. He, S.-H. Jiang et al., Hybrid reconstruction algorithm for computed tomography based on diagonal total variation. Nucl. Sci. Tech. 29, 45 (2018). https://doi.org/10.1007/s41365-018-0376-2
23. S. Niu, Y. Gao, Z. Bian et al., Sparse-view x-ray CT reconstruction via total generalized variation regularization. Phys. Med. Biol. 59, 2997–3017 (2014). https://doi.org/10.1088/0031-9155/59/12/2997
24. E.Y. Sidky, X. Pan, Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol. 53, 4777 (2008).
25. K. Deng, C. Sun, W. Gong et al., A limited-view CT reconstruction framework based on hybrid domains and spatial correlation. Sensors 22, 1446 (2022). https://doi.org/10.3390/s22041446
26. Z. Tian, X. Jia, K. Yuan et al., Low-dose CT reconstruction via edge-preserving total variation regularization. Phys. Med. Biol. 56, 5949–5967 (2011). https://doi.org/10.1088/0031-9155/56/18/011
27. Y. Liu, J. Ma, Y. Fan et al., Adaptive-weighted total variation minimization for sparse data toward low-dose x-ray computed tomography image reconstruction. Phys. Med. Biol. 57, 7923–7956 (2012). https://doi.org/10.1088/0031-9155/57/23/7923
28. K. Zhang, W. Zuo, Y. Chen et al., Beyond a Gaussian denoiser: residual learning of deep CNN for image denoising. IEEE Trans. Image Process. 26, 3142–3155 (2017). https://doi.org/10.1109/tip.2017.2662206
29. Y.-J. Ma, Y. Ren, P. Feng et al., Sinogram denoising via attention residual dense convolutional neural network for low-dose computed tomography. Nucl. Sci. Tech. 32, 41 (2021). https://doi.org/10.1007/s41365-021-00874-2
30. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition (2015). arXiv:1409.1556
31. O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015 (Springer, 2015), pp. 234–241. https://doi.org/10.1007/978-3-319-24574-4_28
32. D. Pathak, P. Krähenbühl, J. Donahue et al., Context encoders: feature learning by inpainting, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 2536–2544. https://doi.org/10.1109/CVPR.2016.278
33. C. Ledig, L. Theis, F. Huszar et al., Photo-realistic single image super-resolution using a generative adversarial network, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017), pp. 105–114. https://doi.org/10.1109/CVPR.2017.19
34. H.-K. Yang, K.-C. Liang, K.-J. Kang et al., Slice-wise reconstruction for low-dose cone-beam CT using a deep residual convolutional neural network. Nucl. Sci. Tech. 30, 59 (2019). https://doi.org/10.1007/s41365-019-0581-7
35. S.-Y. Zhang, Z.-X. Wang, H.-B. Yang et al., Hformer: highly efficient vision transformer for low-dose CT denoising. Nucl. Sci. Tech. 34, 61 (2023). https://doi.org/10.1007/s41365-023-01208-0
36. Z. Zhang, X. Liang, X. Dong et al., A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution. IEEE Trans. Med. Imaging 37, 1407–1417 (2018). https://doi.org/10.1109/tmi.2018.2823338
37. F. Zhang, M. Zhang, B. Qin et al., REDAEP: robust and enhanced denoising autoencoding prior for sparse-view CT reconstruction. IEEE Trans. Radiat. Plasma Med. Sci. 5, 108–119 (2021). https://doi.org/10.1109/TRPMS.2020.2989634
38. K.H. Jin, M.T. McCann, E. Froustey et al., Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26, 4509–4522 (2017). https://doi.org/10.1109/tip.2017.2713099
39. Y. Han, J.J. Yoo, J.C. Ye, Deep residual learning for compressed sensing CT reconstruction via persistent homology analysis. arXiv:1611.06391 (2016). https://doi.org/10.48550/arXiv.1611.06391
40. Y. Han, J.C. Ye, Framing U-Net via deep convolutional framelets: application to sparse-view CT. IEEE Trans. Med. Imaging 37, 1418–1429 (2018). https://doi.org/10.1109/tmi.2018.2823768
41. S. Guan, A.A. Khan, S. Sikdar et al., Fully Dense UNet for 2-D sparse photoacoustic tomography artifact removal. IEEE J. Biomed. Health Inform. 24, 568–576 (2020). https://doi.org/10.1109/JBHI.2019.2912935
42. S. Asif, X. Yongshun, M. Olivia et al., A data generation pipeline for cardiac vessel segmentation and motion artifact grading, in Proc. SPIE (2022), p. 122421J. https://doi.org/10.1117/12.2642869
43. Q. Lyu, G. Wang, Conversion between CT and MRI images using diffusion and score-matching models. arXiv:2209.12104 (2022). https://doi.org/10.48550/arXiv.2209.12104
44. S. Misra, C. Yoon, K.-J. Kim et al., Deep learning-based multimodal fusion network for segmentation and classification of breast cancers using B-mode and elastography ultrasound images. Bioeng. Transl. Med. 8, e10480 (2023). https://doi.org/10.1002/btm2.10480
45. A. Zhong, B. Li, N. Luo et al., Image restoration for low-dose CT via transfer learning and residual network. IEEE Access 8, 112078–112091 (2020). https://doi.org/10.1109/ACCESS.2020.3002534
46. X. Zhen, J. Chen, Z. Zhong et al., Deep convolutional neural network with transfer learning for rectum toxicity prediction in cervical cancer radiotherapy: a feasibility study. Phys. Med. Biol. 62, 8246 (2017). https://doi.org/10.1088/1361-6560/aa8d09
47. X. Tao, Z. Dang, Y. Zheng et al., Limited-angle artifacts removal and jitter correction in soft x-ray tomography via physical model-driven deep learning. Appl. Phys. Lett. 123, 191101 (2023). https://doi.org/10.1063/5.0167956
48. T.-Y. Lin, M. Maire, S.J. Belongie et al., Microsoft COCO: common objects in context, in Computer Vision – ECCV 2014, Lecture Notes in Computer Science, vol. 8693 (Springer, Cham, 2014). https://doi.org/10.1007/978-3-319-10602-1_48
49. T.R. Moen, B. Chen, D.R. Holmes et al., Low-dose CT image and projection dataset. Med. Phys. 48, 902–911 (2021). https://doi.org/10.1002/mp.14594
50. H. Yu, G. Wang, Finite detector based projection model for high spatial resolution. J. X-ray Sci. Technol. 20, 229–238 (2012). https://doi.org/10.3233/xst-2012-0331
51. Z.-F. Huang, K.-J. Kang, Z. Li et al., Direct computed tomography reconstruction for directional-derivative projections of computed tomography of diffraction-enhanced imaging. Appl. Phys. Lett. 89, 041124 (2006). https://doi.org/10.1063/1.2219405
52. Z. Wu, K. Gao, Z. Wang et al., Generalized reverse projection method for grating-based phase tomography. J. Synchrotron Radiat. 28, 854–863 (2021). https://doi.org/10.1107/s1600577521001806
53. K. He, X. Zhang, S. Ren et al., Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
54. J. Johnson, A. Alahi, L. Fei-Fei, Perceptual losses for real-time style transfer and super-resolution, in Computer Vision – ECCV 2016 (Springer, 2016), pp. 694–711. arXiv:1603.08155
55. D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv:1412.6980 (2014). https://doi.org/10.48550/arXiv.1412.6980
56. A. Paszke, S. Gross, F. Massa et al., PyTorch: an imperative style, high-performance deep learning library. arXiv:1912.01703 (2019). https://doi.org/10.48550/arXiv.1912.01703
57. X. Ge, P. Yang, Z. Wu et al., Virtual differential phase-contrast and dark-field imaging of x-ray absorption images via deep learning. Bioeng. Transl. Med. 8, e10494 (2023). https://doi.org/10.1002/btm2.10494
Footnote

The authors declare that they have no competing interests.