NUCLEAR ELECTRONICS AND INSTRUMENTATION

Parallel computing approach for efficient 3-D X-ray-simulated image reconstruction

Ou-Yi Li
Yang Wang
Qiong Zhang
Yong-Hui Li
Nuclear Science and Techniques, Vol. 34, No. 7, Article number 101. Published in print Jul 2023; available online 17 Jul 2023.

Accurate 3-Dimensional (3-D) reconstruction technology for non-destructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures high-quality imaging results for DR; however, its high time consumption limits its use for 3-D reconstruction. To solve this problem, this study proposes a parallel computing method to accelerate the Monte Carlo simulation of projection images, implemented through a parallel interface and a dedicated DR application. The images are utilized for the 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, the multithreaded mode (G4-MT) and the message-passing interface mode (G4-MPI), in terms of parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes significantly reduce the Monte Carlo simulation time: the parallel speedup grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted per simulation, and the parallel speedup reaches 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed slice images is satisfactory, as they reflect the internal structure of the test model. The method is further applied to a complex model, and the quality of the reconstructed images is evaluated.

Keywords: Parallel computing; Monte Carlo; Digital radiography; 3-D reconstruction
1 Introduction

Non-destructive testing technology can obtain details about defects inside an object without damaging its original structure and properties, making it highly efficient for quality inspection. Among non-destructive methods, X-ray-based testing is a popular and effective approach for inspecting industrial products. In recent decades, digital radiography (DR) [1] and computed tomography (CT) [2] have played important roles in non-destructive testing techniques for industrial and medical examinations. As the energy and penetration ability of X-ray sources increase, DR and CT can detect high-density industrial components and display the location, orientation, shape, and size of defects in workpieces. Both provide highly accurate visualization solutions for inspecting the internal structures of objects. Because CT is typically used to produce images of each slice of a target object, it can reconstruct a complete 3-D structure and thereby accurately reflect the internal structure of the object [3]. In contrast, DR is limited to producing 2-Dimensional (2-D) images; therefore, its detection efficiency is lower for hidden and overlapping features [4]. However, the radiation dose of CT is high, and the equipment is large, inflexible, and expensive, whereas the radiation dose of DR is relatively low, the equipment is easy to operate with a shorter imaging time, and it is relatively cheap. Therefore, using DR to obtain 3-D reconstructed images of an object is of great importance. Chen et al. applied DR technology to analyze the 3-D position of defects in gas turbines, with a relative positioning error of less than 2% [5]. Susanto et al. applied a DR system, which is relatively cheaper than a CT system, to the 3-D reconstruction of an aluminum step wedge cylinder using the filtered back projection (FBP) method; the resulting image quality was satisfactory, as it could easily be interpreted by the naked eye [6]. Staub et al. developed an algorithm for computing digitally reconstructed radiographs (DRRs) that match real cone-beam CT without artificial adjustments, and the algorithm produced DRRs in approximately 0.35 s at full detector and CT resolution [6]. Recent studies [7-10] have further contributed to the development of X-ray imaging.

The DR system is primarily composed of an X-ray source tube, the test object, and a detector. After the tube emits X-ray particles, the particles enter the object and interact with the material through photoelectric absorption, Compton scattering, or Rayleigh scattering, such that some of the particles are absorbed or scattered. The attenuated particles are captured by the detector crystal and converted into current signals. Subsequently, the signals are transformed into grey images with different contrasts, referred to as projection images in this paper. The pixel value of the projection images is related to the intensity decay of the X-rays along their path. The deterministic X-ray intensity decay equation is given by [11]: $I = I_0 e^{-\mu x}$, (1) where $I$ is the intensity of the particles incident on the detector, $I_0$ is the intensity emitted by the tube, and $\mu$ and $x$ are the linear attenuation coefficient and the thickness of the medium interacting with the X-rays, respectively. For DR, applying Eq. (1) to an object with a simple structure is sufficient, but for a more complicated model this method has a severe error because of particle scattering and other effects [12]. To obtain images of good quality, the Monte Carlo simulation method, a calculation method based on probability and statistical theory, is an efficient choice. Souza et al. introduced a methodology for DR simulation based on Monte Carlo simulation; they compared simulated and experimental images of a steel pipe containing corrosion defects and observed good consistency between the two [13].
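To illustrate Eq. (1) numerically, a minimal Python sketch is given below; the photon count and attenuation coefficient are hypothetical example values chosen for illustration, not data used in this study.

import numpy as np

def transmitted_intensity(I0, mu, x):
    """Beer-Lambert attenuation of Eq. (1): I = I0 * exp(-mu * x)."""
    return I0 * np.exp(-mu * x)

# Hypothetical example: 1e6 incident photons, an assumed linear attenuation
# coefficient of 0.45 cm^-1, and a 4 cm aluminum path (the side of the test cube).
I0 = 1.0e6
mu = 0.45   # cm^-1, assumed value for illustration only
x = 4.0     # cm
print(transmitted_intensity(I0, mu, x))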

However, a limitation of Monte Carlo simulation is its significant time consumption, particularly for the 3-D reconstruction of objects [14,15], because reconstruction requires numerous projection images from different viewing angles, so the Monte Carlo simulation must be performed many times. With the rapid development of high-performance computing (HPC) in recent years, large-scale parallel computing has effectively solved time-consuming and computationally resource-intensive problems. Parallel computing on HPC platforms is usually based on both message passing interface (MPI) and shared-memory parallel programming (OpenMP) approaches for resource planning and memory sharing, to ensure scalability and accuracy [16]. The Monte Carlo simulation toolkit Geant4 provides two parallel computing modes for accelerating Monte Carlo simulation: a message-passing interface mode (G4-MPI) and a multithreaded mode (G4-MT) [17]. Wang et al. applied a Geant4-based parallel computing method to nuclear logging and evaluated the performance of the two modes; their results show that combining G4-MT and G4-MPI execution can significantly reduce the runtime of Monte Carlo simulations [18]. In addition, other studies [19-21] have demonstrated the strong capability of the Geant4 toolkit for X-ray imaging simulation.

This study proposes a parallel computing method based on two parallel computing modes, G4-MPI and G4-MT, for the simulation of DR to obtain projection images for 3-D reconstruction. First, the accuracy of parallel computing was validated. Second, the performances of the two modes were evaluated, followed by an evaluation of the scalability of the hybrid G4-MPI and G4-MT. After reconstructing the 2-D slicing images, the quality of the images was measured. Subsequently, a 3-D view image of the model was reconstructed. Finally, the method was applied to a more complex model for validation, and the quality of the reconstructed images was measured.

2 DR system based on Geant4

Geant4 is a Monte Carlo toolkit developed mainly at the European Organization for Nuclear Research (CERN) using C++ object-oriented technology. Owing to the realism of its physics modeling, it provides users with simulation results that do not differ significantly from real experimental results, making it easy to evaluate and modify simulation experiments. The composition of the DR system and the projection image at a projection angle of 0° are presented in Fig. 1, and the test model parameters are listed in Table 1. The test model was a large cube with two small cube defects and a cylindrical defect. The three defects were filled with air, whereas the rest of the model was made of aluminum. The two small cube defects were centrally symmetric about the center of the model. This symmetric, regular structure gives the reconstructed slice images a clear hierarchy for analyzing the reconstruction results.

Table 1
Parameters of the test model
  Large cube Cylindrical defect 1 Small cube defect 2 Small cube defect 3
Side length 4 cm - 1.5 cm 1.5 cm
Height - 4 cm - -
Radius - 0.3 cm - -
Material Aluminum Air Air Air
Fig. 1
(Color online) The DR system and its components, and the projection image at 0°: (a) digital radiography system; (b) flat-panel detector; (c) test model; (d) projection image at 0°

In the Geant4 simulation model, X-ray particles with an energy of 180 keV were emitted by the X-ray source in a cone beam. The direct flat-panel detector consisted of a CsI crystal, a-Si:H thin-film transistors (TFTs), and a glass substrate. The detector crystal converts X-rays into visible light according to the incident X-ray intensity, while the TFT array stores the corresponding signal to form a pixel matrix [11]. In this study, the size of the detector was 6 cm × 6 cm × 1.5 cm, and the TFT array had a pixel sampling interval of 200 μm. Therefore, the modeled detector pixel matrix was set to 300 × 300, and projection images of 300 columns × 300 rows could be obtained.

3 Implementation of parallel computing function based on Geant4

3.1 Basic principle of parallel computing

The primary concept of parallel computing is to divide a large task into subtasks distributed over multiple processors to speed up the calculation; the processors work in concert, executing their subtasks in parallel. MPI and OpenMP are primarily used in parallel computing on HPC to facilitate inter-node information delivery, resource allocation, and memory sharing.

Scalability, which refers to a system's ability to expand in response to future changes in requirements, is an important metric for evaluating parallel computing methods on HPC. Specifically, in this study, satisfactory scalability means that the parallel computing method is less time-consuming and ensures favorable efficiency when more projection images with more emitted particles are required. This study assessed the scalability and performance of the parallel computing modes in terms of parallel speedup and efficiency [21,22]. Speedup is defined as the ratio of the single-thread execution time to the execution time when one or more nodes with multiple threads are used. The parallel speedup, denoted $S_n$, can be expressed as $S_n = T_1 / T_n$, (2) where $T_1$ and $T_n$ are the run times using one and $n$ threads, respectively. The parallel efficiency, denoted $E_n$, is defined as the average utilization of the $n$ allocated threads and can be expressed as $E_n = S_n / n$, (3) where $n$ is the number of threads and $S_n$ is the speedup with $n$ threads.
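As a small worked example of Eqs. (2) and (3), the following Python sketch computes speedup and efficiency from run times; the numbers used correspond to the one-thread and 10 × 32-thread run times later reported in Table 4.

def speedup(t1: float, tn: float) -> float:
    """Parallel speedup S_n = T_1 / T_n, Eq. (2)."""
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency E_n = S_n / n, Eq. (3)."""
    return speedup(t1, tn) / n

# Run times from Table 4: 46,468 s on one thread and 307 s on 10 nodes x 32 threads.
t1, tn, n = 46468.0, 307.0, 320
print(f"S_n = {speedup(t1, tn):.2f}, E_n = {efficiency(t1, tn, n):.2%}")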

3.2 Implementation of parallel computing

Parallel computing techniques have been widely available since the release of Geant4 version 10.0. G4-MPI cross-node scaling and G4-MT cross-thread scaling are combined to realize hybrid parallel computing, supported by data collection methods such as ntuples, scorers, and histograms. A master node and several slave nodes are used to implement the computing process. Each slave node runs several threads that cooperate to produce preliminary results; once the results from each node are merged, they form the final output, which is sent to the master node. In this study, the final output is the energy deposition collected in histograms.

In this study, a parallel interface and a DR application were developed. The DR application comprises the DR system and the function-calling scripts that deploy the parallel interface. In addition to the geometric modeling of the X-ray tube, flat-panel detector, and test model, Monte Carlo simulation parameters such as particle definition, physical processes, and data collection were specified by the imaging system to accurately simulate X-ray interactions with the imaged object. The parallel interface connects the parallel computing functions to the DR system by transmitting data to the different nodes and threads. Relevant scripts were developed to construct the interface: a combination of nodes and threads is defined in the parallel input submission script, and the energy deposition is collected by the script that automatically merges the output data.

Figure 2 shows the application of Geant4-based parallel computing to the simulation of a DR system. The blue boxes represent the DR system and the parallel interface, and the yellow box displays the 3-D reconstruction process, all of which were developed by the methods described in this article. The red box represents the Geant4 kernel, and the purple box represents the Chengdu HPC platform. Script extracts for the parallel input submission, energy deposition results collection, grayscale transform, and 3-D reconstruction are shown below.

Fig. 2
Overview of Geant4-based parallel computing applied to DR. Red pointer line: the DR system parameters and the scripts deploying the parallel interface are set before the parallel computing. Pink pointer lines: the mapping relationship of the parallel computing modes between the Geant4 kernel and the HPC. Green pointer lines: the overall system operation flow, which starts from the parallel interface deployment scripts and ends at the reconstructed 3-D image generation. The numbers in green indicate the sequence of the main calculation and processing steps.

By employing the interface and application, the simulation of the DR was enabled by G4-MPI and G4-MT, which form the basis of the following sections.

Script extract 1 Parallel Input Submission
# mpirun.sh is used to submit the parallel input files.
# p: job partition, n: number of nodes, t: threads utilized in each node, TN: total threads
#SBATCH -p p
#SBATCH -N n
#SBATCH --ntasks-per-node=t
module load mpi/openmpi/4.0.2/gcc-7.3.1
module load apps/Geant4.10.06.p02/openmpi-4.0.2-gcc-7.3.1
mpirun -np TN ./parallel_computing run.mac
Script extract 2 Energy Deposition Results Collection
// Loop over the hit map of the current event and accumulate results per detector pixel.
for (itr = evtMap->GetMap()->begin(); itr != evtMap->GetMap()->end(); itr++)
{
  copyNb = itr->first;             // copy number (pixel index) of this entry
  fedep1 = *(itr->second) / keV;   // energy deposited in this pixel, in keV
  a[copyNb] = a[copyNb] + 1;       // number of contributions recorded for this pixel
  b[copyNb] = b[copyNb] + fedep1;  // accumulated energy deposition of this pixel
}
Script extract 3 Grayscale Transform and 3-D Reconstruction
# The two lines below are the core code for turning energy deposition into a projection image.
readbook = np.array(data_array1).reshape((300, 300))
readbook1 = (255 * (readbook - min1) / (max1 - min1)).astype(np.int16)
# The four lines below turn projection images into slicing images.
proj_fft = my_fft(R, width)
proj_filtered[:, i] = np.multiply(proj_fft[:, i], R_L_filter)
proj_ifft = (my_ifft(proj_filtered)).real
fbp[x, y] = fbp[x, y] + proj_ifft[t, i]
4 Parallel performance evaluation of G4-MPI and G4-MT

This study utilized the Chengdu HPC as the computational platform. In this HPC, each node is configured with a 32-core x86 processor with a main frequency of 2.5 GHz. The MPI interface adopts hpcx-2.4.1 and is compiled with gcc 7.3.1, which is compatible with the G4-MPI and G4-MT environments. To evaluate the capabilities of G4-MPI and G4-MT for DR, we considered three aspects: the accuracy of parallel computing for DR, the performance of G4-MPI and G4-MT, and the application of the hybrid G4-MPI and G4-MT mode.

4.1 Accuracy of parallel computing

This section verifies the accuracy of parallel computing by comparing projection images obtained using the parallel and non-parallel computing modes. In the parallel computing mode, ten nodes with full threads, which is one case of the hybrid G4-MPI and G4-MT mode, were used, whereas in the non-parallel computing mode, only one thread was used in the simulation. Ten billion particles with an energy of 180 keV were emitted to ensure the convergence of the Monte Carlo simulation. The peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) [23,24] were utilized to measure the correctness of the parallel images with the non-parallel images as the benchmark. The RMSE is calculated as $\mathrm{RMSE}(I_p, I_{np}) = \sqrt{\frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(I_p(i,j) - I_{np}(i,j)\right)^2}$, (4) where $I_p$ and $I_{np}$ are the images obtained in the parallel and non-parallel computing modes, respectively, and $m \times n$ is the image size in pixels. The PSNR is calculated as $\mathrm{PSNR} = 10 \times \log_{10}\frac{(\mathrm{MAX}_I)^2}{(\mathrm{RMSE})^2}$, (5) where $\mathrm{MAX}_I$ is the maximum pixel value of the image.
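For illustration, the sketch below implements Eqs. (4) and (5) in Python; the two randomly generated test images are placeholders standing in for the parallel and non-parallel projection images.

import numpy as np

def rmse(img_p: np.ndarray, img_np: np.ndarray) -> float:
    """Root mean square error between two images, Eq. (4)."""
    return float(np.sqrt(np.mean((img_p.astype(float) - img_np.astype(float)) ** 2)))

def psnr(img_p: np.ndarray, img_np: np.ndarray) -> float:
    """Peak signal-to-noise ratio, Eq. (5), using the benchmark image's maximum pixel value."""
    return 10.0 * np.log10(float(img_np.max()) ** 2 / rmse(img_p, img_np) ** 2)

# Placeholder 300 x 300 grayscale images standing in for the two projection images.
rng = np.random.default_rng(0)
benchmark = rng.integers(0, 256, size=(300, 300))
parallel = benchmark + rng.normal(0, 2, size=(300, 300))
print(rmse(parallel, benchmark), psnr(parallel, benchmark))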

Two images from the two modes at the same projection angle are shown in Fig. 3; the two images are essentially identical. The RMSE and PSNR between them are 3.83 and 36.5, respectively, indicating a small grayscale difference between the two images. This demonstrates that the results of the Geant4-based parallel computing mode remain consistent with those of the non-parallel mode, providing the basis for the following sections.

Fig. 3
Projection images of two computing modes. (a) Projection image of parallel mode. (b) Projection image of non-parallel mode.
4.2 Performance evaluation of G4-MPI and G4-MT

Six test groups were designed using different combinations of nodes and threads for G4-MPI and G4-MT on the Chengdu HPC. To ensure the accuracy and repeatability of the simulation results, each test was repeated 30 times to obtain the mean value $\mu_i$ and standard deviation $\delta_i$ [25]: $\mu_i = \frac{1}{n}\sum_{j=1}^{30} x_i(j)$, (6) $\delta_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{30}\left(x_i(j) - \mu_i\right)^2}$, (7) where $x_i(j)$ represents the $j$th sample value (speedup or parallel efficiency) of test group $i$ and $n$ is the number of repetitions of each test. The confidence interval is calculated as in Eq. (8) [26]: $\mathrm{interval}(c_1, c_2) = \mu_i \pm Z_{\alpha/2} \times \delta_i / \sqrt{n}$, (8) where $\alpha$ is the significance level and $c_1$ and $c_2$ are the upper and lower bounds of the interval, respectively. The critical value $Z_{\alpha/2}$ is obtained from the normal distribution table, considering 99.7% as the confidence level for each test [27]. The detailed results are shown in Tables 2 and 3, and the parallel speedup and efficiency for the six tests are plotted in Fig. 4.
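The sketch below illustrates Eqs. (6)-(8) in Python on a hypothetical set of 30 repeated speedup samples; the sample values are invented for illustration, and Z_{α/2} ≈ 3 corresponds to the 99.7% confidence level used here.

import numpy as np

def confidence_interval(samples, z=3.0):
    """Mean, standard deviation, and confidence interval per Eqs. (6)-(8)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mu = samples.mean()                   # Eq. (6)
    delta = samples.std(ddof=1)           # Eq. (7), n-1 in the denominator
    half_width = z * delta / np.sqrt(n)   # Eq. (8)
    return mu, delta, (mu - half_width, mu + half_width)

# Hypothetical 30 repeated speedup samples around 9.4 (illustration only).
rng = np.random.default_rng(1)
samples = rng.normal(9.4, 0.4, size=30)
print(confidence_interval(samples))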

Table 2
Parallel performance of G4-MPI
G4-MPI Speedup (Sn) Efficiency (En)
Combination of node and thread μi δi Interval (c1,c2) μi (%) δi Interval (c1,c2)
1×1 1 0.04 1±0.02 100 0.33 100±0.16
1×3 2.886 0.26 2.886±0.12 96.2 3.23 96.2±1.52
1×5 4.710 0.31 4.710±0.14 94.2 3.73 94.2±1.76
1×7 6.729 0.37 6.729±0.17 96.1 3.13 96.1±1.47
1×10 9.424 0.43 9.424±0.20 94.2 3.75 94.2±1.77
Table 3
Parallel performance of G4-MT
G4-MT Speedup (Sn) Efficiency (En)
Combination of node and thread μi δi Interval(c1,c2) μi (%) δi Interval(c1,c2)
1×1 1 0.04 1±0.02 100 0.33 100±0.16
3×1 2.886 0.26 2.886±0.12 96.4 3.23 96.4±1.51
5×1 4.895 0.31 4.710±0.15 97.9 3.73 97.9±1.18
7×1 6.831 0.37 6.729±0.17 97.6 3.13 97.6±1.29
10×1 9.194 0.43 9.424±0.20 91.9 3.75 91.9±1.75
Fig. 4
Parallel speedup and efficiency of G4-MPI and G4-MT. (a) Parallel speedup (Sn); (b) Parallel efficiency (En)

Figure 4 shows that the speedup increases linearly with the number of nodes or threads, whereas the parallel efficiency stabilizes at a relatively high value as the number of nodes or threads increases. Furthermore, according to Tables 2 and 3, at the 99.7% confidence level, the true values of the speedup and parallel efficiency of the six tests lie within the confidence intervals for both G4-MT and G4-MPI. This demonstrates that both G4-MT and G4-MPI offer good repeatability and efficiency in simulating the DR process. The speedup and efficiency of G4-MPI are marginally higher than those of G4-MT because G4-MPI adopts a shared memory mode, which improves computing by having each thread share information during the Monte Carlo calculations.

4.3 Scalability evaluation of hybrid G4-MPI and G4-MT for DR

In this section, ten tests of the DR simulation were designed to evaluate the scalability of the hybrid G4-MPI and G4-MT mode. G4-MT ran in full-thread mode on each computational node, with 32 threads per node, and the number of nodes ranged from 1 to 10 across the ten tests. Ten billion X-ray particles were emitted in each test. For the scalability evaluation, the runtime of one node using one thread was used as the benchmark time for the ten tests. The parallel computing results of the DR simulation are listed in Table 4, and the parallel speedup and efficiency of the ten tests are plotted in Fig. 5.

Table 4
Scalability test of hybrid G4-MPI and G4-MT for DR simulations
Combination of node and thread Run time (s) Parallel speedup (Sn) Parallel efficiency (En) (%)
1×1 46,468 1 100
1×32 1607 28.91 90.36
2×32 864 53.78 84.04
3×32 623 74.58 77.69
4×32 514 90.40 70.63
5×32 450 103.26 64.38
6×32 410 113.33 59.03
7×32 381 121.96 54.45
8×32 354 131.26 51.28
9×32 330 140.81 48.89
10×32 307 151.36 47.35
Fig. 5
(Color online) Scalability evaluation of hybrid G4-MPI and G4-MT. (a) Runtime and parallel speedup (Sn); (b) Parallel efficiency (En).

Table 4 and Fig. 5 demonstrate that as the number of nodes increases, the speedup deviates from linear growth and the parallel efficiency decreases within a certain range. According to Amdahl's law [28], which describes how the speedup theoretically increases when multiple processors are used in parallel computing, different nodes incur parallel overhead such as communication and waiting time; therefore, as the number of nodes increases, the parallel speedup deviates from the ideal value, which leads to a decrease in parallel efficiency. This is acceptable in this study because the speedup reaches up to 151.36 and the reduction in runtime is significant. For example, in the next section, 180 projection images are required for reconstruction. Based on the runtime required for one projection image using one thread, approximately 2,323.4 h would be needed; using the hybrid mode with full threads, the process costs only approximately 15.35 h in total. The strong scalability of the hybrid G4-MPI and G4-MT full-thread mode for DR simulation was thus proven.
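As a brief illustration of Amdahl's law, the sketch below evaluates the theoretical speedup S(N) = 1 / ((1 - p) + p/N) for an assumed parallel fraction p; the value of p is hypothetical and not a fitted parameter of this study.

def amdahl_speedup(p: float, n_proc: int) -> float:
    """Theoretical speedup from Amdahl's law for parallel fraction p on n_proc processors."""
    return 1.0 / ((1.0 - p) + p / n_proc)

# Hypothetical parallel fraction of 99.6%; the serial remainder plus communication
# overhead is what bends the measured speedup in Table 4 away from the ideal line.
for n_proc in (32, 64, 160, 320):
    print(n_proc, round(amdahl_speedup(0.996, n_proc), 2))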

Table 7
Metrics of three slicing images with Ram-Lak filter
Complex Model CNR AG IE
150th 4.13 7.2291 0.15260
300th 4.48 5.7916 1.33991
450th 2.93 8.6436 1.04886
5 3-D reconstruction of the model and quality analysis of the images

This section implements the DR simulation to obtain sufficient original projection images using hybrid G4-MPI and G4-MT parallel computing in full-thread mode and utilizes these projection images to reconstruct the test model in three dimensions and obtain information about the defects inside the model. The simulation was run 180 times, and each time the test model was rotated by 1°, as shown in Fig. 6; thus, 180 2-D projection grayscale images of 300 × 300 pixels were generated. During each simulation, the X-ray generator emitted 10 billion X-ray particles with an energy of 180 keV in the 10-node mode with 32 threads per node. The rotation of the test model was implemented automatically by a script developed in this study.

Fig. 6
(Color online) Rotational imaging schematic
5.1 3-D reconstruction using the filtered back projection algorithm

The FBP algorithm is widely used; it is a spatial processing technique based on Fourier transform theory [29]. One of its characteristics is that the projection at each acquisition angle is convolved with a specific filter before back-projection, such that the toroidal artifact caused by the point spread function is reduced. This study adopted the FBP algorithm to obtain 3-D reconstructed images of the model. The FBP process is described as follows.

Step 1: Obtain one 300 × 180 original projection matrix using the same rows of pixel information in all the original projection images.

Step 2: Pre-process the original projection matrix data for calibration.

Step 3: Conduct a one-dimensional Fourier transform for the result of Step 2.

Step 4: Perform convolution filtering on the result of Step 3.

Step 5: Conduct a one-dimensional inverse Fourier transform for the result of Step 4.

Step 6: Perform a direct inverse projection on the result of Step 5 for the 2-D reconstructed slicing image.

Step 7: Repeat Steps 1 to 6 300 times to produce 300 2-D slicing images.

Step 8: Utilize the 300 images to obtain 3-D reconstructed images.

Step 1 takes the Nth row of pixel information from each original projection image. N is initially equal to one and increases by one with each repetition; thus, 300 original 300 × 180 projection matrices are generated after 300 repetitions, yielding 300 reconstructed 2-D slicing images, each 300 × 300 pixels in size.

As the toroidal artifact is one of the main factors affecting the quality of the 3-D reconstructed images, this study adopted a projection-data pre-calibration method in Step 2 that combines polynomial fitting with a probability-statistics correction [30]. The range of candidate correction factors was determined by fitting the projection data column by column with polynomials, and the correction factors were then selected using the maximum-probability principle.

Among the eight steps above, the design of the filter in Step 4 has a significant impact on the final reconstruction results [31]. The selection of a filter is essentially the selection of a window function; to obtain better reconstructed image resolution, the inverse Fourier transform of the window function should have a high and narrow central lobe [32]. To obtain reconstructed images of better quality, this study used the three most commonly used filters, the Ram-Lak, Shepp-Logan, and Hamming filters, to perform convolution filtering on the result of the one-dimensional Fourier transform. The 2-D reconstructed slicing image under 0° rotation is consistent with that shown in Fig. 1.
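The sketch below outlines Steps 3-6 for a single slice in Python and shows how the three window functions can be built in the frequency domain; it is a minimal illustration under simple assumptions (nearest-pixel back-projection, a plain Hamming window on the ramp) rather than the exact implementation used in this study.

import numpy as np

def make_filter(width: int, kind: str = "ram-lak") -> np.ndarray:
    """Frequency-domain filter: ramp (Ram-Lak), optionally windowed."""
    freqs = np.fft.fftfreq(width)                  # cycles/pixel, FFT ordering
    ramp = np.abs(freqs)                           # Ram-Lak: |f|
    if kind == "shepp-logan":
        return ramp * np.sinc(freqs)               # ramp times a sinc window
    if kind == "hamming":
        return ramp * (0.54 + 0.46 * np.cos(2 * np.pi * freqs))
    return ramp

def fbp_slice(sinogram: np.ndarray, kind: str = "ram-lak") -> np.ndarray:
    """Reconstruct one slice from a (width x n_angles) sinogram by filtered back projection."""
    width, n_angles = sinogram.shape
    filt = make_filter(width, kind)
    # Steps 3-5: 1-D FFT along the detector axis, multiply by the filter, inverse FFT.
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * filt[:, None], axis=0))
    # Step 6: back-project each filtered projection over the image grid.
    recon = np.zeros((width, width))
    center = width / 2.0
    xx, yy = np.meshgrid(np.arange(width) - center, np.arange(width) - center)
    for i in range(n_angles):
        theta = np.deg2rad(i * 180.0 / n_angles)
        t = np.round(xx * np.cos(theta) + yy * np.sin(theta) + center).astype(int)
        valid = (t >= 0) & (t < width)
        recon[valid] += filtered[t[valid], i]
    return recon * np.pi / (2 * n_angles)

For the data considered here, the input sinogram would be one of the 300 × 180 matrices assembled in Step 1, and the function would be called once per slice for each of the three filters.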

5.2 Quality evaluation of reconstructed images

To quantitatively measure the quality of the reconstructed images, three commonly used image quality assessment metrics were adopted: the contrast-to-noise ratio (CNR) [33], the average gradient (AG) [34], and the image entropy (IE) [35]. The CNR measures the difference between the mean intensities of the target region and the background relative to the noise and is an objective indicator of average image quality; a higher CNR value indicates better recognition of the target defects. The AG is sensitive to an image's ability to express small details in contrast; in a given direction of an image, the AG increases with greater changes in gray level. Thus, the AG can be used to measure the clarity of an image and reflects fine detail contrast and texture features. The IE reflects the amount of average information in the image and represents the aggregation characteristics of the grayscale distribution; the greater the entropy of an image, the richer the pixel grayscale content and the more uniform the grayscale distribution.

The quality assessment results for the reconstructed images of the model are shown in Table 5. Based on the spatial characteristics of the model, this study presents the results for the reconstructed images of the 100th, 150th, and 200th layers of the test model. The reconstructed slicing images of these three layers depict both the hollow defects and the solid aluminum. The quality of the reconstructed images varies considerably with the filter and the slicing layer.

Table 5
Metrics of reconstructed images
Test mode Filter CNR AG IE
100th Ram-Lak 8.65 0.038840 0.73650
  Shepp-Logan 8.22 0.041788 0.72003
  Hamming 2.86 0.013373 0.71352
150th Ram-Lak 9.03 0.031075 0.38491
  Shepp-Logan 8.11 0.033115 0.37352
  Hamming 2.35 0.013373 0.37053
200th Ram-Lak 8.34 0.038681 0.74198
  Shepp-Logan 7.86 0.043203 0.72891
  Hamming 2.75 0.013372 0.71265

First, the reconstructed slicing images obtained using the Ram-Lak filter had better quality than those obtained using the Shepp-Logan or Hamming filters. Second, there was a high degree of similarity between the metric values of the 100th and 200th layers because of the consistent physical structures of the two slices in the model, such as the density, material, and shape of the defect. Moreover, the CNRs of the 100th and 200th slices are smaller than that of the 150th slice, mainly because the defect area in the 150th slice is smaller than those in the 100th and 200th slices; this demonstrates that the 3-D reconstruction method based on parallel computing has good recognition of small defects. Furthermore, the AG and IE of the 100th and 200th slices were larger than those of the 150th slice because of the greater change in their grayscale values, so the images of the 100th and 200th slices show more detail in contrast. In general, the values of the three metrics meet the expectations for the designated model. Figure 7 shows the 2-D images reconstructed using the three aforementioned filters; the toroidal artifacts are clearly more prominent in the images reconstructed with the Shepp-Logan and Hamming filters than in those reconstructed with the Ram-Lak filter.

Fig. 7
Representative reconstructed slicing images. a1-a3: reconstructed images of the 100th slice; b1-b3: reconstructed images of the 150th slice; c1-c3: reconstructed images of the 200th slice. a1, b1, c1: Hamming filter; a2, b2, c2: Shepp-Logan filter; a3, b3, c3: Ram-Lak filter

The metrics are defined as follows: $\mathrm{CNR} = \frac{|\mu_i - \mu_o|}{\sqrt{\delta_i^2 + \delta_o^2}}$, where $\mu_i$ and $\mu_o$ are the average pixel values of the region of interest (ROI) and the background, respectively, and $\delta_i^2 = E\{(s_i - \mu_i)^2\}$ and $\delta_o^2 = E\{(s_o - \mu_o)^2\}$ are the variances of the ROI and the background, respectively. $\mathrm{AG} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{\left(\frac{\partial f(i,j)}{\partial x}\right)^2 + \left(\frac{\partial f(i,j)}{\partial y}\right)^2}$, where $M \times N$ is the pixel size of the image and $\frac{\partial f(i,j)}{\partial x}$ and $\frac{\partial f(i,j)}{\partial y}$ are the differences of the pixel values in the $x$ and $y$ directions, respectively. $\mathrm{IE} = -\sum_{g=0}^{L-1} P_g \times \log_2 P_g$, where $L$ is the overall grayscale level of the image and $P_g$ is the probability of gray level $g$.
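A minimal Python sketch of the three metrics is given below; the slice image and the ROI and background masks are placeholders that a user would replace with a reconstructed slice and manually chosen regions.

import numpy as np

def cnr(img: np.ndarray, roi: np.ndarray, bg: np.ndarray) -> float:
    """Contrast-to-noise ratio between an ROI mask and a background mask."""
    mu_i, mu_o = img[roi].mean(), img[bg].mean()
    var_i, var_o = img[roi].var(), img[bg].var()
    return abs(mu_i - mu_o) / np.sqrt(var_i + var_o)

def average_gradient(img: np.ndarray) -> float:
    """Average gradient based on finite differences in the x and y directions."""
    gx = np.diff(img.astype(float), axis=1)[:-1, :]
    gy = np.diff(img.astype(float), axis=0)[:, :-1]
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))

def image_entropy(img: np.ndarray, levels: int = 256) -> float:
    """Shannon entropy of the grayscale histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Placeholder slice and masks (illustration only).
rng = np.random.default_rng(2)
slice_img = rng.integers(0, 256, size=(300, 300))
roi_mask = np.zeros_like(slice_img, dtype=bool); roi_mask[100:150, 100:150] = True
bg_mask = np.zeros_like(slice_img, dtype=bool); bg_mask[10:60, 10:60] = True
print(cnr(slice_img, roi_mask, bg_mask), average_gradient(slice_img), image_entropy(slice_img))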

To further discuss the quality of the 3-D-reconstructed images, we compared the defect situation of the model in the ideal case with that of the reconstructed slicing image sets. As the test model was accurately modeled using Geant4, this study utilized the parameter table as an inspection standard for the set of 300 reconstructed slicing images.

Among the 300 images were some blank images because the flat-panel detector was larger than the test model. In those regions, the X-ray particles reached the flat-panel detector without attenuation; consequently, after the data of the corresponding detector units were processed, the corresponding pixels appeared blank.

Excluding the 100 blank images, the model was divided into 200 slices: 148 slices contained both the small square and cylindrical defects, whereas 52 slices contained only the cylindrical defect. The ratio of layers containing the square defect to non-blank layers (r1) is 0.74:1, whereas the corresponding ratio (R1) from the parameters in Table 1 is 0.75:1. The ratio of the side length of the small square defect to the side length of the large square (r2) in Fig. 7 is 0.37489:1, whereas the corresponding ratio (R2) from Table 1 is 0.375:1. The ratio of the radius of the cylindrical defect to the side length of the large square is 0.0749:1 (r3), whereas the corresponding ratio from Table 1 is 0.075:1 (R3). The differences between the three ratios r1, r2, and r3 and their reference values are within the acceptable error range. Examination of the images of the remaining layers showed similar results (Table 6).

Table 6
Proportional compatibility of the test model
Ratio R1 r1 R2 r2 R3 r3
Value 0.75:1 0.7485:1 0.375:1 0.37489:1 0.075:1 0.0749:1

Figure 8 illustrates the grayscale value variation curves for the same row of pixels in two layers of the reconstructed slicing images. The two layers were selected from the parameter table such that their defect situations are theoretically identical, for example, two adjacent layers. Owing to the hollow defects inside the model, the reconstructed slice images exhibit obvious differences in grayscale value along the selected row of pixels. Figure 8 shows that the grayscale value varies within [0,1] where defects appear in the image, and the curve of each reconstructed slicing image coincides closely with that of the adjacent layer. The RMSE values of the three sets of pixel data are 0.000378, 0.000230, and 0.000496, indicating an excellent match within each set.

Fig. 8
Grayscale value variation curves. (a) 200th row pixel position; (b) 150th row pixel position; (c) 100th row pixel position

To obtain the complete 3-D reconstruction view shown in Fig. 9, a 3-D model was formed by stacking the reconstructed 2-D slice images in the correct order in 3-D space. However, inconsistencies between the images can blur the final 3-D model; to address this, a smoothing filter was applied to obtain a closely restored version of the original model. The reconstructed surface geometry has almost no holes, indicating a high level of accuracy in the 3-D reconstruction process. The reconstructed model reproduces the defect features and external shape modeled in Geant4 very well. The disadvantage of this method is that some irregularities exist on the surface, which may be caused by incomplete elimination of the toroidal artifacts.
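A minimal sketch of this stacking-and-smoothing step is shown below; the use of a Gaussian filter and its width are illustrative assumptions rather than the exact post-processing applied in this study.

import numpy as np
from scipy.ndimage import gaussian_filter

def build_volume(slices: list, sigma: float = 1.0) -> np.ndarray:
    """Stack 2-D slice images into a 3-D volume and apply a smoothing filter."""
    volume = np.stack(slices, axis=0)            # shape: (n_slices, rows, cols)
    return gaussian_filter(volume, sigma=sigma)  # suppress slice-to-slice inconsistencies

# Placeholder slices standing in for the reconstructed 300 x 300 images.
rng = np.random.default_rng(3)
slices = [rng.random((300, 300)) for _ in range(10)]
volume = build_volume(slices, sigma=1.0)
print(volume.shape)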

Fig. 9
Reconstructed image of 3-D view of the test model
6 Application on a complex model

The complex model is a plug used in a rock-layer probe; it is made of aluminum and was exported to Geant4 using the in-house software GMAC [36]. First, the complex model was imported from computer-aided design (CAD) software into GMAC, and its materials were defined and assigned in GMAC. Next, its center coordinates were set at the center of the Geant4 world coordinate system. The model was then converted into a high-fidelity model and imported into the Geant4 simulation system shown in Fig. 10. The location of the X-ray source, the material of the flat-panel detector, and the manner in which the X-ray particles were emitted and collected were consistent with the model above, but the size of the detector was 30 cm × 30 cm. To obtain projection images from 180 angles, the imaging simulation was performed 180 times; each time, the model was rotated by 1°, and the simulation was run with hybrid G4-MT and G4-MPI using 320 threads. During each simulation, 10 billion X-ray particles with an energy of 150 keV were emitted, and the resulting projection image had a resolution of 600 × 600 pixels. The parallel simulation greatly shortened the runtime: the average run time of each simulation was 313 s, whereas the run time in the non-parallel computing mode was 46,073 s, giving a parallel speedup of up to 147.2 and a parallel efficiency of approximately 46%. The 180 original projection images were utilized to generate 2-D reconstructed images using the FBP algorithm with the Ram-Lak filter. The model in CAD is shown in Fig. 11(a), and the 2-D reconstructed images of the 150th, 300th, and 450th slices are shown in Fig. 11(b), (c), and (d), respectively; these three slices are representative because they divide the model into four equal parts. From Fig. 11, the visual characteristics of the three images are consistent with the corresponding slices of the complex model; for example, the 150th slice in Fig. 11(b) is composed of two separate ellipse-like shanks, as shown in Fig. 11(a). The three metrics, CNR, AG, and IE, are listed in Table 7, and their values indicate that the reconstructed images are of good quality. A 3-D reconstructed view of the model is displayed in Fig. 12, which restores many details of the model.

Fig. 10
Geant4 simulation system for the complex model: (a) digital radiography system; (b) flat-panel detector; (c) complex model; (d) projection image at 0°
Fig. 11
(a) Shape of the model in CAD, indicating the layers from which the three slices are taken; (b), (c), (d) 2-D reconstructed slicing images of the 150th, 300th, and 450th layers
Fig. 12
3-D reconstructed view of the complex model
7 Conclusion

This study proposes a parallel computing approach to simulation for the 3-D reconstruction of a test model, achieved by developing a parallel interface and a DR application. The interface delivers information between the Geant4-based parallel computing and the DR simulation system, thereby optimizing the use of computational resources; the application includes the DR simulation system and the function-calling scripts. The performances of the two parallel computing modes, G4-MPI and G4-MT, were compared by running the DR simulation process with various combinations of threads and nodes. The results demonstrate that the speedup increases approximately linearly as the number of threads increases, whereas the parallel efficiency is maintained above 94% for both modes. The hybrid G4-MPI and G4-MT mode has strong scalability and can significantly accelerate the Monte Carlo simulation, with a speedup of up to 151.36. The quality of the reconstructed images is good, and these images accurately reflect detailed information about the defects inside the model. Future work will focus on transforming complex parts into Geant4 models for imaging simulation by integrating computer-aided design (CAD) geometry conversion with prior information on the object geometry in the X-ray interaction process. We also plan to use a pre-calibrated deep-learning reconstruction algorithm for sparse-view projection images to eliminate the circular artifacts from the 2-D reconstructed slice images and obtain the best possible 3-D reconstruction results.

References
1. P.F. Stelt, Better imaging: the advantages of digital radiography. J. Am. Dent. Assoc. 139, S7-S13 (2008). doi: 10.14219/jada.archive.2008.0357
2. J. Kruth, M. Bartscher, S. Carmignato et al., Computed tomography for dimensional metrology. CIRP Ann. Manuf. Technol. 60(2), 821-842 (2011). doi: 10.1016/j.cirp.2011.05.006
3. C. Zhang, X. Pan, H. Shang et al., Improvements to conventional X-ray tube-based cone-beam computed tomography system. Nucl. Sci. Tech. 29, 43 (2018). doi: 10.1007/s41365-018-0370-8
4. X. Lu, X. Wang, D. Li et al., Comparative study of DR and CT in the application of close contacts screening for tuberculosis outbreaks. Radiol. Infect. Dis. 3(1), 34-39 (2016). doi: 10.1016/j.jrid.2016.01.004
5. L. Chen, B. Li, L. Zhang et al., 3D positioning of defects for gas turbine blades based on digital radiographic projective imaging. NDT & E Int. 133, 102751 (2022). doi: 10.1016/j.ndteint.2022.102751
6. A.T. Susanto, P. Prajitno, K. Kurnianto, Development of low-cost industrial X-ray computed tomography system based on digital fluoroscopy. J. Phys. Conf. Ser. 1825, 012033 (2021). doi: 10.1088/1742-6596/1825/1/012033
7. X.L. Ju, B. Deng, K. Li et al., Calibrating the linearity between grayscale and element content for X-ray KES imaging of alloys. Nucl. Sci. Tech. 33, 1 (2022). doi: 10.1007/s41365-022-00986-3
8. Y.Q. Yang, W.C. Fang, X.X. Huang et al., Static superconducting gantry-based proton CT combined with X-ray CT as prior image for FLASH proton therapy. Nucl. Sci. Tech. 34, 11 (2023). doi: 10.1007/s41365-022-01163-2
9. Y.J. Ma, Y. Ren, P. Feng et al., Sinogram denoising via attention residual dense convolutional neural network for low-dose computed tomography. Nucl. Sci. Tech. 32, 41 (2021). doi: 10.1007/s41365-021-00874-2
10. Z. He, N. Huang, P. Wang et al., Spatial resolution and image processing for pinhole camera-based X-ray fluorescence imaging: A simulation study. Nucl. Sci. Tech. 33, 64 (2022). doi: 10.1007/s41365-022-01036-8
11. D. Mery, Computer Vision for X-Ray Testing (Springer, 2015). doi: 10.1007/978-3-319-20747-6
12. G. Wang, L.F. Xu, J.L. Shen et al., Iterative and accurate determination of small angle X-ray scattering background. Nucl. Sci. Tech. 27, 105 (2016). doi: 10.1007/s41365-016-0108-4
13. E.M. Souza, S.C.A. Correa, A.X. Silva et al., Methodology for digital radiography simulation using the Monte Carlo code MCNPX for industrial applications. Appl. Radiat. Isot. 66, 587-592 (2008). doi: 10.1016/j.apradiso.2007.11.004
14. P. Liaparinos, I. Kandarakis, D.A. Cavouras et al., Modeling granular phosphor screens by Monte Carlo methods. Med. Phys. 33, 4502 (2006). doi: 10.1118/1.2372217
15. J.C. Wagner, D.E. Peplow, S.W. Mosher et al., Review of hybrid (deterministic/Monte Carlo) radiation transport methods, codes, and applications at Oak Ridge National Laboratory. Prog. Nucl. Sci. Technol. 2, 808-814 (2011). doi: 10.15669/PNST.2.808
16. H. Jin, D.C. Jespersen, P. Mehrotra et al., High performance computing using MPI and OpenMP on multi-core parallel systems. Parallel Comput. 37, 562-575 (2011). doi: 10.1016/j.parco.2011.02.002
17. J. Allison, K. Amako, J. Apostolakis et al., Recent developments in Geant4. Nucl. Instrum. Meth. Phys. Res. Sect. A 835, 186-225 (2016). doi: 10.1016/j.nima.2016.06.125
18. Y. Wang, J.G. Liang, Q. Zhang et al., Development and verification of Geant4-based parallel computing Monte Carlo simulations for nuclear logging applications. Ann. Nucl. Energy 172, 109079 (2022). doi: 10.1016/j.anucene.2022.109079
19. X.Y. Wang, J.G. Liang, Y.L. Li et al., Hybrid Monte Carlo methods for Geant4-based nuclear well logging implementation. Ann. Nucl. Energy 169, 108824 (2022). doi: 10.1016/j.anucene.2021.108824
20. L.L. Lin, Y. Wang, Q. Zhang et al., A Monte Carlo-based adaptive reduced order model for gamma density measurement. Ann. Nucl. Energy 178, 109341 (2022). doi: 10.1016/j.anucene.2022.109341
21. W. Tang, J.G. Liang, Y. Ge et al., A method for neutron-induced gamma spectra decomposition analysis based on Geant4 simulation. Nucl. Sci. Tech. 33, 154 (2022). doi: 10.1007/s41365-022-01144-5
22. M. Kazemi, H. Afarideh, Z. Riazi, Evaluation of Open MPI and MPICH2 performances for the computation time in proton therapy dose calculations with Geant4. J. Korean Phys. Soc. 67, 1686-1691 (2015). doi: 10.3938/JKPS.67.1686
23. M.D. Holbrook, D.P. Clark, C.T. Badea, Dual source hybrid spectral micro-CT using an energy-integrating and a photon-counting detector. Phys. Med. Biol. 65, 205012 (2020). doi: 10.1088/1361-6560/aba8b2
24. N. Gholami, M.M. Dehshibi, A.I. Adamatzky et al., A novel method for reconstructing CT images in GATE/GEANT4 with application in medical imaging: A complexity analysis approach. J. Inf. Process. 28, 161-168 (2020). doi: 10.2197/ipsjjip.28.161
25. B.I. Weiner, W.E. Craighead, The Corsini Encyclopedia of Psychology (2010). doi: 10.5860/choice.47-6008
26. R.F. Tate, G.W. Klett, Optimal confidence intervals for the variance of a normal distribution. J. Am. Stat. Assoc. 54, 674-682 (1959). doi: 10.1080/01621459.1959.10501528
27. S.F. O'Brien, L.Q. Yi, How do I interpret a confidence interval? Transfusion 56(7), 1680-1683 (2016). doi: 10.1111/trf.13635
28. G.M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Spring Joint Computer Conference, 483-485 (1967). doi: 10.1145/1465482.1465560
29. R. Schofield, L. King, U. Tayal et al., Image reconstruction: Part 1 - understanding filtered back projection, noise and image acquisition. J. Cardiovasc. Comput. Tomogr. 14, 219-225 (2020). doi: 10.1016/j.jcct.2019.04.008
30. J. Ma, Y. Song, Q.S. Wang et al., Ring artifact correction for X-ray computed tomography. High Power Laser and Particle Beams 26, 124001 (2014). doi: 10.3788/HPLPB20142612.124001 (in Chinese)
31. M.T. Hussani, M.H. Hayani, The use of filtered back projection algorithm for reconstruction of tomographic image. Al-Nahrain Journal for Engineering Sciences 17, 151-156 (2014). https://nahje.com/index.php/main/article/view/216
32. Y.H. Luo, Study of exponential filter in CT image reconstruction filtering inverse projection algorithm. Computer Science 41(z1), 220-223 (2014). https://ir.lzu.edu.cn/handle/262010/131252
33. M. Welvaert, Y. Rosseel, On the definition of signal-to-noise ratio and contrast-to-noise ratio for fMRI data. PLoS ONE 8, e77089 (2013). doi: 10.1371/journal.pone.0077089
34. J. Wu, H.L. Huang, Y. Qiu et al., Remote sensing image fusion based on average gradient of wavelet transform. IEEE International Conference on Mechatronics and Automation 4, 1817-1821 (2005). doi: 10.1109/ICMA.2005.1626836
35. D.Y. Tsai, Y. Lee, E. Matsuyama et al., Information entropy measure for evaluation of image quality. J. Digit. Imaging 21(3), 338-347 (2008). doi: 10.1007/s10278-007-9044-5
36. Q. Zhang, Q. Zhang, Y. Ge et al., GMAC: A Geant4-based Monte Carlo automated computational platform for developing nuclear tool digital twins. Appl. Radiat. Isot. 192, 110579 (2023). doi: 10.1016/j.apradiso.2022.110579
Footnote

The authors declare that they have no competing interests.