NUCLEAR ELECTRONICS AND INSTRUMENTATION

Parallel computing approach for efficient 3-D X-ray-simulated image reconstruction

Ou-Yi Li
Yang Wang
Qiong Zhang
Yong-Hui Li
Nuclear Science and Techniques, Vol. 34, No. 7, Article number 101. Published in print Jul 2023; available online 17 Jul 2023.

Accurate 3-Dimensional (3-D) reconstruction technology for non-destructive testing based on digital radiography (DR) is of great importance for alleviating the drawbacks of the existing computed tomography (CT)-based method. The commonly used Monte Carlo simulation method ensures high-quality imaging results for DR; however, its high time consumption limits its use for 3-D reconstruction. To solve this problem, this study proposes a parallel computing method to accelerate the Monte Carlo simulation of projection images, implemented through a parallel interface and a dedicated DR application. The images are utilized for the 3-D reconstruction of the test model. We verify the accuracy of parallel computing for DR and evaluate the performance of two parallel computing modes, the multithreaded mode (G4-MT) and the message-passing interface mode (G4-MPI), in terms of parallel speedup and efficiency. This study also explores the scalability of the hybrid G4-MPI and G4-MT mode. The results show that the two parallel computing modes significantly reduce the Monte Carlo simulation time: the parallel speedup grows approximately linearly, and the parallel efficiency is maintained at a high level. The hybrid mode has strong scalability, as the overall run time of the 180 simulations using 320 threads is 15.35 h with 10 billion particles emitted per simulation, and the parallel speedup reaches 151.36. The 3-D reconstruction of the model is achieved with the filtered back projection (FBP) algorithm using 180 projection images obtained with the hybrid G4-MPI and G4-MT mode. The quality of the reconstructed slice images is satisfactory, as they reflect the internal structure of the test model. The method is further applied to a complex model, and the quality of the reconstructed images is evaluated.

Keywords: Parallel computing; Monte Carlo; Digital radiography; 3-D reconstruction
1 Introduction

Non-destructive testing technology can obtain details about defects inside an object without damaging its original structure and properties, making it highly efficient for quality inspection. Among non-destructive methods, X-ray-based testing is a popular and effective approach for inspecting industrial products. In recent decades, digital radiography (DR) [1] and computed tomography (CT) [2] have played important roles in non-destructive testing techniques for industrial and medical examinations. As the energy and penetration ability of X-ray sources increase, DR and CT can detect high-density industrial components and display the location, orientation, shape, and size of defects in workpieces. Both provide highly accurate visualization solutions for inspecting the internal structures of objects. Because CT is typically used to produce images of each slice of a target object, it can reconstruct a complete 3-D structure and thereby accurately reflect the internal structure of the object [3]. In contrast, DR is limited to producing 2-Dimensional (2-D) images; therefore, its detection efficiency is lower for hidden and overlapping features [4]. However, the radiation dose of CT is high, and the equipment is large, inflexible, and expensive, whereas the radiation dose of DR is relatively low, the equipment is easy to operate with a shorter imaging time, and it is relatively cheap. Therefore, using DR to obtain 3-D reconstructed images of an object is of great importance. Chen et al. applied DR technology to analyze the 3-D position of defects in gas turbines, with a relative positioning error of less than 2% [5]. Susanto et al. applied a DR system, which is relatively cheaper than a CT system, to the 3-D reconstruction of an aluminum step wedge cylinder using the filtered back projection (FBP) method; the resulting image quality was satisfactory, as it could easily be interpreted by the naked eye [6]. Staub et al. developed an algorithm for computing digitally reconstructed radiographs (DRRs) that match real cone-beam CT without artificial adjustments, and the algorithm produced DRRs in approximately 0.35 s at full detector and CT resolution [6]. Recent studies [7-10] have further contributed to the development of X-ray imaging.

The DR system is primarily composed of an X-ray source tube, the test object, and a detector. After the tube emits X-ray particles, the particles enter the object and interact with the material through photoelectric absorption, Compton scattering, or Rayleigh scattering, such that some of the particles are absorbed or scattered. The attenuated particles are captured by the detector crystal and converted into current signals. Subsequently, the signals are transformed into grey images with different contrasts, referred to as projection images in this paper. The pixel value of the projection images is related to the intensity decay of the X-rays along their path. The deterministic X-ray intensity decay equation is given by [11]: $I = I_0 e^{-\mu x}$, (1) where $I$ is the intensity of the particles incident on the detector, $I_0$ is the intensity emitted by the tube, and $\mu$ and $x$ are the linear attenuation coefficient and the thickness of the medium interacting with the X-rays, respectively. For DR, applying Eq. (1) to an object with a simple structure is sufficient, but for a more complicated model this method has a severe error because of particle scattering and other effects [12]. To obtain images of good quality, the Monte Carlo simulation method, a calculation method based on probability and statistical theory, is an efficient choice. Souza et al. introduced a methodology for DR simulation based on Monte Carlo simulation; they compared simulated and experimental images of a steel pipe containing corrosion defects and observed good consistency between the two [13].
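To illustrate Eq. (1) numerically, a minimal Python sketch is given below; the photon count and attenuation coefficient are hypothetical example values chosen for illustration, not data used in this study.

import numpy as np

def transmitted_intensity(I0, mu, x):
    """Beer-Lambert attenuation of Eq. (1): I = I0 * exp(-mu * x)."""
    return I0 * np.exp(-mu * x)

# Hypothetical example: 1e6 incident photons, an assumed linear attenuation
# coefficient of 0.45 cm^-1, and a 4 cm aluminum path (the side of the test cube).
I0 = 1.0e6
mu = 0.45   # cm^-1, assumed value for illustration only
x = 4.0     # cm
print(transmitted_intensity(I0, mu, x))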

However, a limitation of Monte Carlo simulation is its significant time consumption, particularly for the 3-D reconstruction of objects [14,15], because reconstruction requires numerous projection images from different viewing angles, so the Monte Carlo simulation must be performed many times. With the rapid development of high-performance computing (HPC) in recent years, large-scale parallel computing has effectively solved time-consuming and computationally resource-intensive problems. Parallel computing on HPC platforms is usually based on both message passing interface (MPI) and shared-memory parallel programming (OpenMP) approaches for resource planning and memory sharing, to ensure scalability and accuracy [16]. The Monte Carlo simulation toolkit Geant4 provides two parallel computing modes for accelerating Monte Carlo simulation: a message-passing interface mode (G4-MPI) and a multithreaded mode (G4-MT) [17]. Wang et al. applied a Geant4-based parallel computing method to nuclear logging and evaluated the performance of the two modes; their results show that combining G4-MT and G4-MPI execution can significantly reduce the runtime of Monte Carlo simulations [18]. In addition, other studies [19-21] have demonstrated the strong capability of the Geant4 toolkit for X-ray imaging simulation.

This study proposes a parallel computing method based on two parallel computing modes, G4-MPI and G4-MT, for the simulation of DR to obtain projection images for 3-D reconstruction. First, the accuracy of parallel computing was validated. Second, the performances of the two modes were evaluated, followed by an evaluation of the scalability of the hybrid G4-MPI and G4-MT. After reconstructing the 2-D slicing images, the quality of the images was measured. Subsequently, a 3-D view image of the model was reconstructed. Finally, the method was applied to a more complex model for validation, and the quality of the reconstructed images was measured.

2 DR system based on Geant4

Geant4 is a Monte Carlo toolkit developed mainly at the European Organization for Nuclear Research (CERN) using C++ object-oriented technology. Owing to the realism of its physics modeling, it provides users with simulation results that do not differ significantly from real experimental results, making it easy to evaluate and modify simulation experiments. The composition of the DR system and the projection image at a projection angle of 0° are presented in Fig. 1, and the test model parameters are listed in Table 1. The test model was a large cube with two small cube defects and a cylindrical defect. The three defects were filled with air, whereas the rest of the model was made of aluminum. The two small cube defects were centrally symmetric about the center of the model. This symmetric, regular structure gives the reconstructed slice images a clear hierarchy for analyzing the reconstruction results.

Table 1
Parameters of the test model
  Large cube Cylindrical defect 1 Small cube defect 2 Small cube defect 3
Side length 4 cm - 1.5 cm 1.5 cm
Height - 4 cm - -
Radius - 0.3 cm - -
Material Aluminum Air Air Air
Fig. 1
(Color online) The DR system and its components, and the projection image at 0°: (a) digital radiography system; (b) flat-panel detector; (c) test model; (d) projection image at 0°

In the Geant4 simulation model, X-ray particles with an energy of 180 keV were emitted by the X-ray source in a cone beam. The direct flat-panel detector consisted of a CsI crystal, a-Si:H thin-film transistors (TFTs), and a glass substrate. The detector crystal converts X-rays into visible light according to the incident X-ray intensity, while the TFT array stores the corresponding signal to form a pixel matrix [11]. In this study, the size of the detector was 6 cm × 6 cm × 1.5 cm, and the TFT array had a pixel sampling interval of 200 μm. Therefore, the modeled detector pixel matrix was set to 300 × 300, and projection images of 300 columns × 300 rows could be obtained.

3 Implementation of parallel computing function based on Geant4

3.1 Basic principle of parallel computing

The primary concept of parallel computing is to divide a large task into subtasks distributed over multiple processors to speed up the calculation; the processors work in concert, executing their subtasks in parallel. MPI and OpenMP are primarily used in parallel computing on HPC to facilitate inter-node information delivery, resource allocation, and memory sharing.

Scalability, which refers to a system's ability to expand in response to future changes in requirements, is an important metric for evaluating parallel computing methods on HPC. Specifically, in this study, satisfactory scalability means that the parallel computing method is less time-consuming and ensures favorable efficiency when more projection images with more emitted particles are required. This study assessed the scalability and performance of the parallel computing modes in terms of parallel speedup and efficiency [21,22]. Speedup is defined as the ratio of the single-thread execution time to the execution time when one or more nodes with multiple threads are used. The parallel speedup, denoted $S_n$, can be expressed as $S_n = T_1 / T_n$, (2) where $T_1$ and $T_n$ are the run times using one and $n$ threads, respectively. The parallel efficiency, denoted $E_n$, is defined as the average utilization of the $n$ allocated threads and can be expressed as $E_n = S_n / n$, (3) where $n$ is the number of threads and $S_n$ is the speedup with $n$ threads.
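As a small worked example of Eqs. (2) and (3), the following Python sketch computes speedup and efficiency from run times; the numbers used correspond to the one-thread and 10 × 32-thread run times later reported in Table 4.

def speedup(t1: float, tn: float) -> float:
    """Parallel speedup S_n = T_1 / T_n, Eq. (2)."""
    return t1 / tn

def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency E_n = S_n / n, Eq. (3)."""
    return speedup(t1, tn) / n

# Run times from Table 4: 46,468 s on one thread and 307 s on 10 nodes x 32 threads.
t1, tn, n = 46468.0, 307.0, 320
print(f"S_n = {speedup(t1, tn):.2f}, E_n = {efficiency(t1, tn, n):.2%}")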

3.2 Implementation of parallel computing

Parallel computing techniques have been widely available since the release of Geant4 version 10.0. G4-MPI cross-node scaling and G4-MT cross-thread scaling are combined to realize hybrid parallel computing, supported by data collection methods such as ntuples, scorers, and histograms. A master node and several slave nodes are used to implement the computing process. Each slave node runs several threads that cooperate to produce preliminary results; once the results from each node are merged, they form the final output, which is sent to the master node. In this study, the final output is the energy deposition collected in histograms.

In this study, a parallel interface and a DR application were developed. The DR application comprises the DR system and the function-calling scripts that deploy the parallel interface. In addition to the geometric modeling of the X-ray tube, flat-panel detector, and test model, Monte Carlo simulation parameters such as particle definition, physical processes, and data collection were specified by the imaging system to accurately simulate X-ray interactions with the imaged object. The parallel interface connects the parallel computing functions to the DR system by transmitting data to the different nodes and threads. Relevant scripts were developed to construct the interface: a combination of nodes and threads is defined in the parallel input submission script, and the energy deposition is collected by the script that automatically merges the output data.

Figure 2 shows the application of Geant4-based parallel computing to the simulation of a DR system. The blue boxes represent the DR system and the parallel interface, and the yellow box displays the 3-D reconstruction process, all of which were developed by the methods described in this article. The red box represents the Geant4 kernel, and the purple box represents the Chengdu HPC platform. Script extracts for the parallel input submission, energy deposition results collection, grayscale transform, and 3-D reconstruction are shown below.

Fig. 2
Overview of Geant4-based parallel computing applied to DR. Red pointer line: the DR system parameters and the scripts deploying the parallel interface are set before the parallel computing. Pink pointer lines: the mapping relationship of the parallel computing modes between the Geant4 kernel and the HPC. Green pointer lines: the overall system operation flow, which starts from the parallel interface deployment scripts and ends at the reconstructed 3-D image generation. The numbers in green indicate the sequence of the main calculation and processing steps.

By employing the interface and application, the simulation of the DR was enabled by G4-MPI and G4-MT, which form the basis of the following sections.

Script extract 1 Parallel Input Submission
# mpirun.sh is used to submit the parallel input files.
# p: job partition, n: number of nodes, t: threads utilized in each node, TN: total threads
#SBATCH -p p
#SBATCH -N n
#SBATCH --ntasks-per-node=t
module load mpi/openmpi/4.0.2/gcc-7.3.1
module load apps/Geant4.10.06.p02/openmpi-4.0.2-gcc-7.3.1
mpirun -np TN ./parallel_computing run.mac
Script extract 2 Energy Deposition Results Collection
// Loop over the hit map of the current event and accumulate results per detector pixel.
for (itr = evtMap->GetMap()->begin(); itr != evtMap->GetMap()->end(); itr++)
{
  copyNb = itr->first;             // copy number (pixel index) of this entry
  fedep1 = *(itr->second) / keV;   // energy deposited in this pixel, in keV
  a[copyNb] = a[copyNb] + 1;       // number of contributions recorded for this pixel
  b[copyNb] = b[copyNb] + fedep1;  // accumulated energy deposition of this pixel
}
Script extract 3 Grayscale Transform and 3-D Reconstruction
# The two lines below are the core code for turning energy deposition into a projection image.
readbook = np.array(data_array1).reshape((300, 300))
readbook1 = (255 * (readbook - min1) / (max1 - min1)).astype(np.int16)
# The four lines below turn projection images into slicing images.
proj_fft = my_fft(R, width)
proj_filtered[:, i] = np.multiply(proj_fft[:, i], R_L_filter)
proj_ifft = (my_ifft(proj_filtered)).real
fbp[x, y] = fbp[x, y] + proj_ifft[t, i]
4 Parallel performance evaluation of G4-MPI and G4-MT

This study utilized the Chengdu HPC as the computational platform. In this HPC, each node is configured with a 32-core x86 processor with a main frequency of 2.5 GHz. The MPI interface adopts hpcx-2.4.1 and is compiled with gcc 7.3.1, which is compatible with the G4-MPI and G4-MT environments. To evaluate the capabilities of G4-MPI and G4-MT for DR, we considered three aspects: the accuracy of parallel computing for DR, the performance of G4-MPI and G4-MT, and the application of the hybrid G4-MPI and G4-MT mode.

4.1 Accuracy of parallel computing

This section verifies the accuracy of parallel computing by comparing projection images obtained using the parallel and non-parallel computing modes. In the parallel computing mode, ten nodes with full threads, which is one case of the hybrid G4-MPI and G4-MT mode, were used, whereas in the non-parallel computing mode, only one thread was used in the simulation. Ten billion particles with an energy of 180 keV were emitted to ensure the convergence of the Monte Carlo simulation. The peak signal-to-noise ratio (PSNR) and root mean square error (RMSE) [23,24] were utilized to measure the correctness of the parallel images with the non-parallel images as the benchmark. The RMSE is calculated as $\mathrm{RMSE}(I_p, I_{np}) = \sqrt{\frac{1}{m \times n}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(I_p(i,j) - I_{np}(i,j)\right)^2}$, (4) where $I_p$ and $I_{np}$ are the images obtained in the parallel and non-parallel computing modes, respectively, and $m \times n$ is the image size in pixels. The PSNR is calculated as $\mathrm{PSNR} = 10 \times \log_{10}\frac{(\mathrm{MAX}_I)^2}{(\mathrm{RMSE})^2}$, (5) where $\mathrm{MAX}_I$ is the maximum pixel value of the image.
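For illustration, the sketch below implements Eqs. (4) and (5) in Python; the two randomly generated test images are placeholders standing in for the parallel and non-parallel projection images.

import numpy as np

def rmse(img_p: np.ndarray, img_np: np.ndarray) -> float:
    """Root mean square error between two images, Eq. (4)."""
    return float(np.sqrt(np.mean((img_p.astype(float) - img_np.astype(float)) ** 2)))

def psnr(img_p: np.ndarray, img_np: np.ndarray) -> float:
    """Peak signal-to-noise ratio, Eq. (5), using the benchmark image's maximum pixel value."""
    return 10.0 * np.log10(float(img_np.max()) ** 2 / rmse(img_p, img_np) ** 2)

# Placeholder 300 x 300 grayscale images standing in for the two projection images.
rng = np.random.default_rng(0)
benchmark = rng.integers(0, 256, size=(300, 300))
parallel = benchmark + rng.normal(0, 2, size=(300, 300))
print(rmse(parallel, benchmark), psnr(parallel, benchmark))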

Two images from the two modes at the same projection angle are shown in Fig. 3; the two images are essentially identical. The RMSE and PSNR between them are 3.83 and 36.5, respectively, indicating a small grayscale difference between the two images. This demonstrates that the results of the Geant4-based parallel computing mode remain consistent with those of the non-parallel mode, providing the basis for the following sections.

Fig. 3
Projection images of two computing modes. (a) Projection image of parallel mode. (b) Projection image of non-parallel mode.
4.2 Performance evaluation of G4-MPI and G4-MT

Six test groups were designed using different combinations of nodes and threads for G4-MPI and G4-MT on the Chengdu HPC. To ensure the accuracy and repeatability of the simulation results, each test was repeated 30 times to obtain the mean value $\mu_i$ and standard deviation $\delta_i$ [25]: $\mu_i = \frac{1}{n}\sum_{j=1}^{30} x_i(j)$, (6) $\delta_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{30}\left(x_i(j) - \mu_i\right)^2}$, (7) where $x_i(j)$ represents the $j$th sample value (speedup or parallel efficiency) of test group $i$ and $n$ is the number of repetitions of each test. The confidence interval is calculated as in Eq. (8) [26]: $\mathrm{interval}(c_1, c_2) = \mu_i \pm Z_{\alpha/2} \times \delta_i / \sqrt{n}$, (8) where $\alpha$ is the significance level and $c_1$ and $c_2$ are the upper and lower bounds of the interval, respectively. The critical value $Z_{\alpha/2}$ is obtained from the normal distribution table, considering 99.7% as the confidence level for each test [27]. The detailed results are shown in Tables 2 and 3, and the parallel speedup and efficiency for the six tests are plotted in Fig. 4.
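The sketch below illustrates Eqs. (6)-(8) in Python on a hypothetical set of 30 repeated speedup samples; the sample values are invented for illustration, and Z_{α/2} ≈ 3 corresponds to the 99.7% confidence level used here.

import numpy as np

def confidence_interval(samples, z=3.0):
    """Mean, standard deviation, and confidence interval per Eqs. (6)-(8)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    mu = samples.mean()                   # Eq. (6)
    delta = samples.std(ddof=1)           # Eq. (7), n-1 in the denominator
    half_width = z * delta / np.sqrt(n)   # Eq. (8)
    return mu, delta, (mu - half_width, mu + half_width)

# Hypothetical 30 repeated speedup samples around 9.4 (illustration only).
rng = np.random.default_rng(1)
samples = rng.normal(9.4, 0.4, size=30)
print(confidence_interval(samples))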

Table 2
Parallel performance of G4-MPI
G4-MPI Speedup (Sn) Efficiency (En)
Combination of node and thread μi δi Interval (c1,c2) μi (%) δi Interval (c1,c2)
1×1 1 0.04 1±0.02 100 0.33 100±0.16
1×3 2.886 0.26 2.886±0.12 96.2 3.23 96.2±1.52
1×5 4.710 0.31 4.710±0.14 94.2 3.73 94.2±1.76
1×7 6.729 0.37 6.729±0.17 96.1 3.13 96.1±1.47
1×10 9.424 0.43 9.424±0.20 94.2 3.75 94.2±1.77
Table 3
Parallel performance of G4-MT
G4-MT Speedup (Sn) Efficiency (En)
Combination of node and thread μi δi Interval(c1,c2) μi (%) δi Interval(c1,c2)
1×1 1 0.04 1±0.02 100 0.33 100±0.16
3×1 2.886 0.26 2.886±0.12 96.4 3.23 96.4±1.51
5×1 4.895 0.31 4.710±0.15 97.9 3.73 97.9±1.18
7×1 6.831 0.37 6.729±0.17 97.6 3.13 97.6±1.29
10×1 9.194 0.43 9.424±0.20 91.9 3.75 91.9±1.75
Fig. 4
Parallel speedup and efficiency of G4-MPI and G4-MT. (a) Parallel speedup (Sn); (b) Parallel efficiency (En)

Figure 4 shows that the speedup increases linearly with the number of nodes or threads, whereas the parallel efficiency stabilizes at a relatively high value as the number of nodes or threads increases. Furthermore, according to Tables 2 and 3, at the 99.7% confidence level, the true values of the speedup and parallel efficiency of the six tests lie within the confidence intervals for both G4-MT and G4-MPI. This demonstrates that both G4-MT and G4-MPI offer good repeatability and efficiency in simulating the DR process. The speedup and efficiency of G4-MPI are marginally higher than those of G4-MT because G4-MPI adopts a shared memory mode, which improves computing by having each thread share information during the Monte Carlo calculations.

4.3 Scalability evaluation of hybrid G4-MPI and G4-MT for DR

In this section, ten tests of the DR simulation were designed to evaluate the scalability of the hybrid G4-MPI and G4-MT mode. G4-MT ran in full-thread mode on each computational node, with 32 threads per node, and the number of nodes ranged from 1 to 10 across the ten tests. Ten billion X-ray particles were emitted in each test. For the scalability evaluation, the runtime of one node using one thread was used as the benchmark time for the ten tests. The parallel computing results of the DR simulation are listed in Table 4, and the parallel speedup and efficiency of the ten tests are plotted in Fig. 5.

Table 4
Scalability test of hybrid G4-MPI and G4-MT for DR simulations
Combination of node and thread Run time (s) Parallel speedup (Sn) Parallel efficiency (En) (%)
1×1 46,468 1 100
1×32 1607 28.91 90.36
2×32 864 53.78 84.04
3×32 623 74.58 77.69
4×32 514 90.40 70.63
5×32 450 103.26 64.38
6×32 410 113.33 59.03
7×32 381 121.96 54.45
8×32 354 131.26 51.28
9×32 330 140.81 48.89
10×32 307 151.36 47.35
Fig. 5
(Color online) Scalability evaluation of hybrid G4-MPI and G4-MT. (a) Runtime and parallel speedup (Sn); (b) Parallel efficiency (En).

Table 4 and Fig. 5 demonstrate that as the number of nodes increases, the speedup deviates from linear growth and the parallel efficiency decreases within a certain range. According to Amdahl's law [28], which describes how the speedup theoretically increases when multiple processors are used in parallel computing, different nodes incur parallel overhead such as communication and waiting time; therefore, as the number of nodes increases, the parallel speedup deviates from the ideal value, which leads to a decrease in parallel efficiency. This is acceptable in this study because the speedup reaches up to 151.36 and the reduction in runtime is significant. For example, in the next section, 180 projection images are required for reconstruction. Based on the runtime required for one projection image using one thread, approximately 2,323.4 h would be needed; using the hybrid mode with full threads, the process costs only approximately 15.35 h in total. The strong scalability of the hybrid G4-MPI and G4-MT full-thread mode for DR simulation was thus proven.
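As a brief illustration of Amdahl's law, the sketch below evaluates the theoretical speedup S(N) = 1 / ((1 - p) + p/N) for an assumed parallel fraction p; the value of p is hypothetical and not a fitted parameter of this study.

def amdahl_speedup(p: float, n_proc: int) -> float:
    """Theoretical speedup from Amdahl's law for parallel fraction p on n_proc processors."""
    return 1.0 / ((1.0 - p) + p / n_proc)

# Hypothetical parallel fraction of 99.6%; the serial remainder plus communication
# overhead is what bends the measured speedup in Table 4 away from the ideal line.
for n_proc in (32, 64, 160, 320):
    print(n_proc, round(amdahl_speedup(0.996, n_proc), 2))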

Table 7
Metrics of three slicing images with Ram-Lak filter
Complex Model CNR AG IE
150th 4.13 7.2291 0.15260
300th 4.48 5.7916 1.33991
450th 2.93 8.6436 1.04886
5 3-D reconstruction of the model and quality analysis of the images

This section implements the DR simulation to obtain sufficient original projection images using hybrid G4-MPI and G4-MT parallel computing in full-thread mode and utilizes these projection images to reconstruct the test model in three dimensions and obtain information about the defects inside the model. The simulation was run 180 times, and each time the test model was rotated by 1°, as shown in Fig. 6; thus, 180 2-D projection grayscale images of 300 × 300 pixels were generated. During each simulation, the X-ray generator emitted 10 billion X-ray particles with an energy of 180 keV in the 10-node mode with 32 threads per node. The rotation of the test model was implemented automatically by a script developed in this study.

Fig. 6
(Color online) Rotational imaging schematic
5.1 3-D reconstruction using the filtered back projection algorithm

The FBP algorithm is widely used; it is a spatial processing technique based on Fourier transform theory [29]. One of its characteristics is that the projection at each acquisition angle is convolved with a specific filter before back-projection, such that the toroidal artifact caused by the point spread function is reduced. This study adopted the FBP algorithm to obtain 3-D reconstructed images of the model. The FBP process is described as follows.

Step 1: Obtain one 300 × 180 original projection matrix using the same rows of pixel information in all the original projection images.

Step 2: Pre-process the original projection matrix data for calibration.

Step 3: Conduct a one-dimensional Fourier transform for the result of Step 2.

Step 4: Perform convolution filtering on the result of Step 3.

Step 5: Conduct a one-dimensional inverse Fourier transform for the result of Step 4.

Step 6: Perform a direct inverse projection on the result of Step 5 for the 2-D reconstructed slicing image.

Step 7: Repeat Steps 1 to 6 300 times to produce 300 2-D slicing images.

Step 8: Utilize the 300 images to obtain 3-D reconstructed images.

Step 1 takes the Nth row of pixel information from each original projection image. N is initially equal to one and increases by one with each repetition; thus, 300 original 300 × 180 projection matrices are generated after 300 repetitions, yielding 300 reconstructed 2-D slicing images, each 300 × 300 pixels in size.

As the toroidal artifact is one of the main factors affecting the quality of the 3-D reconstructed images, this study adopted a projection-data pre-calibration method in Step 2 that combines polynomial fitting with a probability-statistics correction [30]. The range of candidate correction factors was determined by fitting the projection data column by column with polynomials, and the correction factors were then selected using the maximum-probability principle.

Among the eight steps above, the design of the filter in Step 4 has a significant impact on the final reconstruction results [31]. The selection of a filter is essentially the selection of a window function; to obtain better reconstructed image resolution, the inverse Fourier transform of the window function should have a high and narrow central lobe [32]. To obtain reconstructed images of better quality, this study used the three most commonly used filters, the Ram-Lak, Shepp-Logan, and Hamming filters, to perform convolution filtering on the result of the one-dimensional Fourier transform. The 2-D reconstructed slicing image under 0° rotation is consistent with that shown in Fig. 1.
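The sketch below outlines Steps 3-6 for a single slice in Python and shows how the three window functions can be built in the frequency domain; it is a minimal illustration under simple assumptions (nearest-pixel back-projection, a plain Hamming window on the ramp) rather than the exact implementation used in this study.

import numpy as np

def make_filter(width: int, kind: str = "ram-lak") -> np.ndarray:
    """Frequency-domain filter: ramp (Ram-Lak), optionally windowed."""
    freqs = np.fft.fftfreq(width)                  # cycles/pixel, FFT ordering
    ramp = np.abs(freqs)                           # Ram-Lak: |f|
    if kind == "shepp-logan":
        return ramp * np.sinc(freqs)               # ramp times a sinc window
    if kind == "hamming":
        return ramp * (0.54 + 0.46 * np.cos(2 * np.pi * freqs))
    return ramp

def fbp_slice(sinogram: np.ndarray, kind: str = "ram-lak") -> np.ndarray:
    """Reconstruct one slice from a (width x n_angles) sinogram by filtered back projection."""
    width, n_angles = sinogram.shape
    filt = make_filter(width, kind)
    # Steps 3-5: 1-D FFT along the detector axis, multiply by the filter, inverse FFT.
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=0) * filt[:, None], axis=0))
    # Step 6: back-project each filtered projection over the image grid.
    recon = np.zeros((width, width))
    center = width / 2.0
    xx, yy = np.meshgrid(np.arange(width) - center, np.arange(width) - center)
    for i in range(n_angles):
        theta = np.deg2rad(i * 180.0 / n_angles)
        t = np.round(xx * np.cos(theta) + yy * np.sin(theta) + center).astype(int)
        valid = (t >= 0) & (t < width)
        recon[valid] += filtered[t[valid], i]
    return recon * np.pi / (2 * n_angles)

For the data considered here, the input sinogram would be one of the 300 × 180 matrices assembled in Step 1, and the function would be called once per slice for each of the three filters.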

5.2 Quality evaluation of reconstructed images

To quantitatively measure the quality of the reconstructed images, three commonly used image quality assessment metrics were adopted: the contrast-to-noise ratio (CNR) [33], the average gradient (AG) [34], and the image entropy (IE) [35]. The CNR measures the difference between the mean intensities of the target region and the background relative to the noise and is an objective indicator of average image quality; a higher CNR value indicates better recognition of the target defects. The AG is sensitive to an image's ability to express small details in contrast; in a given direction of an image, the AG increases with greater changes in gray level. Thus, the AG can be used to measure the clarity of an image and reflects fine detail contrast and texture features. The IE reflects the amount of average information in the image and represents the aggregation characteristics of the grayscale distribution; the greater the entropy of an image, the richer the pixel grayscale content and the more uniform the grayscale distribution.

The quality assessment results for the reconstructed images of the model are shown in Table 5. Based on the spatial characteristics of the model, this study presents the results for the reconstructed images of the 100th, 150th, and 200th layers of the test model. The reconstructed slicing images of these three layers depict both the hollow defects and the solid aluminum. The quality of the reconstructed images varies considerably with the filter and the slicing layer.

Table 5
Metrics of reconstructed images
Test mode Filter CNR AG IE
100th Ram-Lak 8.65 0.038840 0.73650
  Shepp-Logan 8.22 0.041788 0.72003
  Hamming 2.86 0.013373 0.71352
150th Ram-Lak 9.03 0.031075 0.38491
  Shepp-Logan 8.11 0.033115 0.37352
  Hamming 2.35 0.013373 0.37053
200th Ram-Lak 8.34 0.038681 0.74198
  Shepp-Logan 7.86 0.043203 0.72891
  Hamming 2.75 0.013372 0.71265

First, the reconstructed slicing images obtained using the Ram-Lak filter had better quality than those obtained using the Shepp-Logan or Hamming filters. Second, there was a high degree of similarity between the metric values of the 100th and 200th layers because of the consistent physical structures of the two slices in the model, such as the density, material, and shape of the defect. Moreover, the CNRs of the 100th and 200th slices are smaller than that of the 150th slice, mainly because the defect area in the 150th slice is smaller than those in the 100th and 200th slices; this demonstrates that the 3-D reconstruction method based on parallel computing has good recognition of small defects. Furthermore, the AG and IE of the 100th and 200th slices were larger than those of the 150th slice because of the greater change in their grayscale values, so the images of the 100th and 200th slices show more detail in contrast. In general, the values of the three metrics meet the expectations for the designated model. Figure 7 shows the 2-D images reconstructed using the three aforementioned filters; the toroidal artifacts are clearly more prominent in the images reconstructed with the Shepp-Logan and Hamming filters than in those reconstructed with the Ram-Lak filter.

Fig. 7
Representative reconstructed slicing images. a1-a3: reconstructed images of the 100th slice; b1-b3: reconstructed images of the 150th slice; c1-c3: reconstructed images of the 200th slice. a1, b1, c1: Hamming filter; a2, b2, c2: Shepp-Logan filter; a3, b3, c3: Ram-Lak filter

The metrics are defined as follows: $\mathrm{CNR} = \frac{|\mu_i - \mu_o|}{\sqrt{\delta_i^2 + \delta_o^2}}$, where $\mu_i$ and $\mu_o$ are the average pixel values of the region of interest (ROI) and the background, respectively, and $\delta_i^2 = E\{(s_i - \mu_i)^2\}$ and $\delta_o^2 = E\{(s_o - \mu_o)^2\}$ are the variances of the ROI and the background, respectively. $\mathrm{AG} = \frac{1}{M \times N}\sum_{i=1}^{M}\sum_{j=1}^{N}\sqrt{\left(\frac{\partial f(i,j)}{\partial x}\right)^2 + \left(\frac{\partial f(i,j)}{\partial y}\right)^2}$, where $M \times N$ is the pixel size of the image and $\frac{\partial f(i,j)}{\partial x}$ and $\frac{\partial f(i,j)}{\partial y}$ are the differences of the pixel values in the $x$ and $y$ directions, respectively. $\mathrm{IE} = -\sum_{g=0}^{L-1} P_g \times \log_2 P_g$, where $L$ is the overall grayscale level of the image and $P_g$ is the probability of gray level $g$.
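A minimal Python sketch of the three metrics is given below; the slice image and the ROI and background masks are placeholders that a user would replace with a reconstructed slice and manually chosen regions.

import numpy as np

def cnr(img: np.ndarray, roi: np.ndarray, bg: np.ndarray) -> float:
    """Contrast-to-noise ratio between an ROI mask and a background mask."""
    mu_i, mu_o = img[roi].mean(), img[bg].mean()
    var_i, var_o = img[roi].var(), img[bg].var()
    return abs(mu_i - mu_o) / np.sqrt(var_i + var_o)

def average_gradient(img: np.ndarray) -> float:
    """Average gradient based on finite differences in the x and y directions."""
    gx = np.diff(img.astype(float), axis=1)[:-1, :]
    gy = np.diff(img.astype(float), axis=0)[:, :-1]
    return float(np.mean(np.sqrt(gx ** 2 + gy ** 2)))

def image_entropy(img: np.ndarray, levels: int = 256) -> float:
    """Shannon entropy of the grayscale histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Placeholder slice and masks (illustration only).
rng = np.random.default_rng(2)
slice_img = rng.integers(0, 256, size=(300, 300))
roi_mask = np.zeros_like(slice_img, dtype=bool); roi_mask[100:150, 100:150] = True
bg_mask = np.zeros_like(slice_img, dtype=bool); bg_mask[10:60, 10:60] = True
print(cnr(slice_img, roi_mask, bg_mask), average_gradient(slice_img), image_entropy(slice_img))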

To further discuss the quality of the 3-D-reconstructed images, we compared the defect situation of the model in the ideal case with that of the reconstructed slicing image sets. As the test model was accurately modeled using Geant4, this study utilized the parameter table as an inspection standard for the set of 300 reconstructed slicing images.

Among the 300 images were some blank images because the flat-panel detector was larger than the test model. In those regions, the X-ray particles reached the flat-panel detector without attenuation; consequently, after the data of the corresponding detector units were processed, the corresponding pixels appeared blank.

Excluding the 100 blank images, the model was divided into 200 slices: 148 slices contained both the small square and cylindrical defects, whereas 52 slices contained only the cylindrical defect. The ratio of layers containing the square defect to non-blank layers (r1) is 0.74:1, whereas the corresponding ratio (R1) from the parameters in Table 1 is 0.75:1. The ratio of the side length of the small square defect to the side length of the large square (r2) in Fig. 7 is 0.37489:1, whereas the corresponding ratio (R2) from Table 1 is 0.375:1. The ratio of the radius of the cylindrical defect to the side length of the large square is 0.0749:1 (r3), whereas the corresponding ratio from Table 1 is 0.075:1 (R3). The differences between the three ratios r1, r2, and r3 and their reference values are within the acceptable error range. Examination of the images of the remaining layers showed similar results (Table 6).

Table 6
Proportional compatibility of the test model
Ratio R1 r1 R2 r2 R3 r3
Value 0.75:1 0.7485:1 0.375:1 0.37489:1 0.075:1 0.0749:1

Figure 8 illustrates the grayscale value variation curves for the same row of pixels in two layers of the reconstructed slicing images. The two layers were selected from the parameter table such that their defect situations are theoretically identical, for example, two adjacent layers. Owing to the hollow defects inside the model, the reconstructed slice images exhibit obvious differences in grayscale value along the selected row of pixels. Figure 8 shows that the grayscale value varies within [0,1] where defects appear in the image, and the curve of each reconstructed slicing image coincides closely with that of the adjacent layer. The RMSE values of the three sets of pixel data are 0.000378, 0.000230, and 0.000496, indicating an excellent match within each set.

Fig. 8
Grayscale value variation curves. (a) 200th row pixel position; (b) 150th row pixel position; (c) 100th row pixel position

To obtain the complete 3-D reconstruction view shown in Fig. 9, a 3-D model was formed by stacking the reconstructed 2-D slice images in the correct order in 3-D space. However, inconsistencies between the images can blur the final 3-D model; to address this, a smoothing filter was applied to obtain a closely restored version of the original model. The reconstructed surface geometry has almost no holes, indicating a high level of accuracy in the 3-D reconstruction process. The reconstructed model reproduces the defect features and external shape modeled in Geant4 very well. The disadvantage of this method is that some irregularities exist on the surface, which may be caused by incomplete elimination of the toroidal artifacts.
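A minimal sketch of this stacking-and-smoothing step is shown below; the use of a Gaussian filter and its width are illustrative assumptions rather than the exact post-processing applied in this study.

import numpy as np
from scipy.ndimage import gaussian_filter

def build_volume(slices: list, sigma: float = 1.0) -> np.ndarray:
    """Stack 2-D slice images into a 3-D volume and apply a smoothing filter."""
    volume = np.stack(slices, axis=0)            # shape: (n_slices, rows, cols)
    return gaussian_filter(volume, sigma=sigma)  # suppress slice-to-slice inconsistencies

# Placeholder slices standing in for the reconstructed 300 x 300 images.
rng = np.random.default_rng(3)
slices = [rng.random((300, 300)) for _ in range(10)]
volume = build_volume(slices, sigma=1.0)
print(volume.shape)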

Fig. 9
Reconstructed image of 3-D view of the test model
6 Application on a complex model

The complex model is a plug used in a rock-layer probe; it is made of aluminum and was exported to Geant4 using the in-house software GMAC [36]. First, the complex model was imported from computer-aided design (CAD) software into GMAC, and its materials were defined and assigned in GMAC. Next, its center coordinates were set at the center of the Geant4 world coordinate system. The model was then converted into a high-fidelity model and imported into the Geant4 simulation system shown in Fig. 10. The location of the X-ray source, the material of the flat-panel detector, and the manner in which the X-ray particles were emitted and collected were consistent with the model above, but the size of the detector was 30 cm × 30 cm. To obtain projection images from 180 angles, the imaging simulation was performed 180 times; each time, the model was rotated by 1°, and the simulation was run with hybrid G4-MT and G4-MPI using 320 threads. During each simulation, 10 billion X-ray particles with an energy of 150 keV were emitted, and the resulting projection image had a resolution of 600 × 600 pixels. The parallel simulation greatly shortened the runtime: the average run time of each simulation was 313 s, whereas the run time in the non-parallel computing mode was 46,073 s, giving a parallel speedup of up to 147.2 and a parallel efficiency of approximately 46%. The 180 original projection images were utilized to generate 2-D reconstructed images using the FBP algorithm with the Ram-Lak filter. The model in CAD is shown in Fig. 11(a), and the 2-D reconstructed images of the 150th, 300th, and 450th slices are shown in Fig. 11(b), (c), and (d), respectively; these three slices are representative because they divide the model into four equal parts. From Fig. 11, the visual characteristics of the three images are consistent with the corresponding slices of the complex model; for example, the 150th slice in Fig. 11(b) is composed of two separate ellipse-like shanks, as shown in Fig. 11(a). The three metrics, CNR, AG, and IE, are listed in Table 7, and their values indicate that the reconstructed images are of good quality. A 3-D reconstructed view of the model is displayed in Fig. 12, which restores many details of the model.

Fig. 10
Geant4 simulation system for the complex model: (a) digital radiography system; (b) flat-panel detector; (c) complex model; (d) projection image at 0°
Fig. 11
(a) Shape of the model in CAD, indicating the layers from which the three slices are taken; (b), (c), (d) 2-D reconstructed slicing images of the 150th, 300th, and 450th layers
Fig. 12
3-D reconstructed view of the complex model
7 Conclusion

This study proposes a parallel computing approach to simulation for the 3-D reconstruction of a test model, achieved by developing a parallel interface and a DR application. The interface delivers information between the Geant4-based parallel computing and the DR simulation system, thereby optimizing the use of computational resources; the application includes the DR simulation system and the function-calling scripts. The performances of the two parallel computing modes, G4-MPI and G4-MT, were compared by running the DR simulation process with various combinations of threads and nodes. The results demonstrate that the speedup increases approximately linearly as the number of threads increases, whereas the parallel efficiency is maintained above 94% for both modes. The hybrid G4-MPI and G4-MT mode has strong scalability and can significantly accelerate the Monte Carlo simulation, with a speedup of up to 151.36. The quality of the reconstructed images is good, and these images accurately reflect detailed information about the defects inside the model. Future work will focus on transforming complex parts into Geant4 models for imaging simulation by integrating computer-aided design (CAD) geometry conversion with prior information on the object geometry in the X-ray interaction process. We also plan to use a pre-calibrated deep-learning reconstruction algorithm for sparse-view projection images to eliminate the circular artifacts from the 2-D reconstructed slice images and obtain the best possible 3-D reconstruction results.

References
1. P.F. Stelt, Better imaging: the advantages of digital radiography. J. Am. Dent. Assoc. 139, S7-S13 (2008). doi: 10.14219/jada.archive.2008.0357
2. J. Kruth, M. Bartscher, S. Carmignato et al., Computed tomography for dimensional metrology. CIRP Ann. Manuf. Technol. 60(2), 821-842 (2011). doi: 10.1016/j.cirp.2011.05.006
3. C. Zhang, X. Pan, H. Shang et al., Improvements to conventional X-ray tube-based cone-beam computed tomography system. Nucl. Sci. Tech. 29, 43 (2018). doi: 10.1007/s41365-018-0370-8
4. X. Lu, X. Wang, D. Li et al., Comparative study of DR and CT in the application of close contacts screening for tuberculosis outbreaks. Radiol. Infect. Dis. 3(1), 34-39 (2016). doi: 10.1016/j.jrid.2016.01.004
5. L. Chen, B. Li, L. Zhang et al., 3D positioning of defects for gas turbine blades based on digital radiographic projective imaging. NDT & E Int. 133, 102751 (2022). doi: 10.1016/j.ndteint.2022.102751
6. A.T. Susanto, P. Prajitno, K. Kurnianto, Development of low-cost industrial X-ray computed tomography system based on digital fluoroscopy. J. Phys. Conf. Ser. 1825, 012033 (2021). doi: 10.1088/1742-6596/1825/1/012033
7. X.L. Ju, B. Deng, K. Li et al., Calibrating the linearity between grayscale and element content for X-ray KES imaging of alloys. Nucl. Sci. Tech. 33, 1 (2022). doi: 10.1007/s41365-022-00986-3
8. Y.Q. Yang, W.C. Fang, X.X. Huang et al., Static superconducting gantry-based proton CT combined with X-ray CT as prior image for FLASH proton therapy. Nucl. Sci. Tech. 34, 11 (2023). doi: 10.1007/s41365-022-01163-2
9. Y.J. Ma, Y. Ren, P. Feng et al., Sinogram denoising via attention residual dense convolutional neural network for low-dose computed tomography. Nucl. Sci. Tech. 32, 41 (2021). doi: 10.1007/s41365-021-00874-2
10. Z. He, N. Huang, P. Wang et al., Spatial resolution and image processing for pinhole camera-based X-ray fluorescence imaging: A simulation study. Nucl. Sci. Tech. 33, 64 (2022). doi: 10.1007/s41365-022-01036-8
11. D. Mery, Computer Vision for X-Ray Testing (Springer, 2015). doi: 10.1007/978-3-319-20747-6
12. G. Wang, L.F. Xu, J.L. Shen et al., Iterative and accurate determination of small angle X-ray scattering background. Nucl. Sci. Tech. 27, 105 (2016). doi: 10.1007/s41365-016-0108-4
13. E.M. Souza, S.C.A. Correa, A.X. Silva et al., Methodology for digital radiography simulation using the Monte Carlo code MCNPX for industrial applications. Appl. Radiat. Isot. 66, 587-592 (2008). doi: 10.1016/j.apradiso.2007.11.004
14. P. Liaparinos, I. Kandarakis, D.A. Cavouras et al., Modeling granular phosphor screens by Monte Carlo methods. Med. Phys. 33, 4502 (2006). doi: 10.1118/1.2372217
15. J.C. Wagner, D.E. Peplow, S.W. Mosher et al., Review of hybrid (deterministic/Monte Carlo) radiation transport methods, codes, and applications at Oak Ridge National Laboratory. Prog. Nucl. Sci. Technol. 2, 808-814 (2011). doi: 10.15669/PNST.2.808
16. H. Jin, D.C. Jespersen, P. Mehrotra et al., High performance computing using MPI and OpenMP on multi-core parallel systems. Parallel Comput. 37, 562-575 (2011). doi: 10.1016/j.parco.2011.02.002
17. J. Allison, K. Amako, J. Apostolakis et al., Recent developments in Geant4. Nucl. Instrum. Meth. Phys. Res. Sect. A 835, 186-225 (2016). doi: 10.1016/j.nima.2016.06.125
18. Y. Wang, J.G. Liang, Q. Zhang et al., Development and verification of Geant4-based parallel computing Monte Carlo simulations for nuclear logging applications. Ann. Nucl. Energy 172, 109079 (2022). doi: 10.1016/j.anucene.2022.109079
19. X.Y. Wang, J.G. Liang, Y.L. Li et al., Hybrid Monte Carlo methods for Geant4-based nuclear well logging implementation. Ann. Nucl. Energy 169, 108824 (2022). doi: 10.1016/j.anucene.2021.108824
20. L.L. Lin, Y. Wang, Q. Zhang et al., A Monte Carlo-based adaptive reduced order model for gamma density measurement. Ann. Nucl. Energy 178, 109341 (2022). doi: 10.1016/j.anucene.2022.109341
21. W. Tang, J.G. Liang, Y. Ge et al., A method for neutron-induced gamma spectra decomposition analysis based on Geant4 simulation. Nucl. Sci. Tech. 33, 154 (2022). doi: 10.1007/s41365-022-01144-5
22. M. Kazemi, H. Afarideh, Z. Riazi, Evaluation of Open MPI and MPICH2 performances for the computation time in proton therapy dose calculations with Geant4. J. Korean Phys. Soc. 67, 1686-1691 (2015). doi: 10.3938/JKPS.67.1686
23. M.D. Holbrook, D.P. Clark, C.T. Badea, Dual source hybrid spectral micro-CT using an energy-integrating and a photon-counting detector. Phys. Med. Biol. 65, 205012 (2020). doi: 10.1088/1361-6560/aba8b2
24. N. Gholami, M.M. Dehshibi, A.I. Adamatzky et al., A novel method for reconstructing CT images in GATE/GEANT4 with application in medical imaging: A complexity analysis approach. J. Inf. Process. 28, 161-168 (2020). doi: 10.2197/ipsjjip.28.161
25. B.I. Weiner, W.E. Craighead, The Corsini Encyclopedia of Psychology (2010). doi: 10.5860/choice.47-6008
26. R.F. Tate, G.W. Klett, Optimal confidence intervals for the variance of a normal distribution. J. Am. Stat. Assoc. 54, 674-682 (1959). doi: 10.1080/01621459.1959.10501528
27. S.F. O'Brien, L.Q. Yi, How do I interpret a confidence interval? Transfusion 56(7), 1680-1683 (2016). doi: 10.1111/trf.13635
28. G.M. Amdahl, Validity of the single processor approach to achieving large scale computing capabilities. AFIPS Spring Joint Computer Conference, 483-485 (1967). doi: 10.1145/1465482.1465560
29. R. Schofield, L. King, U. Tayal et al., Image reconstruction: Part 1 - understanding filtered back projection, noise and image acquisition. J. Cardiovasc. Comput. Tomogr. 14, 219-225 (2020). doi: 10.1016/j.jcct.2019.04.008
30. J. Ma, Y. Song, Q.S. Wang et al., Ring artifact correction for X-ray computed tomography. High Power Laser and Particle Beams 26, 124001 (2014). doi: 10.3788/HPLPB20142612.124001 (in Chinese)
31. M.T. Hussani, M.H. Hayani, The use of filtered back projection algorithm for reconstruction of tomographic image. Al-Nahrain Journal for Engineering Sciences 17, 151-156 (2014). https://nahje.com/index.php/main/article/view/216
32. Y.H. Luo, Study of exponential filter in CT image reconstruction filtering inverse projection algorithm. Computer Science 41(z1), 220-223 (2014). https://ir.lzu.edu.cn/handle/262010/131252
33. M. Welvaert, Y. Rosseel, On the definition of signal-to-noise ratio and contrast-to-noise ratio for fMRI data. PLoS ONE 8, e77089 (2013). doi: 10.1371/journal.pone.0077089
34. J. Wu, H.L. Huang, Y. Qiu et al., Remote sensing image fusion based on average gradient of wavelet transform. IEEE International Conference on Mechatronics and Automation 4, 1817-1821 (2005). doi: 10.1109/ICMA.2005.1626836
35. D.Y. Tsai, Y. Lee, E. Matsuyama et al., Information entropy measure for evaluation of image quality. J. Digit. Imaging 21(3), 338-347 (2008). doi: 10.1007/s10278-007-9044-5
36. Q. Zhang, Q. Zhang, Y. Ge et al., GMAC: A Geant4-based Monte Carlo automated computational platform for developing nuclear tool digital twins. Appl. Radiat. Isot. 192, 110579 (2023). doi: 10.1016/j.apradiso.2022.110579
Footnote

The authors declare that they have no competing interests.