
A multithreaded parallel upwind sweep algorithm for the SN transport equations discretized with discontinuous finite elements

NUCLEAR ENERGY SCIENCE AND ENGINEERING


Zhi-Wei Zong
Mao-Song Cheng
Ying-Chi Yu
Zhi-Min Dai
Nuclear Science and Techniques, Vol. 34, No. 12, Article number 200. Published in print: Dec 2023. Available online: 11 Dec 2023.

The complex structure and strong heterogeneity of advanced nuclear reactor systems pose challenges for high-fidelity neutron-shielding calculations. Unstructured meshes exhibit strong geometric adaptability and can overcome the deficiencies of conventional structured meshes in complex geometry modeling. A multithreaded parallel upwind sweep algorithm for SN transport was proposed to achieve a more accurate geometric description and improve the computational efficiency. The spatial variables were discretized using the standard discontinuous Galerkin finite-element method (DGFEM). The angular flux transmission between neighboring meshes was handled using an upwind scheme. In addition, a combination of a mesh transport sweep and angular iterations was realized using a multithreaded parallel technique. The algorithm was implemented in the 2D/3D SN transport code ThorSNIPE, and numerical evaluations were conducted using three typical benchmark problems: IAEA, Kobayashi-3i, and VENUS-3. The numerical results indicate that the multithreaded parallel upwind sweep algorithm can achieve high computational efficiency. ThorSNIPE, with the multithreaded parallel upwind sweep algorithm, has good reliability, stability, and high efficiency, making it suitable for complex shielding calculations.

Keywords: Shielding calculation; Discrete ordinates method; Discontinuous Galerkin finite element method; Unstructured meshes
1 Introduction

High-fidelity neutron-shielding calculations are an essential part of advanced nuclear reactor system design. The discrete ordinates (SN) method is one of the most popular deterministic particle transport methods owing to its high angular resolution [1]. Reliable modeling and simulation of SN transport depend on two essential components: a sound spatial discretization scheme and a mesh distribution with sufficient modeling fidelity. These two factors interact with and support each other throughout the calculation.

Conventional Cartesian structured grids are widely used because of their regular arrangement and their suitability for finite-difference (FD) spatial discretization schemes [2-4]. However, there is no general, robust meshing algorithm for acquiring exact descriptions of complex structures for realistic modeling and simulation [5]. Adaptive mesh refinement techniques retain the original mesh distribution features and mark cells with large errors for refinement [6]; local error estimation techniques and geometric data structures play significant roles in determining their stability and computational efficiency. Unstructured grids use piecewise flat or curved surfaces to represent complicated geometries with the highest possible flexibility. With the development of the finite element method (FEM), unstructured grids have been widely applied to reactor physics analyses [7], multiphysics simulations [8], and thermal-fluid dynamics [9]. Combining finite element numerical methods and the SN method on unstructured meshes has therefore become a trend in nuclear shielding design.

Unfortunately, the first-order SN equation cannot be solved stably using the continuous FEM [10]: although the solutions of this hyperbolic equation are smooth along the streamlines, they may be discontinuous perpendicular to the chosen direction. To obtain a robust solution, several Petrov-Galerkin (PG) approaches have been proposed that augment the test function with the trial functions and introduce an additional stabilization term [11-13]. However, the free stabilization parameter affects the convergence of the SN equation and introduces uncertainty because it is usually chosen from experience. Additionally, PG methods generally cannot produce symmetric positive-definite coefficient matrices or inherit the best-approximation property when the transport equation contains non-self-adjoint operators.

The discontinuous Galerkin finite element method (DGFEM) is a different category of stability improvement, pioneered by Reed and Hill for neutron transport problems in the early 1970s [14]. The DGFEM connects adjacent elements through the incoming interface fluxes from the upwind (upstream) meshes and applies the usual FEM procedure to a single mesh. However, the number of unknowns in the DGFEM is much larger than that in continuous FEMs, which places demands on the matrix assembly and solution method. The standard matrix assembly procedure requires a reliable iterative solver for the banded sparse system matrix and must handle the storage overhead brought on by the increased number of degrees of freedom [15]. The matrix-free strategy stores only the diagonal blocks and realizes the inversion of the transport operator by means of matrix-vector products; however, a sum-factorization technique is then required to translate between vector entries and the values or gradients at the quadrature points [16]. Another commonly used technique is the conventional sweeping procedure, as in FD schemes, in which the sweeping procedure is parallelized to reduce the computing time, which becomes important at large SN orders.

In terms of parallelization, complex grid connectivity and irregular element sweep orders pose challenges to the algorithms. In 2000, Plimpton described an asynchronous, parallel message-passing algorithm that simultaneously performs sweeps from many directions across unstructured grids [17]. In 2002, Pautz used a low-complexity list-ordering heuristic to determine the sweep ordering on a partitioned mesh [18]. The essence of these two algorithms is the conversion of the abstract element sweep order into a directed acyclic graph (DAG). Since 2010, several research institutions have conducted extensive studies on high-performance architectural applications [19, 20].

A multigroup 2D SN transport code, ThorSNIPE, with unstructured meshes was previously developed and validated [21]. In this work, the code was extended to 3D SN transport calculations and equipped with a multithreaded parallel upwind sweep algorithm to achieve a more accurate geometric description and improve the reliability and efficiency of transport calculations in complex geometries. The finite element machinery was implemented using the open-source finite element library deal.II-9.4.0 [22]. deal.II is an open-source C++ library targeted at the computational solution of partial differential equations using adaptive finite elements, with a focus on generality, dimension-independent programming, parallelism, and extensibility. It includes many state-of-the-art programming techniques for solving partial differential equations and linear algebra problems, together with modern computer-science strategies, and offers users a modern interface for complex data structures and algorithms. With the assistance of this library, ThorSNIPE can be made dimension-independent and can support different mesh sizes and types of finite elements.

The remainder of this paper is organized as follows. In Sect. 2, the main theory and methodology of the ThorSNIPE code are described in detail, including the SN weak-form equation based on DGFEM theory and the multithreaded parallel upwind sweep algorithm. The numerical results are presented and discussed in Sect. 3. The conclusions are summarized in Sect. 4.

2 Theory and methodology

2.1 Weak form of the discrete ordinates method equations

The SN form is obtained by solving the Boltzmann transport equation along discrete directions and replacing the integrals over the unit sphere with quadrature sets. The steady-state multigroup SN neutron transport equation can be written as
$$\boldsymbol{\Omega}_m \cdot \nabla \psi_m^g(\mathbf{r}) + \Sigma_t^g(\mathbf{r})\,\psi_m^g(\mathbf{r}) = Q_m^g(\mathbf{r}), \quad g=1,\dots,G;\; m=1,\dots,M;\; \mathbf{r}\in D, \tag{1}$$
where D is the spatial domain, $\psi_m^g(\mathbf{r})$ is the angular flux in group g, direction $\boldsymbol{\Omega}_m$, and position $\mathbf{r}$, $\Sigma_t^g$ is the macroscopic total cross-section in group g, and $Q_m^g(\mathbf{r})$ is the source term in group g, direction $\boldsymbol{\Omega}_m$, and position $\mathbf{r}$. For fixed-source problems, the source term can be expressed as
$$Q_m^g(\mathbf{r}) = \sum_{g'=1}^{G}\sum_{n=0}^{N}\frac{2n+1}{4\pi}\,\Sigma_{s,n}^{g'\to g}(\mathbf{r})\sum_{k=-n}^{n}\phi_n^{k,g'}(\mathbf{r})\,Y_n^k(\boldsymbol{\Omega}_m) + q^g(\mathbf{r}), \tag{2}$$
and for eigenvalue problems as
$$Q_m^g(\mathbf{r}) = \sum_{g'=1}^{G}\sum_{n=0}^{N}\frac{2n+1}{4\pi}\,\Sigma_{s,n}^{g'\to g}(\mathbf{r})\sum_{k=-n}^{n}\phi_n^{k,g'}(\mathbf{r})\,Y_n^k(\boldsymbol{\Omega}_m) + \frac{\chi^g}{4\pi k_{\mathrm{eff}}}\sum_{g'=1}^{G}\nu^{g'}(\mathbf{r})\,\Sigma_f^{g'}(\mathbf{r})\,\phi^{g'}(\mathbf{r}), \tag{3}$$
$$\phi^g(\mathbf{r}) = \sum_{m=1}^{M}\omega_m\,\psi_m^g(\mathbf{r}), \tag{4}$$
where G and M are the total numbers of groups and angles, $\Sigma_{s,n}^{g'\to g}(\mathbf{r})$ is the nth-order Legendre moment of the scattering cross-section from group g' to group g at position $\mathbf{r}$, $\Sigma_f^{g'}(\mathbf{r})$ is the macroscopic fission cross-section in group g', $\nu^{g'}(\mathbf{r})$ is the average number of neutrons produced per fission in group g', $\chi^g$ is the fission spectrum in group g, $k_{\mathrm{eff}}$ is the eigenvalue, $\phi_n^{k}(\mathbf{r})$ is the kth, nth-order flux moment, $Y_n^k(\boldsymbol{\Omega}_m)$ is the kth, nth-order spherical harmonic in direction $\boldsymbol{\Omega}_m$, $q^g(\mathbf{r})$ is the fixed source in group g, $\phi^g(\mathbf{r})$ is the scalar flux in group g, and $\omega_m$ is the weight factor of the quadrature set.
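As a concrete special case (assuming the spherical-harmonics normalization with $Y_0^0 = 1$, as is common in SN codes; this reduction is given here only for illustration), a one-group fixed-source problem with isotropic scattering ($N=0$) reduces Eqs. (2) and (4) to
$$Q_m(\mathbf{r}) = \frac{1}{4\pi}\,\Sigma_{s,0}(\mathbf{r})\,\phi(\mathbf{r}) + q(\mathbf{r}), \qquad \phi(\mathbf{r}) = \sum_{m=1}^{M}\omega_m\,\psi_m(\mathbf{r}),$$
so each source iteration only needs the scalar flux produced by the previous sweep.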

Eq. (1) is a linear hyperbolic equation, and this "strong form" integro-differential equation must be transformed into a "weak form" for the solution. The spatial domain D is decomposed into K elements, which can be unstructured because the FEM is independent of the space dimension and the grid choice. Let $\partial V_k$ denote the surface and $V_k$ the volume of the kth element, $k=1,\dots,K$. To obtain the weak form, Eq. (1) is multiplied by an arbitrary test function $\psi_m^*(\mathbf{r})$ and integrated over an element k. In monoenergetic form, Eq. (1) can be rewritten as
$$\int_{V_k}\boldsymbol{\Omega}_m\cdot\nabla\psi_m(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\mathrm{d}V + \int_{V_k}\Sigma_t(\mathbf{r})\,\psi_m(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\mathrm{d}V = \int_{V_k}Q_m(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\mathrm{d}V. \tag{5}$$

The divergence theorem is then applied to Eq. (5), yielding
$$\oint_{\partial V_k}\boldsymbol{\Omega}_m\cdot\mathbf{n}(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\psi_m(\mathbf{r})\,\mathrm{d}S - \int_{V_k}\psi_m(\mathbf{r})\,\boldsymbol{\Omega}_m\cdot\nabla\psi_m^*(\mathbf{r})\,\mathrm{d}V + \int_{V_k}\Sigma_t(\mathbf{r})\,\psi_m(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\mathrm{d}V = \int_{V_k}Q_m(\mathbf{r})\,\psi_m^*(\mathbf{r})\,\mathrm{d}V, \tag{6}$$
where $\mathbf{n}(\mathbf{r})$ is the outward unit normal vector to the surface of cell $V_k$ at position $\mathbf{r}$. Inner products on the volume and surface of an element are introduced to express the bilinear form [23]:
$$\langle a,b\rangle_{V_k} = \int_{V_k} a(\mathbf{r})\,b(\mathbf{r})\,\mathrm{d}V, \tag{7}$$
$$(a,b)_{\partial V_k} = \int_{\partial V_k} \lvert\boldsymbol{\Omega}_m\cdot\mathbf{n}(\mathbf{r})\rvert\,a(\mathbf{r})\,b(\mathbf{r})\,\mathrm{d}S. \tag{8}$$

Substituting Eqs. (7) and (8) into Eq. (6) gives
$$(\psi_m^*,\psi_m)_{\partial V_k} - \langle\psi_m,\boldsymbol{\Omega}_m\cdot\nabla\psi_m^*\rangle_{V_k} + \langle\Sigma_t\psi_m,\psi_m^*\rangle_{V_k} = \langle Q_m,\psi_m^*\rangle_{V_k}. \tag{9}$$

The element boundary can be divided into three parts according to its relative position: interior faces, the outflow domain boundary $\partial V_k^+ = \{\mathbf{r}\in\partial V_k,\ \boldsymbol{\Omega}_m\cdot\mathbf{n}(\mathbf{r})>0\}$, and the inflow domain boundary $\partial V_k^- = \{\mathbf{r}\in\partial V_k,\ \boldsymbol{\Omega}_m\cdot\mathbf{n}(\mathbf{r})<0\}$. The inflow domain-boundary conditions are prescribed as either vacuum or reflective for the incoming directions. Eq. (9) can therefore be rewritten as
$$(\psi_m^*,\psi_m^+)_{\partial V_k^+} - \langle\psi_m,\boldsymbol{\Omega}_m\cdot\nabla\psi_m^*\rangle_{V_k} + \langle\Sigma_t\psi_m,\psi_m^*\rangle_{V_k} = \langle Q_m,\psi_m^*\rangle_{V_k} + (\psi_m^*,\psi_m^-)_{\partial V_k^-} + (\psi_m^*,\psi_m^{\mathrm{in}})_{\partial V_k^{\mathrm{in}}}, \tag{10}$$
where $\psi_m^+$, $\psi_m^-$, and $\psi_m^{\mathrm{in}}$ are the outflow, inflow domain-boundary, and interior-face angular fluxes, respectively. Eq. (10) is the weak form of the SN equation.

2.2 The discontinuous Galerkin finite element discretization scheme

The Galerkin method is theoretically based on the weighted residual method, in which the original partial differential equation is multiplied by a test function. A numerical approximation of the angular flux is sought using Lagrange polynomials of degree P,
$$\psi_m = \sum_{j=1}^{J_P}\varphi_{m,j}\,b_j(\mathbf{r}), \tag{11}$$
where $b_j(\mathbf{r})$ is the Lagrange shape function and $\varphi_{m,j}$ is an unknown coefficient. A matrix form is convenient for solving Eq. (10); expressing $\varphi_{m,j}$ and $b_j(\mathbf{r})$ in row-vector notation, Eq. (11) can be written as
$$\psi_m = [\varphi_{m,1},\dots,\varphi_{m,J_P}]\,[b_1,\dots,b_{J_P}]^{\mathrm{T}} = \boldsymbol{\varphi}_m\,\mathbf{b}^{\mathrm{T}}. \tag{12}$$

In Galerkin discretization schemes, the test functions are taken from the same function space as the basis functions,
$$\psi_m^* = [b_1,\dots,b_{J_P}]^{\mathrm{T}} = \mathbf{b}. \tag{13}$$

Substituting Eqs. (12) and (13) into Eq. (10) yields
$$(\mathbf{b},\mathbf{b}^{\mathrm{T}})_{\partial V_k^+}\,\boldsymbol{\varphi}_m - \langle\mathbf{b}^{\mathrm{T}},\boldsymbol{\Omega}_m\cdot\nabla\mathbf{b}\rangle_{V_k}\,\boldsymbol{\varphi}_m + \langle\Sigma_t\,\mathbf{b}^{\mathrm{T}},\mathbf{b}\rangle_{V_k}\,\boldsymbol{\varphi}_m = \langle Q_m,\psi_m^*\rangle_{V_k} + (\psi_m^*,\psi_m^-)_{\partial V_k^-} + (\psi_m^*,\psi_m^{\mathrm{in}})_{\partial V_k^{\mathrm{in}}}. \tag{14}$$

The upwind scheme is the essence of the DGFEM. Notably, although the test functions on the edges are always taken from within the element, the primal functions on the edges are taken upwind with respect to the streaming direction. For outgoing edges, the primal functions are taken from within the element itself, whereas for incoming edges, the primal functions are taken from the upwind neighbor element.
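As a minimal illustration of this upwind rule (not the ThorSNIPE implementation; the type and function names here are hypothetical), the trace of the angular flux used in a face integral can be selected from the sign of Ω_m·n:

#include <array>

// Traces of the angular flux available on a face of the current element.
struct FaceTrace {
  double inside;    // value taken from within the current element
  double neighbor;  // value taken from the neighboring (upstream candidate) element
};

// Returns the primal (upwind) value used by the DGFEM face term.
double upwind_trace(const FaceTrace &psi,
                    const std::array<double, 3> &omega,
                    const std::array<double, 3> &n)
{
  const double omega_dot_n =
    omega[0] * n[0] + omega[1] * n[1] + omega[2] * n[2];
  // Outflow face (omega . n > 0): information leaves the element, so the
  // element's own trace is the upwind value. Inflow face (omega . n < 0):
  // information enters from the upstream neighbor.
  return (omega_dot_n > 0.0) ? psi.inside : psi.neighbor;
}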

2.3 Multithreaded parallel upwind sweep algorithm

2.3.1 Cell-based sweep algorithm

In the SN method, a transport sweep is performed over all elements along each discrete direction. The order in which the elements and degrees of freedom (DOF) are traversed affects the speed of convergence. As mentioned in Sect. 1, the DGFEM makes it possible to use a cell-based sweeping procedure in SN methods rather than assembling a global matrix.

The element sweep order for each discrete direction in the ThorSNIPE code is built using an upwind scheme based on the deal.II library. In our case, an array of all relevant cells is created and sorted in the upwind direction using a "dealii::DoFRenumbering::CompareDownstream" object from the deal.II FEM library. A straightforward 2D model with three material regions and the various sweep schemes is shown in Fig. 1. Cells in a conventional scheme are often numbered from edge to center to facilitate the creation of a global matrix. The downwind scheme performs renumbering in the direction opposite to the discrete direction, whereas the upwind scheme performs renumbering along it. The definitions of "upwind" and "downwind" are determined by the sign of the product of the unit normal vector to the cell surface and the direction cosines of the corresponding discrete direction. In practice, the mesh location, the discrete direction, and the sweep scheme define the order of each element. The sweep solution technique also complies with the physics of neutron transport: each discrete direction results in a unique order in which the solution technique is applied, and the orders for different directions are not mutually exclusive.
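For illustration, a minimal sketch of how such an upwind cell ordering can be built with the deal.II comparator mentioned above is given below (assumptions: deal.II 9.4 and an already-distributed DoFHandler named dof_handler; this is a sketch, not the exact ThorSNIPE implementation):

#include <deal.II/base/tensor.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_renumbering.h>
#include <algorithm>
#include <vector>

// Collect the active cells and sort them so that, for the discrete direction
// omega, upstream cells are visited before their downstream neighbors.
template <int dim>
std::vector<typename dealii::DoFHandler<dim>::active_cell_iterator>
build_sweep_order(const dealii::DoFHandler<dim> &dof_handler,
                  const dealii::Tensor<1, dim>  &omega)
{
  using CellIt = typename dealii::DoFHandler<dim>::active_cell_iterator;

  std::vector<CellIt> ordered_cells;
  for (const auto &cell : dof_handler.active_cell_iterators())
    ordered_cells.push_back(cell);

  // CompareDownstream orders cells by the projection of their centers onto
  // omega, so the resulting sweep starts at the most upstream cells.
  dealii::DoFRenumbering::CompareDownstream<CellIt, dim> comparator(omega);
  std::sort(ordered_cells.begin(), ordered_cells.end(), comparator);

  return ordered_cells;
}

One such ordering is built per discrete direction and reused for every group during the transport sweep.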

Fig. 1
(Color online) (a) Material region distribution; (b) Conventional scheme; (c) Upwind scheme; (d) Downwind scheme

From the perspective of solving the equations, the various sweep schemes also affect the calculation of the incoming angular fluxes ψ_m^in in Eq. (14). Primal functions obtained from neighboring elements require numerical values of a certain precision. When the transport equation is solved along the direction of particle motion using the upwind scheme, the calculation error is gradually reduced. The other schemes, however, must work with face values that do not yet follow the direction of information flow, and the resulting accumulation of errors inevitably affects the robustness of the numerical method. Thus, the entire solution procedure requires more iterations.

Algorithm 1 displays the cell-based sweep algorithm for each group and angle once the sweep order has been determined. The local equation can be divided into two parts. First, the transport matrix collects the contributions of the streaming and removal operators, determined by the mass matrix and the total cross-section. Second, the right-hand side (RHS) vector includes the contributions of the scattering-source operators and the fixed-source or fission-source operators; the in-group and cross-group couplings comprise the contributions of the scattering sources. Lastly, a "dealii::types::global_dof_index" object translates the local solution into the global angular solution. This cell-based sweep technique delivers the DGFEM solution for all cells in a single pass per sweep order without coupling integral terms, fully utilizing the key benefits of DGFEM approaches.

Algorithm 1 Cell-based sweep for each group and angle
Input: g, m, q_fixed(r)
for each upwind-sorted cell do
  for DOF i of cell do
    for DOF j of cell do
      transport_matrix(i,j) = ∫_{V_k} b_i(r) Ω_m·∇b_j(r) dV + ∫_{V_k} Σ_t(r) b_i(r) b_j(r) dV
  for moment n do
    for moment k do
      for DOF i of cell do
        source_rhs(i) += ∫_{V_k} b_i(r) (Σ_{s,n}(r) φ_n^k(r) + q_fixed(r)) Y_n^k(Ω_m) dV
  for each face of cell do
    if outflow face then
      for DOF i of face do
        for DOF j of face do
          boundary_matrix(i,j) = ∫_{∂V_k^+} Ω_m·n(r) b_i(r) b_j(r) dS
    else if inflow face then
      for DOF i of face do
        boundary_rhs(i) = { ∫_{∂V_k^-} |Ω_m·n(r)| b_i(r) ψ_m^-(r) dS      (domain-boundary face)
                          { ∫_{∂V_k^in} |Ω_m·n(r)| b_i(r) ψ_m^in(r) dS   (interior face)
  cell_rhs(i) = source_rhs(i) + boundary_rhs(i)
  cell_matrix(i,j) = transport_matrix(i,j) + boundary_matrix(i,j)
  solution(i) = cell_matrix(i,j)^(-1) cell_rhs(i)
  for DOF i of cell do
    angular_solution[m](local_dof_indices[i]) = solution(i)
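The final solve-and-scatter step of Algorithm 1 amounts to inverting one small dense block per cell. A minimal sketch using deal.II's dense linear algebra is given below (illustrative only; it assumes cell_matrix, cell_rhs, and the global angular-flux vector for direction m have already been assembled and sized):

#include <deal.II/base/types.h>
#include <deal.II/lac/full_matrix.h>
#include <deal.II/lac/vector.h>
#include <vector>

// Solve cell_matrix * local_solution = cell_rhs and scatter the result into
// the global angular flux of direction m using the cell's DOF indices.
void solve_and_scatter(dealii::FullMatrix<double> &cell_matrix,
                       const dealii::Vector<double> &cell_rhs,
                       const std::vector<dealii::types::global_dof_index> &local_dof_indices,
                       dealii::Vector<double> &angular_solution_m)
{
  dealii::Vector<double> local_solution(cell_rhs.size());

  // Invert the small dense DG block in place and apply it to the local RHS.
  cell_matrix.gauss_jordan();
  cell_matrix.vmult(local_solution, cell_rhs);

  for (unsigned int i = 0; i < local_dof_indices.size(); ++i)
    angular_solution_m(local_dof_indices[i]) = local_solution(i);
}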

The following two steps are employed to compute the local contributions of an individual cell to the matrix and RHS. First, the integral is transformed from the actual cell k to a unit/reference cell $\hat{k}$. For example, the streaming operator is transformed into
$$\int_{V_k}\mathbf{b}^{\mathrm{T}}(\mathbf{r})\,\boldsymbol{\Omega}_m\cdot\nabla\mathbf{b}(\mathbf{r})\,\mathrm{d}V = \int_{\hat{V}_k}\mathbf{b}^{\mathrm{T}}(\hat{\mathbf{r}})\,\big[\boldsymbol{\Omega}_m\cdot J^{-1}(\hat{\mathbf{r}})\,\hat{\nabla}\mathbf{b}(\hat{\mathbf{r}})\big]\,\lvert\det J(\hat{\mathbf{r}})\rvert\,\mathrm{d}\hat{V}, \tag{15}$$
where the hat indicates reference coordinates and $J(\hat{\mathbf{r}})$ is the Jacobian of the mapping $\mathbf{r} = F_k(\hat{\mathbf{r}})$. Second, this integral is approximated using Gauss-Legendre quadrature, which yields
$$\int_{V_k}\mathbf{b}^{\mathrm{T}}(\mathbf{r})\,\boldsymbol{\Omega}_m\cdot\nabla\mathbf{b}(\mathbf{r})\,\mathrm{d}V \approx \sum_q \mathbf{b}^{\mathrm{T}}(\hat{\mathbf{r}}_q)\,\big[\boldsymbol{\Omega}_m\cdot J^{-1}(\hat{\mathbf{r}}_q)\,\hat{\nabla}\mathbf{b}(\hat{\mathbf{r}}_q)\big]\,\lvert\det J(\hat{\mathbf{r}}_q)\rvert\,w_q, \tag{16}$$
where q is the index of the quadrature point, $\hat{\mathbf{r}}_q$ is its location in the reference cell, and $w_q$ is the Gaussian quadrature weight. In this study, triangular and tetrahedral grids were used for the 2D and 3D cases, respectively.
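As an illustration of these two steps, a minimal deal.II sketch of the streaming-term assembly is given below; FEValues applies the mapping Jacobian and the Gauss quadrature weights through JxW(), so Eqs. (15)-(16) are evaluated without writing the transformation explicitly (a sketch under the assumption of a generic finite element and default mapping, not the exact ThorSNIPE code):

#include <deal.II/base/quadrature_lib.h>
#include <deal.II/base/tensor.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/fe/fe.h>
#include <deal.II/fe/fe_values.h>
#include <deal.II/lac/full_matrix.h>

// Accumulate transport_matrix(i,j) += b_i(r_q) * (Omega_m . grad b_j(r_q)) * |det J| * w_q.
template <int dim>
void assemble_streaming_term(
  const typename dealii::DoFHandler<dim>::active_cell_iterator &cell,
  const dealii::FiniteElement<dim> &fe,
  const dealii::Tensor<1, dim>     &omega,
  dealii::FullMatrix<double>       &transport_matrix)
{
  const dealii::QGauss<dim> quadrature(fe.degree + 1);
  dealii::FEValues<dim> fe_values(fe, quadrature,
                                  dealii::update_values |
                                    dealii::update_gradients |
                                    dealii::update_JxW_values);
  fe_values.reinit(cell);

  for (unsigned int q = 0; q < quadrature.size(); ++q)
    for (unsigned int i = 0; i < fe.n_dofs_per_cell(); ++i)
      for (unsigned int j = 0; j < fe.n_dofs_per_cell(); ++j)
        transport_matrix(i, j) += fe_values.shape_value(i, q) *
                                  (omega * fe_values.shape_grad(j, q)) *
                                  fe_values.JxW(q);
}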

2.3.2 Multithread parallel strategy

Parallel transport computation has gained popularity in the nuclear field because of the emergence of high-performance computing clusters [24]. How the processors schedule parallel jobs significantly affects the performance of a code that uses parallel computing. The two basic techniques are spatial-domain decomposition (SDD) [25] and angular-domain decomposition (ADD) [26]. SDD techniques subdivide the spatial domain into subdomains, such that the common mesh-sweep method is performed concurrently in numerous subdomains for diverse angles. However, parallelizing the SN spatial sweep in this way is more challenging because of the upwind-downwind dependencies across subdomain boundaries.

ADD techniques are frequently considered more scalable and require less memory [27]. They entail mesh sweeps that are structurally the same for all angles of the chosen quadrature [28]. All operations are statically scheduled to the participating processors without negatively affecting the load balance, which makes them well suited to personal computers (PCs). In shared-memory machines, the traditional approach to parallelism is to break the work into threads. deal.II leverages the Threading Building Blocks (TBB) library [29], wrapped into objects called "tasks." TBB offers portable interfaces across many different platforms and abstracts away the details of mapping tasks onto individual threads.

The entire procedure of the multithreaded parallel approach in the ThorSNIPE code is shown in Fig. 2. Both the inner and outer iterations contain an angular loop. The number of CPU cores determines the total number of tasks on a PC system. The angular loops are divided into a number of subranges that are evenly distributed across the threads, and a cell-based sweep sequence is used within each angular loop. Each subrange is a task, and the tasks are essentially independent during the iterations: they perform the calculations on the same mesh and with the same equation structure for their respective parts of the quadrature set, whereas the storage of the angular fluxes is based on the local DOF indices determined by the sweep schemes. The calculation results of the different subranges are collected to update the scattering source. During execution, threads that finish their own subranges keep themselves busy by stealing complete portions of subranges from other threads. Thus, the code can fully exploit the thread resources of the system while avoiding frequent interruptions by the operating system to switch threads on the available processor cores.
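A simplified sketch of this angular-domain decomposition using the TBB primitives that deal.II wraps is given below (illustrative only; sweep_one_direction is a hypothetical helper standing in for the cell-based sweep of Algorithm 1, and the actual ThorSNIPE task layout may differ):

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <functional>

// Run the cell-based sweep for all discrete directions; TBB splits the angular
// loop into subranges (tasks) and balances them across threads by work stealing.
void sweep_all_directions(
  const unsigned int n_directions,
  const std::function<void(unsigned int)> &sweep_one_direction)
{
  tbb::parallel_for(
    tbb::blocked_range<unsigned int>(0, n_directions),
    [&](const tbb::blocked_range<unsigned int> &subrange) {
      for (unsigned int m = subrange.begin(); m != subrange.end(); ++m)
        // Each direction owns its angular-flux storage, so no locking is
        // needed inside the sweep of a single direction m.
        sweep_one_direction(m);
    });
}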

Fig. 2
Flowchart depicting multithreaded parallel strategy
3 Results and discussions

The multithreaded parallel upwind sweep algorithm was validated using three typical benchmark problems. The IAEA benchmark, with five different regions, was used to test the performance of the different sweep schemes. The Kobayashi-3i benchmark, with cavities and a bent duct, was adopted to evaluate the performance of the parallel computation for different mesh sizes and angular quadrature sets. The VENUS-3 benchmark was used to validate the algorithm's capabilities for realistic transport calculations. In all benchmarks, the convergence criteria were 10⁻⁵ for the eigenvalue and 10⁻⁴ for the average flux in each spatial interval.

3.1 IAEA benchmark

The IAEA benchmark is a 2D five-region calculation problem consisting of two large source zones and two large absorber zones surrounded by light water [30] and is designed for IAEA advanced reactor neutron transport computation. There is a strong nonuniformity in the fuel region, as shown in Fig. 3. The cross-sections are given in Table 1, and all boundary conditions are vacuum.

Table 1
Cross-sections of the IAEA 5-region fixed-source problem
Region  Σt (cm⁻¹)  νΣf (cm⁻¹)  Σs (cm⁻¹)  Source (cm⁻² s⁻¹)
R1 0.60 0.079 0.53 1.0
R2 0.48 0.0 0.20 0.0
R3 0.70 0.0 0.66 1.0
R4 0.65 0.043 0.50 0.0
R5 0.90 0.0 0.83 0.0
Fig. 3
(Color online) Geometry of the IAEA 5-region fixed problem

This benchmark problem was mainly used to analyze the effect of the element sweep order on the efficiency of the cell-based algorithm and to demonstrate the advantage of the upwind sweep scheme. Under the PNTN-S8 angular approximation, comparisons of the numerical results for the fixed-source and eigenvalue cases, simulated with 2338 elements, are presented in Tables 2 and 3, respectively. The relative deviations in the average flux were within 2.50% for the fixed-source case and within 5.00% for the eigenvalue case. All numerical results show excellent agreement with the references, demonstrating the accuracy and validity of the algorithm and the modeling approach for such strongly heterogeneous problems. Changing the element sweep order does not make a large difference to the converged results, because the transport sweep operations are performed individually on each element and the results are added: matrix addition is commutative, and since these operators can be described as a sum of matrices, the order of the various components does not affect the final result.

Table 2
Comparison of average flux of the IAEA benchmark
Average flux (relative error), in cm⁻² s⁻¹
Region  Reference  Conventional  Downwind  Upwind
R1 11.941 11.872 (0.58%) 11.874 (0.56%) 11.871 (0.59%)
R2 0.545 0.547 (0.37%) 0.547 (0.37%) 0.546 (0.18%)
R3 19.168 19.045 (0.64%) 19.051 (0.61%) 19.039 (0.67%)
R4 0.843 0.838 (0.60%) 0.840 (0.36%) 0.837 (0.71%)
R5 1.530 1.542 (0.78%) 1.553 (1.50%) 1.531 (0.07%)
Table 3
Comparison of average flux and keff of the IAEA benchmark
Average flux (relative error), in cm⁻² s⁻¹
Region  Reference  Conventional  Downwind  Upwind
R1 0.01686 0.016371(2.90%) 0.01661(1.18%) 0.016643(1.29%)
R2 0.000125 0.000131(4.80%) 0.000128(2.40%) 0.000128(2.40%)
R3 0.000041 0.000040(2.44%) 0.000041(0.00%) 0.000041(0.00%)
R4 0.000295 0.000295(0.00%) 0.000293(0.68%) 0.000293(0.68%)
R5 0.000791 0.000799(1.01%) 0.000788(0.38%) 0.000789(0.25%)
keff 1.0083 1.007076 1.008371 1.008305
pcm   -12 7 1

As shown in Fig. 4, the number of iterations was compared as the number of elements was varied in the fixed-source calculation. The numerical results indicate that the upwind sweep scheme has a stable convergence speed regardless of the number of elements, whereas for the other two sweep schemes the number of iterations grows exponentially with the number of elements. The primal functions on the edges are determined from the upwind edges with respect to the streaming direction, and transport problems involve a directional flow of information. The upwind scheme fully uses the initial boundary conditions and provides spatial flux moments for the neighboring elements. The other sweep schemes require more source and power iterations to meet the convergence criteria in the subsequent matrix calculations in order to compensate for the lack of such constraints. In addition, no overhead cost of assembling and solving a global stiffness matrix arises in the current framework for any of the three sweep schemes mentioned above. Therefore, the upwind sweep algorithm can save significant computational cost with fewer iterations and is well suited to DGFEM simulations.

Fig. 4
Number of iterations with the change in the number of elements for different sweep schemes
3.2 Kobayashi-3i benchmark

The Kobayashi-3i benchmark [31] is a bend duct-type problem with void regions in a highly absorbing medium, as shown in Fig. 5. The cross sections used in this benchmark problem are listed in Table 4. This problem was set up to analyze the features of spatial and angular discretization and demonstrate the parallel efficiency of the multithreaded strategy.

Table 4
Cross-sections of the Kobayashi-3i benchmark
Region  Σt (cm⁻¹)  Σs (cm⁻¹)  Source (cm⁻² s⁻¹)
R1 0.1 0.05 1.0
R2 0.0001 0.00005 0.0
R3 0.1 0.05 0.0
Fig. 5
(Color online) Geometry of the Kobayashi-3i benchmark

Angular flux distributions over the full domain were modeled and simulated with mesh sizes of 5 cm, 4 cm, 3 cm, 2.5 cm, and 2 cm. The quality of the quadrature sets was quantified by taking the root mean square (RMS) of the relative scalar-flux errors at all N point detectors,
$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{\phi_{\mathrm{cal}} - \phi_{\mathrm{ref}}}{\phi_{\mathrm{ref}}}\right)^2}, \tag{17}$$
where the reference solutions $\phi_{\mathrm{ref}}$ were obtained using an analytical method [31]. The RMS of the relative errors in the scalar flux for the different quadrature sets versus the number of ordinates is presented in Fig. 6a. It should be noted that the DGFEM approximations with different mesh sizes converge at different speeds, and the error curves start to flatten as the number of ordinates increases up to the S20 angular approximation.
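A minimal helper matching Eq. (17) is shown below (illustrative only; the variable names are not taken from ThorSNIPE):

#include <cmath>
#include <cstddef>
#include <vector>

// Root mean square of the relative errors of the calculated point-detector
// fluxes against the reference values, as in Eq. (17).
double relative_rms(const std::vector<double> &phi_cal,
                    const std::vector<double> &phi_ref)
{
  double sum = 0.0;
  for (std::size_t i = 0; i < phi_ref.size(); ++i)
  {
    const double rel = (phi_cal[i] - phi_ref[i]) / phi_ref[i];
    sum += rel * rel;
  }
  return std::sqrt(sum / static_cast<double>(phi_ref.size()));
}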

Fig. 6
(a) Root mean square of Scalar flux L2-error versus the order of quadrature sets for the Kobayashi-3i benchmark; (b) Speedup ratio versus the order of quadrature sets for the Kobayashi-3i benchmark

The ThorSNIPE code runs on a PC with 8 cores / 16 threads (one 11th Gen Intel(R) Core(TM) i7-11700F CPU running at 2.5 GHz) to demonstrate the performance and gain of the multithread technology. The speedup ratio with respect to serial mode versus the order of quadrature sets is presented in Fig. 6b, where the parallel implementation efficiency exceeds 100% when the S12 angular approximations are attained, and the mesh size is less than 3 cm. The comparative results show that the proposed method can achieve a better acceleration effect as the number of meshes increases. It can also be observed that the speedup ratio increases more slowly when the S20 angular approximations are attained. In this case, each thread has over 30 parallel tasks, and memory access becomes a bottleneck in the matrix solvers for partial differential equations. The numerical results are shown in Fig. 7, and the neutron fluxes at the key points are listed in Table 5. Compared with MCNP [31], the ThorSNIPE code effectively reduces angular discretization errors and fulfills the transport computational requirement to a certain degree.

Table 5
Total flux comparison in the Kobayashi-3i benchmark
Case  Coordinates  MCNP (cm⁻² s⁻¹)  ThorSNIPE (cm⁻² s⁻¹)  Relative error (%)
3A (5,5,5) 8.61578×10−0 8.61432×10−0 −0.02
  (5,15,5) 2.16130×10−0 2.13602×10−0 −1.17
  (5,25,5) 8.93784×10−1 8.93551×10−1 −0.03
  (5,35,5) 4.78052×10−1 4.72878×10−1 −1.08
  (5,45,5) 2.89424×10−1 2.87933×10−1 −0.52
  (5,55,5) 1.92698×10−1 1.91669×10−1 −0.53
  (5,65,5) 1.04982×10−1 1.04268×10−1 −0.68
  (5,75,5) 3.37544×10−2 3.43534×10−2 1.77
  (5,85,5) 1.08158×10−2 1.08608×10−2 0.42
  (5,95,5) 3.39632×10−3 3.47318×10−3 2.26
3B (5,55,5) 1.92698×10−1 1.91669×10−1 −0.53
  (15,55,5) 6.72147×10−2 6.74001×10−2 0.28
  (25,55,5) 2.21799×10−2 2.20850×10−2 −0.43
  (35,55,5) 9.90646×10−3 9.74980×10−3 −1.58
  (45,55,5) 3.39066×10−3 3.43623×10−3 1.34
  (55,55,5) 1.05629×10−3 1.04113×10−3 −1.44
3C (5,95,5) 3.44804×10−4 3.45115×10−4 0.09
  (15,95,5) 2.91825×10−4 2.98723×10−4 2.36
  (25,95,5) 2.05793×10−4 2.07437×10−4 0.80
  (35,95,5) 2.62086×10−4 2.63330×10−4 0.47
  (45,95,5) 1.05367×10−4 1.07620×10−4 2.14
  (55,95,5) 4.44962×10−5 4.46921×10−5 0.44
Fig. 7
(Color online) Mesh distribution and scalar flux of the Kobayashi-3i benchmark
3.3 VENUS-3 benchmark

The VENUS-3 benchmark was proposed to validate the capabilities of calculation methodologies and cross-section libraries for predicting fluence rates in reactor pressure vessels (RPVs) because it has a very clean structural geometry representing standard PWR pressure-vessel conditions [32]. Fig. 8 illustrates the geometric configuration and material distribution of the VENUS-3 facility. The VENUS-3 model includes two types of fuel rods, the partial-length shielded assembly (PLSA), core baffle, water reflector, core barrel, Pyrex control rods, neutron pad, three types of grids, bottom support, reactor vessel, and jacket inner walls. Detailed descriptions and qualifications are given in the benchmark document. The benchmark provides 386 dosimeters, comprising 244 58Ni(n, p)58Co dosimeters, 104 115In(n, n')115mIn dosimeters, and 38 27Al(n, a)24Na dosimeters placed at 268 different spatial locations. The experimental results are provided as the ratio of the reaction rate to the average dosimeter cross-section, termed the equivalent fission flux.

Fig. 8
(Color online) Geometry of the Venus-3 benchmark

The numerical results shown in Fig. 9 were modeled and simulated using the PNTN-S8 angular discretization, the 199-group KASHIL-E70 multigroup cross-section library [33], and 407819 tetrahedral elements. Fig. 10 shows the C/E comparison of the equivalent fission fluxes at the three types of detector positions. For the 115In(n, n')115mIn dosimeters, almost all equivalent-fission-flux deviations lie within +/-10%, except for one detector position (No. 31, located in the core barrel at an angle of 0.7°), where the numerical value is affected by the reflective boundary. For the 27Al(n, a)24Na dosimeters, 92% of the deviations were within +/-10%, and the remaining three dosimeters overestimated the corresponding experimental values by approximately 11.9%. For the 58Ni(n, p)58Co dosimeters, 95% of the deviations were within +/-10%, except for 11 detector positions where the numerical values overestimated the corresponding experimental values by about 10.1%-20.7%; these positions are concentrated at the junction of the PLSA region, the outer baffle, and the core barrel. The scalar flux distribution is not smooth there because of the strong spatial nonuniformity of the geometry model, which places higher demands on local mesh refinement and degrades the simulation accuracy in these regions. The overall agreement between the numerical results of the ThorSNIPE code and the experimental results is very good, which shows that the ThorSNIPE code is highly reliable and stable for complex shielding calculations.

Fig. 9
(Color online) Mesh distribution and fast neutron fluxes of the VENUS-3
Fig. 10
C/E comparison of equivalent fission fluxes at detector positions: (a) 58Ni(n,p)58Co from No. 1 to 139; (b) 58Ni(n,p)58Co from No. 140 to 244; (c) 115In(n,n’)115mIn; (d) 27Al(n,a)24Na

The numerical results [32] from TORT-3.2, obtained with 1004295 hexahedral cells and the 26-group BUGLE-96 neutron library, were selected as comparative references. For the uniformly distributed 3.3% fuel regions, TORT-3.2 with hexahedral structured meshes achieved better results, whereas the ThorSNIPE code with tetrahedral unstructured meshes reduces the numerical oscillation in the complex regions. For the 27Al(n, a)24Na dosimeters, the numerical results of ThorSNIPE tended to be lower than those of TORT-3.2 in the core barrel. This underestimation might have been caused by an error in the average dosimeter cross-section.

Fig. 11 shows the speedup ratios for three quadrature sets with the P3 Legendre expansion order and for four Legendre expansion orders with the PNTN-S8 angular discretization. The test conditions were consistent with those of the TORT-3.2 code. Overall, the behavior of the proposed algorithm was similar to that observed for the Kobayashi-3i benchmark. For PNTN-S12, the speedup ratio was 6.47 with 407819 elements. Comparing the results for different expansion orders, the P5 expansion order has the steepest upward trend, and the maximum speedup ratio achieved was 6.98. The numerical results indicate that, as expected, the proposed algorithm achieves a favorable speedup ratio as the scale of the mesh increases.

Fig. 11
Speedup ratio with the change in the (a) order of quadrature sets; (b) order of Legendre expansion
4 Conclusion

In this study, a multithreaded parallel upwind sweep algorithm for solving the first-order Boltzmann neutron transport equation was proposed and implemented in the multigroup 2D/3D SN transport code ThorSNIPE. The cell-based upwind sweep algorithm realizes angular flux transmission between neighboring meshes and achieves a stable solution by applying DGFEM spatial discretization. The multithreaded parallel strategy automatically manages the angular sweep tasks of each thread and improves the quality and efficiency of the ThorSNIPE code.

The performance and accuracy of the proposed algorithm were tested using several typical benchmark problems. In the IAEA benchmark, the upwind sweep algorithm proved to be the preferred scheme for DGFEM calculations and can save significant computational cost; the ThorSNIPE results agree well with the reference, with a deviation of 1 pcm in the eigenvalue and within 2.50% in the flux distribution. In the Kobayashi-3i benchmark, the angular flux simulations over the full domain for different mesh sizes and quadrature-set orders indicate that the multithreaded parallel algorithm has good extensibility. In the VENUS-3 benchmark, 96% of the equivalent-fission-flux deviations were within +/-10%, except for 15 detector positions where the numerical values overestimated the corresponding experimental values by about 10.1%-20.7%, and the maximum speedup ratio was 6.98 with 407819 elements. The results of the VENUS-3 benchmark show that this robust algorithm can play a reliable role in practical engineering applications and improve computational efficiency.

In the future, angular and algebraic multigrid methods for solving the discretized equations will be investigated to overcome the reduced convergence speed of the iterative method. Furthermore, we plan to address more complex problems to assess the accuracy and effectiveness of the proposed method and to present a rigorous mathematical convergence analysis of the new scheme.

References
1. E.E. Lewis, W.F. Miller, Computational Methods of Neutron Transport (American Nuclear Society, Illinois, 1981)
2. W.A. Rhoades, W.W. Engle Jr., New weighted-difference formulation for discrete-ordinates calculations. Tech. Rep. CONF-771109-53, Oak Ridge National Laboratory, 1977
3. Y. Chen, B. Zhang, L. Zhang et al., ARES: a parallel discrete ordinates transport code for radiation shielding applications and reactor physics analysis. Sci. Tech. Nucl. Install. 2017, 2596727 (2017). https://doi.org/10.1155/2017/2596727
4. T. Cheng, L. Wei, B. Zhone et al., Reconstruction and parallelization of 3D SN program for neutron/photon transport. Nucl. Power Eng. 35, 147-150 (2014). https://doi.org/10.13832/j.jnpe.2014.S2.0147
5. M.A. Puso, J. Solberg, A stabilized nodally integrated tetrahedral. Int. J. Numer. Methods Eng. 67, 841-867 (2006). https://doi.org/10.1002/nme.1651
6. Y. Wang, J.C. Ragusa, Standard and goal-oriented adaptive mesh refinement applied to radiation transport on 2D unstructured triangular meshes. J. Comput. Phys. 230, 763-788 (2011). https://doi.org/10.1016/j.jcp.2010.10.018
7. N. Dai, B. Zhang, Y. Chen et al., Adaptive discontinuous finite element quadrature sets over an icosahedron for discrete ordinates method. Nucl. Sci. Tech. 32, 98 (2021). https://doi.org/10.1007/s41365-021-00934-7
8. V. Labouré, Y. Wang, J. Ortensi et al., Multischeme equivalence procedure for neutron transport finite element methods. Ann. Nucl. Energy 166, 108712 (2022). https://doi.org/10.1016/j.anucene.2021.108712
9. B.Q. Li, Discontinuous Finite Elements in Fluid Dynamics and Heat Transfer (Springer, London, 2006)
10. D. Kuzmin, A Guide to Numerical Methods for Transport Equations (Thesis, University of Erlangen-Nuremberg, 2010)
11. H. Guo, W. Chen, X. Jiang et al., A new continuous finite element SN method for solving first-order neutron transport equation. Ann. Nucl. Energy 175, 109237 (2022). https://doi.org/10.1016/j.anucene.2022.109237
12. C.C. Pain, M.D. Eaton, R.P. Smedley-Stevenson et al., Streamline upwind Petrov-Galerkin methods for the steady-state Boltzmann transport equation. Comput. Methods Appl. Mech. Eng. 195, 4448-4472 (2006). https://doi.org/10.1016/j.cma.2005.09.004
13. T.A. Manteuffel, K.J. Ressel, Least-squares finite-element solution of the neutron transport equation in diffusive regimes. SIAM J. Numer. Anal. 35, 806-835 (1998). https://doi.org/10.1137/S003614299629970
14. W.H. Reed, T.R. Hill, Triangular mesh methods for the neutron transport equation. Tech. Rep. LA-UR-73-479, Los Alamos National Laboratory, 1973
15. O.C. Zienkiewicz, R.L. Taylor, J.Z. Zhu, The Finite Element Method: Its Basis and Fundamentals, 6th edn. (Elsevier Butterworth-Heinemann, UK, 2005), pp. 14-17
16. Y. Wang, S. Schunert, J. Ortensi et al., Rattlesnake: a MOOSE-based multiphysics multischeme radiation transport application. Nucl. Tech. 207, 1047-1072 (2021). https://doi.org/10.1080/00295450.2020.1843348
17. S. Plimpton, B. Hendrickson, S. Burns et al., Parallel algorithms for radiation transport on unstructured grids. Proc. SC2000 Conf. (Dallas, Texas, USA, 2000). https://doi.org/10.1109/SC.2000.10030
18. S.D. Pautz, An algorithm for parallel SN sweeps on unstructured meshes. Nucl. Sci. Eng. 140, 111-136 (2002)
19. J.Y. Moller, J.J. Lautard, Minaret, a deterministic neutron transport solver for nuclear core calculations. International Conference on Mathematics and Computational Methods Applied to Nuclear Science and Engineering (Rio de Janeiro, Brazil, 2011)
20. W.K. Jong, O.L. Young, A deep penetration problem calculation using AETIUS: an easy modeling discrete ordinates transport code using unstructured tetrahedral mesh, shared memory parallel. EPJ Web of Conferences 153, 06025 (2017). https://doi.org/10.1051/epjconf/201715306025
21. Z. Zong, M. Cheng, A two-dimensional finite element shielding calculation code with mass-matrix lumping technique and unstructured meshes. Nucl. Tech. 46, 77-85 (2023). https://doi.org/10.11889/j.0253-3219.2023.hjs.46.020602 (in Chinese)
22. D. Arndt, W. Bangerth, M. Feder et al., The deal.II library, version 9.4. J. Numer. Math. 30, 231-246 (2022). https://doi.org/10.1515/jnma-2022-0054
23. V. Labouré, R.G. McClarren, Y. Wang, Globally conservative, hybrid self-adjoint angular flux and least-squares method compatible with voids. Nucl. Sci. Eng. 185, 294-306 (2017). https://doi.org/10.1080/00295639.2016.1272374
24. Y.Y. Azmy, Performance and performance modeling of a parallel algorithm for solving the neutron transport equation. J. Supercomput. 6, 211-235 (1992). https://doi.org/10.1007/BF00155800
25. G. Zhang, J. Liu, L. Cao et al., Neutronic calculations of the China dual-functional lithium-lead test blanket module with the parallel discrete ordinates code Hydra. Nucl. Sci. Tech. 31, 74 (2020). https://doi.org/10.1007/s41365-020-00789-4
26. Z. Hong, G. Yuan, A parallel algorithm with interface prediction and correction for spherical geometric transport equation. Prog. Nucl. Energy 51, 268-273 (2009). https://doi.org/10.1016/j.pnucene.2008.09.003
27. J.W. Fisher, Y.Y. Azmy, Comparison via parallel performance models of angular and spatial domain decompositions for solving neutral particle transport problems. Prog. Nucl. Energy 47, 37-60 (2007). https://doi.org/10.1016/j.pnucene.2006.08.003
28. Y.Y. Azmy, On the adequacy of message-passing parallel supercomputers for solving neutron transport problems. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (New York, USA, 12-16 November 1991). https://doi.org/10.1109/SUPERC.1990.130088
29. V. Alessandrini, Shared Memory Application Programming: Concepts and Strategies in Multicore Application Programming (Morgan Kaufmann, Boston, USA, 2015), pp. 307-339
30. J. Stepanek, T. Auerbach, W. Hälg, Calculation of four thermal reactor benchmark problems in X-Y geometry. Tech. Rep. EIR-464, ANS, USA, 1982
31. K. Kobayashi, N. Sugimura, Y. Nagaya, 3D radiation transport benchmark problems and results for simple geometries with void region. Prog. Nucl. Energy 39, 119-144 (2001). https://doi.org/10.1016/S0149-1970(01)00007-5
32. M. Pescarini, R. Orisi, M.G. Borgia et al., ENEA nuclear data center neutron transport analysis of the VENUS-3 shielding benchmark experiment. Tech. Rep. KT-SCG-00013, ENEA-Bologna, Italy, 2001
33. D.H. Kim, C.S. Gil, Y.O. Lee, Validation of an ENDF/B-VII.0-based neutron and photon shielding library in MATXS-format. J. Korean Phys. Soc. 59, 1199-1202 (2011). https://doi.org/10.3938/jkps.59.1199
Footnote

The authors declare that they have no competing interests.