A non-invasive diagnostic method of cavity detuning based on a convolutional neural network

ACCELERATOR, RAY TECHNOLOGY AND APPLICATIONS

Liu-Yuan Zhou, Hao Zha, Jia-Ru Shi, Jia-Qi Qiu, Chuan-Jing Wang, Yun-Sheng Han, Huai-Bi Chen

Nuclear Science and Techniques

Vol.33, No.7

Article number 94

Published in print Jul 2022

Available online 20 Jul 2022

DOI: 10.1007/s41365-022-01069-z

As modern accelerator technologies advance toward more compact sizes, conventional invasive diagnostic methods of cavity detuning introduce non-negligible interference in measurements and run the risk of harming structural surfaces. To overcome these difficulties, this study developed a non-invasive diagnostic method using knowledge of scattering parameters with a convolutional neural network and the interior point method. Meticulous construction and training of the neural network led to remarkable results on three typical acceleration structures: a 13-cell S-band standing-wave linac, a 12-cell X-band traveling-wave linac, and a 3-cell X-band RF gun. The trained networks significantly reduced the burden of the tuning process, freed researchers from tedious tuning tasks, and provided a new perspective for the tuning of side-coupling, semi-enclosed, and fully enclosed structures.

Keywords: Cavity detuning, Convolutional neural network, Equivalent circuit

Introduction

Frequency detuning is an ineluctable concern for accelerator cavities. Cavity frequency detuning can be caused by a drift in temperature and humidity during machining, deformation due to thermal stress during welding, vibration caused by mechanical movement during transportation, and breakdown of RF pulses during conditioning [1]. Concentrating solely on the machining process, the derivative of the TM₀₁₀ mode frequency of a cylindrical resonator with respect to its radius yields the following estimation formula: $\mathrm{d}f\,[\mathrm{MHz}] \approx -(2.9 \times 10^{7})/r^{2}\,\mathrm{d}r\,[\mu\mathrm{m}]$. For X-band structures, an error of 1 μm in the cavity radius results in frequency detuning of approximately 1 MHz.
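The order of magnitude quoted above can be checked with a short sketch. It assumes an ideal pillbox cavity, for which the TM₀₁₀ frequency is $f = 2.405\,c/(2\pi r)$ and therefore $\mathrm{d}f/\mathrm{d}r = -f/r$. The function name is hypothetical, and real cells with irises and beam pipes will deviate somewhat from this idealization.

```python
import math

C_LIGHT = 299_792_458.0     # speed of light in m/s
J01 = 2.404825557695773     # first zero of the Bessel function J0

def pillbox_sensitivity(f_hz):
    """Return (radius in mm, df/dr in MHz per micrometre) for an ideal TM010 pillbox."""
    r_m = J01 * C_LIGHT / (2.0 * math.pi * f_hz)   # resonant radius of the pillbox
    dfdr_hz_per_m = -f_hz / r_m                    # derivative of f = const / r
    return r_m * 1e3, dfdr_hz_per_m * 1e-12        # Hz/m -> MHz/um

r_x, s_x = pillbox_sensitivity(11.424e9)
print(f"X-band pillbox: r = {r_x:.2f} mm, df/dr = {s_x:.3f} MHz/um")
```

For an 11.424 GHz cell the radius is close to 10 mm, so a 1 μm radius error shifts the frequency by roughly 1 MHz, in line with the estimate in the text.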

An accelerator can be considered equivalent to a coupled-cavity chain. The field amplitude and phase advance of each cavity are determined by the frequencies and couplings of the chain. To guarantee proper field distributions and phase advancements, the accelerator cavities must be tuned prior to operation [2]. Two conventional methods generally work in low-frequency bands and even in some X-band structures. One is the so-called SLAC-type method that diagnoses cavity detuning by inserting two inflexible conductive probes into the accelerator and measuring the resonant frequencies of each individual cavity and neighbor-coupled cavities one by one [3]. To measure the frequency of each resonant cavity accurately, the position of the probes must be adjusted carefully so that the resonant peak signal of the vector network analyzer no longer drifts with small movements of the probe. However, this is a time-consuming task. In addition to the growing risk of harming the soft inner surface, the measurement results become sensitive to the probe positions for cavities in higher-frequency bands because of their more compact size. The second method, initially verified by Khabiboulline et al. in 1995, is to measure the field distributions by bead-pulling to invert the correlative cavity frequencies [4]. Later, Shi et al. extended this method to the coupler tuning process [5], and Yang et al. simplified the equivalent circuit and solved the matrix coefficients directly from field distributions [6]. The bead-pulling method is an on-axis field-distribution measurement technique that uses a tensioned string to pull a dielectric bead, and calculates the on-axis field amplitude by measuring the frequency shift caused by the field perturbation of the bead. However, applying the bead-pulling technique to non-through structures, such as RF guns or side-coupling structures, is difficult and imprecise. 
Moreover, the bead-pull line and rotations of the bead may cause considerable frequency drift in higher frequency bands. Consequently, non-invasive diagnostic methods have attracted considerable attention. An emerging idea is based on the scattering parameters of the structures. Habel et al. were the first to compute the detuning of a superconducting linac using its scattering parameters [7]. Owing to the lossless simplification afforded by superconductivity, the equivalent circuit of the superconducting linac has the form of a tridiagonal matrix, which is easily solved using the conventional least-squares method. However, the situation is different at room temperature. Ni et al. devised a genetic algorithm to diagnose a normal conducting linac based on its scattering characteristics without the lossless simplification introduced by superconductivity [8]. Complex numbers were added to the matrix. Ni et al. applied the nonlinear least-squares method with the NL2SOL operator to accelerate the convergence speed of the algorithm. Unfortunately, because of its strong reliance on the starting datasets, this method performed poorly on actual accelerators.

As a new discipline that is developing rapidly, artificial intelligence has shown increasing advantages in feature extraction and data modeling. At the junction of accelerators and artificial intelligence, groundbreaking work such as the computation of space charge force, beam line operation, and quench error diagnosis have been accomplished. Knowledge of Taylor maps was first reported in a polynomial neural network by Ivanov and Agapov for fast simulations of beam dynamics. The trained network approximated the dynamic beam system with perfect accuracy. With additional experimental data as the network input, they provided a means to tune network weights in real time [9]. Based on the same idea, Zou et al. successfully applied the asynchronous advantage actor–critic machine learning algorithm to the real-time tuning of the low-energy beam transport section (LEBT) of the Xi'an Proton Application Facility [10]. Considering the two-dimensional phase-space as a spatial image, Ren et al. simulated thousands of pairs of electron beams and X-ray power profiles to train convolutional neural networks, and used the trained networks to predict the X-ray power profiles with the input of electron beam phase spaces. This approach demonstrated a significant improvement over the traditional algorithm for a range of conditions [11]. Kain et al. used reinforcement learning algorithms that learned the optimal policy for a certain control problem and increased the efficiency of optimization algorithms for accelerator controls. They developed a continuous model-free reinforcement learning deep network with up to 16 degrees of freedom that can avoid the time-consuming exploration phase required for numerical optimizers after training [12]. With the knowledge of electromagnetic-based modeling, Rayas-Sánchez developed RF circuit designs using artificial neural networks. 
By designing the neural network training scheme to incorporate available knowledge, which can be obtained from an empirical equivalent circuit model based on quasi-static approximations, the knowledge-based neural network (KBNN) approach has become the most mature and automated technique for the development of neuromodels of RF circuits [13].

Following the footprints of pioneers, this study offers a novel diagnostic approach for cavity detuning based on convolutional neural networks (CNN) and the interior point method (IPM). The measured S₁₁ is used as input to the CNN to obtain a coarse estimate of cavity detuning, and the IPM refines this coarse estimate using derivatives computed from equivalent circuit theory. The remainder of this paper is organized as follows: Section 2 explains the diagnosis steps, describes the construction of the convolutional neural network, and computes the derivatives of scattering parameters with respect to the frequency of each cavity; Section 3 documents the experimental results for three typical structures, including an X-band traveling-wave linac, an S-band standing-wave linac, and a 3-cell X-band RF gun, and discusses the performance of the method with respect to errors in other cavity parameters and sampling noise; finally, Section 4 concludes with a brief summary.

Method

Owing to the advantage of easy accessibility, the scattering parameters (e.g., S₁₁) are considered to be the starting point of our non-invasive diagnostic method. In addition, for most linacs with a completed package of electron guns and dose conversion targets, the scattering parameters are the only physical quantities accessible that reflect the RF states of the linac. The common finishing error is approximately 10 μm to 20 μm, resulting in a frequency detuning of approximately 10 MHz to 20 MHz for X-band structures (11.424 GHz) or 5 MHz to 10 MHz for S-band structures (2.998 GHz); therefore, the diagnosis of cavity detuning is precisely a constrained optimization problem that involves finding a set of resonant frequencies ωi in the above constrained zone to minimize $\| S_{11,c} - S_{11,m} \|^{2}$, where S_11,c represents the S₁₁ calculated by the diagnostic method, while S_11,m is the S₁₁ actually measured. Although many traditional algorithms are available for solving optimization problems, the diagnosis of cavity detuning is strongly non-convex and nonlinear. Under these circumstances, traditional algorithms are highly likely to converge to a local optimum. Inspired by the applications of neural networks in the design of RF devices [14], the powerful fitting ability of artificial neural networks may provide a coarse estimate of cavity detuning close to the global optimum, and empower a traditional algorithm to cross the local optima in the constrained multidimensional space of the cavity diagnosis problem. Therefore, the diagnostic method for cavity detuning is divided into two steps. First, the scattering parameters measured from the waveguide of the structure coupler are normalized as the input of the neural network to obtain a coarse approximation of the cavity detuning. Starting from this coarse estimate, the IPM algorithm [15] based on an equivalent circuit model is applied to compute the fine detuning parameters. 
The role of the neural network is to circumvent the limitations of local optima in traditional optimization algorithms.

2.1

Construction of the Neural Network

The first step of the diagnostic method is to train a proper artificial neural network to predict the frequency detuning of each cavity based on the measured S₁₁. Hopfield neural networks (HNN), recurrent neural networks (RNN), and convolutional neural networks are three types of networks that are generally used in data regression prediction. Of these, convolutional neural networks (CNNs) have been widely used in the field of computer vision. For the diagnostic problem, the input of the network should be a vector whose elements correspond to the measured values S₁₁ from the input port, whereas the output of the network is a vector whose elements correspond to the frequency of each cavity. As shown in Fig. 3, the S₁₁ of an accelerator has significant characteristics that can be considered as a superposition of multiple Lorentzian resonance peaks, each of which contains information regarding the resonant frequency and Q-factor of the corresponding mode. Considering the S₁₁ signals as a one-dimensional picture, much of the experience gained from using convolutional neural networks for computer vision problems can be applied to our diagnostic problem. Therefore, we intuitively choose the CNN to process the input S₁₁ information. We hope to obtain suitable convolution kernels through network training to express the intrinsic connection between the scattering parameters and detuning. After several cycles of modifications and debugging, the CNN, as shown in Fig. 1, comprises an input layer, a dropout layer, a fully connected layer, an output layer with the mean square error as its loss function, and three convolutional units consisting of a convolution layer, normalization layer, rectified linear unit (ReLU), and pooling layer [16].

Fig. 3

(Color online) Vacuum parts, magnitude, and phase of S₁₁ of the test structures: (a) XT12; (b) SS13; (c) XG3

Fig. 1

(Color online) Construction of the convolutional neural network

The S₁₁ value of a normal conducting accelerator measured from its coupler is a complex array within a circle of radius 1. To take full advantage of the information in the real and imaginary parts of the measured S₁₁, the input layer of the CNN is a two-dimensional array composed of the magnitude and phase parts of the S₁₁ measured from the feeding coupler. Each dimension contains measurements of 2048 frequency points. The frequency band of the measurement varies with the accelerator design. To avoid the difference in the weight sizes of the two dimensions of the input data, both the magnitude and phase parts of the input S₁₁ are normalized to (0,1) using Eq. (1).

$\mathrm{CNN}_{\mathrm{input}} = \begin{bmatrix} 1 - \mathrm{mag}(S_{11}) \\ \mathrm{phase}(S_{11}) / π \end{bmatrix}$ (1)

To accelerate the training and reduce the sensitivity to network initialization, the second layer is a standard batch normalization layer. This first calculates the mean and standard deviation of the input mini-batch, and normalizes the input mini-batch by subtracting the mean and dividing by the standard deviation. The normalized mini-batch is then scaled by a learnable scale factor γ and shifted by a learnable offset factor β.
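The input mapping of Eq. (1) and the batch normalization step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the Lorentzian test trace, the function names, and the per-element normalization over the batch axis (real CNN frameworks typically normalize per channel) are all assumptions.

```python
import numpy as np

def cnn_input(s11):
    """Eq. (1): stack 1 - |S11| and phase(S11)/pi into a (2, n_points) array."""
    s11 = np.asarray(s11, dtype=complex)
    # np.angle returns values in (-pi, pi], so the phase row lies in (-1, 1].
    return np.stack([1.0 - np.abs(s11), np.angle(s11) / np.pi])

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Subtract the batch mean, divide by the batch std, then scale and shift."""
    mean = batch.mean(axis=0, keepdims=True)
    var = batch.var(axis=0, keepdims=True)
    return gamma * (batch - mean) / np.sqrt(var + eps) + beta

# Hypothetical stand-in for a measured trace: one Lorentzian dip, 2048 points.
freq = np.linspace(-1.0, 1.0, 2048)
s11 = 1.0 - 0.8 / (1.0 + 1j * freq / 0.05)

x = cnn_input(s11)
batch = np.stack([x, 0.5 * x, 0.25 * x])   # toy mini-batch of three samples
y = batch_norm(batch)
print(x.shape, y.shape)
```

In a trained network γ and β are learned parameters; here they are fixed to 1 and 0 only to keep the sketch short.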

The third layer is the first convolutional layer. This applies 48 sliding convolutional filters to the layer input, computes the dot product of the learnable weights of the filters with the layer input, and then adds a learnable bias term to the result. To better capture the resonance peak characteristics of the input, the size of the convolution kernel was first set to [64,1]. The length of the kernel is close to the full width at half maximum of the S₁₁ resonant peaks of normal conducting accelerators. However, the large size of the kernel was found to slow the training speed and result in overfitting during numerical tests. Therefore, the kernel size was reset to [4,1]. With the following pooling process, the convolution kernel can still capture long-range characteristics.
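The claim that small kernels still see long-range features after pooling can be made concrete with the standard receptive-field recursion. The sketch below assumes a convolution stride of 1 and a pooling stride of 2; neither stride is stated in the text, so the resulting numbers are illustrative only.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, applied in order."""
    field, jump = 1, 1
    for kernel, stride in layers:
        field += (kernel - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= stride                 # strides compound the spacing between outputs
    return field

# Assumed unit: convolution of length 4 (stride 1) followed by pooling of length 2 (stride 2).
unit = [(4, 1), (2, 2)]
for n_units in (1, 2, 3):
    print(n_units, "unit(s):", receptive_field(unit * n_units), "input points")
```

Under these assumptions, one output element after three convolutional units depends on 29 consecutive input frequency points, even though each kernel spans only 4 elements of its own input.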

Another batch normalization layer follows the convolution layer. The fifth layer is an activation layer, with the ReLU function as its activation function. The ReLU function performs a threshold operation on the input elements. Layer input values larger than 0 are passed through unchanged, whereas values less than 0 are set to 0. This helps to avoid the exploding and vanishing gradient problems of network training, and reduces the overall computational cost of neural networks. The sixth layer is a pooling layer with a pooling region size of [2,1]. The pooling layer divides the layer input into pooling regions and computes the maximum value of each region. It can reduce the dimensionality of the data and represent input information with higher-level features.
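A minimal sketch of the two operations described above, written in plain Python for clarity. The function names and the toy signal are illustrative, and the choice to drop a trailing partial pooling window is an assumption.

```python
def relu(values):
    """Keep positive values unchanged and replace everything else with 0."""
    return [v if v > 0 else 0.0 for v in values]

def max_pool_1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    usable = len(values) - len(values) % size
    return [max(values[i:i + size]) for i in range(0, usable, size)]

signal = [-0.5, 0.2, 1.5, -2.0, 0.0, 0.7, -0.1, -0.3]
activated = relu(signal)
pooled = max_pool_1d(activated, size=2)
print(activated)
print(pooled)
```

Pooling with a region of 2 halves the length of the signal, which is how the three convolutional units progressively compress the 2048-point input.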

A convolutional layer, batch normalization layer, ReLU layer, and pooling layer together comprise a convolutional unit. Three of these units comprise the main body of the CNN. Through iterative testing, the numbers of convolution kernels for the three units were tuned to 48, 24, and 12, respectively. To prevent the network from overfitting, a dropout layer is connected to the last convolutional unit. This randomly sets the layer input elements to zero, with a fixed probability of 0.1. A fully connected layer multiplies the output of the previous dropout layer by a learnable weight matrix, and then adds a learnable bias vector. The fully connected layer combines all of the features and maps them to the frequency of each cavity. To avoid the appearance of network weights with large fluctuations, the output of the network is normalized using Eq. (2), where ω_design and Δωi represent the design frequency and expected detuning range of each cavity, respectively.

$\mathrm{CNN}_{\mathrm{output}} = \left[ \frac{ω_{i} - ω_{\mathrm{design}}}{Δω_{i}} \right]$ (2)
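The output normalization of Eq. (2) and its inverse, which is needed to convert a network prediction back into a cavity frequency, can be sketched as follows. The function names and the example values (an 11.424 GHz design frequency with a 20 MHz detuning range) are illustrative assumptions.

```python
def normalize_output(freqs_hz, design_hz, span_hz):
    """Eq. (2): express each cavity frequency as a fraction of its detuning range."""
    return [(f - f0) / df for f, f0, df in zip(freqs_hz, design_hz, span_hz)]

def denormalize_output(outputs, design_hz, span_hz):
    """Inverse of Eq. (2): map network outputs back to cavity frequencies."""
    return [f0 + y * df for y, f0, df in zip(outputs, design_hz, span_hz)]

design = [11.424e9] * 3                 # assumed design frequency of each cavity
span = [20e6] * 3                       # assumed detuning range of each cavity
freqs = [11.424e9 + 5e6, 11.424e9 - 10e6, 11.424e9]

targets = normalize_output(freqs, design, span)
restored = denormalize_output(targets, design, span)
print(targets)
```

Because the targets are dimensionless, the same network layout can be reused for structures in other frequency bands by changing only the design frequency and the detuning range.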

2.2

Model of the Equivalent Circuit

Based on the coarse outputs of the CNN, more accurate detuning can be iterated using the IPM algorithm. The gradient required for the algorithm can be calculated using the equivalent circuit model. As described by Wangler [16], an accelerator can be considered equivalent to a series of coupled circuits made up of lumped resistances, capacitances, and inductances, and the beam loading effect and RF power source can be equivalent to a voltage or current source. Each circuit obeys Kirchhoff's laws, and all the equations combine to form the equivalent matrix, as shown in Fig. 2 and Eqs. (3) and (4) for electric and magnetic coupling, respectively.

$\begin{bmatrix} e_{1} & -\frac{k_{1}}{2} & & & \\ \ddots & \ddots & \ddots & & \\ & -\frac{k_{i-1}}{2} & e_{i} & -\frac{k_{i}}{2} & \\ & & \ddots & \ddots & \ddots \\ & & & -\frac{k_{n-1}}{2} & e_{n} \end{bmatrix} \begin{bmatrix} X_{1} \\ \vdots \\ X_{i} \\ \vdots \\ X_{n} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ V_{e} \\ \vdots \\ 0 \end{bmatrix}$ (3)

$\begin{bmatrix} m_{1} & -\frac{k_{1}}{2} & & & \\ \ddots & \ddots & \ddots & & \\ & -\frac{k_{i-1}}{2} & m_{i} & -\frac{k_{i}}{2} & \\ & & \ddots & \ddots & \ddots \\ & & & -\frac{k_{n-1}}{2} & m_{n} \end{bmatrix} \begin{bmatrix} X_{1} \\ \vdots \\ X_{i} \\ \vdots \\ X_{n} \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ V_{m} \\ \vdots \\ 0 \end{bmatrix}$ (4)

Fig. 2

Equivalent circuits for accelerators in different couplings: (a) Electric coupling; (b) magnetic coupling

In Eqs. (3) and (4), $e_{i} = 1 + j \frac{ω (1 + β)}{ω_{i} Q_{i}} - \frac{ω^{2}}{ω_{i}^{2}}$ , $m_{i} = 1 - j \frac{ω_{i} (1 + β)}{ω Q_{i}} - \frac{ω_{i}^{2}}{ω^{2}}$ , $V_{e} = 2 j ω \sqrt{C_{i}} \sqrt{P_{0} β R_{i}}$ , and $V_{m} = \frac{2 \sqrt{P_{0} β R_{i}}}{j ω \sqrt{L_{i}}}$ ; Qi is the quality factor of each cavity, and ki is the coupling between two adjacent cavities. Ri, Ci, and Li are the lumped parameters; Xi is related to the amplitude of the cavity field, i_e,i represents the beam current, and ω, P₀, and β represent the RF frequency, power, and coupling degree correlated with the feeding coupler (denoted as c).

Focusing on the circuit of the accelerator coupler, the coupler is equivalent to a voltage transformer with a ratio of n, where $n = \sqrt{\frac{β R_{c}}{Z_{c}}}$ and Z_c is the normalized impedance of the coupler waveguide. Assuming that the equivalent voltage of the RF power source is U_c, and the current flowing along the waveguide is I_c, one can derive the relationship between the current flowing along the waveguide and the field amplitude in the coupled cavity as $I_{c} = n X_{c} \sqrt{C_{c}}$. Then the normalized impedance of the accelerator coupler can be derived as $Z = \frac{U_{c} - I_{c} Z_{c}}{Z_{c} I_{c}} = -1 + j \frac{ω Q_{c}}{ω_{c} β X_{c}}$. Because the scattering parameter S₁₁ satisfies $S_{11} = \frac{Z - 1}{Z + 1}$, the value of S₁₁ measured from the accelerator coupler and its derivative with respect to the frequency of each cavity ωi can finally be derived as

$S_{11,ele} = 1 - j \frac{2 ω β}{ω_{c} Q_{c}} X_{c}, \quad S_{11,mag} = 1 + j \frac{2 ω_{c} β}{ω Q_{c}} X_{c}$ (5)

$\begin{aligned} \frac{d S_{11,ele}}{d ω_{i}} &= -j \frac{2 ω β}{Q_{c}} \left( \frac{d (1/ω_{c})}{d ω_{i}} X_{c} + \frac{1}{ω_{c}} \frac{d X_{c}}{d ω_{i}} \right) \\ \frac{d S_{11,mag}}{d ω_{i}} &= j \frac{2 β}{ω Q_{c}} \left( \frac{d ω_{c}}{d ω_{i}} X_{c} + ω_{c} \frac{d X_{c}}{d ω_{i}} \right) \end{aligned}$ (6)

where the subscripts "ele" and "mag" represent the electric and magnetic coupling structures, respectively. Abbreviating Eqs. (3) and (4) as $M X = b$, the expression $\frac{d X_{c}}{d ω_{i}}$ is further derived as Eq. (7), where $n_{ji}$ is the element in row j and column i of matrix $M^{-1}$ and $m_{ii}$ is the i-th diagonal element of matrix $M$.

$\frac{d X_{c}}{d ω_{i}} = - n_{j i} \frac{d m_{i i}}{d ω_{i}} X_{c}$ (7)

Eqs. (5)–(7) provide the derivatives required for the IPM. 
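The circuit model can be exercised numerically with a short NumPy sketch for the magnetic-coupling case. Several points are assumptions rather than statements from the text: frequencies are normalized to the design frequency, the drive term on the right-hand side is normalized to 1, the coupling factor β loads only the coupler cell, and the derivative is obtained from the general identity $dX/dω_{i} = -M^{-1} (\partial M / \partial ω_{i}) X$, which is our reading of Eq. (7). The analytic derivative is compared with a central finite difference as a sanity check.

```python
import numpy as np

def chain_matrix(w, w_cells, q_cells, k, beta, coupler):
    """Tridiagonal matrix M of Eq. (4) at angular frequency w."""
    load = np.ones(len(w_cells))
    load[coupler] += beta                 # assumption: beta loads the coupler cell only
    diag = 1.0 - 1j * w_cells * load / (w * q_cells) - (w_cells / w) ** 2
    off = -0.5 * np.asarray(k, dtype=float)
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

def solve_chain(w, w_cells, q_cells, k, beta, coupler):
    m = chain_matrix(w, w_cells, q_cells, k, beta, coupler)
    b = np.zeros(len(w_cells), dtype=complex)
    b[coupler] = 1.0                      # assumption: drive term normalized to 1
    return m, np.linalg.solve(m, b)

def s11_mag(w, w_cells, q_cells, k, beta, coupler):
    """Eq. (5), magnetic coupling."""
    _, x = solve_chain(w, w_cells, q_cells, k, beta, coupler)
    return 1.0 + 2j * w_cells[coupler] * beta / (w * q_cells[coupler]) * x[coupler]

def ds11_mag_dwi(w, w_cells, q_cells, k, beta, coupler, i):
    """Eq. (6), magnetic coupling, using dX/dw_i = -M^-1 (dM/dw_i) X."""
    m, x = solve_chain(w, w_cells, q_cells, k, beta, coupler)
    n_ci = np.linalg.inv(m)[coupler, i]
    load_i = 1.0 + (beta if i == coupler else 0.0)
    dm_ii = -1j * load_i / (w * q_cells[i]) - 2.0 * w_cells[i] / w ** 2
    dx_c = -n_ci * dm_ii * x[i]
    dwc = 1.0 if i == coupler else 0.0
    return 2j * beta / (w * q_cells[coupler]) * (dwc * x[coupler] + w_cells[coupler] * dx_c)

# Hypothetical three-cell chain with slightly detuned cells.
w_cells = np.array([1.0, 1.001, 0.999])
q_cells = np.array([8000.0, 8000.0, 8000.0])
k = [0.02, 0.02]
beta, coupler, w = 1.2, 0, 1.0005
args = (q_cells, k, beta, coupler)

h = 1e-7
analytic, numeric = [], []
for i in range(3):
    step = np.zeros(3)
    step[i] = h
    numeric.append((s11_mag(w, w_cells + step, *args) - s11_mag(w, w_cells - step, *args)) / (2 * h))
    analytic.append(ds11_mag_dwi(w, w_cells, *args, i))
    print(i, analytic[-1], numeric[-1])
```

A useful limiting case is a single cell driven on resonance, for which the sketch returns $(1 - β)/(1 + β)$, the familiar reflection coefficient of a loaded cavity under this reference-plane convention.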
For our diagnostic problem, that is, $\arg\min \left( \| S_{11,c}(ω_{i}) - S_{11,m} \|^{2} \right)$ subject to $ω_{i} \leq ω_{i,upper}$ and $ω_{i} \geq ω_{i,lower}$, i = 1, 2, ..., N, the logarithmic penalty function F(ωi) and its derivative with respect to the frequency of each cavity ωi used for the IPM can be written as

$F(ω_{i}) = \| S_{11,c}(ω_{i}) - S_{11,m} \|^{2} - μ^{(k)} \sum_{i=1}^{N} \ln (ω_{i} - ω_{i,lower}) - μ^{(k)} \sum_{i=1}^{N} \ln (ω_{i,upper} - ω_{i})$ (8)

$\frac{d F(ω_{i})}{d ω_{i}} = 2 (S_{11,c} - S_{11,m}) \frac{d S_{11,c}}{d ω_{i}} - \frac{μ^{(k)}}{ω_{i} - ω_{i,lower}} + \frac{μ^{(k)}}{ω_{i,upper} - ω_{i}}$ (9)

where k is the iteration number, μ^(k) is the penalty factor that satisfies $μ^{(0)} > μ^{(1)} > \cdots > μ^{(k)}$, and $\lim_{k \to +\infty} μ^{(k)} = 0$. The steps of the IPM are:

1. Choose μ⁽⁰⁾=1 as the starting penalty factor;

2. Choose a starting point within the frequency range as described above;

3. Optimize Eq. (8) with the derivatives calculated via Eqs. (5)–(9) until the maximal frequency error of the iteration is less than 1e-3 or the number of iterations reaches 100; otherwise, multiply the penalty factor by 0.1 and repeat from step 2.
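The steps above can be illustrated on a one-dimensional toy problem. The sketch below is not the solver used in this work: it swaps in a damped Newton inner loop for brevity, uses a simple quadratic in place of the S₁₁ residual, and the number of outer iterations is an assumption. It keeps the three ingredients of the list: a starting penalty factor of 1, a strictly feasible starting point, and a tenfold reduction of the penalty factor between rounds.

```python
def barrier_minimize(grad, hess, x0, lower, upper, mu0=1.0, shrink=0.1,
                     outer_iters=8, inner_iters=100, tol=1e-3):
    """Minimize f(x) on (lower, upper) with a logarithmic barrier."""
    x, mu = x0, mu0
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            # Gradient and curvature of f(x) - mu*ln(x - lower) - mu*ln(upper - x).
            g = grad(x) - mu / (x - lower) + mu / (upper - x)
            h = hess(x) + mu / (x - lower) ** 2 + mu / (upper - x) ** 2
            step = -g / h
            # Halve the step until the iterate stays strictly inside the bounds.
            while not (lower < x + step < upper):
                step *= 0.5
            x += step
            if abs(step) < tol:
                break
        mu *= shrink                       # tighten the barrier for the next round
    return x

# Toy objective (x - 2)^2 on [0, 1]; the constrained optimum sits at the upper bound.
x_opt = barrier_minimize(grad=lambda x: 2.0 * (x - 2.0),
                         hess=lambda x: 2.0,
                         x0=0.5, lower=0.0, upper=1.0)
print(x_opt)
```

As the penalty factor shrinks, the iterate approaches the bound from the inside without ever crossing it, which is the behavior that keeps the diagnosed frequencies within the constrained zone.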

Combining the CNN with the IPM, the diagnostic steps can be summarized as follows:

1. Data Preparation: based on the structure designs, randomly generate groups of ωi in a constraint zone, then calculate the associated S₁₁ for each group using Eq. (5);

2. Network Training: divide the prepared data into a training set and validation set, and normalize the input and output arrays using Eqs. (1) and (2) to train the CNN illustrated in Fig. 1;

3. Coarse Estimation: for a linac to be diagnosed, measure and convert its S₁₁ to the input data form, and transmit it to the trained network to estimate the coarse detuning;

4. Fine Calculation: further optimize the residual error $| | S_{11, c} - S_{11, m} {| |}^{2}$ using the IPM algorithm with the gradients determined by Eq. (6), to precisely diagnose the cavity state.
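Steps 1 and 2 of the list above can be sketched as a small data generator. This is an illustrative NumPy sketch under stated assumptions: the structure is magnetically coupled, frequencies are normalized to the design frequency, β loads only the coupler cell, the drive term is normalized to 1, the detuning is drawn uniformly, and all parameter values and function names are hypothetical.

```python
import numpy as np

def s11_sweep(w_sweep, w_cells, q, k, beta, coupler):
    """S11 of Eq. (5) (magnetic coupling) over a frequency sweep."""
    n = len(w_cells)
    load = np.ones(n)
    load[coupler] += beta                          # assumption: beta on the coupler cell only
    b = np.zeros(n, dtype=complex)
    b[coupler] = 1.0                               # assumption: normalized drive term
    coupling = 0.5 * (np.diag(k, 1) + np.diag(k, -1))
    out = np.empty(len(w_sweep), dtype=complex)
    for idx, w in enumerate(w_sweep):
        m = np.diag(1.0 - 1j * w_cells * load / (w * q) - (w_cells / w) ** 2) - coupling
        x = np.linalg.solve(m, b)
        out[idx] = 1.0 + 2j * w_cells[coupler] * beta / (w * q[coupler]) * x[coupler]
    return out

def make_dataset(n_samples, w_design, span, q, k, beta, coupler, w_sweep, seed=0):
    """Random detuning groups (labels, Eq. (2)) and their network inputs (Eq. (1))."""
    rng = np.random.default_rng(seed)
    labels = rng.uniform(-1.0, 1.0, size=(n_samples, len(w_design)))
    inputs = np.empty((n_samples, 2, len(w_sweep)))
    for s in range(n_samples):
        s11 = s11_sweep(w_sweep, w_design + labels[s] * span, q, k, beta, coupler)
        inputs[s, 0] = 1.0 - np.abs(s11)
        inputs[s, 1] = np.angle(s11) / np.pi
    return inputs, labels

inputs, labels = make_dataset(
    n_samples=8, w_design=np.ones(3), span=1e-3, q=np.full(3, 8000.0),
    k=np.full(2, 0.02), beta=1.2, coupler=0, w_sweep=np.linspace(0.98, 1.02, 256))
train_x, val_x = inputs[:6], inputs[6:]            # simple training/validation split
train_y, val_y = labels[:6], labels[6:]
print(train_x.shape, val_x.shape)
```

The real datasets use 2048 frequency points and thousands of samples per structure; the reduced sizes here only keep the sketch fast.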

Results and Discussion

Numerical studies were performed using three typical acceleration structures, including a 13-cell S-band standing-wave linac (SS13), a 12-cell X-band traveling-wave linac (XT12), and a 3-cell X-band RF gun (XG3). SS13 is a double-period axial coupling linac with an output beam energy of 6 MeV, XT12 is a short prototype of a constant impedance structure with 72 similar cavities used for high-gradient studies [17], and XG3 is a field-emission gun whose first cavity operates in TM₀₂ mode. Although the accelerator structures selected for the numerical experiments include only S-band and X-band structures, the experimental findings can be generalized to any frequency band, because of the frequency normalization process of the CNN output layer. The inputs to both the CNN and IPM are dimensionless data, and the diagnostic results of our algorithm are related to the relative sizes of the constrained zone and bandwidth of the input frequencies. Therefore, the results can be scaled to an arbitrary frequency band. The simulation results obtained using HFSS [18] for the vacuum part and the design S₁₁ of each structure are plotted in Fig. 3. The characteristics of both separated and heavily overlapped resonance peaks were considered in the experiments. 2¹³ sets of training data and 2¹⁰ sets of validation data were prepared for XT12 and XG3, and 2¹⁵ sets of training data were prepared for SS13 for better generalization performance. The network was trained using ADAM [19] with a mini-batch size of 2048 and a constant learning rate of 10^-4. A comparison between the S₁₁ of a random validation set and the recalculated S₁₁ from the diagnosis result is plotted in Fig. 4 for all three structures, and all are perfectly matched on the Smith charts. The training process for XG3 was the fastest to converge, whereas the training process for SS13 had the slowest convergence rate and largest diagnostic error. 
The root mean square (RMS) diagnostic errors for the validation datasets of XG3, XT12, and SS13 are 160 Hz, 300 kHz, and 500 kHz, respectively. These results are related to the complexity of the accelerator structures. XG3 has the fewest cavities, whereas SS13 has the most, and the shunt impedance and quality factor vary greatly among the cavities of SS13. Further increasing the depth of the network may improve the diagnostic results for SS13; however, owing to limited computational resources, this has not yet been investigated further.

Fig. 4

(Color online) Smith chart of random validation samples of the three structures: (a) XT12; (b) SS13; (c) XG3.

Because SS13 has the highest complexity, a comparison experiment was performed with SS13 to examine the performance for three different scenarios: diagnosis using the IPM algorithm only, using the CNN only, and using both as in Sect. 2.2. We defined the diagnostic error as the difference between the diagnosed cavity frequency and the corresponding cavity frequency of the validation dataset. Fig. 5 shows the RMS diagnostic error histograms of each cavity obtained using the three different methods. The IPM alone produced the highest diagnostic error, with a maximum of 10 MHz. The error of the CNN has a Gaussian-like distribution, with an average value of 0 MHz and a standard deviation of approximately 1.2 MHz. The combination of the CNN and IPM achieved the best diagnostic performance. More than 95% of the combined results were accurate to within ±500 kHz, which satisfies most engineering requirements. This comparison shows that the coarse estimate of the cavity detuning computed by the CNN successfully helps the IPM algorithm overcome local optima. The detuning estimation of the neural network in the first step and the optimization of IPM in the latter step are complementary, and neither step can accomplish the diagnosis task on its own.
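The two statistics used in this comparison, the RMS diagnostic error and the fraction of results within a tolerance, can be computed as in the sketch below. The error values are made up for illustration and are not data from this study.

```python
import math

def rms(errors):
    """Root mean square of a list of diagnostic errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def fraction_within(errors, tol):
    """Fraction of errors whose magnitude does not exceed tol."""
    return sum(abs(e) <= tol for e in errors) / len(errors)

# Hypothetical diagnostic errors in kHz (diagnosed minus true cavity frequency).
errors_khz = [100.0, -600.0, 250.0, 499.0]
print(rms(errors_khz), fraction_within(errors_khz, 500.0))
```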

Fig. 5

(Color online) Diagnostic error histograms for each cavity of SS13. The blue squares are results from the IPM only, the red squares are results from the CNN only, and the green squares are results from the combination of the CNN and IPM

Figure 6a shows a statistical histogram of the root mean square error of the SS13 diagnosis. Comparing the diagnostic accuracy of the neural network for the different cavities in Fig. 6a, it can be seen that cavity No.7, which is directly connected to the coupler, has the smallest RMS diagnostic error, whereas cavity No.1, which is the farthest away from the coupler, has the largest RMS error. The RMS error increases with the distance between the corresponding cavity and the coupler. This can be explained by coupled S-parameter calculation (CSC) theory [20]: an accelerator can be considered equivalent to a topological network constructed from a series of dual-port or triple-port units. Considering a dual-port unit as shown in Fig. 7, the sorted scattering matrix of the unit can be derived using Eq. (10), where ai and bi are the incident and reflected waves of each port, respectively, S_11,N-1 denotes the total scattering measured from the iris between the (N-1)^th cavity and the N^th cavity, and S₁₁, S₁₂, S₂₁, and S₂₂ are the scattering parameters of the isolated N^th cavity. According to CSC theory, the transformation matrices P and F from the intrinsic wave vector to the canonical wave vector can be derived using Eq. (11), and S_11,N of the whole accelerator can then be written as an iteration formula as shown in Eq. (12). It can be seen from Eq. (12) that the S_11,N of the whole accelerator is most comparable to the S₁₁ of the individual coupler. The S_11,N of the cavity farthest from the coupler is multiplied by an iteration factor of less than 1 in each topological layer. The effect of the S_11,N of the farthest cavity on the S_11,N of the whole accelerator becomes negligible. Therefore, the neural network will strengthen the feature extraction of the coupler cavity and surrounding cavities, while the features of the cavities far from the coupler are encrypted layer by layer and become more difficult to be learned by the network. 
Thus, for structures with additional couplers, such as traveling-wave accelerators, further addition of the scattering parameters from different couplers may improve the network performance.

$\begin{bmatrix} b_{1,N-1} \\ b_{1} \\ b_{2} \end{bmatrix} = \begin{bmatrix} S_{11,N-1} & 0 & 0 \\ 0 & S_{11} & S_{12} \\ 0 & S_{21} & S_{22} \end{bmatrix} \begin{bmatrix} a_{1,N-1} \\ a_{2} \\ a_{1} \end{bmatrix} = S \begin{bmatrix} a_{1,N-1} \\ a_{2} \\ a_{1} \end{bmatrix}$ (10)

$\begin{bmatrix} a_{1,N-1} \\ a_{1} \\ a_{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} a_{1,N-1} \\ a_{2} \\ a_{1} \end{bmatrix} = P \begin{bmatrix} a_{1,N-1} \\ a_{2} \\ a_{1} \end{bmatrix}$

$\begin{bmatrix} a_{1,N-1} \\ a_{1} \\ a_{2} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} b_{1,N-1} \\ b_{1} \\ b_{2} \end{bmatrix} = F \begin{bmatrix} b_{1,N-1} \\ b_{1} \\ b_{2} \end{bmatrix}$

$G = P^{-1} F S P = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix}$ (11)

$S_{11,N} = G_{21} (1 - G_{11})^{-1} G_{12} + G_{22} = S_{11} + \frac{(S_{11,N-1} - 1) S_{12} S_{21}}{S_{22} + S_{11,N-1} - S_{11,N-1} S_{22}}$ (12)

To study the influence of sampling noise and errors of β, Qi, and ki, additional numerical experiments were conducted on SS13. Random Δβ generated within [-1%, 1%], random ΔQi generated within [-1.5%, 1.5%], and random Δki generated within [-5%, 5%] were added to the validation data sets. Figure 6b shows the diagnostic RMS error for each cavity. When incorrect values of β, Qi, and ki were introduced, the RMS error of the network estimation increased. This is because the network was trained with only frequency detuning data, and the network forcibly attributed the contributions of β, Qi, and ki to cavity frequency detuning. 
However, the IPM can still converge to a better diagnosis value based on the network output. In addition, a sampling noise of Gaussian distribution with a signal-to-noise ratio of 60 dB was added to the validation datasets, while the network was still trained with noiseless datasets. Figure 6c shows the RMS diagnostic errors. As shown by the red bins in Fig. 6c, the output error has a similar distribution between the non-noisy dataset and the noise-added dataset. This implies that the network is partially resistant to sampling noise, probably because of the pooling layers. The pooling layers serve to downsample the data to reduce the computational cost of the neural network. During this process, the data can be considered to have passed through a low-pass filter, which has the effect of filtering out noise.
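The noise test can be reproduced in outline with the sketch below, which adds complex Gaussian noise to an S₁₁ trace at a prescribed signal-to-noise ratio. The SNR definition (mean signal power over mean noise power), the equal split of noise between real and imaginary parts, and the Lorentzian test trace are assumptions; the text does not state how the noise was generated.

```python
import numpy as np

def add_noise(s11, snr_db, rng):
    """Add complex Gaussian noise so that mean |signal|^2 / mean |noise|^2 matches snr_db."""
    signal_power = np.mean(np.abs(s11) ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    sigma = np.sqrt(noise_power / 2.0)            # half the power in each quadrature
    noise = rng.normal(0.0, sigma, s11.shape) + 1j * rng.normal(0.0, sigma, s11.shape)
    return s11 + noise

rng = np.random.default_rng(0)
freq = np.linspace(-1.0, 1.0, 2048)
clean = 1.0 - 0.8 / (1.0 + 1j * freq / 0.05)      # hypothetical single-resonance trace
noisy = add_noise(clean, snr_db=60.0, rng=rng)

measured_snr = 10.0 * np.log10(np.mean(np.abs(clean) ** 2) /
                               np.mean(np.abs(noisy - clean) ** 2))
print(f"measured SNR = {measured_snr:.1f} dB")
```

At 60 dB the perturbation is about one part in a thousand of the signal amplitude, which helps explain why the network output changes little between the clean and noisy validation sets.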

Fig. 6

(Color online) Diagnostic RMS error for each cavity with different validation sets: (a) with frequency detuning only; (b) with incorrect values of β, Qi, ki; (c) with data sampling noise.

Fig. 7

(Color online) Topology of the accelerator cavity chain

A well-trained network for XG3 was used to guide the tuning procedure of a real gun, as shown in Fig. 8a. A testing cathode with a hole in its center was installed on the gun for bead-pulling. The field distributions of the three different working modes were measured to apply the diagnostic method based on field distributions. Diagnosis results from our method were compared with those based on the field distribution. As shown in Fig. 8c, the frequency diagnosis differences between the two methods for the first and third cavities are 190 kHz and 230 kHz, respectively, whereas the difference for the second cavity is 3.2 MHz. This is because the field distributions in the second cavity were almost zero in π/2 mode, resulting in a singular matrix in the bead-pulling method. A slight difference or jittering of the pulling string may introduce significant measurement errors into the field distribution of the second cavity. Therefore, it is difficult for a diagnostic method based on field distributions to compute the frequency of the second cavity accurately. Under the guidance of our new method, the π mode frequency of the gun was tuned to 11.424 GHz, and the on-axis electric field distribution was consistent with the design value, as shown in Fig. 8e. The new diagnostic method completely satisfies the tuning requirements of the gun cavity. Another network was trained to help tune a standing-wave linac (SS13-2). SS13-2 is similar to SS13, but differs in the coupler position. The coupler position of SS13-2 was at the 9th cavity, whereas that of SS13 was at the 7th. The comparison shown in Fig. 8d was made between the diagnosis results of our method and those of the probe insertion method. The difference between the two was less than 500 kHz. Figure 8d shows that the results of our method fluctuate around those obtained by the probe insertion method. 
This can be explained in the same way as the difference in the numerical experiments with deliberately incorrect values of β, Qi, and ki. The actual values of these parameters for the tested SS13-2 were slightly different from the design values, adding a frequency shift to the diagnostic results. Including the parameter values in the IPM step may eliminate this frequency shift. Figure 8f shows a comparison of S_11,m and S_11,c for SS13-2. Owing to these factors, there is also a small deviation between the two lines. However, for our engineering needs, this error was negligible. In summary, these two results prove that the combined CNN and IPM diagnostic method is in good agreement with both conventional methods, and has a major advantage in that the detuning data can be obtained in almost real time after training. After training with prepared data sets, one can obtain the accelerator cavity detuning information immediately, while other methods require time for field measurements or probe adjustments. As the tuning process needs to be repeated several times, the conventional methods may take several hours of processing time in total, but the combined CNN and IPM method can reduce the processing time to a few minutes.
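The near-zero field in the second gun cavity in the π/2 mode can be illustrated with an idealized model. The sketch below is not the XG3 equivalent circuit. It assumes a lossless 3-cell coupled-resonator chain with half-cell terminations, a hypothetical coupling coefficient `K`, and a hypothetical nominal cell frequency `F_CELL`. Under these assumptions the pattern (1, 0, -1) satisfies the chain equations exactly for any detuning of the middle cell, so a π/2-mode measurement carries no information about that cell, whereas the π mode does respond to the same detuning.

```python
import math


def chain_residuals(x, f, cell_freqs, k):
    """Left-hand sides of the lossless 3-cell chain equations.

    x          : field amplitudes (X0, X1, X2)
    f          : trial mode frequency (same unit as cell_freqs)
    cell_freqs : individual cell frequencies (f0, f1, f2)
    k          : cell-to-cell coupling coefficient (assumed value)
    A pattern x is an eigenmode at frequency f if all residuals vanish.
    """
    x0, x1, x2 = x
    f0, f1, f2 = cell_freqs
    return (
        x0 * (1 - (f0 / f) ** 2) + k * x1,               # first end cell
        x1 * (1 - (f1 / f) ** 2) + 0.5 * k * (x0 + x2),  # middle cell
        x2 * (1 - (f2 / f) ** 2) + k * x1,               # last end cell
    )


F_CELL = 11424.0                     # MHz, hypothetical nominal cell frequency
K = 0.02                             # hypothetical coupling coefficient
PI_HALF_MODE = (1.0, 0.0, -1.0)      # zero field in the middle cell
PI_MODE = (1.0, -1.0, 1.0)
F_PI = F_CELL / math.sqrt(1.0 - K)   # pi-mode frequency of the tuned chain

for detune in (0.0, 1.0, 5.0):       # MHz, detuning of the middle cell only
    cells = (F_CELL, F_CELL + detune, F_CELL)
    r_half = max(abs(r) for r in chain_residuals(PI_HALF_MODE, F_CELL, cells, K))
    r_pi = max(abs(r) for r in chain_residuals(PI_MODE, F_PI, cells, K))
    print(f"detune={detune} MHz  pi/2 residual={r_half:.2e}  pi residual={r_pi:.2e}")
```

In this toy model the π/2 residual stays at zero for every middle-cell detuning, which is the numerical counterpart of the singular matrix mentioned above; a real structure with losses and unequal cells would only approximate this behaviour.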

Fig. 8

(Color online) Testing with real structures: (a) XG3 gun; (b) SS13-2 linac; (c) benchmark comparison between the CNN & IPM method and the field method; (d) benchmark comparison between the CNN & IPM method and the probe-insertion method; (e) on-axis electric field distribution of XG3; (f) comparison between the $S_{11,c}$ and $S_{11,m}$ of SS13-2.

Conclusion

In conclusion, we developed a non-invasive diagnostic method for cavity detuning. This approach first trains a convolutional neural network to estimate the frequency detuning of each cavity from the measured S₁₁, and then uses this estimate as the starting point for the IPM algorithm, which minimizes the divergence between the S₁₁ calculated using equivalent circuits and the measured S₁₁. The convolutional neural network has a total of 15 layers. The 3rd, 7th, and 11th layers are convolutional layers with 48, 24, and 12 kernels, respectively. The network was trained using simulation datasets generated from the equivalent circuits. Numerical experiments were successfully completed on three different acceleration structures: a 13-cell S-band standing-wave linac, a 12-cell X-band traveling-wave linac, and a 3-cell X-band RF gun. Owing to the topological nature of the structure, the diagnostic accuracy of this method decreases as the distance from the cavity to the coupler increases. This method is robust to sampling noise owing to the use of pooling layers. The well-trained network also aided in tuning real structures. The diagnostic results of this method were in good agreement with those of conventional methods.
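The two-stage idea summarized above (a coarse estimate followed by a constrained refinement against the measured S₁₁) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a single resonator with the standard reflection formula stands in for the multi-cell equivalent circuit, the constant `F_CNN` stands in for the network output, the values of `BETA` and `Q0` are hypothetical, and a simple log-barrier scheme with a golden-section inner search stands in for whichever IPM solver was actually used.

```python
import math


def s11_single_cavity(f, f0, beta, q0):
    """Reflection coefficient of one resonator coupled to a feed line."""
    x = q0 * (f / f0 - f0 / f)  # normalized detuning
    return (beta - 1 - 1j * x) / (beta + 1 + 1j * x)


def misfit(f0, freqs, s11_meas, beta, q0):
    """Mean squared distance between calculated and measured S11."""
    total = sum(abs(s11_single_cavity(f, f0, beta, q0) - m) ** 2
                for f, m in zip(freqs, s11_meas))
    return total / len(freqs)


def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimize func on the open interval (lo, hi); endpoints are never evaluated."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)


def refine_detuning(f0_guess, half_width, freqs, s11_meas, beta, q0):
    """Log-barrier refinement of f0 inside f0_guess +/- half_width."""
    lo, hi = f0_guess - half_width, f0_guess + half_width
    f0 = f0_guess
    for mu in (1e-2, 1e-4, 1e-6, 1e-8):  # shrinking barrier weight
        def barrier_objective(x, mu=mu):
            return (misfit(x, freqs, s11_meas, beta, q0)
                    - mu * (math.log(x - lo) + math.log(hi - x)))
        f0 = golden_section_min(barrier_objective, lo, hi)
    return f0


BETA, Q0 = 1.0, 7000.0   # hypothetical coupling factor and unloaded Q
F_TRUE = 11424.35        # MHz, cavity frequency to be recovered
FREQS = [11414.0 + 0.1 * i for i in range(201)]  # MHz, frequency sweep
S11_MEAS = [s11_single_cavity(f, F_TRUE, BETA, Q0) for f in FREQS]  # noiseless synthetic "measurement"
F_CNN = 11424.0          # MHz, stand-in for the CNN estimate
F_REFINED = refine_detuning(F_CNN, 1.0, FREQS, S11_MEAS, BETA, Q0)

print(f"initial error: {abs(F_CNN - F_TRUE) * 1e3:.1f} kHz")
print(f"refined error: {abs(F_REFINED - F_TRUE) * 1e3:.4f} kHz")
```

The bounds around the initial estimate play the role of the feasible region: the better the first-stage estimate, the narrower this region can be, which is one reason a good starting point makes the second stage cheap.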

This approach provides a fresh perspective on the diagnosis of high-frequency bands, long cavity chains, and encapsulated accelerators. After hours of pre-training, detuning information can be obtained in situ simply by measuring the S₁₁ parameters. We anticipate that this method will significantly reduce the burden of the tuning process and provide a new approach for monitoring the status of encapsulated linacs. In future work, we will continue to tune the structure of the network and attempt to include other accelerator parameters in the diagnostic algorithm to enable the diagnosis of more complex acceleration structures.

References

[1] X. Pu, H. Hou, Y. Wang et al., Frequency sensitivity of the passive third harmonic superconducting cavity for SSRF. Nucl. Sci. Tech. 31, 176 (2020). doi: 10.1007/s41365-020-0732-x