Machine learning-based analyses for total ionizing dose effects in bipolar junction transistors

NUCLEAR ELECTRONICS AND INSTRUMENTATION

Bai-Chuan Wang, Meng-Tong Qiu, Wei Chen, Chen-Hui Wang, Chuan-Xiang Tang

Nuclear Science and Techniques

Vol.33, No.10

Article number 131

Published in print Oct 2022

Available online 12 Oct 2022

DOI: 10.1007/s41365-022-01107-w

Machine learning methods have proven to be powerful in various research fields. In this paper, we show that research on radiation effects could benefit from such methods and present a machine learning-based scientific discovery approach. Total ionizing dose (TID) effects usually cause gain degradation in bipolar junction transistors (BJTs), leading to functional failures of bipolar integrated circuits. Many experiments on TID effects in BJTs have been conducted at different laboratories worldwide, producing a large amount of experimental data that contains a wealth of information. However, it is difficult to utilize these data effectively. In this study, we propose a new artificial neural network (ANN) approach to analyze the experimental data of TID effects on BJTs. An ANN model was built and trained using data collected from different experiments. The results indicate that the proposed ANN model has advantages in capturing nonlinear correlations and predicting the data. The trained ANN model suggests that the TID hardness of a BJT tends to increase with the base current I_B0. A possible cause for this finding was analyzed and confirmed through irradiation experiments.

Keywords: Total ionizing dose effects; Bipolar junction transistor; Artificial neural network; Machine learning; Radiation effects

Introduction

Bipolar junction transistors (BJTs) are widely used as analog components in electronic systems [1]. Unfortunately, BJTs exhibit total ionizing dose (TID) effects, which induce current gain degradation in radiation environments [2]. Such TID effects are complex and influenced by many factors. A large number of studies have been conducted on the various dependences and underlying mechanisms of TID effects in bipolar devices [2-6], as well as in other types of devices [7-10]. Generally, to investigate the correlations between the TID hardness and the parameters of BJTs, a series of devices must be prepared for comparison. This method is effective in excluding the influence of other parameters, but is costly. Many laboratories worldwide have been conducting TID experiments for many years, producing a large amount of experimental data. Such data are growing rapidly and are readily available. The correlations between the TID hardness and the parameters of the BJTs may be extracted from these data. However, such highly nonlinear data cannot be accurately described using multiple linear regression. Therefore, a new generation of computational tools is needed to assist researchers in extracting valuable information from the growing volumes of data.

Machine learning methods can capture correlations in data and make predictions, thereby providing an alternative approach for scientific investigations [11]. These methods, which exhibit huge potential for generalized suitability for scientific research, have been applied to many different research areas for decades [12]. Examples include genetic research [11], antibiotic discoveries [13], material design [14-16], and quantum entanglement simulations [17, 18]. Recently, machine learning methods have also been successfully applied in an increasing number of areas in nuclear physics research [19], including nuclear theories [20-22], experimental methods [23-25], accelerator science [26, 27], and nuclear data processing [28, 29]. Neural networks are powerful methods that have proven to be effective for nonlinear function fitting and accurate prediction [30, 31].

In this paper, we demonstrate that research on radiation effects could benefit from machine learning methods and present a machine learning-based scientific discovery approach for the TID effects of BJTs. Specifically, an artificial neural network (ANN) model was built and trained using the data collected from different experiments. The results indicate that the trained neural network model has significant advantages over the traditional multiple linear regression in capturing nonlinear correlations and predicting data. The trained ANN model suggests that the TID hardness of a BJT tends to increase with base current I_B0. A possible mechanism for this phenomenon was analyzed and experimentally verified. Our work indicates that machine learning methods have advantages in discovering correlations and predicting experimental data. The proposed approach could be a powerful new tool to discover correlations from experimental datasets and make predictions for radiation effects.

Methods

2.1 Datasets

TID degradation is related to many parameters such as bias, layout, dose, dose rate, passivation layer, and hydrogen content [32-34]. A dataset containing all the related parameters would be ideal for analysis. However, the collected historical experimental data do not contain all of them. Nevertheless, a dataset containing several parameters may still assist scientific discovery by providing useful information, as described herein.

The experimental dataset was collected from 10 articles [1, 35-43]. Gummel-plot data for 12 bipolar devices (eight NPNs and four LPNPs) were obtained from the literature. A Gummel plot records the collector and base currents at different base-emitter voltages. The collected data cover bipolar devices irradiated at room temperature by cobalt-60 gamma sources at different doses and dose rates, with Gummel plots measured before and after irradiation. Degradations of BJTs with different types, V_BE, Beta₀, I_B0, doses, and dose rates were extracted to create the dataset depicted in Fig. 1. The dataset contains 565 radiation response data points for two types of BJTs: NPN and LPNP. V_BE is the base-emitter voltage, whereas Beta₀ and I_B0 are the corresponding common-emitter current gain and base current before irradiation, respectively. The maximum dose was 1,902 krad(Si), and the dose rate ranged from 0.0015 to 312 rad(Si)/s. Absorbed doses and dose rates in silicon are used throughout this study. The degradation of a BJT is represented by the base-current ratio I_B/I_B0, where I_B0 and I_B are the base currents before and after irradiation, respectively. TID effects mainly increase the base current I_B at a fixed base-emitter voltage V_BE, whereas the collector current I_C remains roughly constant [37, 39]. Therefore, I_B/I_B0 represents the gain degradation of the BJT.
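The link between the base-current ratio and gain degradation can be illustrated with a minimal sketch. It assumes a base-10 logarithm (the text writes only "log") and an exactly constant collector current, so that Beta/Beta₀ = I_B0/I_B; the current values are hypothetical and are not taken from the dataset.

```python
import math


def degradation(i_b0: float, i_b: float) -> float:
    """Degradation measure log(I_B / I_B0); base 10 is an assumption."""
    return math.log10(i_b / i_b0)


def gain_ratio(i_b0: float, i_b: float) -> float:
    """Beta / Beta0 under the assumption that I_C is unchanged by irradiation.

    Beta = I_C / I_B, so with constant I_C the ratio reduces to I_B0 / I_B.
    """
    return i_b0 / i_b


# Hypothetical base currents (A) before and after irradiation at a fixed V_BE.
i_b0 = 1.0e-8
i_b = 4.0e-8

d = degradation(i_b0, i_b)   # log10 of a fourfold base-current increase
g = gain_ratio(i_b0, i_b)    # the gain falls to one quarter of its initial value
print(f"log(I_B/I_B0) = {d:.3f}, Beta/Beta0 = {g:.2f}")
```

In this sketch a fourfold increase in base current corresponds directly to a fourfold loss of gain, which is why the ratio can stand in for gain degradation.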

Fig. 1

Overview of the dataset: degradation log(I_B/I_B0) versus (a) type, (b) bias condition V_BE, (c) common emitter current gain Beta₀, (d) base current I_B0, (e) dose, and (f) dose rate.

In addition to the Gummel-plot data, the correlations between degradations and doses or dose rates, obtained via experimentation, have been exhibited in the literature [35, 40, 41]. These additional data were adopted as a test set to verify the generalization of the trained ANN model. None of the experimental data in the test set were used to train the ANN.

2.2 Artificial neural network model

We focused on three-layer ANNs, which have proven to be powerful in studies using relatively small datasets [44, 45]. The ANNs were implemented using the deep learning framework Keras [46]. The inputs of the ANN are the type, |V_BE|, Beta₀, I_B0, dose, and dose rate. The first neuron of the input layer represents the type of BJT: ‘0’ stands for the NPN type and ‘1’ stands for the LPNP type. Each input parameter is normalized to a mean of 0 and a standard deviation of 1 over the training dataset. The output layer of the ANN is a single neuron, corresponding to the logarithm of the base-current ratio, log(I_B/I_B0). The rectified linear unit (ReLU) [47] was adopted as the activation function for the first two layers, while a linear activation function was adopted for the last layer. Dropout with a rate of 0.1 was applied to the first two layers; this algorithm helps improve the generalization performance of ANNs [48]. The loss function is the mean squared error. Adam with the default learning rate of 0.001 was employed as the optimizer to update the weights during training [49]. The batch size was set to 32 for stable convergence during training [50].
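The input encoding and normalization described above can be sketched as follows. This is an illustration only, not the authors' preprocessing code: the feature names, the use of the population standard deviation, and the three training rows are assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical feature order matching the six inputs named in the text.
FEATURES = ["type", "abs_vbe", "beta0", "i_b0", "dose", "dose_rate"]


def encode_type(name: str) -> float:
    """Encode the device type as in the text: 0 for NPN, 1 for LPNP."""
    return {"NPN": 0.0, "LPNP": 1.0}[name]


def fit_scaler(rows):
    """Compute per-feature mean and standard deviation on the training rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Guard against a constant column, whose standard deviation would be zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return means, stds


def transform(rows, means, stds):
    """Standardize rows with statistics fitted on the training set only."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical rows: type, |V_BE| (V), Beta0, I_B0 (A), dose (krad(Si)), dose rate (rad(Si)/s).
train = [
    [encode_type("NPN"), 0.6, 100.0, 1e-8, 100.0, 100.0],
    [encode_type("LPNP"), 0.5, 50.0, 5e-9, 20.0, 0.01],
    [encode_type("NPN"), 0.7, 150.0, 2e-8, 500.0, 10.0],
]
means, stds = fit_scaler(train)
scaled = transform(train, means, stds)
```

Fitting the statistics on the training rows and reusing them for validation and test data keeps information about held-out samples out of the inputs.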

A suitable number of neurons and training epochs may vary significantly for different problems. Fivefold cross-validation (CV) was used to identify a suitable number of neurons in the two hidden layers and a suitable number of training epochs for fitting our dataset. In this technique, the dataset is randomly divided into five parts, and five ANNs are trained and evaluated in turn. Each ANN is trained using four parts of the dataset and validated using the remaining part. The CV technique helps reduce the influence of training instability. The performance of the ANNs was evaluated using the mean absolute error (MAE), computed as: $\mathrm{MAE} = \frac{1}{N} \sum_{i = 1}^{N} | y_{i} - \hat{y}_{i} |,$ (1) where $y_{i}$ and $\hat{y}_{i}$ represent the experimental and predicted values of log(I_B/I_B0), respectively, and N represents the number of data points.
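Equation (1) and the fivefold split can be sketched in a few lines. The helper names and the fixed random seed are assumptions for the example; the paper does not state how the shuffling was done.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error, Eq. (1)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and return k (train, validation) index pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint parts of near-equal size
    splits = []
    for i in range(k):
        val = folds[i]
        train = [j for f_i, f in enumerate(folds) if f_i != i for j in f]
        splits.append((train, val))
    return splits


# The dataset in the text has 565 points, giving five parts of 113 points each.
splits = k_fold_indices(565, k=5)
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))
```

Each of the five models would be fitted on `train_idx` and scored with `mae` on `val_idx`; the reported training and validation MAEs are then averages over the five folds.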

The performance of the ANNs with different training epochs and neuron numbers is shown in Fig. 2. The training and validation MAEs are averaged over the five CV folds. The maximum number of neurons investigated was limited to 20 × 20, considering the small size of the dataset to be fitted. As can be seen in Fig. 2a, the MAE on the validation set converged after 1,000 epochs of training for the different sizes of ANNs evaluated in our study. Therefore, the number of training epochs was set to 1,000. Figure 2b presents the influence of the number of neurons in the two hidden layers on the validation MAE. The MAE on the validation set decreased as the number of neurons increased. The distribution of the MAE is symmetrical about the diagonal. Therefore, it is a good choice to set the number of neurons in the two layers equal when the total number of neurons is fixed. Figure 2c shows the MAEs of the training and validation sets for ANNs with the same number of neurons in the two layers. As the number of neurons increased, the MAEs of the training and validation sets decreased. However, the MAE of the validation set decreased more slowly than that of the training set, so the difference between the two increased, as shown in Fig. 2d. A large difference indicates that the model can fit the training set accurately, but cannot accurately predict the validation set. This indicates poor generalization performance owing to overfitting [51]. To accurately capture the correlations between data, a good model should have both a small MAE and good generalization performance. In our study, we set the number of neurons in each hidden layer to 7. Figure 3 displays a schematic of the proposed ANN model. Using more neurons could reduce the MAE slightly, but would simultaneously reduce the generalization performance. We preferred fewer neurons to avoid capturing spurious correlations caused by overfitting.

Fig. 2

(Color online) Performance of ANNs with different training epochs and numbers of neurons. (a) Validation MAE for different training epochs. The validation MAE converges after 1,000 epochs for ANNs with different numbers of neurons. (b) Validation MAE for ANNs with different numbers of neurons in the two hidden layers. (c) Training MAE and validation MAE for ANNs with the same number of neurons in the two layers. (d) Differences between training MAE and validation MAE. For (c) and (d), the data points in the figures are the average values of 10 random runs, while their standard deviations are marked with shaded areas.

Fig. 3

(Color online) Schematic of the proposed ANN model for our dataset.

Results

3.1 Predictions

Using the CV technique, five ANNs were trained on different parts of the dataset. The predictions made by these ANNs differed slightly, and we used their average as the prediction result. The performance of the trained ANNs on the test set is shown in Fig. 4. Figures 4a-g show that the degradations at different doses and dose rates predicted by the trained ANN model agree with the experimental results for the test set. Figure 4h presents a summary of the predictions for the test set. The MAE between the predictions and experimental data was 0.101.
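Averaging the five CV models can be sketched as below. The five callables are hypothetical stand-ins for the trained ANNs, and the population standard deviation is an assumption, since the text does not say which estimator was used for the error bars.

```python
from statistics import mean, pstdev


def ensemble_predict(models, x):
    """Return the mean and spread of the predictions of several models."""
    preds = [m(x) for m in models]
    return mean(preds), pstdev(preds)


# Hypothetical stand-ins for the five CV-trained ANNs: the same trend
# with slightly different offsets, mimicking fold-to-fold variation.
models = [lambda x, b=b: 0.5 * x + b for b in (0.00, 0.02, -0.01, 0.01, -0.02)]

avg, spread = ensemble_predict(models, 2.0)
print(f"prediction = {avg:.3f} +/- {spread:.3f}")
```

The spread across folds plays the role of the error bars and shaded areas in the figures, indicating how sensitive a prediction is to the particular training split.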

Fig. 4

Performance of the trained ANN model on the test set: (a)-(g) Experimental results and predictions of degradations at different doses and dose rates. (h) Summary of the predictions for the test set. Error bars represent the standard deviations of different CV parts.

Figure 5 compares the learning curves of three models: the proposed ANN, a multiple linear regression model, and an average model. The average model uses the mean of the training-set targets as its prediction for every sample. As the number of training samples increased, the MAE on the test set decreased significantly for the ANN and multiple linear regression models. When trained with 550 samples, the ANN model had the smallest MAE (averaged over 10 random runs), nearly half that of the multiple linear regression model. The results show that the proposed ANN model performs better than the traditional multiple linear regression model in predicting experimental results.
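A learning curve of this kind can be sketched for the simplest of the three models, the average model. The target values below are hypothetical, and the random subsampling scheme is an assumption; the text states only that each point is the average of 10 random runs.

```python
import random
from statistics import mean


def learning_curve(train_y, test_y, sizes, n_runs=10, seed=0):
    """Test MAE of the average model versus the number of training samples.

    For each size, draw n_runs random training subsets, predict the subset
    mean for every test sample, and average the resulting MAEs.
    """
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        run_maes = []
        for _ in range(n_runs):
            subset = rng.sample(train_y, n)
            pred = mean(subset)  # the average model ignores the inputs
            run_maes.append(mean([abs(y - pred) for y in test_y]))
        curve[n] = mean(run_maes)
    return curve


# Hypothetical log(I_B/I_B0) targets, for illustration only.
train_y = [i / 10 for i in range(20)]
test_y = [0.5, 1.0, 1.5]
curve = learning_curve(train_y, test_y, sizes=[5, 10, 20])
for n, err in curve.items():
    print(n, round(err, 3))
```

Replacing the subset mean with a fitted regression or ANN gives the other two curves; the average model serves as a floor that any useful model should beat.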

Fig. 5

(Color online) Comparison of learning curves for average model, multiple linear regression, and ANN. The data points in the figures are the average values of 10 random runs, while their standard deviations are marked with shaded areas.

3.2 Correlations

The trained ANN model fits the experimental data well, indicating that it successfully learned the correlations between the degradation and input parameters in the dataset. These correlations were investigated by feeding the trained ANN model with specific inputs. Figure 6 displays the correlations between the degradation and each parameter generated by the trained ANN model. Specifically, dose = 100 krad(Si), dose rate = 100 rad(Si)/s, |V_BE| = 0.6 V, Beta₀ = 100, and I_B0 = 1×10^-8 A were selected as the reference parameter values. The data points in the figures are the average values of the different CV parts, and the shaded areas represent their standard deviations. The correlations captured via multiple linear regression are shown in Fig. 7 for comparison.
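Probing a trained model one parameter at a time can be sketched as follows. The reference values are those given in the text; `toy_model` is a hypothetical stand-in with made-up coefficients, not the trained ANN, and the dictionary keys are names chosen for this example.

```python
import math

# Reference point from the text: dose in krad(Si), dose rate in rad(Si)/s,
# |V_BE| in V, I_B0 in A; type 0 denotes NPN.
REFERENCE = {
    "type": 0,
    "abs_vbe": 0.6,
    "beta0": 100.0,
    "i_b0": 1e-8,
    "dose": 100.0,
    "dose_rate": 100.0,
}


def sweep(model, name, values, reference=REFERENCE):
    """Vary one input while holding all others at the reference values."""
    curve = []
    for v in values:
        x = dict(reference)  # copy, so the reference point is never modified
        x[name] = v
        curve.append((v, model(x)))
    return curve


def toy_model(x):
    """Hypothetical stand-in for the trained ANN (illustrative numbers only)."""
    return 0.3 * math.log10(1 + x["dose"]) - 0.1 * math.log10(x["i_b0"] / 1e-8)


dose_curve = sweep(toy_model, "dose", [10.0, 100.0, 1000.0])
ib0_curve = sweep(toy_model, "i_b0", [1e-9, 1e-8, 1e-7])
```

Because only one input changes per sweep, each curve shows the dependence the model has learned on that parameter at the chosen reference point; a different reference point could give a different curve for a nonlinear model.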

Fig. 6

(Color online) Correlations captured by the trained ANN model between the degradation and each parameter. (a)-(e) Results from the trained ANN model with the reference parameters: dose = 100 krad(Si), dose rate = 100 rad(Si)/s, |V_BE| = 0.6 V, Beta₀ = 100, and I_B0 = 1×10^-8 A. Shaded areas represent the standard deviations of different CV parts.

Fig. 7

Correlations captured by the multiple linear regression model between the degradation and each parameter. Shaded areas represent the standard deviations of different CV parts.

Most correlations captured by the trained ANN model were consistent with classical theories, whereas a novel correlation was also found. These correlations are nonlinear and cannot be accurately described using a multiple linear regression model. Figure 6a shows the degradation trend with dose. The degradation of the BJT increases with increasing dose before saturation. In particular, degradation of the LPNP-type BJT was more severe and saturated earlier. These phenomena are consistent with those reported in previous studies [35]. As depicted in Fig. 6b, the degradation at a low dose rate is more severe, implying that enhanced low dose rate sensitivity (ELDRS) effects are captured by the ANN model. Figure 6c is consistent with previous findings that degradation is more severe at lower bias voltages [1]. In Fig. 6d, the captured correlation between degradation and Beta₀ is unclear. For the NPN-type device, the degradation appears to increase with Beta₀, whereas it changes little with Beta₀ for the LPNP type. In Fig. 6e, it is noteworthy that the trained ANN model captured the correlation that the degradation of a BJT lessens as the pristine base current I_B0 increases, that is, the TID hardness of a BJT tends to increase with the base current I_B0. To the best of our knowledge, this hypothesis has rarely been reported in the literature.

The hypothesis suggested by Fig. 6e is supported by a statistical analysis. The pristine base current I_B0 and V_BE are related. To exclude the influence of V_BE, the experimental data at the same bias (V_BE = 0.6 V) were selected to analyze the correlation between I_B0 and the degradation log(I_B/I_B0). The correlation coefficient r, calculated from the following equation, was -0.344: $r = \frac{\sum (x - \bar{x}) (y - \bar{y})}{\sqrt{\sum {(x - \bar{x})}^{2}} \sqrt{\sum {(y - \bar{y})}^{2}}},$ (2) where x and y represent I_B0 and log(I_B/I_B0), respectively, and $\bar{x}$ and $\bar{y}$ represent their average values. This value indicates a weak negative correlation. The two-tailed p-value obtained from the Student's t-test was 0.010, indicating that the negative correlation coefficient was statistically significant.
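Equation (2) and the associated t statistic can be sketched with the standard library alone. The data points are hypothetical, and the sketch stops at the t statistic: converting it to a two-tailed p-value requires the Student's t distribution with n − 2 degrees of freedom, which would come from a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, Eq. (2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical (x, y) pairs standing in for I_B0 and log(I_B/I_B0) at one bias.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.5, 1.7, 1.0, 0.8]

r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.3f}, t = {t:.3f}")
```

A weak correlation can still be statistically significant when enough points are available, because the t statistic grows with the square root of the sample size.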

3.3 Mechanism analyses

The possible causes of the proposed hypothesis can be analyzed using classical TID mechanism theories. For an ideal NPN BJT, the pristine base current can typically be approximated as: $I_{B0} = A q \frac{D_{p} n_{i}^{2}}{N_{E} W_{E}} \exp\left(\frac{V_{BE} - I_{E} R_{s}}{V_{T}}\right),$ (3) where A is the active emitter area, q is the magnitude of the electronic charge, D_p is the diffusion constant, n_i is the intrinsic carrier concentration of silicon, N_E is the emitter doping, V_T is the thermal voltage, W_E is the effective emitter width, I_E is the emitter current, and R_s is the series resistance between the emitter and base [52, 53].

Ionizing radiation increases the base current owing to the fixed positive oxide-trapped charge (N_OT) and interface traps (N_IT) [53]. The base current after gamma irradiation is often modeled as [34, 53, 54]: $I_{B} = I_{B0} + \Delta I_{R} .$ (4)

The increment in the base current mainly consists of two electron-hole recombination current components: $\Delta I_{R} = \Delta I_{R-SCR} + \Delta I_{R-NBS} .$ (5)

One component is located above the emitter-base space-charge region (SCR); the other is located above the neutral-base surface (NBS). The excess base current in the SCR can be expressed as: $\Delta I_{R-SCR} = \frac{P_{E} \Delta s \, q V_{T} \pi n_{i}}{2 E_{m}} \exp\left(\frac{V_{BE} - I_{E} R_{s}}{2 V_{T}}\right),$ (6) where P_E is the emitter perimeter, Δs is the surface recombination velocity, n_i is the intrinsic carrier concentration of silicon, and E_m is the maximum electric field in the SCR [53]. The excess base current in the NBS is expressed as: $\Delta I_{R-NBS} = \frac{P_{E} \Delta s \, q W_{B} n_{i}^{2}}{2 n_{s}} \left[\exp\left(\frac{V_{BE} - I_{E} R_{s}}{V_{T}}\right) - 1\right],$ (7) where n_s is the majority carrier concentration at the surface and W_B is the width from the emitter to collector [53, 55]. The degradation can be written as: $I_{B} / I_{B0} = 1 + \Delta I_{R} / I_{B0} .$ (8)

Notably, the total excess base current $\Delta I_{R}$, which is the sum of $\Delta I_{R-SCR}$ and $\Delta I_{R-NBS}$, is proportional to the perimeter of the emitter P_E. The pristine base current I_B0 is proportional to the area of the emitter A. When other parameters of the BJTs are approximately the same, the degradation should be proportional to the perimeter-to-area ratio, that is: $\Delta I_{R} / I_{B0} \propto P_{E} / A .$ (9)
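The proportionality in Eq. (9) can be made explicit by dividing Eqs. (6) and (7) by Eq. (3). The following is a worked sketch, not part of the original derivation; the shorthand $u = (V_{BE} - I_{E} R_{s}) / V_{T}$ is introduced here only for compactness.

```latex
\begin{align}
\frac{\Delta I_{R-SCR}}{I_{B0}}
  &= \frac{P_{E}}{A} \, \Delta s \,
     \frac{\pi V_{T} N_{E} W_{E}}{2 E_{m} D_{p} n_{i}} \,
     \exp\left(-\frac{u}{2}\right), \\
\frac{\Delta I_{R-NBS}}{I_{B0}}
  &= \frac{P_{E}}{A} \, \Delta s \,
     \frac{W_{B} N_{E} W_{E}}{2 n_{s} D_{p}} \,
     \left[1 - \exp(-u)\right], \\
\frac{\Delta I_{R}}{I_{B0}}
  &= \frac{P_{E}}{A} \, \Delta s
     \left\{
       \frac{\pi V_{T} N_{E} W_{E}}{2 E_{m} D_{p} n_{i}} \exp\left(-\frac{u}{2}\right)
       + \frac{W_{B} N_{E} W_{E}}{2 n_{s} D_{p}} \left[1 - \exp(-u)\right]
     \right\}.
\end{align}
```

The electronic charge q cancels in both terms, and the geometry enters only through the prefactor P_E/A. The term in braces depends on doping, bias, and process parameters, so Eq. (9) holds when these are approximately the same across devices.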

The perimeter increases as the emitter area A increases, but the perimeter-to-area ratio (P_E/A) tends to decrease. Consequently, $I_{B} / I_{B0}$ decreases. Therefore, the difference in the perimeter-to-area ratio P_E/A may be one of the mechanisms behind the observation that a BJT with a larger base current I_B0 tends to have a smaller degradation $I_{B} / I_{B0}$.

3.4 Irradiation experiments

Irradiation experiments were conducted to verify the influence of the perimeter-to-area ratio on degradation. The experiments were performed using a cobalt-60 gamma source at room temperature. Five kinds of NPN BJTs with different emitter sizes were irradiated, as shown in Fig. 8a. The other parameters of the BJTs were approximately the same. The devices were manufactured by Analog Foundries on a 6-inch bipolar process platform. To account for uncertainties caused by manufacturing process fluctuations, three devices of each kind were irradiated. The total doses were 40, 80, 120, and 160 krad(Si), and the dose rate was 0.685 rad(Si)/s. All BJT terminals were grounded during irradiation. The Gummel-plot data were measured using a semiconductor analyzer at room temperature before and after irradiation. V_E was swept from -0.4 to -1.2 V while maintaining V_B = V_C = 0 V. The delay between irradiation and each measurement was within 2 h. Typical Gummel-plot data at different doses are presented in Fig. 8b.

Fig. 8

(Color online) Results of the irradiation experiments. (a) Emitter sizes of BJTs used in experiments. (b) Gummel plots for one of the 18 × 18 μm² emitter BJTs at different doses. (c) Degradation versus perimeter-to-area ratio at different doses (V_BE = 0.6 V). Error bars are the standard deviations of the measurement results. Dashed lines are the linear fitting results of the mean values. (d) Degradation versus pristine base current at different doses (V_BE = 0.6 V).

As shown in Fig. 8c, the degradation $I_{B} / I_{B0}$ increases linearly with the perimeter-to-area ratio P_E/A, which agrees with Eqs. (8) and (9). The measured correlations between the degradation and pristine base current at different doses, shown in Fig. 8d, indicate that a BJT with a larger base current I_B0 tends to have a smaller degradation $I_{B} / I_{B0}$ , which is consistent with the ANN result. Experiments confirmed that the perimeter-to-area ratio could be one of the causes of this phenomenon.
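The perimeter-to-area ratios behind this trend can be computed directly. The sketch assumes rectangular emitters and uses the five nominal emitter sizes named in the discussion of the prediction results (9 × 9 to 18 × 72 μm²); actual layout dimensions may differ from the drawn ones.

```python
def perimeter_to_area(width_um: float, length_um: float) -> float:
    """P_E / A in 1/um for a rectangular emitter (rectangle is an assumption)."""
    return 2 * (width_um + length_um) / (width_um * length_um)


# Nominal emitter sizes in micrometres, as listed for the irradiated NPN BJTs.
sizes = [(9, 9), (9, 18), (18, 18), (18, 36), (18, 72)]
ratios = {size: perimeter_to_area(*size) for size in sizes}

for (w, l), ratio in ratios.items():
    print(f"{w} x {l} um^2: area = {w * l} um^2, P_E/A = {ratio:.4f} 1/um")
```

Across these five sizes the area grows by a factor of 16 while P_E/A falls from 4/9 to 5/36 μm⁻¹, so the largest emitter, which carries the largest pristine base current, has the smallest perimeter-to-area ratio.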

Discussions

4.1 Predictive ability

It should be noted that the trained ANN cannot guarantee accurate degradation predictions for a device that has no data in the training set, because of the systematic deviations between different devices and experiments. An ANN may overestimate or underestimate the degradation of a device that it has never seen. The trained ANN is more suitable for predicting missing values, such as the degradations at other doses or dose rates in Sect. 3. More specifically, the ANN was trained to predict missing values. The dataset included experimental data from different devices. During training, the dataset was shuffled and divided into training and validation sets. The validation set can be regarded as missing values of the training set. The ANN model was trained using the training set and verified by predicting the validation set. Therefore, the performance on the validation set reflects the ANN's ability to predict missing values.

As shown in Fig. 9a, the ANN failed to predict the data from the irradiation experiments in Fig. 8. The degradation was overestimated, and the MAE was as large as 0.45. However, after the 130 krad Gummel-plot data of the 18 × 18 μm² NPN were included in the dataset, the predictions of the newly trained ANN model for the irradiation experiments improved significantly. As shown in Fig. 9b, the MAE decreased to 0.14. It is noteworthy that only the degradations at 130 krad of one device were added to the dataset, yet the predictions for the 9 × 9, 9 × 18, 18 × 18, 18 × 36, and 18 × 72 μm² NPNs at 40, 80, 120, and 160 krad all improved. This implies that the ANN model learned the correlations from the other devices in the dataset. When more data are included, predictions can be further improved. Figure 9c shows the predictions after the 50 krad Gummel-plot data of the 18 × 72 μm² NPN were also added to the dataset; the MAE decreased to 0.09. Note that none of the experimental data predicted in Fig. 9 were used for training.

Fig. 9

Performance of predictions on experimental data of Fig. 8. (a) ANN trained with original dataset. (b) ANN trained with dataset adding 130 krad Gummel-plot data (18 × 18 μm² NPN). (c) ANN trained with dataset adding 130 krad Gummel-plot data (18 × 18 μm² NPN) and 50 krad Gummel-plot data (18 × 72 μm² NPN).

4.2 Systematic deviations between different devices

There may be systematic deviations between different experiments, such as differences between the radiation sources and measurement instruments. Moreover, devices from different manufacturers may also exhibit systematic deviations. Systematic deviations limit the ability to predict new devices. If we could account for systematic deviations, the predictions of the new devices could be improved.

We propose a simple method for approximately characterizing the systematic deviations between devices. We introduce a factor F to represent the deviation of one device from the average of the other devices. To evaluate the factor F, a multiple linear regression model is trained using the dataset, excluding the device to be evaluated. The trained model represents the average of the other devices. The predictions for the device to be evaluated are denoted by $\hat{y}$, and the actual degradations are denoted by $y$. Factor F is obtained by linear fitting as follows: $\hat{y} = F \cdot y .$ (10)

Specifically, the fitting coefficient F can be calculated by: $F = \sum_{i = 1}^{N} (y_{i} \cdot {\hat{y}}_{i}) / \sum_{i = 1}^{N} y_{i}^{2} .$ (11)

If the factor F of a device is larger than 1, the device is likely to have less degradation than the other devices in the dataset. The calculated F for the 12 devices in the dataset ranged from 0.36 to 1.48. The calculated F value was set as a new feature of the dataset to account for systematic deviations. A new ANN model with the same structure as the previous model was trained, and the predictions of our irradiation experiments are shown in Fig. 10. It is clear that this model can accurately predict the experimental results. However, the factor F of the device is required when predicting. This implies that one should have some device data to compute factor F in advance. In our case, the factor F was calculated with degradations at 130 krad.
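Equation (11) is the least-squares slope of a line through the origin fitted to Eq. (10), and can be sketched in a few lines. The degradation values are hypothetical and chosen so that the result is easy to check by hand.

```python
def deviation_factor(y_true, y_pred):
    """Factor F from Eq. (11): least-squares slope of y_pred = F * y_true.

    y_true are the measured degradations of the device being evaluated;
    y_pred are the predictions of a model trained on the other devices.
    """
    num = sum(y * p for y, p in zip(y_true, y_pred))
    den = sum(y * y for y in y_true)
    return num / den


# Hypothetical values of log(I_B/I_B0): the model trained on the other
# devices predicts 1.5 times the measured degradation at every point.
measured = [0.2, 0.4, 0.6]
predicted = [0.3, 0.6, 0.9]

F = deviation_factor(measured, predicted)
print(f"F = {F:.2f}")
```

Here F comes out larger than 1, which in the text's interpretation marks a device that degrades less than the average of the other devices in the dataset.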

Fig. 10

(Color online) Predictions on irradiation experiments of ANN trained with a new feature factor F. (a) Predictions versus experimental results in Fig. 8. (b) Degradations versus pristine base current at different doses (V_BE = 0.6 V). Shaded areas represent the standard deviations of different CV parts.

Conclusion

We presented a machine learning-based scientific discovery approach for radiation effect research. We showed that machine learning could be a powerful new tool for discovering correlations in experimental datasets and making predictions. An ANN model was built and trained using a dataset collected from different experiments. The results indicate that the proposed ANN model has advantages over multiple linear regression in capturing nonlinear correlations and predicting data. Most correlations captured by the trained ANN model were consistent with classical theories, whereas a new correlation was also found: the trained ANN model suggests that the TID hardness of a BJT tends to increase with the base current I_B0. Further mechanistic analyses and experiments confirmed that differences in the emitter perimeter-to-area ratio of the BJTs could be one of the causes of this phenomenon. The ANN model presented in this study was trained using a relatively small and simple dataset. The approach is expected to be more powerful if a larger and more detailed dataset is provided. We plan to conduct ANN analyses on the historical data from several laboratories and build models for other kinds of devices.

References:

[1] R. Li, C. Wang, W. Chen et al., Synergistic effects of TID and ATREE in vertical NPN bipolar transistor. IEEE T. Nucl. Sci. 66, 1566-1573 (2019). doi: 10.1109/TNS.2019.2909690