Introduction
Technology computer-aided design (TCAD) is a powerful simulation tool for electronic devices and has been widely used in research on radiation effects [1-4]. To obtain reliable results, TCAD models must be calibrated in advance [5-8]; such calibration is essential for all simulation studies [9-11]. Generally, the structure and doping parameters of a TCAD model are adjusted so that the simulated current-voltage curves match the process design kit (PDK) results, while some parameters must remain consistent with the PDK information [12-14]. Calibration is time consuming because TCAD simulations are slow and must be performed iteratively; manual adjustment typically takes several weeks or more.
Evolutionary methods, such as genetic algorithms, are possible approaches for automatic calibration [15-18]. However, such methods require a cold start for each task: even a small change in the calibration goal requires repeating all the simulations of the evolutionary process.
Currently, machine learning methods provide another possible route to fast calibration. Once trained, a machine learning-based surrogate model can serve as a quick tool for a variety of tasks within its scope. This approach has been adopted to accelerate time-consuming scientific simulations in many research fields, such as partial differential equation solving [19], nanostructure design [20], thermal metamaterial design [21], and diode failure troubleshooting [22]. Trained machine learning-based surrogate models are typically several orders of magnitude faster than the original scientific simulators. However, to the best of our knowledge, machine learning-based TCAD model calibration for metal-oxide-semiconductor field-effect transistors (MOSFETs) has not been reported in the literature. We believe that machine learning-based fast tools will be widely adopted in the future. In this paper, we propose a machine learning approach for fast calibration of TCAD models and provide a corresponding MOSFET calibration tool implemented as a Python script. MOSFETs are basic components of modern CMOS integrated circuits; we took the N-type MOSFET (NMOS) as an example to demonstrate the potential of machine learning methods for fast model calibration.
Three issues need to be addressed when calibrating MOSFETs using machine learning approaches. First, the machine learning-based surrogate model should be widely applicable. Otherwise, every single task requires considerable time to build a new model, and the speed advantage is negated. Second, the validity of the parameter combinations for MOSFETs should be determined to avoid invalid calculations. Third, the important MOSFET parameters for calibration should be identified and focused on.
In our approach, we developed possible solutions to these issues. First, to make the proposed surrogate model more widely applicable, a fundamental model of typical planar MOSFET was introduced. Second, classifiers were introduced to address the validity issues of the parameter combinations. Finally, important parameters were identified using the random forest technique. Their influence on the current-voltage curves was analyzed. A calibration tool based on Python script was developed and tested with different calibration goals for different PDKs. The results indicated that the proposed tool could achieve the desired calibration parameters within several seconds.
We demonstrated a machine learning approach to TCAD model calibration for MOSFETs and showed its great advantage in speed. We believe that this approach will become popular for solving similar problems in the near future. In addition, we showed that this data-driven approach can identify valid parameter combinations and important parameters without the help of domain expertise. These results can serve as a reference for further physical analyses.
TCAD simulations and datasets
TCAD model for MOS transistors
A fundamental TCAD model containing 26 parameters was introduced to represent the typical structure of MOSFETs. As depicted in Fig. 1, a common planar MOSFET includes doping distributions in various regions. The calibration results of this TCAD model can be referenced for further detailed calibrations or directly applied in preliminary simulations of radiation effects. The TCAD model includes source/drain doping (SD), low-doped drain (LDD), halo doping, and channel doping. Specifically, channel doping consists of three parts: the doping concentration is homogeneous in the middle part and Gaussian in the top and bottom parts. The Gaussian peaks of the top and bottom parts are located at their respective boundaries with the middle part. Their peak values are equal to the concentration in the middle part. The 26 parameters listed in Table 1 are used to describe the MOSFET model. These parameters control the key dimensions and doping concentration. Six of these can be obtained from the PDK information: gateLen, gateWidth, tox, sd_peak, sd_depth, and Vdd. During calibration, these parameters should be assigned according to the PDK information, and the other 20 parameters need to be adjusted.
Number | Parameter | Description |
---|---|---|
1 | workF | Work function of gate material |
2 | sub_const | Doping concentration of substrate |
3 | well_const | Doping concentration of well |
4 | ch_const | Doping concentration of channel middle part |
5 | ch_depth_a | Depth of channel top part |
6 | ch_depth_b | Depth of channel bottom part |
7 | ch_factor_a | Gaussian factor of channel top part |
8 | ch_factor_b | Gaussian factor of channel bottom part |
9 | ch_position_a | Beginning position of channel top part |
10 | ch_depth_const | Depth of channel middle part |
11 | ldd_peak | Peak doping concentration of LDD |
12 | ldd_depth | Depth of LDD |
13 | ldd_factor | Gaussian factor of LDD |
14 | sd_peak | Peak doping concentration of source and drain |
15 | sd_depth | Depth of source and drain |
16 | wellc_peak | Peak doping concentration of well contact |
17 | halo_peak | Peak doping concentration of halo |
18 | halo_depth | Depth of halo |
19 | halo_factor | Gaussian factor of halo |
20 | halo_position_z | Beginning position of halo |
21 | sd_position | X position of the source/drain doping |
22 | halo_position_x | X position of the halo doping |
23 | gateLen | Gate length |
24 | gateWidth | Gate Width |
25 | tox | Thickness of the gate dielectric |
26 | Vd | Drain voltage |
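For reference, the split between PDK-specified and adjustable parameters can be written down as a small sketch; the parameter names follow Table 1, and the grouping into Python lists is purely illustrative.

```python
# Illustrative grouping of the Table 1 parameters (names as in the table).
# The six PDK-specified values are fixed during calibration; the other 20 are adjusted.
# Vd (parameter 26) is the drain bias, set to 0.1 V or to Vdd during simulation.
PDK_SPECIFIED = ["gateLen", "gateWidth", "tox", "sd_peak", "sd_depth", "Vdd"]

ADJUSTABLE = [
    "workF", "sub_const", "well_const", "ch_const",
    "ch_depth_a", "ch_depth_b", "ch_factor_a", "ch_factor_b",
    "ch_position_a", "ch_depth_const",
    "ldd_peak", "ldd_depth", "ldd_factor", "wellc_peak",
    "halo_peak", "halo_depth", "halo_factor", "halo_position_z",
    "sd_position", "halo_position_x",
]

# A parameter set can then be handled as a plain dictionary,
# e.g. params = {"gateLen": 40e-9, "workF": 4.1, ...}.
```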
TCAD simulations and calibration goals
The physical models listed in Table 2 were used in the TCAD simulations. The calibration targets were the Id–Vg curves provided by the PDK; Fig. 2 shows typical Id–Vg curves for an NMOS transistor. Vd was set to a low voltage (Vd = 0.1 V) and to the working voltage (Vd = Vdd). Both the linear-scale and semi-logarithmic-scale curves should be well calibrated, because the conduction characteristics of the Id–Vg curves are easy to check on a linear scale, whereas the subthreshold characteristics are easy to check on a semi-logarithmic scale.
Physical model | Options
---|---|
Hydrodynamic | eTemperature |
Mobility | DopingDep, HighFieldSaturation, CarrierCarrierScattering, Enormal |
EffectiveIntrinsicDensity | BandGapNarrowing |
Recombination | SRH, Auger, Avalanche |
Temperature (K) | 300
Three metrics were extracted to describe the Id–Vg curves: the threshold voltage Vt, the transconductance Gm [23], and the subthreshold slope S [24]. The threshold voltage Vt was defined using the constant-current method [25]; specifically, in this study, Vt is the gate voltage at which the drain current reaches 1 × 10^-7 A. The transconductance Gm is the ratio of the change in drain current to the change in gate voltage above the threshold voltage. The subthreshold slope S is the gate-voltage increase required to raise the drain current by one decade in the subthreshold region, computed as S = dVg / d(log10 Id).
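As an illustration, the three metrics can be extracted from a simulated Id–Vg sweep with a few lines of NumPy. The constant-current level of 1 × 10^-7 A follows the definition above; the array names are hypothetical, and Gm is taken here as the maximum slope of the Id–Vg curve, one common convention.

```python
import numpy as np

def extract_metrics(vg, idr, icc=1e-7):
    """Extract Vt, Gm and S from an Id-Vg sweep.
    vg: gate voltages [V]; idr: drain currents [A] (monotonically increasing)."""
    vg, idr = np.asarray(vg), np.asarray(idr)
    # Vt: gate voltage at which the drain current crosses the constant-current level.
    vt = float(np.interp(icc, idr, vg))
    # Gm: maximum slope dId/dVg above threshold.
    dids = np.gradient(idr, vg)
    gm = float(np.max(dids[vg > vt]))
    # S: subthreshold slope, i.e. dVg per decade of Id below threshold (V/decade).
    sub = (vg < vt) & (idr > 0)
    slope = np.polyfit(vg[sub], np.log10(idr[sub]), 1)[0]   # decades per volt
    s = 1.0 / slope
    return vt, gm, s
```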
Techniques in dataset generation
Generally, the training set is created using simulations with randomly generated parameters [20, 21]. However, two problems should be solved in our case. First, the MOSFET may not function when the parameters are randomly generated. A large number of invalid samples would waste considerable computing time. Second, the number of parameters for the MOSFET model is large, indicating that many training samples are needed to build a machine learning model.
If the validity of parameter combinations could be identified before computation, invalid calculations could be avoided. On the other hand, if the importance of every parameter could be determined, excluding the unimportant parameters could reduce the dimensions of the search space, thereby decreasing the number of required training samples.
Generally, device experts are needed to identify the validity of parameter combinations or key parameters. In this study, machine learning methods were utilized instead of domain expertise. Specifically, classification models were trained to identify the validity of the parameters, and a random forest-based feature selection technique was utilized to determine the important parameters.
Classification for valid combinations of parameters
A TCAD model with randomly generated parameters may fail to function or may have a threshold voltage outside the range of concern; such cases cannot produce effective samples. We randomly generated 1000 TCAD samples and found that only 321 were valid for creating the dataset, i.e., the efficiency of training-sample generation was only approximately 32%. To improve this efficiency, we trained classification models to predict the validity of the parameters. Each parameter set was labeled either positive (valid for creating the dataset) or negative. Only the parameter sets predicted to be valid were sent to the TCAD simulation.
Five types of popular classifiers were trained and compared using the aforementioned 1000-sample dataset. The classifiers include gradient boosting (GB) [26], multilayer perceptron (MLP) [27], random forest (RF) [28], support vector (SV) [29], and stochastic gradient descent (SGD) classifiers [30]. They were implemented using the Scikit-learn Python library [31], which provides off-the-shelf machine learning methods. To make the different features of the dataset comparable in value, the logarithm of the doping concentrations was used, and each feature was normalized to a mean of 0 and standard deviation of 1. The dataset was randomly split into training and test sets in proportions of 90% and 10%, respectively. Generally, the influence of class imbalance starts to become significant when the minority class is less than 10% [32, 33]. Therefore, class imbalance was not considered in this study.
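A minimal scikit-learn sketch of this preprocessing and classifier comparison is given below; the placeholder data, the indices of the doping-concentration columns, and the hyperparameters (taken from Table 3) are assumptions or illustrative values.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier

# Placeholder data: 1000 parameter sets with binary validity labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 26))
y = rng.integers(0, 2, 1000)

# Doping-concentration columns (indices illustrative) are log-compressed,
# then every feature is standardized to zero mean and unit variance.
doping_cols = [1, 2, 3, 10, 13, 15, 16, 17]
X[:, doping_cols] = np.log10(X[:, doping_cols] + 1e-30)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

classifiers = {
    "GB":  GradientBoostingClassifier(n_estimators=200, max_depth=4),
    "MLP": MLPClassifier(hidden_layer_sizes=(100,), batch_size=16),
    "RF":  RandomForestClassifier(n_estimators=200),
    "SV":  SVC(kernel="rbf", gamma=0.03, C=2, probability=True),
    "SGD": SGDClassifier(alpha=0.03, penalty="l1"),
}
for name, clf in classifiers.items():
    clf.fit(X_tr, y_tr)
    print(name, "test accuracy:", clf.score(X_te, y_te))
```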
The confusion matrix, shown in Fig. 3, is widely used to assess the performance of classifiers [34]. Valuable metrics can be calculated from the confusion matrix, such as receiver operating characteristic (ROC) curves [34] and the area under the ROC curve (AUC) [35]. In the confusion matrix, true positives (TPs) are correctly predicted positives, true negatives (TNs) are correctly predicted negatives, false positives (FPs) are negatives incorrectly classified as positives, and false negatives (FNs) are positives incorrectly classified as negatives.
The AUC was used to measure the performance of the classifiers; it is the area under the ROC curve. Figure 4 shows the ROC curves and corresponding AUCs for the different classifiers, obtained with five-fold cross-validation (CV): the training set was randomly divided into five parts, and each classifier was trained on four parts and validated on the remaining part in turn. The abscissa of the ROC curve is the false positive rate, FPR = FP / (FP + TN), and the ordinate is the true positive rate. The true positive rate (also called recall) is the proportion of correctly predicted positives among all positives and is computed as TPR = TP / (TP + FN). The hyperparameters of the classifiers are listed in Table 3.
Classifier | Hyperparameter | Value
---|---|---
GB | n_estimators | 200
 | max_depth | 4
MLP | hidden_layer_size | 100
 | batch_size | 16
RF | n_estimators | 200
SV | kernel | RBF
 | gamma | 0.03
 | C | 2
SGD | alpha | 0.03
 | penalty | L1
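The cross-validated ROC/AUC comparison can be sketched as follows for one classifier (here the GB model); X_tr and y_tr are the standardized training features and labels from the previous sketch.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_curve, auc

# Five-fold cross-validated ROC/AUC, mirroring the procedure behind Fig. 4.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, val_idx in cv.split(X_tr, y_tr):
    clf = GradientBoostingClassifier(n_estimators=200, max_depth=4)
    clf.fit(X_tr[train_idx], y_tr[train_idx])
    score = clf.predict_proba(X_tr[val_idx])[:, 1]      # probability of "valid"
    fpr, tpr, _ = roc_curve(y_tr[val_idx], score)       # FPR vs TPR points
    aucs.append(auc(fpr, tpr))
print("mean AUC over 5 folds:", np.mean(aucs))
```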
For a given classifier, the precision can be improved by specifying a higher decision threshold; however, the recall decreases simultaneously. Precision is the proportion of correct predictions among all positive predictions and is computed as Precision = TP / (TP + FP).
A new GB classifier was trained using the entire training set and tested using the test set. The confusion matrix for the test set is presented in Table 4. A precision of 78.1% and recall of 73.5% were achieved.
Predicted class | True class: positive | True class: negative | Total
---|---|---|---
Positive | 25 | 7 | 32
Negative | 9 | 59 | 68
Total | 34 | 66 | 100
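The numbers in Table 4 and the precision–recall trade-off described above can be reproduced with a short sketch; the variables are those of the earlier sketches, and the 0.7 threshold is purely illustrative.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Retrain on the entire training set and evaluate on the held-out test set (Table 4).
gb = GradientBoostingClassifier(n_estimators=200, max_depth=4).fit(X_tr, y_tr)
y_pred = gb.predict(X_te)
print(confusion_matrix(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("recall:   ", recall_score(y_te, y_pred))

# Raising the decision threshold on the predicted probability of the positive
# class trades recall for precision.
y_pred_strict = (gb.predict_proba(X_te)[:, 1] > 0.7).astype(int)
print("strict precision:", precision_score(y_te, y_pred_strict))
```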
The final GB classifier was trained using the entire 1000-sample dataset. We then randomly generated 40,000 samples, of which the GB classifier predicted 11,908 to be valid. Calculating these samples with TCAD showed that 9,430 were valid for creating the dataset. The efficiency of sample generation thus improved to approximately 79.2%, close to the precision of the GB classifier on the test set; the slight improvement may be due to the larger number of training samples. These results show that the proposed classifier functions well and saves a significant amount of time in generating the dataset.
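The resulting screening workflow, in which only predicted-valid parameter sets are passed to TCAD, can be sketched as follows; sample_random_params and simulate_tcad are hypothetical helpers standing in for the random generator and the external TCAD run.

```python
import numpy as np

def screen_candidates(sample_random_params, classifier, scaler, n_candidates=40_000):
    """Generate random parameter sets and keep only those the trained classifier
    predicts to be valid, so that only these are sent to the slow TCAD simulator."""
    candidates = sample_random_params(n_candidates)          # (n, n_params) array
    keep = classifier.predict(scaler.transform(candidates)) == 1
    return candidates[keep]

# valid_candidates = screen_candidates(sample_random_params, gb, scaler)
# dataset = [simulate_tcad(p) for p in valid_candidates]     # hypothetical TCAD call
```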
Feature selection
We utilized a feature selection technique to identify the important parameters and decrease the dimensions of the TCAD model. The number of parameters for the TCAD model was 26, indicating that a large number of training samples were required to build a machine learning model. Excluding unimportant parameters could reduce the dimensions of the search space, thereby decreasing the required number of training samples.
Random forest regression has proven to be useful in feature selection. This method is helpful in shedding light on the important parameters that govern the output [33]. Random forest refers to a combination of decision trees. Each decision tree in the forest is trained using randomly selected subspaces of the feature space [36]. The final prediction result is obtained by combining the outputs of every tree in the forest [37, 38]. Random forest regression can provide the importance of each input feature. The importance of each feature is typically measured by its influence on the reduction of Gini impurity when training the trees [39]. In our study, the sensitive parameters of the TCAD model were identified using random forest regression. The six parameters specified by the PDK were maintained, and the other 20 parameters were investigated.
The 9430-sample dataset was used to train RF regression models to evaluate the importance of each parameter for the different regression targets, namely Vt, Gm, and S^-1. The RF regressions were performed using the Scikit-learn Python library. The importance values obtained for the different regression targets are depicted in Figs. 6, 7, 8, and 9; for each target, the importance values sum to 1, and a higher value indicates that the RF regression regards the related parameter as more important. The importance values differed slightly between regression targets. For the threshold voltage, workF and well_const were the most important parameters; for the transconductance, well_const and ldd_depth; and for the subthreshold slope, ldd_factor and ldd_depth. The importance values are summarized in Fig. 9, which shows that the important parameters fall into approximately the same group for all regression targets. Therefore, the ten most important parameters in Fig. 9, together with the six PDK-specified parameters, were selected to build the final model.
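A sketch of the importance ranking with scikit-learn random-forest regression is shown below; the arrays holding the 20 adjustable parameters and the three regression targets, as well as param_names, are assumed to be prepared from the 9430-sample dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# X: (9430, 20) adjustable parameters (log-scaled doping, standardized as before);
# vt, gm, s_inv: the three regression targets for each sample.
for target_name, y in {"Vt": vt, "Gm": gm, "S^-1": s_inv}.items():
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]        # importances sum to 1
    top5 = [(param_names[i], round(float(rf.feature_importances_[i]), 3))
            for i in order[:5]]
    print(target_name, top5)                                 # five most important parameters
```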
The importance of the parameters obtained by the machine learning method is consistent with the results of classical semiconductor theory. In semiconductor theory, when Vd is small, the threshold voltage Vt of a simple long-channel MOSFET can typically be approximated as follows [40]:
Vt ≈ φms + 2φF + (4 q εsi NA φF)^(1/2) / Cox,   (1)
where φms is the work-function difference between the gate and the semiconductor, φF is the Fermi potential, NA is the doping concentration below the gate dielectric, εsi is the silicon permittivity, q is the elementary charge, and Cox is the gate-oxide capacitance per unit area.
Clearly, the threshold voltage Vt is related to the work function of the gate material and the doping density below the dielectric. Therefore, it is consistent with classical semiconductor theories that the workF and well_const parameters are important for controlling the threshold voltage Vt.
However, Eq. (1) applies to simple MOSFETs without channel or LDD doping, and it is difficult to obtain an analytical expression for a more complex practical MOSFET. For complex structures, theoretical analysis can qualitatively identify the parameters that may have an impact, but their impact patterns and relative importance are difficult to determine. As shown in Fig. 1, MOSFETs typically have channel, LDD, and halo doping, making it difficult to obtain an analytical expression for Vt. Many parameters, such as well_const, ch_position_a, ch_depth_a, and ch_const, influence the doping distribution below the dielectric. Physics-based analyses can preliminarily determine whether these parameters are related to Vt, but not which of them is more important or has greater influence.
For transconductance, the RF regression suggested that well_const was the most important parameter. A possible reason for this is that the transconductance is strongly influenced by the electron mobility [40], and the electron mobility is influenced by the doping concentration [40, 41].
For the subthreshold slope, the RF regression suggested that the LDD doping was the most sensitive part. The subthreshold slope is associated with the ability of the gate voltage to control the surface potential [42], and LDD doping has a significant influence on the electric field near the drain [43-45]. The finding that ldd_factor and ldd_depth are the most important parameters is therefore consistent with semiconductor theory. In addition, other experimental results have confirmed the significant influence of LDD doping on the subthreshold slope [46].
The importance of the parameters obtained by the machine learning method is consistent with theoretical analyses of semiconductors. In addition, for complex devices in which it is not easy to obtain an analytical expression, machine learning methods are helpful in determining the key parameters. These results can be used as a reference for further physical research.
Machine learning-based calibration framework
We built a fast 16-dimensional NMOS calibration framework using a machine learning-based surrogate model. The surrogate model was several orders of magnitude faster than the original TCAD simulation, and the desired calibration parameters were obtained within several seconds.
Surrogate model
As shown in Fig. 10, the proposed surrogate model relates the 16-dimensional NMOS parameters to the metrics of the Id–Vg curves: Vt, Gm, and S^-1. Considering that the TCAD model may not function with certain parameters, a classifier was placed before the regressor to judge whether the input parameters were valid. If the input parameters were valid, the related metrics were calculated using the regressor. Therefore, for any set of input parameters, the surrogate model could either predict the related metrics or indicate that the device fails to function.
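Conceptually, the two-stage surrogate can be expressed as a small wrapper around the trained classifier and regressor; the function below is an illustrative sketch, not the exact implementation.

```python
import numpy as np

def surrogate_predict(params, classifier, regressor, scaler):
    """Return the predicted (Vt, Gm, S^-1) for one 16-dimensional parameter set,
    or None if the classifier judges the parameter set invalid.
    `scaler` applies the same normalization used when training both models;
    the returned metrics are in their normalized form."""
    x = scaler.transform(np.asarray(params, dtype=float).reshape(1, -1))
    if classifier.predict(x)[0] != 1:      # stage 1: validity check
        return None
    return regressor.predict(x)[0]         # stage 2: metric regression
```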
We built a new 16-dimensional dataset to train the surrogate model. Classification techniques were employed again to improve the efficiency of sample generation, following a procedure similar to that described in Section 2.3.1. First, we randomly generated 1398 samples and found that 685 were valid for creating the dataset, accounting for 49.0% of the total calculations. Second, an MLP classifier with a hidden_layer_size of 100 and a batch_size of 16 was trained using this 1398-sample dataset; a precision of 83.3% was achieved. Third, we randomly generated 16,200 samples, of which the MLP classifier predicted 7710 to be valid. We calculated these samples using TCAD and found that 6473 were valid, accounting for 84.0% of the total calculations. These results show again that the classifier successfully improved the efficiency of generating valid samples and saved time.
The 6473 valid samples, together with the previous 685 valid samples, were used to train the regressor of the surrogate model. The training was performed using Keras [47], a Python library developed for deep learning. We built a three-layer artificial neural network (ANN); this type of machine learning model has been widely adopted in many scientific studies [48-51]. The inputs were the 16-dimensional parameters. Considering the differences in magnitude between the dimensions, we normalized both the inputs and the outputs. The inputs were normalized in two steps: first, the doping parameters were processed with a logarithm to limit their range; second, all parameters were normalized to a mean of 0 and a standard deviation of 1 over the training set. The outputs were three neurons corresponding to Vt, Gm, and S^-1, and each regression target was normalized to between 0 and 1 using a linear transformation. The activation functions were softmax [52], rectified linear unit (ReLU) [53], and linear functions for the first to the last layers, respectively, and the first and second layers each contained 128 neurons. The loss function was the mean square error, and Adam [54] with the default learning rate was adopted as the optimizer. A batch size of 32 was used for stable training [55], and training lasted 80 epochs. The dataset was divided into 80% and 20% portions: 80% was used to train the ANN and the remainder was used for testing. The performance on the training and test sets is shown in Fig. 11. The results indicate that the ANN could predict the simulation results on the test set, with mean absolute errors of 0.016, 0.011, and 0.023 for Vt, Gm, and S^-1, respectively.
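A minimal Keras sketch of this regressor is given below. The layer sizes, activations, loss, optimizer, batch size, and number of epochs follow the text; x_train/y_train and x_test/y_test are assumed to be the normalized 80%/20% split described above.

```python
from keras.models import Sequential
from keras.layers import Dense, Input

model = Sequential([
    Input(shape=(16,)),                # 16 normalized input parameters
    Dense(128, activation="softmax"),  # first hidden layer (softmax, as stated in the text)
    Dense(128, activation="relu"),     # second hidden layer
    Dense(3, activation="linear"),     # outputs: Vt, Gm, S^-1 (scaled to [0, 1])
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])   # Adam, mean-squared-error loss
model.fit(x_train, y_train, batch_size=32, epochs=80)
print(model.evaluate(x_test, y_test))  # MSE and mean absolute error on the 20% test split
```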
The first 1398 calculated samples, together with the 7710 calculated samples, were used to train the classifier of the surrogate model. The dataset was randomly split into training and test sets in proportions of 80% and 20%, respectively. The training procedure was similar to the previous procedures. The ROC curves shown in Fig. 12 suggest that the MLP classifier with a hidden_layer_size of 100 and batch_size of 16 performed the best. The confusion matrix for the test set is presented in Table 5. A precision of 91.0% and recall of 93.8% were achieved for the surrogate model.
Predicted class | True class: positive | True class: negative | Total
---|---|---|---
Positive | 1350 | 134 | 1484
Negative | 90 | 248 | 338
Total | 1440 | 382 | 1822
Calibration procedure
The surrogate model was then utilized for NMOS calibration. The calibration goal is described by six metrics: Vt, Gm, and S^-1 at a low voltage (Vd = 0.1 V) and at the working voltage (Vd = Vdd). For a perfect set of calibrated NMOS parameters, all six metrics should be identical to their target values. The dimensions and doping of the NMOS are described by the 15 parameters other than Vd; Vd describes the drain voltage, which is flexible for a given NMOS, and different Vd values correspond to different metrics of the Id–Vg curves. The calibration procedure is illustrated in Fig. 13. To check a given NMOS structure described by the 15 structural parameters, Vd was set to the low voltage and to the working voltage in turn, the corresponding metrics were predicted, and their differences from the goals were calculated. If the difference was sufficiently small, the parameter set was selected as one calibration result.
When calibrating against a PDK, six parameters are specified by the PDK and the other ten are searched to obtain Id–Vg curve metrics similar to the goals. The PDK specifies the values of gateLen, gateWidth, tox, sd_peak, sd_depth, and Vdd; for the other ten parameters, values were randomly generated to search for the best combination.
A Python script was written to implement the calibration procedure. First, the script generates a large number of random parameter sets. Second, it uses the surrogate model to identify valid sets and to calculate their related metrics at the low voltage and the working voltage. Third, it compares the metrics with the goals and computes the differences between them. Because the surrogate model contains errors in predicting the metrics, the script outputs the five most consistent parameter sets. Finally, these output parameter sets are checked using TCAD calculations, and the most consistent set is selected as the calibration result.
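The search loop of the calibration script can be sketched as follows. sample_random_params and the 6-component goal vector are assumptions; surrogate_metrics(params, vd) stands for the classifier-plus-regressor surrogate evaluated at one drain voltage and returns None for invalid parameter sets.

```python
import numpy as np

def calibrate(goal, vdd, sample_random_params, surrogate_metrics,
              n_random=500_000, n_keep=5):
    """Rank random parameter sets by how closely their predicted metrics
    (Vt, Gm, S^-1 at Vd = 0.1 V and at Vd = Vdd) match the 6-component goal,
    and return the n_keep most consistent sets for TCAD verification."""
    goal = np.asarray(goal, dtype=float)
    ranked = []
    for _ in range(n_random):                      # in practice evaluated in batches
        params = sample_random_params()            # 10 searched + 6 PDK-fixed values
        low = surrogate_metrics(params, vd=0.1)
        high = surrogate_metrics(params, vd=vdd)
        if low is None or high is None:            # classifier flagged the set invalid
            continue
        metrics = np.concatenate([low, high])
        ranked.append((float(np.linalg.norm(metrics - goal)), params))
    ranked.sort(key=lambda item: item[0])
    return ranked[:n_keep]
```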
Performance
We tested the performance of the calibration method with 3 PDKs: 28 nm, 40 nm and 65 nm. Different PDKs have distinct values of tox, sd_peak, sd_depth, and Vdd. In addition, each PDK can specify different gateLen and gateWidth values within its allowed range. For each PDK, we selected one gate length value and one gate width value to generate the calibration goal. The PDK information and selected gate dimensions are listed in Table 6. As shown in Fig. 14, the calibration goals are quite different for the different PDKs.
Parameter | 28 nm PDK | 40 nm PDK | 65 nm PDK
---|---|---|---
tox (nm) | 3 | 2.42 | 2.35
sd_peak (cm^-3) | 1×10^20 | 1×10^20 | 1×10^20
sd_depth (nm) | 64 | 70 | 115
Vdd (V) | 0.9 | 1.1 | 1.2
gateLen (nm) | 35 | 40 | 65
gateWidth (nm) | 200 | 120 | 200
For each of the calibration goals, the Python script required approximately 8 s to find the five most consistent parameter sets in 500,000 random inputs. The output parameter sets were then sent for TCAD calculations to determine the most consistent set. The calibration results are shown in Fig. 15 and Fig. 16. The calibrated parameters are listed in Table 7. The calibration results matched the goals, and the parameters required by the PDKs were satisfied.
Parameter | 28 nm PDK | 40 nm PDK | 65 nm PDK
---|---|---|---
workF (eV) | 4.20 | 4.09 | 4.09
well_const (cm^-3) | 4.76×10^17 | 6.89×10^17 | 8.00×10^17
ch_const (cm^-3) | 2.03×10^18 | 1.88×10^18 | 5.80×10^18
ch_depth_a (nm) | 3.63 | 10.67 | 26.05
ch_position_a (nm) | 9.01 | 0.65 | 0.93
ldd_peak (cm^-3) | 4.82×10^19 | 2.18×10^19 | 4.53×10^19
ldd_depth (nm) | 17.14 | 9.18 | 30.08
ldd_factor | 0.05 | 0.48 | 0.41
halo_position_z (nm) | 19.88 | 10.85 | 28.21
sd_position (nm) | 9.76 | 7.36 | 27.48
We introduced the root-mean-square error (RMSE) to measure the differences between the calibration results and the goals. The RMSE is computed as RMSE = [ (1/N) Σ_i (y_i,cal − y_i,goal)^2 ]^(1/2), where y_i,cal and y_i,goal denote the calibrated and goal values of the compared quantities, respectively, and N is the number of compared points.
For the 40 nm and 65 nm PDKs, the gate material is typically polysilicon. In these cases, the workF of the gate material was set to 4.09 eV as an approximation of N+ polysilicon with a doping concentration of 2×10^20 cm^-3, and the other nine parameters were searched during calibration. As shown in Fig. 18, the Id–Vg curves remained almost unchanged when the gate material was replaced by the TCAD built-in polysilicon model with arsenic doping of 2×10^20 cm^-3.
Discussion
The calibration method and relevant Python script proposed here can serve as a fast preliminary calibration tool that automatically provides possible calibration results in several seconds. The model contains basic types of doping, and more detailed calibrations can be performed on this basis. The important parameters that govern each metric of the Id–Vg curves were identified using the machine learning methods in Section 2.3.2. Thus, small adjustments are easy to perform. For the 28 nm PDK, the gate dielectric is commonly multilayered. However, the PDK used in this study only provides a tox value for reference. If detailed structural information is available, the calibration can be further improved based on the script outputs.
The key advantage of the proposed approach is that once the model is trained, it can continuously provide fast calibrations for different goals within its scope. The generation of training samples is time consuming. However, this is a one-time process.
The proposed machine learning methods are data-driven: they learn patterns and correlations from data. In this study, the validity of parameter combinations, the importance of parameters, and the correlations between the parameters and the related metrics were all obtained using machine learning methods without the need for semiconductor expertise. This data-driven approach complements physics-based research. Classical semiconductor theory is suitable for describing relatively simple structures, whereas analytical expressions are difficult to obtain for complex structures. Machine learning learns the correlations and the importance of parameters from the data but does not capture the physical principles behind the results; its results can therefore serve as a reference for further physical research.
Conclusion
We presented a machine learning approach for MOSFET model calibration and built a Python script utilizing a machine learning-based surrogate model. The surrogate model was several orders of magnitude faster than the original TCAD simulation, and the desired calibration parameters for the NMOS could be obtained within several seconds. In this study, a fundamental model containing 26 parameters was introduced to represent the typical structure of a MOSFET. Classifiers were trained to improve the efficiency of generating training samples by predicting the validity of parameter combinations before the TCAD calculation. Feature selection techniques were used to identify the important parameters and decrease the dimensions of the NMOS model. A 16-dimensional surrogate model comprising a classifier and a regressor was built; it determines the validity of the input parameters and predicts the corresponding threshold voltage, transconductance, and subthreshold slope of the Id–Vg curve. A calibration procedure was proposed and implemented as a Python script, which was tested using three NMOS calibration goals generated from different PDKs. The results indicated that the calibrated parameter values could be obtained within approximately 8 s. Our work demonstrates the feasibility of machine learning-based fast model calibration, and a similar approach could be adopted to develop fast calibration tools for other devices. In addition, this study shows that these machine learning methods learn patterns and correlations from data instead of relying on domain expertise, indicating that machine learning could be an alternative research approach that complements classical physics-based research.
References
Adverse effect of inappropriately implementing source-isolation mitigation technique. Atomic Energy Science and Technology 55, 2260-2266 (2021).
Scaling effects of single-event gate rupture in thin oxides. Chinese Phys. B 22, 640-644 (2013). https://doi.org/10.1088/1674-1056/22/11/118501
Physics-based circuit-level analysis of MCU characteristics in bulk CMOS SRAM. Atomic Energy Science and Technology 55, 2121-2127 (2021).
Experimental study of temperature dependence of single-event upset in SRAMs. Nucl. Sci. Tech. 27, 16 (2016). https://doi.org/10.1007/s41365-016-0014-9
Three-dimensional simulation of total dose effects on ultra-deep submicron devices. Acta Phys. Sin. 60, 544-550 (2011). (in Chinese)
Modeling the impact of well contacts on SEE response with bias-dependent single-event compact model. Microelectron. Reliab. 81, 337-341 (2018). https://doi.org/10.1016/j.microrel.2017.11.001
An analytical model to evaluate well potential modulation and bipolar amplification effects. IEEE T. Nucl. Sci. 70, 1724-1731 (2023). https://doi.org/10.1109/TNS.2023.3266005
Analysis of single-event transient sensitivity in fully depleted silicon-on-insulator MOSFETs. Nucl. Sci. Tech. 29, 49 (2018). https://doi.org/10.1007/s41365-018-0391-3
TCAD simulation analysis of vertical parasitic effect induced by pulsed γ-ray in NMOS from 180 nm to 40 nm technology nodes. Acta Phys. Sin. 71, 201-208 (2022). (in Chinese)
Simulation study of the influence of ionizing irradiation on the single event upset vulnerability of static random access memory. Acta Phys. Sin. 62, 486-493 (2013). (in Chinese)
Heavy ion-induced single event upset sensitivity evaluation of 3D integrated static random access memory. Nucl. Sci. Tech. 29, 31 (2018). https://doi.org/10.1007/s41365-018-0377-1
Modeling the Dependence of Single-Event Transients on Strike Location for Circuit-Level Simulation. IEEE T. Nucl. Sci. 66, 866-874 (2019). https://doi.org/10.1109/TNS.2019.2904716
Machine Learning Regression-Based Single-Event Transient Modeling Method for Circuit-Level Simulation. IEEE T. Electron Dev. 68, 5758-5764 (2021). https://doi.org/10.1109/TED.2021.3113884
A review on genetic algorithm: past, present, and future. Multimed. Tools Appl. 80, 8091-8126 (2021). https://doi.org/10.1007/s11042-020-10139-6
A study on global and local optimization techniques for TCAD analysis tasks. IEEE T. Comput. Aid. D. 23, 814-822 (2004). https://doi.org/10.1109/TCAD.2004.828130
Design of S-band photoinjector with high bunch charge and low emittance based on multi-objective genetic algorithm. Nucl. Sci. Tech. 34, 41 (2023). https://doi.org/10.1007/s41365-023-01183-6
Beam dynamics optimization of very-high-frequency gun photoinjector. Nucl. Sci. Tech. 33, 116 (2022). https://doi.org/10.1007/s41365-022-01105-y
Non-intrusive surrogate modeling for parametrized time-dependent partial differential equations using convolutional autoencoders. Eng. Appl. Artif. Intel. 109, 104652 (2022). https://doi.org/10.1016/j.engappai.2021.104652
Deep learning approach based on dimensionality reduction for designing electromagnetic nanostructures. NPJ Computational Materials 6, 12 (2020). https://doi.org/10.1038/s41524-020-0276-y
Thermal transparency with periodic particle distribution: A machine learning approach. J. Appl. Phys. 129, 65101 (2021). https://doi.org/10.1063/5.0039002
TCAD augmented machine learning for semiconductor device failure troubleshooting and reverse engineering. 2019 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD) (2019).
Improvement of TCAD Augmented Machine Learning Using Autoencoder for Semiconductor Variation Identification and Inverse Design. IEEE Access 8, 143519-143529 (2020). https://doi.org/10.1109/ACCESS.2020.3014470
TCAD-augmented machine learning with and without domain expertise. IEEE T. Electron Dev. 68, 5498-5503 (2021). https://doi.org/10.1109/TED.2021.3073378
A review of recent MOSFET threshold voltage extraction methods. Microelectron. Reliab. 42, 583-596 (2002). https://doi.org/10.1016/S0026-2714(02)00027-6
Gradient boosting machines, a tutorial. Frontiers in Neurorobotics 7, 21 (2013). https://doi.org/10.3389/fnbot.2013.00021
Feature Selection Using a Multilayer Perceptron. Journal of Neural Network Computing 2, 40-48 (1990).
Random forests: from early developments to recent advancements. Systems Science & Control Engineering 2, 602-609 (2014). https://doi.org/10.1080/21642583.2014.956265
Support Vector Machines for classification and regression. Analyst 135, 230-267 (2010). https://doi.org/10.1039/B918972F
Bangla text document categorization using stochastic gradient descent (SGD) classifier. 2015 International Conference on Cognitive Computing and Information Processing (CCIP), pp. 1-4 (2015).
Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825-2830 (2011).
Class imbalance revisited: a new experimental setup to assess the performance of treatment methods. Knowl. Inf. Syst. 45, 247-270 (2015). https://doi.org/10.1007/s10115-014-0794-3
Data-driven assessment of chemical vapor deposition grown MoS2 monolayer thin films. J. Appl. Phys. 128, 235303 (2020). https://doi.org/10.1063/5.0017507
An introduction to ROC analysis. Pattern Recogn. Lett. 27, 861-874 (2006). https://doi.org/10.1016/j.patrec.2005.10.010
A simple generalisation of the area under the ROC curve for multiple class classification problems. Mach. Learn. 45, 171-186 (2001). https://doi.org/10.1023/A:1010920819831
Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, pp. 278-282 (1995).
Random Forests. Mach. Learn. 45, 5-32 (2001). https://doi.org/10.1023/A:1010933404324
Prediction of protein–protein interactions using random decision forest framework. Bioinformatics 21, 4394-4400 (2005). https://doi.org/10.1093/bioinformatics/bti721
A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics 10, 213 (2009). https://doi.org/10.1186/1471-2105-10-213
First-principles calculations of electron mobilities in silicon: Phonon and Coulomb scattering. Appl. Phys. Lett. 94, 212103 (2009). https://doi.org/10.1063/1.3147189
A simple subthreshold swing model for short channel MOSFETs. Solid State Electron. 45, 391-397 (2001). https://doi.org/10.1016/S0038-1101(01)00060-0
Design and characteristics of the lightly doped drain-source (LDD) insulated gate field-effect transistor. IEEE J. Solid-St. Circ. 15, 424-432 (1980). https://doi.org/10.1109/JSSC.1980.1051416
A new analytical method of solving 2D Poisson's equation in MOS devices applied to threshold voltage and subthreshold modeling. Solid State Electron. 39, 1761-1775 (1996). https://doi.org/10.1016/S0038-1101(96)00122-0
Subthreshold current for submicron LDD MOS transistor. Proceedings of the 36th Midwest Symposium on Circuits and Systems (1993).
Light-doped drain technology for submicron CMOS. Microelectronics & Computer, 46-51 (1994). https://doi.org/10.19304/j.cnki.issn1000-7180.1994.01.013 (in Chinese)
Keras Documentation. https://keras.io
Machine learning-based analyses for total ionizing dose effects in bipolar junction transistors. Nucl. Sci. Tech. 33, 131 (2022). https://doi.org/10.1007/s41365-022-01107-w
Data-driven vehicle modeling of longitudinal dynamics based on a multibody model and deep neural networks. Measurement 180, 109541 (2021). https://doi.org/10.1016/j.measurement.2021.109541
A data-driven normal contact force model based on artificial neural network for complex contacting surfaces. Mech. Syst. Signal Pr. 156, 107612 (2021). https://doi.org/10.1016/j.ymssp.2021.107612
Recovery of saturated signal waveform acquired from high-energy particles with artificial neural networks. Nucl. Sci. Tech. 30, 148 (2019). https://doi.org/10.1007/s41365-019-0677-0
On controllable sparse alternatives to softmax. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) (2018).
Rectified linear units improve restricted Boltzmann machines. 27th International Conference on Machine Learning (ICML-10) (2010).
Revisiting small batch training for deep neural networks. arXiv:1804.07612 (2018).

The authors declare that they have no competing interests.