A flexible and robust soft-error testing system for microelectronic devices and integrated circuits

NUCLEAR ELECTRONICS AND INSTRUMENTATION

WANG Xiao-Hui
TONG Teng
SU Hong
LIU Jie
ZHANG Zhan-Gang
GU Song
LIU Tian-Qi
KONG Jie
ZHAO Xing-Wen
YANG Zhen-Lei
Nuclear Science and Techniques, Vol. 26, No. 3, Article number 030401. Published in print 20 Jun 2015; available online 20 Jun 2015.

Single event effects (SEEs) induced by radiation have become a significant reliability challenge for modern electronic systems. To evaluate the SEE susceptibility of microelectronic devices and integrated circuits (ICs), a flexible and robust SEE testing system was developed at the Heavy Ion Research Facility in Lanzhou (HIRFL). The system is compatible with various types of microelectronic devices and ICs, and supports many complex, high-speed test schemes and plans for the irradiated devices under test (DUTs). Thanks to the combination of meticulous circuit design and hardened logic design, the system can also protect itself against overheating and stray radiation. The system has been tested and verified in device irradiation experiments at HIRFL.

Keywords: SEE testing; Testing system; Single Event Effects; Soft errors; HIRFL

I. INTRODUCTION

Radiation-induced single event effects (SEEs) are considered a primary challenge to the reliability of microelectronic devices and integrated circuits (ICs) [1, 2], because highly energetic particles traversing the sensitive regions of the devices under test (DUTs) induce charge collection [1, 3-5]. To assess the SEE sensitivity of DUTs, especially to the non-destructive effects single event upset (SEU), single event transient (SET) and single event functional interrupt (SEFI), as well as to single event latch-up (SEL) [1, 6], ground tests at accelerators, in which the DUTs are exposed to a radiation environment, are generally performed as an important indirect approach [5, 7]. An effective and accurate testing system is therefore an indispensable tool for characterizing SEEs.

In this paper, a robust and flexible system oriented mainly towards soft errors is designed and constructed, based on the requirements of ground tests at the Heavy Ion Research Facility in Lanzhou (HIRFL). The primary design goal is to support multiple I/O standards so that the system is physically compatible with a diversity of digital devices, such as static random access memories (SRAMs), dynamic random access memories (DRAMs), flash memories and field programmable gate arrays (FPGAs). Because the testing system is built from programmable logic devices and controlled by extensible software, it can logically support a wide range of SEE experiments and can be managed smoothly. Additionally, owing to its high access speed (up to 200 Msps) and real-time monitoring of the DUTs, soft errors are effectively detected and recorded in the experiments at HIRFL. Using this testing system, SEE characterization experiments have been successfully performed for different devices. In particular, one type of FPGA intended for satellite use was tested, and the hardened logic design implemented in it was verified.

II. DESIGN OF THE TESTING SYSTEM

According to the guidelines and standards for SEE testing procedures [1, 5-11] and previous designs of SEE testing systems [12-23], the soft-error testing system for this work was designed as shown in the block diagram of Fig. 1. It is divided into two parts. The main part consists of a circuital subsystem, programmable power supplies, an RS485-to-USB adaptor and a control computer, all placed in the irradiation laboratory where the DUTs are exposed. The other part is a monitoring computer placed in the operators' room, which is linked to the control computer via the remote desktop protocol and is used to control the testing system. The two parts are separated by biological shielding to protect personnel against radiation.

Fig. 1.
Block diagram of the soft-error testing system.

The circuital subsystem is designed to detect soft errors in DUTs. It uses a motherboard-daughterboard structure: the daughterboards mainly carry the DUTs, and the motherboard performs the required functions. An advantage of this structure is reduced cost, since the motherboard is reused. Additionally, because of its robustness, the circuital subsystem can conveniently be used for experiments performed in a vacuum chamber.

To avoid unexpected effects from stray radiation, the rest of the equipment in the irradiation laboratory is placed away from the circuital subsystem. Long cables are used to connect them, although such cables inevitably degrade the transmitted signals and power. To supply suitable voltages to the circuital subsystem, a closed control loop for the power networks is introduced. The loop involves the circuital subsystem, the programmable power supplies, the control computer and the software. The workflow is as follows: 1) the programmable power supplies provide the required voltages to the circuital subsystem; 2) the circuital subsystem feeds the actual values of the voltages back to the control computer; and 3) the control computer instructs the programmable power supplies to adjust their output voltages. In addition, to ensure reliable data transfer over the long cables between the circuital subsystem and the control computer, a communication protocol derived from the industrial RS-485 standard, with error checking at both the byte layer and the frame layer, is used.
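The three-step power loop described above can be sketched as a proportional correction of the supply setpoint. This is a minimal sketch under stated assumptions: the long cable is modelled as a fixed resistance that causes an IR drop, and the function names, gain, tolerance and setpoint ceiling are hypothetical values chosen for illustration, not the authors' software.

```python
def board_voltage(setpoint_v: float, load_a: float, cable_ohm: float) -> float:
    """Hypothetical plant model: the voltage reaching the circuital subsystem
    is the supply setpoint minus the IR drop along the long power cable."""
    return setpoint_v - load_a * cable_ohm


def regulate(target_v: float, load_a: float, cable_ohm: float,
             gain: float = 0.8, tol_v: float = 0.005,
             max_setpoint_v: float = 5.0, max_iter: int = 50) -> float:
    """Iterate the loop until the fed-back voltage is within tol_v of target_v.

    Step 1: the programmable supply outputs `setpoint`.
    Step 2: the circuital subsystem reports the voltage it actually sees.
    Step 3: the control computer corrects the setpoint by gain * error.
    Returns the final setpoint.
    """
    setpoint = target_v  # start from the nominal value, ignoring the cable
    for _ in range(max_iter):
        measured = board_voltage(setpoint, load_a, cable_ohm)  # step 2
        error = target_v - measured
        if abs(error) <= tol_v:
            break
        # Step 3, clamped so a bad reading cannot drive the supply too high.
        setpoint = min(setpoint + gain * error, max_setpoint_v)
    return setpoint


if __name__ == "__main__":
    sp = regulate(target_v=3.3, load_a=2.0, cable_ohm=0.25)
    print(f"setpoint {sp:.3f} V, board sees {board_voltage(sp, 2.0, 0.25):.3f} V")
```

With a hypothetical 0.5 V cable drop the setpoint settles near 3.8 V so that the board sees about 3.3 V. The ceiling on the setpoint is a design choice worth keeping in any real loop, because a faulty feedback reading would otherwise push the supply upward without limit.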

A. Design of the circuital subsystem

The structure of the circuital subsystem is shown in Fig. 2. The daughterboards, which carry the DUTs, are separate from the motherboard. When a certain type of device or IC is characterized, the corresponding daughterboard is attached to the motherboard. The motherboard provides the operating conditions for the DUTs and detects the soft errors that occur in the DUTs during irradiation. By function, the motherboard can be divided into three modules: the interface module, the monitoring module and the testing module.

Fig. 2.
Block diagram of the circuital subsystem. Thick dashed lines: power connections; Solid lines: signal connections.
1. The interface module

The interface module imports the power sources and communicates with the control computer. It contains four DB-9 connectors. Two of them import the power sources, each connector serving two channels (Fig. 2); one channel is used for the motherboard, while the other three channels are used for the DUTs. The other two DB-9 connectors are used to communicate with the control computer: they transfer commands from the control computer and send the status of the circuital subsystem and the test results of the DUTs back to it. There are four pairs of communication channels, which use a variant of the RS-485 standard at the physical and electrical layers. Three of the pairs are duplex, while the remaining pair is a bidirectional, configurable channel. Thus, the communication channels can be configured in various modes for different applications. For example, one duplex pair combined with the configurable pair set as an input (with respect to the circuital subsystem) can form a joint test action group (JTAG) channel. This allows the FPGA (the core of the testing module) to be configured or reconfigured dynamically, and allows access to DUTs that support the JTAG interface (e.g. programmable logic devices, memory ICs and application specific ICs).
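The paper states that the RS-485-derived link applies checks at both the byte layer and the frame layer, but does not name them. As an illustration only, the sketch below assumes an even-parity bit on every byte (as a UART typically adds) and a CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF) over each frame; the frame layout and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

Symbol = Tuple[int, int]  # (data byte, parity bit) as sent on the wire


def even_parity_bit(byte: int) -> int:
    """Byte-layer check: the bit that makes the total number of 1s even."""
    return bin(byte & 0xFF).count("1") & 1


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Frame-layer check: CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF)."""
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def encode_frame(payload: bytes) -> List[Symbol]:
    """Append a big-endian CRC, then pair every byte with its parity bit."""
    raw = payload + crc16_ccitt(payload).to_bytes(2, "big")
    return [(b, even_parity_bit(b)) for b in raw]


def decode_frame(symbols: List[Symbol]) -> Optional[bytes]:
    """Return the payload, or None if a byte or the whole frame fails its check."""
    if len(symbols) < 2:
        return None
    if any(even_parity_bit(b) != p for b, p in symbols):
        return None  # byte-layer error
    raw = bytes(b for b, _ in symbols)
    payload, received = raw[:-2], int.from_bytes(raw[-2:], "big")
    if crc16_ccitt(payload) != received:
        return None  # frame-layer error
    return payload
```

The two layers complement each other: parity flags a single flipped bit in a byte immediately, while the CRC catches errors that parity cannot see, such as two flipped bits in the same byte.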

2. The monitoring module

The monitoring module guarantees proper operation of the circuital subsystem. It is built around the motherboard monitoring and control unit (MMCU), a circuit based on a complex programmable logic device (CPLD). Figure 3 shows the logic design of the CPLD. The MMCU is powered directly by a low dropout regulator (LDO) and runs throughout the SEE test. During the test, it samples the temperatures from six thermal sensors distributed on the motherboard twice per second. If any sampled temperature exceeds the predetermined threshold, the MMCU drives the motherboard into one of the low-power modes, depending on the rate of temperature rise: for example, it slows down (or even pauses) the testing module, or turns off all other parts of the motherboard. Once the temperatures drop back to acceptable values, the MMCU resumes or restarts the test cycle.
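The text gives the sampling scheme (six sensors, twice per second) and the decision rule (the low-power mode depends on how fast the temperature rises) but not the actual limits. A minimal sketch of that rule, with hypothetical thresholds and mode names, might look like this:

```python
from enum import Enum
from typing import Sequence

SAMPLE_PERIOD_S = 0.5  # six sensors sampled twice per second, as in the text


class Mode(Enum):
    NORMAL = 0
    SLOW = 1      # testing module slowed down or paused
    SHUTDOWN = 2  # all parts except the MMCU switched off


def next_mode(current: Mode, temps_c: Sequence[float], prev_max_c: float,
              threshold_c: float = 70.0, resume_c: float = 60.0,
              fast_rise_c_per_s: float = 1.0) -> Mode:
    """Decide the motherboard mode from one round of sensor samples.

    threshold_c, resume_c and fast_rise_c_per_s are hypothetical values. The
    gap between threshold_c and resume_c adds hysteresis, so the board does
    not oscillate around a single limit.
    """
    hottest = max(temps_c)
    rise_rate = (hottest - prev_max_c) / SAMPLE_PERIOD_S
    if hottest > threshold_c:
        wanted = Mode.SHUTDOWN if rise_rate > fast_rise_c_per_s else Mode.SLOW
        # Never step down to a milder mode while still above the threshold.
        return wanted if wanted.value >= current.value else current
    if current is not Mode.NORMAL and hottest < resume_c:
        return Mode.NORMAL  # temperatures acceptable again: resume the test
    return current
```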

Fig. 3.
Block diagram of the logic design inside the CPLD.

The monitoring module also manages the power supplies for the testing module by switching them and monitoring their status. When a test cycle begins, the MMCU turns on the relay and simultaneously starts monitoring the relay output. The sampled voltage values are fed back to the closed control loop of the power networks described above, so that the motherboard supply is kept within a suitable range despite the IR drop caused by the long power cables. Then, the four DC-DC power modules are enabled. These modules offer high integration, performance and conversion efficiency, which helps reduce heat dissipation. Four overcurrent detectors are installed downstream of the DC-DC power modules. If stray radiation causes SELs in the testing module, the overcurrent detectors are triggered and the associated DC-DC power modules are disabled to protect the motherboard.
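The power-up order and the overcurrent response described above can be sketched as follows. The module names, current limits and the simplification of the relay check to a single boolean are hypothetical; the real detectors act in hardware, whereas this sketch only models their logic.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DcDcModule:
    name: str
    limit_a: float        # hypothetical threshold of the downstream detector
    enabled: bool = False


def start_test_cycle(relay_output_ok: bool, modules: List[DcDcModule]) -> bool:
    """Enable the DC-DC modules only after the relay output has been verified."""
    if not relay_output_ok:
        return False
    for m in modules:
        m.enabled = True
    return True


def check_overcurrent(modules: List[DcDcModule],
                      currents_a: Sequence[float]) -> List[str]:
    """Disable each enabled module whose current exceeds its limit (as a
    suspected SEL would cause) and return the names of the tripped modules."""
    tripped = []
    for m, current in zip(modules, currents_a):
        if m.enabled and current > m.limit_a:
            m.enabled = False
            tripped.append(m.name)
    return tripped
```

Only the module whose detector fires is switched off, so a latch-up on one rail does not take down the rest of the motherboard.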

3. The testing module

The testing module detects soft errors in the DUTs. It is built around the testing control unit (TCU), a circuit based on an FPGA. The logic design of the FPGA is shown in Fig. 4. Because each test plan for the DUTs is associated with a specific configuration file, the FPGA must be reconfigured whenever the test plan changes. To simplify this process, two FPGA configuration modes (local mode and online mode) are provided. The local mode is implemented with a flash memory that can store up to four revisions of the configuration files; the revision can be selected either manually or under MMCU control. The online mode is implemented in two ways: 1) routing a JTAG channel from the communication interface to the configuration management circuit, which provides the interface to the JTAG chain that includes the FPGA; or 2) having the MMCU download configuration files from the computer into the FPGA through the JTAG interface.
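The choice between the two configuration modes can be expressed as a simple lookup. This sketch assumes, hypothetically, that the control software records which test plan each of the four flash revisions holds; the plan names and the function signature are illustrative only.

```python
from typing import Optional, Sequence, Tuple

FLASH_REVISIONS = 4  # the local flash stores up to four configuration revisions


def choose_configuration(test_plan: str,
                         flash_slots: Sequence[Optional[str]],
                         manual_slot: Optional[int] = None
                         ) -> Tuple[str, Optional[int]]:
    """Return ("local", slot) if the plan's configuration is already in flash,
    otherwise ("online", None), meaning it must be downloaded over JTAG.

    flash_slots lists the test plan stored in each slot (None if empty);
    manual_slot mimics manual selection and overrides the lookup.
    """
    if len(flash_slots) > FLASH_REVISIONS:
        raise ValueError("flash holds at most four revisions")
    if manual_slot is not None:
        if not 0 <= manual_slot < len(flash_slots) or flash_slots[manual_slot] is None:
            raise ValueError("manually selected slot is empty")
        return ("local", manual_slot)
    for slot, stored_plan in enumerate(flash_slots):
        if stored_plan == test_plan:
            return ("local", slot)
    return ("online", None)
```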

Fig. 4.
Block diagram of the logic design inside the FPGA.

Once an experiment begins, the testing module provides the operating conditions for the DUTs. Up to three channels of adjustable power supplies can be routed to the daughterboard, which is sufficient for most types of DUTs in both normal mode and bias mode. The control and compensation mechanism for these supplies is the same as that described above for the monitoring module. The currents of the power sources are measured to detect whether SELs occur in the DUTs. For data access, the testing module offers up to 120 single-ended signal channels and 40 differential signal pairs to the DUTs through two high-speed, high-density board-to-board connectors. Each connector has 180 pins, to which 60 single-ended channels and 20 differential pairs are assigned. Signal pins alternate with ground pins to reduce crosstalk. Moreover, signal traces on the printed circuit board (PCB) are spaced at least five times the trace width apart, which further reduces interference. The traces are routed to the connectors with equal lengths to ensure timing alignment. As a result, the throughput of the connections is no less than 200 Mbps. Because the TCU supports various I/O standards at four voltage levels (1.2, 1.8, 2.5 and 3.3 V), most DUTs can be accessed directly. The temperature of the DUTs is monitored by sampling either the sensors inside the DUTs or sensors mounted close to them. Combined with a suitable heating method, this allows temperature-biased experiments to be performed. It also protects the DUTs from overheating in normal test plans.
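Detecting an SEL from the measured supply current usually amounts to spotting a sudden, sustained rise above the normal operating current. The sketch below assumes a fixed baseline, a hypothetical 50 mA jump and three consecutive samples to reject single-sample noise; none of these values come from the paper.

```python
from typing import Optional, Sequence


def detect_sel(samples_ma: Sequence[float], baseline_ma: float,
               jump_ma: float = 50.0, consecutive: int = 3) -> Optional[int]:
    """Return the index of the first sample of a suspected latch-up, or None.

    A latch-up is flagged only when `consecutive` samples in a row exceed the
    baseline by more than jump_ma, so a single noisy spike is ignored.
    """
    run = 0
    for idx, current in enumerate(samples_ma):
        if current - baseline_ma > jump_ma:
            run += 1
            if run >= consecutive:
                return idx - consecutive + 1
        else:
            run = 0
    return None
```

On detection, a test system would typically cut and then restore the affected DUT supply channel and log the event with its time stamp.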

The testing module performs soft-error detection once the DUTs are ready. First, it initializes the DUTs in a manner that depends on the type of DUT and the test plan. Then, it detects soft errors by accessing the DUTs and monitoring their currents. Once a soft error is detected, the TCU packs the associated information and a time stamp into a frame, which is pushed into a first-in first-out (FIFO) queue and transmitted automatically to the computer. For certain complex test plans, the SRAM and double data rate (DDR) synchronous dynamic random access memory (SDRAM) can be used for multiple purposes, such as storing the initial test data, acting as mirrors of the memory ICs under test, and caching information about detected errors. Additionally, for specific test plans, a JTAG signal path can be established from the control computer to the DUTs; in this case, both the MMCU and the TCU pass the JTAG signals directly to the DUTs.
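The detect, pack and queue sequence can be sketched for a memory DUT as follows. The frame layout (a 5-byte time stamp from a 40-bit counter, a 32-bit address and a 16-bit mask of flipped bits) and the function names are assumptions for illustration; for simplicity the whole polling pass shares one time stamp, whereas the hardware would stamp each access.

```python
import struct
from collections import deque
from typing import Deque, Sequence, Tuple

TS_BITS = 40
TS_MASK = (1 << TS_BITS) - 1


def pack_error(timestamp: int, address: int, expected: int, observed: int) -> bytes:
    """Frame = 40-bit time stamp | 32-bit address | 16-bit flipped-bit mask."""
    ts = (timestamp & TS_MASK).to_bytes(5, "big")
    return ts + struct.pack(">IH", address, (expected ^ observed) & 0xFFFF)


def unpack_error(frame: bytes) -> Tuple[int, int, int]:
    """Inverse of pack_error: (timestamp, address, flipped-bit mask)."""
    timestamp = int.from_bytes(frame[:5], "big")
    address, flipped = struct.unpack(">IH", frame[5:])
    return timestamp, address, flipped


def poll_once(memory: Sequence[int], pattern: int, timestamp: int,
              fifo: Deque[bytes]) -> int:
    """Read every word, compare it with the written pattern and queue a frame
    for each mismatch. Returns the number of errors found in this pass."""
    found = 0
    for address, word in enumerate(memory):
        if word != pattern:
            fifo.append(pack_error(timestamp, address, pattern, word))
            found += 1
    return found
```

Storing the XOR of expected and observed data, rather than the raw word, makes the flipped bits directly visible in each logged item.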

4. The daughterboards

The daughterboards are dedicated to carrying the DUTs. Thanks to the number and throughput of the connections between the daughterboard and the TCU, the DUTs can be arranged in various ways. Multiple DUTs of the same type can be placed on one daughterboard to save time in preparing or changing the DUTs, and multiple DUTs of different types can be placed on one daughterboard to perform comparative tests.

5. Layout of the circuital subsystem

The PCB layout of the circuital subsystem is also carefully designed. No devices are placed on the motherboard in the area where the DUTs are exposed to radiation, and all active devices and ICs on the motherboard are placed on the back side. The 2.5-mm-thick PCB can effectively shield these active devices and ICs from stray heavy ions from the accelerator. This arrangement is also useful for cooling the motherboard, because nearly all of the active devices and ICs can be attached to a metal sheet that serves as a heat sink.

B. Design of the control software

The control software controls the testing system and manages the test procedures. The testing system is compatible with many types of DUTs, but different types of DUTs have different parameters: supply voltages, temperature ranges, access modes and supported test plans. Consequently, the initialization and SEE testing methods vary. Moreover, the method for processing soft-error data differs between test plans. To write general software for different DUTs, a series of measures is adopted. The parameters of the DUTs and the supported test plans are abstracted and stored in databases and profiles. The associated configuration files are packed into a folder. The data processing subprograms are compiled as dynamic link libraries (DLLs). A simple flow chart of the software is shown in Fig. 5.
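The separation of DUT data (profiles) from data-processing plug-ins can be illustrated in a few lines. In the original software the plug-ins are compiled DLLs; in this hypothetical sketch a decorator-based registry plays that role, and the profile fields, plan names and function names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Registry of data-processing plug-ins, keyed by test-plan name.
PROCESSORS: Dict[str, Callable[[List[bytes]], dict]] = {}


def processor(plan: str):
    """Register a data-processing function for one test plan."""
    def register(fn: Callable[[List[bytes]], dict]):
        PROCESSORS[plan] = fn
        return fn
    return register


@dataclass(frozen=True)
class DutProfile:
    """Per-DUT parameters that would be read from a database or profile."""
    name: str
    supply_v: Tuple[float, ...]        # one entry per DUT power channel (max. 3)
    temp_range_c: Tuple[float, float]
    access_mode: str
    test_plans: Tuple[str, ...]


@processor("sram_static")
def count_upsets(frames: List[bytes]) -> dict:
    return {"upsets": len(frames)}


def process(profile: DutProfile, plan: str, frames: List[bytes]) -> dict:
    """Dispatch raw error frames to the plug-in for the chosen test plan."""
    if plan not in profile.test_plans:
        raise ValueError(f"{profile.name} does not support plan {plan!r}")
    if plan not in PROCESSORS:
        raise KeyError(f"no processor registered for {plan!r}")
    return PROCESSORS[plan](frames)
```

Supporting a new DUT then means adding a profile and, if needed, one new plug-in, without touching the dispatch code.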

Fig. 5.
A simple flow chart of the control software.

III. APPLICATIONS IN EXPERIMENTS

A series of hardness assurance tests for space applications were performed with this testing system. Several SRAMs fabricated in different processes were tested. The SRAMs were irradiated with heavy ion beams of ¹²⁹Xe, ¹²C or ²⁰⁹Bi [24]. Bit-flip events and real-time variations of the DUT currents were detected and logged, each logged item carrying a time stamp. A synchronous 40-bit counter generated the time stamps with a resolution of 10 ns, covering a long range of over three hours. The time stamps are useful for calculating SEE rates and for further in-depth analyses.
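The quoted range follows directly from the counter width and resolution: 2^40 counts × 10 ns ≈ 1.1 × 10^4 s, or about 3.05 hours. The short sketch below checks this and shows one way (an assumption, not described in the paper) to compute the elapsed time between two stamps across a single counter wrap.

```python
COUNTER_BITS = 40
TICK_S = 10e-9  # 10 ns per count

FULL_SCALE = 1 << COUNTER_BITS   # 1 099 511 627 776 counts
RANGE_S = FULL_SCALE * TICK_S    # about 10 995 s
RANGE_H = RANGE_S / 3600.0       # about 3.05 h


def elapsed_s(start_ticks: int, end_ticks: int) -> float:
    """Time between two stamps; modular arithmetic tolerates one wrap-around."""
    return ((end_ticks - start_ticks) % FULL_SCALE) * TICK_S


if __name__ == "__main__":
    print(f"counter range: {RANGE_S:.0f} s = {RANGE_H:.2f} h")
```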

Figure 6 shows part of the raw log items for bit-flip events, in which the time stamps help to distinguish possible multiple cell upsets (MCUs). The access interval of the DUT is 50 ns, so the first three items fall within the same polling cycle. The addresses of the second and third items each differ from that of the first item by only one bit. It is therefore possible that the bit flips in two or three of the first three items were caused by a single MCU. Whether this is truly an MCU event depends on the die layout of the DUT and requires additional analysis.
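The reasoning applied to Fig. 6 (same polling cycle, addresses one bit apart) can be automated as a first-pass filter. This hypothetical sketch groups items whose time stamps fall within one 50 ns access interval (5 counts of 10 ns) of the group's first item and flags addresses at Hamming distance 1 from it; as the text notes, confirming an MCU still requires the die layout.

```python
from typing import List, Sequence, Tuple

WINDOW_TICKS = 5  # one 50 ns access interval at 10 ns per time-stamp count


def hamming_distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def mcu_candidates(items: Sequence[Tuple[int, int]],
                   window_ticks: int = WINDOW_TICKS) -> List[List[int]]:
    """items: (timestamp_ticks, address) pairs for logged bit flips.

    Items are grouped when their time stamps lie within one access interval
    of the first item of the group; a group is reported as a possible MCU if
    at least one later address differs from the first by exactly one bit.
    """
    groups: List[Tuple[int, List[int]]] = []
    for ts, addr in sorted(items):
        if groups and ts - groups[-1][0] < window_ticks:
            groups[-1][1].append(addr)
        else:
            groups.append((ts, [addr]))

    candidates = []
    for _, addrs in groups:
        first = addrs[0]
        neighbours = [a for a in addrs[1:] if hamming_distance(a, first) == 1]
        if neighbours:
            candidates.append([first] + neighbours)
    return candidates
```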

Fig. 6.
The raw items that log the bit-flipping events in SRAMs.

Experimental data obtained with this testing system from ISSI SRAMs revealed some interesting phenomena [25]. The results show that error correcting code (ECC) based on the Hamming code can dramatically improve the devices' tolerance to radiation. However, accumulated bit flips can render the ECC ineffective.
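Why accumulated upsets defeat a Hamming code can be seen with the textbook Hamming(7,4) single-error-correcting code. The word width and exact code used in the tested SRAMs are not described here; this sketch only illustrates the principle that one upset per codeword is corrected, while a second upset accumulating in the same codeword before it is rewritten yields wrong data.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # codeword positions 1..7; parity bits at 1, 2, 4


def encode(nibble: int) -> int:
    """Encode 4 data bits; bit (p - 1) of the result is codeword position p."""
    bits = {pos: (nibble >> i) & 1 for i, pos in enumerate(DATA_POSITIONS)}
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def decode(word: int) -> int:
    """Single-error-correcting decode; returns the 4 data bits."""
    syndrome = 0
    for p in range(1, 8):
        if (word >> (p - 1)) & 1:
            syndrome ^= p            # XOR of the positions holding a 1
    if syndrome:
        word ^= 1 << (syndrome - 1)  # flip the bit the syndrome points at
    return sum(((word >> (pos - 1)) & 1) << i
               for i, pos in enumerate(DATA_POSITIONS))


if __name__ == "__main__":
    cw = encode(0b1011)
    one_upset = cw ^ (1 << 4)               # position 5 flipped
    two_upsets = cw ^ (1 << 4) ^ (1 << 1)   # positions 5 and 2 flipped
    print(decode(one_upset) == 0b1011)      # corrected
    print(decode(two_upsets) == 0b1011)     # miscorrected
```

With two upsets the syndrome is non-zero and points at a third, originally correct bit, so the "correction" turns a two-bit error into a three-bit one.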

The temperature dependence of SEU in commercial bulk and SOI (silicon on insulator) SRAMs was investigated with this system [26]. The results show that the SEU cross section is affected by temperature, especially near the threshold linear energy transfer for SEU occurrence [26].
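The SEU cross section is conventionally computed as the number of observed upsets divided by the ion fluence, and per bit by also dividing by the number of bits tested: σ = N_err / (Φ · N_bits). A minimal sketch, assuming normal beam incidence and using made-up numbers:

```python
def cross_section_per_bit(n_errors: int, fluence_per_cm2: float, n_bits: int) -> float:
    """SEU cross section in cm^2/bit, assuming normal beam incidence."""
    if fluence_per_cm2 <= 0 or n_bits <= 0:
        raise ValueError("fluence and bit count must be positive")
    return n_errors / (fluence_per_cm2 * n_bits)


if __name__ == "__main__":
    # Hypothetical run: 250 upsets in a 4 Mbit SRAM after 1e7 ions/cm^2.
    sigma = cross_section_per_bit(250, 1e7, 4 * 1024 * 1024)
    print(f"{sigma:.2e} cm^2/bit")
```

Repeating this calculation at several LETs and DUT temperatures gives the cross-section curves on which a temperature dependence becomes visible.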

In particular, a flash-based FPGA proposed for use in a satellite was assessed, and the logic design inside it was verified. The primary built-in blocks of the FPGA, namely the programmable logic elements and the embedded RAMs, were tested separately. Before exposure, the programmable logic elements were configured as shift register chains and inverter chains. During the tests, their outputs were read continuously at 200 million times per second. Figure 7 shows the data format of the raw items logging the bit-flip events that occurred in the shift register chains. The test method for the embedded RAMs is similar to that for ordinary RAM ICs. No SEL event was observed in the FPGA even at a linear energy transfer (LET) above 90 MeV·cm²/mg, which is substantially higher than the threshold specified for space applications.
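The shift-register-chain test works because the chain output must reproduce the input pattern delayed by the chain length; an upset in any stage appears as a mismatch when the corrupted bit reaches the output. The behavioural sketch below assumes a hypothetical alternating 0/1 input pattern and a single injected upset, and models one chain cycle by cycle rather than the 200 MHz hardware.

```python
from typing import List, Optional, Sequence, Tuple


def run_chain(length: int, n_cycles: int, pattern: Sequence[int] = (0, 1),
              upset: Optional[Tuple[int, int]] = None) -> List[int]:
    """Shift `pattern` through a chain of `length` flip-flops and compare the
    output with the input delayed by `length` cycles.

    upset=(cycle, stage) flips one flip-flop at the start of that cycle,
    mimicking an SEU. Returns the cycles at which the output was wrong.
    """
    chain = [0] * length
    mismatches = []
    for t in range(n_cycles):
        if upset is not None and upset[0] == t:
            chain[upset[1]] ^= 1                      # particle-induced bit flip
        out = chain[-1]                               # read the chain output
        if t >= length and out != pattern[(t - length) % len(pattern)]:
            mismatches.append(t)
        chain = [pattern[t % len(pattern)]] + chain[:-1]  # shift by one stage
    return mismatches
```

In this model an upset in stage k at cycle c reaches the output at cycle c + length - 1 - k.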

Fig. 7.
The raw items that log the bit-flipping events occurring in shift register chains.

Based on preliminary assessments of the FPGA's sensitivity to radiation, the logic design inside the FPGA was hardened in several ways, including modular redundancy and ECC. The FPGA with the final logic design was then exposed to heavy ion irradiation, and the acquired data were analyzed to verify whether the logic design is suitable for satellite applications.
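The paper does not detail its modular-redundancy scheme. Triple modular redundancy (TMR) with a bitwise majority voter is the most common form and is sketched here as an assumption: an upset in any single replica is out-voted, and the disagreement flag lets a design log the event or scrub the faulty copy.

```python
from typing import Tuple


def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 vote across three replicas of the same register."""
    return (a & b) | (a & c) | (b & c)


def tmr_read(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Return the voted value and whether any replica disagreed with it."""
    voted = majority(a, b, c)
    disagreement = ((a ^ voted) | (b ^ voted) | (c ^ voted)) != 0
    return voted, disagreement
```

TMR fails only if two replicas are upset in the same bit before they are corrected, which is why it is often combined with periodic scrubbing.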

Recently, two further types of FPGA, a more advanced flash-based FPGA and an SRAM-based FPGA, were tested preliminarily to determine whether they could be candidates for space applications. The experiments showed that both were immune to SEL up to an LET of at least 37.6 MeV·cm²/mg, and that the soft-error cross section of the more advanced flash-based FPGA was approximately 10⁻⁸ cm²/bit at LETs of 20–40 MeV·cm²/mg. These results agree with the manufacturer's reports. Nevertheless, more experiments are needed for a detailed characterization.

IV. CONCLUSION

The foregoing experiments demonstrate that the testing system is robust and flexible. The system can protect itself against inadequate heat sinking and can tolerate stray radiation. Additionally, sufficient hardware resources make it flexible enough to be compatible with multiple DUTs. The logic and software of the testing system are designed modularly, so a new testing task can easily be accommodated by adding or replacing a few modules. This accelerates the design process for new and more complex DUTs and saves time in preparing test experiments. In addition, hardening algorithms and logic designs can be evaluated and verified with this testing system.

References
[1] M Nicolaidis, ed. Soft errors in modern electronic systems. Frontiers in Electronic Testing, 41, 2011. DOI: 10.1007/978-1-4419-6993-4
[2] P E Dodd, M R Shaneyfelt, J R Schwank, et al. Current and future challenges in radiation effects on CMOS electronics. IEEE T Nucl Sci, 2010, 57: 1747-1763. DOI: 10.1109/TNS.2010.2042613
[3] P E Dodd and L W Massengill. Basic mechanisms and modeling of single-event upset in digital microelectronics. IEEE T Nucl Sci, 2003, 50: 583-602. DOI: 10.1109/TNS.2003.813129
[4] D Munteanu and J L Autran. Modeling and simulation of single-event effects in digital devices and ICs. IEEE T Nucl Sci, 2008, 55: 1854-1878. DOI: 10.1109/TNS.2008.2000957
[5] J R Schwank, M R Shaneyfelt and P E Dodd. Radiation hardness assurance testing of microelectronic devices and integrated circuits: radiation environments, physical mechanisms, and foundations for hardness assurance. IEEE T Nucl Sci, 2013, 60: 2074-2100. DOI: 10.1109/TNS.2013.2254722
[6] JEDEC Standard JESD89A. Measurement and reporting of alpha particle and terrestrial cosmic ray-induced soft errors in semiconductor devices. JEDEC Solid State Technology Association, 2006.
[7] M R Shaneyfelt, J R Schwank, P E Dodd, et al. Total ionizing dose and single event effects hardness assurance qualification issues for microelectronics. IEEE T Nucl Sci, 2008, 55: 1926-1946. DOI: 10.1109/TNS.2008.2001268
[8] J R Schwank, M R Shaneyfelt and P E Dodd. Radiation hardness assurance testing of microelectronic devices and integrated circuits: test guideline for proton and heavy ion single-event effects. IEEE T Nucl Sci, 2013, 60: 2101-2118. DOI: 10.1109/TNS.2013.2261317
[9] J A Felix, J R Schwank, M R Shaneyfelt, et al. Test procedures for proton-induced single event latchup in space environments. IEEE T Nucl Sci, 2008, 55: 2161-2165. DOI: 10.1109/TNS.2008.2000773
[10] J R Schwank, M R Shaneyfelt, P E Dodd, et al. Hardness assurance test guideline for qualifying devices for use in proton environments. IEEE T Nucl Sci, 2009, 56: 2171-2178. DOI: 10.1109/TNS.2009.2013239
[11] M A Aguirre, J N Tombs and Miranda H Guzman. Fault injection analysis of bidirectional signals. IEEE T Nucl Sci, 2009, 56: 2179-2183. DOI: 10.1109/TNS.2009.2019274
[12] C Lopez-Ongil, M Garcia-Valderas, M Portela-Garcia, et al. Autonomous fault emulation: a new FPGA-based acceleration system for hardness evaluation. IEEE T Nucl Sci, 2007, 54: 252-261. DOI: 10.1109/TNS.2006.889115
[13] F R Palomo, J M Mogollon, J Napoles, et al. Pulsed laser SEU cross section measurement using coincidence detectors. IEEE T Nucl Sci, 2009, 56: 2001-2007. DOI: 10.1109/TNS.2009.2018274
[14] M Alderighi, F Casini, M Citterio, et al. Using FLIPPER to predict proton irradiation results for VIRTEX 2 devices: a case study. IEEE T Nucl Sci, 2009, 56: 2103-2110. DOI: 10.1109/TNS.2009.2015880
[15] M Alderighi, F Casini, S d'Angelo, et al. Evaluation of single event upset mitigation schemes for SRAM based FPGAs using the FLIPPER fault injection platform. IEEE Int Symp Defect, 2007, 105-113. DOI: 10.1109/DFT.2007.45
[16] N Valsecchi, E Gloutnay, T Pellerin, et al. Development of a versatile test platform for single event effect (SEE) characterization of analog, digital and mixed-signals integrated circuits (ICs). IEEE T Nucl Sci, 2009, 56: 3341-3346. DOI: 10.1109/TNS.2009.2032864
[17] P Peronnard, R Velazco, G Hubert, et al. Real-life SEU experiments on 90 nm SRAMs in atmospheric environment: measures versus predictions done by means of MUSCA SEP3 platform. IEEE T Nucl Sci, 2009, 56: 3450-3455. DOI: 10.1109/TNS.2009.2033362
[18] P C Adell, L Edmonds, R McPeak, et al. An approach to single event testing of SDRAMs. IEEE T Nucl Sci, 2010, 57: 2923-2928. DOI: 10.1109/TNS.2010.2059711
[19] A M Chugg, J McIntosh, A J Burnell, et al. Probing the nature of intermittently stuck bits in dynamic RAM cells. IEEE T Nucl Sci, 2010, 57: 3190-3198. DOI: 10.1109/TNS.2010.2084103
[20] S Rezgui, P Louris and R Sharmin. SEE characterization of the new RTAX-DSP (RTAX-D) antifuse-based FPGA. IEEE T Nucl Sci, 2010, 57: 3537-3546. DOI: 10.1109/TNS.2010.2085017
[21] M Berg, H Kim, M Friendlich, et al. SEU analysis of complex circuits implemented in Actel RTAX-S FPGA devices. IEEE T Nucl Sci, 2011, 58: 1015-1022. DOI: 10.1109/TNS.2011.2128886
[22] M Cabanas-Holmen, E H Cannon, T Amort, et al. Predicting the single-event error rate of a radiation hardened by design microprocessor. IEEE T Nucl Sci, 2011, 58: 2726-2733. DOI: 10.1109/TNS.2011.2168978
[23] J R Azambuja, S Pagliarini, M Altieri, et al. A fault tolerant approach to detect transient faults in microprocessors based on a non-intrusive reconfigurable hardware. IEEE T Nucl Sci, 2012, 59: 1117-1124. DOI: 10.1109/TNS.2012.2201750
[24] T Tong, H Su, X H Wang, et al. An improved system of detecting single event effect in SRAM. Nucl Phys Rev, 2014, 31: 170-176. (in Chinese) DOI: 10.11804/NuclPhysRev.31.02.170
[25] T Tong, X H Wang, Z G Zhang, et al. Effectiveness and failure modes of error correcting code in industrial 65 nm CMOS SRAMs exposed to heavy ions. Nucl Sci Tech, 2014, 25: 010405. DOI: 10.13538/j.1001-8042/nst.25.010405
[26] T Q Liu, C Geng, Z G Zhang, et al. Impact of temperature on single event upset measurement by heavy ions in SRAM devices. J Semicond, 2014, 35: 084008. DOI: 10.1088/1674-4926/35/8/084008