GERO: A general SCA based readout ASIC for micro-pattern gas detectors with configurable storage depth and on-chip digitizer

NUCLEAR ELECTRONICS AND INSTRUMENTATION

Xin-Yuan Zhao, Feng Liu, Zhi Deng, Yi-Nong Liu

Nuclear Science and Techniques

Vol.30, No.9

Article number 131

Published in print 01 Sep 2019

Available online 08 Aug 2019

DOI: 10.1007/s41365-019-0659-2

The paper presents GERO (GEneral ReadOut), a general readout ASIC based on a switched capacitor array for micro-pattern gas detectors. It aims to provide general readout electronics for low to medium event-rate gas detectors with high sampling frequency, configurable storage depth, and on-chip data digitization. The first prototype GERO chip integrates 16 channels and was fabricated using a 0.18 µm CMOS process. Each channel consists of a sampling array working in a ping-pong mode, a storage array with a 1024-cell depth, and 32 Wilkinson analog-to-digital converters. The detailed design and test results are presented in the paper.

Keywords: ASIC; Switched capacitor array; Waveform sampling; Configurable deep memory depth

1 Introduction

Switched capacitor array (SCA) ASICs have been widely used for low to medium event-rate physics experiments because of their low power consumption and high channel density at affordable cost in comparison to analog-to-digital converters (ADCs). SCA chips record signal waveforms, from which more detailed information can be extracted by further digital signal processing. Various SCA ASICs have been developed with sampling rates ranging from MS/s to GS/s. The GS/s sampling SCA chips have mostly been used for fast scintillation or Cherenkov light detectors, for example, the ARS chip for H.E.S.S. and IACT array ^[1], the SAM chip for H.E.S.S.-II ^[2], and the DRS chip for MAGIC-II telescopes ^[3,4] as well as PSEC4 ^[5], LABRADOR ^[6], and TARGET ^[7]. The MS/s sampling SCA chips, however, have a much wider range of applications. They can be used for both semiconductor and gas detectors, for example, the CERN-49 IC for STAR ^[8], the DTMROC and HAMAC chips for LHC ATLAS ^[9,10], the APV series chips for LHC CMS ^[11,12], the Beetle chip for LHCb ^[13,14], the AFTER chip for T2K, and the AGET chip generically for Time Projection Chambers (TPCs) ^[15–17].

In our previous work, the CASCA ASIC was successfully developed for TPC-based X-ray polarimetry using a 0.18 μm CMOS process ^[18]. The emission direction of the photoelectron is modulated by the X-ray polarization, and it can be estimated by measuring the two-dimensional trajectories of the photoelectrons with a precision of one tenth of a millimeter. In a TPC-based polarimeter, one dimension of the photoelectron track is determined by the readout strips and the other is estimated from the signal waveforms. An SCA was adopted for waveform sampling in the CASCA chip, where each channel consisted of analog front-ends and a 64-cell-deep SCA. The prototype chip was tested with a GEM (gas electron multiplier)-TPC to measure the photoelectron tracks generated by 8 keV X-rays ^[19]. However, the SCA in the CASCA chip had two limitations. First, the depth of 64 cells limited the photoelectron energy (namely the maximum track length) to below 10 keV. Second, it could not work in a ping-pong mode, resulting in a systematic dead time during the readout.

A new version of the SCA chip, called GERO, is proposed and designed to solve these problems and to provide a configurable memory depth for more generic readout solutions. Besides the generic readout requirements of micro-pattern gas detectors (MPGDs), TPC applications were treated as a primary design consideration. The architecture of a two-stage SCA was adopted ^[7], in which the SCA is divided into sampling and storage arrays. The input signal is sampled continuously by the sampling SCA array; once a trigger is asserted, the sampling is stopped and the analog samples are transferred to a storage SCA block. A depth of 32 cells was chosen for the sampling block, and each channel consists of two sampling blocks working in ping-pong mode. The storage block has the same depth as the sampling block, and each channel integrates 32 storage blocks. Hence, a total depth of 1024 cells is implemented in such a way that the event size can be flexibly configured by an external trigger pattern. In addition, an on-chip Wilkinson-type ADC is integrated to improve the readout speed and dynamic range ^[1,2].

The first prototype chip integrated 16 channels of SCA and was fabricated in a 0.18 µm CMOS process. The detailed design and test results are described in the following sections.

2 Architecture and Specifications

A simplified schematic of the 16-channel GERO is shown in Fig.1. Each channel consisted of sample and storage SCA arrays along with 32 Wilkinson ADCs. The sample SCA array consisted of two sample SCA blocks (Block A and Block B), and each block consisted of 32 cells of switched capacitors, namely 32 sample cells. The input signal was sampled continuously by the 32 sample cells in one of the sampling blocks, for example Block A. When a trigger was asserted, the analog samples were held in Block A, waiting for transfer to one of the 32 storage SCA blocks. At the same time, the other sampling block (Block B in this case) started sampling immediately. Operating the sampling blocks in ping-pong mode reduced the systematic dead time dramatically. With the two-stage (sample and storage) SCA, the input signal did not have to drive the long wire and large capacitive load that come with an increasing number of storage cells. In exchange, a readout buffer was needed for each sample cell to transfer the analog voltage to the storage cell with sufficient precision and speed. To support consecutive triggers, the total time for the transfer and the reset of the readout buffer had to stay within 32 sample clock periods, that is, 320 ns at a 100 MS/s sampling rate. A low-power, high-precision, and high-speed readout buffer was therefore mandatory. In addition, the maximum trigger latency was limited by the depth of the sample block.
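The 320 ns figure follows directly from the block depth and the sampling period. The short sketch below reproduces that arithmetic for a few sampling rates; the function name is only illustrative and is not taken from the GERO design.

```python
CELLS_PER_BLOCK = 32  # depth of one sampling block, from the text


def transfer_budget_ns(cells_per_block, sampling_rate_msps):
    """Time (ns) in which one sampling block fills up.

    While one block samples, the other block must finish its transfer
    to the storage array and the buffer reset, so this is the budget
    for both operations together.
    """
    sample_period_ns = 1e3 / sampling_rate_msps
    return cells_per_block * sample_period_ns


for rate in (25, 50, 100):
    budget = transfer_budget_ns(CELLS_PER_BLOCK, rate)
    print(f"{rate} MS/s -> {budget:.0f} ns")
```

At 100 MS/s the budget is 320 ns, so the buffer settling time plus its reset must fit inside that window; at lower sampling rates the constraint relaxes proportionally.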

Fig. 1

Simplified schematic of the GERO chip. The sample cell consists of two capacitors; one is from Block A and the other from Block B. "A" in the sample array stands for the readout buffer, the schematic of which is also shown in the figure.

The storage SCA block had a depth of 32 cells, the same as that of the sample block. Each channel was integrated with 32 storage blocks, resulting in a buffer depth of 1024 cells. The 1024-cell memory can be easily split by sending a sequence of consecutive triggers with an interval of 32 sample clocks. For example, the 1024 storage cells can be used as a two-event buffer of 512 cells each, or a four-event buffer of 256 cells each, etc. In this way, the storage SCA can work as a multi-event buffer with reconfigurable event number and buffer depth, that is, anything from a 32-cell event buffer recording 32 consecutive events to a 1024-cell event buffer recording one event. Thanks to the two-stage SCA architecture, GERO could be configured to meet the stringent demands of event size and rate for different experiments.
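The partitioning rule described above can be written down in a few lines. The following is a minimal sketch, assuming that an event always spans a whole number of 32-cell blocks and that each block of an event is claimed by one trigger spaced 32 sample clocks after the previous one; `partition` and `trigger_offsets` are hypothetical helper names, not part of the chip or its firmware.

```python
CELLS_PER_BLOCK = 32  # cells in one storage block
NUM_BLOCKS = 32       # storage blocks per channel (1024 cells in total)


def partition(event_cells):
    """Return (blocks_per_event, events_in_buffer) for a given event size."""
    if event_cells <= 0 or event_cells % CELLS_PER_BLOCK != 0:
        raise ValueError("event size must be a positive multiple of 32 cells")
    blocks_per_event = event_cells // CELLS_PER_BLOCK
    if blocks_per_event > NUM_BLOCKS:
        raise ValueError("event size exceeds the 1024-cell buffer")
    return blocks_per_event, NUM_BLOCKS // blocks_per_event


def trigger_offsets(event_cells):
    """Sample-clock offsets of the consecutive triggers that record one event."""
    blocks_per_event, _ = partition(event_cells)
    return [i * CELLS_PER_BLOCK for i in range(blocks_per_event)]


print(partition(512))        # (16, 2): two events of 512 cells
print(partition(256))        # (8, 4): four events of 256 cells
print(trigger_offsets(128))  # [0, 32, 64, 96]
```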

In the storage array, the data are stored in the order in which they are generated and wait for the digitization, or so-called read, phase. The analog samples in the selected storage block are then digitized by the 32 Wilkinson ADCs in parallel. These digitized samples are then latched into the output registers and can be shifted out through 4 LVDS data outputs. All the control signals for the sample and storage SCA blocks and the ADCs are generated by the control logic module from a few external signals. In addition, the control logic was designed with flexibility so that one or several ASICs can be controlled and read out by a single companion field programmable gate array (FPGA).

The TPCs in different experiments require readout ASICs with different sampling speeds (around 10 MS/s) and resolutions of 8–10 bits. Moreover, the event length varies widely between experiments. For example, TPC-based X-ray polarimetry needs a readout ASIC with approximately 100 sampling cells when sampling at 20 MS/s, whereas for the ALICE TPC, the readout MALICE ASIC ^[20] integrates 1024 switched capacitor cells in one channel and has a sampling speed of 1 MS/s. The aim of the prototype GERO is to meet these different TPC requirements. The major specifications of GERO are listed in Table I.

Table I

Specifications of the GERO chip

Parameter	Value
Number of channels	16
Input signal range	0.3–1.3 V
Sampling frequency	1–100 MS/s
Readout bandwidth	800 Mbps max.
Buffer latency	10 µs @ 100 MS/s sampling
ADC ENOB	10 bit
Power consumption	<4.5 mW/ch

3 Circuit Design

3.1 The Sampling Cell

As shown in Fig.1, the sample cell consists of two switched capacitors (one each in Block A and B) and a readout buffer. In total, 32 sampling cells are integrated for each channel.

Switches $S_{1 A} - S_{1 A}^{'} / S_{1 B} - S_{1 B}^{'}$ are used for sampling and $S_{2 A} - S_{2 A}^{'} / S_{2 B} - S_{2 B}^{'}$ for transferring. Switch $S_{3}$ is used for resetting the readout buffer. The switches are complementary, each built from an n-channel and a p-channel transistor connected in parallel. Bottom plate sampling ^[21] is adopted to reduce the input-related charge injection and to achieve better linearity. The sampling cell was simulated with Spectre, and the residual voltage could be reduced from 0.6 mV with no time delay between $φ_{1}$ and $φ_{1}^{'}$ to 0.1 mV with a proper delay between $φ_{1}$ and $φ_{1}^{'}$. The non-overlap control signals, which are insensitive to CVT (corner, voltage, and temperature), are generated by the circuits shown in Fig.2.

Fig. 2

The schematic of the non-overlap control signal generator.

The structure of the readout buffer is also shown in Fig.1. A simple two-stage structure was used to save power and area while still meeting the speed and precision requirements. The settling time and input-referred noise of the buffer were simulated to be 250 ns and 0.49 mV, respectively.

3.2 The Storage Cell

The schematic of the storage cell with the common current mirrors and comparator in ADC is shown in Fig.3. It consists of a switched capacitor and two common drain (source follower) transistors ( $M_{1}$ and $M_{2}$ ) with switched outputs. Each storage block integrates 32 storage cells and each channel integrates 32 blocks, forming a 1024-cell event buffer.

Fig. 3

The schematic of the storage cell with the common current mirrors and comparator in ADC.

The storage cells are divided into two groups, corresponding to the sampling blocks A and B. The write of the storage cell (from sampling cell to storage cell) is selected by the Block A/B selection line ( $S_{5}$ ) and a 5-bit address line ( $S_{6}$ ). The read of the storage cell (selection for digitization) is controlled by a 6-bit address line ( $S_{7}$ ). Two source followers are used to buffer the stored analog voltage and the ramp signal, and the selected outputs are sent to the inputs of the comparator in the Wilkinson ADC. The two source follower transistors are placed close to each other for better matching and to reduce the influence of the environment. Each cell needs a pair of current mirrors, which are shared among different storage blocks. In total, 32 pairs of current mirrors are integrated for the 32 Wilkinson ADCs working in parallel. A supply voltage of 2.5 V is used to achieve a 1 V linear range.

3.3 The Wilkinson ADC

The Wilkinson type ADC has been adopted because of its low power consumption and circuit compactness. Each channel integrates 32 Wilkinson ADCs and hence, a whole storage block can be digitized simultaneously.

The ramp signal generator and the 11-bit counter are shared by all 32 ADCs. As described above, the ramp signal is sent to each storage cell as the input of one source follower. The counter starts counting as the ramp signal rises, and its value is latched into a local 11-bit latch when the ramp signal crosses the stored voltage in the corresponding cell. The ramp signal sweeps from 0.2 V to 1.4 V in 12 µs. Only the input voltage range of 0.3–1.3 V is used, for better linearity. After digitization, the data are loaded into the output shift registers. The maximum output data bandwidth of the ADCs is 29.3 Mbps per channel.
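The conversion principle can be illustrated with an idealized model. The sketch below assumes a perfectly linear ramp and assumes that the 11-bit counter runs through its full range during the 12 µs sweep; neither the ADC clock frequency nor the exact counter timing is stated in the text, so the derived clock value is only an estimate. The per-channel throughput computed this way (32 ADCs, 11 bits each, one conversion per 12 µs) comes out close to the quoted 29.3 Mbps.

```python
RAMP_START_V = 0.2     # ramp start, from the text
RAMP_END_V = 1.4       # ramp end, from the text
RAMP_TIME_S = 12e-6    # ramp duration, from the text
COUNTER_BITS = 11      # shared counter width, from the text
ADCS_PER_CHANNEL = 32  # Wilkinson ADCs per channel, from the text

FULL_COUNT = 2 ** COUNTER_BITS  # 2048 counter steps


def wilkinson_code(v_in):
    """Counter value latched when an ideal linear ramp crosses v_in."""
    fraction = (v_in - RAMP_START_V) / (RAMP_END_V - RAMP_START_V)
    code = int(fraction * FULL_COUNT)
    # Clamp to the counter range for inputs outside the ramp span.
    return max(0, min(FULL_COUNT - 1, code))


# Voltage step per count (assumes the full counter range spans the ramp).
lsb_mv = (RAMP_END_V - RAMP_START_V) / FULL_COUNT * 1e3

# Counter clock implied by the same assumption.
adc_clock_mhz = FULL_COUNT / RAMP_TIME_S / 1e6

# Bits produced per channel per conversion, divided by the conversion time.
bandwidth_mbps = ADCS_PER_CHANNEL * COUNTER_BITS / RAMP_TIME_S / 1e6

print(f"LSB ~ {lsb_mv:.3f} mV, implied clock ~ {adc_clock_mhz:.1f} MHz")
print(f"per-channel ADC output ~ {bandwidth_mbps:.1f} Mbps")
print("codes at 0.3 V and 1.3 V:", wilkinson_code(0.3), wilkinson_code(1.3))
```

Because the used input range (0.3–1.3 V) sits inside the ramp span, the codes never reach the ends of the counter range in this model, which leaves headroom on both sides of the ramp.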

3.4 The Control Logic

The digital control circuit is developed using a standard digital IC design process. Only a few external signals (sync for multi-ASIC synchronization, trigger, read, move, and global reset) are used to generate all internal control signals for the sampling and storage SCAs. Three clocks are required: the sampling clock, the ADC clock, and the readout clock.

Two finite state machines (FSMs), SA and ST, are designed to control the sampling and storage SCA, respectively. The simplified state diagrams are shown in Fig.4.

Fig. 4

The state diagrams of the FSMs for (a) the sampling SCA and (b) the storage SCA.

The FSM SA has 4 different states:

SA1 – Block A sampling, Block B idle

SA2 – Block A transmitting data, Block B sampling

SA3 – Block A idle, Block B sampling

SA4 – Block A sampling, Block B transmitting data

The state transitions are driven by the trigger and full signals. The latter indicates whether the storage SCA blocks are full. External triggers are ignored once all the storage blocks are full. In each state, the corresponding control signals are generated for the switches in the sampling cells (Fig.1). For instance, in State SA1, switches $S_{1 A} - S_{1 A}^{'}$ are turned on for each sampling cell in sequence with a duration of one sampling clock. Once a trigger is asserted, the state changes to SA2. Now, the switches $S_{1 A} - S_{1 A}^{'}$ are turned off and switches $S_{1 B} - S_{1 B}^{'}$ are turned on for each cell in sequence. At the same time, switches $S_{2 A} - S_{2 A}^{'}$ are turned on and switches $S_{3}$ are turned off for all sampling cells.
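The four SA states can be modeled as a small transition function. This is only a sketch under stated assumptions: the text names trigger and full as the driving signals, while `transfer_done` is a hypothetical signal introduced here to mark the end of a block transfer, and a trigger arriving during a transfer is simply ignored in this model. The actual transition conditions are those of Fig.4.

```python
from enum import Enum


class SA(Enum):
    SA1 = "Block A sampling, Block B idle"
    SA2 = "Block A transmitting data, Block B sampling"
    SA3 = "Block A idle, Block B sampling"
    SA4 = "Block A sampling, Block B transmitting data"


def next_state(state, trigger, full, transfer_done):
    """One step of the assumed sampling FSM.

    trigger and full come from the text; transfer_done is an assumed
    signal that marks the end of a block transfer.
    """
    accept = trigger and not full  # triggers are ignored when storage is full
    if state is SA.SA1:
        return SA.SA2 if accept else SA.SA1
    if state is SA.SA2:
        return SA.SA3 if transfer_done else SA.SA2
    if state is SA.SA3:
        return SA.SA4 if accept else SA.SA3
    # state is SA.SA4
    return SA.SA1 if transfer_done else SA.SA4


state = SA.SA1
for trig, done in [(True, False), (False, True), (True, False), (False, True)]:
    state = next_state(state, trigger=trig, full=False, transfer_done=done)
    print(state.name, "-", state.value)
```

In this model two consecutive triggers take the FSM once around the ring, with Block A and Block B alternating between sampling and transferring.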

The FSM ST also has 4 different states:

ST1 – Waiting for data from Block A

ST2 – Writing data from Block A

ST3 – Waiting for data from Block B

ST4 – Writing data from Block B

In each state, the corresponding control signals are generated for the switches $S_{5}$ and $S_{6}$ in the storage cells (Fig.3). The external trigger signal generates control signals AW and BW, which are used to drive the state transitions along with signal full.

Control signals read and move are used for digitization and output data buffering. Signal read starts the A/D conversion of the storage block that is currently selected, and signal move shifts the address to the next storage block. The digitized data are loaded into the output shift registers at the end of the read operation. Flexible readout schemes can be implemented by combining these two signals independently. After digitization (read + move) or abandonment (move), the flag of the corresponding block is cleared, which in turn may change the indicator signals full and empty. The data can be shifted out by enabling the readout clock and the signal read enable.
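The interplay of trigger, read, move, and the full and empty indicators can be pictured with a behavioral model of the block flags. The class below is a hypothetical sketch, not the chip's logic: it assumes that blocks are written and read in ring order, that read leaves the flag untouched, and that move both clears the flag and advances the read address.

```python
class StorageBlocks:
    """Behavioral model of the per-block 'occupied' flags of one channel."""

    def __init__(self, num_blocks=32):
        self.flags = [False] * num_blocks
        self.write_addr = 0
        self.read_addr = 0

    @property
    def full(self):
        return all(self.flags)

    @property
    def empty(self):
        return not any(self.flags)

    def trigger(self):
        """Store one sampling block; returns False if the trigger is ignored."""
        if self.full or self.flags[self.write_addr]:
            return False
        self.flags[self.write_addr] = True
        self.write_addr = (self.write_addr + 1) % len(self.flags)
        return True

    def read(self):
        """Digitize the selected block; returns its index, or None if unused."""
        return self.read_addr if self.flags[self.read_addr] else None

    def move(self):
        """Clear the selected block's flag and step to the next block."""
        self.flags[self.read_addr] = False
        self.read_addr = (self.read_addr + 1) % len(self.flags)


buf = StorageBlocks(num_blocks=4)   # small buffer for illustration
for _ in range(4):
    buf.trigger()
print(buf.full, buf.trigger())      # True False: extra trigger is ignored
print(buf.read())                   # 0: block 0 is digitized
buf.move()                          # read + move frees block 0
buf.move()                          # move alone abandons block 1
print(buf.read(), buf.full)         # 2 False
```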

3.5 Layout design

The layout of the chip, with dimensions of 4960 µm × 3980 µm, is shown in Fig.5. The control logic is at the top and the analog bias generator is at the bottom. Each channel consists, from left to right in Fig.5, of the sampling SCA, the storage SCA, and the Wilkinson ADC with its output data buffer. The storage array occupies the largest area because of its large number of MIM capacitors. The layout was designed carefully to suppress the interference of the digital circuits on the analog circuits.

Fig. 5

The layout of the GERO chip.

4 Test Results and Discussion

4.1 The Evaluation System

A dedicated evaluation board and FPGA test system have been developed to characterize the GERO prototype chip. The GERO chip was mounted on the evaluation board, with the input signals of 3 channels supplied via SMA connectors. The external control signals were connected to the FPGA board through an adapter board that performs the 1.8 V to 3.3 V logic level conversion. The sampling frequency, as well as the event size and depth, could be programmed through the remote bus control protocol (RBCP). The output data from the GERO chip were buffered in the FPGA before being transferred to the computer via Ethernet. Qt-based data acquisition software has been developed to configure the FPGA and to collect the data.

4.2 Functional Test

Configurations with sampling rates from 25 MS/s to 100 MS/s and event sizes from 32 to 1024 cells have been tested. A maximum data bandwidth of 800 Mbps has also been verified with a 200 MHz readout clock. All the functions of the GERO chip, namely sampling, storage, data digitization, and data output, worked, which means that the two-stage SCA architecture, Wilkinson ADC, output shift registers, and control logic were operating as intended. Fig.6 shows two examples of the waveforms with two different memory depths of 32 cells and 64 cells. The waveforms were far from satisfactory, but they did verify the flexible split of the memory depth.
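The 800 Mbps figure is consistent with the 4 LVDS data outputs mentioned in Sect. 2 running at the 200 MHz readout clock. The sketch below makes that arithmetic explicit and estimates how long one digitized storage block would take to shift out. It assumes one bit per link per clock cycle, 11-bit words without framing overhead, and that all 16 channels share the four links; none of these details is stated in the text.

```python
LVDS_LINKS = 4           # LVDS data outputs, from Sect. 2
READOUT_CLOCK_MHZ = 200  # readout clock used in the test
CHANNELS = 16            # channels per chip
CELLS_PER_BLOCK = 32     # samples per storage block and channel
BITS_PER_SAMPLE = 11     # width of the ADC latch


def link_bandwidth_mbps(links, clock_mhz, bits_per_clock=1):
    """Aggregate bandwidth, assuming bits_per_clock bits per link per cycle."""
    return links * clock_mhz * bits_per_clock


def block_shift_time_us(bandwidth_mbps):
    """Time to shift out one storage block of all channels (no overhead)."""
    bits = CHANNELS * CELLS_PER_BLOCK * BITS_PER_SAMPLE
    return bits / bandwidth_mbps  # bits / (Mbit/s) gives microseconds


bw = link_bandwidth_mbps(LVDS_LINKS, READOUT_CLOCK_MHZ)
print(f"{bw} Mbps, one block of all channels in {block_shift_time_us(bw):.2f} us")
```

Under these assumptions, shifting out one block takes several microseconds, the same order of magnitude as the 12 µs ramp time of the Wilkinson ADC.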

Fig. 6

Waveforms of GERO sampling sine waves with (a) 32-cell memory depth and (b) 64-cell memory depth

However, nearly half of the storage SCA blocks were found to have failed because of mistakes in the layout. The outputs of these blocks were noisy and independent of the inputs, and thus were discarded in the following analysis.

4.3 The Power Consumption

The power consumption of GERO was measured at room temperature with a main bias current of 50 µA. Two supply voltages of 1.8 V and 2.5 V were used. The power consumption was measured to be 2.3 mW/ch for the 1.8 V power supply and 7.23 mW/ch for the 2.5 V power supply. The latter was much higher than the simulation result of 1.8 mW/ch. The ramp signals for the Wilkinson ADCs were found to be considerably steeper than designed, and their start points were approximately 100 mV higher than the designed values, accounting for an additional current of 0.5 mA per channel. Abnormal parasitic current paths were found from the 2.5 V supply to ground, probably through some switch transistors in the ramp generator. The test results shown below are for less than 100 MS/s sampling and 1024-cell event size.

4.4 The Static Noise

The output noise was first tested by measuring the output variance while sampling different DC voltages. A typical histogram of the digitized outputs for a certain storage cell sampling a DC level is shown in Fig.7(a). In total, 5000 samples were collected for each histogram. The input-referred noise was estimated to be 1.2 mV from the standard deviation of the major peak, which is consistent with the simulation result of 1.17 mV. However, a minor peak could also be clearly seen, which was common to a considerable number of storage cells. The separation between the major and minor peaks varied with the input voltage. This twin-peak phenomenon was most probably caused by a disturbance on the ramp signal. Only the major peaks were used for the static performance evaluation.

Fig. 7

The GERO test results of (a) the output distribution for a certain storage cell for the input voltage of 0.8 V (5000 events collected), (b) the measured ASIC outputs vs. DC input voltages (top) and the INL (bottom), (c) the change of the ASIC output vs. waiting time before readout, (d) the DC responses of the different storage cells in one channel with their linear fits, (e) the transient waveforms recorded by GERO with their Lorentz fits. The sampling period between two adjacent sampling cells is 10 ns; the figures illustrate the outputs with sampling cell as well as time.

4.5 The Linearity

The mean values of the major peaks for different DC input voltages are shown in Fig.7(b), along with the linear fit curve. The linear range was reduced to 0.4–1.25 V because of the abnormal leakage current in the ramp signal. The maximum integral non-linearity (INL) was typically around 3% over the input range. Besides the issue of the ramp signal, the major source of the non-linearity was the pair of source followers, according to the circuit simulation. This could be improved in the future by optimizing the size of the source follower transistors (e.g. 8 u/1 u) and by changing the output point to the other side of the switch $S_{8}$ (N in Fig.3). According to the simulation results, the INL of the updated structure decreased below 0.1%.
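For reference, the sketch below shows one common way to obtain a maximum INL from a set of DC measurements: fit a straight line by least squares and express the largest residual as a percentage of the fitted output span. The text does not specify which INL definition was used, so this is an assumed definition, and the input values in the example are made-up illustrative numbers, not measured GERO data.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line through (xs, ys); returns (slope, offset)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_inl_percent(xs, ys):
    """Largest deviation from the fitted line, in percent of the fitted span."""
    slope, offset = linear_fit(xs, ys)
    span = abs(slope) * (max(xs) - min(xs))
    worst = max(abs(y - (slope * x + offset)) for x, y in zip(xs, ys))
    return 100.0 * worst / span


# Illustrative values only: input voltages (V) and ADC codes with a slight bow.
inputs_v = [0.4, 0.6, 0.8, 1.0, 1.2]
codes = [350, 700, 1030, 1350, 1650]
print(f"max INL ~ {max_inl_percent(inputs_v, codes):.2f} % of span")
```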

4.6 The Leakage Current

The charge loss caused by the leakage current could be significant, especially for a large SCA. The major contribution to the leakage current was from the switch transistors and was voltage dependent. The change of the ASIC output versus time for 1.2 V input voltage is shown in Fig.7(c). The leakage current was calculated to be 260 fA. At the readout frequency of 200 MHz, the difference caused by the leakage current between the first and the last block is less than 1 mV.

4.7 The Non-Uniformity

The DC responses of the different storage cells in one channel were measured for the 0.7–0.9 V input range. In total, 544 cells were tested, and for each cell three different input voltages were each sampled 1000 times. The averaged sample points with their linear fits are shown in Fig.7(d). The standard deviation of the offsets was 20.4 mV. This is almost one order of magnitude larger than the simulation result of 2.43 mV, indicating abnormal contributions, probably due to the unstable ramp signal. In the next version, the mistakes in the ramp generator circuit will be corrected, and a bandgap reference will be used instead of a current mirror to generate a more stable and reliable ramp signal.

4.8 The Transient Performance

A quantitative measurement of the dynamic performance became extremely difficult because of unexpected problems such as the failed storage blocks and the twin-peak phenomenon. However, to prove the functionality of the GERO chip, Lorentz waves with two different amplitudes were sampled at 100 MS/s, as shown in Fig.7(e). The baseline of the Lorentz waves was 550 mV, and the peak values were 700 mV and 1000 mV, respectively. The FWHM of the Lorentz waves was approximately 575 ns, and the waves occupied seven consecutive storage blocks.
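To give a feeling for the numbers, the sketch below samples an ideal Lorentzian pulse with the quoted baseline, peak, and FWHM at a 10 ns period over seven 32-cell blocks. The pulse position in the window is an arbitrary choice made here (centered), and the model ignores noise and all non-idealities of the chip.

```python
SAMPLE_PERIOD_NS = 10  # 100 MS/s sampling
CELLS_PER_BLOCK = 32
BLOCKS_USED = 7        # consecutive storage blocks occupied by one wave


def lorentz_mv(t_ns, t0_ns, baseline_mv, peak_mv, fwhm_ns):
    """Ideal Lorentzian pulse: peak_mv at t0_ns, half amplitude at +/- FWHM/2."""
    half_width = fwhm_ns / 2.0
    amplitude = peak_mv - baseline_mv
    return baseline_mv + amplitude / (1.0 + ((t_ns - t0_ns) / half_width) ** 2)


n_samples = BLOCKS_USED * CELLS_PER_BLOCK   # 224 samples
window_ns = n_samples * SAMPLE_PERIOD_NS    # 2240 ns record length
t0_ns = window_ns / 2                       # pulse centered (assumption)

wave = [lorentz_mv(i * SAMPLE_PERIOD_NS, t0_ns, 550.0, 1000.0, 575.0)
        for i in range(n_samples)]

# Estimate the FWHM from the samples lying above the half-amplitude level.
half_level = 550.0 + (1000.0 - 550.0) / 2.0
above = sum(1 for v in wave if v > half_level)
print(f"{n_samples} samples, {window_ns} ns window")
print(f"FWHM from samples ~ {above * SAMPLE_PERIOD_NS} ns")
```

With a 10 ns sampling period the 575 ns FWHM corresponds to roughly 57 samples, so the pulse core fits in about two blocks and the remaining blocks record the long Lorentzian tails and the baseline.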

5 Conclusion

The GERO chip has been designed as a generic readout for MPGDs with a sampling frequency up to 100 MHz, an event buffer depth up to 1024 cells, and on-chip digitization. A two-stage SCA architecture was implemented, along with the corresponding control logic. In this way, the event size and buffer depth can be easily reconfigured, which brings great flexibility to meet the stringent demands of various applications. Although the performance of the prototype is not satisfactory and does not meet the design specifications, the full functionality of the GERO chip has been verified, including the sampling and storage SCAs, on-chip ADCs, data output, and the corresponding control logic. The performance of the first prototype chip was severely affected by the failed storage blocks and the unstable ramp signal. An upgraded version will be designed and tested in future work.

References:

D. Lachartre, F. Feinstein, Application specific integrated circuits for ANTARES offshore front-end electronics. Nucl. Instrum. Meth. A, 442(1-3):99-104 (2000). doi: 10.1016/s0168-9002(99)01205-x