Design and offline processing of an ultrafast digitizer based on internal cascaded DRS4

NUCLEAR ELECTRONICS AND INSTRUMENTATION

Ya-Fei Du
Jun Wu
Chen Yuan
Bo Yang
Cen-Ming Ye
Chuan-Fei Zhang
Yi-Nong Liu
Nuclear Science and Techniques, Vol. 30, No. 8, Article number 121. Published in print 01 Aug 2019. Available online 11 Jul 2019.

In this paper, we present an ultrafast digitizer utilizing the DRS4 switched capacitor array application-specific integrated circuit to achieve an ultrafast sampling speed of at most 5 GS/s. We cascaded all eight channels (sub-channels) of a single DRS4 chip for increased storage depth. The digitizer contains four DRS4 chips, a quad-channel analog-to-digital converter, a controlling field-programmable gate array, a PXI interface, and an SFP+ connector. Consequently, each DRS4 channel has a depth of 8192 points and a vertical resolution of 14 bits. The readout sequences should be broken into several segments and then reordered to obtain the correct sequential data sets, and this offline procedure varies in different readout modes. This paper describes the design and implementation of the hardware; in particular, the respective processing procedures are described in detail. Furthermore, the offset error is calibrated and corrected to improve the precision of the captured waveform in both single-channel and high-resolution modes.

Keywords: Ultrafast; Digitizer; DRS4; Cascade; Readout

1 Introduction

In recent decades, ultrafast (sampling rate ≥ 1 GS/s) digitizers have attracted considerable interest in nuclear and particle physics experiments owing to increasingly sensitive detectors and more critical conditions [1]. Conventional ultrafast digitizers mostly utilize commercial time-interleaved or flash analog-to-digital converters (ADCs), which suffer from limitations of high cost and high power consumption [2, 3]. A new approach has been proposed using switched capacitor array (SCA) application-specific integrated circuits (ASICs) to realize ultrafast analog sampling along with additional ADCs to achieve digitization [4]. The SCA-based digitizers have significant advantages over their conventional counterparts in terms of power consumption, cost, and scalability [5, 6]. DRS4, an SCA ASIC developed at the Paul Scherrer Institute (PSI), is becoming increasingly popular in high-energy physics experiments such as those involving various Cherenkov telescope arrays [4, 7]. However, the main limitations of DRS4-based digitizers are considerable dead time, insufficient accuracy, and limited recording time [8, 9].

In general, the DRS4-based digitizers cannot be applied to record high-event-rate signals. However, the pulsed signals are single-shot events in the diagnosis of high-intensity pulsed radiation fields [10, 11], enabling an excellent application prospect of these digitizers. Many studies have attempted to improve the precision of the DRS4-based digitizers [8, 9]; however, there is no design or product that uses a DRS4 chip with all eight sub-channels internally cascaded to increase the record time, especially for recording single-shot transient signals.

In this study, we aim to design an ultrafast digitizer with a sampling rate of at most 5 GS/s and a storage depth that can extend to 8192 points. The digitizer is mainly aimed at recording single-shot transient pulsed signals lasting approximately one microsecond or more. Moreover, the essential offline processing required to run this digitizer with a correct timing sequence is introduced and implemented, and a procedure is performed to correct the offset error.

2 Hardware Design of Functional Blocks

In this design, the ultrafast digitizer is a PXI 3U board containing DRS4 sampling, clock, ADC, data receiving, and transmission modules. Four DRS4 chips feed their sampled contents directly into a quad-channel ADC for digitization after the external trigger arrives at the board. A field-programmable gate array (FPGA)-based central controller is used to receive, process, and store the digitized signals. The captured data could be transferred to a local disk through a PXI 33-MHz/32-bit bus or to a remote receiver through an SFP+ interface for optical fiber communication. A simplified block diagram of this board is shown in Fig.1.

Fig. 1. Hardware block diagram of the DRS4 board.

2.1 DRS4 Sampling Module

DRS4 is the fourth and latest version of the DRS chips developed at PSI [8]. A single DRS4 chip has eight sub-channels under normal operation. Each sub-channel has a storage depth of 1024 cells (or a recording time of approximately 200 ns at a sampling rate of 5 GS/s), which is not adequate for our application. A feasible solution is to cascade all these sub-channels to act as a single channel with an eight-fold storage depth. Table 1 [12] lists the configurations of the different cascading schemes. After being correctly configured, the DRS4 chip can circularly store the sampled contents into the 8192 capacitors consecutively [12]. Moreover, DRS4 uses a single 2.5-V power supply with isolated analog and digital power domains. Furthermore, shielding against possible interference caused by high-frequency state switching must be provided.

TABLE 1. Cascade schemes of a single DRS4 chip
Channel number    Recording time at
                  5 GS/s     2.5 GS/s    1 GS/s
1                 1.6 µs     3.2 µs      8 µs
2                 0.8 µs     1.6 µs      4 µs
4                 0.4 µs     0.8 µs      2 µs
8                 0.2 µs     0.4 µs      1 µs

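The recording times in Table 1 follow from dividing the storage depth available to each channel by the sampling rate. The following is a minimal sketch of that arithmetic; the function and constant names are illustrative and are not part of the DRS4 documentation. The exact values (for example, 1.6384 µs for one channel at 5 GS/s) are slightly larger than the rounded table entries.

```python
# Sketch: recording time of one DRS4 chip for each cascading scheme.
CELLS_PER_SUBCHANNEL = 1024   # storage depth of one sub-channel (from the text)
SUBCHANNELS_PER_CHIP = 8      # sub-channels in one DRS4 chip (from the text)


def recording_time_us(n_channels: int, sampling_rate_gsps: float) -> float:
    """Recording time in microseconds when the chip is split into n_channels."""
    if n_channels <= 0 or SUBCHANNELS_PER_CHIP % n_channels != 0:
        raise ValueError("n_channels must divide the number of sub-channels")
    depth = CELLS_PER_SUBCHANNEL * SUBCHANNELS_PER_CHIP // n_channels
    # depth [samples] / rate [samples per ns] gives ns; divide by 1000 for us.
    return depth / sampling_rate_gsps / 1000.0


for n in (1, 2, 4, 8):
    row = [round(recording_time_us(n, rate), 2) for rate in (5.0, 2.5, 1.0)]
    print(n, row)
```
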
2.2 Reference Clock Generation and Distribution Module

In principle, the sampling instants for DRS4 are derived from a so-called "Domino wave." The Domino wave outputs a DTAP signal whose rising edges are used for a comparison with the reference clock by an internal phase-locked loop (PLL). The frequency relationship between the sampling clock and reference clock is expressed as fs = fin/D × 2048 [12], where D is the frequency divider factor.
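As a small worked example of this relationship, the sketch below computes the divided reference frequency fin/D required for a target sampling rate. The helper names are hypothetical; only the factor 2048 and the formula itself are taken from the text.

```python
DOMINO_MULTIPLIER = 2048  # factor in fs = fin / D * 2048 (from the text)


def sampling_rate_hz(f_in_hz: float, divider: int) -> float:
    """Sampling rate produced by an input clock f_in and divider factor D."""
    return f_in_hz / divider * DOMINO_MULTIPLIER


def required_reference_hz(f_s_hz: float) -> float:
    """Divided reference frequency (f_in / D) needed for a target sampling rate."""
    return f_s_hz / DOMINO_MULTIPLIER


reference = required_reference_hz(5e9)
print(reference)  # 2441406.25 Hz, i.e. about 2.44 MHz for 5 GS/s
```
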

A fanout buffer chip LMK01010 was used to generate highly synchronized clocks for these four DRS4 chips. LMK01010 supports eight channels of LVDS clock output, each having the capability of frequency dividing and delay adjustment via the SPI interface [13]. Fig.2 shows the configuration scheme in this design. The first four clocks are distributed to four DRS4 chips to generate a sampling clock of 5 GHz. The fifth clock is fed into the FPGA to generate the ADC sampling clock. Hence, we can synchronize both the sampling and digitizing processes with the same onboard oscillator (Si591BC156M025DG [14]).

Fig. 2. LMK01010 device configurations

2.3 ADC Module

We selected the AD9253, a quad 14-bit ADC from Analog Devices, to digitize the output content of the four DRS4 chips. The output stages of all four channels are compatible with the serial LVDS standard, supporting one- or two-lane, single data rate/double data rate (DDR) modes. The four pipeline sub-ADCs, integrated into the single AD9253, share common clock management circuits [15]. A data clock output and a frame clock output are provided for capturing and aligning the serial data. Furthermore, a single 1.8-V power supply is required, and the available sampling rate ranges from 10 to 80 MS/s.

Typically, the SRCLK for DRS4 and the ADCCLK for AD9253 should have the same frequency and a fixed phase relation. Theoretically, there should be an optimal phase difference between the SRCLK and ADCCLK to minimize the nonlinearity of DRS4 [12].

3 Firmware Design with a Kintex-7 FPGA

A Xilinx Kintex-7 FPGA (xc7k160tffg676) was used to serve as the central control unit for the critical function of capturing the high-speed digital data sourced by the ADC, transmitting the captured data, and configuring the DRS4 and other devices via individual interfaces.

3.1 DRS4 Control and Readout

Figure 3 illustrates the DRS4 control and configuration block diagram in the firmware design. The rising edge of the DENABLE signal starts the Domino wave, while the internal PLL circuit stabilizes the phase and frequency of the sampling clock. When the DWRITE signal goes high, each activated channel begins to sample the input signal at the corresponding instants [12]. All the cascaded sampling cells operate as a circular buffer. Once an external trigger arrives, the Domino wave stops immediately, and each cascaded channel begins to output its contents for digitization. Moreover, different cascading schemes and arbitrary channels can be activated through particular configurations.

Fig. 3. Block diagram of DRS4 control and configuration

In the full readout mode, all the 1024 contents of the read shift register should be cleared one by one with only the last one asserted high, and after this initializing operation, the readout phase begins [12]. This imposes an additional hold time of 1024 clock cycles (typically approximately 30 µs, given a readout clock of about 33 MHz) on these capacitors, inevitably resulting in additional discharge error. In this work, we started the readout with a pulse on the RSRLOAD pin of the DRS4 chip to eliminate this error. In this region-of-interest (ROI) readout mode, the ROI covers all the sampling cells.

3.2 Data Collection and Transmission

When the external trigger arrives, the four channels of the AD9253 output their serial data streams until all the contents of the 8192 capacitors are read out completely. The SelectIO resource ISERDES [16] of the Xilinx FPGA is used both to perform serial-to-parallel conversion and to receive the DDR, two-lane-mode LVDS data [17]. Furthermore, a bitslip command reorders the captured bits to implement word alignment. At power-on or system reset, the ADC is configured to enter the test mode, in which it outputs fixed pattern data, and it returns to the normal mode as soon as the word alignment process is completed successfully. Moreover, the 14-bit digital output is converted to a 16-bit parallel output by appending "00", "01", "10", or "11" as the two least significant bits, which act as channel tags, as shown in Fig.4.
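The channel tagging described above can be sketched in software as follows. This is only an illustration under the assumption that the 14-bit sample occupies the upper bits of the 16-bit word and the channel tag occupies the two least significant bits; the function names are hypothetical, and the actual operation is performed in the FPGA firmware.

```python
from typing import Tuple


def tag_sample(sample_14bit: int, channel: int) -> int:
    """Pack a 14-bit sample and a 2-bit channel tag into one 16-bit word."""
    if not 0 <= sample_14bit < (1 << 14):
        raise ValueError("sample must fit in 14 bits")
    if not 0 <= channel < 4:
        raise ValueError("channel tag must be 0..3")
    # Assumed layout: sample in bits 15..2, channel tag in bits 1..0.
    return (sample_14bit << 2) | channel


def untag_word(word_16bit: int) -> Tuple[int, int]:
    """Recover (sample, channel) from a tagged 16-bit word."""
    return word_16bit >> 2, word_16bit & 0b11


word = tag_sample(1234, 2)
print(hex(word), untag_word(word))
```

On the receiving side, the tag allows words from the four ADC channels to be separated again even when they are interleaved in one packet.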

Fig. 4. Basic procedures for PXI data transmission

We used the integrated block RAM IP core, owing to its fixed storage depth for every single event, to instantiate a module to assemble the captured data into valid packets for transmission. For primary data transmission, we designed the 33-MHz/32-bit PCI interface based on the PXI platform, and the basic procedures are shown in Fig.4. Furthermore, optical fiber communication can be implemented via an SFP+ interface using the Aurora 8B/10B protocol.

4 Data Capture and Offline Processing

A photograph of the DRS4 digitizing board is presented in Fig.5. Four LEMO differential connectors were used at the input stage, and this work simplified the analog front-end circuits by feeding four differential (LVDS) analog signals into these four channels. For each DRS4 chip, the positive and negative inputs are shared by all the eight cascaded channels, and all the eight sub-channels output their sampled contents via a single multiplexed output.

Fig. 5. Photograph of the DRS4 board

4.1 Data Capture Platform

We built a system based on the PXI platform to improve the scalability, with data stored on the local disk via the PXI bus. After configuring NI-VISA in the LabVIEW environment to create the VISA-based driver, we developed basic communication and DMA support [18] and subsequently designed data acquisition (DAQ) software for our digitizer. Using this software, we can implement the essential processes illustrated in Fig.6 and obtain the original data of the captured waveform. The interface of this simple software is presented in Fig.7. Furthermore, the software has the potential to communicate with a remote server to develop further instrumentation functions.

Fig. 6. Fundamental flowchart for the configuration and operations of the digitizer
Fig. 7. Simple control software for our digitizer developed in the LabVIEW environment

4.2 Single-channel Mode Processing

In this study, the single-channel mode refers to the cascading of the eight sub-channels of a single DRS4, and the resulting storage depth is 8192 points. In our design, these sub-channels are multiplexed onto a single output port. Owing to the difference between the readout and storage sequences, a discontinuity occurs in the captured waveform, as shown in Fig.7. First, all the 8192 capacitor cells store the sampled contents in sequence from cell 1 to cell 8192. Then, the output series is read circularly from the stop position within one sub-channel until that sub-channel is empty, and the same is repeated for the next sub-channel. The circular-memory-like operations for the cascaded DRS4 are demonstrated in Fig.8.

Fig. 8. Circular-memory-like operations in single-channel mode of the developed digitizer

A detailed explanation is obtained by combining Fig.8(b) with Fig.9. The physical storage sequence is 1a-1b-2a-2b-3a-3b-4a-4b-5a-5b-6a-6b-7a-7b-8a-8b, the readout sequence is 1b-1a-2b-2a-3b-3a-4b-4a-5b-5a-6b-6a-7b-7a-8b-8a, and the corresponding waveform sequence is 6b-7a-7b-8a-8b-1a-1b-2a-2b-3a-3b-4a-4b-5a-5b-6a, as shown in Fig.9(a), (b), and (c), respectively.
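The reordering from the readout sequence to the waveform sequence can be sketched with two rotations. The code below is an illustrative reconstruction based only on the three sequences listed above, not the authors' processing code. It assumes that the stop cell index within a sub-channel (stop_cell) and the sub-channel in which the Domino wave stopped (stop_subchannel) are both known; these parameter names and the indexing convention (the first cell read out holds the oldest sample of the stop sub-channel) are assumptions.

```python
import numpy as np

CELLS = 1024       # cells per sub-channel (from the text)
SUBCHANNELS = 8    # cascaded sub-channels per chip (from the text)


def reorder_single_channel(readout, stop_cell, stop_subchannel):
    """Return the 8192 samples in time order (oldest first).

    readout         -- 8192 values in readout order (1b-1a-2b-2a-...-8b-8a)
    stop_cell       -- assumed: cell index (0..1023) read out first in each sub-channel
    stop_subchannel -- assumed: zero-based sub-channel containing the stop position
    """
    blocks = np.asarray(readout).reshape(SUBCHANNELS, CELLS)
    # Undo the per-sub-channel rotation: b-a becomes a-b (physical storage order).
    storage = np.roll(blocks, stop_cell, axis=1).reshape(-1)
    # Rotate the whole buffer so that the oldest sample comes first.
    start = stop_subchannel * CELLS + stop_cell
    return np.roll(storage, -start)


# Synthetic check: write a ramp into the circular buffer, read it out, reorder.
stop_cell, stop_subchannel = 300, 5            # stop inside sub-channel 6
start = stop_subchannel * CELLS + stop_cell
ramp = np.arange(CELLS * SUBCHANNELS)
storage = np.empty_like(ramp)
storage[(start + ramp) % ramp.size] = ramp     # oldest sample lands at `start`
readout = np.roll(storage.reshape(SUBCHANNELS, CELLS), -stop_cell, axis=1).reshape(-1)
restored = reorder_single_channel(readout, stop_cell, stop_subchannel)
print(np.array_equal(restored, ramp))
```

With the stop position in sub-channel 6, the second rotation starts the output at segment 6b and ends at 6a, which matches the waveform sequence given above.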

Fig. 9. DRS4 reorder diagram with storage, readout, and waveform sequences

4.3 High-resolution Mode Processing

Moreover, this digitizer is designed to support a high-resolution mode, which provides a method to improve precision. In this mode, the sub-channels are independent of one another and sample the same signal simultaneously. Average operations in DAQ systems such as advanced oscilloscopes are traditionally realized by averaging over adjacent points or multiple periods [19]. Herein, we implemented averaging over the adjacent sub-channels at highly synchronized instants owing to the common sampling clock of 5 GHz. In the idealized case, all the differences in the measurement results between these sub-channels at specific time instants should result only from random noise. In the cascading schemes listed in the last three rows of Table 1, the average factors are 2, 4, and 8, respectively, and in principle, these three schemes can equivalently reduce the noise level by factors of √2, 2, and 2√2, respectively.

The offline procedure varies for different cascading schemes, and can be deduced from the illustrations in the last subsection. In the case of the all-independent scheme, the readout and waveform sequences of these eight sub-channels are consistently 1b-1a, 2b-2a, 3b-3a, 4b-4a, 5b-5a, 6b-6a, 7b-7a, 8b-8a.
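The averaging in the all-independent scheme can be sketched as follows. Because the readout sequence of each sub-channel is stated to coincide with its waveform sequence in this scheme, the sketch simply averages the eight sub-channels sample by sample. The function name and the synthetic Gaussian noise are assumptions for illustration, and any per-cell offset correction is assumed to have been applied beforehand.

```python
import numpy as np


def high_resolution_average(subchannel_readouts):
    """Average simultaneous samples of independent sub-channels.

    subchannel_readouts -- array of shape (n_subchannels, 1024); row k holds
    sub-channel k in readout order (kb-ka), which is already the waveform
    order in the all-independent scheme.
    """
    data = np.asarray(subchannel_readouts, dtype=float)
    return data.mean(axis=0)


# Synthetic check with uncorrelated Gaussian noise on a flat (0 V) input.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 1.0, size=(8, 1024))
averaged = high_resolution_average(noise)
improvement = noise[0].std() / averaged.std()
print(round(improvement, 2), "vs ideal", round(float(np.sqrt(8)), 2))
```

For ideal uncorrelated noise, the improvement approaches 2√2 for eight sub-channels; correlated noise between sub-channels would reduce it.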

5 Preliminary Results and Discussion

The preliminary work set the digitization speed of AD9253 to 33 MS/s, according to the recommendation in [12]. This section mainly reports the results of offset error correction and offline processing.

5.1 Offset Error Correction

According to [8, 12, 20], a residual charge remains on each capacitor after readout, and at a particular sampling rate, the resulting error is fixed and depends mainly on the position, that is, the sample cell number. This fixed pattern noise is always superimposed on the real signal and should be measured and subtracted for every sample cell. This error can be treated as the offset error from an ADC-like point of view. We used a Keithley SourceMeter (Series 2400) to provide a highly accurate (0.02%), low-noise, highly stable zero voltage [21] to measure the offset error. We reordered these 8192 points using the procedures described in the last section and repeated the measurement 32 times. Fig.10 illustrates these 32 plots in one figure, and the blue plot at the bottom presents the calibrated baseline in a shifted version.

Fig. 10. 32 zero outputs and an arbitrary one after offset error correction

Moreover, we calculated the standard deviation over the 8192 sample cells for these 32 sets of both original and calibrated data. In our preliminary tests, the full-scale range of the digitizer was measured to be 1.06 V (differential voltage). Accordingly, the results of the offset error correction can be converted to millivolts, as presented in Fig.11. The effect of offset correction is apparent, but significant spikes remain. These spikes are removed by limiting the original data to the ±3σ range, which improves the standard deviation of the baseline from 1.63 mV to 0.60 mV (RMS). The results are listed in Table 2 and can be compared with Table 1 in the DRS4 datasheet on the "Fixed pattern offset error" and "random noise" parameters [12].
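A minimal sketch of this kind of calibration is given below. It assumes that the per-cell offset is estimated as the mean of the zero-input acquisitions, that all arrays are indexed by physical cell number, and that the measured 1.06 V full-scale range spans all 14-bit codes; none of these details are confirmed by the text, and the function names and synthetic data are purely illustrative.

```python
import numpy as np

FULL_SCALE_MV = 1060.0                     # measured full-scale range (from the text)
ADC_BITS = 14                              # ADC resolution (from the text)
LSB_MV = FULL_SCALE_MV / (1 << ADC_BITS)   # assumption: full scale spans all codes


def estimate_cell_offsets(zero_runs):
    """Per-cell offset as the mean over repeated zero-input acquisitions."""
    return np.asarray(zero_runs, dtype=float).mean(axis=0)


def correct_offsets(waveform, cell_offsets):
    """Subtract the fixed pattern offset cell by cell."""
    return np.asarray(waveform, dtype=float) - cell_offsets


def limit_to_n_sigma(data, n_sigma=3.0):
    """Clip values to mean +/- n_sigma standard deviations to suppress spikes."""
    data = np.asarray(data, dtype=float)
    mean, sigma = data.mean(), data.std()
    return np.clip(data, mean - n_sigma * sigma, mean + n_sigma * sigma)


# Synthetic data in ADC counts: a fixed pattern plus random noise, 32 runs.
rng = np.random.default_rng(1)
pattern = rng.normal(8192.0, 100.0, size=8192)
zero_runs = pattern + rng.normal(0.0, 9.0, size=(32, 8192))
offsets = estimate_cell_offsets(zero_runs)
residual = limit_to_n_sigma(correct_offsets(zero_runs[0], offsets))
print(zero_runs[0].std() * LSB_MV, residual.std() * LSB_MV)  # RMS in mV
```
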

TABLE 2. Comparison of offset error and random noise
Parameter       DRS4 datasheet    This work
Offset error    5 mV (RMS)        6.61 mV (RMS)
Random noise    0.35 mV (RMS)     1.59/0.58 mV (RMS)

Fig. 11. Standard deviation of 32 zero outputs with and without offset error correction over 8192 cells

5.2 Single-channel and High-resolution Mode Result

We used the Keysight 81160A signal generator to feed a 40 MHz sinusoidal signal into a signal-conditioning board to generate the differential signals, implementing single-channel mode acquisitions. Fig.12 (a) verifies the discontinuity in the captured waveform, (b) presents the reordered version of the signal, and (c) shows the waveform obtained after offset error correction.

Fig. 12. Single-channel mode acquisition (partial view)

In the high-resolution mode test, a flat input waveform (0 V DC voltage) provided by the Keithley SourceMeter was captured by our digitizer to verify the improvement of the noise level directly. After offset correction and removal of spikes, the error of the first sub-channel and its high-resolution version are presented in Fig.13. The random noise level is reduced from 0.65 mV (RMS) to 0.27 mV (RMS) in the high-resolution mode by averaging over these eight sub-channels. This corresponds to a noise reduction factor of 2.41, which is less than the ideal factor of 2√2 by 17%.

Fig. 13. Improvement on random noise performance in high-resolution mode

5.3 Discussion

It can be verified that the offset error is highly dependent on the position of the cell in the entire channel. By cancelling this fixed pattern offset error, we can significantly reduce the noise level (from 6-7 mV to 0.6 mV RMS). In the high-resolution mode, the random noise level of our digitizer can be improved to 0.27 mV, which is better than the values provided in [12, 20, 22].

Conventional practice assumes that the offset error has a fixed pattern and is not relevant to the signal magnitude. The DC level can be measured to calibrate the magnitude-dependent error in further studies. The driving capability and analog bandwidth of the front-end electronics may have become an issue owing to the cascading of all the sub-channels, and additional high-frequency distortion should be considered.

6 Conclusion and Outlook

This paper mainly aimed to report an ultrafast digitizer using an internal cascaded DRS4, including the hardware, firmware, and software design, as well as some preliminary test results. The SCA-based digitizers have an excellent application prospect in the measurement of single-shot pulse signals, provided that the precision and recording time are improved accordingly. Our digitizer cascaded all the sub-channels of a DRS4 chip, significantly increasing the recording time to 1.6 µs at a sampling speed of 5 GS/s. In particular configurations, this digitizer can implement high-resolution mode data acquisition at the cost of reducing depth. However, system calibration, thorough performance tests, and essential correction algorithms still need to be investigated.

In the near future, we will perform further improvements to increase the analog bandwidth and correct the nonlinear timing error of this DRS4-based digitizer, and thus, the performance of the digitizer can be studied further. Moreover, further real-time digital signal processing is planned to implement the aforementioned two readout modes in the FPGA alone. The ultimate goal is to develop a highly scalable, cost-effective, DRS4-based DAQ system applicable to the capture of single-shot transient signals in pulsed radiation fields.

References
[1] S. N. Ahmed, Physics and engineering of radiation detection, 1st edn. (Elsevier, London, 2007), p.717
[2] B. Dominique, D. Eric, H. Michael, Very high dynamic range and high sampling rate VME digitizing boards for physics experiments. IEEE Trans. Nucl. Sci. 52(6):2853-2860 (2005). doi: 10.1109/tns.2005.860165
[3] S. Vitali, G. Cimatti, R. Rovatti et al., Adaptive time-interleaved ADC offset compensation by nonwhite data chopping. IEEE Trans. Circuits-II. 56(11):820-824 (2009). doi: 10.1109/tcsii.2009.2032443
[4] R. Stefan, Design and performance of the 6 GHz waveform digitizing chip DRS4. In 2008 IEEE Nuclear Science Symposium Conference Record. doi: 10.1109/nssmic.2007.4436659
[5] K. Stuart, Gigahertz waveform sampling and digitization circuit design and implementation. IEEE Trans. Nucl. Sci. 50(4):955-962 (2003). doi: 10.1109/tns.2003.815137
[6] E. Oberla, J. Genat, H. Grabas et al., A 15 GSa/s, 1.5 GHz bandwidth waveform digitizing ASIC. Nucl. Instrum. Meth. A. 735:452-461 (2014). doi: 10.1016/j.nima.2013.09.042
[7] M. Bitossi, R. Paoletti, D. Tescaro et al., Ultra-fast sampling and Data Acquisition using the DRS4 Waveform Digitizer. IEEE Trans. Nucl. Sci. 63(4):2309-2316 (2016). doi: 10.1109/tns.2016.2578963
[8] R. Stefan, D. Roberto, H. Ueli, Application of the DRS chip for fast waveform digitizing. Nucl. Instrum. Meth. A. 623(1):486-488 (2010). doi: 10.1016/j.nima.2010.03.045
[9] J. Wang, L. Zhao, C. Feng et al., Waveform timing algorithms with a 5 GS/s fast pulse sampling module. In Real Time Conference (RT), 2012 18th IEEE-NPSS. doi: 10.1109/rtc.2012.6418222
[10] X. Cheng, R. Fan, B. Li, Design of a 1.2 GSPS single-channel real-time long-distance transmission sampling system for fast-transient signal. Chin. Phys. C. 32:217-221 (2008).
[11] X. Cheng, X. Tian, M. Zeng et al., The design of 1 gsps real-time sampling system for transient pulsed signal. IEEE Trans. Nucl. Sci. 57(2):539-542 (2010). doi: 10.1109/rtc.2009.5321994
[12] R. Stefan, "DRS4 Handbook." Paul Scherrer Inst., Villigen, Switzerland, Rev. 0.9 (2008).
[13] Texas Instruments Inc., "1.6 GHz High Performance Clock Buffer, Divider, and Distributor", LMK01010 Data Sheet. http://www.ti.com/product/LMK01010
[14] Silicon Labs., "1 ps MAX JITTER CRYSTAL OSCILLATOR", Si590/591 Data Sheet. https://www.silabs.com/documents/public/data-sheets/Si590-591.pdf
[15] Analog Devices, Inc., "Quad, 14-Bit, 80 MSPS/105 MSPS/125 MSPS Serial LVDS 1.8V Analog-to-Digital Converter", AD9253 Data Sheet. https://www.analog.com/media/en/technical-documentation/data-sheets/ad9253.pdf
[16] Xilinx Inc., 7 Series FPGAs SelectIO Resources (2018). [Xilinx User Guide] https://www.xilinx.com/support/documentation/user_guides/ug471_7Series_SelectIO.pdf
[17] T. Yeoh, Dynamic phase alignment for networking applications (2004). [Xilinx Application Note] http://www.xilinx.com/bvdocs/appnotes/xapp700.pdf
[18] G. W. Johnson, LabVIEW graphical programming, 1st edn. (Tata McGraw-Hill Education, 2006), pp.257-273
[19] LeCroy, Inc., "Oscilloscope Vertical Resolution", HD4096 Data Sheet.
[20] J. Wang, L. Zhao, C. Feng et al., Evaluation of a fast pulse sampling module with switched-capacitor arrays. IEEE Trans. Nucl. Sci. 59(5):2435-2443 (2012). doi: 10.1109/tns.2012.2208656
[21] Keithley Instruments Inc., "Streamline your production with precision voltage and current sourcing", Series 2400 SourceMeter® Family Data Sheet.
[22] H. Yang, H. Su, J. Kong et al., Application of the DRS4 chip for GHz waveform digitizing circuits. Chin. Phys. C. 39(5):056101 (2015). doi: 10.1088/1674-1137/39/5/056101