An improved technology for eliminating nondeterministic latency in the L1 trigger system

NUCLEAR ELECTRONICS AND INSTRUMENTATION

DU Zhongwei, SU Hong, KONG Jie, ZHAO Xingwen, QIAN Yi

Nuclear Science and Techniques

Vol.24, No.2

Article number 020401

Published in print 01 Apr 2013

DOI: 10.13538/j.1001-8042/nst.2013.02.003

The Gamma Ray Array Detector (GRAD) is one of the subsystems of the external target facility in the Cooling Storage Ring of the Heavy Ion Research Facility at Lanzhou (HIRFL-CSR). The trigger subsystem of GRAD is required to make a fast L1 trigger decision with a fixed latency for the data acquisition. Because the hit signals from the detector are asynchronous with the local clock of the trigger system, a nondeterministic latency (its value varies between zero and one clock period) is generated when the synchronous receivers of the conventional trigger system process the hit signals. In this paper, an improved trigger logic based on a field-programmable gate array is developed. It comprises zero-delay broadening circuits as receivers and an improved adding circuit designed for the new receivers. Software simulation and experimental measurement have been conducted. Compared with the conventional trigger logic, the improved trigger logic eliminates the nondeterministic latency and reduces the total processing latency.

Nondeterministic latency; Field-programmable gate array; L1 trigger; Zero-delay broadening

1 Introduction

A reliable trigger system is of great importance to large detector electronics in particle physics. Fig.1 shows a universal architecture of the L1 trigger system, simplified from the electronics of several particle physics experiments such as BESIII^[1], ATLAS^[2], and ALICE^[3]. The basic logic for making an L1 trigger decision in the field-programmable gate array (FPGA) contains three parts: a receiving module, an adding module, and a comparator that compares the hit-number with the trigger condition. When a collision event occurs, multichannel analog signals from the detector are converted to digital hit signals by the threshold discriminators in the front-end electronics (FEE) and delivered to the L1 trigger system. The receiving module synchronizes and aligns the hit signals. The adding module counts the total number of responding channels (the hit-number) in the event. The comparator compares the hit-number with the trigger condition to generate an L1 trigger decision. This is the kernel process of the L1 trigger system. Simultaneously, the trigger information, including position information and other useful information, is delivered to the L2 trigger system for further processing. Together, these multilevel trigger decisions select the valid events for the data acquisition (DAQ) system.
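The adding and comparing steps described above can be sketched in a few lines of Python. This is only a behavioural illustration, not the FPGA implementation: the function name `l1_decision` is hypothetical, and the trigger condition is assumed to be "hit-number greater than or equal to a threshold".

```python
from typing import Sequence


def l1_decision(hits: Sequence[int], threshold: int) -> bool:
    """Return True when the hit-number reaches the trigger condition.

    Assumption: the trigger condition is hit_number >= threshold.
    """
    # Adding module: count the channels that responded in this event.
    hit_number = sum(1 for h in hits if h)
    # Comparator: compare the hit-number with the trigger condition.
    return hit_number >= threshold


# 64 channels, of which three (chosen arbitrarily) responded.
hits = [0] * 64
for channel in (3, 10, 42):
    hits[channel] = 1

print(l1_decision(hits, threshold=2))  # True
print(l1_decision(hits, threshold=5))  # False
```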

Fig.1

A universal architecture of the multichannel L1 trigger system.

The L1 trigger latency has two main parts: the receiving latency and the adding latency. Because collisions occur at random times, the arrival time of each hit signal is asynchronous with the local clock of the L1 trigger system. The conventional receiving module employs registers to synchronize the hit signals by latching them on the clock edge (e.g., a FIFO with a 41.67-MHz local clock is employed as the receiving module in the BESIII TOF trigger subsystem^[4]). Latching on the clock edge generates the receiving latency, which equals the time difference between the arrival of the hit signal and the next clock edge. The receiving latency is nondeterministic because its value varies randomly between zero and one local clock period from event to event. Its average value, half a clock period, can be reduced somewhat by increasing the clock frequency. However, a higher clock frequency makes it harder for the logic design to meet the timing constraints, and the highest clock frequency is limited by the FPGA hardware.
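A short numerical sketch may help to see why the average receiving latency is half a clock period. It assumes an idealised receiver with clock edges at integer multiples of the period and no setup or propagation delay. The function names are illustrative only.

```python
import math


def receiving_latency(arrival_ns: float, clock_mhz: float) -> float:
    """Delay from the hit arrival to the next clock edge.

    Assumption: clock edges sit at integer multiples of the period.
    """
    period = 1000.0 / clock_mhz
    next_edge = math.ceil(arrival_ns / period) * period
    return next_edge - arrival_ns


def mean_latency(clock_mhz: float, samples: int = 10000) -> float:
    """Average latency for arrival times spread evenly over one period."""
    period = 1000.0 / clock_mhz
    arrivals = [(i + 0.5) * period / samples for i in range(samples)]
    return sum(receiving_latency(t, clock_mhz) for t in arrivals) / samples


print(receiving_latency(3.0, 100.0))   # 7.0 (next edge of a 10 ns clock)
print(round(mean_latency(100.0), 3))   # 5.0, half of the 10 ns period
print(round(mean_latency(41.67), 2))   # about 12.0, half of about 24 ns
```

The latency of a single event can fall anywhere between zero and one period, and only the average is predictable. This is why the conventional receiver cannot provide a fixed latency.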

The GRAD electronics in the HIRFL-CSR^[5] is composed of three parts: an FEE based on ASIC chips named MATE^[6], a DAQ subsystem, and a 64-channel L1 trigger subsystem. Because of the track-hold mechanism in the MATEs (sampling must start at a specific time for accurate measurement), the DAQ subsystem requires a fast L1 trigger decision, and the latency of the trigger decision should be fixed for accurate off-line calibration. The synchronous receivers of the conventional trigger logic cannot satisfy this requirement because of their nondeterministic receiving latency. To eliminate the nondeterministic latency, zero-delay broadening circuits are developed as receivers, and an improved adding module is designed to match the new receivers. The design and performance of the improved L1 trigger logic are described below.

2 Improved L1 trigger logic

Figure 2 shows the improved L1 trigger logic implemented in the FPGA. The kernel trigger logic consists of the zero-delay broadening circuits as receivers and the improved adding circuit. The new receivers catch and broaden the multichannel hit signals without the nondeterministic latency. The improved adding circuit obtains the hit-number by adding all hit signals together (each channel signal is treated as 1-bit data). The hit-number is compared with the trigger condition, and a fast L1 trigger decision is generated so that the DAQ subsystem starts the readout first. The latency of the L1 trigger decision is low and fixed because there is no synchronization in the receiving and adding processes.

Simultaneously, the conventional synchronous trigger logic is employed to acquire trigger information for the L2 trigger system. The slow L2 trigger decision will decide whether the event data acquired by the DAQ system should be stored.

Fig.2

The architecture of the improved L1 trigger logic.

2.1 Zero-delay broadening circuits

The arrival times of the particles striking the scintillators of the individual signal channels differ because of the difference in hit position^[4], so the signals of the hit channels are delivered to the trigger system at different times. In each event, one kernel channel decides the trigger latency (e.g., if the hit-number threshold is 5 and the hit-number is 10, the 5th received hit signal is the kernel. Only the latency of receiving the 5th hit signal contributes to the trigger latency; the latencies of the first 4 signals and the last 5 signals are of no concern). In addition, a valid arrival time (t_max) stands for the maximum time difference between all channel paths (e.g., t_max is 20 ns in the GRAD, meaning that all hit signals received within 20 ns of the first hit signal reaching the trigger system belong to the same event). Because the arrival times and pulse widths of the hit signals differ, the first operation of the receivers should be aligning all the hit signals. Instead of the registers of the conventional synchronous receiver, the zero-delay broadening circuits are employed to complete the aligning operation. The signal of each hit channel is broadened to a width of more than t_max, so there is a period during which all hit signals at the input of the adding circuit stay at a high level. The combinational adding circuit can count the hit-number in this period. This is asynchronous aligning.
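The role of the kernel channel can be illustrated with a small behavioural sketch. It assumes that every hit is broadened to more than t_max, so a hit that arrived earlier in the same event is still high when a later one arrives. The function name `kernel_arrival` and the arrival times are hypothetical.

```python
from typing import Optional, Sequence


def kernel_arrival(arrivals_ns: Sequence[float], threshold: int,
                   t_max_ns: float = 20.0) -> Optional[float]:
    """Arrival time of the hit that makes the hit-number reach the threshold.

    Only hits within t_max of the first hit are counted as the same event.
    Returns None when the event never reaches the threshold.
    """
    if not arrivals_ns:
        return None
    first = min(arrivals_ns)
    in_event = sorted(t for t in arrivals_ns if t - first <= t_max_ns)
    if len(in_event) < threshold:
        return None
    # The threshold-th earliest hit is the kernel; later hits do not matter.
    return in_event[threshold - 1]


# Ten hits arriving 2 ns apart (illustrative values), threshold of 5.
arrivals = [2.0 * i for i in range(10)]
print(kernel_arrival(arrivals, threshold=5))   # 8.0, the 5th hit
print(kernel_arrival(arrivals, threshold=11))  # None, only 10 hits
```

In this model the trigger time follows the kernel hit by a fixed processing delay, so any variable delay added on the kernel channel's path shows up directly in the trigger latency.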

Figure 3 shows a zero-delay broadening circuit constituted of D type flip-flops (DFF). To meet the requirement that broadening width be more than t_max, the minimum number of DFFs used in each channel is

N = t_{max} / t_{clk} + 2

(1)

where t_clk is one clock period. When a hit pulse is delivered to the clock pin of the first DFF, its Q output turns high immediately because the D input is tied to a high level. The following DFFs then receive and pass on the high level from the first DFF in sequence. Being synchronous with the local clock, each following DFF adds a latency of one clock cycle (the latency of the second DFF is less than a clock period because it performs the synchronization). After a total latency of more than t_max, the clear pin of the first DFF receives a high-level feedback signal from the Q pin of the last DFF. The first DFF is reset and its output stays low until the next hit signal arrives. As a result, the hit signal is broadened to a width of more than t_max with zero delay, regardless of how the input signal changes during this process.
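Eq. (1) and the resulting pulse width can be checked with a simple timing model. The sketch below is an idealisation under stated assumptions: gate and routing delays are ignored, local clock edges sit at integer multiples of t_clk, and the ratio t_max / t_clk is rounded up when it is not an integer (the rounding is an assumption, as the text does not discuss that case). The function names are illustrative.

```python
import math


def min_dff_count(t_max_ns: float, t_clk_ns: float) -> int:
    """Eq. (1): N = t_max / t_clk + 2 (ratio rounded up, an assumption)."""
    return math.ceil(t_max_ns / t_clk_ns) + 2


def broadened_width(arrival_ns: float, t_clk_ns: float, n_dff: int) -> float:
    """Width of the broadened pulse in an idealised DFF chain.

    The first DFF goes high at the hit. The second DFF samples it on the
    next local clock edge, and each further DFF adds one clock period
    before the last Q output clears the first DFF.
    """
    next_edge = (math.floor(arrival_ns / t_clk_ns) + 1) * t_clk_ns
    sync_delay = next_edge - arrival_ns  # between 0 and one clock period
    return sync_delay + (n_dff - 2) * t_clk_ns


n = min_dff_count(20.0, 10.0)            # t_max = 20 ns, 100 MHz clock
print(n)                                 # 4
print(broadened_width(3.0, 10.0, n))     # 27.0
print(all(broadened_width(t, 10.0, n) > 20.0
          for t in (0.0, 2.5, 5.0, 9.5)))  # True
```

In this model the width varies with the arrival phase but always exceeds t_max, while the rising edge of the output follows the hit without waiting for a clock edge.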

Fig.3

Single channel zero-delay broadening circuit.

Figure 4 shows the timing simulation of the synchronous and the zero-delay receivers. The synchronous receiver, built from several registers, synchronizes and stores the hit signal for a few clock periods. Two points about the synchronous receiver should be noted. First, the latency of channel 1 does not equal that of channel 3. Second, the signal of channel 2, whose pulse width is less than a clock cycle, is missed. Compared with the synchronous receivers, the receiving latency of the zero-delay receiver is fixed and lower. This latency is composed only of pin-to-pin delay and gate delay, so it depends on the speed grade of the FPGA and contains no nondeterministic component. Furthermore, the zero-delay receiver is capable of catching signals of short pulse width, although a false trigger may be caused if the short pulse is noise. In practice, this case does not appear because of the reliable FEE design. In addition, the broadening width is a dead time of the improved trigger logic (any new hit signal is ignored in this period). The broadening width of the zero-delay receivers is therefore set to just satisfy the physical requirement of t_max = 20 ns, as shown in Fig.4.

Fig.4

Timing simulations of the synchronous receivers and the zero-delay receivers.

2.2 Improved adding circuit

The adding circuit receives the 64 broadened signals as 64 1-bit data (each bit is "1" when a hit occurs on the corresponding scintillator). Calculating the hit-number takes 63 additions, so the adding circuit consists of 63 adders. The adding circuit is combinational, which avoids a new nondeterministic synchronization latency. Two issues therefore need attention: nondeterministic latency and race hazards. Fig.5 shows two kinds of adding circuits, serial and parallel. The former is the default in the FPGA if the logic is compiled without special design, and the latter, used as the improved adding logic, is superior to the former.

First, the serial adding circuit generates a new nondeterministic latency. As shown in Fig.5(a), the input channels have paths of different lengths (the signal of channel 1 passes through 63 adders, but the signal of channel 64 passes through only 1 adder). Because the kernel channel changes randomly from event to event, the actual processing delay, which depends only on the kernel channel, varies nondeterministically. In Fig.5(b), the parallel circuit has no such problem because every input signal passes through 7 adders.

Fig.5

The 64-channel adding circuit. (a) Serial adders, (b) parallel adders.

Figure 6(a) shows the latency of different channels (the input signals are applied to channels 1, 12, 24, 36, 48 and 56, from left to right). The latency of the serial circuit changes with the input channel, whereas the latency of the parallel circuit is always fixed.
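The difference between the two topologies can be made concrete by counting the adders on each channel's path. The sketch below models only idealised two-input adders: a chain for the serial case and a balanced pairwise tree for the parallel case. The adder count per path in the synthesised circuit of Fig.5(b) may differ from this model, and the function names are illustrative.

```python
from typing import List


def serial_path_lengths(n_channels: int) -> List[int]:
    """Adders crossed per channel in a chain: (ch1 + ch2) + ch3 + ..."""
    n_adders = n_channels - 1
    # Channels 1 and 2 feed the first adder; channel k (k >= 2) feeds
    # adder k - 1 and crosses every adder from there to the output.
    return [n_adders] + [n_adders - (k - 2) for k in range(2, n_channels + 1)]


def tree_path_lengths(n_channels: int) -> List[int]:
    """Adders crossed per channel in a pairwise (balanced) adder tree."""
    # Each node holds (channel, adders crossed so far) for its inputs.
    level = [[(ch, 0)] for ch in range(1, n_channels + 1)]
    while len(level) > 1:
        merged_level = []
        for i in range(0, len(level) - 1, 2):
            merged = [(ch, d + 1) for ch, d in level[i] + level[i + 1]]
            merged_level.append(merged)
        if len(level) % 2:
            merged_level.append(level[-1])  # odd node passes through
        level = merged_level
    return [d for _, d in sorted(level[0])]


serial = serial_path_lengths(64)
tree = tree_path_lengths(64)
print(max(serial), min(serial))  # 63 1
print(len(set(tree)))            # 1, every channel has the same path length
```

The spread of path lengths in the chain is what turns a random kernel channel into a random processing delay. In the tree the spread is zero, so the delay does not depend on which channel is the kernel.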

Second, the serial circuit generates more glitches. Because the paths of the 64 channels are asymmetric, there are more junctions when the serial circuit is compiled to look-up tables (LUTs) in the FPGA. If a signal passing a junction goes through two paths of different lengths and reaches two LUTs or two inputs of one LUT, there is a race hazard, which causes a glitch. Because all channel paths of the parallel circuit are the same, it causes fewer glitches. Fig.6(b) shows the glitches of the two adding circuits in an extreme state. The glitches of the parallel circuit are fewer and shorter than those of the serial circuit. In practice, because the hit-number in one event is smaller than 10 in the gamma-ray energy measuring experiment, the glitches are fewer than in the simulation.

Fig.6

Timing simulations of the serial adding circuit and the parallel adding circuit. (a) The latency, (b) the glitches.

A grounded capacitor connected to the output pin of the trigger decision is used to eliminate the glitches. The voltage across the charging capacitor is

V(t) = E \times [1 - \exp(-t / RC)]

(2)

where E is the output voltage of the FPGA pin, R is its output impedance (about 10 Ω here), C is the capacitance, and t is the charging time. If RC is greater than t, the capacitor voltage stays below 0.63×E. A glitch whose pulse width t is less than RC is therefore eliminated in the LVTTL signal (the threshold of the high level is more than 0.63×E). For the improved trigger logic, we choose a 10 pF grounded capacitor, which is capable of eliminating glitches narrower than 1 ns. This capacitance is sufficient for the parallel adding circuit. The capacitor also delays the output signal of the trigger decision, and the delay equals RC. A circuit with more glitches therefore needs a larger capacitor and incurs more trigger latency. As a result, the improved adding circuit of parallel adders has the advantage of lower and fixed latency compared with the serial adding circuit.
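The 0.63×E figure follows directly from Eq. (2) evaluated at t = RC. The sketch below works in units of RC, so it does not depend on particular component values. The function names and the default threshold fraction of 0.63 are assumptions taken from the discussion above.

```python
import math


def charge_fraction(t_over_rc: float) -> float:
    """V(t) / E from Eq. (2), with the pulse duration t in units of RC."""
    return 1.0 - math.exp(-t_over_rc)


def glitch_suppressed(t_over_rc: float, threshold_fraction: float = 0.63) -> bool:
    """True when a pulse this short cannot charge the capacitor up to the
    logic-high threshold (assumed here to be 0.63 of E)."""
    return charge_fraction(t_over_rc) < threshold_fraction


print(round(charge_fraction(1.0), 3))  # 0.632, the value at t = RC
print(glitch_suppressed(0.5))          # True: a pulse of RC/2 reaches 0.39 E
print(glitch_suppressed(2.0))          # False: a pulse of 2 RC reaches 0.86 E
```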

3 Experimental measurement

To quantify the latency difference between the improved asynchronous and the conventional synchronous trigger logics, a test system has been assembled (Fig.7). The ⁶⁰Co source and CsI scintillator of the GRAD generate hit signals, and the FEE modules convert the analog hit signals to digital hit signals. The 64-channel L1 trigger module receives and processes the hit signals in the test logic implemented in an FPGA. The test logic contains the conventional trigger logic with synchronous receivers and the improved trigger logic with zero-delay receivers. Both logics use the parallel adders, so that the receivers are the only difference between them. The two logics receive the hit signals at the same time and generate L1 trigger decisions. The trigger decisions are delivered to three counters that measure the latency difference, the trigger frequency and the false triggering. A latency timer, working with a 250-MHz clock supplied by a phase-locked loop (PLL), starts counting on receiving the improved trigger decision and stops counting on receiving the conventional trigger decision, so the timer result is the latency difference between the two logics. Every 0.25 s, the data of the three counters are packed and stored in a FIFO, and the three counters are reset. When the FIFO is close to full, the data are transmitted to the host computer via the peripheral component interconnect (PCI) bus. The software calculates the average latency difference per event.
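The software step can be sketched as follows. This is a guess at the arithmetic rather than the actual host software: it assumes that the latency timer accumulates its counts over each 0.25 s window and that the trigger counter gives the number of events in the same window. The function name and the counter values in the example are made up for illustration.

```python
def average_latency_difference_ns(timer_counts: int, n_triggers: int,
                                  timer_clock_mhz: float = 250.0) -> float:
    """Mean latency difference per event from one readout window.

    Assumption: timer_counts is the sum of the timer counts of all
    events in the window, and n_triggers is the number of those events.
    """
    if n_triggers == 0:
        raise ValueError("no triggers in this readout window")
    tick_ns = 1000.0 / timer_clock_mhz  # 4 ns per count at 250 MHz
    return timer_counts * tick_ns / n_triggers


# Illustrative values only: 1300 counts accumulated over 100 events.
print(average_latency_difference_ns(1300, 100))  # 52.0
```

Averaging over many events lets the mean be resolved more finely than the 4 ns step of a single timer reading, although any systematic offset of the timer remains.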

Fig.7

The architecture of the test system. (a) Photograph, (b) logic.

Table 1 shows the latency difference between the improved and the conventional trigger logics at different clock frequencies. Theoretically, the latency difference equals half a clock period on average, but the following factors can cause errors. First, the clock frequency of the timer is only 250 MHz (the FPGA limits the highest clock frequency), so the measuring precision is 4 ns. Second, because the timer and trigger clocks are generated by the same PLL and clock source, there is a delay of one timer clock period when the timer catches the synchronous trigger decision (e.g., there is always a 4-ns delay at the 10-MHz trigger clock when the timer stops counting). Although the limited measuring precision causes some errors, the result shows that the improved logic reduces the latency by about half a clock cycle. The reduced latency agrees with the average value of the nondeterministic latency, which varies randomly between zero and one clock cycle. Since the receivers are the only difference between the two logics in the test, the nondeterministic latency is eliminated by the improved trigger logic.

Table 1

The reduced latency at different clock frequencies.

Clock frequency/MHz	Latency difference/ns	Half a period/ns	Error/ns	Trigger frequency/Hz
10	54.3	50	4.3	417
20	29.8	25	4.8	418
40	12.3	12.5	–0.2	416
60	8.4	8.33	0.07	417
80	6.2	6.25	–0.05	418
100	4.9	5	–0.1	418

4 Conclusions

The improved L1 trigger logic with zero-delay broadening circuits as receivers is capable of eliminating the nondeterministic latency and reducing the total latency by half a clock period on average. It has been implemented in the GRAD trigger subsystem and performs reliably. It is suitable for other L1 trigger systems that require an L1 trigger decision with a fixed and low latency.

References

[1] Feng C Q, Liu S B, An Q. IEEE Trans Nucl Sci, 2010, 57: 463-466.

[2] Garvey J, Hillier S, Mahout G, et al. Nucl Instrum Meth A, 2003, 512: 506-516.

[3] Bourrion O, Guernane R, Boyer B, et al. J Instrum, 2010, 5: 12048-12048.

[4] Liu S B, Shen Q, Zheng W, et al. IEEE Trans Nucl Sci, 2010, 57: 625-629.

[5] Bocharov V, Bubley A, Boimelstein Y, et al. Nucl Instrum Meth A, 2004, 532: 144-149.

[6] Baron P, Atkin E, Blumenfeld Y, et al. IEEE Nuclear Science Symposium Conference Record, Portland, 2004: 386-390.