Revision as of 17:32, 7 September 2024
DL-TDC DarkLight FPGA TDC
ODB settings
- dl_enable - yes/no - enable or disable TDC readout in the midas frontend
- dl_ctrl - 32 bits of general control
  bit - quartus - description
  0 - dl_ctrl_gate - jam TDC gate open, enable un-triggered free-running mode
  1 - dl_ctrl_gate_A - gate TDC from A-side trigger
  2 - dl_ctrl_gate_B - gate TDC from B-side trigger
  3 - dl_ctrl_gate_AB - gate TDC from A*B
  4 - dl_ctrl_gate_T - gate TDC from T trigger (T = A*B)
  5 - dl_ctrl_ena_A - enable TDC channel 32 (A)
  6 - dl_ctrl_ena_B - enable TDC channel 33 (B)
  7 - dl_ctrl_ena_T - enable TDC channel 34 (T)
  15..8 - dl_ctrl_gate_w - TDC gate width in units of 8 ns
  31..16 - not used
- dl_trg_mask - 16 bits of trigger mask
  bit - description
  0 - enable A pair 1-9
  1 - enable A pair 2-10
  2 - enable A pair 3-11
  3 - enable A pair 4-12
  4
  5
  6
  7
  8 - enable B pair 5-13
  9 - enable B pair 6-14
  10 - enable B pair 7-15
  11 - enable B pair 8-16
- dl_tdc_mask - 32 bits to enable 32 TDC channels, in sequence
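As an illustration of the dl_ctrl bit layout above, here is a minimal Python sketch that packs and unpacks the 32-bit control word. The bit positions and the 8 ns gate-width unit are taken from the table; the helper names (`pack_dl_ctrl`, `unpack_dl_ctrl`) and the flag names are hypothetical and are not part of the MIDAS frontend.

```python
# Bit positions copied from the dl_ctrl table; all names here are hypothetical.
DL_CTRL_BITS = {
    "gate": 0,     # jam TDC gate open (un-triggered free-running mode)
    "gate_A": 1,   # gate TDC from A-side trigger
    "gate_B": 2,   # gate TDC from B-side trigger
    "gate_AB": 3,  # gate TDC from A*B
    "gate_T": 4,   # gate TDC from T trigger
    "ena_A": 5,    # enable TDC channel 32 (A)
    "ena_B": 6,    # enable TDC channel 33 (B)
    "ena_T": 7,    # enable TDC channel 34 (T)
}
GATE_W_SHIFT = 8    # gate width occupies bits 15..8
GATE_W_MASK = 0xFF
GATE_W_UNIT_NS = 8  # gate width is counted in units of 8 ns


def pack_dl_ctrl(flags, gate_width_ns=0):
    """Build the dl_ctrl word from flag names and a gate width in ns."""
    word = 0
    for name in flags:
        word |= 1 << DL_CTRL_BITS[name]
    gate_w = gate_width_ns // GATE_W_UNIT_NS
    if not 0 <= gate_w <= GATE_W_MASK:
        raise ValueError("gate width does not fit in 8 bits")
    return word | (gate_w << GATE_W_SHIFT)


def unpack_dl_ctrl(word):
    """Return (set of flag names, gate width in ns) for a dl_ctrl word."""
    flags = {name for name, bit in DL_CTRL_BITS.items() if (word >> bit) & 1}
    gate_width_ns = ((word >> GATE_W_SHIFT) & GATE_W_MASK) * GATE_W_UNIT_NS
    return flags, gate_width_ns
```

For example, `hex(pack_dl_ctrl({"gate_T", "ena_A", "ena_B", "ena_T"}, 80))` gives `0xaf0`: T-trigger gating, all three trigger channels recorded, and an 80 ns gate.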
Channel map
// map TDC cable to SiPM channels
assign ch[1] = tdc[0];
assign ch[2] = tdc[1];
assign ch[3] = tdc[10];
assign ch[4] = tdc[11];
assign ch[5] = tdc[2];
assign ch[6] = tdc[3];
assign ch[7] = tdc[8];
assign ch[8] = tdc[9];
assign ch[9] = tdc[15];
assign ch[10] = tdc[14];
assign ch[11] = tdc[7];
assign ch[12] = tdc[6];
assign ch[13] = tdc[13];
assign ch[14] = tdc[12];
assign ch[15] = tdc[5];
assign ch[16] = tdc[4];
assign ch[16+1] = tdc[16+0]; // 16
assign ch[16+2] = tdc[16+1]; // 17
assign ch[16+3] = tdc[16+10]; // 26
assign ch[16+4] = tdc[16+11]; // 27
assign ch[16+5] = tdc[16+2]; // 18
assign ch[16+6] = tdc[16+3]; // 19
assign ch[16+7] = tdc[16+8]; // 24
assign ch[16+8] = tdc[16+9]; // 25
assign ch[16+9] = tdc[16+15]; // 31
assign ch[16+10] = tdc[16+14]; // 30
assign ch[16+11] = tdc[16+7]; // 23
assign ch[16+12] = tdc[16+6]; // 22
assign ch[16+13] = tdc[16+13]; // 29
assign ch[16+14] = tdc[16+12]; // 28
assign ch[16+15] = tdc[16+5]; // 21
assign ch[16+16] = tdc[16+4]; // 20
// compute SiPM pair coincidences
assign A[0] = ch[1] & ch[9] & enable_input[0]; // 0 * 15 -> pair1
assign A[1] = ch[2] & ch[10] & enable_input[1]; // 1 * 14 -> pair2
assign A[2] = ch[3] & ch[11] & enable_input[2]; // 10 * 7 -> pair3
assign A[3] = ch[4] & ch[12] & enable_input[3]; // 11 * 6 -> pair4
assign A[4] = ch[5] & ch[13] & enable_input[4];
assign A[5] = ch[6] & ch[14] & enable_input[5];
assign A[6] = ch[7] & ch[15] & enable_input[6];
assign A[7] = ch[8] & ch[16] & enable_input[7];
assign B[0] = ch[16+1] & ch[16+9] & enable_input[8]; // 16 * 31
assign B[1] = ch[16+2] & ch[16+10] & enable_input[9]; // 17 * 30
assign B[2] = ch[16+3] & ch[16+11] & enable_input[10]; // 26 * 23
assign B[3] = ch[16+4] & ch[16+12] & enable_input[11]; // 27 * 22
assign B[4] = ch[16+5] & ch[16+13] & enable_input[12]; // 18 * 29 -> pair5
assign B[5] = ch[16+6] & ch[16+14] & enable_input[13]; // 19 * 28 -> pair6
assign B[6] = ch[16+7] & ch[16+15] & enable_input[14]; // 24 * 21 -> pair7
assign B[7] = ch[16+8] & ch[16+16] & enable_input[15]; // 25 * 20 -> pair8
wire A_or = |A;
wire B_or = |B;
//wire A_or = A[0] | A[1] | A[2] | A[3] | A[4] | A[5] | A[6] | A[7];
//wire B_or = B[0] | B[1] | B[2] | B[3] | B[4] | B[5] | B[6] | B[7];
wire AB_and = A_or & B_or;
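To check a trigger mask offline, the channel map and pair coincidences can be modelled in software. The Python sketch below follows the Verilog listing above (not the dl_trg_mask table); the function name `trigger` and the packing of inputs into integers are assumptions made for illustration.

```python
# Cable-to-SiPM map copied from the Verilog: ch[k+1] = tdc[CH_TO_TDC[k]] for
# k = 0..15, and the same pattern offset by 16 for the second group.
CH_TO_TDC = [0, 1, 10, 11, 2, 3, 8, 9, 15, 14, 7, 6, 13, 12, 5, 4]


def trigger(tdc_bits, enable_input):
    """Software model of the A/B pair coincidences (hypothetical helper).

    tdc_bits:     32-bit int, bit n is the state of tdc[n]
    enable_input: 16-bit int, bit n is enable_input[n]
    Returns (A, B, A_or, B_or, AB_and), with A and B as 8-bit ints.
    """
    def ch(n):
        # n is the 1-based SiPM channel number, 1..32
        base = 16 if n > 16 else 0
        return (tdc_bits >> (base + CH_TO_TDC[(n - 1) % 16])) & 1

    A = 0
    B = 0
    for i in range(8):
        if ch(i + 1) & ch(i + 9) & (enable_input >> i) & 1:
            A |= 1 << i
        if ch(16 + i + 1) & ch(16 + i + 9) & (enable_input >> (8 + i)) & 1:
            B |= 1 << i
    A_or = int(A != 0)
    B_or = int(B != 0)
    return A, B, A_or, B_or, A_or & B_or
```

With tdc[0], tdc[15], tdc[18] and tdc[29] high and enable_input = 0x1001, the model gives A[0] and B[4] set, so AB_and is 1.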
D3 delay tuning
- quartus report "DL trigger GPIO to dlA", "to dlB" and "through dlT"
- delay values are set in the quartus assignments file. Delay values go from 0 to 7 in increments of about 0.5 ns.
dsdaqgw:chronobox_firmware$ grep D3_DELAY *.qsf
DE10_NANO_SoC_GHRD.qsf:set_instance_assignment -name D3_DELAY 5 -to GPIO_1_21
9.109 GPIO_1_20 dl|WideOr0|combout -> 1
9.025 GPIO_1_33 dl|WideOr0|combout -> 1
9.022 GPIO_1_26 dl|WideOr0|combout -> 0
8.966 GPIO_1_28 dl|WideOr0|combout -> 5 = 8.494 add 1
8.787 GPIO_1_22 dl|WideOr0|combout -> 0
8.714 GPIO_1_21 dl|WideOr0|combout -> 5
8.711 GPIO_1_34 dl|WideOr0|combout -> 0
8.597 GPIO_1_30 dl|WideOr0|combout -> 0 = 8.240 add 1 -> 1 = 9.733 sub 1
8.590 GPIO_1_29 dl|WideOr0|combout -> 1 add 1 -> 2 = 9.588 sub 1
8.506 GPIO_1_27 dl|WideOr0|combout -> 4 add 1 --------------------> 5 = 10.097 sub 1
8.453 GPIO_1_24 dl|WideOr0|combout -> 2 add 1 -> 3 = 9.588 sub 1
8.412 GPIO_1_35 dl|WideOr0|combout add 1
8.380 GPIO_1_31 dl|WideOr0|combout -> 4 add 1 -> 5 = 9.208 sub 1
8.312 GPIO_1_25 dl|WideOr0|combout -> 1 add 1 -> 2 = 8.555 add 1 -> 3 = 9.379
7.992 GPIO_1_32 dl|WideOr0|combout add 2
7.248 GPIO_1_23 dl|WideOr0|combout -> 2 add 3 -> 5 = 9.425 sub 1

9.339 GPIO_0_14 dl|WideOr1|combout -> 6 sub 1 -> 5 = 8.587
9.207 GPIO_0_2 dl|WideOr1|combout -> 7 sub 1 -> 6 = 8.360 add 1
9.174 GPIO_0_10 dl|WideOr1|combout -> 6
9.161 GPIO_0_6 dl|WideOr1|combout -> 7
9.105 GPIO_0_11 dl|WideOr1|combout -> 7
9.019 GPIO_0_7 dl|WideOr1|combout -> 7
8.731 GPIO_0_15 dl|WideOr1|combout add 0 -> 0 = 8.540 add 1
8.462 GPIO_0_12 dl|WideOr1|combout add 1 -> 1 = 9.189
8.256 GPIO_0_4 dl|WideOr1|combout add 2 -> 2 = 9.750 sub 1
8.214 GPIO_0_5 dl|WideOr1|combout add 2
8.182 GPIO_0_1 dl|WideOr1|combout add 2
7.328 GPIO_0_3 dl|WideOr1|combout add 3
6.584 GPIO_0_8 dl|WideOr1|combout add 5
6.409 GPIO_0_13 dl|WideOr1|combout add 5 -> 5 = 9.943 sub 2
6.009 GPIO_0_9 dl|WideOr1|combout add 6 -> 6 = 10.189 sub 2
5.917 GPIO_0_0 dl|WideOr1|combout add 6
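The "add N" / "sub N" notes in the tables above are manual tuning steps. A first guess for each step can be computed from the nominal 0.5 ns increment, as in this rough Python sketch. The function name `suggest_d3_delay` and the 9.0 ns target used in the example are hypothetical; only the 0..7 range and the ~0.5 ns step come from the text.

```python
D3_STEP_NS = 0.5       # approximate delay per D3_DELAY step (from the text)
D3_MIN, D3_MAX = 0, 7  # allowed D3_DELAY range (from the text)


def suggest_d3_delay(measured_ns, current_setting, target_ns):
    """Suggest a D3_DELAY setting that moves measured_ns towards target_ns.

    measured_ns:     path delay reported by quartus with current_setting
    current_setting: D3_DELAY value used for that compilation
    target_ns:       common delay all inputs should be tuned to (assumed)
    """
    steps = round((target_ns - measured_ns) / D3_STEP_NS)
    return max(D3_MIN, min(D3_MAX, current_setting + steps))
```

For example, `suggest_d3_delay(8.312, 1, 9.0)` suggests going from 1 to 2. The real step size is not uniform (in the table, GPIO_1_25 moves by about 0.24 ns from setting 1 to 2 and by about 0.82 ns from 2 to 3), so any suggested value still has to be recompiled and checked against the quartus report.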
Theory of operation
Why FPGA TDC?
- combine trigger logic, hit recording and time measurement in one device
- avoid having to split signals to FPGA trigger and to external TDC and to coordinate clocks and timestamps between them
- ability to construct custom TDC with data paths tuned to experiment requirements, i.e. ultra high data rates. This avoids the well known problem with the CERN TDC ASIC (V1190) where high rate on one channel will cause data loss on other channels.
- ability to get the data out at FPGA speeds, not limited to VME, USB or Ethernet speeds.
- ability to "right-size" the TDC. ASIC TDCs come in fixed increments: 96 channels (Lecroy Fastbus TDC), 64 or 128 channels (CAEN V1190 VME TDC), 64 channels (PicoTDC), which is overkill if fewer channels are actually needed. With an FPGA TDC, the size (cost) of the FPGA can be selected according to need, and FPGAs of different sizes are often available as interchangeable plug-in modules.
Downsides:
- ASIC TDC design can control internal timing much better than an FPGA TDC. As a result, per-time-bin and per-channel variations can be made much smaller.
- ASIC TDC can run at much higher clock frequencies and have much smaller time bins.
- ASIC TDC may have internal temperature compensation functions in order to avoid temperature drift of TDC calibration.
Types of FPGA TDC
FPGA TDCs come in two basic types: based on delay lines and based on the Vernier method. Delay line TDC resolution is limited by the size and number of delay line elements. Vernier TDCs require precise clock generators (usually not available on standard FPGA devices). Delay line TDCs come in several designs: the delay line captures either the phase of the input signal relative to the clock, or the phase of the clock relative to the input signal; the delay line encoder looks either for one edge transition, or for many edge transitions ("wavelet TDC").
Delay line FPGA TDC building blocks
- signal capture and clock synchronizer - asynchronous input signal is latched and synchronized to the TDC clock. Per-hit dead time (LE to next pulse), minimum pulse width requirement (LE to TE) and minimum double-pulse resolution (TE to next LE) are determined here.
- tdc clock counter - latched and synchronized input signal records the hit coarse time (10 ns time bin)
- delay line - tdc clock (50 MHz/20 ns) waveform travels through the 60-element delay line (~25 ns total delay), latched input signal captures this waveform in the phase latch register (60 bits). Typical bit pattern: "00...000111...11100..000"
- "temperature encoder" - looks for the position of the first 0->1 or 1->0 transition and converts it into a time bin number 1..60 (for 0->1 transitions) and -1..-60 (for 1->0 transitions). This clock phase time bin number corresponds to the TDC fine time, after a calibration that accounts for the individual delay of each time bin.
- per-channel input buffer - 64 hits per channel for LE and TE (32 LE+TE hits), to handle bursts of hits
- main multiplexor - data from 32+3 TDC channels is funneled into 1 output stream
- main data FIFO - "the bigger, the better" data buffer to hold the data before it is transmitted out of the TDC
- data transmitter - DL-TDC uses a MIDAS frontend to read the main data FIFO via a 64-bit AXI bus, package data as MIDAS events and send them to the main computer via the Cyclone-5 SoC 1gige ethernet (data rate about 50 Mbytes/sec; at 8+8 bytes per TDC hit (LE+TE), about 3 Mhits/sec, sustained).
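The encoder and fine-time calibration steps in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the firmware or the analysis code: the bin numbering convention in `encode_phase` is assumed, and the calibration shown is the generic code-density method (hits uniformly distributed in time fill each bin in proportion to its width), which the source does not name. All function names are hypothetical.

```python
def encode_phase(bits):
    """Find the first transition in the latched phase pattern.

    bits: sequence of 0/1 values, bits[0] is the first delay element.
    Returns +k for a 0->1 transition and -k for a 1->0 transition, where k
    is the 1-based position of the last bit before the transition (assumed
    convention), or 0 if the pattern contains no transition.
    """
    for k in range(1, len(bits)):
        if bits[k - 1] != bits[k]:
            return k if bits[k] == 1 else -k
    return 0


def bin_edges_from_counts(counts, span_ns):
    """Code-density calibration for one transition polarity.

    counts:  number of hits seen in each time bin, for hits uniform in time
    span_ns: time range covered by all bins together (assumed known)
    Returns len(counts) + 1 bin edges in ns, starting at 0.
    """
    total = sum(counts)
    edges = [0.0]
    for c in counts:
        edges.append(edges[-1] + span_ns * c / total)
    return edges


def fine_time_ns(bin_number, edges):
    """Centre of a 1-based time bin, given calibrated bin edges."""
    return 0.5 * (edges[bin_number - 1] + edges[bin_number])
```

For example, `encode_phase([0, 0, 0, 1, 1, 1, 0, 0])` returns 3. In this sketch the 0->1 and 1->0 bin numbers would each need their own `counts` histogram and edge table.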
Implementation details
- delay line element: use LCELL or cyclonev_lcell_comb (TDClcellff.sv); the quartus fitter does not care which and does its own thing regarding which LCELL input ports to use. Input "F" must be used for best timing (shortest delay): LUT mask: F0 = vcc 0xFFFF, F1 = vcc 0xFFFF, F2 = gnd 0x0000, F3 = gnd 0x0000, Combout equation: LCELL(F), File:TDC LCELL.pdf
- delay line: in theory, 20 delay line elements can be packed in a 10-ALM block. In practice, to ensure routing uses LCELL input "F", TDC uses 8 delay elements per 10-ALM block (quartus uses the "leftover" ALMs to implement the encoder and other logic). Typical timing reported by quartus is 0.087 ns transit time through combinatorial logic from input F to COMBOUT output, 0.250 ns transit time to the next delay element in the same 10-ALM block, 0.800 ns transit time to the first delay element of the next 10-ALM block. File:TDC DELAY CHAIN1.pdf and File:TDC DELAY CHAIN2.pdf
- timing of TDC delay chain for each TDC channel is shown in "Report DL-TDC-NN-{LE,TE}", use "Locate path" to "Locate path in chip planner", then zoom in and click on logic elements to examine the physical layout. Use "show routing" to see more detail of the connections between logic elements.
- layout of the 60-element TDC delay line is done manually using quartus qsf file dltdc.qsf. This file is generated by a perl script (dltdc_qsf.perL). Location of each TDC channel is selected manually and must be adjusted to have them close to the FPGA input pins and to spread things around to avoid FPGA resource congestion.
set_location_assignment LABCELL_X11_Y10_N6 -to "TDCn:tdcs|TDC2ef:tdc[17].tdc|TDC2e:tdc|TDC1:le|TDClcellN:phase|TDClcellff:e[0].c|lcell"
set_location_assignment FF_X11_Y10_N7 -to "TDCn:tdcs|TDC2ef:tdc[17].tdc|TDC2e:tdc|TDC1:le|TDClcellN:phase|TDClcellff:e[0].c|ff1"
set_location_assignment LABCELL_X11_Y10_N12 -to "TDCn:tdcs|TDC2ef:tdc[17].tdc|TDC2e:tdc|TDC1:le|TDClcellN:phase|TDClcellff:e[1].c|lcell"
set_location_assignment FF_X11_Y10_N13 -to "TDCn:tdcs|TDC2ef:tdc[17].tdc|TDC2e:tdc|TDC1:le|TDClcellN:phase|TDClcellff:e[1].c|ff1"
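The following Python sketch shows how assignment lines of this form could be generated, and how the typical timings quoted above add up over the whole chain. It is not the perl script mentioned in the text. The function names, the N-index step of 6 between elements (inferred from the N6/N12 pair in the example) and the restriction to a single X/Y block are assumptions.

```python
def location_assignments(x, y, first_n, count, path, n_step=6):
    """Yield qsf lines placing `count` delay elements in one block at (x, y).

    path is the hierarchy prefix up to and including the delay line
    instance; each element gets an LCELL at index n and its FF at n + 1.
    """
    for i in range(count):
        n = first_n + i * n_step
        node = f"{path}|TDClcellff:e[{i}].c"
        yield f'set_location_assignment LABCELL_X{x}_Y{y}_N{n} -to "{node}|lcell"'
        yield f'set_location_assignment FF_X{x}_Y{y}_N{n + 1} -to "{node}|ff1"'


def chain_delay_ns(elements=60, per_block=8,
                   t_cell=0.087, t_hop=0.250, t_block_hop=0.800):
    """Estimate the total chain delay from the typical timings in the text."""
    hops = elements - 1                       # connections between elements
    block_hops = (elements - 1) // per_block  # hops that cross into the next block
    return elements * t_cell + (hops - block_hops) * t_hop + block_hops * t_block_hop


if __name__ == "__main__":
    prefix = "TDCn:tdcs|TDC2ef:tdc[17].tdc|TDC2e:tdc|TDC1:le|TDClcellN:phase"
    for line in location_assignments(11, 10, 6, 2, prefix):
        print(line)
    print(f"{chain_delay_ns():.2f} ns")
```

Called as in the `__main__` block, the generator reproduces the four assignment lines shown above. The delay estimate for 60 elements at 8 per block comes to about 23.8 ns (60 cells, 52 in-block hops, 7 block-to-block hops), in line with the ~25 ns total delay quoted for the delay line.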
Secret Sauces
Yea, right. Ask me.