VME-PPG32

From DaqWiki

VME-PPG32 Pulse Pattern Generator

The VME-PPG32 is a VME FPGA board that generates precise digital pulse patterns on 32 NIM output channels. It is driven by a 100 MHz clock (10 ns resolution) and executes a small user-written program stored in on-board memory. Programs can contain loops, subroutines, and branches, so a few dozen instructions can generate multi-period sequences lasting from nanoseconds to hours.

This page covers hardware specifications, the VME register interface, the instruction set, and a step-by-step programming tutorial.

This page has been re-written for clarity with the help of an AI. For the old instructions see this page: VME-PPG32 (legacy)


How the PPG Works

Conceptually the PPG32 is a tiny processor dedicated to driving 32 output pins. You load a short program into its on-board memory, then trigger it; the board executes the program top-to-bottom and drives its NIM outputs accordingly.

Each instruction answers four questions:

  • Which outputs go HIGH? — the SET mask.
  • Which outputs go LOW? — the CLEAR mask.
  • For how long? — the delay count, in units of 10 ns.
  • What happens next? — the instruction type: continue to the next slot, loop, call a subroutine, branch, or halt.

Instructions execute sequentially from slot 0 unless a loop, branch, or subroutine redirects the flow. A 256-entry hardware stack supports nested loops and subroutine calls.

A typical workflow is:

  1. Reset the board — puts it in a known, halted state.
  2. Write the program over the VME bus — one instruction per memory slot.
  3. Trigger execution, either by software (write a bit over VME) or by an external NIM pulse.
  4. Poll the status bit until the program halts, then read results or load the next program.

Output Model: SET and CLEAR

Each instruction carries two independent 32-bit masks. A 1 in the SET mask drives that channel HIGH; a 1 in the CLEAR mask drives it LOW. Bit 0 corresponds to channel 1, bit 31 to channel 32.

The hardware documentation does not specify what happens to a channel whose bit appears in neither mask, nor in both. The safe, conventional practice — used throughout this page and in the UCN sequencer code — is to assign every channel to exactly one mask in every instruction, so all 32 outputs are fully defined at each step. In code this usually appears as a value and its bitwise complement:

set_mask = my_outputs;
clr_mask = ~my_outputs;   // every other channel explicitly driven LOW
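
The complement rule is easy to get wrong when masks are typed by hand. As a minimal sketch, the helpers below build both masks from front-panel channel numbers. The names ppg_channel_bit and ppg_make_masks are hypothetical and not part of any existing library; the only facts taken from this page are the bit-to-channel mapping (bit 0 = channel 1) and the "exactly one mask" convention.

```c
#include <stdint.h>

/* Hypothetical helper: front-panel channel number (1..32) -> mask bit.
 * Returns 0 for an out-of-range channel. */
static uint32_t ppg_channel_bit(int channel)
{
    if (channel < 1 || channel > 32)
        return 0;
    return (uint32_t)1 << (channel - 1);
}

/* Hypothetical helper: derive both masks from the set of channels that
 * should be HIGH, so every channel lands in exactly one mask. */
static void ppg_make_masks(uint32_t high_channels,
                           uint32_t *set_mask, uint32_t *clr_mask)
{
    *set_mask = high_channels;
    *clr_mask = ~high_channels;
}
```

For example, passing ppg_channel_bit(29) yields a SET mask of 0x10000000 and a CLEAR mask of 0xEFFFFFFF, the pair used for the timing pulse in the calibration example on this page.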

Hardware Description

FPGA and Memory

Component Part / Value
FPGA Altera Cyclone 3 EP3C40Q240C8
Configuration flash Altera EPCS16 serial flash
Program memory 4096 × 128-bit words (4k instructions)
Call stack 256 entries
FPGA resources 1044 LEs, 500 kbits internal memory

Programs are written to and read from program memory over the VME bus. Each 128-bit instruction occupies one slot; the program counter advances one slot per instruction executed.

I/O Characteristics

Feature Detail
NIM outputs 32 channels (front panel)
NIM inputs 4 channels (front panel)
Output LEDs 32 (one per output channel)
Input LEDs 4 (one per input channel) + 1 VME access LED
Serial DACs 2 × AD5439YRUZ, unipolar 0–2.5 V, 10-bit
Rev1/Rev2 Inputs switchable NIM/TTL via JMP3
Rev2 only Outputs switchable NIM/TTL via SW1 micro switches

Clock

The PPG runs at 100 MHz (10 ns per tick). The clock source is selected by the CSR:

  • Internal: 50 MHz crystal on board, doubled to 100 MHz via PLL.
  • External: 20 MHz NIM signal on Input 3, scaled to 100 MHz via the on-board PLL (see Clock Control Register).

Every instruction takes a fixed 3 clock cycles of overhead regardless of type, plus a programmable 32-bit delay. The actual dwell time for any instruction is therefore:

dwell = (3 + delay_count) × 10 ns

The delay count is a 32-bit field, so the maximum single-instruction dwell is approximately 42.9 seconds (2^32 × 10 ns).

Caveat: the source wiki states a 10-second maximum, which conflicts with the documented 32-bit field width and is unexplained. Until verified on hardware, treat ~10 s as a conservative practical limit, and use a loop for anything longer.
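
The dwell formula can be checked with a few lines of C. This is only a sketch of the arithmetic stated above; the function and macro names are hypothetical. Note that the largest result needs 64-bit arithmetic.

```c
#include <stdint.h>

#define PPG_TICK_NS        10u  /* one tick of the 100 MHz clock */
#define PPG_OVERHEAD_TICKS 3u   /* fixed per-instruction overhead */

/* Dwell time in nanoseconds for a given delay count.
 * Computed in 64 bits because the largest result does not fit in 32. */
static uint64_t ppg_dwell_ns(uint32_t delay_count)
{
    return ((uint64_t)PPG_OVERHEAD_TICKS + delay_count) * PPG_TICK_NS;
}
```

With this formula a delay count of 0 gives 30 ns, 25 gives 280 ns, and the full-scale value 0xFFFFFFFF gives 42,949,672,980 ns, which is the ~42.9 s figure quoted above.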

VME Interface

Parameter Value
Address space A32 only
Data width D32 only
Transfer modes Single-word, 32-bit DMA (BLT32), 2eVME DMA
Not supported MBLT64, 2eSST
Data direction VME-D[31..0] bidirectional; VME-A[23..0] input only
Handshake DTACK output

Jumper Settings

Jumper Setting Function
JMP1 INP / DAC NIM input with 50 Ω termination, or DAC output
JMP2 INP / DAC Same as JMP1
JMP3 NIM (1–2) / TTL (2–3) Input signal standard selection (Rev1 and later)
JMP4 ACT Active-serial flash programming mode (leave set)
IrqSel Open Leave open
SW1–3 VME base address selection (A20–A31)

NIM Input Assignments

Input Function
4 External start — rising edge triggers program execution; subsequent edges while running are ignored
3 External 20 MHz clock input — used by PLL to generate 100 MHz PPG clock
2 Unassigned
1 Unassigned (outputs internal PPG clock in test mode)

Input LED Indicators

LED Meaning
4 Clock source: lit = external clock selected
3 NIM Input 2 signal status
2 External clock quality: lit = external clock present and good
1 Program running: lit = PPG is executing

Hardware Setup

VME Address Configuration

The VME base address is set by address switches SW1–3 on the board, which cover address bits A20–A31. To place the board at 0x00100000 (the default used in firmware examples on this wiki), set the A20–A23 field to the value 1 (only A20 high) and A24–A31 to 0. After setting the switches, program the address-decoder CPLD (EPM3032) using a JTAG programmer.

Startup and Verification Checklist

  1. Verify power supply voltages — check for shorts on 1.2 V, ±3.3 V, 12 V, and 2.5 V rails before powering on.
  2. Set VME address switches.
  3. Program the address-decoder CPLD (EPM3032) via JTAG if the board is new or the CPLD was cleared.
  4. Load the VME-PPG32 firmware (.sof) to the FPGA via Quartus Programmer or the VME flash method below.
  5. Verify board presence: vmescan should detect the board at the configured base address.
  6. Run the built-in register test (write 0xBEEFBEEF to the Test register at offset +0x04 and read it back).
  7. Exercise NIM outputs and inputs, and verify LEDs respond.
  8. Flash firmware to active-serial memory (EPCS16) for persistence across power cycles.

Firmware Update Methods

The board runs one of two firmware personalities: VME-PPG32-IO32, a plain 32-channel I/O image with no sequencer, and VME-PPG32, the full pulse-pattern generator described on this page. The PPG features documented here require the VME-PPG32 image.

Method 1 — USB-Blaster JTAG (preferred for initial programming):

  1. Start Quartus Programmer and auto-detect.
  2. Attach VME-PPG32.sof to the EP3C40 device and program it.
  3. Auto-detect again, attach ppg.jic to the EPCS16 device, and program it (takes approximately 2 minutes).
  4. Power-cycle the board to reboot into the PPG firmware. When running the PPG firmware, all LEDs are off after reboot.

Method 2 — VME flash programmer (requires IO32 firmware already loaded):

./srunner_vme_gef.exe -program -16 ppg.jic 0x100020
./test_VMENIMIO32_gef.exe --addr 0x100000 --reboot

Register Reference

Register Map

All registers are accessed at 32-bit aligned offsets from the board's VME base address.

# Offset Name Access Description
0 +0x00 CSR R/W Control and Status Register
1 +0x04 Test R/W Test register (read-back verification)
2 +0x08 Addr R/W Program address — selects instruction slot to read/write
3 +0x0C Inst_Lo R/W Instruction bits 0–31 (SET mask)
4 +0x10 Inst_Med R/W Instruction bits 32–63 (CLEAR mask)
5 +0x14 Inst_Hi R/W Instruction bits 64–95 (delay count)
6 +0x18 Inst_Top R/W Instruction bits 96–127 (type + data); writing this register commits the instruction to program memory
7 +0x1C Inv_Mask R/W Output inversion mask — a 1 inverts the polarity of that output channel (one bit per output; useful for active-low signals)
8 +0x20 Version R Firmware version (Unix timestamp)
9 +0x24 Flash R/W Serial flash control
10 +0x28 Serial R Board serial number
11 +0x2C Hardware R Hardware revision ID
12 +0x30 Clock Control R/W PLL reconfiguration (see below)

CSR Register (+0x00)

The Control/Status Register is the main run-control interface.

Bit(s) Name Access Description
0 Run R/W Write 1 to start program execution from slot 0. Read: 1 = running, 0 = halted.
1 Ext-Clk-Toggle W Toggle between external and internal clock source.
2 Ext-Start R/W 1 = wait for rising edge on NIM Input 4 to start; 0 = start via CSR Run bit (software trigger).
3 PPG-Reset R/W Write 1 to reset the PPG (clears program counter, halts execution). This is not a full power-up reset. Must be cleared (write 0) after reset or the board will not operate.
4 Test-Mode R/W 1 = test mode: NIM Input 1 outputs the internal PPG clock, NIM Input 2 outputs the active PPG clock. 0 = normal operation.
16 Ext-Clk-Sel R 1 = external clock currently selected (LED 4 lit).
17 Ext-Clk-Good R 1 = external clock signal present and locked (LED 2 lit).
?–31 Status R Readback of program counter (PC), stack pointer (SP), and current delay counter.

Common CSR write values and their effects:

Value written Effect
0x8 Reset — clears program counter and halts execution
0x0 Idle — clears reset; selects software trigger mode (Ext-Start bit cleared)
0x4 Arm for external trigger — waits for rising edge on NIM Input 4
0x1 Software start — immediately begins executing from instruction slot 0
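
To avoid magic numbers in run-control code, the CSR bits can be given names. The sketch below takes the bit positions from the CSR table; the macro and function names are hypothetical, and the status field above bit 17 is left out because its layout is not specified here.

```c
#include <stdint.h>

/* CSR bit positions from the table above. Names are hypothetical. */
#define PPG_CSR_RUN            (1u << 0)   /* write: start; read: running */
#define PPG_CSR_EXT_CLK_TOGGLE (1u << 1)   /* toggle clock source */
#define PPG_CSR_EXT_START      (1u << 2)   /* arm for NIM Input 4 */
#define PPG_CSR_RESET          (1u << 3)   /* must be cleared afterwards */
#define PPG_CSR_TEST_MODE      (1u << 4)
#define PPG_CSR_EXT_CLK_SEL    (1u << 16)  /* read-only */
#define PPG_CSR_EXT_CLK_GOOD   (1u << 17)  /* read-only */

/* 1 if a CSR readback says the program is still executing. */
static int ppg_csr_running(uint32_t csr)
{
    return (csr & PPG_CSR_RUN) != 0;
}

/* 1 if the external clock is both selected and reported good. */
static int ppg_csr_ext_clock_ok(uint32_t csr)
{
    return (csr & PPG_CSR_EXT_CLK_SEL) && (csr & PPG_CSR_EXT_CLK_GOOD);
}
```

The four common write values then read as PPG_CSR_RESET (0x8), 0 (idle), PPG_CSR_EXT_START (0x4), and PPG_CSR_RUN (0x1).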

Clock Control Register (+0x30)

This register reprograms the on-board PLL to scale an external input clock to 100 MHz. It is only needed when operating with an external clock source other than the default 20 MHz.

The PLL relationship is:

VCO frequency = Fin × M / N
Clock output = VCO / C0

Operating limits: Fin: 5–472 MHz; FVCO: 600–1300 MHz; lock time < 1 ms.

Default configuration for a 20 MHz input: M = 30, N = 1 (bypassed), C0 = 6 → 100 MHz output.
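
Before reprogramming the PLL it helps to check a candidate M, N, C0 combination against the stated limits. The function below is a sketch of the two formulas and the Fin and VCO ranges given above; its name is hypothetical, and it uses integer division, so results are truncated to whole hertz.

```c
#include <stdint.h>

/* Returns the PLL output frequency in Hz, or 0 if the settings fall
 * outside the limits stated above (Fin 5-472 MHz, VCO 600-1300 MHz)
 * or any counter is zero. */
static uint64_t ppg_pll_output_hz(uint64_t fin_hz,
                                  unsigned m, unsigned n, unsigned c0)
{
    if (m == 0 || n == 0 || c0 == 0)
        return 0;
    if (fin_hz < 5000000ull || fin_hz > 472000000ull)
        return 0;

    uint64_t vco_hz = fin_hz * m / n;          /* VCO = Fin x M / N */
    if (vco_hz < 600000000ull || vco_hz > 1300000000ull)
        return 0;

    return vco_hz / c0;                        /* output = VCO / C0 */
}
```

The default settings (20 MHz, M = 30, N = 1, C0 = 6) give a 600 MHz VCO and a 100 MHz output. As an illustration only, a 100 MHz input with N = 5 and the same M and C0 also lands on 100 MHz, while leaving N = 1 would push the VCO to 3 GHz and be rejected.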

Bit layout:

Bits Description
30–28 Phase counter select (0 = all; 1 = M; 2–6 = Clock 0–4)
26–24 Counter parameter (0 = HighCount; 1 = LowCount; 4 = Bypass; 5 = Mode)
23–20 Counter type (0 = N; 1 = M; 2 = Cp/LF; 3 = VCO; 4–8 = Clock 0–4)
16–8 9-bit parameter data
5 PLL Reset
4 Up/Down (1 = up; 0 = down)
3 PhaseStep
2 Write parameter
1 Reconfigure
0 Control trigger (toggle to apply changes)
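
The parameter-write words used with this register can be assembled from the fields in the table rather than written as raw hex. The sketch below packs counter type, counter parameter, and data, then sets the write-parameter and trigger bits. The function and macro names are hypothetical, and the phase-step fields are left at zero.

```c
#include <stdint.h>

/* Control bits from the bit-layout table. Names are hypothetical. */
#define PPG_CLK_TRIGGER     (1u << 0)
#define PPG_CLK_RECONFIG    (1u << 1)
#define PPG_CLK_WRITE_PARAM (1u << 2)

/* Build a "write parameter" word for the Clock Control register. */
static uint32_t ppg_clk_param_word(unsigned counter_type,   /* bits 23-20 */
                                   unsigned counter_param,  /* bits 26-24 */
                                   unsigned data)           /* bits 16-8  */
{
    return ((uint32_t)(counter_param & 0x7u)   << 24) |
           ((uint32_t)(counter_type  & 0xFu)   << 20) |
           ((uint32_t)(data          & 0x1FFu) << 8)  |
           PPG_CLK_WRITE_PARAM | PPG_CLK_TRIGGER;
}
```

Under this layout, ppg_clk_param_word(0, 1, 2) produces 0x01000205: counter type 0 (N), parameter 1 (LowCount), data 2. The value 0x3 used in the poke sequences on this page is PPG_CLK_RECONFIG | PPG_CLK_TRIGGER.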

Example: divide down a 100 MHz external clock input:

vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x00000305
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x01000205
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x05000105
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x04000005
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x3
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0

Example: return to the default configuration for a 20 MHz external clock (N counter bypassed):

vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x04000105
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x3
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0

Instruction Set

Instruction Format

Every PPG program is a sequence of 128-bit instructions stored in program memory. Each instruction is written to the board as four consecutive 32-bit register writes (to Inst_Lo, Inst_Med, Inst_Hi, Inst_Top). Writing Inst_Top commits the instruction to the currently selected slot.

Bits Register Name Description
0–31 Inst_Lo SET mask One bit per output channel (bit 0 = channel 1 … bit 31 = channel 32). A 1 drives that channel HIGH.
32–63 Inst_Med CLEAR mask Same bit-to-channel mapping. A 1 drives that channel LOW.
64–95 Inst_Hi Delay count Number of additional 10 ns clock cycles to hold this state. Total dwell = (3 + delay_count) × 10 ns.
96–115 Inst_Top bits 0–19 Data 20-bit payload — loop count or branch/call address, depending on instruction type.
116–118 Inst_Top bits 20–22 Type 3-bit instruction opcode (see Instruction Types).
119–127 Inst_Top bits 23–31 Ignored.

SET and CLEAR masks: these follow the output model described above — assign each channel to exactly one mask per instruction.

Note on the type field: the board's own register summary labels this field as a "4-bit instruction type" at bits 116–117, which is internally inconsistent (that range is only two bits). The actual opcode values listed below occupy bits 20–22 of Inst_Top (bits 116–118 of the full instruction), a 3-bit field. This page uses the values, which are unambiguous.

Instruction Types

The 3-bit opcode occupies bits 20–22 of the Inst_Top word (bits 116–118 of the full 128-bit instruction). The 20-bit data payload occupies bits 0–19 of Inst_Top.

Type Opcode Inst_Top value Description
Halt 0 0x000000 Stop execution. The program counter does not advance; CSR bit 0 goes low.
Continue 1 0x100000 Advance to the next instruction slot.
New Loop 2 0x200000 + N Begin a loop that repeats N times (maximum N = 1,048,575 = 0xFFFFF). Pushes the loop start address and count onto the stack.
End Loop 3 0x300000 Decrement the loop counter. If > 0, jump back to the start of the loop body (the instruction after the matching New Loop); otherwise fall through.
Call 4 0x400000 + addr Push the next instruction address onto the stack and jump to the 20-bit address in the data field.
Return 5 0x500000 Pop the return address from the stack and jump to it.
Branch 6 0x600000 + addr Unconditional jump to the 20-bit address in the data field (does not push a return address).
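
The Inst_Top values in the table follow directly from the field positions: opcode in bits 20–22, payload in bits 0–19. A small encoder, sketched below with hypothetical names, keeps the shifting and masking in one place.

```c
#include <stdint.h>

/* Opcode values from the Instruction Types table. Names are hypothetical. */
enum ppg_opcode {
    PPG_HALT     = 0,
    PPG_CONTINUE = 1,
    PPG_NEW_LOOP = 2,
    PPG_END_LOOP = 3,
    PPG_CALL     = 4,
    PPG_RETURN   = 5,
    PPG_BRANCH   = 6
};

/* Pack an opcode and its 20-bit payload into an Inst_Top word.
 * Payload bits above bit 19 are discarded so they cannot corrupt the opcode. */
static uint32_t ppg_inst_top(enum ppg_opcode op, uint32_t data)
{
    return ((uint32_t)op << 20) | (data & 0xFFFFFu);
}
```

For instance, ppg_inst_top(PPG_NEW_LOOP, 10) gives 0x20000A and ppg_inst_top(PPG_CALL, 10) gives 0x40000A.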

Timing

Every instruction occupies exactly 3 clock cycles of overhead regardless of type. The delay count field adds additional cycles on top of that:

dwell time = (3 + delay_count) × 10 ns

To produce a 280 ns pulse, write a delay count of 25: (3 + 25) × 10 = 280 ns.

The maximum delay count is 2^32 − 1 = 4,294,967,295, giving a maximum single-instruction dwell of approximately 42.9 seconds. Longer durations require a loop (see Looping for Long Durations).

Important: every instruction — including Halt, New Loop, End Loop, etc. — incurs the 3-cycle overhead. Loop and branch instructions with delay_count = 0 still consume 30 ns.


Programming Tutorial

This section walks through writing PPG programs from first principles, building up to a complete real-world example. All examples use the mvme_write_value VME library function:

mvme_write_value(vme, address, value);

where vme is an open VME handle, address is an absolute 32-bit VME address, and value is a 32-bit unsigned integer.

The examples assume a base address of BASE_ADDR = 0x00100000. Adjust for your hardware configuration.

The set_command Helper

All PPG programming is built from a single primitive: selecting an instruction slot and writing its four 32-bit fields. It is convenient to wrap this in a helper:

void set_command(int slot,
                 unsigned int set_mask,    // Inst_Lo:  channels to drive HIGH
                 unsigned int clr_mask,    // Inst_Med: channels to drive LOW
                 unsigned int delay,       // Inst_Hi:  extra 10 ns ticks
                 unsigned int type_data)   // Inst_Top: opcode + payload
{
    mvme_write_value(vme, BASE_ADDR + 0x08, slot);       // select slot
    mvme_write_value(vme, BASE_ADDR + 0x0C, set_mask);
    mvme_write_value(vme, BASE_ADDR + 0x10, clr_mask);
    mvme_write_value(vme, BASE_ADDR + 0x14, delay);
    mvme_write_value(vme, BASE_ADDR + 0x18, type_data);  // commits instruction
}

All higher-level programming is built from calls to set_command.
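
For longer programs it can be convenient to keep the instructions in a table and to check the exact register traffic without a crate. The sketch below is one possible arrangement, not part of any existing library: ppg_load performs the same five writes as set_command for each table entry, but through a hypothetical write hook. On hardware the hook would wrap mvme_write_value(vme, BASE_ADDR + offset, value); here a logging hook records the writes instead.

```c
#include <stddef.h>
#include <stdint.h>

/* One 128-bit instruction, split into the four register fields. */
struct ppg_inst {
    uint32_t set_mask;   /* Inst_Lo  (+0x0C) */
    uint32_t clr_mask;   /* Inst_Med (+0x10) */
    uint32_t delay;      /* Inst_Hi  (+0x14) */
    uint32_t type_data;  /* Inst_Top (+0x18), commits the slot */
};

/* Hypothetical write hook: offset is relative to the board base address. */
typedef void (*ppg_write_fn)(void *ctx, uint32_t offset, uint32_t value);

/* Upload prog[0..n-1] to slots first_slot .. first_slot + n - 1. */
static void ppg_load(ppg_write_fn wr, void *ctx, uint32_t first_slot,
                     const struct ppg_inst *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        wr(ctx, 0x08, first_slot + (uint32_t)i);  /* select slot */
        wr(ctx, 0x0C, prog[i].set_mask);
        wr(ctx, 0x10, prog[i].clr_mask);
        wr(ctx, 0x14, prog[i].delay);
        wr(ctx, 0x18, prog[i].type_data);         /* commit */
    }
}

/* Dry-run hook: records writes in a log instead of touching the bus. */
struct ppg_write_log {
    uint32_t offset[64];
    uint32_t value[64];
    size_t   count;
};

static void ppg_log_write(void *ctx, uint32_t offset, uint32_t value)
{
    struct ppg_write_log *log = ctx;
    if (log->count < 64) {
        log->offset[log->count] = offset;
        log->value[log->count]  = value;
        log->count++;
    }
}
```

Loading a two-instruction table through ppg_log_write produces ten logged writes, five per slot, in the same order set_command issues them.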

Step 1: Reset the Board

Before writing any program, reset the board to put it in a known state:

mvme_write_value(vme, BASE_ADDR + 0x00, 0x8);  // CSR bit 3: assert reset
mvme_write_value(vme, BASE_ADDR + 0x00, 0x0);  // clear reset → idle

Asserting reset clears the program counter and halts execution. The reset bit must then be cleared (write 0x0) or the board will not operate. After this the board sits idle in software-trigger mode, waiting for a program to be loaded and started. It will not fire on any trigger until you write 0x1 (software start) or 0x4 (external trigger arm) to the CSR.

Step 2: Write a Safety Halt at Slot 0

Always write a Halt to slot 0 before writing the rest of your program. If an external trigger edge arrives while you are still writing instructions, the PPG executes this Halt immediately rather than running a partially-written or stale program:

set_command(0,
            0x00000000,   // SET:   no channels raised
            0xFFFFFFFF,   // CLEAR: all channels driven low
            0,            // delay: 0 extra ticks → 30 ns dwell
            0x000000);    // type:  Halt

Step 3: Write Your Program

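Write instructions to consecutive slots starting from slot 1. Each instruction specifies which outputs are high, which are low, how long to hold that state, and what to do next. Because execution starts at slot 0, finish by overwriting the safety Halt in slot 0 with the first real instruction (here a Continue); until that last write, a stray trigger still hits the Halt.

Example: generate a single 280 ns pulse on channel 1, then halt.

Channel 1 is bit 0 of the output masks. The sequence is:

  1. All outputs low for 30 ns (initial clear).
  2. Channel 1 high for 280 ns.
  3. All outputs low, halt.

// Slot 1: all outputs low for 30 ns (delay=0 → (3+0)×10 = 30 ns)
set_command(1, 0x00000000, 0xFFFFFFFF, 0, 0x100000);

// Slot 2: channel 1 (bit 0) high for 280 ns (delay=25 → (3+25)×10 = 280 ns)
set_command(2, 0x00000001, 0xFFFFFFFE, 25, 0x100000);

// Slot 3: all outputs low, halt
set_command(3, 0x00000000, 0xFFFFFFFF, 0, 0x000000);

// Slot 0, written last: replace the safety Halt with a Continue so that a start
// at slot 0 falls through to slot 1 (this adds one more 30 ns all-low step)
set_command(0, 0x00000000, 0xFFFFFFFF, 0, 0x100000);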

Step 4: Arm and Start

Once all instructions are written, arm or start the board by writing to the CSR:

// Rewind program address to slot 0 before starting
mvme_write_value(vme, BASE_ADDR + 0x08, 0x0);

// Use exactly one of the two options below.

// Option A: software trigger, starts immediately
mvme_write_value(vme, BASE_ADDR + 0x00, 0x1);  // Run bit

// Option B: external trigger, waits for rising edge on NIM Input 4
mvme_write_value(vme, BASE_ADDR + 0x00, 0x4);  // Ext-Start bit

Step 5: Detecting Completion

Poll CSR bit 0. When it falls to 0 the program has halted and the next sequence can be loaded:

while (mvme_read_value(vme, BASE_ADDR + 0x00) & 0x1) {
    // still running — sleep or yield here
}
// Program has halted; safe to reload

Timing Calculations

To produce a dwell of exactly T nanoseconds, the required delay count is:

delay_count = T / 10 − 3

For example, to hold an output high for exactly 500 ns:

delay_count = 500 / 10 − 3 = 47

The minimum possible dwell is 30 ns (delay_count = 0). It is impossible to produce a dwell shorter than 30 ns with any single instruction.

When two consecutive instructions must sum to an exact total time, remember that each instruction carries its own 30 ns overhead:

T_total = (3 + d1) × 10 + (3 + d2) × 10 = (6 + d1 + d2) × 10 ns
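
The inverse calculation is worth wrapping in a function that rejects dwell times the hardware cannot produce exactly. The sketch below assumes the formula above and the 32-bit delay field; the function name is hypothetical.

```c
#include <stdint.h>

/* Convert a target dwell in nanoseconds to a delay count.
 * Returns 0 on success, or -1 if the dwell cannot be produced exactly:
 * below the 30 ns minimum, not a multiple of 10 ns, or too long for
 * the 32-bit delay field. */
static int ppg_delay_for_ns(uint64_t dwell_ns, uint32_t *delay_count)
{
    if (dwell_ns < 30 || dwell_ns % 10 != 0)
        return -1;

    uint64_t ticks = dwell_ns / 10 - 3;   /* remove the 3-tick overhead */
    if (ticks > 0xFFFFFFFFull)
        return -1;

    *delay_count = (uint32_t)ticks;
    return 0;
}
```

A request for 500 ns yields a delay count of 47 and 280 ns yields 25, while 20 ns and 285 ns are rejected.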

Looping for Long Durations

A single delay count can produce at most ~42.9 seconds. For longer durations, split the period across multiple loop iterations. Each iteration holds for a fraction of the total time; the loop instruction repeats until the full duration is reached.

Strategy: choose a loop count N and compute the per-iteration delay count so that N iterations sum to the target duration. The example below uses N = 100:

double total_time_s = 30.0;
int N = 100;

// Each iteration should cover total_time_s / N seconds. Convert to 10 ns ticks.
// We ignore the ~60 ns of overhead per iteration (3 ticks in the body instruction
// plus 3 in End Loop) and the one-time New Loop: at second scale that error is
// < 0.001 %. For tight timing, subtract the overhead or measure on hardware.
unsigned int body_delay = (unsigned int)(total_time_s * 1e8 / N);

// Slot i:   New Loop, repeat N times  (0x200000 + N) — runs once
// Slot i+1: hold outputs for body_delay ticks, Continue — the loop body
// Slot i+2: End Loop — jumps back to the body until the count is exhausted
set_command(i,   0x0,              0x0,              0,          0x200000 + N);
set_command(i+1, my_output_mask,   ~my_output_mask,  body_delay, 0x100000);
set_command(i+2, 0x0,              0x0,              0,          0x300000);

New Loop initialises the counter once; End Loop runs once per iteration and jumps back to the loop body. Both are 3-cycle instructions, so the loop adds a fixed 30 ns plus 30 ns per iteration of overhead — negligible for periods of seconds or longer.

Loops may be nested, and combined with subroutine calls, up to the 256-entry stack depth: place another New Loop / End Loop pair (or a subroutine call) inside the body.
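
When the overhead does matter, it can be subtracted explicitly. The sketch below assumes the three-slot layout shown above: New Loop and End Loop both carry a delay count of 0 and the body is a single instruction, so the total is 3 + N × ((3 + body_delay) + 3) ticks. The function name is hypothetical, and the model follows the instruction descriptions on this page rather than a hardware measurement.

```c
#include <stdint.h>

/* Per-iteration body delay for a loop of n iterations lasting
 * total_ticks (10 ns units) in all, under the model
 *   total = 3 + n * ((3 + body_delay) + 3).
 * Rounds down, so the loop may finish up to n ticks early.
 * Returns 0 on success, -1 if n is out of range or the delay
 * does not fit the 32-bit field. */
static int ppg_loop_body_delay(uint64_t total_ticks, uint32_t n,
                               uint32_t *body_delay)
{
    if (n == 0 || n > 0xFFFFFu || total_ticks < 3)
        return -1;

    uint64_t per_iter = (total_ticks - 3) / n;  /* ticks per iteration */
    if (per_iter < 6 || per_iter - 6 > 0xFFFFFFFFull)
        return -1;

    *body_delay = (uint32_t)(per_iter - 6);
    return 0;
}
```

For 30 s (3,000,000,000 ticks) and N = 100 this gives a body delay of 29,999,993. One hour with N = 100 still fits the delay field, whereas one hour with N = 10 does not and is rejected.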

Subroutines

Use Call and Return instructions to share a block of instructions across multiple points in a program. The hardware call stack holds up to 256 return addresses.

// --- Main program ---
set_command(0, 0x0, 0xFFFFFFFF, 0,  0x000000);         // slot 0: safety Halt while loading
set_command(1, 0x0, 0xFFFFFFFF, 10, 0x100000);         // slot 1: setup, Continue
set_command(2, 0x0, 0x0,        0,  0x400000 + 10);    // slot 2: Call subroutine at slot 10
set_command(3, 0x0, 0xFFFFFFFF, 0,  0x000000);         // slot 3: Halt (reached after Return)

// --- Subroutine at slot 10 ---
set_command(10, 0x00000001, 0xFFFFFFFE, 25, 0x100000); // pulse channel 1
set_command(11, 0x00000000, 0xFFFFFFFF, 25, 0x100000); // all low
set_command(12, 0x0,        0x0,        0,  0x500000); // Return

// --- Written last: replace the safety Halt so a start at slot 0 reaches slot 1 ---
set_command(0, 0x0, 0xFFFFFFFF, 0,  0x100000);         // slot 0: all low, Continue

Complete Example: Timing Calibration Sequence

This is the timing calibration sequence used in the UCN sequencer frontend. It fires 10 pulses on output channel 29 (bit 28, mask 0x10000000) at a 0.2 s cadence so that downstream DAQ systems can establish an absolute time reference. It uses a software trigger.

The program layout:

Slot Instruction Purpose
0 All LOW, Continue (190 ns) Blank at start of sequence
1 New Loop ×10 Repeat 10 times
2 CH29 HIGH, Continue (280 ns) Rising-edge timing pulse
3 All LOW, Continue (~0.2 s) Off time between pulses
4 End Loop Jump back to slot 2 nine more times
5 All LOW, Halt End of sequence

// Reset and select software trigger
mvme_write_value(vme, BASE_ADDR + 0x00, 0x8);  // reset
mvme_write_value(vme, BASE_ADDR + 0x00, 0x0);  // clear reset (software trigger mode)

// Slot 0: blank — all outputs low, 190 ns dwell (delay=16 → (3+16)×10=190 ns)
set_command(0, 0x00000000, 0xFFFFFFFF, 0x10, 0x100000);

// Slot 1: New Loop, repeat 10 times
set_command(1, 0x0, 0x0, 0x0, 0x20000A);  // 0x200000 + 10

// Slot 2: channel 29 (bit 28) HIGH, 280 ns pulse (delay=25 → (3+25)×10=280 ns)
set_command(2, 0x10000000, 0xEFFFFFFF, 25, 0x100000);

// Slot 3: hold all LOW for the rest of the 0.2 s period.
// 0.2 s = 2e7 ticks of 10 ns. Subtract the 28 ticks used by the slot-2 pulse
// ((3+25)=28). The few ticks of loop/instruction overhead are negligible at
// this scale, so we approximate:
unsigned int off_time = (unsigned int)(0.2 * 1e8) - 28;
set_command(3, 0x00000000, 0xFFFFFFFF, off_time, 0x100000);

// Slot 4: End Loop
set_command(4, 0x0, 0x0, 0x0, 0x300000);

// Slot 5: all LOW, Halt
set_command(5, 0x00000000, 0xFFFFFFFF, 0x1, 0x000000);

// Rewind address pointer and start
mvme_write_value(vme, BASE_ADDR + 0x08, 0x0);  // address = slot 0
mvme_write_value(vme, BASE_ADDR + 0x00, 0x1);  // CSR: software start

// Poll for completion (~2 seconds)
while (mvme_read_value(vme, BASE_ADDR + 0x00) & 0x1) { /* wait */ }
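
To sanity-check the expected run time of a program before loading it, the instruction semantics described on this page can be modelled in software. The sketch below is such a model, not a description of the firmware: it assumes execution starts at slot 0, that every instruction costs 3 + delay ticks, that New Loop and Call each use one stack entry, and that End Loop repeats the body until it has run N times. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct ppg_sim_inst { uint32_t set_mask, clr_mask, delay, type_data; };

/* Returns the total number of 10 ns ticks until Halt, or 0 on a model
 * error (PC out of range, stack overflow/underflow, runaway program). */
static uint64_t ppg_sim_ticks(const struct ppg_sim_inst *prog, size_t n)
{
    struct { uint32_t addr; uint32_t count; } stack[256];
    size_t sp = 0, pc = 0;
    uint64_t ticks = 0;

    for (uint64_t steps = 0; steps < 100000000ull; steps++) {
        if (pc >= n)
            return 0;
        uint32_t op   = (prog[pc].type_data >> 20) & 0x7u;
        uint32_t data = prog[pc].type_data & 0xFFFFFu;
        ticks += 3u + (uint64_t)prog[pc].delay;

        switch (op) {
        case 0:                                   /* Halt */
            return ticks;
        case 1:                                   /* Continue */
            pc++;
            break;
        case 2:                                   /* New Loop */
            if (sp == 256) return 0;
            stack[sp].addr  = (uint32_t)pc + 1;   /* first body slot */
            stack[sp].count = data;
            sp++;
            pc++;
            break;
        case 3:                                   /* End Loop */
            if (sp == 0) return 0;
            if (stack[sp - 1].count > 1) {
                stack[sp - 1].count--;
                pc = stack[sp - 1].addr;
            } else {
                sp--;
                pc++;
            }
            break;
        case 4:                                   /* Call */
            if (sp == 256) return 0;
            stack[sp].addr  = (uint32_t)pc + 1;   /* return address */
            stack[sp].count = 0;
            sp++;
            pc = data;
            break;
        case 5:                                   /* Return */
            if (sp == 0) return 0;
            pc = stack[--sp].addr;
            break;
        case 6:                                   /* Branch */
            pc = data;
            break;
        default:
            return 0;
        }
    }
    return 0;
}
```

Fed the six calibration instructions above (with off_time = 19,999,972), this model gives 200,000,086 ticks, or about 2.0 s, which matches the polling estimate. It also shows that each loop iteration lasts 20,000,006 ticks under these assumptions, 60 ns longer than the nominal 0.2 s.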