VME-PPG32 Pulse Pattern Generator
The VME-PPG32 is a VME FPGA board that generates precise digital pulse patterns on 32 NIM output channels. It is driven by a 100 MHz clock (10 ns resolution) and executes a small user-written program stored in on-board memory. Programs can contain loops, subroutines, and branches, making it possible to generate multi-period sequences that last from nanoseconds to hours from a few dozen instructions.
This page covers hardware specifications, the VME register interface, the instruction set, and a step-by-step programming tutorial.
This page has been re-written for clarity with the help of an AI. For the old instructions see this page: VME-PPG32 (legacy)
How the PPG Works
Conceptually the PPG32 is a tiny processor dedicated to driving 32 output pins. You load a short program into its on-board memory, then trigger it; the board executes the program top-to-bottom and drives its NIM outputs accordingly.
Each instruction answers four questions:
- Which outputs go HIGH? — the SET mask.
- Which outputs go LOW? — the CLEAR mask.
- For how long? — the delay count, in units of 10 ns.
- What happens next? — the instruction type: continue to the next slot, loop, call a subroutine, branch, or halt.
Instructions execute sequentially from slot 0 unless a loop, branch, or subroutine redirects the flow. A 256-entry hardware stack supports nested loops and subroutine calls.
A typical workflow is:
- Reset the board — puts it in a known, halted state.
- Write the program over the VME bus — one instruction per memory slot.
- Trigger execution, either by software (write a bit over VME) or by an external NIM pulse.
- Poll the status bit until the program halts, then read results or load the next program.
Output Model: SET and CLEAR
Each instruction carries two independent 32-bit masks. A 1 in the SET mask drives that channel HIGH; a 1 in the CLEAR mask drives it LOW. Bit 0 corresponds to channel 1, bit 31 to channel 32.
The hardware documentation does not specify what happens to a channel whose bit appears in neither mask, nor in both. The safe, conventional practice, used in the UCN sequencer code and in every output-driving instruction on this page, is to assign every channel to exactly one mask, so that all 32 outputs are fully defined at each step. The loop, call, and return instructions in the examples are the exception: they leave both masks at 0. In code the convention usually appears as a value and its bitwise complement:
set_mask = my_outputs;
clr_mask = ~my_outputs; // every other channel explicitly driven LOW
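As a minimal sketch of that convention, the helpers below build the two masks from 1-based channel numbers. The names ppg_channel_bit and ppg_masks are hypothetical and are not part of any existing library.

```c
#include <stdint.h>

/* Bit for a 1-based channel number (channel 1 -> bit 0 ... channel 32 -> bit 31).
   Returns 0 for an out-of-range channel. */
static uint32_t ppg_channel_bit(unsigned channel)
{
    if (channel < 1 || channel > 32)
        return 0;
    return (uint32_t)1 << (channel - 1);
}

/* Hypothetical helper: fill complementary SET/CLEAR masks so that every
   channel appears in exactly one of them. */
static void ppg_masks(uint32_t high_channels, uint32_t *set_mask, uint32_t *clr_mask)
{
    *set_mask = high_channels;
    *clr_mask = ~high_channels;
}
```

Passing ppg_channel_bit(29) as high_channels gives the SET/CLEAR pair 0x10000000 / 0xEFFFFFFF.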
Hardware Description
FPGA and Memory
| Component | Part / Value |
|---|---|
| FPGA | Altera Cyclone 3 EP3C40Q240C8 |
| Configuration flash | Altera EPCS16 serial flash |
| Program memory | 4096 × 128-bit words (4k instructions) |
| Call stack | 256 entries |
| FPGA resources | 1044 LEs, 500 kbits internal memory |
Programs are written to and read from program memory over the VME bus. Each 128-bit instruction occupies one slot; the program counter advances one slot per instruction executed.
I/O Characteristics
| Feature | Detail |
|---|---|
| NIM outputs | 32 channels (front panel) |
| NIM inputs | 4 channels (front panel) |
| Output LEDs | 32 (one per output channel) |
| Input LEDs | 4 (one per input channel) + 1 VME access LED |
| Serial DACs | 2 × AD5439YRUZ, unipolar 0–2.5 V, 10-bit |
| Rev1/Rev2 | Inputs switchable NIM/TTL via JMP3 |
| Rev2 only | Outputs switchable NIM/TTL via SW1 micro switches |
Clock
The PPG runs at 100 MHz (10 ns per tick). The clock source is selected by the CSR:
- Internal: 50 MHz crystal on board, doubled to 100 MHz via PLL.
- External: 20 MHz NIM signal on Input 3, scaled to 100 MHz via the on-board PLL (see Clock Control Register).
Every instruction takes a fixed 3 clock cycles of overhead regardless of type, plus a programmable 32-bit delay. The actual dwell time for any instruction is therefore:
dwell = (3 + delay_count) × 10 ns
The delay count is a 32-bit field, so the maximum single-instruction dwell is approximately 42.9 seconds (2^32 × 10 ns).
Caveat: the source wiki states a 10-second maximum, which conflicts with the documented 32-bit field width and is unexplained. Until verified on hardware, treat ~10 s as a conservative practical limit, and use a loop for anything longer.
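The dwell formula can be checked with a few lines of C. This is only an arithmetic sketch; the macro and function names are made up for illustration, and it says nothing about whether the hardware honours the full 32-bit range.

```c
#include <stdint.h>

#define PPG_TICK_NS        10u  /* one tick of the 100 MHz clock */
#define PPG_OVERHEAD_TICKS 3u   /* fixed per-instruction overhead */

/* Dwell time in nanoseconds for a given 32-bit delay count. The result is
   64-bit because the maximum (about 42.9 s) exceeds 32 bits of nanoseconds. */
static uint64_t ppg_dwell_ns(uint32_t delay_count)
{
    return ((uint64_t)PPG_OVERHEAD_TICKS + delay_count) * PPG_TICK_NS;
}
```

For a delay count of 0 this yields 30 ns, and for the largest count, 0xFFFFFFFF, it yields 42,949,672,980 ns.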
VME Interface
| Parameter | Value |
|---|---|
| Address space | A32 only |
| Data width | D32 only |
| Transfer modes | Single-word, 32-bit DMA (BLT32), 2eVME DMA |
| Not supported | MBLT64, 2eSST |
| Data direction | VME-D[31..0] bidirectional; VME-A[23..0] input only |
| Handshake | DTACK output |
Jumper Settings
| Jumper | Setting | Function |
|---|---|---|
| JMP1 | INP / DAC | NIM input with 50 Ω termination, or DAC output |
| JMP2 | INP / DAC | Same as JMP1 |
| JMP3 | NIM (1–2) / TTL (2–3) | Input signal standard selection (Rev1 and later) |
| JMP4 | ACT | Active-serial flash programming mode (leave set) |
| IrqSel | Open | Leave open |
| SW1–3 | — | VME base address selection (A20–A31) |
NIM Input Assignments
| Input | Function |
|---|---|
| 4 | External start — rising edge triggers program execution; subsequent edges while running are ignored |
| 3 | External 20 MHz clock input — used by PLL to generate 100 MHz PPG clock |
| 2 | Unassigned |
| 1 | Unassigned (outputs internal PPG clock in test mode) |
Input LED Indicators
| LED | Meaning |
|---|---|
| 4 | Clock source: lit = external clock selected |
| 3 | NIM Input 2 signal status |
| 2 | External clock quality: lit = external clock present and good |
| 1 | Program running: lit = PPG is executing |
Hardware Setup
VME Address Configuration
The VME base address is set by address switches SW1–3 on the board, which select address bits A20–A31. To place the board at 0x00100000 (the default used in firmware examples on this wiki), set A20 to 1 and A21–A31 to 0. After setting the switches, program the address-decoder CPLD (EPM3032) using a JTAG programmer.
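Since the switches supply A20–A31, the base address is the 12-bit switch value shifted left by 20 bits. The sketch below shows the arithmetic only; the function name is hypothetical, and the physical encoding of SW1–3 should be checked against the board.

```c
#include <stdint.h>

/* Hypothetical helper: a31_a20 holds address bits A31..A20 as a 12-bit value;
   the lower 20 bits of the base address are always zero. */
static uint32_t ppg_base_from_switches(uint16_t a31_a20)
{
    return ((uint32_t)(a31_a20 & 0xFFFu)) << 20;
}
```

A switch value of 0x001 (only A20 set) gives the default base address 0x00100000.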
Startup and Verification Checklist
- Verify power supply voltages — check for shorts on 1.2 V, ±3.3 V, 12 V, and 2.5 V rails before powering on.
- Set VME address switches.
- Program the address-decoder CPLD (EPM3032) via JTAG if the board is new or the CPLD was cleared.
- Load the VME-PPG32 firmware (.sof) to the FPGA via Quartus Programmer or the VME flash method below.
- Verify board presence: vmescan should detect the board at the configured base address.
- Run the built-in register test (write 0xBEEFBEEF to the Test register at offset +0x04 and read it back).
- Exercise NIM outputs and inputs, and verify LEDs respond.
- Flash firmware to active-serial memory (EPCS16) for persistence across power cycles.
Firmware Update Methods
The board runs one of two firmware personalities: VME-PPG32-IO32, a plain 32-channel I/O image with no sequencer, and VME-PPG32, the full pulse-pattern generator described on this page. The PPG features documented here require the VME-PPG32 image.
Method 1 — USB-Blaster JTAG (preferred for initial programming):
- Start Quartus Programmer and auto-detect.
- Attach VME-PPG32.sof to the EP3C40 device and program it.
- Auto-detect again, attach ppg.jic to the EPCS16 device, and program it (takes approximately 2 minutes).
- Power-cycle the board to reboot into the PPG firmware. When running the PPG firmware, all LEDs are off after reboot.
Method 2 — VME flash programmer (requires IO32 firmware already loaded):
./srunner_vme_gef.exe -program -16 ppg.jic 0x100020
./test_VMENIMIO32_gef.exe --addr 0x100000 --reboot
Register Reference
Register Map
All registers are accessed at 32-bit aligned offsets from the board's VME base address.
| # | Offset | Name | Access | Description |
|---|---|---|---|---|
| 0 | +0x00 | CSR | R/W | Control and Status Register |
| 1 | +0x04 | Test | R/W | Test register (read-back verification) |
| 2 | +0x08 | Addr | R/W | Program address: selects instruction slot to read/write |
| 3 | +0x0C | Inst_Lo | R/W | Instruction bits 0–31 (SET mask) |
| 4 | +0x10 | Inst_Med | R/W | Instruction bits 32–63 (CLEAR mask) |
| 5 | +0x14 | Inst_Hi | R/W | Instruction bits 64–95 (delay count) |
| 6 | +0x18 | Inst_Top | R/W | Instruction bits 96–127 (type + data); writing this register commits the instruction to program memory |
| 7 | +0x1C | Inv_Mask | R/W | Output inversion mask: a 1 inverts the polarity of that output channel (one bit per output; useful for active-low signals) |
| 8 | +0x20 | Version | R | Firmware version (Unix timestamp) |
| 9 | +0x24 | Flash | R/W | Serial flash control |
| 10 | +0x28 | Serial | R | Board serial number |
| 11 | +0x2C | Hardware | R | Hardware revision ID |
| 12 | +0x30 | Clock Control | R/W | PLL reconfiguration (see below) |
CSR Register (+0x00)
The Control/Status Register is the main run-control interface.
| Bit(s) | Name | Access | Description |
|---|---|---|---|
| 0 | Run | R/W | Write 1 to start program execution from slot 0. Read: 1 = running, 0 = halted. |
| 1 | Ext-Clk-Toggle | W | Toggle between external and internal clock source. |
| 2 | Ext-Start | R/W | 1 = wait for rising edge on NIM Input 4 to start; 0 = start via CSR Run bit (software trigger). |
| 3 | PPG-Reset | R/W | Write 1 to reset the PPG (clears program counter, halts execution). This is not a full power-up reset. Must be cleared (write 0) after reset or the board will not operate. |
| 4 | Test-Mode | R/W | 1 = test mode: NIM Input 1 outputs the internal PPG clock, NIM Input 2 outputs the active PPG clock. 0 = normal operation. |
| 16 | Ext-Clk-Sel | R | 1 = external clock currently selected (LED 4 lit). |
| 17 | Ext-Clk-Good | R | 1 = external clock signal present and locked (LED 2 lit). |
| ?–31 | Status | R | Readback of program counter (PC), stack pointer (SP), and current delay counter. |
Common CSR write values and their effects:
| Value written | Effect |
|---|---|
| 0x8 | Reset: clears program counter and halts execution |
| 0x0 | Idle: clears reset; selects software trigger mode (Ext-Start bit cleared) |
| 0x4 | Arm for external trigger: waits for rising edge on NIM Input 4 |
| 0x1 | Software start: immediately begins executing from instruction slot 0 |
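The CSR bits above map naturally onto named constants. The following is a sketch with hypothetical macro and struct names; it covers only the bits whose positions are given in the table and leaves the PC/SP/delay status field alone, because its lower bound is not documented.

```c
#include <stdint.h>
#include <stdbool.h>

/* CSR bit positions as listed in the CSR table. */
#define PPG_CSR_RUN          (1u << 0)
#define PPG_CSR_EXT_CLK_TGL  (1u << 1)
#define PPG_CSR_EXT_START    (1u << 2)
#define PPG_CSR_RESET        (1u << 3)
#define PPG_CSR_TEST_MODE    (1u << 4)
#define PPG_CSR_EXT_CLK_SEL  (1u << 16)
#define PPG_CSR_EXT_CLK_GOOD (1u << 17)

/* Hypothetical decoded view of a CSR read. */
struct ppg_status {
    bool running;
    bool ext_start_armed;
    bool ext_clock_selected;
    bool ext_clock_good;
};

static struct ppg_status ppg_decode_csr(uint32_t csr)
{
    struct ppg_status s;
    s.running            = (csr & PPG_CSR_RUN) != 0;
    s.ext_start_armed    = (csr & PPG_CSR_EXT_START) != 0;
    s.ext_clock_selected = (csr & PPG_CSR_EXT_CLK_SEL) != 0;
    s.ext_clock_good     = (csr & PPG_CSR_EXT_CLK_GOOD) != 0;
    return s;
}
```

With these names the four common write values become PPG_CSR_RESET, 0, PPG_CSR_EXT_START, and PPG_CSR_RUN.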
Clock Control Register (+0x30)
This register reprograms the on-board PLL to scale an external input clock to 100 MHz. It is only needed when operating with an external clock source other than the default 20 MHz.
The PLL relationship is:
VCO frequency = Fin × M / N
Clock output = VCO / C0
Operating limits: Fin: 5–472 MHz; FVCO: 600–1300 MHz; lock time < 1 ms.
Default configuration for a 20 MHz input: M = 30, N = 1 (bypassed), C0 = 6 → 100 MHz output.
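A candidate PLL setting can be sanity-checked against these relations before touching the hardware. This is a sketch under the stated formulas and limits only; the struct and function names are hypothetical, and n and c0 are assumed to be non-zero.

```c
#include <stdbool.h>

/* Hypothetical container for one PLL setting. */
struct ppg_pll {
    double   fin_mhz; /* input clock in MHz */
    unsigned m;       /* multiplier M */
    unsigned n;       /* pre-divider N (1 = bypassed), must be non-zero */
    unsigned c0;      /* output divider C0, must be non-zero */
};

/* VCO frequency = Fin * M / N */
static double ppg_pll_vco_mhz(const struct ppg_pll *p)
{
    return p->fin_mhz * p->m / p->n;
}

/* Clock output = VCO / C0 */
static double ppg_pll_out_mhz(const struct ppg_pll *p)
{
    return ppg_pll_vco_mhz(p) / p->c0;
}

/* True when Fin and the VCO frequency fall inside the quoted limits. */
static bool ppg_pll_in_limits(const struct ppg_pll *p)
{
    double vco = ppg_pll_vco_mhz(p);
    return p->fin_mhz >= 5.0 && p->fin_mhz <= 472.0 &&
           vco >= 600.0 && vco <= 1300.0;
}
```

For the default setting (20 MHz, M = 30, N = 1, C0 = 6) the VCO sits at 600 MHz, the lower edge of its allowed range, and the output is 100 MHz.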
Bit layout:
| Bits | Description |
|---|---|
| 30–28 | Phase counter select (0 = all; 1 = M; 2–6 = Clock 0–4) |
| 26–24 | Counter parameter (0 = HighCount; 1 = LowCount; 4 = Bypass; 5 = Mode) |
| 23–20 | Counter type (0 = N; 1 = M; 2 = Cp/LF; 3 = VCO; 4–8 = Clock 0–4) |
| 16–8 | 9-bit parameter data |
| 5 | PLL Reset |
| 4 | Up/Down (1 = up; 0 = down) |
| 3 | PhaseStep |
| 2 | Write parameter |
| 1 | Reconfigure |
| 0 | Control trigger (toggle to apply changes) |
Example: 100 MHz external frequency divide-down:
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x00000305
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x01000205
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x05000105
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x04000005
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x3
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
Example: return to 10 MHz internal frequency:
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x04000105
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x3
vme_poke -a VME_A32UD -A 0x00100030 -d VME_D32 0x0
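Read against the bit layout, the non-zero values in the two examples above appear to be parameter writes to the N counter (counter type 0) with the Write-parameter and Control-trigger bits set, followed by a final 0x3 (Reconfigure plus Control trigger). That reading is an interpretation of the table, not something the source states. The hypothetical packer below reproduces those values from the named fields.

```c
#include <stdint.h>

/* Low control bits, as in the bit-layout table. */
#define PPG_CLK_TRIGGER     (1u << 0)
#define PPG_CLK_RECONFIG    (1u << 1)
#define PPG_CLK_WRITE_PARAM (1u << 2)

/* Hypothetical packer for one parameter write:
   counter_param -> bits 26-24, counter_type -> bits 23-20, data -> bits 16-8,
   with Write parameter and Control trigger set. */
static uint32_t ppg_clk_param_word(unsigned counter_type,
                                   unsigned counter_param,
                                   unsigned data)
{
    return ((uint32_t)(counter_param & 0x7u) << 24) |
           ((uint32_t)(counter_type  & 0xFu) << 20) |
           ((uint32_t)(data & 0x1FFu) << 8) |
           PPG_CLK_WRITE_PARAM | PPG_CLK_TRIGGER;
}
```

For instance, ppg_clk_param_word(0, 0, 3) yields 0x00000305 (N counter, HighCount = 3) and ppg_clk_param_word(0, 4, 1) yields 0x04000105 (N counter, Bypass = 1).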
Instruction Set
Instruction Format
Every PPG program is a sequence of 128-bit instructions stored in program memory. Each instruction is written to the board as four consecutive 32-bit register writes (to Inst_Lo, Inst_Med, Inst_Hi, Inst_Top). Writing Inst_Top commits the instruction to the currently selected slot.
| Bits | Register | Name | Description |
|---|---|---|---|
| 0–31 | Inst_Lo | SET mask | One bit per output channel (bit 0 = channel 1 … bit 31 = channel 32). A 1 drives that channel HIGH. |
| 32–63 | Inst_Med | CLEAR mask | Same bit-to-channel mapping. A 1 drives that channel LOW. |
| 64–95 | Inst_Hi | Delay count | Number of additional 10 ns clock cycles to hold this state. Total dwell = (3 + delay_count) × 10 ns. |
| 96–115 | Inst_Top bits 0–19 | Data | 20-bit payload: loop count or branch/call address, depending on instruction type. |
| 116–118 | Inst_Top bits 20–22 | Type | 3-bit instruction opcode (see Instruction Types). |
| 119–127 | Inst_Top bits 23–31 | - | Ignored. |
SET and CLEAR masks: these follow the output model described above — assign each channel to exactly one mask per instruction.
Note on the type field: the board's own register summary labels this field as a "4-bit instruction type" at bits 116–117, which is internally inconsistent (that range is only two bits). The actual opcode values listed below occupy bits 20–22 of Inst_Top (bits 116–118 of the full instruction), a 3-bit field. This page uses the values, which are unambiguous.
Instruction Types
The 3-bit opcode occupies bits 20–22 of the Inst_Top word (bits 116–118 of the full 128-bit instruction). The 20-bit data payload occupies bits 0–19 of Inst_Top.
| Type | Opcode | Inst_Top value | Description |
|---|---|---|---|
| Halt | 0 | 0x000000 | Stop execution. The program counter does not advance; CSR bit 0 goes low. |
| Continue | 1 | 0x100000 | Advance to the next instruction slot. |
| New Loop | 2 | 0x200000 + N | Begin a loop that repeats N times (maximum N = 1,048,575 = 0xFFFFF). Pushes the loop start address and count onto the stack. |
| End Loop | 3 | 0x300000 | Decrement the loop counter. If > 0, jump back to the start of the loop body (the instruction after the matching New Loop); otherwise fall through. |
| Call | 4 | 0x400000 + addr | Push the next instruction address onto the stack and jump to the 20-bit address in the data field. |
| Return | 5 | 0x500000 | Pop the return address from the stack and jump to it. |
| Branch | 6 | 0x600000 + addr | Unconditional jump to the 20-bit address in the data field (does not push a return address). |
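The Inst_Top values in the table are simply the opcode shifted into bits 20–22 combined with the 20-bit payload. A small encoder makes that explicit; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Opcode values from the Instruction Types table. */
enum ppg_opcode {
    PPG_OP_HALT     = 0,
    PPG_OP_CONTINUE = 1,
    PPG_OP_NEW_LOOP = 2,
    PPG_OP_END_LOOP = 3,
    PPG_OP_CALL     = 4,
    PPG_OP_RETURN   = 5,
    PPG_OP_BRANCH   = 6
};

#define PPG_DATA_MASK 0xFFFFFu /* 20-bit payload: loop count or address */

/* Build the Inst_Top word: opcode in bits 20-22, payload in bits 0-19.
   Payload bits above bit 19 are dropped so they cannot corrupt the opcode. */
static uint32_t ppg_inst_top(enum ppg_opcode op, uint32_t data)
{
    return ((uint32_t)op << 20) | (data & PPG_DATA_MASK);
}
```

A New Loop of 10 iterations encodes as ppg_inst_top(PPG_OP_NEW_LOOP, 10), which is 0x20000A.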
Timing
Every instruction occupies exactly 3 clock cycles of overhead regardless of type. The delay count field adds additional cycles on top of that:
dwell time = (3 + delay_count) × 10 ns
To produce a 280 ns pulse, write a delay count of 25: (3 + 25) × 10 = 280 ns.
The maximum delay count is 2^32 − 1 = 4,294,967,295, giving a maximum single-instruction dwell of approximately 42.9 seconds. Longer durations require a loop (see Looping for Long Durations).
Important: every instruction — including Halt, New Loop, End Loop, etc. — incurs the 3-cycle overhead. Loop and branch instructions with delay_count = 0 still consume 30 ns.
Programming Tutorial
This section walks through writing PPG programs from first principles, building up to a complete real-world example. All examples use the mvme_write_value VME library function:
mvme_write_value(vme, address, value);
where vme is an open VME handle, address is an absolute 32-bit VME address, and value is a 32-bit unsigned integer.
The examples assume a base address of BASE_ADDR = 0x00100000. Adjust for your hardware configuration.
The set_command Helper
All PPG programming is built from a single primitive: selecting an instruction slot and writing its four 32-bit fields. It is convenient to wrap this in a helper:
void set_command(int slot,
unsigned int set_mask, // Inst_Lo: channels to drive HIGH
unsigned int clr_mask, // Inst_Med: channels to drive LOW
unsigned int delay, // Inst_Hi: extra 10 ns ticks
unsigned int type_data) // Inst_Top: opcode + payload
{
mvme_write_value(vme, BASE_ADDR + 0x08, slot); // select slot
mvme_write_value(vme, BASE_ADDR + 0x0C, set_mask);
mvme_write_value(vme, BASE_ADDR + 0x10, clr_mask);
mvme_write_value(vme, BASE_ADDR + 0x14, delay);
mvme_write_value(vme, BASE_ADDR + 0x18, type_data); // commits instruction
}
All higher-level programming is built from calls to set_command.
Step 1: Reset the Board
Before writing any program, reset the board to put it in a known state:
mvme_write_value(vme, BASE_ADDR + 0x00, 0x8); // CSR bit 3: assert reset
mvme_write_value(vme, BASE_ADDR + 0x00, 0x0); // clear reset → idle
Asserting reset clears the program counter and halts execution. The reset bit must then be cleared (write 0x0) or the board will not operate. After this the board sits idle in software-trigger mode, waiting for a program to be loaded and started. It will not fire on any trigger until you write 0x1 (software start) or 0x4 (external trigger arm) to the CSR.
Step 2: Write a Safety Halt at Slot 0
Always write a Halt to slot 0 before writing the rest of your program. If an external trigger edge arrives while you are still writing instructions, the PPG executes this Halt immediately rather than running a partially-written or stale program:
set_command(0,
0x00000000, // SET: no channels raised
0xFFFFFFFF, // CLEAR: all channels driven low
0, // delay: 0 extra ticks → 30 ns dwell
0x000000); // type: Halt
Step 3: Write Your Program
Write instructions to consecutive slots starting from slot 1. Each instruction specifies which outputs are high, which are low, how long to hold that state, and what to do next.
Example: generate a single 280 ns pulse on channel 1, then halt.
Channel 1 is bit 0 of the output masks. The sequence is:
- All outputs low for 30 ns (initial clear).
- Channel 1 high for 280 ns.
- All outputs low, halt.
// Slot 1: all outputs low for 30 ns (delay=0 → (3+0)×10 = 30 ns)
set_command(1, 0x00000000, 0xFFFFFFFF, 0, 0x100000);
// Slot 2: channel 1 (bit 0) high for 280 ns (delay=25 → (3+25)×10 = 280 ns)
set_command(2, 0x00000001, 0xFFFFFFFE, 25, 0x100000);
// Slot 3: all outputs low, halt
set_command(3, 0x00000000, 0xFFFFFFFF, 0, 0x000000);
Step 4: Arm and Start
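Once all instructions are written, replace the safety Halt in slot 0 and then arm or start the board by writing to the CSR. Execution always begins at slot 0, so a Halt left there would stop the program before it reaches slot 1:
// Replace the safety Halt in slot 0 with a Continue so execution reaches slot 1
set_command(0, 0x00000000, 0xFFFFFFFF, 0, 0x100000);
// Rewind program address to slot 0 before starting
mvme_write_value(vme, BASE_ADDR + 0x08, 0x0);
// Option A: software trigger, starts immediately
mvme_write_value(vme, BASE_ADDR + 0x00, 0x1); // Run bit
// Option B: external trigger, waits for rising edge on NIM Input 4
mvme_write_value(vme, BASE_ADDR + 0x00, 0x4); // Ext-Start bit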
Step 5: Detecting Completion
Poll CSR bit 0. When it falls to 0 the program has halted and the next sequence can be loaded:
while (mvme_read_value(vme, BASE_ADDR + 0x00) & 0x1) {
// still running — sleep or yield here
}
// Program has halted; safe to reload
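A bare polling loop never returns if the program fails to halt (for example, an unintended Branch back to an earlier slot). One possible guard is a bounded poll, sketched below. The CSR read is passed in as a callback so the logic can be tried without hardware; the callback type, ppg_wait_halt, and fake_csr are all hypothetical names, and in real use the callback would wrap the mvme_read_value call shown above.

```c
#include <stdint.h>

/* Hypothetical callback type: returns the current CSR value. */
typedef uint32_t (*ppg_read_csr_fn)(void *ctx);

/* Poll until the Run bit (CSR bit 0) clears or max_polls reads have been made.
   Returns 0 if the program halted, -1 on timeout. */
static int ppg_wait_halt(ppg_read_csr_fn read_csr, void *ctx, unsigned long max_polls)
{
    for (unsigned long i = 0; i < max_polls; i++) {
        if ((read_csr(ctx) & 0x1u) == 0)
            return 0;
        /* a real caller would sleep or yield here */
    }
    return -1;
}

/* Stand-in CSR source for trying the loop without hardware: reports
   "running" until the counter that ctx points to reaches zero. */
static uint32_t fake_csr(void *ctx)
{
    unsigned *remaining = (unsigned *)ctx;
    if (*remaining == 0)
        return 0x0u;
    (*remaining)--;
    return 0x1u;
}
```

The poll budget should be chosen from the expected program duration and the sleep interval between reads.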
Timing Calculations
To produce a dwell of exactly T nanoseconds, the required delay count is:
delay_count = T / 10 − 3
For example, to hold an output high for exactly 500 ns:
- delay_count = 500 / 10 − 3 = 47
The minimum possible dwell is 30 ns (delay_count = 0). It is impossible to produce a dwell shorter than 30 ns with any single instruction.
When two consecutive instructions must sum to an exact total time, remember that each instruction carries its own 30 ns overhead:
- T_total = (3 + d1) × 10 + (3 + d2) × 10 = (6 + d1 + d2) × 10 ns
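The conversion from a target dwell to a delay count is easy to get wrong at the edges (below 30 ns, or not a multiple of 10 ns). A small checked helper, with a hypothetical name, might look like this:

```c
#include <stdint.h>

/* Convert a target dwell in nanoseconds to a delay count.
   Returns 0 on success, or -1 if the dwell cannot be produced exactly:
   shorter than 30 ns, not a multiple of 10 ns, or beyond the 32-bit field. */
static int ppg_delay_for_ns(uint64_t dwell_ns, uint32_t *delay_count)
{
    if (dwell_ns < 30 || dwell_ns % 10 != 0)
        return -1;
    uint64_t ticks = dwell_ns / 10 - 3;
    if (ticks > UINT32_MAX)
        return -1;
    *delay_count = (uint32_t)ticks;
    return 0;
}
```

Calling it with 500 stores 47 in delay_count, matching the worked example above, while 20 or 285 is rejected.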
Looping for Long Durations
A single delay count can produce at most ~42.9 seconds. For longer durations, split the period across multiple loop iterations. Each iteration holds for a fraction of the total time; the loop instruction repeats until the full duration is reached.
Strategy: choose a loop count N and compute the per-iteration delay count so that N iterations sum to the target duration. The example below uses N = 100:
double total_time_s = 30.0;
int N = 100;
// Each iteration should cover total_time_s / N seconds. Convert to 10 ns ticks.
// We ignore the ~30 ns spent on the End Loop instruction each iteration (and the
// one-time New Loop): at second scale that error is < 0.001 %. For tight timing,
// subtract the overhead or measure on hardware.
unsigned int body_delay = (unsigned int)(total_time_s * 1e8 / N);
// Slot i: New Loop, repeat N times (0x200000 + N), runs once
// Slot i+1: hold outputs for body_delay ticks, Continue (the loop body)
// Slot i+2: End Loop, jumps back to the body until the count is exhausted
set_command(i, 0x0, 0x0, 0, 0x200000 + N);
set_command(i+1, my_output_mask, ~my_output_mask, body_delay, 0x100000);
set_command(i+2, 0x0, 0x0, 0, 0x300000);
New Loop initialises the counter once; End Loop runs once per iteration and jumps back to the loop body. Both are 3-cycle instructions, so the loop adds a fixed 30 ns plus 30 ns per iteration of overhead — negligible for periods of seconds or longer.
Loops may be nested, and combined with subroutine calls, up to the 256-entry stack depth: place another New Loop / End Loop pair (or a subroutine call) inside the body.
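When the overhead does matter, it can be folded into the calculation. The sketch below assumes the simple three-instruction loop shown above (New Loop and End Loop with delay_count = 0, one body instruction), so that total ticks = 3 + N × ((3 + body_delay) + 3). The struct and function names are hypothetical, and the result has not been verified on hardware.

```c
#include <stdint.h>

/* Hypothetical result of splitting a long hold across a loop. */
struct ppg_loop_plan {
    uint32_t body_delay;   /* delay count for the loop-body instruction */
    uint64_t actual_ticks; /* total 10 ns ticks the loop is expected to take */
};

/* Split total_ticks across n iterations, counting per-instruction overhead.
   Returns 0 on success, -1 if n is outside 1..0xFFFFF or the per-iteration
   delay does not fit the 32-bit delay field. Any remainder that does not
   divide evenly is left out, so actual_ticks <= total_ticks. */
static int ppg_plan_loop(uint64_t total_ticks, uint32_t n, struct ppg_loop_plan *plan)
{
    if (n == 0 || n > 0xFFFFFu || total_ticks < 3)
        return -1;
    uint64_t per_iter = (total_ticks - 3) / n; /* body dwell + End Loop */
    if (per_iter < 6 || per_iter - 6 > UINT32_MAX)
        return -1;
    plan->body_delay   = (uint32_t)(per_iter - 6);
    plan->actual_ticks = 3 + (uint64_t)n * per_iter;
    return 0;
}
```

For 30 s (3,000,000,000 ticks) and n = 100 this gives body_delay = 29,999,993, which falls 97 ticks (970 ns) short of the target; the shortfall can be made up with one extra Continue instruction after the loop.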
Subroutines
Use Call and Return instructions to share a block of instructions across multiple points in a program. The hardware call stack holds up to 256 return addresses.
// --- Main program ---
// slot 0: safety Halt while loading (replace with a Continue before starting)
set_command(0, 0x0, 0xFFFFFFFF, 0, 0x000000);
set_command(1, 0x0, 0xFFFFFFFF, 10, 0x100000); // slot 1: setup, Continue
set_command(2, 0x0, 0x0, 0, 0x400000 + 10); // slot 2: Call subroutine at slot 10
set_command(3, 0x0, 0xFFFFFFFF, 0, 0x000000); // slot 3: Halt (reached after Return)
// --- Subroutine at slot 10 ---
set_command(10, 0x00000001, 0xFFFFFFFE, 25, 0x100000); // pulse channel 1
set_command(11, 0x00000000, 0xFFFFFFFF, 25, 0x100000); // all low
set_command(12, 0x0, 0x0, 0, 0x500000); // Return
Complete Example: Timing Calibration Sequence
This is the timing calibration sequence used in the UCN sequencer frontend. It fires 10 pulses on output channel 29 (bit 28, mask 0x10000000) at a 0.2 s cadence so that downstream DAQ systems can establish an absolute time reference. It uses a software trigger.
The program layout:
| Slot | Instruction | Purpose |
|---|---|---|
| 0 | All LOW, Continue (190 ns) | Blank at start of sequence |
| 1 | New Loop ×10 | Repeat 10 times |
| 2 | CH29 HIGH, Continue (280 ns) | Rising-edge timing pulse |
| 3 | All LOW, Continue (~0.2 s) | Off time between pulses |
| 4 | End Loop | Jump back to slot 2 nine more times |
| 5 | All LOW, Halt | End of sequence |
// Reset and select software trigger
mvme_write_value(vme, BASE_ADDR + 0x00, 0x8); // reset
mvme_write_value(vme, BASE_ADDR + 0x00, 0x0); // clear reset (software trigger mode)
// Slot 0: blank — all outputs low, 190 ns dwell (delay=16 → (3+16)×10=190 ns)
set_command(0, 0x00000000, 0xFFFFFFFF, 0x10, 0x100000);
// Slot 1: New Loop, repeat 10 times
set_command(1, 0x0, 0x0, 0x0, 0x20000A); // 0x200000 + 10
// Slot 2: channel 29 (bit 28) HIGH, 280 ns pulse (delay=25 → (3+25)×10=280 ns)
set_command(2, 0x10000000, 0xEFFFFFFF, 25, 0x100000);
// Slot 3: hold all LOW for the rest of the 0.2 s period.
// 0.2 s = 2e7 ticks of 10 ns. Subtract the 28 ticks used by the slot-2 pulse
// ((3+25)=28). The few ticks of loop/instruction overhead are negligible at
// this scale, so we approximate:
unsigned int off_time = (unsigned int)(0.2 * 1e8) - 28;
set_command(3, 0x00000000, 0xFFFFFFFF, off_time, 0x100000);
// Slot 4: End Loop
set_command(4, 0x0, 0x0, 0x0, 0x300000);
// Slot 5: all LOW, Halt
set_command(5, 0x00000000, 0xFFFFFFFF, 0x1, 0x000000);
// Rewind address pointer and start
mvme_write_value(vme, BASE_ADDR + 0x08, 0x0); // address = slot 0
mvme_write_value(vme, BASE_ADDR + 0x00, 0x1); // CSR: software start
// Poll for completion (~2 seconds)
while (mvme_read_value(vme, BASE_ADDR + 0x00) & 0x1) { /* wait */ }
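To estimate how long such a program should run before polling gives up, the timing rules can be applied in software. The sketch below is a rough model, not a description of the firmware: it handles only Halt, Continue, New Loop, and End Loop; it assumes the loop body runs N times, that End Loop jumps to the instruction after its New Loop, and that a count of 0 behaves like 1. All names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal model of one instruction: only the fields that affect timing. */
struct ppg_step {
    uint32_t delay;     /* Inst_Hi */
    uint32_t type_data; /* Inst_Top */
};

/* Estimate run time in 10 ns ticks. Returns 0 if the program runs off the
   end without a Halt, uses an unsupported opcode, or nests deeper than the
   256-entry stack. */
static uint64_t ppg_estimate_ticks(const struct ppg_step *prog, size_t len)
{
    size_t   loop_start[256];
    uint32_t loop_left[256];
    size_t   sp = 0, pc = 0;
    uint64_t ticks = 0;

    while (pc < len) {
        uint32_t op   = (prog[pc].type_data >> 20) & 0x7u;
        uint32_t data = prog[pc].type_data & 0xFFFFFu;
        ticks += 3u + (uint64_t)prog[pc].delay; /* 3-cycle overhead + delay */

        switch (op) {
        case 0: /* Halt */
            return ticks;
        case 1: /* Continue */
            pc++;
            break;
        case 2: /* New Loop: remember body start and iteration count */
            if (sp == 256)
                return 0;
            loop_start[sp] = pc + 1;
            loop_left[sp]  = data;
            sp++;
            pc++;
            break;
        case 3: /* End Loop: repeat the body or fall through */
            if (sp == 0)
                return 0;
            if (loop_left[sp - 1] > 1) {
                loop_left[sp - 1]--;
                pc = loop_start[sp - 1];
            } else {
                sp--;
                pc++;
            }
            break;
        default: /* Call, Return, Branch are not modelled */
            return 0;
        }
    }
    return 0;
}
```

Fed the six instructions of the calibration sequence (delays 16, 0, 25, 19999972, 0, 1), the model gives 200,000,086 ticks, or about 2.00000086 s, consistent with the "~2 seconds" noted in the polling comment.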