BNMR: Troubleshooting

Latest revision as of 08:49, 8 July 2022

Web pages not visible

  1. Log in to isdaq01 as the bnmr or bnqr user (as appropriate) and type start-all.
  2. If the web pages are still not visible, type killall mhttpd and then start-all again.

These commands are safe to run even if a run is in progress. (A bare kill-all should not be run while a run is in progress, but killall mhttpd is okay).
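
As a rough aid to choosing between the two steps above, the sketch below checks whether the web server process is alive before suggesting what to type. It assumes pgrep is available on isdaq01; the function name web_recover_plan is made up for this example and is not part of the DAQ scripts. It only prints advice and runs nothing itself.

```shell
# Sketch only: web_recover_plan is a hypothetical helper, not an existing DAQ script.
# It prints a suggestion based on whether an mhttpd process is running.
web_recover_plan() (
    if pgrep -x mhttpd >/dev/null 2>&1; then
        echo "mhttpd is running: if pages are still not visible, type killall mhttpd, then start-all"
    else
        echo "mhttpd is not running: type start-all"
    fi
)
```

Call web_recover_plan from a shell on isdaq01 as the bnmr or bnqr user, then type the suggested commands by hand.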

Programs can't start or run can't start

Try to restart the DAQ programs first

On isdaq01, type these three commands. The first forcibly stops a run if one is in progress; the other two stop and then restart all the DAQ programs.

odbedit -c "stop now"
kill-all
start-all

If the problem is not solved, the issue is likely with the hardware.

Check access to the VME cpus

Since the BNMR/BNQR platforms are electrically isolated, accessing the VME cpus from off-platform can fail even if the VME crate is switched on. Check the network connection. If the VMICs can be accessed, but not the CAMP MVME162, check the firewall.
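
The decision described above can be sketched as a small shell check. This is only an illustration: it assumes the VME CPUs answer ping, that the Linux iputils ping options -c (count) and -W (timeout in seconds) are available, and the function names reachable and diagnose_platform are invented for this example. The host names are arguments, because the real VMIC and CAMP host names are not given on this page.

```shell
# Sketch only: reachable and diagnose_platform are hypothetical helpers.
# Assumes Linux iputils ping (-c = packet count, -W = reply timeout in seconds).
reachable() (
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
)

# Usage: diagnose_platform <vmic-host> <camp-host>
# Pass the real host names; none are assumed here.
diagnose_platform() (
    vmic=$1
    camp=$2
    if ! reachable "$vmic"; then
        echo "VMIC $vmic unreachable: check the network connection to the platform"
    elif ! reachable "$camp"; then
        echo "VMIC $vmic reachable but CAMP $camp is not: check the firewall"
    else
        echo "VMIC $vmic and CAMP $camp both reachable"
    fi
)
```

If neither CPU answers, start with the network connection; the firewall is only suspected when the VMIC responds and the CAMP MVME162 does not.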

Check network connection to platform

Ethernet reaches both the BNQR and BNMR HV platforms through a pair of optical couplers (small grey boxes). The high-voltage end is located close to the VME crates on each platform. It is connected via orange fibre-optic cables to the low-voltage (ground) end, which is in one of the blue racks on the floor. The low-voltage boxes are also connected directly to the site Ethernet by a wall connection. The high-voltage end is connected to an ethernet switch, to which all the network devices on the platform are connected.

If the platform network fails, there will be no flashing lights on the ethernet switches on the platforms. Check the following:

  1. Make sure that power is on to the ethernet switch on the platform and to both optical couplers (platform and ground).
  2. Check that the site ethernet connection to the low-voltage box is still working. If it is, the ethernet lights will be flashing in the grey optical coupler box. The site connection could have been disconnected for some reason; a laptop may be useful for testing it.
  3. If the site connection is working, the problem may be with one of the optical couplers. Swap in the spare, or borrow the box from the other platform, to diagnose the problem.

Check the CAMP firewall boxes

The CAMP MVME162 cpus are isolated from the network by a firewall, provided by the cpus daqfire1 and daqfire2, which are located off the platform in the blue racks. Both need to be up and running. The firewall was installed to prevent the CAMP MVME162 from causing trouble on the control group network (apparently its broadcasts were not handled properly by the control group cpus).

Frontend stuck on first cycle

After starting a run, and the hardware has been initialized, the frontend window should show lines of the form

Seen 1/100 bins
Seen 2/100 bins
Seen 4/100 bins
Seen 8/100 bins

This shows that the cycle has started and that the Scaler is receiving "External Next" pulses from the PPG. The cycle ends when the expected number of pulses has been received from the PPG; the data is then read out and the PPG cycle is restarted.

If the scaler receives too few or no "External Next" pulses from the PPG, the frontend will become stuck (generally saying Seen 0/100 bins or similar). This may be due to bad cabling. The PPG output "MCS Next" is connected to the Scaler "External Next" Input via a NIM fan-out module. Check that the NIM bin is powered up, and that the fan-out module is still operational.
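
To make the "stuck" symptom easier to spot, the sketch below reads frontend output on standard input and reports the most recent "Seen N/M bins" line. It assumes the text has been copied or redirected from the frontend window (this page does not say where that output is logged), and the function name seen_status is invented for this example. A count of 0 is only a hint: it means the frontend is stuck if it persists.

```shell
# Sketch only: seen_status is a hypothetical helper, not part of the frontend.
# Reads text on stdin and reports the last "Seen N/M bins" line it finds.
# Exit status: 0 = bins are being seen, 1 = last count was 0, 2 = no such line.
seen_status() {
    awk '
        match($0, /Seen [0-9]+\/[0-9]+ bins/) {
            # Strip the leading "Seen " and trailing " bins" (5 characters each).
            split(substr($0, RSTART + 5, RLENGTH - 10), f, "/")
            seen = f[1]; total = f[2]; found = 1
        }
        END {
            if (!found) { print "no Seen lines found"; exit 2 }
            if (seen + 0 == 0) { print "possibly stuck: seen 0 of " total " bins"; exit 1 }
            print "progress: seen " seen " of " total " bins"
        }'
}
```

For example, paste a few lines from the frontend window into a file and run seen_status with that file redirected to its standard input.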

If the scaler receives too many "External Next" pulses, the DAQ will run but the data will not be correct. Extra bins from the cycle may appear in subsequent cycles, so the data gets out of phase. There is a test in the frontend code (using scaler Ref Ch 1) that should detect this.

If running in dual-channel mode, the PPG is started by an external signal from the kicker. Check if the signal is being received, and that the operators have configured dual channel mode correctly in EPICS. If running in single-channel mode, the PPG is started by a software signal from the frontend, so this is not a concern.


Debugging VME DAQ Modules

Individual VME DAQ Modules can be debugged with test programs - see BNMR: Hardware Debugging.