DEAP: Difference between revisions
daqwiki>Olchansk |
Revision as of 10:27, 10 July 2013
Links
- https://deapdaqgw.snolab.ca/ MIDAS status page
- https://www.snolab.ca/deap/private/TWiki/bin/view SNOLAB Wiki
- https://deapdaqgw.snolab.ca/elog/ DEAP00 ELOG
- https://deapdaqgw.snolab.ca/ganglia/ GANGLIA system monitoring
- https://deapdaqgw.snolab.ca/nodeinfo/config.html computer configuration and status
- https://deapdaqgw.snolab.ca/vme01/ VME crate 1 (all 3 VME crates: does not work from google-chrome, firefox is ok, safari is ok. after 10 sec reloads to the midas status page)
- https://deapdaqgw.snolab.ca/vme02/ VME crate 2
- https://deapdaqgw.snolab.ca/vme03/ VME crate 3
- https://deapdaqgw.snolab.ca:8443/ UPS (deapups, guest/guest)
- https://deapdaqgw.snolab.ca:8444/index.html Sentry Power Distribution Unit (deapcdu, deap/deap)
- https://deapdaqgw.snolab.ca:8445/ KVM (deapkvm8)
- https://deapkvm8 (192.168.1.17) (deapkvm8) ATEN IP KVM (only works from firefox on deapdaqgw gateway machine) (deapuser, deapuser)
Fingerprints for the deapdaqgw.snolab.ca SSL certificate:
- SHA-256: 2A ED 88 3E 38 27 25 D0 1E 4D 48 A1 78 FC E0 0B 6E 58 00 FA A6 53 B1 FE 50 3D 91 CE C9 AB 08 19
- SHA-1: 1A FE 53 23 0A 22 F3 35 C8 20 CD 0E 0E F0 3F A7 72 B7 2F 29
DAQ machines
- deapdaqgw: gateway machine (DHCP for deap00, UPS, CDU, NAT)
- deap00: main daq machine (storage, home directories, central services, etc)
- deap01..05: A3818 daq frontend machines
- deap07: spare A3818 daq machine (old deap00) (used for PCIe ADC DAQ)
- deap08: spare deap00 machine
- lxdeap01: VME daq machine
- deapvme01..03: VME crate power supplies
- deapups: DAQ UPS unit
- deapcdu: DAQ power distribution unit
- deapkvm8: 8-port IP KVM
- mscb520: MSCB-ETH bridge
Power up sequence
- power up and turn on all 3 UPSes
- network switch, CDU and gateway machine are connected to non-switchable UPS ports, they should power up and boot
- one should be able to ping the gateway machine
- ssh deapgw@deapdaqgw, ping deapups and deapcdu
- open the UPS and CDU web pages through the HTTPS proxy
- from the CDU web page (outlet control), turn on deap00. If deap00 is off (0 power use) but outlet is "on", use the "reboot" action.
- wait for deap00 to boot (ping deap00)
- mhttpd and elog should start automatically
- open the DEAP MIDAS status page
- now one can ssh deap@deapdaqgw then ssh deap00
- on the MIDAS status page, start the slow controls frontends: UPS, CDU, VME crates and NutUps, clear all alarms. Do not start MPOD and SCB yet.
- all frontends should start "green", except "vme02" (and "mpod"), which should report "communication problem"
- on the CDU web page, turn on all power outlets (use "global control action" - "on" - "apply")
- wait for VME02 (and MPOD) to boot: their frontend status should turn "green"
- wait for deap01..deap05 to boot (no simple indication, but they should ping from deap00)
- from the MIDAS "VME" slow controls pages, turn on all 3 VME crates
- from the MIDAS "programs" page, start mpod and scb frontends. Their status should show "green"
- from the MIDAS "MPOD_HV" page, turn on the MPOD ("main switch ON", if ready to ramp up the voltages, "output ON")
- from the MIDAS "SCB" page, turn on all SCBs ("ON" button)
- wait for lxdeap01 to boot (should be able to ping from deap00)
- from the MIDAS "programs" page, start all daq frontends
- clear all MIDAS alarms
- start a run, wait for a bit, stop the run (to confirm all frontends are happy)
- DAQ is now ready to take data
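Several steps above amount to "wait until the machine pings". A minimal sketch of that wait, assuming Linux iputils ping; the function name and the PING override variable are made up for this example and are not part of the DEAP setup:

```shell
#!/bin/sh
# Wait until a host answers ping, as in the "wait for deap00 to boot" steps.
# PING can be overridden (hypothetical knob) so the function can be dry-run.
PING="${PING:-ping}"

# usage: wait_for_host HOST [TRIES] [DELAY_SECONDS]
wait_for_host() {
    host=$1
    tries=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -c 1: send one probe; -W 2: wait at most 2 seconds for the reply
        if "$PING" -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host is up"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$host did not answer after $tries tries" >&2
    return 1
}
```

For example, from deap00 one could run "for h in deap01 deap02 deap03 deap04 deap05; do wait_for_host $h; done" after turning on all CDU outlets. A host that pings has booted its network stack, but this says nothing about whether its frontends are ready.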
deapups connections
TBW
UPS configuration
Tripp-lite management software
ssh root@deap00
/opt/nut/bin/upsdrvctl stop
(unplug USB cable from UPS, wait 10 sec, plug it back in)
service pald restart
/var/tripplite/poweralert/console/pal_console.sh
### restore NUT monitoring
service pald stop ### this takes a few minutes
/opt/nut/bin/upsdrvctl start
USB connections
- lsusb -v | grep -i product
- lsusb -v | grep -i serial
NUT UPS configuration
NB - UPS names are tied to the UPS serial numbers via the NUT config file!
- http://www.networkupstools.org/
- ssh root@deap00
- (initial software checkout) cd ~; svn checkout svn://anonscm.debian.org/nut/trunk nut
- (update) cd ~/nut; svn update
- (build, install) cd ~/nut; ./autogen.sh; ./configure --prefix=/opt/nut; make -j6 -k; make -k; make -k install
- config file: /opt/nut/etc/ups.conf
[ups1]
driver = usbhid-ups
port = auto
desc = "ups1"
serial = "2231ELCPS720300082"

[ups2]
driver = usbhid-ups
port = auto
desc = "ups2"
serial = "2211KW0PS733900093"

[ups3]
driver = usbhid-ups
port = auto
desc = "ups3"
serial = "2231ELCPS720300090"
- restart drivers: /opt/nut/bin/upsdrvctl start
- reload upsd: /opt/nut/sbin/upsd -c reload
- see ups status: /opt/nut/bin/upsc ups1
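To check all three UPSes at once, the upsc call above can be wrapped in a loop. This is a sketch only: the function names are invented for this example, and only the common ups.status flags (OL, OB, LB) are decoded; NUT can report other flags, which are passed through unchanged.

```shell
#!/bin/sh
# Decode the space-separated flags that NUT reports in ups.status.
# usage: describe_ups_status "OB LB"
describe_ups_status() {
    for flag in $1; do
        case $flag in
            OL) echo "on line power" ;;
            OB) echo "on battery" ;;
            LB) echo "low battery" ;;
            *)  echo "other flag: $flag" ;;
        esac
    done
}

# Query ups1..ups3 (the names defined in /opt/nut/etc/ups.conf).
# usage: report_all_ups
report_all_ups() {
    for ups in ups1 ups2 ups3; do
        status=$(/opt/nut/bin/upsc "$ups" ups.status 2>/dev/null) || status="UNREACHABLE"
        echo "== $ups: $status"
        describe_ups_status "$status"
    done
}
```

Run report_all_ups as root on deap00 after "upsdrvctl start". A UPS that shows UNREACHABLE right after the USB cable was replugged may mean its serial number no longer matches ups.conf.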
deapcdu connections
NOTE: UPS shutdown scripts (ups2_ob, ups2_lb) need to know what equipment is connected to which CDU port. If things move around, please review those scripts!
1 : DEAP00
2 : DEAP01
3 : DEAP02
4 : DEAP03
5 : DEAP04
6 : DEAP05
7 :
8 :
------------
9 :
10 : SCB1
11 : SCB2
12 : DEAPMPOD
13 : DEAPVME02
14 :
15 :
16 : DEAPKVM8
deapcdu snmp
- snmpwalk -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c public deapcdu sentry3
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.1 i 1 ### turn on outlet 1
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.3 i 2 ### turn off outlet 3
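The two snmpset examples above differ only in the outlet number and the action value (1 = on, 2 = off). A small helper can build the command and print it for review before anything is switched. The function and variable names are hypothetical; the MIB path, community string and OID are copied from the commands above.

```shell
#!/bin/sh
# Settings copied from the snmpset examples in this section.
CDU_MIBDIR=/home/deap/online/slow/fesnmp
CDU_HOST=deapcdu

# Print (do not run) the snmpset command for one CDU outlet.
# usage: cdu_outlet_cmd OUTLET on|off
cdu_outlet_cmd() {
    outlet=$1
    case $2 in
        on)  value=1 ;;
        off) value=2 ;;
        *)   echo "usage: cdu_outlet_cmd OUTLET on|off" >&2; return 1 ;;
    esac
    # Accept only plain decimal outlet numbers.
    case $outlet in
        ''|*[!0-9]*) echo "outlet must be a number" >&2; return 1 ;;
    esac
    echo "snmpset -v 2c -M +$CDU_MIBDIR -m +Sentry3-MIB -c write $CDU_HOST outletControlAction.1.1.$outlet i $value"
}
```

"cdu_outlet_cmd 3 off" prints the command for outlet 3; pipe the output to sh to execute it. Check the outlet number against the deapcdu connections list first, since outlet 1 is deap00 itself.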
VME and MPOD snmp
- snmpwalk -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 crate
- snmpset -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 sysMainSwitch.0 i 1 ### turn crate on
- snmpset -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 sysMainSwitch.0 i 0 ### turn crate off
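The same command can be generated for all three crates. This sketch assumes that deapvme02 and deapvme03 accept the same community string and MIB as deapvme01, which the examples above do not confirm; the function name is made up for this example.

```shell
#!/bin/sh
# MIB location copied from the snmpset examples in this section.
WIENER_MIBDIR=/home/deap/online/slow/fewiener

# Print (do not run) the main-switch commands for all three VME crates.
# usage: vme_switch_cmds on|off
vme_switch_cmds() {
    case $1 in
        on)  value=1 ;;
        off) value=0 ;;
        *)   echo "usage: vme_switch_cmds on|off" >&2; return 1 ;;
    esac
    for crate in deapvme01 deapvme02 deapvme03; do
        echo "snmpset -v 2c -M +$WIENER_MIBDIR -m +WIENER-CRATE-MIB -c guru $crate sysMainSwitch.0 i $value"
    done
}
```

Review the printed lines, then run "vme_switch_cmds on | sh". During a normal power up the MIDAS "VME" slow controls pages remain the preferred way to switch the crates.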
Network configuration (DEAP)
The DEAP DAQ cluster is configured for standalone running with or without an internet connection.
(NB: Some internet functions are required: access to NTP for time synchronization and access to Linux package repositories to install packages, etc)
Network numbers
Network numbers are assigned by deapdaqgw and deap00 DHCP servers:
192.168.1.x (netmask 255.255.255.0): daq private network
192.168.2.x: deap00a-deap01a connection
192.168.3.x: deap00b-deap02b connection
192.168.4.x: deap00c-deap03c connection
192.168.5.x: deap00d-deap04d connection
Network cabling
deapdaqgw:
  eth0 - uplink
  eth1 - daq private network
deap00 mobo:
  eth0 - daq private network
  eth1 - reserved for the DEAP private network
deap00 pcie nic: (top to bottom)
  eth2 - deap01a
  eth3 - deap02b
  eth4 - deap03c
  eth5 - deap04d
deap01..deap04:
  eth0 - daq private network
  eth1 - secondary network direct link to deap00
deap05, lxdeap01:
  eth0 or eth1 - daq private network (use either port)
DHCP configuration
The main DHCP server runs on the gateway machine. It provides IP addresses to all devices on the main private network.
An additional DHCP server runs on deap00. It provides IP addresses for the secondary network links to deap00 (deap01a..deap04d).
See the following sections for more details.
DEAP network nodes with statically configured IP addresses:
- deapmpod : Wiener MPOD firmware does not support DHCP
Gateway machine
deapdaqgw is the gateway machine that provides internet access to the DEAP DAQ cluster.
- NAT ("network address translation", see /etc/rc.local)
- IP address assignment via /etc/hosts
- DNS via dnsmasq serving contents of /etc/hosts and bridge to upstream DNS (configured in /etc/resolv.conf by upstream DHCP)
- DHCP for all machines via /etc/dhcp/dhcpd.conf. Special DHCP settings:
- "option routers" sets the "default route" through the gateway machine itself
- "option domain-name-servers" sets the DNS server in /etc/resolv.conf to dnsmasq on the gateway machine
- "option ntp-servers" specifies the time servers, (but not used by any hosts?)
- "option domain-name" is not specified, leaving the "domain" and "search" entries of /etc/resolv.conf blank (actually the entries are not there)
- unknown clients are assigned IP addresses in the range 192.168.x.200 through .250.
- MSCB nodes are assigned "infinite" leases to avoid a bug in MSCB firmware
- remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
- HTTPS proxy for midas, elog, and other web-connected devices (see links above). Edit /etc/httpd/conf.d/*.conf
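For orientation, masquerading NAT on a Linux gateway usually comes down to two commands. The sketch below prints them without running them. It is an assumption about what /etc/rc.local might contain, not a copy of the real file; only the interface name (eth0 = uplink) is taken from the cabling list above.

```shell
#!/bin/sh
# eth0 is the uplink on deapdaqgw according to the network cabling list.
UPLINK_IF=eth0

# Print the two commands of a typical masquerading NAT setup.
# ASSUMPTION: the real /etc/rc.local on deapdaqgw may differ.
# usage: nat_rules
nat_rules() {
    # Let the kernel forward packets between interfaces.
    echo "echo 1 > /proc/sys/net/ipv4/ip_forward"
    # Rewrite the source address of packets leaving via the uplink.
    echo "iptables -t nat -A POSTROUTING -o $UPLINK_IF -j MASQUERADE"
}
```

To compare with the live system, "cat /proc/sys/net/ipv4/ip_forward" should print 1 and "iptables -t nat -L POSTROUTING -n" should list a MASQUERADE rule.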
deap00 machine
deap00 is the main machine for the DEAP DAQ cluster.
- DHCP for secondary network links to frontend machines, remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
- NIS master
- NFS export of home disks, data disks (NFS exports list: edit /etc/netgroup, run "make -C /var/yp")
network port assignments:
- eth0: main connection to the local network, IP address is assigned by DHCP from the gateway machine
- eth1: not used, reserved for special link to the data storage machine
- eth2..eth5: Intel 4-port card, ports are numbered from the top, connected to deap01..deap04 in order.
NIS configuration
Usernames, passwords and hostnames are distributed using NIS:
- domain name: DEAP-NIS
- deap00 is the master server
- there are no secondary servers
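A quick client-side check of these settings can be sketched as below. The function name is invented for this example; domainname and ypwhich are the standard NIS client tools, and ypwhich may print either the short or the fully qualified server name, so both are accepted.

```shell
#!/bin/sh
# Expected values, taken from the list above.
EXPECTED_DOMAIN=DEAP-NIS
EXPECTED_MASTER=deap00

# Compare the reported NIS domain and bound server with the expected ones.
# usage: check_nis DOMAIN SERVER
check_nis() {
    rc=0
    if [ "$1" != "$EXPECTED_DOMAIN" ]; then
        echo "NIS domain is '$1', expected $EXPECTED_DOMAIN"
        rc=1
    fi
    case $2 in
        "$EXPECTED_MASTER"|"$EXPECTED_MASTER".*) ;;
        *) echo "bound to '$2', expected $EXPECTED_MASTER"; rc=1 ;;
    esac
    return $rc
}
```

On any cluster machine: check_nis "$(domainname)" "$(ypwhich)". No output and a zero exit status mean the client is bound as described.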
Time configuration
AAA
System monitoring tools
- ganglia
- triumf_nodeinfo
- konstantin's ganglia packages (monitor_nfs, ganglia sensors, top, etc) - To install/update: see TRIUMF SL install instructions.
Backups
Backups of all system disks (SSD and USB flash media) are done to the deap00 data disk. This includes deapdaqgw and deap00 SSDs:
- deap00 cron job /etc/cron.d/backup.lxdaq.cron
- runs script deap00:~root/backup.lxdaq
- writes backups to deap00 data disk: deap00:/data/root/backups:
[root@deap00 ~]# ls -l /data/root/backups/
total 56
-rwxr-xr-x 1 root root 4208 Dec 7 2012 clone.perl
dr-xr-xr-x 31 root root 4096 Jul 3 13:28 deap00
dr-xr-xr-x 28 root root 4096 Jul 3 13:32 deap01
dr-xr-xr-x 28 root root 4096 Jul 3 13:32 deap02
dr-xr-xr-x 28 root root 4096 Apr 19 18:44 deap03
dr-xr-xr-x 28 root root 4096 Jul 3 13:32 deap04
dr-xr-xr-x 28 root root 4096 Jul 3 13:32 deap05
dr-xr-xr-x 28 root root 4096 Mar 2 21:32 deap07
dr-xr-xr-x 28 root root 4096 Mar 2 21:32 deap08
dr-xr-xr-x 29 root root 4096 Jul 3 11:10 deapdaqgw
dr-xr-xr-x 26 root root 4096 Jul 3 13:38 lxdeap01
-rwxr-xr-x 1 root root 7672 Dec 7 2012 uuidfix.perl
[root@deap00 ~]#
- clone.perl and uuidfix.perl are the scripts for writing these backups back to bootable media (to 16GB USB flash disk or to 30/60GB SSD)
The deap00 home disk, the deapdaqgw system disk, and the existing backups of the deap00, deap01 and lxdeap01 system disks are backed up to the TRIUMF ladd00 data disk.
- ladd00 cron job /etc/cron.d/cron.d-backup-lxdaq
- runs script /root/backup.os.all
- runs script "/root/backup.os deapdaqgw.snolab.ca" writes to /ladd/data0/backup.os/deapdaqgw.snolab.ca
- runs script /root/backup.deap:
cd /ladd/data0/backup.os
rsync -avx --delete-after deapdaqgw.snolab.ca:/data/root/backups/deap00 deap/deap00 >> $lastlog 2>&1
rsync -avx --delete-after deapdaqgw.snolab.ca:/data/root/backups/deap01 deap/deap01 >> $lastlog 2>&1
rsync -avx --delete-after deapdaqgw.snolab.ca:/data/root/backups/lxdeap01 deap/lxdeap01 >> $lastlog 2>&1
rsync -avx --delete-after --max-size=100000000 deapdaqgw.snolab.ca:/home deap/home >> $lastlog 2>&1

[root@ladd00 ~]# ls -l /ladd/data0/backup.os/ /ladd/data0/backup.os/deap
/ladd/data0/backup.os/:
...
drwxr-xr-x 6 root root 4096 Jun 17 15:34 deap
dr-xr-xr-x 29 root root 4096 Jul 3 08:10 deapdaqgw.snolab.ca
-rw-r--r-- 1 root root 2414 Jul 8 15:01 deapdaqgw.snolab.ca.last.log
-rw-r--r-- 1 root root 113454 Jul 8 15:24 deap.last.log
...

/ladd/data0/backup.os/deap:
total 16
drwxr-xr-x 3 root root 4096 Jun 12 12:53 deap00
drwxr-xr-x 3 root root 4096 Jun 12 12:55 deap01
drwxr-xr-x 3 root root 4096 Jun 17 15:34 home
drwxr-xr-x 3 root root 4096 Jun 12 12:56 lxdeap01
[root@ladd00 ~]#
There are no backups of the deap00 data disks.
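Since these backups run unattended from cron, it can be useful to check that they still run. One possible check, assuming that each backup run rewrites its "*.last.log" file (as the listing on ladd00 suggests), is to look for log files that have not been modified recently. The function name is made up for this example, and the find options are those of GNU find.

```shell
#!/bin/sh
# List "*.last.log" files in DIR whose last modification is more than DAYS
# full 24-hour periods ago (find's -mtime +N semantics).
# usage: stale_backup_logs DIR DAYS
stale_backup_logs() {
    find "$1" -maxdepth 1 -type f -name '*.last.log' -mtime +"$2" -print
}
```

On ladd00, "stale_backup_logs /ladd/data0/backup.os 2" should print nothing while the cron job is healthy. The modification times of the backup directories themselves are less useful for this, because rsync -a copies them from the source machine.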
Creating boot disks for deap01..deap05
mirrored 16GB USB Flash disks
See Cloning_raid1_boot_disks.
V7865 single 8GB/16GB USB Flash disks
The V7865 VME processors use single USB flash disks. To create the boot disks, follow instructions for #64GB_SSD_boot_disks, but clone "lxdeap01" instead of "deap01".
Single 8/16GB USB and 64GB SSD boot disks
- attach SSD disk to any of the deap01..deap05 machines (SATA+power)
- login as root to that machine
- "fdisk -l" to identify which /dev/sdX disk it is
- cd /data/root/backups
- ./clone.perl ./deap01 /dev/sdX
- observe that the script completes successfully and prints "Done. You can remove /dev/sdX and try to boot from it."
- disconnect the disk
- connect to new machine, try to boot from it
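Because clone.perl overwrites the target disk, it is worth double-checking /dev/sdX before running it. The guard below is a sketch and not part of clone.perl; the function name is invented for this example. It refuses anything that is not a block device and any disk that appears in /proc/mounts.

```shell
#!/bin/sh
# Refuse obviously wrong clone targets before running clone.perl.
# usage: check_clone_target /dev/sdX
check_clone_target() {
    dev=$1
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device" >&2
        return 1
    fi
    # A prefix match also catches mounted partitions such as /dev/sdX1.
    if grep -q "^$dev" /proc/mounts; then
        echo "$dev (or a partition on it) is mounted" >&2
        return 1
    fi
    return 0
}
```

Typical use: check_clone_target /dev/sdX && ./clone.perl ./deap01 /dev/sdX. The check cannot tell a spare disk from an unmounted disk that still holds data, so the "fdisk -l" step above is still needed.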