DEAP
Links
- https://deap06.triumf.ca/ MIDAS status page
- https://deap06.triumf.ca/elog/ ELOG
- https://deap06.triumf.ca/ganglia/ GANGLIA system monitoring
- https://deap06.triumf.ca/nodeinfo/config.html computer configuration and status
- https://deap06.triumf.ca/vme01/ VME crate 1
- https://deap06.triumf.ca/vme02/ VME crate 2
- https://deap06.triumf.ca/vme03/ VME crate 3
- https://deap06.triumf.ca:8443/ UPS
- https://deap06.triumf.ca:8444/ Power Distribution Unit
- https://deap06.triumf.ca:8445/ KVM (deapkvm8)
- https://deapkvm8 (192.168.1.17) (deapkvm8) ATEN IP KVM (only works from deap06 gateway machine) (deapuser, deapuser)
DAQ machines
- deapdaqgw: gateway machine (DHCP for deap00, UPS, CDU, NAT)
- deap00: main daq machine (storage, home directories, central services, etc)
- deap01..05: A3818 daq machines
- deap06.triumf.ca: temporary network gateway
- deap07: spare A3818 daq machine (old deap00) (used for PCIe ADC DAQ)
- deap08: spare deap00 machine
- lxdeap01: VME daq machine
- deapvme01..03: VME crate power supplies
- deapups: DAQ UPS unit
- deapcdu: DAQ power distribution unit
- deapkvm8: 8-port IP KVM
- mscb520: MSCB-ETH bridge
deapups connections
- C13(f) : Switched Load 1 - N/C
- C13(f) : Switched Load 2 - N/C
- C13(f) : Switched Load 3 - N/C
- C13(f) : Unswitched Load 4 - Rack Fan Left
- C13(f) : Unswitched Load 4 - Rack Fan Centre
- C13(f) : Unswitched Load 4 - Rack Fan Right
- C13(f) : Unswitched Load 4 -
- C13(f) : Unswitched Load 4 -
- C19(f) : Unswitched Load 4 - CDU
UPS configuration
Tripp-lite management software
ssh deap00 /var/tripplite/poweralert/console/pal_console.sh
USB connections
- lsusb -v | grep -i product
- lsusb -v | grep -i serial
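The serial strings reported by lsusb are what the "serial =" entries in the NUT ups.conf are matched against. A minimal sketch of pulling just those strings out of the lsusb output, assuming the usual "iSerial <index> <string>" line layout; the function name usb_serials is made up for this example, and a serial containing spaces would be cut at the first space:

```shell
# Print the serial-number strings found in `lsusb -v` output read from stdin.
# Lines look like "  iSerial   3 2231ELCPS720300082"; devices without a
# serial string carry only the index ("iSerial 0") and are skipped.
usb_serials() {
    awk '$1 == "iSerial" && NF >= 3 { print $3 }'
}

# Usage (run as root so that all descriptors are readable):
#   lsusb -v 2>/dev/null | usb_serials
```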
NUT UPS configuration
- http://www.networkupstools.org/
- ssh root@deap00
- (initial software checkout) cd ~; svn checkout svn://anonscm.debian.org/nut/trunk nut
- (update) cd ~/nut; svn update
- (build, install) cd ~/nut; ./autogen.sh; ./configure --prefix=/opt/nut; make -j6 -k; make -k; make -k install
- config file: /opt/nut/etc/ups.conf
[ups1]
    driver = usbhid-ups
    port = auto
    desc = "ups1"
    serial = "2231ELCPS720300082"
[ups2]
    driver = usbhid-ups
    port = auto
    desc = "ups2"
    serial = "2211KW0PS733900093"
[ups3]
    driver = usbhid-ups
    port = auto
    desc = "ups3"
    serial = "2231ELCPS720300090"
- restart drivers: /opt/nut/bin/upsdrvctl start
- reload upsd: /opt/nut/sbin/upsd -c reload
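After restarting the drivers and reloading upsd, the UPS units can be queried with the NUT client upsc. The following is a sketch only: it assumes upsc was installed under /opt/nut/bin by the build above and that upsd is listening on localhost; the function name ups_status_all is made up for this example.

```shell
# Path to the NUT client; assumed from the --prefix=/opt/nut build.
UPSC="${UPSC:-/opt/nut/bin/upsc}"

# Print "<name> <ups.status>" for each UPS name given as an argument.
# A UPS that cannot be queried is reported as "unreachable".
ups_status_all() {
    for ups in "$@"; do
        status=$("$UPSC" "${ups}@localhost" ups.status 2>/dev/null) || status="unreachable"
        printf '%s %s\n' "$ups" "$status"
    done
}

# Usage: ups_status_all ups1 ups2 ups3
```

In NUT, a status of OL means the UPS is on line power, OB means it is on battery, and LB means the battery is low.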
deapcdu connections
1 : DEAP00
2 : DEAP01
3 : DEAP02
4 : DEAP03
5 : DEAP04
6 : DEAP05
7 : DEAPVME02
8 : DEAP07
------------
9 : DEAP08
10 : SCB1
11 : SCB2
12 : n/c
13 : n/c
14 : DEAPDAQGW (temp)
15 : DEAPMPOD
16 : DEAPKVM8
deapcdu snmp
- snmpwalk -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c public deapcdu sentry3
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.1 i 1 ### turn on outlet 1
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.3 i 2 ### turn off outlet 3
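The snmpset commands above can be wrapped in a small helper to avoid mistyping the action code. This is a sketch, not an existing script: the name cdu_outlet and the DRY_RUN variable are made up here, and only the "1.1.N" index form shown above is generated, since how the remaining outlets are indexed is not stated on this page.

```shell
# Directory with the Sentry3 MIB, as used in the commands above.
MIBDIR="${MIBDIR:-/home/deap/online/slow/fesnmp}"

# cdu_outlet N on|off
# Builds the snmpset command for outlet index 1.1.N (1 = on, 2 = off).
# With DRY_RUN set, the command is printed instead of executed.
cdu_outlet() {
    outlet=$1
    case "$2" in
        on)  action=1 ;;
        off) action=2 ;;
        *)   echo "usage: cdu_outlet N on|off" >&2; return 2 ;;
    esac
    case "$outlet" in
        ''|*[!0-9]*) echo "outlet must be a number" >&2; return 2 ;;
    esac
    set -- snmpset -v 2c -M "+$MIBDIR" -m +Sentry3-MIB -c write deapcdu \
        "outletControlAction.1.1.$outlet" i "$action"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: DRY_RUN=1 cdu_outlet 3 off   # print the command without sending it
#        cdu_outlet 1 on              # turn on outlet 1
```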
Network configuration (TRIUMF)
DEAP DAQ machines are on the private network (see below).
The gateway to the TRIUMF network is the 1U machine deap06.triumf.ca, which is connected to the LADD-NIS cluster (deap account on ladd00).
Gateway services running on the gateway:
- DHCP server for the 192.168.1.x network (/etc/hosts, /etc/dhcp/dhcpd.conf)
- apache SSL/https proxy for MIDAS status page, ELOG, ganglia and nodeinfo (/etc/httpd/conf.d/ssl.conf, /etc/httpd/htpasswd)
- NAT proxy from private network to the TRIUMF network (/etc/rc.local). Makes the internet accessible from deapNN machines.
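For orientation, a NAT setup of this kind on Linux usually comes down to enabling IP forwarding and adding one masquerading rule. The sketch below only prints such commands. It is not a copy of the actual /etc/rc.local: the function name nat_commands, the interface name, and the network in the usage line are assumptions for illustration.

```shell
# Print the commands a simple Linux NAT setup would run.
# $1 = interface facing the outside network (assumed name, e.g. eth0)
# $2 = private network in CIDR form
# Check /etc/rc.local on the gateway for the settings actually in use.
nat_commands() {
    ext_if=$1
    priv_net=$2
    echo "sysctl -w net.ipv4.ip_forward=1"
    echo "iptables -t nat -A POSTROUTING -s $priv_net -o $ext_if -j MASQUERADE"
}

# Usage: nat_commands eth0 192.168.1.0/24
```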
Network configuration (DEAP)
The DEAP DAQ cluster is configured for standalone running with or without an internet connection.
(NB: Some internet functions are required: access to NTP for time synchronization and access to Linux package repositories to install packages, etc)
Network numbers
Network numbers are assigned by deapdaqgw and deap00 DHCP servers:
192.168.1.x (netmask 255.255.255.0): main private network
192.168.2.x: deap00a-deap01a connection
192.168.3.x: deap00b-deap02b connection
192.168.4.x: deap00c-deap03c connection
192.168.5.x: deap00d-deap04d connection
DHCP servers
Main DHCP server is deapdaqgw. ... On deap00 there is a dhcp server running
DEAP network nodes with statically configured IP addresses:
- deapmpod : Wiener MPOD firmware does not support DHCP
Gateway machine
deapdaqgw is the gateway machine that provides internet access to the DEAP DAQ cluster.
- NAT ("network address translation", see /etc/rc.local)
- IP address assignment via /etc/hosts
- DNS via dnsmasq serving contents of /etc/hosts and bridge to upstream DNS (configured in /etc/resolv.conf by upstream DHCP)
- DHCP for all machines except deap01..deap05 via /etc/dhcp/dhcpd.conf. Special DHCP settings:
- "option routers" sets the "default route" through the gateway machine itself
- "option domain-name-servers" sets the DNS server in /etc/resolv.conf to dnsmasq on the gateway machine
- "option ntp-servers" specifies the time servers (but is possibly not used by any hosts?)
- "option domain-name" is not specified, leaving the "domain" and "search" entries of /etc/resolv.conf blank (actually the entries are not there)
- unknown clients are assigned IP addresses in the range 192.168.x.200 through .250.
- MSCB nodes are assigned "infinite" leases to avoid a bug in MSCB firmware
- remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
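The DHCP settings described above translate into a dhcpd.conf subnet stanza roughly like the one printed by this sketch. It is an illustration under assumptions, not the real configuration: the function name dhcp_subnet_stanza is made up, and the gateway address in the usage line is a placeholder because the actual address is not given here. The infinite leases for MSCB nodes are left out.

```shell
# Print a dhcpd.conf subnet stanza for a 192.168.N.0/24 network.
# $1 = third octet of the network (N)
# $2 = address of the gateway machine, which also runs dnsmasq (placeholder value)
dhcp_subnet_stanza() {
    n=$1
    gw=$2
    cat <<EOF
subnet 192.168.$n.0 netmask 255.255.255.0 {
    option routers $gw;
    option domain-name-servers $gw;
    range 192.168.$n.200 192.168.$n.250;
}
EOF
}

# Usage: dhcp_subnet_stanza 1 192.168.1.1
```

The "range" line is what hands unknown clients an address between .200 and .250. No "option domain-name" line is emitted, which matches the behaviour described above.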
deap00 machine
deap00 is the main machine for the DEAP DAQ cluster.
- DHCP for frontend machines, remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
- NIS master
network port assignments:
- eth0: main connection to the local network, IP address is assigned by DHCP from the gateway machine
- eth1: not used, reserved for special link to the data storage machine
- eth2..eth5: Intel 4-port card, ports are numbered from the top, connected to deap01..deap04 in order.
NIS configuration
Usernames, passwords and hostnames are distributed using NIS:
- domain name: DEAP-NIS
- deap00 is the master server
- there are no secondary servers
- hostnames are distributed using NIS (from deap00:/etc/hosts, MUST MATCH deap06:/etc/hosts!)
- to solve the chicken-and-egg problem, the deap00 IP address has to be listed in each machine's /etc/hosts (MUST MATCH the deap06 and deap00 /etc/hosts!). NIS broadcast does not work on SL6.2+, so deap00 also has to be listed in each machine's /etc/yp.conf. In addition, NFS filesystems are mounted before NIS is started.
- also NIS has to be listed in front of DNS in the "hosts:" entry of /etc/nsswitch.conf
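The ordering requirement in /etc/nsswitch.conf is easy to get wrong after a reinstall, so it can be worth checking on each machine. A minimal sketch of such a check; the function name nis_before_dns is made up for this example:

```shell
# Succeed if, on the "hosts:" line of the given nsswitch.conf
# (default: /etc/nsswitch.conf), "nis" is listed and no "dns" entry
# comes before it. Commented-out lines are ignored.
nis_before_dns() {
    file=${1:-/etc/nsswitch.conf}
    awk '
        $1 == "hosts:" {
            nis = 0; dns = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "nis" && !nis) nis = i
                if ($i == "dns" && !dns) dns = i
            }
            found = 1
            ok = (nis && (!dns || nis < dns))
            exit
        }
        END { if (found && ok) exit 0; exit 1 }
    ' "$file"
}

# Usage: nis_before_dns || echo "fix the hosts: line in /etc/nsswitch.conf"
```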
DNS kludge:
- normally DNS would be used to distribute IP addresses and hostnames to the DHCP server, to deap00 and to other deap machines. But we do not have a private DNS server and the TRIUMF DNS server has the wrong IP addresses for deap machines (142.90.x.x).
- deap06 DHCP is telling all machines to use the TRIUMF DNS server (to resolve internet addresses - google, etc). To avoid confusion between local deap00, etc hostnames and deap00, etc hostnames from TRIUMF, /etc/nsswitch.conf "hosts:" entry has to list "nis" before "dns".
- hopefully the deap00, etc hostnames will be resolved correctly by the SNOlab DNS servers and all this kludging can go away.
System monitoring tools:
- ganglia
- triumf_nodeinfo
- konstantin's ganglia packages (monitor_nfs, ganglia sensors, top, etc) - To install/update: yum --disablerepo="*" --enablerepo=konstantin update
- diskscrub
Backups
- backups of Linux images:
- backups of Linux images are done to deap00:/data/root/backups using a cron job on deap00:/etc/cron.d/backup.lxdaq.cron and deap00:~root/backup.lxdaq
- backups of home directories: NONE
- backups of data disks: NONE
Creating boot disks for deap01..deap05
mirrored 16GB USB Flash disks
See Cloning_raid1_boot_disks.
V7865 single 8GB/16GB USB Flash disks
The V7865 VME processors use single USB flash disks. To create the boot disks, follow the instructions in "Single 8/16GB USB and 64GB SSD boot disks", but clone "lxdeap01" instead of "deap01".
Single 8/16GB USB and 64GB SSD boot disks
- attach SSD disk to any of the deap01..deap05 machines (SATA+power)
- login as root to that machine
- "fdisk -l" to identify which /dev/sdX disk it is
- cd /data/root/backups
- ./clone.perl ./deap01 /dev/sdX
- observe that the script completes successfully and prints "Done. You can remove /dev/sdX and try to boot from it."
- disconnect the disk
- connect to new machine, try to boot from it
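Because clone.perl writes to whatever /dev/sdX it is given, it can help to confirm that the target disk is not currently mounted before running it. A minimal sketch of such a check; the function name disk_in_use is made up for this example, and it only looks at the mounts table, so a disk used by software RAID or LVM would not be detected:

```shell
# Succeed if the disk $1 (e.g. /dev/sdX) or any of its partitions appears
# as a mounted device in the mounts table $2 (default: /proc/mounts).
# The match is by prefix, so /dev/sdX also matches /dev/sdX1, /dev/sdX2, ...
disk_in_use() {
    dev=$1
    mounts=${2:-/proc/mounts}
    awk -v d="$dev" '
        index($1, d) == 1 { hit = 1 }
        END { if (hit) exit 0; exit 1 }
    ' "$mounts"
}

# Usage before cloning (sdX is a placeholder for the disk found with "fdisk -l"):
#   if disk_in_use /dev/sdX; then
#       echo "refusing: /dev/sdX is mounted"
#   else
#       ./clone.perl ./deap01 /dev/sdX
#   fi
```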