DEAP

Links

DAQ machines

  • deapdaqgw: gateway machine (DHCP for deap00, UPS, CDU, NAT)
  • deap00: main daq machine (storage, home directories, central services, etc)
  • deap01..05: A3818 daq machines
  • deap06.triumf.ca: temporary network gateway
  • deap07: spare A3818 daq machine (old deap00) (used for PCIe ADC DAQ)
  • deap08: spare deap00 machine
  • lxdeap01: VME daq machine
  • deapvme01..03: VME crate power supplies
  • deapups: DAQ UPS unit
  • deapcdu: DAQ power distribution unit
  • deapkvm8: 8-port IP KVM
  • mscb520: MSCB-ETH bridge

Power up sequence

  • power up and turn on all 3 UPSes
  • network switch, CDU and gateway machine are connected to non-switchable UPS ports, they should power up and boot
  • one should be able to ping the gateway machine
  • ssh deapgw@deapdaqgw, ping deapups and deapcdu (see the connectivity sketch after this list)
  • open the UPS and CDU web pages through the HTTPS proxy
  • from the CDU web page (outlet control), turn on deap00
  • wait for deap00 to boot (ping deap00)
  • mhttpd and elog should start automatically
  • open the DEAP MIDAS status page
  • now one can ssh deap@deapdaqgw then ssh deap00
  • on the MIDAS status page, start the slow controls frontends: UPS, CDU, VME crates and NutUps, clear all alarms
  • on the CDU web page, turn on all power outlets (use "global control action" - "on" - "apply")
  • wait for deap01..deap05 to boot (no simple indication, but they should ping from deap00)
  • from the MIDAS slow controls pages, turn on the MPOD HV PS and all 3 VME crates
  • wait for lxdeap01 to boot (should be able to ping from deap00)
  • from the MIDAS "programs" page, start all daq frontends
  • clear all MIDAS alarms
  • start a run, wait for a bit, stop the run (to confirm all frontends are happy)
  • DAQ is now ready to take data
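
A minimal sketch of the connectivity checks from the sequence above, run from a machine that can reach the gateway (hostnames as listed above):

ping -c 3 deapdaqgw     ### the gateway machine is up
ssh deapgw@deapdaqgw    ### log into the gateway
ping -c 3 deapups       ### the UPS management card answers
ping -c 3 deapcdu       ### the CDU answers
### after turning on deap00 from the CDU web page:
ping -c 3 deap00        ### deap00 has booted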

deapups connections

TBW

UPS configuration

Tripp-lite management software

ssh root@deap00
/opt/nut/bin/upsdrvctl stop    ### stop the NUT driver so it releases the USB connection to the UPS
(unplug USB cable from UPS, wait 10 sec, plug it back in)
service pald restart           ### start the Tripp-lite PowerAlert daemon
/var/tripplite/poweralert/console/pal_console.sh    ### run the management console
### restore NUT monitoring
service pald stop    ### this takes a few minutes
/opt/nut/bin/upsdrvctl start

USB connections

  • lsusb -v | grep -i product
  • lsusb -v | grep -i serial

NUT UPS configuration

NB - UPS names are tied to the UPS serial numbers via the NUT config file!

[ups1]
        driver = usbhid-ups
        port = auto
        desc = "ups1"
        serial = "2231ELCPS720300082"
[ups2]
        driver = usbhid-ups
        port = auto
        desc = "ups2"
        serial = "2211KW0PS733900093"
[ups3]
        driver = usbhid-ups
        port = auto
        desc = "ups3"
        serial = "2231ELCPS720300090"
  • restart drivers: /opt/nut/bin/upsdrvctl start
  • reload upsd: /opt/nut/sbin/upsd -c reload
  • see ups status: /opt/nut/bin/upsc ups1
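
For example, to read a single variable from ups1 (battery.charge is a standard variable reported by usbhid-ups):

/opt/nut/bin/upsc ups1 battery.charge    ### print just the battery charge of ups1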

deapcdu connections

 1 : DEAP00
 2 : DEAP01
 3 : DEAP02
 4 : DEAP03
 5 : DEAP04
 6 : DEAP05
 7 : 
 8 : 
------------
 9 : 
10 : SCB1
11 : SCB2
12 : DEAPMPOD
13 : DEAPVME02
14 :
15 :
16 : DEAPKVM8

deapcdu snmp

  • snmpwalk -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c public deapcdu sentry3
  • snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.1 i 1 ### turn on outlet 1
  • snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.3 i 2 ### turn off outlet 3
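
To read an outlet state back without changing it, the same Sentry3-MIB provides outletStatus (a sketch, not verified against this unit):

snmpget -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c public deapcdu outletStatus.1.1.1 ### read the state of outlet 1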

VME and MPOD snmp

  • snmpwalk -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 crate
  • snmpset -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 sysMainSwitch.0 i 1 ### turn crate on
  • snmpset -v 2c -M +/home/deap/online/slow/fewiener -m +WIENER-CRATE-MIB -c guru deapvme01 sysMainSwitch.0 i 0 ### turn crate off

Network configuration (TRIUMF)

DEAP DAQ machines are on the private network (see below).

The gateway to the TRIUMF network is the 1U machine deap06.triumf.ca, connected to the LADD-NIS cluster (deap account on ladd00).

Services running on the gateway:

  • DHCP server for the 192.168.1.x network (/etc/hosts, /etc/dhcp/dhcpd.conf)
  • apache SSL/https proxy for MIDAS status page, ELOG, ganglia and nodeinfo (/etc/httpd/conf.d/ssl.conf, /etc/httpd/htpasswd)
  • NAT proxy from private network to the TRIUMF network (/etc/rc.local). Makes the internet accessible from deapNN machines.
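
A minimal sketch of what the NAT setup in /etc/rc.local typically looks like (the interface name eth0 and the exact rules are assumptions, not the actual file contents):

echo 1 > /proc/sys/net/ipv4/ip_forward    ### enable IP forwarding
iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE    ### masquerade the private network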

Network configuration (DEAP)

The DEAP DAQ cluster is configured for standalone running with or without an internet connection.

(NB: some functions still require internet access: NTP for time synchronization, Linux package repositories for installing packages, etc.)

Network numbers

Network numbers are assigned by the DHCP servers on deapdaqgw and deap00:

192.168.1.x (netmask 255.255.255.0): main private network
192.168.2.x: deap00a-deap01a connection
192.168.3.x: deap00b-deap02b connection
192.168.4.x: deap00c-deap03c connection
192.168.5.x: deap00d-deap04d connection

DHCP configuration

The main DHCP server runs on the gateway machine. It provides IP addresses to deap00 and most other devices.

An additional DHCP server runs on deap00. It provides IP addresses to the members of the DEAP-NIS cluster (deap01..deap05).

See the following sections for more details.

DEAP network nodes with statically configured IP addresses:

  • deapmpod : Wiener MPOD firmware does not support DHCP

NB - is there a race condition between the gateway and deap00 DHCP servers in assigning "unknown machine" IPs to deap01..deap05?!?

Gateway machine

deapdaqgw is the gateway machine that provides internet access to the DEAP DAQ cluster.

  • NAT ("network address translation", see /etc/rc.local)
  • IP address assignment via /etc/hosts
  • DNS via dnsmasq serving the contents of /etc/hosts, with a bridge to upstream DNS (configured in /etc/resolv.conf by the upstream DHCP)
  • DHCP for all machines except deap01..deap05 via /etc/dhcp/dhcpd.conf. Special DHCP settings (see the sketch after this list):
    • "option routers" sets the "default route" through the gateway machine itself
    • "option domain-name-servers" sets the DNS server in /etc/resolv.conf to dnsmasq on the gateway machine
    • "option ntp-servers" specifies the time servers (but not used by any hosts?)
    • "option domain-name" is not specified, leaving the "domain" and "search" entries out of /etc/resolv.conf entirely
    • unknown clients are assigned IP addresses in the range 192.168.x.200 through .250
    • MSCB nodes are assigned "infinite" leases to avoid a bug in the MSCB firmware
    • remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
  • HTTPS proxy for midas, elog, and other web-connected devices (see links above). Edit /etc/httpd/conf.d/*.conf
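
A minimal sketch of how these settings might look in /etc/dhcp/dhcpd.conf (all addresses, MACs and lease times below are illustrative, not the actual configuration):

subnet 192.168.1.0 netmask 255.255.255.0 {
  option routers 192.168.1.1;              # default route through the gateway itself
  option domain-name-servers 192.168.1.1;  # dnsmasq on the gateway
  option ntp-servers 192.168.1.1;
  range 192.168.1.200 192.168.1.250;       # addresses for unknown clients
}
host mscb520 {
  hardware ethernet 00:11:22:33:44:55;     # hypothetical MAC address
  fixed-address 192.168.1.52;              # hypothetical fixed address
  max-lease-time 2147483647;               # effectively "infinite" lease, for the MSCB firmware bug
}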

deap00 machine

deap00 is the main machine for the DEAP DAQ cluster.

  • DHCP for frontend machines, remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
  • NIS master
  • NFS export of home disks, data disks (NFS exports list: edit /etc/netgroup, run "make -C /var/yp"; see the sketch below)
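
A minimal sketch of pushing an updated exports list ("make -C /var/yp" is the documented step; "exportfs -ra" is an assumption about the standard nfs-utils tooling):

### on deap00, after editing /etc/netgroup
make -C /var/yp    ### rebuild and push the NIS maps, including netgroup
exportfs -ra       ### re-export all NFS filesystems with the updated access list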

Network port assignments:

  • eth0: main connection to the local network, IP address is assigned by DHCP from the gateway machine
  • eth1: not used, reserved for a special link to the data storage machine
  • eth2..eth5: Intel 4-port card, ports are numbered from the top, connected to deap01..deap04 in order

NIS configuration

Usernames, passwords and hostnames are distributed using NIS:

  • domain name: DEAP-NIS
  • deap00 is the master server
  • there are no secondary servers
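
Standard NIS commands can be used to verify this from any cluster machine (a sketch; the expected outputs follow from the configuration above):

domainname    ### should print DEAP-NIS
ypwhich       ### should print deap00 (the master server)
ypcat hosts   ### dump the NIS hosts map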

System monitoring tools

  • ganglia
  • triumf_nodeinfo
  • Konstantin's ganglia packages (monitor_nfs, ganglia sensors, top, etc). To install/update: yum --disablerepo="*" --enablerepo=konstantin update
  • diskscrub

Backups

  • backups of Linux images: done to deap00:/data/root/backups via a cron job (deap00:/etc/cron.d/backup.lxdaq.cron and deap00:~root/backup.lxdaq)
  • backups of home directories: NONE
  • backups of data disks: NONE

Creating boot disks for deap01..deap05

mirrored 16GB USB Flash disks

See Cloning_raid1_boot_disks.

V7865 single 8GB/16GB USB Flash disks

The V7865 VME processors use single USB flash disks. To create the boot disks, follow the instructions in #64GB_SSD_boot_disks, but clone "lxdeap01" instead of "deap01".

Single 8/16GB USB and 64GB SSD boot disks

  • attach SSD disk to any of the deap01..deap05 machines (SATA+power)
  • login as root to that machine
  • "fdisk -l" to identify which /dev/sdX disk it is
  • cd /data/root/backups
  • ./clone.perl ./deap01 /dev/sdX
  • observe that the script completes successfully and prints "Done. You can remove /dev/sdX and try to boot from it."
  • disconnect the disk
  • connect it to the new machine and try to boot from it (see the session sketch below)
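
The same procedure as a single shell session (a sketch; replace /dev/sdX with the actual device reported by fdisk):

### on one of deap01..deap05, as root, with the new disk attached
fdisk -l                          ### identify the new disk as /dev/sdX
cd /data/root/backups
./clone.perl ./deap01 /dev/sdX    ### clone the deap01 image onto the new disk
### expect: "Done. You can remove /dev/sdX and try to boot from it."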