DEAP
Revision as of 10:38, 18 February 2013
Links
- https://deap06.triumf.ca/ MIDAS status page
- https://deap06.triumf.ca/elog/ ELOG
- https://deap06.triumf.ca/ganglia/ GANGLIA system monitoring
- https://deap06.triumf.ca/nodeinfo/config.html computer configuration and status
- https://deap06.triumf.ca/vme01/ VME crate 1
- https://deap06.triumf.ca/vme02/ VME crate 2
- https://deap06.triumf.ca/vme03/ VME crate 3
- https://deap06.triumf.ca:8443/ UPS
- https://deap06.triumf.ca:8444/ Power Distribution Unit
- https://deap06.triumf.ca:8445/ KVM (deapkvm8)
- https://deapkvm8 (192.168.1.17) ATEN IP KVM, works only from the deap06 gateway machine (deapuser, deapuser)
DAQ machines
- deapdaqgw: gateway machine (DHCP for deap00, UPS, CDU, NAT)
- deap00: main daq machine (storage, home directories, central services, etc)
- deap01..05: A3818 daq machines
- deap06.triumf.ca: temporary network gateway
- deap07: spare A3818 daq machine (old deap00) (used for PCIe ADC DAQ)
- deap08: spare deap00 machine
- lxdeap01: VME daq machine
- deapvme01..03: VME crate power supplies
- deapups: DAQ UPS unit
- deapcdu: DAQ power distribution unit
- deapkvm8: 8-port IP KVM
- mscb520: MSCB-ETH bridge
deapups connections
- C13(f) : Switched Load 1 - N/C
- C13(f) : Switched Load 2 - N/C
- C13(f) : Switched Load 3 - N/C
- C13(f) : Unswitched Load 4 - Rack Fan Left
- C13(f) : Unswitched Load 4 - Rack Fan Centre
- C13(f) : Unswitched Load 4 - Rack Fan Right
- C13(f) : Unswitched Load 4 -
- C13(f) : Unswitched Load 4 -
- C19(f) : Unswitched Load 4 - CDU
UPS configuration
Tripp Lite management software
ssh deap00 /var/tripplite/poweralert/console/pal_console.sh
USB connections (to find the UPS product names and serial numbers)
- lsusb -v | grep -i product
- lsusb -v | grep -i serial
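The two grep commands above list products and serials separately. The sketch below pairs them per device, which makes it easier to copy the right serial into the serial = "..." lines of ups.conf. usb_serials is a hypothetical helper. Run lsusb as root, because it may not be able to read all string descriptors otherwise.

```shell
# Hypothetical helper: print "product: serial" for each USB device in
# `lsusb -v` output read from stdin; devices without a serial string are skipped.
usb_serials() {
    awk '
        $1 == "Bus" { product = ""; next }            # start of a new device
        $1 == "iProduct" && NF >= 3 {
            $1 = ""; $2 = ""; sub(/^ +/, ""); product = $0; next
        }
        $1 == "iSerial" && NF >= 3 {
            $1 = ""; $2 = ""; sub(/^ +/, ""); print product ": " $0
        }
    '
}
```

Usage, as root on deap00: lsusb -v 2>/dev/null | usb_serials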
NUT UPS configuration
- http://www.networkupstools.org/
- ssh root@deap00
- (initial software checkout) cd ~; svn checkout svn://anonscm.debian.org/nut/trunk nut
- (update) cd ~/nut; svn update
- (build, install) cd ~/nut; ./autogen.sh; ./configure --prefix=/opt/nut; make -j6 -k; make -k; make -k install
- config file: /opt/nut/etc/ups.conf
[ups1]
    driver = usbhid-ups
    port = auto
    desc = "ups1"
    serial = "2231ELCPS720300082"
[ups2]
    driver = usbhid-ups
    port = auto
    desc = "ups2"
    serial = "2211KW0PS733900093"
[ups3]
    driver = usbhid-ups
    port = auto
    desc = "ups3"
    serial = "2231ELCPS720300090"
- restart drivers: /opt/nut/bin/upsdrvctl start
- reload upsd: /opt/nut/sbin/upsd -c reload
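Once the drivers and upsd are running, each UPS defined in ups.conf can be queried with NUT's upsc client (upsc NAME@HOST VARIABLE). The sketch below assumes that upsd on deap00 listens on localhost. It prints ups.status for each named UPS and returns non-zero if any UPS is not on line power (status flag OL). ups_status is a hypothetical helper, and UPSC can override the upsc path.

```shell
# Hypothetical helper: print ups.status for each named UPS via NUT's upsc
# client and return non-zero if any UPS is not on line power.
# Assumes upsd on deap00 listens on localhost; UPSC may override the path.
ups_status() {
    upsc_cmd=${UPSC:-/opt/nut/bin/upsc}
    rc=0
    for ups in "$@"; do
        # "upsc NAME@HOST VARIABLE" prints the value of a single variable
        status=$("$upsc_cmd" "$ups@localhost" ups.status 2>/dev/null) ||
            status="UNREACHABLE"
        echo "$ups: $status"
        # ups.status is a list of flags such as "OL CHRG" or "OB DISCHRG";
        # OL means the UPS is running from mains power
        case " $status " in
            *" OL "*) ;;
            *) rc=1 ;;
        esac
    done
    return "$rc"
}
```

Usage: ups_status ups1 ups2 ups3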
deapcdu connections
1 : DEAP00
2 : DEAP01
3 : DEAP02
4 : DEAP03
5 : DEAP04
6 : DEAP05
7 : DEAPVME02
8 : DEAP07
------------
9 : DEAP08
10 : SCB1
11 : SCB2
12 : n/c
13 : n/c
14 : DEAPDAQGW (temp)
15 : DEAPMPOD
16 : DEAPKVM8
deapcdu snmp
- snmpwalk -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c public deapcdu sentry3
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.1 i 1 ### turn on outlet 1
- snmpset -v 2c -M +/home/deap/online/slow/fesnmp -m +Sentry3-MIB -c write deapcdu outletControlAction.1.1.3 i 2 ### turn off outlet 3
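The two snmpset lines above differ only in the outlet index and the action value (1 = on, 2 = off). The hypothetical wrapper below builds the same command, and DRY_RUN=1 prints it instead of sending it. The wrapper assumes that every outlet is addressed as outletControlAction.1.1.N, as in the examples. The outlet list is split into two groups of eight, so outlets 9 to 16 may use a different infeed index. Check the snmpwalk output before relying on the wrapper for those outlets.

```shell
# Hypothetical helper: switch a deapcdu outlet on or off via SNMP.
# Mirrors the snmpset examples above: outletControlAction.1.1.N, 1 = on, 2 = off.
# ASSUMPTION: every outlet is indexed as tower 1 / infeed 1 (verify with snmpwalk,
# especially for outlets 9-16). DRY_RUN=1 prints the command instead of running it.
cdu_outlet() {
    outlet=$1
    case "$2" in
        on)  action=1 ;;
        off) action=2 ;;
        *)   echo "usage: cdu_outlet OUTLET on|off" >&2; return 2 ;;
    esac
    case "$outlet" in
        ''|*[!0-9]*) echo "outlet must be a number" >&2; return 2 ;;
    esac
    set -- snmpset -v 2c -M "+${CDU_MIBDIR:-/home/deap/online/slow/fesnmp}" \
        -m +Sentry3-MIB -c write deapcdu \
        "outletControlAction.1.1.$outlet" i "$action"
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 cdu_outlet 3 off prints the "turn off outlet 3" command shown above.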
Network configuration (TRIUMF)
DEAP DAQ machines are on the private network (see below).
The gateway to the TRIUMF network is the 1U machine deap06.triumf.ca, connected to the LADD-NIS cluster (deap account on ladd00).
Services running on the gateway:
- DHCP server for the 192.168.1.x network (/etc/hosts, /etc/dhcp/dhcpd.conf)
- apache SSL/https proxy for MIDAS status page, ELOG, ganglia and nodeinfo (/etc/httpd/conf.d/ssl.conf, /etc/httpd/htpasswd)
- NAT proxy from private network to the TRIUMF network (/etc/rc.local). Makes the internet accessible from deapNN machines.
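This page does not reproduce the NAT setup in the gateway's /etc/rc.local. The following is a minimal sketch of what such rules commonly look like, assuming iptables masquerading. The name of the TRIUMF-facing interface is not documented here, so it is a parameter. gw_nat and run are hypothetical helpers, and DRY_RUN=1 prints the commands instead of executing them (executing them requires root).

```shell
# Sketch of NAT rules like those kept in the gateway's /etc/rc.local.
# ASSUMPTIONS: iptables-based masquerading; the external interface is passed in.
run() {
    if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

gw_nat() {
    [ -n "$1" ] || { echo "usage: gw_nat EXT_IF [PRIVATE_NET]" >&2; return 2; }
    ext_if=$1                       # interface facing the TRIUMF network
    priv_net=${2:-192.168.1.0/24}   # main DEAP private network
    # Let the kernel forward packets between interfaces
    run sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source address of outgoing private traffic to the gateway's own
    run iptables -t nat -A POSTROUTING -s "$priv_net" -o "$ext_if" -j MASQUERADE
}
```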
Network configuration (DEAP)
The DEAP DAQ cluster is configured for standalone running with or without an internet connection.
(NB: Some internet functions are required: access to NTP for time synchronization and access to Linux package repositories to install packages, etc)
Network numbers
Network numbers are assigned by deapdaqgw and deap00 DHCP servers:
192.168.1.x (netmask 255.255.255.0): main private network
192.168.2.x: deap00a-deap01a connection
192.168.3.x: deap00b-deap02b connection
192.168.4.x: deap00c-deap03c connection
192.168.5.x: deap00d-deap04d connection
DHCP configuration
Main DHCP server is running on the gateway machine. It provides IP addresses to deap00 and most other devices.
Additional DHCP server is running on deap00. It provides IP addresses to members of the DEAP-NIS cluster (deap01..deap05).
See the following sections for more details.
DEAP network nodes with statically configured IP addresses:
- deapmod : Wiener MPOD firmware does not support DHCP
Gateway machine
deapdaqgw is the gateway machine that provides internet access to the DEAP DAQ cluster.
- NAT ("network address translation", see /etc/rc.local)
- IP address assignment via /etc/hosts
- DNS via dnsmasq serving contents of /etc/hosts and bridge to upstream DNS (configured in /etc/resolv.conf by upstream DHCP)
- DHCP for all machines except deap01..deap05 via /etc/dhcp/dhcpd.conf. Special DHCP settings:
- "option routers" sets the "default route" through the gateway machine itself
- "option domain-name-servers" sets the DNS server in /etc/resolv.conf to dnsmasq on the gateway machine
- "option ntp-servers" specifies the time servers (it is unclear whether any hosts use them)
- "option domain-name" is not specified, leaving the "domain" and "search" entries of /etc/resolv.conf blank (actually the entries are not there)
- unknown clients are assigned IP addresses in the range 192.168.x.200 through .250.
- MSCB nodes are assigned "infinite" leases to avoid a bug in the MSCB firmware
- remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
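The special settings listed above could look roughly like the following subnet declaration in ISC dhcpd.conf syntax. This is a sketch, not the gateway's actual file. The gateway's private address and the NTP server list are not given on this page, so they are parameters of the hypothetical generator dhcp_subnet. The per-host entries and the MSCB lease settings are left out.

```shell
# Hypothetical generator for a dhcpd.conf subnet declaration with the special
# settings listed above. ASSUMPTIONS: $1 is the gateway's address on the
# 192.168.1.x network and $2 an optional comma-separated NTP server list.
dhcp_subnet() {
    gw=$1
    ntp_line=""
    if [ -n "$2" ]; then
        ntp_line="    option ntp-servers $2;"
    fi
    cat <<EOF
subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers $gw;              # default route through the gateway itself
    option domain-name-servers $gw;  # dnsmasq on the gateway
$ntp_line
    # no "option domain-name": clients get no domain/search lines
    range 192.168.1.200 192.168.1.250;  # addresses for unknown clients
}
EOF
}
```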
deap00 machine
deap00 is the main machine for the DEAP DAQ cluster.
- DHCP for frontend machines, remember to "service dhcpd restart" after editing /etc/dhcp/dhcpd.conf
- NIS master
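Both DHCP servers (the gateway and deap00) need a "service dhcpd restart" after every edit. A syntax error at that point leaves clients without a DHCP server, so it may be worth testing the file first. ISC dhcpd's -t option parses the configuration file (-cf) and exits without serving. The sketch below uses this option. dhcpd_reload is a hypothetical name.

```shell
# Sketch: syntax-check dhcpd.conf with "dhcpd -t" and restart the service
# only if the check passes.
dhcpd_reload() {
    conf=${1:-/etc/dhcp/dhcpd.conf}
    if ! dhcpd -t -cf "$conf"; then
        echo "$conf has errors, dhcpd not restarted" >&2
        return 1
    fi
    service dhcpd restart
}
```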
Network port assignments:
- eth0: main connection to the local network, IP address is assigned by DHCP from the gateway machine
- eth1: not used, reserved for a special link to the data storage machine
- eth2..eth5: Intel 4-port card, ports are numbered from the top, connected to deap01..deap04 in order.
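Read together with the network numbers, the port list implies that deap00's ethN connects to deap0(N-1) on the 192.168.N.x network. The hypothetical helper below encodes that mapping. It assumes, as the page does not state explicitly, that eth2..eth5 carry 192.168.2.x..192.168.5.x in that order. It also assumes that these networks use a /24 netmask like the main network.

```shell
# Hypothetical helper encoding the deap00 point-to-point links:
# eth2..eth5 (top to bottom) go to deap01..deap04, on 192.168.2.x..192.168.5.x.
# ASSUMPTIONS: ethN carries the 192.168.N.x network, with a /24 netmask.
p2p_link() {
    case "$1" in
        deap0[1-4]) n=${1#deap0} ;;
        *) echo "deap00 has no direct link to $1" >&2; return 1 ;;
    esac
    port=$((n + 1))
    echo "deap00 eth$port <-> $1 on 192.168.$port.0/24"
}
```

For example, p2p_link deap03 reports eth4 and 192.168.4.0/24. Under these assumptions, "ip -4 addr show dev eth4" on deap00 should then show an address in that network.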
NIS configuration
Usernames, passwords and hostnames are distributed using NIS:
- domain name: DEAP-NIS
- deap00 is the master server
- there are no secondary servers
- hostnames are distributed using NIS (from deap00:/etc/hosts, MUST MATCH deap06:/etc/hosts!)
- to solve the chicken-and-egg problem, the deap00 IP address has to be listed in each machine's /etc/hosts (MUST MATCH deap06 and deap00 /etc/hosts!). NIS broadcast does not work with SL6.2+, so deap00 also has to be listed in each machine's /etc/yp.conf; in addition, NFS filesystems are mounted before NIS is started.
- NIS also has to be listed before DNS in the "hosts:" entry of /etc/nsswitch.conf
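Checking by hand that the /etc/hosts copies "MUST MATCH" is error-prone. The hypothetical hosts_diff below compares two copies and ignores comments, blank lines, whitespace differences and line order. It does not normalize the order of aliases within a line.

```shell
# Sketch: compare two copies of /etc/hosts (e.g. deap00's and deap06's),
# ignoring comments, blank lines, whitespace differences and line order.
# Returns 0 if they match, 1 (and prints the differences) otherwise.
normalize_hosts() {
    sed -e 's/#.*//' "$1" | awk 'NF { $1 = $1; print }' | sort
}

hosts_diff() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    normalize_hosts "$1" > "$tmp_a"
    normalize_hosts "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    rc=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$rc"
}
```

Usage on deap00: scp deap06:/etc/hosts /tmp/hosts.deap06 && hosts_diff /etc/hosts /tmp/hosts.deap06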
DNS kludge:
- normally DNS would be used to distribute IP addresses and hostnames to the DHCP server, to deap00 and to the other deap machines. But we do not have a private DNS server, and the TRIUMF DNS server has the wrong IP addresses (142.90.x.x) for the deap machines.
- deap06 DHCP is telling all machines to use the TRIUMF DNS server (to resolve internet addresses - google, etc). To avoid confusion between local deap00, etc hostnames and deap00, etc hostnames from TRIUMF, /etc/nsswitch.conf "hosts:" entry has to list "nis" before "dns".
- hopefully the deap00, etc hostnames will be resolved correctly by the SNOlab DNS servers and all this kludging can go away.
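The "nis before dns" requirement is easy to lose when a machine is reinstalled. The hypothetical check below succeeds only if the hosts: entry of nsswitch.conf lists nis ahead of dns. It does not check the matching /etc/yp.conf entry, which for ypbind would be a line of the form "domain DEAP-NIS server deap00".

```shell
# Sketch: succeed only if the "hosts:" line of FILE (default
# /etc/nsswitch.conf) lists "nis" before "dns".
nis_before_dns() {
    file=${1:-/etc/nsswitch.conf}
    awk '
        $1 == "hosts:" {
            found = 1
            for (i = 2; i <= NF; i++) {
                if ($i == "nis" && !dns) nis = 1   # nis seen before any dns
                if ($i == "dns") dns = 1
            }
        }
        END { exit !(found && nis) }
    ' "$file"
}
```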
System monitoring tools:
- ganglia
- triumf_nodeinfo
- Konstantin's ganglia packages (monitor_nfs, ganglia sensors, top, etc). To install/update: yum --disablerepo="*" --enablerepo=konstantin update
- diskscrub
Backups
- backups of Linux images:
- backups of linux images are done to deap00:/data/root/backups using cron job on deap00:/etc/cron.d/backup.lxdaq.cron and deap00:~root/backup.lxdaq
- backups of home directories: NONE
- backups of data disks: NONE
Creating boot disks for deap01..deap05
mirrored 16GB USB Flash disks
See Cloning_raid1_boot_disks.
V7865 single 8GB/16GB USB Flash disks
The V7865 VME processors use single USB flash disks. To create the boot disks, follow the instructions in the "Single 8/16GB USB and 64GB SSD boot disks" section, but clone "lxdeap01" instead of "deap01".
Single 8/16GB USB and 64GB SSD boot disks
- attach the SSD disk to any of the deap01..deap05 machines (SATA + power)
- login as root to that machine
- "fdisk -l" to identify which /dev/sdX disk it is
- cd /data/root/backups
- ./clone.perl ./deap01 /dev/sdX
- check that the script completes successfully and prints "Done. You can remove /dev/sdX and try to boot from it."
- disconnect the disk
- connect the disk to the new machine and try to boot from it
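The clone step overwrites whatever /dev/sdX is given, and deap01..deap05 boot from mirrored disks. A mistyped device name could therefore destroy the running system. The sketch below adds a pre-flight check. It refuses a target that is not a block device, has a mounted partition, or appears as a member of a software RAID array in /proc/mdstat. dev_in_use and check_clone_target are hypothetical helpers. Volumes used through LVM are not detected.

```shell
# Sketch of a pre-flight check before ./clone.perl. MOUNTS and MDSTAT may
# override /proc/mounts and /proc/mdstat (e.g. for testing).
dev_in_use() {
    dev=$1
    name=${dev#/dev/}
    # A partition of the disk is mounted directly (e.g. /dev/sdb1 on /mnt).
    # Prefix match, so /dev/sda also matches /dev/sdaa1: errs on the safe side.
    if awk -v d="$dev" 'index($1, d) == 1 { f = 1 } END { exit !f }' \
        "${MOUNTS:-/proc/mounts}"; then
        return 0
    fi
    # The disk is a member of a software RAID array,
    # e.g. "md0 : active raid1 sdb1[1] sda1[0]" in /proc/mdstat
    mdstat=${MDSTAT:-/proc/mdstat}
    if [ -r "$mdstat" ] && grep -Eq "(^| )${name}[0-9]*\[" "$mdstat"; then
        return 0
    fi
    return 1
}

check_clone_target() {
    if [ ! -b "$1" ]; then
        echo "$1 is not a block device" >&2
        return 1
    fi
    if dev_in_use "$1"; then
        echo "$1 is in use (mounted or part of a RAID array); refusing" >&2
        return 1
    fi
    echo "$1 is not in use; check fdisk -l once more before cloning"
}
```

Usage: check_clone_target /dev/sdX && ./clone.perl ./deap01 /dev/sdX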