SLinstall
Notes
- these instructions are periodically updated to include items needed for older/newer versions of Linux. They are marked like this: (SL4.2+) means Scientific Linux 4.2 and newer; (SL4 is equivalent to FC3). (FC5 only) means Fedora Core 5; etc.
- obsolete items are marked by the "#" sign at the beginning of the line and sometimes have a comment about the reason for removal.
- typically, we do not "upgrade" machines using the Red Hat "upgrade" function. Instead, we save critical files from the old installation and do a "fresh install" from scratch
- starting with RHEL7, the recommended OS is CentOS7 (instead of SL7).
Disk configurations
In the era of SSD storage, 6TB HDDs and $10 USB flash, use these disk configurations:
- single SSD - 120GB min - single partition for "/", no swap partition (create a swap file if swap is needed) - for non-critical machine with no local data storage (OS only)
- dual SSD - 2x120GB min - all partitions mirrored (RAID1), 30GB "/", 30GB swap, rest for /home1 and data - for machines required for beam data taking, with local user directories
- single SSD + 2x4TB or 2x6TB HDD - SSD partition (a) all "/", (b) 30GB "/", 30GB swap, rest "/home1"; HDD partition is mirrored RAID1 "/data" - machine with local user directories and local data storage (complete midas daq server)
- single SSD + 6-8x6TB HDD - same as above, HDD partition as RAID6 "/data", use XFS filesystem - for small storage server machines
For VME processors:
- network boot - VME-CPU#Network_boot - only option for V7648/V7750, do not use for V7805 (no netboot from GigE), optional for V7865/XVB-602
- USB boot - 8GB USB for V7805, 16GB USB for V7865/XVB-602
Preparation
- save /etc, /var, /root, /opt, (if needed: /usr/local, /tftpboot) by rsync to some data disk (/ladd/data0/root)
- check that "/" partition (it will be overwritten) is different from /home1 and /data partitions
- note the MAC addresses of all network interfaces, add them to ladd00 dhcpd.conf to enable PXE boot into the SL "network installer"
- shutdown
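The save-and-check steps above could be scripted roughly as follows. This is a minimal sketch, not an official script: the function names are made up for illustration, the destination is the example path from the text, and the rsync commands are only printed so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical helpers for the "Preparation" steps; nothing here modifies the system.

# Print one rsync command per directory to be saved.
# rsync -a preserves ownership, permissions and timestamps.
backup_commands() {
    dest="$1"; shift
    for dir in "$@"; do
        printf 'rsync -a %s %s/\n' "$dir" "$dest"
    done
}

# Succeed if two paths live on the same filesystem (same device number).
# "/" must NOT share a filesystem with /home1 or /data before reinstalling.
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

backup_commands /ladd/data0/root /etc /var /root /opt

for dir in /home1 /data; do
    if [ -d "$dir" ] && same_filesystem / "$dir"; then
        echo "WARNING: $dir is on the same partition as / and would be overwritten"
    fi
done
```

Add /usr/local and /tftpboot to the directory list if they are needed on the machine.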
Running SL installer
- Start installation of the new system:
- IMPORTANT: if you have WDC "advanced partitioning disks" (4kB sectors), disks have to be repartitioned before use, see special instructions (TBW) (note: use fdisk -H 224 -S 56 /dev/sdx)
- (NOT AVAILABLE ANYMORE) boot from latest "SL5 kickstart" CD from Kelvin Raywood or PXE boot the latest SL installation image. after the system enters graphical mode, one can remove the CD- the installation is running over the network
- boot from ladd00 PXE server - after power up, during BIOS POST, press BIOS "boot selection menu" key (F8, F12, etc). The MAC of the network interface should be listed in the ladd00 dhcpd.conf file. In the PXE boot menu, select SL6x-64 kickstart install.
- linux will boot into the graphical installer
- two questions will be asked: how to partition the disks and the root password. The rest of the installation is automatic.
- to partition the disks, select "Custom partitioning":
- If using a single SSD (30 or 60 GB), use whole disk for "/" partition (no swap partition)
- If using single HD, create 4 primary partitions (see below)
- If using dual HDs (should be same size), create 4 "RAID1" (see below) (DO NOT USE LVM)
- Use these partition sizes:
- "/" - 40GB - md0 or sda1
- swap - 32 GB - md1 or sda2
- "/home1" - 100 GB - md2 or sda3
- "/data" - remaining disk space - md3 or sda4
- if installer asks questions about boot loader, accept default settings
- package installation will proceed automatically
- when finished will ask "press button to reboot"
- boot newly installed system
- if installing without a kickstart, some questions need to be answered:
- Firewall: disabled
- SELinux: disabled
- KDump: disabled
- Date and Time: leave kickstart defaults (should be NTP using TRIUMF time servers)
- Create user: skip - will be handled during post-installation
- The system will reboot again
- after the final reboot, login as root and proceed with post-installation.
Running installer (CentOS7)
The CentOS7/SL7 installer is very different from the SL6 installer. There are some improvements, and there are several quirks:
- the disk management part was completely FUBARed.
- boot loader is now installed to the correct disk (no longer overwrites the usb-installer itself)
- vanilla installer removed all support for NIS and, after the first boot, requires creation of a fake local user. To avoid this, use the usb-installer or a custom kickstart installer (remove package "gnome-initial-setup")
Instructions for using the usb-installer:
- disconnect machine from network
- plug the usb-installer into usb3 port (blue colour)
- reboot machine, select booting from usb (press F8 on ASUS motherboards)
- usb-installer boot menu offers to install CentOS7, go there
- CentOS7 should boot (many messages scroll on screen)
- into graphical mode
- into installer main menu
- all installer options should be "happy" except for the "installation destination"
- go to the "installation destination" menu
- unselect all disks except for the SSD where the OS will be installed
- (MOST IMPORTANT: unselect the USB installer disk!)
- select "I will configure..."
- say "done"
- the "manual partitioning" menu will open
- partition the SSD (good luck figuring out this new menu system).
- recommended is to use 120GB SSD, partition the whole SSD as one large partition ("normal partition" choice), use XFS filesystem (BTRFS is still experimental), no swap. (installer will complain, but accept lack of swap):
- use the "-" button to delete all existing partitions
- select "standard partition"
- click on the "+" button
- in the "Add new partition" dialog, set mount point "/", capacity blank, click "add mount point"
- check capacity (should be full size of SSD), check filesystem type (should be XFS)
- say "done", there will be a warning about absent swap partition, say "done" again.
- in the big useless dialog, say "accept changes"
- should be back to the "installation summary" screen, "installation destination" should be happy now
- after everything is happy, say "begin installation"
- as the installation proceeds, set the password for the root user
- after installation is complete, reboot the machine
- unplug the usb-installer, CentOS7 should boot from SSD into the login screen
- click on "not listed?", login as root (what's with that?!?)
- setup network connection:
- connect the network cable
- go to the gnome "network settings" (icon on top-right of screen)
- select "wired"
- select "add profile..."
- in "Identity", set "name" to "static"
- in "Identity", check that "Connect automatically" and "Make available..." are enabled
- in "IPv4", set "Addresses" to "manual" instead of "dhcp"
- enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19
- say "Add", then close/quit the network settings
- network should be up, ping something
- run: yum update -y
- check new kernel is installed: ls -l /boot
- logout and restart (good luck finding these buttons in the gui!)
- confirm correct linux kernel is selected during boot (-229.20, not the original installer kernel)
- login as root, confirm network is up, proceed with the rest of these instructions
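The static network profile created through the GNOME dialog above can probably also be created from the command line with nmcli. The sketch below only prints the commands. The interface name eth0 and the host address 142.90.100.200 are placeholders (assumptions, not values from these notes), and the "ip4"/"gw4" keywords are the ones nmcli accepts on CentOS7; check "nmcli connection add help" on the machine before relying on them. The helper converts the dotted netmask to the prefix length nmcli expects (255.255.224.0 is /19).

```shell
#!/bin/sh
# Convert a dotted netmask (e.g. 255.255.224.0) to a prefix length (e.g. 19).
# Does not verify that the mask bits are contiguous across octets.
mask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case "$octet" in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "bad netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Print (do not run) the nmcli commands for a static profile named "static".
static_profile_commands() {
    ifname="$1"; addr="$2"; mask="$3"; gw="$4"; dns="$5"
    prefix=$(mask_to_prefix "$mask") || return 1
    echo "nmcli connection add type ethernet con-name static ifname $ifname ip4 $addr/$prefix gw4 $gw"
    echo "nmcli connection modify static ipv4.dns $dns"
    echo "nmcli connection up static"
}

# eth0 and 142.90.100.200 are placeholders; gateway and dns are from the text.
static_profile_commands eth0 142.90.100.200 255.255.224.0 142.90.100.18 142.90.100.19
```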
Configure SSH
(+CentOS7)
- Login from the console
- restore the SSH keys from backup (/etc/ssh/*key*)
- service sshd restart
- ssh into the new machine as root
- ssh root@localhost, ctrl-C
- ### this is done later from Konstantin's git repository - scp root@ladd00:/root/authorized_keys ~root/.ssh/
- (not needed for SL5.5 kickstart) check that /etc/ssh/ssh_config contains "ForwardX11 yes" and "ForwardX11Trusted yes":
echo " ForwardX11 yes" >> /etc/ssh/ssh_config
echo " ForwardX11Trusted yes" >> /etc/ssh/ssh_config
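Appending with ">>" adds a duplicate line every time the step is repeated. A small helper that appends only when the exact line is missing avoids this. This is a sketch: ensure_line is a made-up name, and the demonstration runs on a scratch file rather than on /etc/ssh/ssh_config.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already there.
ensure_line() {
    file="$1"; line="$2"
    # -x: match the whole line, -F: fixed string, --: line may start with "-"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstration on a scratch file; on a real machine the target would be
# /etc/ssh/ssh_config.
demo=$(mktemp)
ensure_line "$demo" " ForwardX11 yes"
ensure_line "$demo" " ForwardX11Trusted yes"
ensure_line "$demo" " ForwardX11 yes"      # repeated call adds nothing
cat "$demo"
rm -f "$demo"
```

The same helper would fit the other "echo ... >> file" steps in these notes, such as the NISTIMEOUT and NETWORKWAIT lines in /etc/sysconfig/network.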
Configure disks, partitions, raid arrays and filesystems
NOTE1: For compatibility with the SL6 installer, use "fdisk -u" when creating new partitions.
NOTE2a: For 2TB disks or bigger, use "gdisk" to create GPT partitions (yum install epel-release; yum install gdisk)
NOTE2c: (SL6) 3TB, 4TB, 6TB disks do not require anything special - proceed with installation as normal.
Typical disk configuration for DAQ use has 2 large disks with system ("/"), swap, home and data partitions, fully mirrored across the 2 disks using RAID1 software raid (MD).
In this fully mirrored configuration, a DAQ system will continue to operate without interruption and without performance degradation when there is a full or partial failure of either of the two disks.
If disks are hot-swappable, the failed or defective disk can then be physically replaced by a spare, the spare disk can be partitioned and added to the RAID1 array, restoring full normal operation, without shutting down or rebooting the system or interrupting data taking. (Since SATA, eSATA and USB are always electrically hot-swappable, disk hot-replacement is more of a mechanical issue).
For small disks using traditional partitions (<=2TB) a typical layout looks like this:
[root@ladd06 ~]# fdisk -l ### use "fdisk -lu" instead!!!
Disk /dev/sdb: 750.2 GB, 750156374016 bytes
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1        5100    40960000   fd  Linux raid autodetect
/dev/sdb2            5100        9179    32768000   fd  Linux raid autodetect
/dev/sdb3            9179       21927   102399603+  fd  Linux raid autodetect
/dev/sdb4           21928       91201   556443405   fd  Linux raid autodetect
Disk /dev/sda: 750.2 GB, 750156374016 bytes
...
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        5100    40960000   fd  Linux raid autodetect
/dev/sda2            5100        9179    32768000   fd  Linux raid autodetect
/dev/sda3            9179       21927   102399603+  fd  Linux raid autodetect
/dev/sda4           21928       91201   556443405   fd  Linux raid autodetect
...
[root@ladd06 ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb4[1] sda4[0]
      556442245 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk
md2 : active raid1 sdb3[1] sda3[0]
      102398507 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
md1 : active raid1 sda2[0] sdb2[1]
      32766908 blocks super 1.1 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk
md0 : active raid1 sda1[0] sdb1[1]
      40959928 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
...
[root@ladd06 ~]# df -kl
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/md0        40316208 6222676  32045536  17% /
/dev/md2       100790232  192116  95478192   1% /home1
/dev/md3       547709948  202404 519685432   1% /data6
...
[root@ladd06 ~]# swapon -s
Filename   Type       Size      Used  Priority
/dev/md1   partition  32766900  0     -1
Typical size of partitions:
- /dev/md0 : "/" : 40 Gbytes should be sufficient. SL5 fits into an 8GB "/" and SL6 fits into a 16GB "/".
- /dev/md1 : swap : 32 Gbytes. Additional swap space can be added using a swap file located on the data disk.
- /dev/md2 : "/home1" : 100 Gbytes. User home directories backed up by the amanda site backup system. Space is limited by the capacity and capability of the backup and archiving system used to protect user data against accidental file deletion, filesystem corruption and disastrous system failures.
- /dev/md3 : "/data" : data partition uses the remaining space on the disks.
Usually, the "/" and swap partitions are created through the SL installer program. The /home and /data partitions can be created at the same time.
Otherwise, for traditional partitions (disks <2TB) follow these instructions:
- create the partitions using fdisk or similar (this example creates a 60 GB partition):
- fdisk -cu /dev/sda
- Command (m for help): n
- Command action ... p
- Partition number ... 2, 3 or 4 according to what has been defined before
- First cylinder ... default
- Last cylinder ... +60000M or default
- Command action ... t
- Partition number ... : 2, 3 or 4 according to what has been defined before
- Hex code ... : fd
- Command action ... p to check all is correct
- Command (m for help): w
- fdisk /dev/sdb and repeat as above
- Reboot the machine
For GPT partitions (disks >=2TB), do this:
- install gdisk: yum install epel-release; yum install gdisk
- gdisk /dev/sdX
- if this is a new disk, do "o" to create a blank partition table
- "n" to create new partition:
- accept default for partition number
- accept default for first sector
- for last sector, say "+40G" to create a 40 Gbyte partition, or accept the default to use all remaining disk space
- for partition type, say "fd00" to create an mdadm raid partition
- "p" to print the partition table
- "d" to delete wrong partition
- "w" to save and exit
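The interactive gdisk session above has a scriptable counterpart, sgdisk, which on EL systems is normally shipped in the same gdisk package (worth confirming with "rpm -ql gdisk"). The sketch below prints the sgdisk commands for the typical 40G / 32G / 100G / remainder layout, all typed fd00 (Linux RAID). The function name and the /dev/sdX placeholder are illustrative; the commands are printed, not executed, because "sgdisk -o" destroys the existing partition table.

```shell
#!/bin/sh
# Print (do not run) sgdisk commands for the typical 4-partition RAID layout.
gpt_layout_commands() {
    disk="$1"
    echo "sgdisk -o $disk"                  # new blank GPT, like "o" in gdisk
    num=1
    for size in +40G +32G +100G 0; do       # end of 0 = use all remaining space
        # -n number:start:end (start 0 = first free sector), -t number:typecode
        echo "sgdisk -n $num:0:$size -t $num:fd00 $disk"
        num=$((num + 1))
    done
    echo "sgdisk -p $disk"                  # print the resulting table
}

gpt_layout_commands /dev/sdX
```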
Typical GPT layout:
[root@isdaq01 ~]# gdisk -l /dev/sdh
GPT fdisk (gdisk) version 0.8.10
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdh: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): D4FCDE83-12BD-4118-ACA2-702F0E2E57C2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048        83888127   40.0 GiB    FD00  Linux RAID
   2        83888128       150996991   32.0 GiB    FD00  Linux RAID
   3       150996992       360712191   100.0 GiB   FD00  Linux RAID
   4       360712192      3907029134   1.7 TiB     FD00  Linux RAID
[root@isdaq01 ~]#
- Check the newly created partitions: fdisk -lu /dev/sda; fdisk -lu /dev/sdb
- mdadm --create /dev/md2 --metadata=1.0 --bitmap=internal -l 1 -n 2 /dev/sda3 /dev/sdb3
- Check the progress of building the RAID with: more /proc/mdstat
- When finished: mkfs -t ext4 /dev/md2; tune2fs -i 0 -c 0 /dev/md2
- mkdir /home1
- Add to /etc/fstab: "/dev/md2 /home1 ext4 defaults 1 2"
- Finally mount this new partition: mount -a
- Repeat from "mkfs" for each of the data partitions
- At this point you should have these disk partitions (single-disk in parenthesis)
- /dev/md0 (/dev/sda1, sdb1) is the system partition, 40 GBytes or more
- /dev/md1 (/dev/sda2, sdb2) is the swap partition, 32 GBytes or more
- /dev/md2 (/dev/sda3, sdb3) is the /home1 partition, 100 GBytes or more
- /dev/md3 (/dev/sda4, sdb4) is the data partition
- Add array descriptions to /etc/mdadm.conf:
- mdadm -Ds >> /etc/mdadm.conf
- emacs -nw /etc/mdadm.conf ### remove duplicate entries
Example /etc/mdadm.conf:
MAILADDR root
AUTO +imsm +1.x -all
ARRAY /dev/md0 metadata=1.0 name=isdaq01.triumf.ca:0 UUID=055f0455:18401f41:b12abf53:2b23eca0
ARRAY /dev/md1 metadata=1.0 name=isdaq01.triumf.ca:1 UUID=dde05275:17961aaf:7c864e3a:c51477d6
ARRAY /dev/md2 metadata=1.0 name=isdaq01.triumf.ca:2 UUID=e430ba44:361f1807:41f0c491:53c10438
ARRAY /dev/md3 metadata=1.0 name=isdaq01.triumf.ca:3 UUID=a34d8c5b:cb65a435:be8ee01d:7f988927
- (SL5.5 or newer) enable raid1 bitmap files, for each /dev/mdX device: mdadm --grow --bitmap=internal /dev/mdX
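The per-array steps in this section (create the RAID1 array, wait for the sync, make the filesystem, add the fstab entry) repeat for every data partition, so a small generator may help avoid typing mistakes. A sketch under assumptions: the function names are invented, the commands are printed rather than executed, and raid_busy only looks for the words "resync" or "recovery" in an mdstat-style file.

```shell
#!/bin/sh
# Succeed while any array in an mdstat-style file is still resyncing or recovering.
raid_busy() {
    grep -Eq 'resync|recovery' "${1:-/proc/mdstat}"
}

# Print the fstab entry for an ext4 filesystem on an md device.
fstab_line() {
    printf '%s %s ext4 defaults 1 2\n' "$1" "$2"
}

# Print (do not run) the commands for one mirrored partition.
mirror_commands() {
    md="$1"; part_a="$2"; part_b="$3"; mountpoint="$4"
    echo "mdadm --create $md --metadata=1.0 --bitmap=internal -l 1 -n 2 $part_a $part_b"
    echo "mkfs -t ext4 $md"
    echo "tune2fs -i 0 -c 0 $md"
    echo "mkdir -p $mountpoint"
    echo "echo '$(fstab_line "$md" "$mountpoint")' >> /etc/fstab"
}

mirror_commands /dev/md2 /dev/sda3 /dev/sdb3 /home1
```

On a live system, "while raid_busy; do sleep 60; done" would wait for /proc/mdstat to show no resync before running mkfs.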
Restore data from backups
- (on midm15/midm9b/midm20 only) install correct ethernet driver eepro100 not e100
- restore /home (non-NIS) or /home1 (NIS) and other required user directories from backup. (Can use /triumfcs/trshare/midas/Disks/rsync_back.csh ).
- if needed, for non-NIS only, make a softlink for /home1: ln -s /home /home1
- restore users accounts (non-NIS and NIS master only): edit /etc/passwd and /etc/shadow, append users' login info to the end of these files from the backup versions.
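Instead of copying account lines by hand, the entries that exist in the backup but not in the freshly installed file can be listed with awk. A minimal sketch: missing_entries is a made-up name, "alice" is a hypothetical user, and the demonstration uses scratch files. Note that an empty "current" file yields no output, because awk then cannot tell the two files apart.

```shell
#!/bin/sh
# Print the entries of a backup passwd-style file whose login names are
# absent from the current file. Works the same way for /etc/shadow.
missing_entries() {
    current="$1"; backup="$2"
    awk -F: 'NR == FNR { seen[$1] = 1; next } !($1 in seen)' "$current" "$backup"
}

# Demonstration with scratch files ("alice" is a hypothetical user).
cur=$(mktemp); bak=$(mktemp)
printf 'root:x:0:0:root:/root:/bin/bash\n' > "$cur"
printf 'root:x:0:0:root:/root:/bin/bash\nalice:x:1001:1001::/home1/alice:/bin/bash\n' > "$bak"
missing_entries "$cur" "$bak"      # prints only the alice line
rm -f "$cur" "$bak"
```

Write the output to a scratch file and review it before appending it to /etc/passwd or /etc/shadow, since the backup may also contain stale system accounts.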
Post installation
- echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca" >> ~root/.forward
- emacs -nw /etc/sysconfig/network
- set "HOSTNAME=" (set it to blank to use hostname from DHCP)
- set "NETWORKWAIT=yes"
- (not needed for SL6.1, NEEDED for SL6->6.1 update) in /etc/hosts, remove extraneous entries - only entries for localhost and localhost6 should remain
- disable selinux: edit /etc/sysconfig/selinux, change line to read: SELINUX=disabled, reboot later for change to take effect
- chmod a+r /var/log/messages
- chmod a+r /var/log/yum.log
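The SELinux edit can be done with sed instead of an editor. One detail to watch: /etc/sysconfig/selinux is normally a symlink to /etc/selinux/config, and a plain "sed -i" replaces the symlink with a regular file, so the real config is left unchanged. GNU sed's --follow-symlinks option avoids that. A sketch, with a made-up function name, demonstrated on a scratch file:

```shell
#!/bin/sh
# Set SELINUX=disabled in the given config file, editing the symlink target
# rather than replacing the symlink itself.
disable_selinux() {
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}

# Demonstration on a scratch file; the real target is /etc/sysconfig/selinux.
demo=$(mktemp)
printf '# comment\nSELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$demo"
disable_selinux "$demo"
cat "$demo"
rm -f "$demo"
```

The SELINUXTYPE line is left alone because the pattern requires "=" directly after SELINUX.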
Post installation CentOS7
CentOS 7.1 default installer will be stuck at the "create local user" screen. To proceed without creating fake local users, do:
yum erase gnome-initial-setup
killall Xorg
Set hostname: (use full name, i.e. daq11.triumf.ca)
emacs -nw /etc/hostname
echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca" >> ~root/.forward
chmod a+r /var/log/messages
chmod a+r /var/log/yum.log
Activate rc.local:
chmod a+x /etc/rc.local
systemctl start rc-local
systemctl status rc-local
Disable "persistent network names" (DO NOT DO THIS)
/bin/touch /etc/udev/rules.d/75-persistent-net-generator.rules
/bin/rm /etc/udev/rules.d/70-persistent-net.rules
#shutdown -r now
Configure NIS master (OPTIONAL)
(do not use SL6.2 for NIS master)
- yum install ypserv
- domainname DEAP-NIS
- cd /var/yp
- edit Makefile
- change NOPUSH=false
- change the "all:" entry to read: all: passwd group netgrp shadow auto.master auto.home auto.local ypservers
- touch /etc/netgroup /etc/auto.home /etc/auto.local ./ypservers
- make
- inspect created NIS maps: ls -l DEAP-NIS
- chkconfig ypserv on
- chkconfig ypxfrd on
- chkconfig yppasswdd on
- service ypserv start
Configure NIS client
- run "authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --update"
- if NIS server is SL6.2, add "--nisserver=ladd00" to above command
- (not needed with --enablepreferdns above) run "sed 's/^hosts:.*/hosts: files dns/' -i /etc/nsswitch.conf" (to undo a mistake from authconfig)
- On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
- Use "system-config-users" to add local user accounts
- NIS: check user accounts: run "ypcat -k passwd"
- echo "NISTIMEOUT=5" >> /etc/sysconfig/network
- echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
Configure NIS client (CentOS7)
yum -y install ypbind authconfig
echo "NISTIMEOUT=5" >> /etc/sysconfig/network
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --nisserver ladd00.triumf.ca --update
ypwhich
ypcat -k passwd
systemctl restart autofs
- On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
- Use "system-config-users" to add local user accounts
- enable selinux ssh key login to nfs mounted home directories:
setsebool -P use_nfs_home_dirs 1
Configure NIS secondary server (OPTIONAL)
(do this only if needed)
(do not use SL6.2 for NIS secondary server!)
- yum install ypserv
- ypwhich -m # to identify hostname of nis master for next step:
- /usr/lib64/yp/ypinit -s ladd00 # /usr/lib/yp/ypinit on 32-bit machines
- chkconfig ypserv on
- service ypserv start
- edit /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
- service ypbind restart
- ypwhich should report "localhost", ypcat auto.master should work
- on the NIS master:
- add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
- if using /var/yp/securenets, copy it from NIS master to new NIS secondary server
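The /etc/yp.conf edit described above (point the "domain ... server ..." line at localhost) can be done with sed. A sketch only: the function name is invented and the demonstration works on a scratch file with the LADD-NIS domain used elsewhere in these notes.

```shell
#!/bin/sh
# Rewrite 'domain XXX server YYY' lines to read 'domain XXX server localhost'.
point_ypbind_at_localhost() {
    sed -i -r 's/^(domain[[:space:]]+[^[:space:]]+[[:space:]]+server)[[:space:]]+.*/\1 localhost/' "$1"
}

# Demonstration on a scratch file; the real target is /etc/yp.conf.
demo=$(mktemp)
printf '# generated by authconfig\ndomain LADD-NIS server ladd00.triumf.ca\n' > "$demo"
point_ypbind_at_localhost "$demo"
cat "$demo"
rm -f "$demo"
```

After the real file is changed, restart ypbind and check that ypwhich reports "localhost", as in the steps above.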
Configure NIS secondary server (CentOS7) (OPTIONAL)
Enable local NIS server, make local machine use it:
yum install ypserv
/usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines)
systemctl enable rpcbind ypserv ypxfrd yppasswdd
systemctl start rpcbind ypserv ypxfrd yppasswdd
edit /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
systemctl restart ypbind
ypwhich # should say "localhost"
Punch hole in the firewall: (or "make" on NIS master will complain)
echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network
systemctl restart ypserv
firewall-cmd --get-services
firewall-cmd --add-service rpc-bind --permanent
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent
firewall-cmd --reload
firewall-cmd --list-all
- on the NIS master:
- add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
- if using /var/yp/securenets, copy it from NIS master to new NIS secondary server
Configure AUTOFS
- (if NIS master or standalone) check /etc/auto.* against backups, particularly auto.master if NIS master
- (if needed) add "+auto.master" at the end of /etc/auto.master
- restart autofs to use the newly configured NIS maps: "service autofs stop; service autofs start"
Configure AUTOFS (CentOS7)
yum -y install autofs
systemctl enable autofs
systemctl start autofs
ls -l /daq/daqshare
Configure time with chronyd (SL6)
Use chronyd instead of ntpd.
yum -y install chrony
echo server time1 iburst >> /etc/chrony.conf
echo server time2 iburst >> /etc/chrony.conf
echo server time3 iburst >> /etc/chrony.conf
chkconfig --level 123456 ntpd off
chkconfig --level 123456 ntpdate off
service ntpd stop
chkconfig chronyd on
service chronyd restart
chronyc sources
chronyc tracking
- if desired, edit /etc/chrony.conf, remove non-triumf time servers
Configure time (CentOS7)
Time server ntpd was replaced by chronyd.
yum -y install chrony
echo server time1 iburst >> /etc/chrony.conf
echo server time2 iburst >> /etc/chrony.conf
echo server time3 iburst >> /etc/chrony.conf
systemctl enable chronyd
systemctl restart chronyd
chronyc sources
chronyc tracking
- if desired, edit /etc/chrony.conf, remove non-triumf time servers
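Removing the non-TRIUMF time servers can be done by commenting them out rather than deleting them, which keeps the original lines for reference. A sketch, assuming the wanted servers are exactly time1, time2 and time3 as in the commands above; the function name is made up and the demonstration uses a scratch file.

```shell
#!/bin/sh
# Comment out every "server"/"pool" line except those for time1, time2, time3.
keep_only_local_servers() {
    sed -i -r '/^server[[:space:]]+(time1|time2|time3)([[:space:]]|$)/!s/^(server|pool)[[:space:]]/#&/' "$1"
}

# Demonstration on a scratch file; the real target is /etc/chrony.conf.
demo=$(mktemp)
printf 'server 0.centos.pool.ntp.org iburst\nserver time1 iburst\ndriftfile /var/lib/chrony/drift\n' > "$demo"
keep_only_local_servers "$demo"
cat "$demo"
rm -f "$demo"
```

Restart chronyd afterwards and confirm with "chronyc sources" that only the intended servers are listed.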
Enable automatic kernel updates (SL6)
- enable kernel updates: sed 's/^EXCLUDE=/#EXCLUDE=/' -i /etc/sysconfig/yum-autoupdate
Enable automatic system updates (CentOS7)
Disable yum-cron:
rpm --erase yum-cron
/bin/rm -v /var/lock/subsys/yum-cron
/bin/rm -v /etc/cron.daily/0yum-daily.cron
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron
Enable yum-autoupdate:
yum install -y epel-release
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm
systemctl enable yum-autoupdate
systemctl start yum-autoupdate
systemctl status yum-autoupdate
Configure system services
- chkconfig --list | grep :on | sort (to see enabled services)
- disable unwanted services:
(only if amanda is not used) -> chkconfig --level 12345 xinetd off
chkconfig --level 12345 canna off
chkconfig --level 12345 FreeWnn off
chkconfig --level 12345 hpoj off
chkconfig --level 12345 ip6tables off
chkconfig --level 12345 iptables off
chkconfig --level 12345 isdn off
chkconfig --level 12345 pcmcia off
chkconfig --level 12345 rhnsd off
chkconfig --level 12345 spamassassin off
chkconfig --level 12345 bluetooth off
chkconfig --level 12345 apmd off
chkconfig --level 12345 iiim off
chkconfig --level 12345 fenced off
chkconfig --level 12345 ccsd off
chkconfig --level 12345 cpuspeed off
chkconfig --level 12345 pcp off
chkconfig --level 12345 pmie off
chkconfig --level 12345 yum-updatesd off
chkconfig --level 12345 clvmd off
chkconfig --level 12345 cman off
chkconfig --level 12345 lvm2-monitor off
chkconfig --level 12345 modclusterd off
chkconfig --level 12345 yum-updateonboot off
chkconfig --level 12345 cmirror off
chkconfig --level 12345 lock_gulmd off
chkconfig --level 12345 firstboot off
chkconfig --level 12345 ricci off
chkconfig --level 12345 gfs off
chkconfig --level 12345 scsi_reserve off
chkconfig --level 12345 openibd off
chkconfig --level 12345 arptables_jf off
chkconfig --level 12345 auditd off
chkconfig --level 12345 avahi-daemon off
chkconfig --level 12345 hplip off
chkconfig --level 12345 iscsi off
chkconfig --level 12345 iscsid off
chkconfig --level 12345 mcstrans off
chkconfig --level 12345 pcscd off
chkconfig --level 12345 restorecond off
chkconfig --level 12345 setroubleshoot off
chkconfig --level 12345 xend off
chkconfig --level 12345 xendomains off
chkconfig --level 12345 kudzu off
#chkconfig --level 12345 yum-cron off
chkconfig --level 12345 kdump off
chkconfig --level 12345 libvirt-guests off
chkconfig --level 12345 libvirtd off
chkconfig --level 12345 spice-vdagentd off
chkconfig --level 12345 ksm off
chkconfig --level 12345 ksmtuned off
chkconfig --level 12345 iscsi off
chkconfig --level 12345 iscsid off
chkconfig --level 12345 openct off
chkconfig --level 12345 blk-availability off
chkconfig --level 12345 fcoe off
chkconfig --level 12345 lldpad off
Configure system services (CentOS7)
- systemctl list-unit-files | grep enabled | sort ### (to see enabled services)
- disable unwanted services:
systemctl disable bluetooth
systemctl disable dm-event
systemctl disable dmraid-activation
systemctl disable iscsid
systemctl disable iscsi
systemctl disable iscsiuio
systemctl disable libvirtd
systemctl disable lvm2-lvmetad
systemctl disable lvm2-monitor
systemctl disable ModemManager
systemctl disable multipathd
systemctl disable netcf-transaction
#systemctl disable
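Not every machine has every one of these services installed, so it can be cleaner to disable only the ones that are actually enabled. The sketch below reads a saved listing (for example from "systemctl list-unit-files --type=service --no-legend > listing.txt") and prints a disable command for each unwanted service found enabled in it. The function name and the scratch listing are illustrative; only NAME.service units are matched, so socket units would need separate handling.

```shell
#!/bin/sh
# Print "systemctl disable NAME" for each unwanted service that appears as
# enabled in a saved "systemctl list-unit-files" listing.
disable_commands() {
    listing="$1"; shift
    for svc in "$@"; do
        if awk -v unit="$svc.service" \
               '$1 == unit && $2 == "enabled" { found = 1 } END { exit !found }' \
               "$listing"; then
            echo "systemctl disable $svc"
        fi
    done
}

# Demonstration with a scratch listing.
listing=$(mktemp)
printf 'bluetooth.service enabled\nlibvirtd.service disabled\nsshd.service enabled\n' > "$listing"
disable_commands "$listing" bluetooth libvirtd ModemManager   # prints only the bluetooth line
rm -f "$listing"
```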
Erase unwanted packages
yum erase PackageKit # bugs users about security updates
Erase unwanted packages (CentOS7)
- PackageKit # bugs users about security updates, hogs yum lock
- perl-homedir # creates unwanted $HOME/perl5
- ModemManager # thinks that all USB-attached devices are modems
- pcp # sends error email to itself, does not work
- abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted
- rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken
- bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful.
yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion
Configure external package repositories
yum install elrepo-release epel-release
Configure external package repositories (CentOS7)
EPEL: (additional packages)
yum install epel-release
ELREPO: (kernel drivers)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum -y install yum-plugin-fastestmirror
Install packages needed to continue with installation
(+CentOS7)
(these packages are sometimes missing; they are needed to follow the instructions below)
(SL6.5: libotf is a dependency of emacs - SL6.5 installer fails to install it)
yum install ed patch wget git libotf gdisk emacs
Configure TRIUMF packages
(only for machines on the TRIUMF network)
(TRIUMF kickstart usually installs this automatically)
rpm -vh --install http://mirror.triumf.ca/triumf/6/x86_64/Packages/triumf-release-1.4-1.noarch.rpm
yum install triumf-ssh triumf-syslog
Configure TRIUMF packages (CentOS7)
(only for machines on the TRIUMF network)
# TL Was rpm -vh --install http://mirror.triumf.ca/triumf/6/x86_64/RPMS/triumf-release-1.4-1.noarch.rpm
rpm -vh --install http://mirror.triumf.ca/triumf/6/x86_64/Packages/triumf-release-1.4-1.noarch.rpm
yum install triumf-ssh triumf-syslog
Configure Konstantin's scripts
(+CentOS7)
mkdir ~root/git
cd ~root/git
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull
Configure TRIUMF mirror of yum repositories (SL6)
(only for machines on TRIUMF network)
- if /daq/mirror is available: /bin/cp ~/git/scripts/etc/daq-mirror-SL6.repo /etc/yum.repos.d/
- if /triumfcs/mirror is available: /bin/cp ~/git/scripts/etc/triumfcs-mirror-SL6.repo /etc/yum.repos.d/
- otherwise: /bin/cp ~/git/scripts/etc/triumf-SL6.repo /etc/yum.repos.d/
then disable external repositories:
yum clean all
yum-config-manager --disable epel
yum-config-manager --disable elrepo
yum-config-manager --disable sl
yum-config-manager --disable sl-security
yum-config-manager --disable sl6x
yum-config-manager --disable sl6x-security
yum clean all
Configure trusted ssh keys
(+CentOS7)
ssh localhost ### interrupt by Ctrl-C
/bin/cp ~/git/scripts/etc/authorized_keys ~/.ssh/
Configure hardware sensors
- yum install lm_sensors kmod-k10temp kmod-coretemp
- sensors-detect (accept default answer to all questions - press ENTER)
- service lm_sensors restart (to reload the kernel modules)
- sensors (to see available sensors)
If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page.
Configure coretemp CPU sensors
On some machines, the coretemp driver for Intel CPU temperature sensors is not loaded after the above steps.
- sensors | grep coretemp ### number of sensors reported should be the same as the number of CPU cores
- if output is blank, add this to /etc/rc.local
emacs -nw /etc/rc.local
modprobe coretemp
Configure IPMI sensors
Some machines support the IPMI interface for monitoring the hardware: fan speeds, temperatures, voltages.
- find out if IPMI is supported. Try this:
dmidecode | grep -i ipmi
if the output is not blank, IPMI may be supported.
- install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
service ipmi start
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
chkconfig ipmi on
chkconfig ipmievd on
service ipmi restart
service ipmievd restart
tail -100 /var/log/messages ### look at messages logged by ipmievd
- (CentOS7) install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
systemctl start ipmi
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
systemctl list-unit-files | grep -i ipmi
systemctl enable ipmi
systemctl restart ipmi
systemctl status ipmi
systemctl enable ipmievd
systemctl restart ipmievd
systemctl status ipmievd
tail -100 /var/log/messages ### look at messages logged by ipmievd
- if ipmievd complains about SEL buffer overflow, clear it manually:
ipmitool sel list ### show ipmi messages in raw format
ipmitool sel elist ### show ipmi messages in useful format
ipmitool sel elist > file ### save ipmi messages into a file
ipmitool sel clear ### clear all accumulated ipmi messages
- useful ipmi commands:
- ipmitool sensor -- read hardware sensors
- ipmitool sel elist -- report all accumulated messages
Configure SMARTD (CentOS7)
Default el7 smartd config files send deficient email notices about disk failures. Overwrite.
/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/
systemctl restart smartd
systemctl status smartd
Enable User Disk Quotas (OPTIONAL)
(+CentOS7)
- read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html
- emacs -nw /etc/fstab, add "grpquota,usrquota" to filesystem options, e.g.:
[root@isdaq00 home1]# grep quota /etc/fstab
UUID=5a2aefbd-45db-475e-841e-12ec89220fbd /home1 ext4 defaults,grpquota,usrquota 1 2
- cd /; umount /home1; mount /home1
- quotacheck -cug /home1
- quotacheck -avug
- quotaon -av
- quota system is now active
- increase the soft quota time limit from default 7days to 30 or 60 days: edquota -t
- set quotas for all users (see below)
- setup warnquota:
- create warnquota config file: emacs -nw /etc/warnquota.conf
# values can be quoted:
MAIL_CMD = "/usr/sbin/sendmail -t"
FROM = root
SUBJECT = User %i@%h exceeded allocated disk quota
CC_TO = "root"
# If you set this variable CC will be used only when user has less than
# specified grace time left (examples of possible times: 5 seconds, 1 minute,
# 12 hours, 5 days)
# CC_BEFORE = 2 days
SUPPORT = "root"
# Text in the beginning of the mail (if not specified, default text is used)
# This way text can be split to more lines
# Line breaks are done by '|' character
# The expressions %i, %h, %d, and %% are substituted for user/group name,
# host name, domain name, and '%' respectively. For backward compatibility
# %s behaves as %i but is deprecated.
MESSAGE = User "%i" on "%h" has exceeded the allocated disk quota.||Please delete any unnecessary files on following filesystems or|contact the system administrator to increase your quota allocation:|
SIGNATURE = --|automated email from warnquota
- note that %i@%h in the SUBJECT line does not seem to work
- create cron job: emacs -nw /etc/cron.daily/warnquota
#!/bin/sh
warnquota
#end
- chmod a+x /etc/cron.daily/warnquota
- touch /etc/crontab
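Before running quotacheck it can help to confirm that the fstab entry really carries both quota options. A minimal sketch, assuming the usual 6-column fstab layout; the function name has_quota_opts is hypothetical and not part of any package:

```shell
# Hypothetical helper: succeed if the fstab entry for the given mount point
# has both usrquota and grpquota in its options field (4th column).
has_quota_opts() {
    fstab="$1"; mnt="$2"
    awk -v m="$mnt" '
        $1 !~ /^#/ && $2 == m {
            if ($4 ~ /(^|,)usrquota(,|$)/ && $4 ~ /(^|,)grpquota(,|$)/) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$fstab"
}
```

Example use: has_quota_opts /etc/fstab /home1 && echo "quota options present"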
Useful commands for managing quotas:
- repquota -a | sort -n -k3 ### show quota of all users sorted by disk usage
- edquota -u username ### open "vi" editor to change user quotas
- repquota -a | grep username ### report quota for given user
- setquota -u username 0 0 0 0 /home1 ### disable quotas for given user
- setquota -u username 50000000 100000000 0 0 /home1 ### set quotas for 50GB soft and 100GB hard
- edquota -t ### change user quota time limits
- edquota -tg ### change group quota time limits
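The block limits given to setquota are counted in 1 KiB blocks, so the round numbers in the example above (50000000 and 100000000) are decimal approximations of 50GB and 100GB. A small sketch for computing exact GiB-based limits; the function names gib_to_blocks and print_setquota are hypothetical, and the second one only prints the command instead of running it:

```shell
# Convert a size in GiB to the number of 1 KiB quota blocks.
gib_to_blocks() {
    echo $(( $1 * 1024 * 1024 ))
}

# Print (do not run) the setquota command for a user,
# with soft and hard block limits given in GiB and no inode limits.
print_setquota() {
    user="$1"; soft_gib="$2"; hard_gib="$3"; fs="$4"
    echo "setquota -u $user $(gib_to_blocks "$soft_gib") $(gib_to_blocks "$hard_gib") 0 0 $fs"
}
```

Example use: print_setquota username 50 100 /home1, then review the printed command and run it as root.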
Enable NFS V4 server (CentOS7)
- create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...)
/home1 @home_export(rw,no_root_squash,async,fsid=1)
/data1 @data_export(rw,no_root_squash,async,fsid=2)
- check the netgroup file
- if using NIS: check NIS netgroup: ypcat -k netgroup
- if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
- if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
- enable things, start them:
firewall-cmd --get-services
firewall-cmd --permanent --add-service=nfs
firewall-cmd --reload
firewall-cmd --list-all
systemctl enable nfs-server
systemctl start nfs-server
systemctl status nfs
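The requirement that fsid numbers in /etc/exports be unique can be checked mechanically. A minimal sketch, assuming numeric fsid= values as in the example; the function name dup_fsids is hypothetical:

```shell
# Print every fsid=N value that appears more than once in an exports file.
# Comment lines are ignored. No output means all fsid values are unique.
dup_fsids() {
    grep -v '^[[:space:]]*#' "$1" | grep -o 'fsid=[0-9][0-9]*' | sort | uniq -d
}
```

Example use: dup_fsids /etc/exports (fix the file if anything is printed, then run exportfs -rv).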
Enable NFS V3 server (CentOS7)
ps -efw | grep rpc.mountd # should be running!
firewall-cmd --get-services
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
firewall-cmd --list-all
Enable NFS V3 server
- edit /etc/hosts.allow, add or uncomment "mountd: 142.90.0.0/255.255.0.0"
- create /etc/exports. example:
/home1 @home_export(rw,no_root_squash,async)
/data1 @data_export(rw,no_root_squash,async)
- check the netgroup file
- if using NIS: check NIS netgroup: ypcat -k netgroup
- if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
- if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
- chkconfig nfs on
- chkconfig nfslock on
- service nfs restart
Then on ladd00 need to do
- ssh to root@ladd00
- edit /etc/auto.daq to add new machine...
- make -C /var/yp
Enable NFS V4 SERVER (SL6)
- if used with NIS, same as NFSv3
- if used as standalone, need to edit idmapd.conf - set the "Domain" name to the same value on NFS server and NFS slave (default automagically determined value does not always work). More TBW.
Enable AMANDA backups
AMANDA backups are already enabled by TRIUMF kickstart installs. For non-kickstart installation, follow instructions at [http://amanda/~amanda], or look at "/triumfcs/trshare/olchansk/linux/amanda/amanda-enable.perl". As final step, use [https://helpdesk.triumf.ca] to contact TRIUMF CS to add this new machine to the amanda backup list.
- yum install triumf-amanda
Enable AMANDA backups (CentOS7)
yum install amanda-client
systemctl list-unit-files | grep -i amanda
#systemctl enable amanda
systemctl enable amanda.socket
systemctl enable amanda-udp.socket
systemctl restart amanda.socket
systemctl restart amanda-udp.socket
firewall-cmd --get-services
firewall-cmd --permanent --add-service=amanda-client
firewall-cmd --reload
firewall-cmd --list-all
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts
On amanda server, add new machine to the disklist, then:
amcheck -c daily titan00
Enable DCACHE
DAQ dcache server is mounted as
/daq/pnfs/triumf.ca/data/
The instructions below are no longer necessary:
- # mkdir -p /pnfs
- # edit /etc/rc.local, add to the end of file: "mount -o intr,rw,noac,hard,nfsvers=3 trdata00:/pnfs /pnfs &"
- # . /etc/rc.local
For more information, see the TrdataDcache dcache page.
Configure CPU speed (CentOS7)
In el7 the CPU frequency selection is confused. On some machines the default governor is "conservative", on other machines it is "powersave".
The current configuration can be seen by: "cpupower frequency-info -p"
The actual cpu frequency can be seen by "cat /proc/cpuinfo | grep -i mhz" and by "cpupower monitor" (run them under "watch -d -n1").
The Linux kernel documentation says "powersave" will set CPU frequency to the minimum value, forever. But on some machines (e.g. daq06, daq14) it is easy to see that the CPU frequency actually changes according to the CPU load. This is explained in the documentation for the "intel_pstate" driver.
On machines where the CPU frequency seems always stuck at minimum, try this:
- set the governor to "performance": cpupower frequency-set -g performance
- see if frequency now changes according to load (good) or is stuck at maximum (not so good, but ok)
- make it permanent by adding this command to /etc/rc.local - echo cpupower frequency-set -g performance >> /etc/rc.local
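To judge whether the frequency is stuck or follows the load, it can be easier to watch a one-line summary than the full /proc/cpuinfo output. A minimal sketch, assuming the x86 "cpu MHz" lines in /proc/cpuinfo; the function name cpu_mhz_range is hypothetical:

```shell
# Summarize the "cpu MHz" lines of /proc/cpuinfo (or of a file given as $1):
# number of logical CPUs, lowest and highest current frequency.
cpu_mhz_range() {
    awk -F: '/^cpu MHz/ {
        v = $2 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END { if (n) printf "%d cpus, min %.0f MHz, max %.0f MHz\n", n, min, max }' "${1:-/proc/cpuinfo}"
}
```

After pasting the function into a script file, run that script under "watch -d -n1" while starting and stopping a CPU-heavy job.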
Configure Ganglia
SL6 Ganglia instructions (EPEL6 ganglia-3.7.2)
/bin/rm /etc/gmond.conf
yum install "*gmond*"
/bin/rm /etc/ganglia/conf.d/ganglia-triumf-daq.conf
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
chkconfig gmond on
service gmond restart
Configure Ganglia (CentOS7)
CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2)
/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
Configure TRIUMF DAQ packages
(+CentOS7)
cd /etc/yum.repos.d
wget http://daq.triumf.ca/~daqweb/yum/triumf-daq.repo
Install Konstantin's packages
(+CentOS7)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs "ganglia-*" triumf_nodeinfo
Install memtest and PXE boot
cd /boot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.10
wget http://ladd00.triumf.ca/tftpboot/gpxe-1.0.1+-gpxe.lkrn
emacs -nw /boot/grub/grub.conf
title memtest86+-5.01
root (hd0,0)
kernel /boot/memtest86+-5.01.bin.gz
title memtest86+-4.20
root (hd0,0)
kernel /boot/memtest86+-4.20.bin.gz
title memtest86+-4.10
root (hd0,0)
kernel /boot/memtest86+-4.10
title pxeboot
root (hd0,0)
kernel /boot/gpxe-1.0.1+-gpxe.lkrn
Install node monitoring
(+CentOS7)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install triumf_nodeinfo
/usr/sbin/sendnodeinfo.perl --config ladd00.triumf.ca:8600
emacs -nw /etc/nodeinfo
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600
Install latest system updates
(+CentOS7)
yum update -y
Configure TRIUMF Printers
chkconfig cups off
service cups stop
yum install triumf-printers
Configure TRIUMF Printers (CentOS7)
systemctl stop cups
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
Disable syslog spam (CentOS7)
Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this:
echo auditctl -e 0 >> /etc/rc.local
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local
/etc/rc.local
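Several steps on this page append commands to /etc/rc.local with "echo ... >> /etc/rc.local", which adds a duplicate line each time the step is repeated. A minimal sketch of an idempotent alternative; the function name append_once is hypothetical:

```shell
# Append a line to a file only if the file does not already contain
# exactly that line (whole-line, fixed-string match).
append_once() {
    line="$1"; file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Example use: append_once "auditctl -e 0" /etc/rc.local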
Install basic system packages (CentOS7)
(if starting from minimal system, basic system packages required:)
yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils
yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools
Install packages needed for QUARTUS, ROOT, EPICS and MIDAS DAQ
(+CentOS7)
yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang-vet"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" mesa"*" xerces-c"*" diffuse clang i2c-tools texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras
(do not install boost on 32-bit machines)
yum install --skip-broken "boost-*"
(packages for 32-bit software compilation on 64-bit machines. this is optional)
yum install --skip-broken giflib.i386 giflib.i686 compat-libf2c-34.i386 compat-libf2c-34.i686 mysql-devel.i686 openssl-devel.i686 unixODBC-devel.i686 libstdc++-devel.i386 libstdc++-devel.i686 "zlib-*.i686" "libXext-*.i686" "libXtst-*.i686" glibc-static.i686 freetype.i686 fontconfig.i686 libpng.i686 libXrender.i686 glibc-devel.i686 libX11-devel.i686 libXpm-devel.i686 libXft-devel.i686 mysql-devel.i686 dcap-devel.i686 gsl-devel.i686 pcre-devel.i686 fontconfig-devel.i686 freetype-devel.i686 libpng-devel.i686 libjpeg-devel.i686 libgfortran.i686 libxml2-devel.i686 gd-devel.i686 readline-devel.i686 ncurses-devel.i686 libXdmcp.i686 readline-static.i686 compat-readline5.i686
yum install boost-devel.i686
(separately install these packages - they collide with the big bunch above)
yum install rdesktop
yum reinstall urw-fonts
Install additional desktop environments (CentOS7)
THIS IS OPTIONAL
- lxqt desktop (from EPEL)
yum install "lxqt*"
- cinnamon desktop (from EPEL)
yum install cinnamon
- KDE5
(NOT AVAILABLE YET)
- MATE (from EPEL)
#yum install mate-desktop
yum groupinstall "MATE Desktop"
yum install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs
- XFCE4 (from EPEL)
yum groupinstall xfce
yum install "xfce*plugin" xfce4-about
yum -y erase bash-completion
- lightdm login manager (from EPEL)
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5
- and switch from gdm to lightdm
systemctl disable gdm.service
systemctl enable lightdm.service
(systemctl stop gdm; systemctl restart lightdm) &
Make installation smaller (optional)
This is optional. Only do this if reducing the size of the OS image is very important.
yum erase "texlive*" "java*" "boost*"
yum erase "xemacs*"
yum erase "libstdc++-docs"
Install SMART scripts
(+CentOS7)
ln -sf ~/git/scripts/smart-status/smart-status.perl ~/
Install NTFS drivers
yum install ntfs-3g ntfsprogs # (from EPEL)
Install HFS and HFS+ drivers (CentOS7)
yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus
Install Google Chrome web browser (64-bit SL6)
Google-chrome 27 is too old to use with recent MIDAS, but it has working Flash:
rpm -vh --install https://daqshare.triumf.ca/~olchansk/google-chrome/google-chrome-stable-27.0.1453.110-202711.x86_64.rpm
/bin/rm /etc/cron.daily/google-chrome
yum-config-manager --disable google-chrome
yum-config-manager --disable google-chrome-64
google-chrome
Chromium 38 works with current MIDAS. No Flash, no PDF viewer:
yum install -y policycoreutils-python
rpm -vh --install https://daqshare.triumf.ca/~olchansk/google-chrome/chromium-browser-38.0.2125.111-1.el6.centos.x86_64.rpm
chromium-browser
Install Google Chrome web browser (64-bit CentOS7)
/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/
yum install google-chrome-stable
Enable monitoring of HTTPS certificates
On SL6, CentOS7:
yum install crypto-utils
/etc/cron.daily/certwatch
strace -f /etc/cron.daily/certwatch |& grep open | grep crt
Enable 100dpi fonts for EPICS
(+CentOS7)
ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/
Enable firewall for MIDAS (CentOS7)
Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports).
To enable access to mhttpd:
firewall-cmd --add-port=8443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host)
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept'
firewall-cmd --reload
firewall-cmd --list-all
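When several hosts need access to the mserver, the rich rule text can be generated per address instead of edited by hand. A minimal sketch; the function name mserver_rich_rule is hypothetical and it only prints the rule text, it does not call firewall-cmd:

```shell
# Print the firewalld rich rule that accepts all tcp ports
# from a single IPv4 host given as $1.
mserver_rich_rule() {
    printf 'rule family="ipv4" source address="%s/32" port protocol="tcp" port="0-65535" accept\n' "$1"
}
```

Example use: firewall-cmd --permanent --add-rich-rule="$(mserver_rich_rule 142.90.111.175)", followed by firewall-cmd --reload.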
Disable gdm and X11 (OPTIONAL)
initctl stop prefdm
echo "start on never" > /etc/init/prefdm.override
echo "start on never" > /etc/init/splash-manager.override
initctl reload-configuration
then enable login on default console:
echo "plymouth quit" >> /etc/rc.local
echo "X_TTY=xxx/dev/tty1" >> /etc/sysconfig/init
Install JAVAWS (OPTIONAL)
- to run Java "web start" jnlp files (EVO, SEEVOGH, etc): javaws Downloads/spider.jnlp
- install javaws:
- yum install icedtea-web icedtea-web-javadoc
Install firefox java plugin (OPTIONAL, DO NOT DO THIS)
This installs the Oracle Java plugin:
- rpm -vh --install ~deap/jdk-7u15-linux-x64.rpm
- ls -l /usr/lib64/mozilla/plugins/
- ln -s /usr/java/jdk1.7.0_15/jre/lib/amd64/libnpjp2.so /usr/lib64/mozilla/plugins/
- start firefox, go edit->preferences->general->manage add-ons->plugins
- "java plugin 1.7.0_15" should be listed
Configure USB device permissions
(+CentOS7)
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
- create file /etc/udev/rules.d/99-usb-chmod.rules with this contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
- apply new permissions: udevadm trigger --action=add
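After triggering udev it is worth confirming that the serial devices really became world readable and writable. A minimal sketch that inspects the "other" permission bits in the "ls -l" output; the function names world_rw and report_serial_perms are hypothetical:

```shell
# Succeed if "other" users have both read and write permission on $1
# (characters 8-9 of the ls -l mode string).
world_rw() {
    [ "$(ls -ld "$1" | cut -c8-9)" = "rw" ]
}

# Report the state of every USB and onboard serial device node that exists.
report_serial_perms() {
    for dev in /dev/ttyUSB* /dev/ttyS*; do
        [ -e "$dev" ] || continue
        if world_rw "$dev"; then echo "ok: $dev"; else echo "NOT world rw: $dev"; fi
    done
}
```

Example use: run report_serial_perms after "udevadm trigger --action=add" and again after plugging in a new USB-serial adapter.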
Disable modem-manager
The modem-manager will try to talk to any serial devices attached to USB serial ports. It assumes that those devices are modems and will send out modem-specific commands. If the devices are not modems and do not understand or do not like modem commands, well, that's too bad. modem-manager is installed by the ModemManager package required by the NetworkManager package, and there is no configuration setting to turn modem-manager off.
One way to disable it is: chmod a= /usr/sbin/modem-manager
Another way to disable it is by forced uninstall: rpm --erase --nodeps ModemManager
Remember to kill the running copy: killall -KILL modem-manager
Caveat: it is not clear whether modem-manager will be resurrected by an update to the NetworkManager or ModemManager packages.
Configure Altera jtagd
(if needed)
mkdir /etc/jtagd
echo 'Password = "123";' > /etc/jtagd/jtagd.conf
cp -pv /triumfcs/trshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts
- start local jtagd: /triumfcs/trshare/olchansk/altera/11.0/quartus/bin/jtagd
- test local connection: /triumfcs/trshare/olchansk/altera/11.0/quartus/bin/jtagconfig
- test remote connection (add this machine to your .jtag.conf, run jtagconfig)
For more information, go to Quartus
Configure GRUB boot loader
- edit /boot/grub/grub.conf, remove the "quiet" and "rhgb" options
- edit /boot/grub/grub.conf, comment out (with "#") the "splashimage=" line
- check that GRUB boot loader is installed on all system disks:
- dd if=/dev/sda bs=1 count=1024 2>&1 | strings | grep GRUB
- dd if=/dev/sdb bs=1 count=1024 2>&1 | strings | grep GRUB
- if GRUB is not installed (e.g. on the 2nd disk of machines with mirrored system disk), install it (but first check that /dev/sdb is the right disk):
# grub
grub> device (hd0) /dev/sdb
grub> root (hd0,0)
grub> setup (hd0)
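The dd checks above can be wrapped in a small loop over all system disks. A minimal sketch, assuming GNU grep (the -a option treats binary data as text); the function name has_grub is hypothetical, and reading /dev/sdX requires root:

```shell
# Succeed if the string GRUB appears in the first 1024 bytes of a disk
# (or of any file), the same region inspected by the dd commands above.
has_grub() {
    head -c 1024 "$1" 2>/dev/null | grep -qa GRUB
}
```

Example use: for d in /dev/sda /dev/sdb; do has_grub $d && echo "$d: GRUB found" || echo "$d: no GRUB"; done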
Configure GRUB boot loader (CENTOS7)
- edit /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX
- grub2-mkconfig -o /boot/grub2/grub.cfg
- grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
- grub2-editenv list # show contents of boot environment file
- /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file
Configure GRUB boot loader (CentOS7)
DO NOT DO ANY OF THIS.
- (maybe) grub2-install /dev/sda
- check that GRUB boot loader is installed on all system disks:
- dd if=/dev/sda bs=1 count=1024 2>&1 | strings | grep GRUB
- dd if=/dev/sdb bs=1 count=1024 2>&1 | strings | grep GRUB
- if GRUB is not installed, (--- unfinished)
Disable ELREPO
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo_triumf.repo
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo
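On machines where one of the two repo files does not exist, the sed command fails with an error. A minimal sketch that skips missing files, assuming GNU sed (for the -i option); the function name disable_repo_file is hypothetical:

```shell
# Set enabled=0 in every section of a yum .repo file; skip missing files.
disable_repo_file() {
    [ -f "$1" ] || { echo "skip: $1 not found" >&2; return 0; }
    sed -i 's/^enabled=.*/enabled=0/' "$1"
}
```

Example use: for f in /etc/yum.repos.d/elrepo_triumf.repo /etc/yum.repos.d/elrepo.repo; do disable_repo_file $f; done. Packages can still be installed from the disabled repo with "yum --enablerepo=elrepo install ...", as done elsewhere on this page.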
Special hardware settings
ASUS Crosshair mobo
- use BIOS version 1207 or newer
- (before CentOS7) sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
- CentOS7: installs correct drivers automatically
ASUS Crosshair-II mobo
- use BIOS version 2607 or newer
- for the onboard IDE to work, add "all-generic-ide" to kernel boot options in grub.conf
- sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
ASUS P7P55D EVO mobo
- use BIOS version 2004 or newer
- SL6 - install special driver for on board PCIe GigE network port and disable on board PCI GigE network port:
- yum --enablerepo elrepo install kmod-r8168 kmod-r8169
- # do not do this: sed 's/^blacklist/#blacklist/' -i /etc/modprobe.d/blacklist-r8169.conf
- reboot
- verify that correct drivers are loaded: ethtool -i eth0; ethtool -i eth1
- note: there will be no eth1 - r8169 driver is disabled.
ASUS P6X58-E-WS mobo
- BIOS settings
- F1 or DEL to enter BIOS setup, F8 boot menu
- go to POWER->HW mon, confirm CPU temperature is around 30C. (heatsink is installed correctly. Bad heatsink temperature quickly goes up to 50-70C).
- Main menu: Storage config - SATA change IDE->AHCI
- System information: confirm BIOS version 301, CPU type, memory size
- AI Tweak: set DRAM frequency - AUTO->DDR3-1333
- Advanced->Onboard devices: LAN BOOT: enabled
- Power->HW monitor: CPU Q-FAN: enabled
- Boot->Settings: Quick boot: enabled; Full screen logo: disabled; Wait for F1: disabled
- Save and exit
ASUS E35M1-M PRO mobo
- http://www.asus.com/Motherboards/E35M1M_PRO/#specifications
- use BIOS version 1002 or newer
- for CPU temperature: install kmod-k10temp from ELREPO (kmod-k10temp-0.0-4.el6.elrepo.x86_64.rpm)
- for Sensors: yum --enablerepo elrepo install kmod-w83627ehf; modprobe w83627ehf; sensors
- for Graphics: yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
- to enable booting from USB3, edit /etc/dracut.conf, change line "add_drivers" to read: add_drivers+="xhci-hcd"
- to use multiple monitors, run "aticonfig --initial --heads=2 --adapter=1 --xinerama=on"; to change screen layout, edit /etc/X11/xorg.conf. Only dual monitors DVI+HDMI seem to work. Triple monitors do not seem to work.
Sensors instructions below are obsolete (use driver from ELREPO)
- for Sensors, install driver for NCT6776F chip from https://github.com/groeck/w83627ehf/archives/master (in the Makefile, change the line "KERNEL_BUILD=" to read: "KERNEL_BUILD:=/usr/src/kernels/$(TARGET)"):
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/groeck-w83627ehf-dd3e543/w83627ehf.ko
echo "modprobe hwmon; modprobe hwmon-vid; modprobe k10temp; rmmod w83627ehf; insmod /root/w83627ehf.ko" >> /etc/rc.local
ASUS E45M1-M PRO mobo
- https://www.asus.com/Motherboards/E45M1M_PRO/#specifications
- use BIOS 1202 or newer
- follow the E35M1-M PRO instructions above
ASUS P9X79 WS
- http://www.asus.com/Motherboard/P9X79_WS/
- use BIOS version 3101, 3401, 4701 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS.
- (not needed for CentOS7) for CPU temperature, install coretemp
- (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above.
- BIOS Settings:
- enter "Advanced mode"
- Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default
- Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings
- Boot -> Full screen logo -> Set to "disabled"
- Wait for F1 -> Set to "disabled"
ASUS P8B-M
- use BIOS version 6103 or newer
- for CPU temperature, install coretemp
- for sensors, install driver for NCT6776F chip same as E35M1-M above.
SUPERMICRO X9SCL
- yum install kmod-w83627ehf.x86_64 coretemp
- xemacs -nw /etc/rc.local, add:
modprobe coretemp
modprobe w83627ehf
ASUS Z87-WS
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775/nct6775.ko
Place the modprobe and insmod lines in /etc/rc.local to load the drivers at boot time
modprobe hwmon-vid
insmod /root/nct6775.ko
ASUS AM1M-A
- use BIOS 602 or later
- SL6.5 installer cannot use USB2 ports and the network. Use USB3 ports (blue colour) to boot USB installer (memtest, rescue, etc)
- SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seems okay)
- install ATI/AMD video drivers from ELREPO (see below)
- sensors chip is ITE IT8623E, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures):
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
sensors-detect
sensors
- AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together)
Intel SE7230NH1
- front panel header connector pinout is like this:
PWR LED | 1  2|
        | 3  4|
PWR LED | 5  6|
HDD LED | 7  8|
HDD LED | 9 10|
PWR SW  |11 12| NIC1 LED
PWR SW  |13 14| NIC1 LED
RST SW  |15 16|
RST SW  |17 18|
        |19 20|
NMI SW  |21 22| NIC2 LED
NMI SW  |23 24| NIC2 LED
...     |...  |
        |33 34|
Configure X11 graphics
Special settings for DAQ
- add the following at the end of /etc/X11/xorg.conf. This enables Ctrl-Alt-KP-/ and Ctrl-Alt-KP-* to unlock the keyboard after an Altera Quartus crash:
Section "ServerFlags"
    Option "AllowDeactivateGrabs" "true"
    Option "AllowClosedownGrabs" "true"
EndSection
Install NVIDIA drivers
- yum --enablerepo=elrepo install nvidia-detect
- run: nvidia-detect
- as instructed by nvidia-detect, install correct driver:
- yum --enablerepo=elrepo install kmod-nvidia
- yum --enablerepo=elrepo install kmod-nvidia-304xx
- yum --enablerepo=elrepo install kmod-nvidia-173xx
- (before SL6.x: if it fails due to conflict with module-init-tools, run "yum --disablerepo \* --enablerepo elrepo update module-init-tools")
- yum erase xorg-x11-glamor ### see http://elrepo.org/tiki/kmod-nvidia (search for glamor)
- mv /etc/X11/xorg.conf /etc/X11/xorg.conf-xxx
- nvidia-xconfig
- (SL6) reboot
- (SL5) /dev/MAKEDEV nvidia
- (SL5) restart the X11 server (Ctrl-Alt-Backspace or "killall Xorg gdm-binary")
- observe that X11 server restarts using the NVIDIA driver (big NVIDIA logo on startup)
- if needed, login as root and run "nvidia-settings" to setup dual-screen configuration, etc
Install legacy NVIDIA drivers
For old NVIDIA cards:
- GeForce FX 5500
wget http://us.download.nvidia.com/XFree86/Linux-x86/173.14.31/NVIDIA-Linux-x86-173.14.31-pkg1.run
sh ./NVIDIA-Linux-x86-173.14.31-pkg1.run
- GeForce 6200 - NVIDIA Corporation NV44A [GeForce 6200]
yum install nvidia-x11-drv-304xx-304.121 --enablerepo=elrepo
nvidia-xconfig
rmmod nvidia
killall gdm-binary
login as root
nvidia-settings # to set up multiple displays
Install ATI/AMD drivers
- yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
- check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx"
- run "aticonfig --initial" to create xorg.conf if existing one is not good
- run "amdcccle" as root to configure dual-screens, etc
Note: 'amdcccle' is a GUI, so you must run this command from within a running X session
- killall Xorg
Install ATI/AMD drivers (CentOS7)
- wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
- wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm
- yum install acpid
- rpm -vh --install kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
- amdconfig -f --initial
- grub2-mkconfig -o /boot/grub2/grub.cfg
- reboot
- login as root
- amdcccle
NOTE: if both drivers (radeon and fglrx) are loaded, boot will hang. The radeon driver is supposed to be blacklisted through the grub "rdblacklist=radeon" entry, which is installed by running grub2-mkconfig.
Install Intel drivers for HD4600/Z87
SL6.5 has the required drivers for the socket 1150 machines with Intel HD4600 graphics and Z87 chipset.
The ASUS Z87 WS motherboard has these video connections with corresponding Intel video port assignments, as reported by "xrandr":
- DisplayPort - DP1/HDMI1
- MiniDisplayPort - DP2/HDMI2
- HDMI - HDMI3
Due to hardware limitations, 3 HDMI monitors using 2 passive DP-HDMI adapters (and 1 straight HDMI) cannot be used.
To use 3 monitors do this:
- 1st monitor: DisplayPort - DP-to-HDMI-passive-adapter - HDMI monitor (not tried: DP-to-DP-cable - DisplayPort monitor).
- 2nd monitor: MiniDisplayPort - MiniDP-to-DP-cable - DisplayPort monitor
- 3rd monitor: HDMI - HDMI-cable - HDMI monitor
With the monitors I have (Dell 1920x1200 VGA-HDMI-DP), the software thinks that there are 4 monitors: somehow both DP2 and HDMI2 see 1 monitor each, but the hardware cannot drive 4 monitors, so everything goes blank. To fix, disable HDMI2 (xrandr -display :0 --output HDMI2 --off) and enable DP2 (xrandr -display :0 --output DP2 --auto).
How to make this configuration permanent and how to assign monitor locations (left-right, etc), you figure it out.
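To see which outputs the software currently believes are connected (and so spot the phantom 4th monitor described above), the xrandr listing can be filtered. A minimal sketch that reads xrandr output on stdin; the function name connected_outputs is hypothetical:

```shell
# Print the names of outputs that xrandr reports as "connected".
# Reads the normal "xrandr" listing from stdin.
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}
```

Example use: xrandr -display :0 | connected_outputs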
Manual selection of monitor, video mode and resolution
Automatic selection of monitor and video mode usually works. When it does not, configure it manually:
- physically go to the computer
- login as root
- run "nvidia-settings" on machines using the NVIDIA driver
- run "aticonfig" on machines with the ATI/AMD driver (use "aticonfig --initial" for initial setup, and good luck with anything more complicated)
- run "system-config-display".
- In the "hardware" tab, select monitor type: "generic LCD 1280x1024" or "generic LCD 1600x1200".
- In the "settings" tab, select "1280x1024" or "1600x1200" and "Thousands of colors".
- Press "ok", the display settings application should close.
- Logout, the new login window should use the new settings.
Disable screen saver
If the machine is booted without any monitor connected, current video cards do not enable any video outputs. If a monitor is connected later, there is no video image and there is no easy way to get a video image.
This can be solved by configuring X11 to always enable some video output. Because the monitor type is not known when X11 starts, one has to select some standard video mode (i.e. VESA 1280x1024) on some video output (VGA, DVI or HDMI).
Only NVIDIA cards with the NVIDIA driver (from ELREPO) are supported by these instructions.
- create default xorg.conf: nvidia-xconfig
- edit /etc/X11/xorg.conf
- add monitor section for the fake monitor:
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    HorizSync 31.0 - 83.0
    VertRefresh 59.0 - 61.0
    Option "DPMS" "off"
    ModeLine "1280x1024" 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync
EndSection
- add output selection in the "Device" section:
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce 210"
    #Option "ConnectedMonitor" "DFP"
    #Option "ConnectedMonitor" "CRT"
    Option "ConnectedMonitor" "CRT-1"
    Option "UseEDID" "no"
EndSection
- add fake video mode to the "Screen" section:
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection
- disable screen saver and DPMS power off in the "ServerLayout" or "ServerFlags" section:
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option "Xinerama" "0"
    Option "BlankTime" "0"
    Option "StandbyTime" "0"
    Option "SuspendTime" "0"
    Option "OffTime" "0"
EndSection
Section "ServerFlags"
    Option "BlankTime" "0"
    Option "StandbyTime" "0"
    Option "SuspendTime" "0"
    Option "OffTime" "0"
EndSection
Finish installation
- log out and reboot the computer for all the changes to take effect
Configure HTTPS server (CentOS7)
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd.
First, configure apache httpd:
- yum install mod_ssl certwatch
- cd /etc/httpd/conf.d/
- create new file ssl-daq12.conf # use actual hostname instead of daq12
<VirtualHost *:443>
    ServerName daq12.triumf.ca
    DocumentRoot /var/www/html
    ErrorLog /var/log/httpd/daq12.log
    SSLEngine on
    SSLProtocol all -SSLv2 -SSLv3
    SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
    SSLCertificateFile /etc/pki/tls/certs/localhost.crt
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
    #SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
    #ProxyPass /elog/ http://localhost:8082/ retry=1
    #ProxyPass / http://localhost:8080/ retry=1
    <Location />
        SSLRequireSSL
        AuthType Basic
        AuthName "DAQ password protected site"
        Require valid-user
        # create password file: touch /etc/httpd/htpasswd
        # to add new user or change password: htpasswd /etc/httpd/htpasswd username
        AuthUserFile /etc/httpd/htpasswd
    </Location>
</VirtualHost>
- systemctl enable httpd
- systemctl restart httpd
- systemctl status httpd
- try to access https://daq12.triumf.ca
- you should see a complaint about self-signed certificate
- you should see a request for password (do not login yet)
- if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again:
firewall-cmd --add-port=443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
Second, configure certbot:
- yum install certbot python2-certbot-apache # (from EPEL)
- certbot # then answer questions:
- "activate HTTPS for daq12.triumf.ca" - say ok
- "enter email address" - enter your own email address
- "please read terms..." - read the terms and say "agree"
- it will take a few moments...
- "please choose..." - say "easy" (http access is disabled (a) by firewall, (b) by local configuration)
- "congratulations..." - say ok.
NOTE: this certificate will expire in 3 months, automatic renewal is possible, but was not tested yet and it is not enabled by these instructions. Certificate expiration should be automatically detected by "certwatch" and email will be sent to local root user, to be forwarded to an actual person by ~root/.forward.
- cd ~/git/scripts; git pull
- cp etc/certbot.cron /etc/cron.weekly/
Third, activate password protection:
- as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/httpd/htpasswd
htpasswd /etc/httpd/htpasswd midas
Final test:
- access https://daq12.triumf.ca - https status should be "green"
- login with password should work
- the apache httpd test page should load
- check site security using the SSLlabs https tester. (I get grade "A-"): https://www.ssllabs.com/ssltest/
From here:
- enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
- enable proxy for ELOG - ditto
- setsebool -P httpd_can_network_connect 1
- systemctl restart httpd
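Uncommenting the ProxyPass lines can also be done with sed instead of an editor. A small sketch, assuming the config file path is passed as an argument (the location of the file is not stated on this page) and using a made-up function name. It uncomments both the ELOG and the mhttpd lines and keeps their order, which matters because "/elog/" must be matched before "/":

```shell
# Uncomment every "#ProxyPass ..." line in the given Apache config file,
# in place, leaving all other commented-out directives untouched.
enable_proxypass() {
    sed -i 's/^\([[:space:]]*\)#[[:space:]]*\(ProxyPass[[:space:]]\)/\1\2/' "$1"
}

# Usage (path is an example only):
#   enable_proxypass /etc/httpd/conf.d/your-ssl-site.conf
#   setsebool -P httpd_can_network_connect 1
#   systemctl restart httpd
```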
Configure large RAID6 arrays
- connect the disks
- check the disks health
- run smart-status.perl
- partition the disks
- yum install gdisk
- gdisk /dev/sdX
- delete all partitions: o
- create new partition: n, enter, enter, enter, fd00 (default sizes, partition type fd00)
- write and exit: w
- check presence of all partitions:
- /bin/ls -l /dev/sd*1
- prepare to use an external bitmap file
- touch /md6bitmap
- edit /etc/fstab, change entry for root filesystem from: "defaults 1 1" to "defaults 0 0"
- edit /boot/grub/grub.conf, change entry "kernel ... ro ..." to "kernel ... rw ..."
- create raid array:
- mdadm --create /dev/md6 --level=6 --bitmap=/md6bitmap --raid-devices=10 /dev/sd[b-k]1
- mdadm -Ds >> /etc/mdadm.conf
- cleanup /etc/mdadm.conf
- echo "echo 16384 > /sys/block/md6/md/stripe_cache_size" >> /etc/rc.local
- echo "echo 1 > /sys/block/md6/md/sync_speed_min" >> /etc/rc.local
- source /etc/rc.local
- observe raid array rebuild:
- watch -d -n1 "cat /proc/mdstat"
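For unattended monitoring of the rebuild, the progress figure can be pulled out of /proc/mdstat by a script instead of watching it by eye. A minimal sketch; the function name md_progress is made up, and only the "resync" and "recovery" progress lines are handled:

```shell
# Print the resync/recovery percentage of one md device, reading
# /proc/mdstat-style text from stdin. Prints nothing if no rebuild is running.
md_progress() {
    awk -v dev="$1" '
        $1 == dev && $2 == ":" { in_dev = 1; next }   # start of our device block
        /^[A-Za-z]/ { in_dev = 0 }                    # another device or the footer
        in_dev && match($0, /(resync|recovery) *= *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/^[a-z]+ *= */, "", s)                # keep only the number and "%"
            print s
        }
    '
}

# Usage:
#   md_progress md6 < /proc/mdstat
```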
Configure ZFS
Install ZFS: (from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)
Follow the instructions for "kABI-tracking kmod": the dkms modules seem to always mess up the system when upgrading to the next release of zfs.
#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
yum-config-manager --disable zfs
yum-config-manager --enable zfs-kmod
yum install zfs
sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
echo zpool import -a >> /etc/rc.local
echo exportfs -rv >> /etc/rc.local
echo systemctl start nfs-server >> /etc/rc.local
#echo zpool scrub -s pool8tb >> /etc/rc.local ### replace "pool8tb" with actual name of zfs pool
shutdown -r now # required to load the zfs kernel modules and to disable selinux
Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (the files are gone, but usage reported by "df" does not go down; seen with zfs-0.6.5.7-1.el7.centos.x86_64), see https://github.com/zfsonlinux/zfs/issues/4845
Create "RAID6" filesystem:
- http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-quickstart.html
- http://www.freebsd.org/cgi/man.cgi?query=zpool&sektion=8
- zpool create data14 raidz2 /dev/sd[b-h]1
- zpool status
- zpool get all
- zpool iostat 1
- zpool iostat -v 1
- zpool history
- zpool scrub data14
- zpool events
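When choosing the number of disks for a raidz2 pool, a rough rule is that two disks' worth of space goes to parity. The arithmetic as a small sketch (function name made up; the result ignores ZFS metadata and padding overhead, so the space actually reported by "df" will be somewhat lower):

```shell
# Rough usable capacity of a raidz2 group: (number of disks - 2) * disk size.
# $1 = number of disks, $2 = size of one disk (any unit, e.g. TB).
raidz2_usable() {
    ndisks=$1
    size=$2
    if [ "$ndisks" -le 2 ]; then
        echo "raidz2 needs more than 2 disks" >&2
        return 1
    fi
    echo $(( (ndisks - 2) * size ))
}

# Example: the 7 partitions /dev/sd[b-h]1 on 6TB disks
#   raidz2_usable 7 6
```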
Create raid1 (mirror) volume:
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
dracut -vf
zpool create pool8tb mirror /dev/sda1 /dev/sdb1
zpool set cachefile=none pool8tb
zpool set failmode=continue pool8tb
zpool status
zpool events
zpool get all
df /pool8tb
ls -l /pool8tb
Replace failed disk:
- pull failed disk out
- zpool status # identify failed disk zfs label (it should be labeled FAULTED or OFFLINE)
- safe to reboot here
- install new disk
- partition new disk, e.g. "gdisk /dev/sdh", use "o" to create new partition table, use "n" to create new partition, accept all default answers, use "w" to save and exit
- safe to reboot here
- run tests on new disk (smart, diskscrub), if unhappy go back to "install new disk"
- safe to reboot here
- identify serial number of new disk, e.g. "smartctl -a /dev/sdh | grep -i serial" yields "Serial Number: WD-WCAVY0893313"
- identify linux id of new disk, e.g. "ls -l /dev/disk/by-id | grep -i WD-WCAVY0893313" yields "ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1"
- zpool replace data11 zfs-label-of-failed-disk ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1
- zpool status should look like this:
[root@daq11 ~]# zpool status
  pool: data11
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Apr 29 11:51:03 2016
        24.7G scanned out of 795G at 32.3M/s, 6h46m to go
        3.00G resilvered, 3.11% done
config:
        NAME                                                   STATE     READ WRITE CKSUM
        data11                                                 DEGRADED     0     0     0
          raidz2-0                                             DEGRADED     0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1      ONLINE       0     0     0
            replacing-2                                        DEGRADED     0     0     0
              17494865033746374811                             FAULTED      0     0     0  was /dev/sdi1
              ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1  ONLINE       0     0     0  (resilvering)
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1      ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1    ONLINE       0     0     0
errors: No known data errors
- wait for raid rebuild ("resilvering") to complete
- zpool status should look like this:
[root@daq11 ~]# zpool status
  pool: data11
 state: ONLINE
  scan: resilvered 96.2G in 1h44m with 0 errors on Fri Apr 29 13:35:40 2016
config:
        NAME                                                   STATE     READ WRITE CKSUM
        data11                                                 ONLINE       0     0     0
          raidz2-0                                             ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1      ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1    ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1      ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1      ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1    ONLINE       0     0     0
errors: No known data errors
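Waiting for the resilver can be scripted by looking for the "resilver in progress" line in the "zpool status" output. A minimal sketch, assuming the status of a single pool is fed in (with several pools the last "state:" line wins); the function name pool_summary is made up:

```shell
# Summarize `zpool status` text read from stdin: prints "resilvering" while a
# resilver is in progress, otherwise the pool state (ONLINE, DEGRADED, ...).
pool_summary() {
    awk '
        $1 == "state:"          { state = $2 }
        /resilver in progress/  { busy = 1 }
        END { print (busy ? "resilvering" : state) }
    '
}

# Usage: block until the rebuild of pool data11 is done, then show the state.
#   while [ "$(zpool status data11 | pool_summary)" = "resilvering" ]; do sleep 60; done
#   zpool status data11 | pool_summary
```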
Enable periodic scrub:
- cp zfs-scrub.service zfs-scrub.timer /etc/systemd/system/
- systemctl daemon-reload
- systemctl enable zfs-scrub.timer
- systemctl status zfs-scrub
- zpool status
Also:
- systemctl start zfs-scrub
- systemctl status zfs-scrub
- systemctl status zfs-scrub.timer
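The zfs-scrub.service and zfs-scrub.timer files themselves are not shown on this page. For orientation only, a pair of units of this kind could look like what the following sketch generates. The weekly schedule, the /sbin/zpool path, the function name and the single-pool layout are all assumptions, not the contents of the real files:

```shell
# Write hypothetical zfs-scrub.service / zfs-scrub.timer units.
# $1 = target directory, $2 = name of the pool to scrub.
write_scrub_units() {
    dir=$1
    pool=$2
    # One-shot service that starts the scrub and returns immediately.
    cat > "$dir/zfs-scrub.service" <<EOF
[Unit]
Description=Scrub ZFS pool $pool

[Service]
Type=oneshot
ExecStart=/sbin/zpool scrub $pool
EOF
    # Timer that triggers the service of the same name once a week.
    cat > "$dir/zfs-scrub.timer" <<EOF
[Unit]
Description=Periodic scrub of ZFS pool $pool

[Timer]
OnCalendar=weekly
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage (writes into the current directory, then install as described above):
#   write_scrub_units . data14
```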
Working with ZFS snapshots:
- zfs list -t snapshot
- cd ~/git; git clone https://github.com/zfsonlinux/zfs-auto-snapshot.git; cd zfs-auto-snapshot; make install
If ZFS becomes 100% full, "rm" will stop working, but space can still be freed by using "echo > bigfile", afterwards "rm" works again.
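The truncate-then-remove trick can be wrapped in a small helper. A sketch with a made-up function name; it truncates the file to zero length with a plain redirection (same idea as "echo > bigfile") and only then removes it:

```shell
# Free the space held by a file on a full filesystem: truncate first, then rm.
zfs_free_file() {
    : > "$1" && rm -f -- "$1"
}

# Usage:
#   zfs_free_file /data14/bigfile
```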
performance notes
Go here: disk_benchmarks