SLinstall: Difference between revisions

From DaqWiki
 
(185 intermediate revisions by 2 users not shown)
Line 35: Line 35:
* note the MAC addresses of all network interfaces, add them to ladd00 dhcpd.conf to enable PXE boot into the SL "network installer"
* shutdown
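A minimal sketch for collecting the MAC addresses before shutdown, assuming the machine is still running some Linux with sysfs mounted. The function names list_macs and dhcpd_host_stanza are made up for this example, and the host name in the usage line is only a placeholder. The stanza follows the usual ISC dhcpd "host" syntax; adjust it to match the existing entries in the ladd00 dhcpd.conf.

```shell
# Print "<interface> <MAC>" for every network interface except loopback.
# $1 = sysfs network directory (hypothetical parameter, defaults to the real one)
list_macs() (
    root=${1:-/sys/class/net}
    for dev in "$root"/*; do
        [ -r "$dev/address" ] || continue
        name=${dev##*/}
        [ "$name" = "lo" ] && continue
        printf '%s %s\n' "$name" "$(cat "$dev/address")"
    done
)

# Print a dhcpd.conf host entry.
# $1 = host name, $2 = MAC address
dhcpd_host_stanza() {
    printf 'host %s {\n    hardware ethernet %s;\n}\n' "$1" "$2"
}

# usage (placeholder host name):
#   list_macs | while read -r dev mac; do dhcpd_host_stanza "newhost-$dev" "$mac"; done
```

Nothing is written to dhcpd.conf by this sketch; the printed stanzas still have to be pasted in by hand on ladd00.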
== Running SL installer ==
* Start installation of the new system:
* IMPORTANT: if you have WDC "advanced partitioning disks" (4kB sectors), disks have to be repartitioned before use, see special instructions (TBW) (note: use fdisk -H 224 -S 56 /dev/sdx)
* (NOT AVAILABLE ANYMORE) boot from latest "SL5 kickstart" CD from Kelvin Raywood or PXE boot the latest SL installation image. After the system enters graphical mode, one can remove the CD: the installation is running over the network
* boot from ladd00 PXE server - after power up, during BIOS POST, press BIOS "boot selection menu" key (F8, F12, etc). The MAC of the network interface should be listed in the ladd00 dhcpd.conf file. In the PXE boot menu, select SL6x-64 kickstart install.
* linux will boot into the graphical installer
* two questions will be asked: how to partition the disks and the root password. The rest of the installation is automatic.
* to partition the disks, select "Custom partitioning":
** If using a single SSD (30 or 60 GB), use whole disk for "/" partition (no swap partition)
** If using single HD, create 4 primary partitions (see below)
** If using dual HDs (should be same size), create 4 "RAID1" (see below) (DO NOT USE LVM)
** Use these partition sizes:
*** "/" - 40GB - md0 or sda1
*** swap - 32 GB - md1 or sda2
*** "/home1" - 100 GB - md2 or sda3
*** "/data" - remaining disk space - md3 or sda4
* if installer asks questions about boot loader, accept default settings
* package installation will proceed automatically
* when finished, the installer will ask "press button to reboot"
* boot newly installed system
* if installing without a kickstart, some questions need to be answered:
** Firewall: disabled
** SELinux: disabled
** KDump: disabled
** Date and Time: leave kickstart defaults (should be NTP using TRIUMF time servers)
** Create user: skip - will be handled during post-installation
** The system will reboot again
* after the final reboot, login as root and proceed with post-installation.


== Running installer (CentOS7) ==


The CentOS7/SL7 installer is very different from the SL6 installer. There are some improvements, and there are several quirks:
CentOS7 can be installed from vanilla CentOS7 installation media or from
a custom USB key built per these instructions:
https://daqshare.triumf.ca/~olchansk/linux/CentOS7/


* the disk management part was completely FUBARed.
The custom installer makes it easy to use a custom kickstart file (ks.cfg).
* boot loader is now installed to the correct disk (no longer overwrites the usb-installer itself)
* vanilla installer removed all support for NIS and after first boot requires creation of a fake local user. To avoid this, use the usb-installer or a custom kickstart installer (remove package "gnome-initial-setup")


Instructions for using the usb-installer:


* disconnect machine from network
* plug the usb-installer into usb3 port (blue colour)
* plug the usb-installer into a usb3 port (blue colour)
* reboot machine, select booting from usb (press F8 on ASUS motherboards)
* usb-installer boot menu offers to install CentOS7, go there
Line 90: Line 60:
** say "done"
** the "manual partitioning" menu will open
** partition the SSD (good luck figuring out this new menu system).
** recommended is to use 120GB SSD, partition the whole SSD as one large partition ("normal partition" choice), use XFS filesystem (BTRFS is still experimental), no swap. (installer will complain, but accept lack of swap):
*** use the "-" button to delete all existing partitions
*** select "standard partition"
Line 104: Line 72:
* after installation is complete, reboot the machine
* unplug the usb-installer, CentOS7 should boot from SSD into the login screen
* click on "not listed?", login as root (what's with that?!?)
* click on "not listed?", login as root
* setup network connection:
** connect the network cable
** open a terminal
** go to the gnome "network settings" (icon on top-right of screen)
** start "nm-connection-editor"
** select "wired"
** click on "+" to create a new connection profile
** select "wired ethernet"
** select "add profile..."
** in "Identity", set "name" to "static"
** in "Identity", check that "Connect automatically" and "Make available..." are enabled
** in "IPv4", set "Addresses" to "manual" instead of "dhcp"
** enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19
** enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19, search triumf.ca
** say "Add", then close/quit the network settings
* network should be up, ping something
* connect network cable
* network should be up; "ping ladd00" should work
* run: yum update -y
* check new kernel is installed: ls -l /boot
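The static network profile described above can probably also be created from the command line with nmcli instead of the GUI. This is an untested sketch, not the procedure used on this page. The helper names mask_to_prefix and print_static_profile_cmds are made up, the interface name and IP address are arguments you have to supply, and the commands are only printed so they can be reviewed before running them as root. The netmask 255.255.224.0 from the list corresponds to a /19 prefix.

```shell
# Convert a dotted netmask to a prefix length (no check that the mask is contiguous).
mask_to_prefix() (
    prefix=0
    IFS=.
    for octet in $1; do
        case $octet in
            255) bits=8 ;;
            254) bits=7 ;;
            252) bits=6 ;;
            248) bits=5 ;;
            240) bits=4 ;;
            224) bits=3 ;;
            192) bits=2 ;;
            128) bits=1 ;;
            0)   bits=0 ;;
            *)   echo "bad netmask octet: $octet" >&2; exit 1 ;;
        esac
        prefix=$((prefix + bits))
    done
    echo "$prefix"
)

# Print (do not run) the nmcli commands for a profile named "static".
# $1 = interface name, $2 = IP address, $3 = netmask
print_static_profile_cmds() (
    prefix=$(mask_to_prefix "$3") || exit 1
    echo "nmcli connection add type ethernet con-name static ifname $1 ip4 $2/$prefix gw4 142.90.100.18"
    echo "nmcli connection modify static ipv4.dns 142.90.100.19 ipv4.dns-search triumf.ca"
)

# usage: print_static_profile_cmds <interface> <ip-address> 255.255.224.0
```

The gateway, DNS server and search domain are the values given in the list above; check "nmcli connection show static" afterwards to confirm what was actually stored.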
Line 138: Line 108:
</pre>
</pre>


== Configure disks, partitions, raid arrays and filesystems ==
== Set hostname ==


NOTE1: For compatibility with the SL6 installer, use "fdisk -u" when creating new partitions.
Set hostname: (use full name, e.g. daq11.triumf.ca)
 
<pre>
NOTE2a: For 2TB disks or bigger, use "gdisk" to create GPT partitions (yum install epel-release; yum install gdisk)
emacs -nw /etc/hostname
</pre>


NOTE2c: (SL6) 3TB, 4TB, 6TB disks do not require anything special - proceed with installation as normal.
== Configure email ==


Typical disk configuration for DAQ use has 2 large disks with system ("/"), swap, home and data partitions, fully mirrored across the 2 disks using RAID1 software raid (MD).
* TRIUMF: use relayhost = smtp.triumf.ca
* CERN: use relayhost = cernmx.cern.ch


In this fully mirrored configuration, a DAQ system will continue to operate without interruption and without performance degradation when there is a full or partial failure of either of the two disks.
* edit /etc/postfix/main.cf, set "relayhost = smtp.triumf.ca"
* echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca bsmith@triumf.ca" >> ~root/.forward


If disks are hot-swappable, the failed or defective disk can then be physically replaced by a spare, the spare disk can be partitioned and added to the RAID1 array, restoring full normal operation, without shutting down or rebooting the system or interrupting data taking. (Since SATA, eSATA and USB are always electrically hot-swappable, disk hot-replacement is more of a mechanical issue).
== Make log files readable ==


For small disks using traditional partitions (<=2TB) a typical layout looks like this:
<pre>
<pre>
[root@ladd06 ~]# fdisk -l  ### use "fdisk -lu" instead!!!
chmod a+r /var/log/messages
chmod a+r /var/log/yum.log
</pre>


Disk /dev/sdb: 750.2 GB, 750156374016 bytes
== Activate /etc/rc.local ==
...
  Device Boot      Start        End      Blocks  Id  System
/dev/sdb1  *          1        5100    40960000  fd  Linux raid autodetect
/dev/sdb2            5100        9179    32768000  fd  Linux raid autodetect
/dev/sdb3            9179      21927  102399603+  fd  Linux raid autodetect
/dev/sdb4          21928      91201  556443405  fd  Linux raid autodetect


Disk /dev/sda: 750.2 GB, 750156374016 bytes
Activate rc.local:
...
<pre>
  Device Boot      Start        End      Blocks  Id  System
chmod a+x /etc/rc.local
/dev/sda1  *          1        5100    40960000  fd  Linux raid autodetect
chmod a+x /etc/rc.d/rc.local  # TL edit
/dev/sda2            5100        9179    32768000  fd  Linux raid autodetect
systemctl enable rc-local
/dev/sda3            9179      21927  102399603+ fd  Linux raid autodetect
systemctl start rc-local
/dev/sda4          21928      91201  556443405  fd  Linux raid autodetect
systemctl status rc-local
...
</pre>
[root@ladd06 ~]# cat /proc/mdstat
Personalities : [raid1]
md3 : active raid1 sdb4[1] sda4[0]
      556442245 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk


md2 : active raid1 sdb3[1] sda3[0]
== Disable "persistent network names" (DO NOT DO THIS) ==
      102398507 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk


md1 : active raid1 sda2[0] sdb2[1]
<pre>
      32766908 blocks super 1.1 [2/2] [UU]
/bin/touch /etc/udev/rules.d/75-persistent-net-generator.rules
      bitmap: 0/1 pages [0KB], 65536KB chunk
/bin/rm /etc/udev/rules.d/70-persistent-net.rules
 
#shutdown -r now
md0 : active raid1 sda1[0] sdb1[1]
      40959928 blocks super 1.0 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk
...
[root@ladd06 ~]# df -kl
Filesystem          1K-blocks      Used Available Use% Mounted on
/dev/md0              40316208  6222676  32045536  17% /
/dev/md2            100790232    192116  95478192  1% /home1
/dev/md3            547709948    202404 519685432  1% /data6
...
[root@ladd06 ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/md1                                partition      32766900        0      -1
</pre>
</pre>


Typical size of partitions:
== Configure NIS client (CentOS7) ==
* /dev/md0 : "/" : 40 Gbytes should be sufficient. SL5 fits into an 8GB "/" and SL6 fits into a 16GB "/".
* /dev/md1 : swap : 32 Gbytes. Additional swap space can be added using a swap file located on the data disk.
* /dev/md2 : "/home1" : 100 Gbytes. User home directories backed up by the amanda site backup system. Space is limited by the capacity and capability of the backup and archiving system used to protect user data against accidental file deletion, filesystem corruption and disastrous system failures.
* /dev/md3 : "/data" : data partition uses the remaining space on the disks.


Usually, the "/" and swap partitions are created through the SL installer program. The /home and /data partitions can be created at the same time.
<pre>
yum -y install ypbind authconfig
echo "NISTIMEOUT=5" >> /etc/sysconfig/network
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --nisserver ladd00.triumf.ca --update
ypwhich
ypcat -k passwd
systemctl restart autofs
</pre>
* On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
* Use "system-config-users" to add local user accounts
* enable selinux ssh key login to nfs mounted home directories:
<pre>
setsebool -P use_nfs_home_dirs 1
</pre>


Otherwise, for traditional partitions (disks <2TB) follow these instructions:
== Configure NIS client (CentOS8) ==
* create the partitions using fdisk or similar (this example creates a 60 GB partition):
** fdisk -cu /dev/sda
** Command (m for help): <strong>n</strong>
** Command action ...  <strong>p</strong>
** Partition number ... <strong>2, 3 or 4</strong> according to what has been defined before
** First cylinder ... default
** Last cylinder ...  <strong>+60000M</strong>  or default
** Command action ...  t
** Partition number ... : <strong>2, 3 or 4</strong> according to what has been defined before
** Hex code ... : fd
** Command action ...  <strong>p to check all is correct</strong>
** Command (m for help): <strong>w</strong>
** fdisk /dev/sdb and repeat as above
** Reboot the machine


For GPT partitions (disks >=2TB), do this:
* all the same as for CentOS7
* install gdisk: yum install epel-release; yum install gdisk
* ensure correct boot order for ypbind (in CentOS 8.1 ypbind is started before network is ready, service file uses "Wants" instead of "After")
* gdisk /dev/sdX
** if this is a new disk, do "o" to create a blank partition table
** "n" to create new partition:
*** accept default for partition number
*** accept default for first sector
*** for last sector, say "+40G" to create a 40 Gbyte partition, or accept default to use all remaining disk space
*** for partition type, say "fd00" to create an mdadm raid partition
** "p" to print the partition table
** "d" to delete wrong partition
** "w" to save and exit
 
Typical GPT layout:
<pre>
<pre>
[root@isdaq01 ~]# gdisk -l /dev/sdh
mkdir /etc/systemd/system/ypbind.service.d
GPT fdisk (gdisk) version 0.8.10
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ypbind.service.d/local.conf
 
systemctl daemon-reload
Partition table scan:
systemctl cat ypbind.service
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present
 
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdh: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): D4FCDE83-12BD-4118-ACA2-702F0E2E57C2
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
 
Number  Start (sector)    End (sector)  Size      Code  Name
  1            2048        83888127  40.0 GiB    FD00  Linux RAID
  2        83888128      150996991  32.0 GiB    FD00  Linux RAID
  3      150996992      360712191  100.0 GiB  FD00  Linux RAID
  4      360712192      3907029134  1.7 TiB    FD00  Linux RAID
[root@isdaq01 ~]#
</pre>
</pre>
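The ypbind drop-in shown above can also be written in a way that is safe to rerun. This is only a sketch: the function name write_ypbind_dropin and its directory argument are made up for illustration. It uses "mkdir -p" so a second run does not fail on the existing directory, and printf instead of "echo -e" so the result does not depend on which shell is in use.

```shell
# Write the ypbind drop-in that orders ypbind after network-online.target.
# $1 = systemd unit directory (hypothetical parameter, defaults to /etc/systemd/system)
write_ypbind_dropin() (
    dir=${1:-/etc/systemd/system}/ypbind.service.d
    mkdir -p "$dir" || exit 1
    printf '[Unit]\nAfter=network-online.target\n' > "$dir/local.conf"
)

# usage (as root):
#   write_ypbind_dropin && systemctl daemon-reload && systemctl cat ypbind.service
```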


* Check the newly created partitions: fdisk -lu /dev/sda; fdisk -lu /dev/sdb
== Configure NIS secondary server (CentOS7) ==
* mdadm --create /dev/md2 --metadata=1.0 --bitmap=internal -l 1 -n 2 /dev/sda3 /dev/sdb3
* Check the progress of building the RAID with: more /proc/mdstat
* When finished: mkfs -t ext4 /dev/md2; tune2fs -i 0 -c 0 /dev/md2
* mkdir /home1
* Add to /etc/fstab: "/dev/md2                /home1                  ext4    defaults        1 2"
* Finally mount this new partition: mount -a
* Repeat from "mkfs" for each of the data partitions
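After creating the arrays it can be useful to check that none of them is running degraded. A minimal sketch, assuming the usual /proc/mdstat layout shown in the listings on this page; the function name md_degraded is made up. It only looks at the member status field (for example [UU] or [U_]) and does not report resync progress.

```shell
# Print the names of md arrays whose member status contains "_" (a missing disk).
# $1 = file to parse (hypothetical parameter, defaults to /proc/mdstat)
md_degraded() (
    awk '/^md[0-9]+ :/ { name = $1 }
         /\[[U_]+\]/   { if ($NF ~ /_/) print name }' "${1:-/proc/mdstat}"
)

# usage: md_degraded    # no output means every array has all its members
```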


* At this point you should have these disk partitions (single-disk in parenthesis)
Enable local NIS server, make local machine use it:
** /dev/md0 (/dev/sda1, sdb1) is the system partition, 40 GBytes or more
** /dev/md1 (/dev/sda2, sdb2) is the swap partition, 32 GBytes or more
** /dev/md2 (/dev/sda3, sdb3) is the /home1 partition, 100 GBytes or more
** /dev/md3 (/dev/sda4, sdb4) is the data partition


* Add array descriptions to /etc/mdadm.conf:
** mdadm -Ds >> /etc/mdadm.conf
** emacs -nw /etc/mdadm.conf ### remove duplicate entries
Example /etc/mdadm.conf:
<pre>
<pre>
MAILADDR root
yum -y install ypserv
AUTO +imsm +1.x -all
/usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines)
ARRAY /dev/md0 metadata=1.0 name=isdaq01.triumf.ca:0 UUID=055f0455:18401f41:b12abf53:2b23eca0
### ypinit will give lots of errors about "rpc.ypxfrd failed: RPC: Can't decode result"; can be ignored
ARRAY /dev/md1 metadata=1.0 name=isdaq01.triumf.ca:1 UUID=dde05275:17961aaf:7c864e3a:c51477d6
systemctl disable ypxfrd yppasswdd
ARRAY /dev/md2 metadata=1.0 name=isdaq01.triumf.ca:2 UUID=e430ba44:361f1807:41f0c491:53c10438
systemctl stop ypxfrd yppasswdd
ARRAY /dev/md3 metadata=1.0 name=isdaq01.triumf.ca:3 UUID=a34d8c5b:cb65a435:be8ee01d:7f988927
systemctl enable rpcbind ypserv
systemctl start rpcbind ypserv
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
systemctl restart ypbind
ypwhich # should say "localhost"
ypcat -k auto.master # should work
</pre>
</pre>


* (SL5.5 or newer) enable raid1 bitmap files, for each /dev/mdX device: mdadm --grow --bitmap=internal /dev/mdX
Punch hole in the firewall: (or "make" on NIS master will complain)


== Restore data from backups ==
<pre>
echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network
systemctl restart ypserv
firewall-cmd --get-services
firewall-cmd --add-service rpc-bind --permanent
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent
firewall-cmd --reload
firewall-cmd --list-all
</pre>


* (on midm15/midm9b/midm20 only) install correct ethernet driver eepro100 not e100
* on the NIS master:
* restore /home (non-NIS) or /home1 (NIS) and other required user directories from backup. (Can use /triumfcs/trshare/midas/Disks/rsync_back.csh ).
** add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
* if needed, for non-NIS only, make a softlink for /home1: ln -s /home /home1
*** TL (2020-09): are we not doing this anymore?  I guess it doesn't work anyway...
* restore users accounts (non-NIS and NIS master only): edit /etc/passwd and /etc/shadow, append users' login info to the end of these files from the backup versions.
** if using /var/yp/securenets, copy it from NIS master to new NIS secondary server


== Post installation ==
Enable hourly NIS update cron job (DO THIS AFTER git pull scripts, see below)
 
* echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca" >> ~root/.forward
* emacs -nw /etc/sysconfig/network
** set "HOSTNAME=" (set it to blank to use hostname from DHCP)
** set "NETWORKWAIT=yes"
* (not needed for SL6.1, NEEDED for SL6->6.1 update) in /etc/hosts, remove extraneous entries - only entries for localhost and localhost6 should remain
* disable selinux: edit /etc/sysconfig/selinux, change line to read: SELINUX=disabled, reboot later for change to take effect
* chmod a+r /var/log/messages
* chmod a+r /var/log/yum.log
 
== Post installation CentOS7 ==


<pre>
<pre>
CentOS 7.1 default installer will be stuck at the "create local user" screen. To proceed without creating fake local users, do:
cd ~/git/scripts
yum erase gnome-initial-setup
git pull
killall Xorg
cd etc
cd ~/git/scripts/etc; ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
</pre>
</pre>


Set hostname: (use full name, e.g. daq11.triumf.ca)
== Configure AUTOFS (CentOS7) ==
 
<pre>
<pre>
emacs -nw /etc/hostname
yum -y install autofs
systemctl enable autofs
systemctl start autofs
ls -l /daq/daqshare
</pre>
</pre>


<pre>
echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca" >> ~root/.forward
chmod a+r /var/log/messages
chmod a+r /var/log/yum.log
</pre>


Activate rc.local:
<pre>
chmod a+x /etc/rc.local
chmod u+x /etc/rc.d/rc.local  # TL edit
systemctl start rc-local
systemctl status rc-local
</pre>


== Disable "persistent network names" (DO NOT DO THIS) ==
== Label Selinux labels ==
 
When upgrading non-selinux machines (el6) to el7 (selinux enforcing) the existing
user home directories will not have the correct selinux labels and many things
will not work, including ssh logins (sshd cannot access ~user/.ssh files).


<pre>
<pre>
/bin/touch /etc/udev/rules.d/75-persistent-net-generator.rules
semanage fcontext -a -e /home /home1 ### selinux has special rules for /home, assign them to /home1
/bin/rm /etc/udev/rules.d/70-persistent-net.rules
restorecon -R -v /home1 ### apply the new rules to files in /home1
#shutdown -r now
ls -Zd /home1/alpha/.ssh
# should say: drwx------. alpha users system_u:object_r:ssh_home_t:s0  /home1/alpha/.ssh
</pre>
</pre>
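To check the label without reading the "ls -Z" output by eye, the type field can be cut out of the context string. A small sketch; the function name context_type is made up, and it assumes the usual user:role:type:level layout of an SELinux context.

```shell
# Print the type field of an SELinux context string.
# $1 = context, for example system_u:object_r:ssh_home_t:s0
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# usage (GNU stat prints the context with %C):
#   context_type "$(stat -c %C /home1/alpha/.ssh)"    # ssh_home_t is expected per the listing above
```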


== Configure NIS master (OPTIONAL) ==
== Configure time (CentOS7) ==


(do not use SL6.2 for NIS master)
Time server ntpd was replaced by chronyd.


* yum install ypserv
<pre>
* domainname DEAP-NIS
yum -y install chrony
* cd /var/yp
echo server time1 iburst >> /etc/chrony.conf
* edit Makefile
echo server time2 iburst >> /etc/chrony.conf
** change NOPUSH=false
echo server time3 iburst >> /etc/chrony.conf
** change the "all:" entry to read: all: passwd group netgrp shadow auto.master auto.home auto.local ypservers
systemctl enable chronyd
* touch /etc/netgroup /etc/auto.home /etc/auto.local ./ypservers
systemctl restart chronyd
* make
chronyc sources
* inspect created NIS maps: ls -l DEAP-NIS
chronyc tracking
* chkconfig ypserv on
</pre>
* chkconfig ypxfrd on
* chkconfig yppasswdd on
* service ypserv start


== Configure NIS client ==
* if desired, edit /etc/chrony.conf, remove non-triumf time servers
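The "echo ... >> file" commands on this page add a duplicate line every time they are rerun. If that matters, a helper along these lines appends a line only when it is not already present. The name append_once is made up for this sketch; it is not part of any package.

```shell
# Append a line to a file unless exactly that line is already in it.
# $1 = line, $2 = file
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# usage (as root):
#   for s in time1 time2 time3; do append_once "server $s iburst" /etc/chrony.conf; done
```

The same helper would fit the NISTIMEOUT and NETWORKWAIT lines for /etc/sysconfig/network.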


* run "authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --update"
== Enable automatic system updates (CentOS7) ==
* if NIS server is SL6.2, add "--nisserver=ladd00" to above command
* (not needed with --enablepreferdns above) run "sed 's/^hosts:.*/hosts: files dns/' -i /etc/nsswitch.conf" (to undo a mistake from authconfig)
* On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
* Use "system-config-users" to add local user accounts
* NIS: check user accounts: run "ypcat -k passwd"
* echo "NISTIMEOUT=5" >> /etc/sysconfig/network
* echo "NETWORKWAIT=yes" >> /etc/sysconfig/network


== Configure NIS client (CentOS7) ==
Disable yum-cron:


<pre>
<pre>
yum -y install ypbind authconfig
rpm --erase yum-cron
echo "NISTIMEOUT=5" >> /etc/sysconfig/network
/bin/rm -v /var/lock/subsys/yum-cron
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
/bin/rm -v /etc/cron.daily/0yum-daily.cron
authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --nisserver ladd00.triumf.ca --update
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron
ypwhich
ypcat -k passwd
systemctl restart autofs
</pre>
</pre>
* On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
 
* Use "system-config-users" to add local user accounts
Enable yum-autoupdate:
* enable selinux ssh key login to nfs mounted home directories:
 
<pre>
<pre>
setsebool -P use_nfs_home_dirs 1
yum install -y epel-release
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm
systemctl enable yum-autoupdate
systemctl start yum-autoupdate
systemctl status yum-autoupdate
</pre>
</pre>


== Configure NIS secondary server (OPTIONAL) ==
== Disable automatic system updates (CentOS7) ==


<pre>
<pre>
yum -y install ypserv
yum -y erase yum-autoupdate
ypwhich -m # to identify hostname of nis master for next step:
/bin/rm -f /etc/sysconfig/yum-autoupdate.rpmsave
/usr/lib64/yp/ypinit -s ladd00 # /usr/lib/yp/ypinit on 32-bit machines
/bin/rm -f /var/lock/subsys/yum-autoupdate
chkconfig ypserv on
service ypserv start
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
service ypbind restart
ypwhich # should report "localhost"
ypcat auto.master # should work
</pre>
</pre>


* on the NIS master:
== Enable automatic system updates (CentOS8) ==
** add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
** if using /var/yp/securenets, copy it from NIS master to new NIS secondary server
 
== Configure NIS secondary server (CentOS7) ==


Enable local NIS server, make local machine use it:
<pre>
yum -y install dnf-automatic
systemctl enable --now dnf-automatic.timer
systemctl list-timers *dnf-*
</pre>


edit /etc/dnf/automatic.conf
<pre>
<pre>
yum -y install ypserv
apply_updates = yes
/usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines)
</pre>
systemctl enable rpcbind ypserv ypxfrd yppasswdd
 
systemctl start rpcbind ypserv ypxfrd yppasswdd
== Configure system services (CentOS7) ==
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
systemctl restart ypbind
ypwhich # should say "localhost"
ypcat -k auto.master # should work
</pre>
 
Punch hole in the firewall: (or "make" on NIS master will complain)


* systemctl list-unit-files | grep enabled | sort ### (to see enabled services)
* disable unwanted services:
<pre>
<pre>
echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network
systemctl disable bluetooth
systemctl restart ypserv
systemctl disable dm-event
firewall-cmd --get-services
systemctl disable dmraid-activation
firewall-cmd --add-service rpc-bind --permanent
systemctl disable iscsid
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent
systemctl disable iscsi
firewall-cmd --reload
systemctl disable iscsiuio
firewall-cmd --list-all
systemctl disable libvirtd
systemctl disable lvm2-lvmetad
systemctl disable lvm2-monitor
systemctl disable ModemManager
systemctl disable multipathd
systemctl disable netcf-transaction
systemctl disable lvm2-lvmetad.socket
systemctl disable lvm2-lvmpolld.socket
systemctl disable iscsid.socket
systemctl disable iscsiuio.socket
systemctl disable ksm
systemctl disable ksmtuned
#systemctl disable
</pre>
</pre>
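Plain "grep enabled" also matches unit names or states that merely contain the word. A slightly stricter filter, as a sketch: the function name enabled_units is made up, and it assumes the state is the second column of the "systemctl list-unit-files" output.

```shell
# Read "systemctl list-unit-files" style text on stdin and print,
# sorted, the units whose state column is exactly "enabled".
enabled_units() {
    awk '$2 == "enabled" { print $1 }' | sort
}

# usage: systemctl list-unit-files --no-legend | enabled_units
```

Running it before and after the "systemctl disable" block above gives two lists that can be compared with diff.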


* on the NIS master:
== Erase unwanted packages (CentOS7) ==
** add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
** if using /var/yp/securenets, copy it from NIS master to new NIS secondary server


Enable hourly NIS update cron job
* PackageKit # bugs users about security updates, hogs yum lock
* perl-homedir # creates unwanted $HOME/perl5
* ModemManager # thinks that all USB-attached devices are modems
* pcp # sends error email to itself, does not work
* abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted
* rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken
* bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful.


<pre>
<pre>
cd ~/git/scripts
yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion
git pull
cd etc
cd ~/git/scripts/etc; ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
</pre>
</pre>


== Configure AUTOFS ==
== Disable unwanted package "tracker" ==
 
The "tracker" package is part of the GNOME desktop, it scans the content of all files
into a database for quick searching.
 
When it malfunctions, bad things happen, e.g. read through
https://bugzilla.redhat.com/show_bug.cgi?id=747689


* (if NIS master or standalone) check /etc/auto.* against backups, particularly auto.master if NIS master
The specific problem I see is that it floods the system log with error messages. It also
* (if needed) add "+auto.master" at the end of /etc/auto.master
consumes network and filesystem bandwidth for NFS mounted home directories.
* restart autofs to use the newly configured NIS maps: "service autofs stop; service autofs start"


== Configure AUTOFS (CentOS7) ==
This package cannot be removed by "yum erase tracker" due to dependencies
from core GNOME desktop.
 
Instead, do this to deactivate it:


<pre>
<pre>
yum -y install autofs
chmod -x /usr/libexec/tracker-*
systemctl enable autofs
chmod -x /usr/bin/tracker
systemctl start autofs
chattr +i /usr/bin/tracker
ls -l /daq/daqshare
chattr +i /usr/libexec/tracker-*
</pre>
</pre>


== Configure external package repositories (CentOS7) ==


EPEL: (additional packages)
<pre>
yum install epel-release
</pre>


== Label Selinux labels ==
ELREPO: (kernel modules and drivers) (CentOS8)
 
<pre>
When upgrading non-selinux machines (el6) to el7 (selinux enforcing) the existing
yum install elrepo-release
user home directories will not have the correct selinux labels and many things
</pre>
will not work, including ssh logins (sshd cannot access ~user/.ssh files).


ELREPO: (kernel drivers)
<pre>
<pre>
semanage fcontext -a -e /home /home1 ### selinux has special rules for /home, assign them to /home1
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
restorecon -R -v /home1 ### apply the new rules to files in /home1
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
ls -Zd /home1/alpha/.ssh
yum -y install yum-plugin-fastestmirror
# should say: drwx------. alpha users system_u:object_r:ssh_home_t:s0  /home1/alpha/.ssh
</pre>
</pre>


== Configure time with chronyd (SL6) ==
== Install packages needed to continue with installation ==


Use chronyd instead of ntpd.
(+CentOS7)


<pre>
(these packages are sometimes missing; they are needed to follow the instructions below)
yum -y install chrony
 
echo server time1 iburst >> /etc/chrony.conf
(SL6.5: libotf is a dependency of emacs - SL6.5 installer fails to install it)
echo server time2 iburst >> /etc/chrony.conf
 
echo server time3 iburst >> /etc/chrony.conf
<pre>
chkconfig --level 123456 ntpd off
yum install ed patch wget git libotf gdisk emacs perl
chkconfig --level 123456 ntpdate off
service ntpd stop
chkconfig chronyd on
service chronyd restart
chronyc sources
chronyc tracking
</pre>
</pre>


* if desired, edit /etc/chrony.conf, remove non-triumf time servers
== Configure Konstantin's scripts ==


== Configure time (CentOS7) ==
(+CentOS7)
 
Time server ntpd was replaced by chronyd.


<pre>
<pre>
yum -y install chrony
mkdir ~root/git
echo server time1 iburst >> /etc/chrony.conf
cd ~root/git
echo server time2 iburst >> /etc/chrony.conf
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
echo server time3 iburst >> /etc/chrony.conf
cd scripts
systemctl enable chronyd
git pull
systemctl restart chronyd
chronyc sources
chronyc tracking
</pre>
</pre>


* if desired, edit /etc/chrony.conf, remove non-triumf time servers
Go back to the NIS slave server and install the hourly NIS update cron job.


== Enable automatic kernel updates (SL6) ==
== Enable yum version lock ==


* enable kernel updates: sed 's/^EXCLUDE=/#EXCLUDE=/' -i /etc/sysconfig/yum-autoupdate
<pre>
yum install yum-plugin-versionlock
#yum versionlock packagename # yum versionlock rpcbind
#yum versionlock list # list locked packages
#yum versionlock delete packagename # unlock given package
#yum versionlock clear # delete all locks
</pre>


== Enable automatic system updates (CentOS7) ==
== Configure trusted ssh keys ==


Disable yum-cron:
(+CentOS7)


<pre>
<pre>
rpm --erase yum-cron
ssh localhost
/bin/rm -v /var/lock/subsys/yum-cron
interrupt by Ctrl-C
/bin/rm -v /etc/cron.daily/0yum-daily.cron
/bin/cp ~/git/scripts/etc/authorized_keys ~/.ssh/
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron
</pre>
</pre>


Enable yum-autoupdate:
== Configure hardware sensors ==
 
* yum -y install lm_sensors
* sensors-detect (accept default answer to all questions - press ENTER)
* systemctl restart lm_sensors
* sensors (to see available sensors)
 
If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page.
 
== Configure IPMI sensors ==
 
Some machines support the IPMI interface for monitoring the hardware: fan speeds, temperatures, voltages.


* find out if IPMI is supported. Try this:
<pre>
<pre>
yum install -y epel-release
dmidecode | grep -i ipmi
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm
systemctl enable yum-autoupdate
systemctl start yum-autoupdate
systemctl status yum-autoupdate
</pre>
</pre>
 
If the output is not blank, IPMI may be supported.
== Disable automatic system updates (CentOS7) ==
* install and enable IPMI software:
 
<pre>
<pre>
yum -y erase yum-autoupdate
yum install "OpenIPMI*" ipmitool
/bin/rm -f /etc/sysconfig/yum-autoupdate.rpmsave
service ipmi start
/bin/rm -f /var/lock/subsys/yum-autoupdate
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
chkconfig ipmi on
chkconfig ipmievd on
service ipmi restart
service ipmievd restart
tail -100 /var/log/messages ### look at messages logged by ipmievd
</pre>
</pre>
 
* (CentOS7) install and enable IPMI software:
== Configure system services ==
<pre>
 
yum install "OpenIPMI*" ipmitool
* chkconfig --list | grep :on | sort (to see enabled services)
systemctl start ipmi
* disable unwanted services:
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
<pre>(only if amanda is not used) -&gt; chkconfig --level 12345 xinetd off
systemctl list-unit-files | grep -i ipmi
chkconfig --level 12345 canna off
systemctl enable ipmi
chkconfig --level 12345 FreeWnn off
systemctl restart ipmi
chkconfig --level 12345 hpoj off
systemctl status ipmi
chkconfig --level 12345 ip6tables off
systemctl enable ipmievd
chkconfig --level 12345 iptables off
systemctl restart ipmievd
chkconfig --level 12345 isdn off
systemctl status ipmievd
chkconfig --level 12345 pcmcia off
tail -100 /var/log/messages ### look at messages logged by ipmievd
chkconfig --level 12345 rhnsd off
chkconfig --level 12345 spamassassin off
chkconfig --level 12345 bluetooth off
chkconfig --level 12345 apmd off
chkconfig --level 12345 iiim off
chkconfig --level 12345 fenced off
chkconfig --level 12345 ccsd off
chkconfig --level 12345 cpuspeed off
chkconfig --level 12345 pcp off
chkconfig --level 12345 pmie off
chkconfig --level 12345 yum-updatesd off
chkconfig --level 12345 clvmd off
chkconfig --level 12345 cman off
chkconfig --level 12345 lvm2-monitor off
chkconfig --level 12345 modclusterd off
chkconfig --level 12345 yum-updateonboot off
chkconfig --level 12345 cmirror off
chkconfig --level 12345 lock_gulmd off
chkconfig --level 12345 firstboot off
chkconfig --level 12345 ricci off
chkconfig --level 12345 gfs off
chkconfig --level 12345 scsi_reserve off
chkconfig --level 12345 openibd off
chkconfig --level 12345 arptables_jf off
chkconfig --level 12345 auditd off
chkconfig --level 12345 avahi-daemon off
chkconfig --level 12345 hplip off
chkconfig --level 12345 iscsi off
chkconfig --level 12345 iscsid off
chkconfig --level 12345 mcstrans off
chkconfig --level 12345 pcscd off
chkconfig --level 12345 restorecond off
chkconfig --level 12345 setroubleshoot off
chkconfig --level 12345 xend off
chkconfig --level 12345 xendomains off
chkconfig --level 12345 kudzu off
#chkconfig --level 12345 yum-cron off
chkconfig --level 12345 kdump off
chkconfig --level 12345 libvirt-guests off
chkconfig --level 12345 libvirtd off
chkconfig --level 12345 spice-vdagentd off
chkconfig --level 12345 ksm off
chkconfig --level 12345 ksmtuned off
chkconfig --level 12345 iscsi off
chkconfig --level 12345 iscsid off
chkconfig --level 12345 openct off
chkconfig --level 12345 blk-availability off
chkconfig --level 12345 fcoe off
chkconfig --level 12345 lldpad off
</pre>
</pre>


== Configure system services (CentOS7) ==
* if ipmievd complains about SEL buffer overflow, clear it manually:
<pre>
ipmitool sel list ### show ipmi messages in raw format
ipmitool sel elist ### show ipmi messages in useful format
ipmitool sel elist > file ### save ipmi messages into a file
ipmitool sel clear  ### clear all accumulated ipmi messages
</pre>


* systemctl list-unit-files | grep enabled | sort ### (to see enabled services)
* useful ipmi commands:
* disable unwanted services:
** ipmitool sensor -- read hardware sensors
<pre>
** ipmitool sel elist -- report all accumulated messages
systemctl disable bluetooth
 
systemctl disable dm-event
== Configure ECC memory ==
systemctl disable dmraid-activation
systemctl disable iscsid
systemctl disable iscsi
systemctl disable iscsiuio
systemctl disable libvirtd
systemctl disable lvm2-lvmetad
systemctl disable lvm2-monitor
systemctl disable ModemManager
systemctl disable multipathd
systemctl disable netcf-transaction
systemctl disable lvm2-lvmetad.socket
systemctl disable lvm2-lvmpolld.socket
systemctl disable iscsid.socket
systemctl disable iscsiuio.socket
#systemctl disable
</pre>


== Erase unwanted packages ==
* check that machine has ECC memory: dmidecode --type memory | grep -i ecc


<pre>
Configure mcelog (machine check exception)
yum erase PackageKit # bugs users about security updates
</pre>


== Erase unwanted packages (CentOS7) ==
* yum install mcelog
* check that mcelog is running: ps -efw | grep mcelog
* (el6) chkconfig mcelogd on; service mcelogd restart
* (el7) systemctl status mcelog.service; systemctl enable mcelog.service; systemctl restart mcelog.service


* PackageKit # bugs users about security updates, hogs yum lock
Check for MCE (machine check exception) messages:
* perl-homedir # creates unwanted $HOME/perl5
* ModemManager # thinks that all USB-attached devices are modems
* pcp # sends error email to itself, does not work
* abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted
* rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken
* bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful.


<pre>
* mcelog --client
yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion
* grep -i mce /var/log/messages*
</pre>
* grep -i ecc /var/log/messages*


== Configure external package repositories ==
Configure EDAC


<pre>
<pre>
yum install elrepo-release epel-release
yum install edac-utils
</pre>
edac-ctl --mainboard
edac-ctl --status
lsmod | grep edac
modprobe ie31200_edac ### driver for Intel E3-1200 series ECC memory


== Configure external package repositories (CentOS7) ==
[root@grsmid00 ~]# ls -l /sys/devices/system/edac/mc/
... empty


EPEL: (additional packages)
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/
<pre>
drwxr-xr-x. 15 root root    0 Oct 25 16:40 mc0
yum install epel-release
...
</pre>
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_noinfo_count
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow0
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow1
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow2
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow3
-r--r--r--. 1 root root 4096 Oct 25 16:40 max_location
-r--r--r--. 1 root root 4096 Oct 25 16:40 mc_name
drwxr-xr-x. 2 root root    0 Oct 25 16:40 power
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank0
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank1
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank2
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank3
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank4
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank5
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank6
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank7
--w-------. 1 root root 4096 Oct 25 16:40 reset_counters
-r--r--r--. 1 root root 4096 Oct 25 16:40 seconds_since_reset
-r--r--r--. 1 root root 4096 Oct 25 16:40 size_mb
lrwxrwxrwx. 1 root root    0 Oct  2 12:02 subsystem -> ../../../../../bus/mc0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_noinfo_count
-rw-r--r--. 1 root root 4096 Oct 25 16:40 uevent
[root@alpha00 ~]#


ELREPO: (kernel drivers)
[root@alpha00 ~]# edac-ctl --status
<pre>
edac-ctl: drivers are loaded.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum -y install yum-plugin-fastestmirror
</pre>


== Install packages needed to continue with installation ==
[root@alpha00 ~]# edac-util
edac-util: No errors to report.


(+CentOS7)
[root@alpha00 ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
</pre>
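The ce_count and ue_count files in the listing above hold the corrected and uncorrected error counters of each memory controller. A minimal sketch of summing them over all controllers, assuming the sysfs layout shown in that listing; edac_totals is a hypothetical helper name, not part of edac-utils:

```shell
#!/bin/sh
# edac_totals [ROOT]: sum corrected (ce_count) and uncorrected (ue_count)
# errors over every mc* directory under ROOT.
# ROOT defaults to the sysfs path used in the listing above.
edac_totals() {
    root=${1:-/sys/devices/system/edac/mc}
    ce=0
    ue=0
    for mc in "$root"/mc*; do
        # Skip controllers without readable counters (also covers an unmatched glob).
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        ce=$(( ce + $(cat "$mc/ce_count") ))
        ue=$(( ue + $(cat "$mc/ue_count") ))
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

Run edac_totals with no arguments on a machine where the EDAC drivers are loaded; on a machine with an empty mc directory it reports zero for both counters.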


(these packages are sometimes missing; they are needed to follow the instructions below)
== Configure SMARTD (CentOS7) ==


(SL6.5: libotf is a dependency of emacs - SL6.5 installer fails to install it)
Default el7 smartd config files send deficient email notices about disk failures. Overwrite.


<pre>
<pre>
yum install ed patch wget git libotf gdisk emacs
/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/
systemctl enable smartd
systemctl restart smartd
systemctl status smartd
</pre>
</pre>


== Configure TRIUMF packages ==
== Enable User Disk Quotas (OPTIONAL) ==


(only for machines on the TRIUMF network)
(+CentOS7)
 
(TRIUMF kickstart usually installs this automatically)


* read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html
* emacs -nw /etc/fstab, add "grpquota,usrquota" to filesystem options, e.g.:
<pre>
<pre>
rpm -vh --install  http://mirror.triumf.ca/triumf/6/x86_64/Packages/triumf-release-1.4-1.noarch.rpm
[root@isdaq00 home1]# grep quota /etc/fstab
yum install triumf-ssh triumf-syslog
UUID=5a2aefbd-45db-475e-841e-12ec89220fbd /home1 ext4 defaults,grpquota,usrquota 1 2
</pre>
 
== Configure TRIUMF packages (CentOS7) ==
 
(only for machines on the TRIUMF network)
 
<pre>
# TL Was rpm -vh --install http://mirror.triumf.ca/triumf/6/x86_64/RPMS/triumf-release-1.4-1.noarch.rpm
rpm -vh --install  http://mirror.triumf.ca/triumf/6/x86_64/Packages/triumf-release-1.4-1.noarch.rpm
yum install triumf-ssh triumf-syslog
</pre>
</pre>
 
* cd /; umount /home1; mount /home1
== Configure Konstantin's scripts ==
* quotacheck -cug /home1
 
* quotacheck -avug
(+Centos7)
* quotaon -av
 
* quota system is now active
* increase the soft quota time limit from the default 7 days to 30 or 60 days: edquota -t
* set quotas for all users (see below)
* setup warnquota:
** create warnquota config file: emacs -nw /etc/warnquota.conf
<pre>
<pre>
mkdir ~root/git
# values can be quoted:
cd ~root/git
MAIL_CMD        = "/usr/sbin/sendmail -t"
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
FROM            = root
cd scripts
SUBJECT        = User %i@%h exceeded allocated disk quota
git pull
CC_TO          = "root"
# If you set this variable CC will be used only when user has less than
# specified grace time left (examples of possible times: 5 seconds, 1 minute,
# 12 hours, 5 days)
# CC_BEFORE = 2 days
SUPPORT        = "root"
# Text in the beginning of the mail (if not specified, default text is used)
# This way text can be split to more lines
# Line breaks are done by '|' character
# The expressions %i, %h, %d, and %% are substituted for user/group name,
# host name, domain name, and '%' respectively. For backward compatibility
# %s behaves as %i but is deprecated.
MESSAGE        = User "%i" on "%h" has exceeded the allocated disk quota.||Please delete any unnecessary files on following filesystems or|contact the system administrator to increase your quota allocation:|
SIGNATURE      = --|automated email from warnquota
</pre>
</pre>
 
** note that %i@%h in the SUBJECT line does not seem to work
== Enable yum version lock ==
** create cron job: emacs -nw /etc/cron.daily/warnquota
 
DO THIS ONLY IF NEEDED
 
<pre>
<pre>
yum install yum-plugin-versionlock
#!/bin/sh
yum versionlock packagename # yum versionlock rpcbind
warnquota
yum versionlock list # list locked packages
#end
yum versionlock delete packagename # unlock given package
yum versionlock clear # delete all locks
</pre>
</pre>
** chmod a+x /etc/cron.daily/warnquota
** touch /etc/crontab


== Configure TRIUMF mirror of yum repositories (SL6) ==
Useful commands for managing quotas:
 
* repquota -a | sort -n -k3 ### show quota of all users sorted by disk usage
(only for machines on TRIUMF network)
* edquota -u username ### open "vi" editor to change user quotas
* repquota -a | grep username ### report quota for given user
* setquota -u username 0 0 0 0 /home1 ### disable quotas for given user
* setquota -u username 50000000 100000000 0 0 /home1 ### set quotas for 50GB soft and 100GB hard
* edquota -t ### change user quota time limits
* edquota -tg ### change group quota time limits
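The setquota example above passes its limits as block counts, which setquota interprets as 1 KiB blocks, so 50000000 is roughly 50 GB. A minimal sketch of building such a command from sizes in GB, using the same rounding as the example; gb_to_blocks and quota_cmd are hypothetical helper names, and the command is only printed, not executed:

```shell
#!/bin/sh
# gb_to_blocks GB: convert gigabytes to the block count used in the
# example above (50 -> 50000000).
gb_to_blocks() {
    echo $(( $1 * 1000000 ))
}

# quota_cmd USER SOFT_GB HARD_GB [FILESYSTEM]: print the setquota command
# for review; the filesystem defaults to /home1 as in the examples above.
quota_cmd() {
    user=$1
    soft_gb=$2
    hard_gb=$3
    fs=${4:-/home1}
    echo "setquota -u $user $(gb_to_blocks "$soft_gb") $(gb_to_blocks "$hard_gb") 0 0 $fs"
}
```

For example, quota_cmd username 50 100 prints the same command as the 50GB soft and 100GB hard example in the list above.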


* if /daq/mirror is available: /bin/cp ~/git/scripts/etc/daq-mirror-SL6.repo /etc/yum.repos.d/
== Enable NFS V4 server (CentOS7) ==
* if /triumfcs/mirror is available: /bin/cp ~/git/scripts/etc/triumfcs-mirror-SL6.repo /etc/yum.repos.d/
* otherwise: /bin/cp ~/git/scripts/etc/triumf-SL6.repo /etc/yum.repos.d/


then disable external repositories:
* create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...)
<pre>
<pre>
yum clean all
/home1  @home_export(rw,no_root_squash,async,fsid=1)
yum-config-manager --disable epel
/data1  @data_export(rw,no_root_squash,async,fsid=2)
yum-config-manager --disable elrepo
</pre>
yum-config-manager --disable sl
* check the netgroup file
yum-config-manager --disable sl-security
** if using NIS: check NIS netgroup: ypcat -k netgroup
yum-config-manager --disable sl6x
** if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
yum-config-manager --disable sl6x-security
** if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
yum clean all
* enable things, start them:
</pre>
 
== Configure trusted ssh keys ==
 
(+CentOS7)
 
<pre>
<pre>
ssh localhost
firewall-cmd --get-services
interrupt by Ctrl-C
firewall-cmd --permanent --add-service=nfs
/bin/cp ~/git/scripts/etc/authorized_keys ~/.ssh/
firewall-cmd --permanent --add-service=rpc-bind ### needed for ubuntu automounter
</pre>
firewall-cmd --reload
firewall-cmd --list-all
systemctl enable nfs-server
systemctl start nfs-server
systemctl status nfs
</pre>
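The NFS V4 exports example requires every fsid number to be unique. A minimal sketch of checking that before restarting the server, assuming the fsid=N option syntax shown in the example; check_fsid_unique is a hypothetical helper name:

```shell
#!/bin/sh
# check_fsid_unique FILE: succeed when no fsid=N value in FILE is repeated.
check_fsid_unique() {
    # Ignore comment lines, extract every fsid=N option, keep repeated values only.
    dups=$(grep -v '^[[:space:]]*#' "$1" | grep -o 'fsid=[0-9]*' | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate: $dups"
        return 1
    fi
    echo "ok"
}
```

Typical use would be check_fsid_unique /etc/exports; it does not verify that the numbers increase 1,2,3, only that none is reused.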


== Configure hardware sensors ==
== Enable NFS V3 server (CentOS7) ==
 
* yum install lm_sensors kmod-k10temp kmod-coretemp
* sensors-detect (accept default answer to all questions - press ENTER)
* service lm_sensors restart (to reload the kernel modules)
* sensors (to see available sensors)


If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page.
== Configure coretemp CPU sensors ==
On some machines, the coretemp driver for Intel CPU temperature sensors is not loaded after the above steps.
* sensors | grep coretemp ### number of sensors reported should be the same as the number of CPU cores
* if output is blank, add this to /etc/rc.local
<pre>
<pre>
emacs -nw /etc/rc.local
ps -efw | grep rpc.mountd # should be running!
modprobe coretemp
firewall-cmd --get-services
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
firewall-cmd --list-all
</pre>
</pre>
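The coretemp check compares the number of per-core sensor lines with the number of CPU cores. A minimal sketch of counting those lines, assuming the usual "Core N:" labels that the sensors command prints for the coretemp driver; count_core_sensors is a hypothetical helper name that reads sensors output on standard input:

```shell
#!/bin/sh
# count_core_sensors: count lines of the form "Core N: ..." on standard input.
count_core_sensors() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps the function's exit status clean in that case.
    grep -c '^Core [0-9][0-9]*:' || true
}
```

Typical use would be sensors | count_core_sensors; a result of 0 corresponds to the blank output case described above.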


== Configure IPMI sensors ==
== Enable NFS V3 server ==


Some machines support the IPMI interface for monitoring the hardware: fan speeds, temperatures, voltages.
* edit /etc/hosts.allow, add or uncomment "mountd: 142.90.0.0/255.255.0.0"
 
* create /etc/exports. example:
* find out if IPMI is supported. Try this:
<pre>
<pre>
dmidecode | grep -i ipmi
/home1  @home_export(rw,no_root_squash,async)
</pre>
/data1  @data_export(rw,no_root_squash,async)
If the output is not blank, IPMI may be supported.
* install and enable IPMI software:
<pre>
yum install "OpenIPMI*" ipmitool
service ipmi start
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
chkconfig ipmi on
chkconfig ipmievd on
service ipmi restart
service ipmievd restart
tail -100 /var/log/messages ### look at messages logged by ipmievd
</pre>
* (CentOS7) install and enable IPMI software:
<pre>
yum install "OpenIPMI*" ipmitool
systemctl start ipmi
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
systemctl list-unit-files | grep -i ipmi
systemctl enable ipmi
systemctl restart ipmi
systemctl status ipmi
systemctl enable ipmievd
systemctl restart ipmievd
systemctl status ipmievd
tail -100 /var/log/messages ### look at messages logged by ipmievd
</pre>
</pre>
* check the netgroup file
** if using NIS: check NIS netgroup: ypcat -k netgroup
** if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
** if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
* chkconfig nfs on
* chkconfig nfslock on
* service nfs restart
Then, on ladd00, do the following:
* ssh to root@ladd00
* edit /etc/auto.daq to add new machine...
* make -C /var/yp
== Enable NFS V4 SERVER (SL6) ==


* if ipmievd complains about SEL buffer overflow, clear it manually:
* if used with NIS, same as NFSv3
<pre>
* if used as standalone, need to edit idmapd.conf - set the "Domain" name to the same value on NFS server and NFS slave (default automagically determined value does not always work). More TBW.
ipmitool sel list ### show ipmi messages in raw format
ipmitool sel elist ### show ipmi messages in useful format
ipmitool sel elist > file ### save ipmi messages into a file
ipmitool sel clear  ### clear all accumulated ipmi messages
</pre>


* useful ipmi commands:
== Enable AMANDA backups ==
** ipmitool sensor -- read hardware sensors
** ipmitool sel elist -- report all accumulated messages


== Configure SMARTD (CentOS7) ==
AMANDA backups are already enabled by TRIUMF kickstart installs. For non-kickstart installation, follow instructions at [[http://amanda/~amanda http://amanda/~amanda]], or look at "/triumfcs/trshare/olchansk/linux/amanda/amanda-enable.perl". As final step, use [[https://helpdesk.triumf.ca https://helpdesk.triumf.ca]] to contact TRIUMF CS to add this new machine to the amanda backup list.


Default el7 smartd config files send deficient email notices about disk failures. Overwrite.
* yum install triumf-amanda
 
== Enable AMANDA backups (CentOS7) ==


<pre>
<pre>
/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/
yum install amanda-client
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/
systemctl list-unit-files | grep -i amanda
systemctl restart smartd
#systemctl enable amanda
systemctl status smartd
systemctl enable amanda.socket
systemctl enable amanda-udp.socket
systemctl restart amanda.socket
systemctl restart amanda-udp.socket
firewall-cmd --get-services
firewall-cmd --permanent --add-service=amanda-client
firewall-cmd --reload
firewall-cmd --list-all
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts
</pre>
</pre>


== Enable User Disk Quotas (OPTIONAL) ==
On amanda server, add new machine to the disklist, then:
 
(+CentOS7)


* read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html
* emacs -nw /etc/fstab, add "grpquota,usrquota" to filesystem options, e.g.:
<pre>
<pre>
[root@isdaq00 home1]# grep quota /etc/fstab
amcheck -c daily titan00
UUID=5a2aefbd-45db-475e-841e-12ec89220fbd /home1 ext4 defaults,grpquota,usrquota 1 2
</pre>
</pre>
* cd /; umount /home1; mount /home1
 
* quotacheck -cug /home1
== Enable DCACHE ==
* quotacheck -avug
 
* quotaon -av
DAQ dcache server is mounted as
* quota system is now active
 
* increase the soft quota time limit from the default 7 days to 30 or 60 days: edquota -t
/daq/pnfs/triumf.ca/data/
* set quotas for all users (see below)
 
* setup warnquota:
For Centos-7 machines, you need to adjust the firewall rules in order to be able to communicate with the trdata machines; this is only necessary if you are copying data to trdata.  The firewall changes are
** create warnquota config file: emacs -nw /etc/warnquota.conf
 
<pre>
<pre>
# values can be quoted:
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.212/32" port protocol="tcp" port="0-65535" accept"
MAIL_CMD        = "/usr/sbin/sendmail -t"
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.107.156/32" port protocol="tcp" port="0-65535" accept"
FROM            = root
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.219/32" port protocol="tcp" port="0-65535" accept"
SUBJECT        = User %i@%h exceeded allocated disk quota
firewall-cmd --reload
CC_TO          = "root"
firewall-cmd --list-all
# If you set this variable CC will be used only when user has less than
</pre>
# specified grace time left (examples of possible times: 5 seconds, 1 minute,
# 12 hours, 5 days)
# CC_BEFORE = 2 days
SUPPORT        = "root"
# Text in the beginning of the mail (if not specified, default text is used)
# This way text can be split to more lines
# Line breaks are done by '|' character
# The expressions %i, %h, %d, and %% are substituted for user/group name,
# host name, domain name, and '%' respectively. For backward compatibility
# %s behaves as %i but is deprecated.
MESSAGE        = User "%i" on "%h" has exceeded the allocated disk quota.||Please delete any unnecessary files on following filesystems or|contact the system administrator to increase your quota allocation:|
SIGNATURE      = --|automated email from warnquota
</pre>
** note that %i@%h in the SUBJECT line does not seem to work
** create cron job: emacs -nw /etc/cron.daily/warnquota
<pre>
#!/bin/sh
warnquota
#end
</pre>
** chmod a+x /etc/cron.daily/warnquota
** touch /etc/crontab


Useful commands for managing quotas:
These instructions are unnecessary
* repquota -a | sort -n -k3 ### show quota of all users sorted by disk usage
* # mkdir -p /pnfs
* edquota -u username ### open "vi" editor to change user quotas
* # edit /etc/rc.local, add to the end of file: "mount -o intr,rw,noac,hard,nfsvers=3 trdata00:/pnfs /pnfs &"
* repquota -a | grep username ### report quota for given user
* # . /etc/rc.local
* setquota -u username 0 0 0 0 /home1 ### disable quotas for given user
 
* setquota -u username 50000000 100000000 0 0 /home1 ### set quotas for 50GB soft and 100GB hard
For more information, see the [[TrdataDcache]] dcache page.
* edquota -t ### change user quota time limits
 
* edquota -tg ### change group quota time limits
== Configure Ganglia (Centos7) ==


== Enable NFS V4 server (CentOS7) ==
CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2)


* create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...)
<pre>
<pre>
/home1  @home_export(rw,no_root_squash,async,fsid=1)
/bin/rm /etc/gmond.conf
/data1  @data_export(rw,no_root_squash,async,fsid=2)
yum -y install "ganglia-gmond*"
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf  # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf
yum erase -y ganglia-vmstat ganglia-sensors ganglia-top ganglia-smart ganglia-cpumhz
cd ~/git/scripts
git pull
/bin/cp etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
cd ganglia
./ganglia-all.perl
make install
cd ~
</pre>
</pre>
* check the netgroup file
 
** if using NIS: check NIS netgroup: ypcat -k netgroup
== Configure Ganglia (Centos8) ==
** if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
 
** if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
CentOS8 Ganglia instructions (EPEL8 ganglia-3.7.2)
* enable things, start them:
 
<pre>
<pre>
firewall-cmd --get-services
/bin/rm /etc/gmond.conf
firewall-cmd --permanent --add-service=nfs
yum -y install "ganglia-gmond*"
firewall-cmd --reload
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
firewall-cmd --list-all
systemctl enable gmond
systemctl enable nfs-server
systemctl restart gmond
systemctl start nfs-server
systemctl status gmond
systemctl status nfs
cd ~/git/scripts/ganglia
git pull
./ganglia-all.perl
make install
</pre>
</pre>


== Enable NFS V3 server (CentOS7) ==
== Configure TRIUMF DAQ packages ==
 
(+CentOS7)


<pre>
<pre>
ps -efw | grep rpc.mountd # should be running!
cd /etc/yum.repos.d
firewall-cmd --get-services
wget http://daq.triumf.ca/~daqweb/yum/triumf-daq.repo
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
firewall-cmd --list-all
</pre>
</pre>


== Enable NFS V3 server ==
== Install Konstantin's packages ==
 
(+CentOS7)


* edit /etc/hosts.allow, add or uncomment "mountd: 142.90.0.0/255.255.0.0"
* create /etc/exports. example:
<pre>
<pre>
/home1  @home_export(rw,no_root_squash,async)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs
/data1  @data_export(rw,no_root_squash,async)
</pre>
</pre>
* check the netgroup file
** if using NIS: check NIS netgroup: ypcat -k netgroup
** if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
** if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
* chkconfig nfs on
* chkconfig nfslock on
* service nfs restart


Then, on ladd00, do the following:
== Install memtest and PXE boot ==
* ssh to root@ladd00
* edit /etc/auto.daq to add new machine...
* make -C /var/yp


== Enable NFS V4 SERVER (SL6) ==
!!!DO NOT DO THIS!!!


* if used with NIS, same as NFSv3
<pre>
* if used as standalone, need to edit idmapd.conf - set the "Domain" name to the same value on NFS server and NFS slave (default automagically determined value does not always work). More TBW.
cd /boot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.10
wget http://ladd00.triumf.ca/tftpboot/gpxe-1.0.1+-gpxe.lkrn


== Enable AMANDA backups ==
emacs -nw /boot/grub/grub.conf
title memtest86+-5.01
      root (hd0,0)
      kernel /boot/memtest86+-5.01.bin.gz
title memtest86+-4.20
      root (hd0,0)
      kernel /boot/memtest86+-4.20.bin.gz
title memtest86+-4.10
      root (hd0,0)
      kernel /boot/memtest86+-4.10
title pxeboot
      root (hd0,0)
      kernel /boot/gpxe-1.0.1+-gpxe.lkrn
</pre>


AMANDA backups are already enabled by TRIUMF kickstart installs. For non-kickstart installation, follow instructions at [[http://amanda/~amanda http://amanda/~amanda]], or look at "/triumfcs/trshare/olchansk/linux/amanda/amanda-enable.perl". As final step, use [[https://helpdesk.triumf.ca https://helpdesk.triumf.ca]] to contact TRIUMF CS to add this new machine to the amanda backup list.
== Install node monitoring ==


* yum install triumf-amanda
!!! OBSOLETE, DO NOT DO THIS !!!


== Enable AMANDA backups (CentOS7) ==
(+CentOS7)


<pre>
<pre>
yum install amanda-client
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install triumf_nodeinfo
systemctl list-unit-files | grep -i amanda
/usr/sbin/sendnodeinfo.perl --config ladd00.triumf.ca:8600
#systemctl enable amanda
emacs -nw /etc/nodeinfo
systemctl enable amanda.socket
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600
systemctl enable amanda-udp.socket
systemctl restart amanda.socket
systemctl restart amanda-udp.socket
firewall-cmd --get-services
firewall-cmd --permanent --add-service=amanda-client
firewall-cmd --reload
firewall-cmd --list-all
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts
</pre>
</pre>


On amanda server, add new machine to the disklist, then:
== Install gonodeinfo node monitoring ==
 
(+Ubuntu, +CentOS7, +CentOS8)


go to https://bitbucket.org/dd1/gonodeinfo
follow instructions:
<pre>
<pre>
amcheck -c daily titan00
yum -y install golang
mkdir ~/git
cd ~/git
git clone https://bitbucket.org/dd1/gonodeinfo.git
# or git clone https://daq.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
</pre>
</pre>


== Enable DCACHE ==
* emacs -nw /etc/gonodeinfo.conf
 
* change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
DAQ dcache server is mounted as
* change "Servers" to read: Servers: daq00.triumf.ca:8601
* run gonodeinfo -e
* if the error is "connection refused", go to the nodeinfo server and add this client to the access control list:
* on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
* try gonodeinfo again, there should be no error
* on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now


/daq/pnfs/triumf.ca/data/
== Install latest system updates ==


For Centos-7 machines, you need to adjust the firewall rules in order to be able to communicate with the trdata machines; this is only necessary if you are copying data to trdata.  The firewall changes are
(+CentOS7)


<pre>
<pre>
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.212/32" port protocol="tcp" port="0-65535" accept"
yum update -y
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.107.156/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.219/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
</pre>
</pre>
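The three trdata firewall rules differ only in the source address. A minimal sketch of generating them from a list of addresses, assuming the same rich-rule text as the commands above; trdata_rules is a hypothetical helper name, and the commands are only printed so they can be reviewed before being run as root:

```shell
#!/bin/sh
# trdata_rules ADDR...: print one firewall-cmd rich-rule command per address.
# The outer single quotes in the printed command keep the rule as one argument.
trdata_rules() {
    for addr in "$@"; do
        printf '%s\n' "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$addr/32\" port protocol=\"tcp\" port=\"0-65535\" accept'"
    done
}
```

Typical use would be trdata_rules 142.90.100.212 142.90.107.156 142.90.100.219, followed by firewall-cmd --reload once the printed commands have been applied.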


These instructions are unnecessary
== Configure TRIUMF Printers (CentOS7) ==
* # mkdir -p /pnfs
* # edit /etc/rc.local, add to the end of file: "mount -o intr,rw,noac,hard,nfsvers=3 trdata00:/pnfs /pnfs &"
* # . /etc/rc.local


For more information, see the [[TrdataDcache]] dcache page.
<pre>
systemctl stop cups
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
</pre>


== Configure CPU speed (CentOS7) ==
== Disable syslog spam (CentOS7) ==


In el7 the CPU frequency selection is confused. On some machines
Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this:
the default governor is "conservative", on other machines it is "powersave".


The current configuration can be seen by: "cpupower frequency-info -p"
<pre>
 
echo auditctl -e 0 >> /etc/rc.local
The actual cpu frequency can be seen by "cat /proc/cpuinfo | grep -i mhz" and by "cpupower monitor" (run them under "watch -d -n1").
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local
/etc/rc.local
</pre>


The linux kernel documentation says "powersave" will set CPU frequency to the minimum value, forever.
== Install basic system packages (CentOS7) ==
But on some machines (i.e. daq06, daq14) it is easy to see that the CPU frequency actually changes
according to the CPU load. This is explained in the documentation for the "intel_pstate" driver.


On machines where the CPU frequency seems always stuck at minimum, try this:
(if starting from minimal system, basic system packages required:)
* set the governor to "performance": cpupower frequency-set -g performance
* see if frequency now changes according to load (good) or is stuck at maximum (not so good, but ok)
* make it permanent by adding this command to /etc/rc.local - echo cpupower frequency-set -g performance >> /etc/rc.local
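Because the default governor differs from machine to machine, it can help to see which governor each CPU is using before and after the change. A minimal sketch that reads the standard cpufreq sysfs files, assuming the cpufreq driver exposes scaling_governor for every CPU; governor_summary is a hypothetical helper name:

```shell
#!/bin/sh
# governor_summary [ROOT]: print "governor: number_of_cpus" for every governor in use.
# ROOT defaults to the standard sysfs CPU directory.
governor_summary() {
    root=${1:-/sys/devices/system/cpu}
    # Prints nothing when no cpufreq files exist (for example in some virtual machines).
    cat "$root"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null \
        | sort | uniq -c | awk '{print $2 ": " $1}'
}
```

Run governor_summary with no arguments; a single line such as "performance: 8" would mean that all eight CPUs use the same governor.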
 
== Configure Ganglia ==
 
SL6 Ganglia instructions (EPEL6 ganglia-3.7.2)


<pre>
<pre>
/bin/rm /etc/gmond.conf
yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils sysstat iftop tcsh
yum install "*gmond*"
yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools
/bin/rm /etc/ganglia/conf.d/ganglia-triumf-daq.conf
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
chkconfig gmond on
service gmond restart
</pre>
</pre>


== Configure Ganglia (Centos7) ==
== Install packages needed for QUARTUS, ROOT, EPICS and MIDAS DAQ ==


CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2)
(+CentOS7)


<pre>
yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" "libusbx-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*-g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy sympy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" --exclude golang"*"git"*" mesa"*" xerces-c"*" diffuse clang i2c-tools  texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras perl-GD"*" perl-Math"*" perl-Statistics-Basic cmake3 cmake3-gui extra-cmake-modules python2-pip  mariadb-devel glibc-devel.i686 libzstd zlib-devel.i686
/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf  # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
</pre>
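The four "cp /dev/null" commands above blank the unwanted gmond module configs without deleting them (presumably so that a package update does not bring the files back). A minimal sketch of the same step as a loop follows. CONF_DIR is a hypothetical variable for this example: it defaults to a scratch directory so the loop can be tried safely, and would be set to /etc/ganglia/conf.d to act on a real system.

```shell
# Truncate the unwanted gmond module configs to zero length.
# CONF_DIR is an assumption of this sketch; the default is a scratch directory.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

for f in multicpu.conf netstats.pyconf diskstat.pyconf procstat.pyconf; do
    : > "$CONF_DIR/$f"    # same effect as: /bin/cp /dev/null "$CONF_DIR/$f"
done

ls -l "$CONF_DIR"
```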


== Configure TRIUMF DAQ packages ==
== Install optional packages ==


(+CentOS7)
!! DO NOT DO THIS !!
 
(do not install boost on 32-bit machines)
 
yum install --skip-broken "boost-*"


<pre>
(packages for 32-bit software compilation on 64-bit machines. this is optional)
cd /etc/yum.repos.d
wget http://daq.triumf.ca/~daqweb/yum/triumf-daq.repo
</pre>


== Install Konstantin's packages ==
yum install --skip-broken giflib.i386 giflib.i686 compat-libf2c-34.i386 compat-libf2c-34.i686 mysql-devel.i686 openssl-devel.i686 unixODBC-devel.i686 libstdc++-devel.i386 libstdc++-devel.i686 "zlib-*.i686" "libXext-*.i686" "libXtst-*.i686" glibc-static.i686 freetype.i686 fontconfig.i686 libpng.i686 libXrender.i686 glibc-devel.i686 libX11-devel.i686 libXpm-devel.i686 libXft-devel.i686 mysql-devel.i686 dcap-devel.i686 gsl-devel.i686 pcre-devel.i686 fontconfig-devel.i686 freetype-devel.i686 libpng-devel.i686 libjpeg-devel.i686 libgfortran.i686 libxml2-devel.i686 gd-devel.i686 readline-devel.i686 ncurses-devel.i686 libXdmcp.i686 readline-static.i686 compat-readline5.i686


(+CentOS7)
yum install boost-devel.i686


<pre>
(separately install these packages - they collide with the big bunch above)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs "ganglia-*" triumf_nodeinfo
</pre>


== Install memtest and PXE boot ==
yum install rdesktop


<pre>
yum reinstall urw-fonts
cd /boot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.10
wget http://ladd00.triumf.ca/tftpboot/gpxe-1.0.1+-gpxe.lkrn


emacs -nw /boot/grub/grub.conf
== Install libraries for PHYSICA (CentOS7) ==
title memtest86+-5.01
 
      root (hd0,0)
To run physica built on el6 from git sources on el7, do this:
      kernel /boot/memtest86+-5.01.bin.gz
title memtest86+-4.20
      root (hd0,0)
      kernel /boot/memtest86+-4.20.bin.gz
title memtest86+-4.10
      root (hd0,0)
      kernel /boot/memtest86+-4.10
title pxeboot
      root (hd0,0)
      kernel /boot/gpxe-1.0.1+-gpxe.lkrn
</pre>


== Install node monitoring ==
(building physica on el7 is not supported at this time)


(+CentOS7)
(see more http://www.triumf.info/wiki/DAQwiki/index.php/PHYSICA)


<pre>
<pre>
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install triumf_nodeinfo
yum -y install libX11.i686 gd.i686 libpng12.i686 readline.i686 compat-libf2c-34.i686
/usr/sbin/sendnodeinfo.perl --config ladd00.triumf.ca:8600
emacs -nw /etc/nodeinfo
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600
</pre>
</pre>


== Install gonodeinfo node monitoring ==
== Install additional desktop environments (CentOS7) ==


(+Ubuntu, +CentOS7)
go to https://bitbucket.org/dd1/gonodeinfo
follow instructions:
<pre>
<pre>
yum -y install golang
# LXQT (from EPEL)
mkdir ~/git
# NOT COMPATIBLE WITH el7.7 # yum -y install "lxqt*"
cd ~/git
# Cinnamon desktop (from EPEL)
git clone https://bitbucket.org/dd1/gonodeinfo.git
yum -y install cinnamon
cd gonodeinfo
# KDE5 not available yet
git pull
# MATE (from epel)
make
yum -y groupinstall "MATE Desktop"
make install # install gonodeinfo agent
yum -y install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils
cd ~ # this is important
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs
# XFCE4 (from EPEL)
yum -y groupinstall xfce
yum -y install "xfce*plugin" xfce4-about --exclude xfce4-hamster-plugin
yum -y erase bash-completion
</pre>
</pre>


* edit /etc/gonodeinfo.conf
* make the MATE desktop the default
* change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
* change "Servers" to read: Servers: ladd00.triumf.ca:8601
* run gonodeinfo
* if error is "connection refused". go to the nodeinfo server to add this client to the access control list:
* on the gonodeinfo server: run gonodereceive -a daq13
* try gonodeinfo again, there should be no error
* on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
 
== Install latest system updates ==
 
(+CentOS7)


<pre>
<pre>
yum update -y
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
</pre>
</pre>


== Configure TRIUMF Printers ==
* lightdm login manager (from EPEL)
 
<pre>
<pre>
chkconfig cups off
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5
service cups stop
yum install triumf-printers
</pre>
</pre>


== Configure TRIUMF Printers (CentOS7) ==
* and switch from gdm to lightdm
 
<pre>
<pre>
systemctl stop cups
systemctl disable gdm.service
systemctl disable cups
systemctl enable lightdm.service
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
(systemctl stop gdm; systemctl restart lightdm) &
lpstat -a
</pre>
</pre>


== Disable syslog spam (CentOS7) ==
== Install SMART scripts ==


Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this:
(+CentOS7)


<pre>
<pre>
echo auditctl -e 0 >> /etc/rc.local
ln -sf ~/git/scripts/smart-status/smart-status.perl ~/
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local
/etc/rc.local
</pre>
</pre>
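Note that "echo ... >> /etc/rc.local" adds another copy of the line every time these instructions are repeated on the same machine. A minimal sketch of an append that is safe to repeat is shown below. The helper name append_once and the RC_FILE variable are made up for this example; RC_FILE defaults to a scratch file and would be set to /etc/rc.local for real use.

```shell
# Append a line to a file only if the exact line is not already there.
# append_once and RC_FILE are hypothetical names used only in this sketch.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

RC_FILE="${RC_FILE:-$(mktemp)}"
append_once 'auditctl -e 0' "$RC_FILE"
append_once 'auditctl -e 0' "$RC_FILE"    # second call changes nothing
append_once '/usr/bin/systemd-analyze set-log-level notice' "$RC_FILE"
cat "$RC_FILE"
```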


== Install basic system packages (CentOS7) ==
== Install NTFS drivers ==


(if starting from minimal system, basic system packages required:)
yum install ntfs-3g ntfsprogs (from EPEL)


yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils sysstat iftop tcsh
== Install HFS and HFS+ drivers (CentOS7) ==


yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools
yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus


== Install packages needed for QUARTUS, ROOT, EPICS and MIDAS DAQ ==
== Install Google Chrome web browser (64-bit CentOS7) ==


(+CentOS7)
DOES NOT WORK AS OF google-chrome-stable-114 because google uses signature incompatible with CentOS-7, see https://www.reddit.com/r/chrome/comments/13s799o/googlechromebeta_1140573545_rpm_invalid_signature/


yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*-g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy sympy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" --exclude golang"*"git"*" mesa"*" xerces-c"*" diffuse clang i2c-tools  texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras perl-GD"*" perl-Math"*" perl-Statistics-Basic cmake3 cmake3-gui extra-cmake-modules python2-pip x2go"*"
Automatic updates will fail with a signature check error. To work around this, lock the installed version of google-chrome:
<pre>
yum versionlock google-chrome-stable
</pre>


(do not install boost on 32-bit machines)
THIS DOES NOT WORK ANYMORE:


yum install --skip-broken "boost-*"
<pre>
/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/
yum install google-chrome-stable
</pre>


(packages for 32-bit software compilation on 64-bit machines. this is optional)
== Enable monitoring of HTTPS certificates ==


yum install --skip-broken giflib.i386 giflib.i686 compat-libf2c-34.i386 compat-libf2c-34.i686 mysql-devel.i686 openssl-devel.i686 unixODBC-devel.i686 libstdc++-devel.i386 libstdc++-devel.i686 "zlib-*.i686" "libXext-*.i686" "libXtst-*.i686" glibc-static.i686 freetype.i686 fontconfig.i686 libpng.i686 libXrender.i686 glibc-devel.i686 libX11-devel.i686 libXpm-devel.i686 libXft-devel.i686 mysql-devel.i686 dcap-devel.i686 gsl-devel.i686 pcre-devel.i686 fontconfig-devel.i686 freetype-devel.i686 libpng-devel.i686 libjpeg-devel.i686 libgfortran.i686 libxml2-devel.i686 gd-devel.i686 readline-devel.i686 ncurses-devel.i686 libXdmcp.i686 readline-static.i686 compat-readline5.i686
On SL6, CentOS7:


yum install boost-devel.i686
<pre>
yum install crypto-utils
/etc/cron.daily/certwatch
strace -f /etc/cron.daily/certwatch  |& grep open  | grep crt
</pre>


(separately install these packages - they collide with the big bunch above)
== Enable 100dpi fonts for EPICS ==


yum install rdesktop
(+CentOS7)


yum reinstall urw-fonts
<pre>
ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/
</pre>


== Install libraries for PHYSICA (CentOS7) ==
== Enable crontab @reboot for MIDAS (CentOS7) ==


To run physica built on el6 from git sources on el7, do this:
el7 has a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory
is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be
started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).


(building physica on el7 is not supported at this time)
<pre>
mkdir /etc/systemd/system/crond.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/crond.service.d/local.conf
systemctl daemon-reload
systemctl cat crond.service
</pre>
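For reference, the drop-in written by the "echo -e" command above is a two-line file. The sketch below writes the same contents with printf and prints them for checking. DROPIN_DIR is a hypothetical variable for this example: it defaults to a scratch directory and would be /etc/systemd/system/crond.service.d on a real system.

```shell
# Write the crond ordering drop-in and show what ends up in the file.
# DROPIN_DIR is an assumption of this sketch; the default is a scratch directory.
DROPIN_DIR="${DROPIN_DIR:-$(mktemp -d)}"
mkdir -p "$DROPIN_DIR"

printf '[Unit]\nAfter=ypbind.service autofs.service\n' > "$DROPIN_DIR/local.conf"

cat "$DROPIN_DIR/local.conf"
```

After "systemctl daemon-reload", the output of "systemctl cat crond.service" should show these two lines below the stock unit file.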


(see more http://www.triumf.info/wiki/DAQwiki/index.php/PHYSICA)
el7 has a second bug, sometimes it thinks the network is running when it is not, specifically,
DNS is not working and autofs mount of user home directory fails. So not only cron has
to wait for ypbind and autofs to be ready, we also have to wait for DNS to be ready:


<pre>
<pre>
yum -y install libX11.i686 gd.i686 libpng12.i686 readline.i686 compat-libf2c-34.i686
cd ~/git/scripts
git pull
cp etc/wait-for-dns.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable wait-for-dns
systemctl restart wait-for-dns # should return immediately. if it takes 30 seconds (timeout), the script is broken, disable it
systemctl status wait-for-dns # to see what went wrong.
</pre>
</pre>
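The contents of wait-for-dns.service are not shown on this page. As an illustration only, a check of this kind could be a retry loop like the sketch below; the function name wait_for_dns, the use of "getent hosts" and the default of 30 retries are assumptions of this example, not a copy of the real script.

```shell
# Wait until a host name resolves, retrying once per second.
# wait_for_dns is a hypothetical helper; the real service may work differently.
# usage: wait_for_dns NAME [MAX_TRIES]
wait_for_dns() {
    name="$1"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        if getent hosts "$name" >/dev/null 2>&1; then
            echo "DNS ready after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "DNS not ready after $tries retries" >&2
    return 1
}

# example call: wait_for_dns ladd00.triumf.ca 30
```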


== Install additional desktop environments (CentOS7) ==
Explore the systemd dependency tree using "systemctl list-dependencies", maybe with "--all".
 
Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.


<pre>
== Enable firewall for MIDAS (CentOS7) ==
# LXQT (from EPEL)
 
yum -y install "lxqt*"
Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports).
# Cinnamon desktop (from EPEL)
yum -y install cinnamon
# KDE5 not available yet
# MATE (from epel)
yum -y groupinstall "MATE Desktop"
yum -y install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs
# XFCE4 (from EPEL)
yum -y groupinstall xfce
yum -y install "xfce*plugin" xfce4-about --exclude xfce4-hamster-plugin
yum -y erase bash-completion
</pre>


* make the MATE desktop the default
To enable access to mhttpd:


<pre>
<pre>
cd ~root/git/scripts/
firewall-cmd --add-port=8443/tcp --permanent
git pull
firewall-cmd --reload
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
firewall-cmd --list-all
</pre>
</pre>


* lightdm login manager (from EPEL)
To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host)
 
<pre>
<pre>
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
</pre>
</pre>


* and switch from gdm to lightdm
To enable access from the private network (replace "192.168.1.0" with your private network number):
 
<pre>
<pre>
systemctl disable gdm.service
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="0-65535" accept"
systemctl enable lightdm.service
firewall-cmd --reload
(systemctl stop gdm; systemctl restart lightdm) &
firewall-cmd --list-all
</pre>
</pre>
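The rich rules above differ only in the source address. A minimal sketch that builds the rule text for a given address and prints the resulting command (without running it) is shown below; the helper name rich_rule is made up for this example. Putting the whole rule in single quotes keeps the inner double quotes intact when the command is pasted into a shell.

```shell
# Build the firewalld rich rule text for one source address or network.
# rich_rule is a hypothetical helper name used only in this sketch.
rich_rule() {
    printf 'rule family="ipv4" source address="%s" port protocol="tcp" port="0-65535" accept\n' "$1"
}

rule=$(rich_rule 192.168.1.0/24)

# Dry run: print the command instead of executing it.
echo "firewall-cmd --permanent --add-rich-rule='$rule'"
```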


== Make installation smaller (optional) ==
== Enable firewall for EPICS (CentOS7) ==


This is optional. Only do this if reducing the size of the OS image is very important.
To enable access to TRIUMF EPICS servers, do this:


<pre>
<pre>
yum erase "texlive*" "java*" "boost*"
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.132.0/23" accept"
yum erase "xemacs*"
firewall-cmd --reload
yum erase "libstdc++-docs"
firewall-cmd --list-all
</pre>
</pre>


== Install SMART scripts ==
For UCN the controls people seem to have EPICS set up on a different server; this might be true for CMMS as well. In this case the firewall rule change should be:
 
(+CentOS7)


<pre>
<pre>
ln -sf ~/git/scripts/smart-status/smart-status.perl ~/
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.139.0/23" accept"
firewall-cmd --reload
firewall-cmd --list-all
</pre>
</pre>


== Install NTFS drivers ==
== Disable gdm and X11 (OPTIONAL) ==


yum install ntfs-3g ntfsprogs (from EPEL)
<pre>
 
initctl stop prefdm
== Install HFS and HFS+ drivers (CentOS7) ==
echo "start on never" > /etc/init/prefdm.override
 
echo "start on never" > /etc/init/splash-manager.override
yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus
initctl reload-configuration
 
== Install Google Chrome web browser (64-bit SL6) ==
 
Google-chrome 27 is too old to use with recent MIDAS, but it has working Flash:
 
<pre>
rpm -vh --install https://daqshare.triumf.ca/~olchansk/google-chrome/google-chrome-stable-27.0.1453.110-202711.x86_64.rpm
/bin/rm /etc/cron.daily/google-chrome
yum-config-manager --disable google-chrome
yum-config-manager --disable google-chrome-64
google-chrome
</pre>
</pre>


Chromium 38 works with current MIDAS. No Flash, no PDF viewer:
then enable login on default console:
 
<pre>
<pre>
yum install -y policycoreutils-python
echo "plymouth quit" >> /etc/rc.local
rpm -vh --install https://daqshare.triumf.ca/~olchansk/google-chrome/chromium-browser-38.0.2125.111-1.el6.centos.x86_64.rpm
echo "X_TTY=xxx/dev/tty1" >> /etc/sysconfig/init
chromium-browser
</pre>
</pre>


== Install Google Chrome web browser (64-bit CentOS7) ==
== Install JAVAWS (OPTIONAL) ==


<pre>
* to run Java "web start" jnlp files (EVO, SEEVOGH, etc): javaws Downloads/spider.jnlp
/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/
* install javaws:
yum install google-chrome-stable
* yum install icedtea-web icedtea-web-javadoc
</pre>


== Enable monitoring of HTTPS certificates ==
== Install firefox java plugin (OPTIONAL, DO NOT DO THIS) ==
 
This installs the Oracle Java plugin:
 
* rpm -vh --install ~deap/jdk-7u15-linux-x64.rpm
* ls -l /usr/lib64/mozilla/plugins/
* ln -s /usr/java/jdk1.7.0_15/jre/lib/amd64/libnpjp2.so /usr/lib64/mozilla/plugins/
* start firefox, go edit->preferences->general->manage add-ons->plugins
* "java plugin 1.7.0_15" should be listed


On SL6, CentOS7:


<pre>
yum install crypto-utils
/etc/cron.daily/certwatch
strace -f /etc/cron.daily/certwatch  |& grep open  | grep crt
</pre>


== Enable 100dpi fonts for EPICS ==
== Configure USB device permissions ==


(+CentOS7)
(+CentOS7)
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
* create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:


<pre>
<pre>
ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
</pre>
</pre>


== Enable crontab @reboot for MIDAS (CentOS7) ==
* reload udev rules: udevadm control --reload-rules
* apply new permissions: udevadm trigger --action=add
* watch udev activity: udevadm monitor -p
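As an alternative to typing the rules into an editor, the file can be written from a here-document. The sketch below writes two of the tty rules listed above. RULES_FILE is a hypothetical variable for this example: it defaults to a scratch file and would be /etc/udev/rules.d/99-usb-chmod.rules on a real system. The delimiter is quoted ('EOF') so that the shell does not try to expand $env{DEVNAME}.

```shell
# Write udev rules from a here-document without shell expansion.
# RULES_FILE is an assumption of this sketch; the default is a scratch file.
RULES_FILE="${RULES_FILE:-$(mktemp)}"

cat > "$RULES_FILE" <<'EOF'
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
EOF

cat "$RULES_FILE"
```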


el7 has a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory
== Disable modem-manager ==
is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be
 
started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).
The modem-manager will try to talk to any serial devices attached to USB serial ports. It assumes that those devices are modems and will send out modem-specific commands. if the devices are not modems and do not understand or do not like modem commands, well that's too bad. modem-manager is installed by the ModemManager package required by the NetworkManager package, and there is no configuration setting to turn modem-manager off.


<pre>
One way to disable it is: chmod a= /usr/sbin/modem-manager
mkdir /etc/systemd/system/crond.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/crond.service.d/local.conf
systemctl daemon-reload
systemctl cat crond.service
</pre>


Explore the systemd dependency tree using "systemctl list-dependencies", maybe with "--all".
Another way to disable it is by forced uninstall: rpm --erase --nodeps ModemManager


Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.
Remember to kill the running copy: killall -KILL modem-manager


== Enable firewall for MIDAS (CentOS7) ==
Caveat: it is not clear whether modem-manager would be resurrected by an update to the NetworkManager or ModemManager packages.


Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports).
== Configure Altera jtagd ==


To enable access to mhttpd:
(if needed)


<pre>
<pre>
firewall-cmd --add-port=8443/tcp --permanent
mkdir /etc/jtagd
firewall-cmd --reload
echo 'Password = "123";' > /etc/jtagd/jtagd.conf
firewall-cmd --list-all
cp -pv  /daq/daqshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts
</pre>
</pre>


To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host)
* start local jtagd: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagd
* test local connection: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagconfig
* test remote connection (add this machine to your .jtag.conf, run jtagconfig)


<pre>
For more information, go to [[Quartus]]
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
</pre>


== Enable firewall for EPICS (CentOS7) ==
== Install EOS ==


To enable access to TRIUMF EPICS servers, do this:
Instructions from here:
http://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html


<pre>
<pre>
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.132.0/23" accept"
rpm -vh --install https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/citrine/tag/el-7/x86_64/eos-repo-el7-generic-1.noarch.rpm
firewall-cmd --reload
yum-config-manager --disable eos-citrine # disable auto-update because all packages are not signed
firewall-cmd --list-all
yum-config-manager --disable eos-dep # disable auto-update because all packages are not signed.
yum install eos-client eos-fuse --enablerepo=eos-citrine
</pre>
</pre>


For UCN the controls people seem to have EPICS set up on a different server; this might be true for CMMS as well. In this case the firewall rule change should be:
== Install fix for the el7 systemd dbus boot hang ==


<pre>
Around early Summer 2018 el7 started showing a boot problem. In a nutshell,
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.139.0/23" accept"
there is a problem with the dbus connection between dbus and systemd that
firewall-cmd --reload
prevents polkit, firewalld, etc from starting. The system eventually boots
firewall-cmd --list-all
enough that one can ssh into it, but most things do not work. Notably,
</pre>
polkit is not running, firewalld is not running, ssh login takes about 15-30 seconds.


== Disable gdm and X11 (OPTIONAL) ==
The solution is to add a special systemd service that checks that dbus started correctly.
It runs after dbus is started, but before it is used, and it restarts dbus in a loop
with a delay until dbus starts correctly. In testing, dbus always starts correctly after
the first retry.


<pre>
<pre>
initctl stop prefdm
cd ~root/git/scripts/etc
echo "start on never" > /etc/init/prefdm.override
git pull
echo "start on never" > /etc/init/splash-manager.override
/bin/cp -vf systemd-check-dbus.perl /usr/bin/
initctl reload-configuration
/bin/cp -vf systemd-check-dbus.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable systemd-check-dbus
systemctl start systemd-check-dbus
systemctl status systemd-check-dbus
</pre>
</pre>


then enable login on default console:
After linux boots, if everything was okay, the script will report this:
<pre>
<pre>
echo "plymouth quit" >> /etc/rc.local
[root@iris01 ~]# systemctl status systemd-check-dbus
echo "X_TTY=xxx/dev/tty1" >> /etc/sysconfig/init
...
</pre>
Feb 08 17:15:49 iris01.triumf.ca systemd[1]: Starting Check that systemd is registered with dbus...
 
Feb 08 17:15:49 iris01.triumf.ca sh[4283]: Starting check for systemd dbus connection
== Install JAVAWS (OPTIONAL) ==
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:      string "org.freedesktop.DBus"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:      string "org.freedesktop.systemd1"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: systemd1 dbus service exists, success!
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: Finished check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca systemd[1]: Started Check that systemd is registered with dbus.
</pre>
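The check script itself (systemd-check-dbus.perl) is not reproduced on this page. As an illustration of the idea only, the sketch below looks for "org.freedesktop.systemd1" in the list of dbus names and retries after running a repair command. All function names, the use of dbus-send and the choice of "systemctl restart dbus" as the repair command are assumptions of this example, not the contents of the real script.

```shell
# Succeeds if the text on stdin lists the systemd1 dbus name.
has_systemd1() {
    grep -q 'string "org.freedesktop.systemd1"'
}

# Ask the system bus for its list of names and look for systemd1.
check_dbus_once() {
    dbus-send --system --dest=org.freedesktop.DBus --print-reply \
        /org/freedesktop/DBus org.freedesktop.DBus.ListNames 2>/dev/null | has_systemd1
}

# usage: retry_until MAX DELAY CHECK FIX
# Run CHECK; while it fails, run FIX, sleep DELAY seconds, and try again,
# giving up after MAX repairs.
retry_until() {
    max="$1"; delay="$2"; check="$3"; fix="$4"
    n=0
    while ! $check; do
        n=$((n + 1))
        if [ "$n" -gt "$max" ]; then
            return 1
        fi
        $fix || true
        sleep "$delay"
    done
    return 0
}

# example call (as root): retry_until 10 5 check_dbus_once "systemctl restart dbus"
```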


* to run Java "web start" jnlp files (EVO, SEEVOGH, etc): javaws Downloads/spider.jnlp
If the boot problem occurred, the script will report that it restarted dbus.
* install javaws:
* yum install icedtea-web icedtea-web-javadoc


== Install firefox java plugin (OPTIONAL, DO NOT DO THIS) ==
Note: the systemd service file adjusts the start order of other services, this adjustment seems to reduce the probability of the problem.


This installs the Oracle Java plugin:
== Configure GRUB boot loader (CentOS7, CentOS8) ==


* rpm -vh --install ~deap/jdk-7u15-linux-x64.rpm
* emacs -nw /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX
* ls -l /usr/lib64/mozilla/plugins/
* grub2-mkconfig -o /boot/grub2/grub.cfg
* ln -s /usr/java/jdk1.7.0_15/jre/lib/amd64/libnpjp2.so /usr/lib64/mozilla/plugins/
* grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
* start firefox, go edit->preferences->general->manage add-ons->plugins
* grub2-editenv list # show contents of boot environment file
* "java plugin 1.7.0_15" should be listed
* /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file
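The edit of /etc/default/grub can also be done with sed instead of an editor. The sketch below runs the substitution on a sample file, so nothing on the system is changed; the sample contents are made up for this example. It assumes "rhgb" and "quiet" are each preceded by a space, as in the stock el7 file. To apply it for real, run the same sed expressions with "-i" on /etc/default/grub, then run grub2-mkconfig as above.

```shell
# Remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX in a sample file.
# The sample contents are an assumption of this sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
GRUB_TIMEOUT=5
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"
EOF

result=$(sed -e '/^GRUB_CMDLINE_LINUX=/s/ rhgb//' \
             -e '/^GRUB_CMDLINE_LINUX=/s/ quiet//' "$sample")

printf '%s\n' "$result"
```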


== Install memtest86+ (CentOS7, CentOS8) ==


<pre>
yum -y install memtest86+
/bin/cp -vf /usr/share/memtest86+/20_memtest86+ /etc/grub.d/
/bin/chmod a+x /etc/grub.d/20_memtest86+
grub2-mkconfig -o /boot/grub2/grub.cfg
</pre>


== Configure USB device permissions ==
== Disable ELREPO ==


(+CentOS7)
<pre>
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo_triumf.repo
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo
</pre>
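To see what the sed expression does before touching the real repo files, it can be run without "-i" on a sample file. In the sketch below the sample contents are made up for this example; note that the expression rewrites every "enabled=" line, so all sections of the file end up disabled.

```shell
# Show the effect of the "enabled=0" substitution on a sample repo file.
# The sample contents are an assumption of this sketch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
[elrepo]
name=ELRepo sample
enabled=1
[elrepo-testing]
name=ELRepo testing sample
enabled=0
EOF

result=$(sed 's/enabled=.*/enabled=0/' "$sample")

printf '%s\n' "$result"
```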
 
== Reduce install size (optional) ==


Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
This is optional. Only do this if reducing the size of the OS image is very important.


* create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:
Do this for VME processors.


<pre>
<pre>
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
yum erase "texlive*" "java*" "boost*" libreoffice"*"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
#yum erase "xemacs*"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
yum erase "libstdc++-docs"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
yum erase firefox google-chrome"*"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
yum clean all
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
</pre>
</pre>


* apply new permissions: udevadm trigger --action=add
<pre>
/bin/rm -rf /usr/share/help
/bin/rm -rf /usr/share/doc
</pre>


== Disable modem-manager ==
== Update from el7.6 to el7.7 ==


The modem-manager will try to talk to any serial devices attached to USB serial ports. It assumes that those devices are modems and will send out modem-specific commands. if the devices are not modems and do not understand or do not like modem commands, well that's too bad. modem-manager is installed by the ModemManager package required by the NetworkManager package, and there is no configuration setting to turn modem-manager off.
<pre>
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
yum-config-manager --disable zfs-testing-kmod
yum versionlock delete zfs
yum versionlock delete kernel
yum -y update "yum*" "rpm*"
yum -y erase libqtxdg lxqt-qtplugin ### LXQT is not compatible
yum update
# after rebooting into el7.7, follow the instructions for updating ZFS from version 0.7 to 0.8
</pre>


One way to disable it is: chmod a= /usr/sbin/modem-manager
== Update ZFS ==


Another way to disable it is by forced uninstall: rpm --erase --nodeps ModemManager
* CentOS-7: 0.8.5 to 2.0.7
** update kernel to latest version, reboot
** check /etc/yum.repos.d/zfs.repo has [zfs-kmod] baseurl=http://download.zfsonlinux.org/epel/7.9/kmod/$basearch/
** yum --enablerepo=zfs-kmod update
** reboot, login as root
** run "zfs version"
** run "zfs upgrade"


Remember to kill the running copy: killall -KILL modem-manager
== Switch from LADD-NIS to DAQ-NIS ==


Caveat: it is not clear whether modem-manager would be resurrected by an update to the NetworkManager or ModemManager packages.
<pre>
domainname DAQ-NIS
/usr/lib64/yp/ypinit -s daq00
ls -l /var/yp
sed -i s/LADD-NIS/DAQ-NIS/ /etc/yp.conf
sed -i s/LADD-NIS/DAQ-NIS/ /etc/sysconfig/network
systemctl restart ypserv
systemctl restart ypbind
ypwhich
ypwhich -m
</pre>


== Configure Altera jtagd ==
== Finish installation ==


(if needed)
reboot


<pre>
== Special hardware settings ==
mkdir /etc/jtagd
echo 'Password = "123";' > /etc/jtagd/jtagd.conf
cp -pv /triumfcs/trshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts
</pre>


* start local jtagd: /triumfcs/trshare/olchansk/altera/11.0/quartus/bin/jtagd
=== ASUS Crosshair mobo ===
* test local connection: /triumfcs/trshare/olchansk/altera/11.0/quartus/bin/jtagconfig
* test remote connection (add this machine to your .jtag.conf, run jtagconfig)


For more information, go to [[Quartus]]
* use BIOS version 1207 or newer
* (before CentOS7) sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
* CentOS7: installs correct drivers automatically


== Install EOS ==
=== ASUS Crosshair-II mobo ===


Instructions from here:
* use BIOS version 2607 or newer
http://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html
* for the onboard IDE to work, add "all-generic-ide" to kernel boot options in grub.conf
* sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors


<pre>
=== ASUS P7P55D EVO mobo ===
rpm -vh --install https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/citrine/tag/el-7/x86_64/eos-repo-el7-generic-1.noarch.rpm
yum-config-manager --disable eos-citrine # disable auto-update because all packages are not signed
yum-config-manager --disable eos-dep # disable auto-update because all packages are not signed.
yum install eos-client eos-fuse --enablerepo=eos-citrine
</pre>


== Install fix for the el7 systemd dbus boot hang ==
* use BIOS version 2004 or newer
* SL6 - install special driver for on board PCIe GigE network port and disable on board PCI GigE network port:
** yum --enablerepo elrepo install kmod-r8168 kmod-r8169
** # do not do this: sed 's/^blacklist/#blacklist/' -i /etc/modprobe.d/blacklist-r8169.conf
** reboot
** verify that correct drivers are loaded: ethtool -i eth0; ethtool -i eth1
** note: there will be no eth1 - r8169 driver is disabled.


Around early Summer 2018 el7 started showing a boot problem. In a nutshell,
=== ASUS P6X58-E-WS mobo ===
there is a problem with the dbus connection between dbus and systemd that
prevents polkit, firewalld, etc from starting. The system eventually boots
enough that one can ssh into it, but most things do not work. Notably,
polkit is not running, firewalld is not running, ssh login takes about 15-30 seconds.


The solution is to add a special systemd service to check that dbus started correctly.
* BIOS settings
It runs after dbus is started, but before it is used, and it restarts dbus in a loop
** F1 or DEL to enter BIOS setup, F8 boot menu
with a delay until dbus starts correctly. In testing, dbus always starts correctly after
** go to POWER->HW mon, confirm CPU temperature is around 30C (this shows that the heatsink is installed correctly; with a badly installed heatsink the temperature quickly goes up to 50-70C).
the first retry.
** Main menu: Storage config - SATA change IDE->AHCI
** System information: confirm BIOS version 301, CPU type, memory size
** AI Tweak: set DRAM frequency - AUTO->DDR3-1333
** Advanced->Onboard devices: LAN BOOT: enabled
** Power->HW monitor: CPU Q-FAN: enabled
** Boot->Settings: Quick boot: enabled; Full screen logo: disabled; Wait for F1: disabled
** Save and exit
 
=== ASUS E35M1-M PRO mobo ===


* http://www.asus.com/Motherboards/E35M1M_PRO/#specifications
* use BIOS version 1002 or newer
* for CPU temperature: install kmod-k10temp from ELREPO (kmod-k10temp-0.0-4.el6.elrepo.x86_64.rpm)
* for Sensors: yum --enablerepo elrepo install kmod-w83627ehf; modprobe w83627ehf; sensors
* for Graphics: yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
* to enable booting from USB3, edit /etc/dracut.conf, change line "add_drivers" to read: add_drivers+="xhci-hcd"
* to use multiple monitors, run "aticonfig --initial --heads=2 --adapter=1 --xinerama=on"; to change screen layout, edit /etc/X11/xorg.conf. Only dual monitors (DVI+HDMI) seem to work. Triple monitors do not seem to work.
Sensors instructions below are obsolete (use driver from ELREPO)
* for Sensors, install driver for NCT6776F chip from https://github.com/groeck/w83627ehf/archives/master (in the Makefile, change the line "KERNEL_BUILD=" to read: "KERNEL_BUILD:=/usr/src/kernels/$(TARGET)"):
<pre>
<pre>
cd ~root/git/scripts/etc
cd ~root
git pull
wget http://ladd00.triumf.ca/~olchansk/linux/groeck-w83627ehf-dd3e543/w83627ehf.ko
/bin/cp -vf systemd-check-dbus.perl /usr/bin/
echo "modprobe hwmon; modprobe hwmon-vid; modprobe k10temp; rmmod w83627ehf; insmod /root/w83627ehf.ko" >> /etc/rc.local
/bin/cp -vf systemd-check-dbus.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable systemd-check-dbus
systemctl start systemd-check-dbus
systemctl status systemd-check-dbus
</pre>
</pre>


After linux boots, if everything was okay, the script will report this:
=== ASUS E45M1-M PRO mobo ===
<pre>
[root@iris01 ~]# systemctl status systemd-check-dbus
...
Feb 08 17:15:49 iris01.triumf.ca systemd[1]: Starting Check that systemd is registered with dbus...
Feb 08 17:15:49 iris01.triumf.ca sh[4283]: Starting check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:      string "org.freedesktop.DBus"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:      string "org.freedesktop.systemd1"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: systemd1 dbus service exists, success!
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: Finished check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca systemd[1]: Started Check that systemd is registered with dbus.
</pre>


If the boot problem occurred, the script will report that it restarted dbus.
* https://www.asus.com/Motherboards/E45M1M_PRO/#specifications
* use BIOS 1202 or newer
* follow the E35M1-M PRO instructions above


Note: the systemd service file also adjusts the start order of other services; this adjustment seems to reduce the probability of the problem.
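
The check described above can be sketched in shell. This is only an illustration and not the site's systemd-check-dbus.perl script: the function name systemd_on_dbus is hypothetical, and it assumes the check amounts to looking for the org.freedesktop.systemd1 name in the reply of a D-Bus ListNames call, as the log output above suggests.

```shell
#!/bin/sh
# Hypothetical helper, NOT the systemd-check-dbus.perl script from the wiki.
# Reads the reply of a D-Bus ListNames call on stdin and succeeds only if
# systemd has registered its well-known name on the system bus.
systemd_on_dbus() {
    grep -q 'string "org.freedesktop.systemd1"'
}

# Example use on a live system (dbus-send is part of the dbus package):
#   dbus-send --system --print-reply --dest=org.freedesktop.DBus \
#       /org/freedesktop/DBus org.freedesktop.DBus.ListNames | systemd_on_dbus \
#       && echo "systemd1 dbus service exists" \
#       || echo "systemd1 is missing, dbus needs a restart"
```

A real service would wrap this test in a retry loop with a delay and restart dbus between attempts, as described in the text.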
=== ASUS P9X79 WS ===


== Configure GRUB boot loader ==
* http://www.asus.com/Motherboard/P9X79_WS/
* use BIOS version 4901. Older versions seem to be ok: 3101, 3401, 4701, 4802 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS.
* (not needed for CentOS7) for CPU temperature, install coretemp
* (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above.
* BIOS Settings:
** enter "Advanced mode"
** Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default
** ### NOT THIS: Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings
** Monitor -> disable Q-fan for all fans - let all fans always run at maximum RPM
** Boot -> Full screen logo -> Set to "disabled"
** Wait for F1 -> Set to "disabled"


* edit /boot/grub/grub.conf, remove the "quiet" and "rhgb" options
=== ASUS P8B-M ===
* edit /boot/grub/grub.conf, comment out (with "#") the "splashimage=" line
 
* check that GRUB boot loader is installed on all system disks:
* use BIOS version 6103 or newer
** dd if=/dev/sda bs=1 count=1024 2>&1 | strings | grep GRUB
* for CPU temperature, install coretemp
** dd if=/dev/sdb bs=1 count=1024 2>&1 | strings | grep GRUB
* for sensors, install driver for NCT6776F chip same as E35M1-M above.
* if GRUB is not installed, (i.e. on the 2nd disk of machines with mirrored system disk), (but check that /dev/sdb is the right disk):
<pre>
# grub
grub&gt; device (hd0) /dev/sdb
grub&gt; root (hd0,0)
grub&gt; setup (hd0)
</pre>
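
The dd/strings/grep check above can be wrapped in a small loop over the system disks. This is a minimal sketch: the function name has_grub is hypothetical, it uses head and grep -a instead of dd and strings, and the disk names in the usage comment are taken from the text and must be adjusted to the actual machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the string "GRUB" appears in the first
# 1024 bytes of the given disk (or file), same idea as the dd|strings|grep
# check in the wiki text.
has_grub() {
    head -c 1024 "$1" 2>/dev/null | grep -a -q GRUB
}

# Example use as root (disk names are assumptions, check them first):
#   for d in /dev/sda /dev/sdb; do
#       if has_grub "$d"; then echo "$d: GRUB found"; else echo "$d: GRUB missing"; fi
#   done
```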


== Configure GRUB boot loader (CENTOS7) ==
=== SUPERMICRO X9SCL ===


* edit /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX
* yum install kmod-w83627ehf.x86_64 coretemp
* grub2-mkconfig -o /boot/grub2/grub.cfg
* xemacs -nw /etc/rc.local, add:
* grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
<pre>
* grub2-editenv list # show contents of boot environment file
modprobe coretemp
* /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file
modprobe w83627ehf
</pre>
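
For the CentOS7 GRUB step (removing "rhgb" and "quiet" from GRUB_CMDLINE_LINUX in /etc/default/grub), the manual edit could be scripted roughly as follows. This is a sketch that assumes GNU sed and a single-line, double-quoted GRUB_CMDLINE_LINUX setting; the function name strip_boot_splash is hypothetical. Run grub2-mkconfig afterwards as described in the text.

```shell
#!/bin/sh
# Hypothetical helper: remove the "rhgb" and "quiet" words from the
# GRUB_CMDLINE_LINUX line of the given file, then tidy leftover spaces.
# Assumes GNU sed (-i, -E, \b) and a one-line, double-quoted value.
strip_boot_splash() {
    sed -i -E \
        -e '/^GRUB_CMDLINE_LINUX=/ s/\b(rhgb|quiet)\b//g' \
        -e '/^GRUB_CMDLINE_LINUX=/ s/  +/ /g' \
        -e '/^GRUB_CMDLINE_LINUX=/ s/=" /="/' \
        -e '/^GRUB_CMDLINE_LINUX=/ s/ "$/"/' \
        "$1"
}

# Example use as root:
#   cp -v /etc/default/grub /etc/default/grub.bak
#   strip_boot_splash /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```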


== Install memtest86+ (CentOS7) ==
=== ASUS Z87-WS ===


<pre>
<pre>
yum -y install memtest86+
cd ~root
/bin/cp -vf /usr/share/memtest86+/20_memtest86+ /etc/grub.d/
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
/bin/chmod a+x /etc/grub.d/20_memtest86+
echo modprobe hwmon-vid >> /etc/rc.local
grub2-mkconfig -o /boot/grub2/grub.cfg
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors
</pre>
</pre>
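
Several recipes on this page append lines to /etc/rc.local with "echo ... >> /etc/rc.local". Running such a recipe twice leaves duplicate lines in the file. A minimal sketch of an idempotent append, assuming a hypothetical helper named append_once:

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line.
append_once() {
    file=$1
    line=$2
    grep -q -x -F -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example use as root, mirroring the driver recipes on this page:
#   append_once /etc/rc.local "modprobe hwmon-vid"
#   append_once /etc/rc.local "insmod /root/nct6775.ko"
```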


== Configure GRUB boot loader (CentOS7) ==
=== ASUS Z97-WS ===


DO NOT DO ANY OF THIS.
the nct6775 driver does not work because of conflict with ACPI.


* (maybe) grub2-install /dev/sda
=== ASUS Z170-DELUXE ===
* check that GRUB boot loader is installed on all system disks:
** dd if=/dev/sda bs=1 count=1024 2>&1 | strings | grep GRUB
** dd if=/dev/sdb bs=1 count=1024 2>&1 | strings | grep GRUB
* if GRUB is not installed, (--- unfinished)


== Disable ELREPO ==
* use bios 3801
* set XMP mode (DDR4-2400)
* Advanced->On board devices: set sata mode to "M2", set PCIe slot 3 to "x4"
* boot: disable f1, disable logo, disable numlock


<pre>
=== ASUS AM1M-A ===
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo_triumf.repo
 
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo
* use BIOS 602 or later
* SL6.5 installer cannot use USB2 ports and the network. Use USB3 ports (blue colour) to boot USB installer (memtest, rescue, etc)
* SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seem okay)
* install ATI/AMD video drivers from ELREPO (see below)
* sensors chip is ITE IT8623E, for SL6, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures):
<pre>
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
</pre>
</pre>
* for el7 use it87.ko driver:
<pre>
cd ~root
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
</pre>
* sensors output:
<pre>
[root@midemma02 ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
temp1:        +22.0°C  (crit = +120.0°C, hyst = +90.0°C)


== Special hardware settings ==
fam15h_power-pci-00c4
Adapter: PCI adapter
power1:          N/A  (crit = 25.00 W)


=== ASUS Crosshair mobo ===
k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +22.2°C  (high = +70.0°C)
                      (crit = +70.0°C, hyst = +69.0°C)


* use BIOS version 1207 or newer
it8603-isa-0290
* (before CentOS7) sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
Adapter: ISA adapter
* CentOS7: installs correct drivers automatically
in0:          +0.96 V  (min =  +2.50 V, max =  +2.95 V)  ALARM
in1:          +2.23 V  (min =  +0.94 V, max =  +1.22 V)  ALARM
in2:          +2.03 V  (min =  +0.74 V, max =  +0.77 V)  ALARM
in3:          +2.00 V  (min =  +1.26 V, max =  +0.13 V) ALARM
in4:         +2.23 V  (min =  +2.95 V, max =  +2.15 V)  ALARM
3VSB:        +3.36 V  (min =  +6.00 V, max =  +2.50 V)  ALARM
Vbat:        +3.22 V 
+3.3V:        +3.36 V 
fan1:        611 RPM  (min =  200 RPM)
fan2:        707 RPM  (min =  600 RPM)  ALARM
temp1:        +38.0°C  (low  = +122.0°C, high = +122.0°C)  sensor = thermistor
temp2:        +22.0°C  (low  = +119.0°C, high = -35.0°C)  ALARM  sensor = thermistor
temp3:      -128.0°C  (low  = +16.0°C, high = +93.0°C)  sensor = thermistor
intrusion0: ALARM


=== ASUS Crosshair-II mobo ===
[root@midemma02 ~]#
</pre>
* AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together)


* use BIOS version 2607 or newer
=== Intel SE7230NH1 ===
* for the onboard IDE to work, add "all-generic-ide" to kernel boot options in grub.conf
 
* sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
* front panel header connector pinout is like this:
<pre>
PWR LED | 1  2|
        | 3  4|
PWR LED | 5  6|
HDD LED | 7  8|
HDD LED | 9 10|
PWR SW  |11 12| NIC1 LED
PWR SW  |13 14| NIC1 LED
RST SW  |15 16|
RST SW  |17 18|
        |19 20|
NMI SW  |21 22| NIC2 LED
NMI SW  |23 24| NIC2 LED
...    |... |
        |33 34|
</pre>


=== ASUS P7P55D EVO mobo ===
=== ASUS H110M-A/M.2 ===


* use BIOS version 2004 or newer
* use BIOS 2003 or later
* SL6 - install special driver for on board PCIe GigE network port and disable on board PCI GigE network port:
* dmidecode | grep -i nct reports: Nuvoton NCT5539D
** yum --enablerepo elrepo install kmod-r8168 kmod-r8169
* sensors chip is "NCT6793D or compatible chip", for el7, use this driver:
** # do not do this: sed 's/^blacklist/#blacklist/' -i /etc/modprobe.d/blacklist-r8169.conf
<pre>
** reboot
cd ~root
** verify that correct drivers are loaded: ethtool -i eth0; ethtool -i eth1
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
** note: there will be no eth1 - r8169 driver is disabled.
echo modprobe hwmon-vid >> /etc/rc.local
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors
</pre>


=== ASUS P6X58-E-WS mobo ===
* sensors output:
<pre>
[root@daq03 ~]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)


* BIOS settings
nct6793-isa-0290
** F1 or DEL to enter BIOS setup, F8 boot menu
Adapter: ISA adapter
** go to POWER->HW mon, confirm CPU temperature is around 30C (this shows that the heatsink is installed correctly; with a badly installed heatsink the temperature quickly goes up to 50-70C).
in0:                      +0.34 V  (min =  +0.00 V, max =  +1.74 V)
** Main menu: Storage config - SATA change IDE->AHCI
in1:                      +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
** System information: confirm BIOS version 301, CPU type, memory size
in2:                      +3.39 V  (min =  +0.00 V, max =  +0.00 V) ALARM
** AI Tweak: set DRAM frequency - AUTO->DDR3-1333
in3:                      +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
** Advanced->Onboard devices: LAN BOOT: enabled
in4:                       +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
** Power->HW monitor: CPU Q-FAN: enabled
in5:                       +0.15 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
** Boot->Settings: Quick boot: enabled; Full screen logo: disabled; Wait for F1: disabled
in6:                       +0.97 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
** Save and exit
in7:                       +3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in8:                       +3.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
=== ASUS E35M1-M PRO mobo ===
in9:                       +1.00 V  (min = +0.00 V, max = +0.00 V)  ALARM
 
in10:                      +0.14 V  (min = +0.00 V, max = +0.00 V)  ALARM
* http://www.asus.com/Motherboards/E35M1M_PRO/#specifications
in11:                      +0.12 V  (min = +0.00 V, max = +0.00 V)  ALARM
* use BIOS version 1002 or newer
in12:                     +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* for CPU temperature: install kmod-k10temp from ELREPO (kmod-k10temp-0.0-4.el6.elrepo.x86_64.rpm)
in13:                     +0.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* for Sensors: yum --enablerepo elrepo install kmod-w83627ehf; modprobe w83627ehf; sensors
in14:                      +0.13 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* for Graphics: yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
fan1:                    1041 RPM  (min =    0 RPM)
* to enable booting from USB3, edit /etc/dracut.conf, change line "add_drivers" to read: add_drivers+="xhci-hcd"
fan2:                    1020 RPM  (min =    0 RPM)
* to use multiple monitors, run "aticonfig --initial --heads=2 --adapter=1 --xinerama=on"; to change screen layout, edit /etc/X11/xorg.conf. Only dual monitors (DVI+HDMI) seem to work. Triple monitors do not seem to work.
fan5:                       0 RPM  (min =    0 RPM)
fan6:                       0 RPM
SYSTIN:                  +119.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +26.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +27.5°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +111.0°C    sensor = thermistor
PECI Agent 0:              +28.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +25.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


Sensors instructions below are obsolete (use driver from ELREPO)
coretemp-isa-0000
* for Sensors, install driver for NCT6776F chip from https://github.com/groeck/w83627ehf/archives/master (in the Makefile, change the line "KERNEL_BUILD=" to read: "KERNEL_BUILD:=/usr/src/kernels/$(TARGET)"):
Adapter: ISA adapter
<pre>
Physical id 0:  +31.0°C  (high = +80.0°C, crit = +100.0°C)
cd ~root
Core 0:         +31.0°C  (high = +80.0°C, crit = +100.0°C)
wget http://ladd00.triumf.ca/~olchansk/linux/groeck-w83627ehf-dd3e543/w83627ehf.ko
Core 1:         +28.0°C  (high = +80.0°C, crit = +100.0°C)
echo "modprobe hwmon; modprobe hwmon-vid; modprobe k10temp; rmmod w83627ehf; insmod /root/w83627ehf.ko" >> /etc/rc.local
 
[root@daq03 ~]#
</pre>
</pre>


=== ASUS E45M1-M PRO mobo ===
=== Supermicro X11SSH-F ===


* https://www.asus.com/Motherboards/E45M1M_PRO/#specifications
* blacklist the mei and mei_me drivers per http://www.supermicro.com/support/faqs/faq.cfm?faq=14537
* use BIOS 1202 or newer
<pre>
* follow the E35M1-M PRO instructions above
[root@alpha00 ~]# more /etc/modprobe.d/blacklist.conf
blacklist mei
blacklist mei_me
[root@alpha00 ~]#
</pre>
* mobo requires M.2 PCIe SSD (M.2 SATA SSD would not work. SATA SATA SSD ok)
* boot from M.2 PCIe SSD requires UEFI boot (from an MSDOS partition on the SSD)


=== ASUS P9X79 WS ===
=== ASUS TUF Z390M-PRO GAMING (WI-FI) ===


* http://www.asus.com/Motherboard/P9X79_WS/
* BIOS 2417 is okay, upgrade to this if older
* use BIOS version 3101, 3401, 4701 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS.
* do not set XMP memory mode
* (not needed for CentOS7) for CPU temperature, install coretemp
* in the BIOS, enable the boot compatibility support module mode: BIOS (press DEL) -> Advanced mode -> BOOT -> CSM Module -> Enable CSM "yes".
* (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above.
* for SL6, install e1000e driver from ELREPO:
* BIOS Settings:
<pre>
** enter "Advanced mode"
yum install --enablerepo=elrepo kmod-e1000e
** Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default
</pre>
** Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings
* sensors chip appears to be "Nuvoton NCT6798D"; it is not clear what driver to use
** Boot -> Full screen logo -> Set to "disabled"
* dmidecode | grep -i nct reports: Nuvoton NCT6798D
** Wait for F1 -> Set to "disabled"
* kmod-nct6775-0.0-5.el7_7.elrepo.x86_64.rpm from ELrepo finds the chip but bombs because of conflict with ACPI


=== ASUS P8B-M ===
=== ASUS PRIME X399-A ===


* use BIOS version 6103 or newer
* BIOS 1002
* for CPU temperature, install coretemp
* for reading temperatures and fan rotations, install driver: https://github.com/electrified/asus-wmi-sensors/issues/29
* for sensors, install driver for NCT6776F chip same as E35M1-M above.


=== SUPERMICRO X9SCL ===
== Configure X11 graphics ==


* yum install kmod-w83627ehf.x86_64 coretemp
=== Special settings for DAQ ===
* xemacs -nw /etc/rc.local, add:
<pre>
modprobe coretemp
modprobe w83627ehf
</pre>


=== ASUS Z87-WS ===
* add the following at the end of /etc/X11/xorg.conf. This enables Ctrl-Alt-KP-/ and Ctrl-Alt-KP-* to unlock the keyboard after an Altera Quartus crash:
<pre>Section "ServerFlags"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Option "AllowDeactivateGrabs" "true"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Option "AllowClosedownGrabs" "true"
EndSection</pre>


<pre>
=== Install NVIDIA drivers ===
cd ~root
 
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775/nct6775.ko
* yum --enablerepo=elrepo install nvidia-detect
</pre>
* run: nvidia-detect
* as instructed by nvidia-detect, install correct driver:
** yum --enablerepo=elrepo install kmod-nvidia
** yum --enablerepo=elrepo install kmod-nvidia-304xx
** yum --enablerepo=elrepo install kmod-nvidia-173xx
* (before SL6.x:  if it fails due to conflict with module-init-tools, run "yum --disablerepo \* --enablerepo elrepo update module-init-tools")
* yum erase xorg-x11-glamor ### see http://elrepo.org/tiki/kmod-nvidia (search for glamor)
* mv /etc/X11/xorg.conf /etc/X11/xorg.conf-xxx
* nvidia-xconfig
* (SL6) reboot
* (SL5) /dev/MAKEDEV nvidia
* (SL5) restart the X11 server (Ctrl-Alt-Backspace or "killall Xorg gdm-binary")
* observe that X11 server restarts using the NVIDIA driver (big NVIDIA logo on startup)
* if needed, login as root and run "nvidia-settings" to setup dual-screen configuration, etc
 
=== Install legacy NVIDIA drivers ===
 
For old NVIDIA cards:
* GeForce FX 5500


Place the modprobe and insmod lines in /etc/rc.local to load the drivers at boot time
<pre>
modprobe hwmon-vid
insmod /root/nct6775.ko
</pre>
=== ASUS AM1M-A ===
* use BIOS 602 or later
* SL6.5 installer cannot use USB2 ports and the network. Use USB3 ports (blue colour) to boot USB installer (memtest, rescue, etc)
* SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seem okay)
* install ATI/AMD video drivers from ELREPO (see below)
* sensors chip is ITE IT8623E, for SL6, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures):
<pre>
<pre>
cd ~root
wget http://us.download.nvidia.com/XFree86/Linux-x86/173.14.31/NVIDIA-Linux-x86-173.14.31-pkg1.run
wget http://ladd00.triumf.ca/~olchansk/linux/it87.ko
sh ./NVIDIA-Linux-x86-173.14.31-pkg1.run
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
</pre>
</pre>
* for el7 use it87.ko driver:
 
* GeForce 6200 - NVIDIA Corporation NV44A [GeForce 6200]
<pre>
<pre>
cd ~root
yum install nvidia-x11-drv-304xx-304.121 --enablerepo=elrepo
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/it87.ko
nvidia-xconfig
echo modprobe hwmon_vid >> /etc/rc.local
rmmod nvidia
echo insmod /root/it87.ko >> /etc/rc.local
killall gdm-binary
. /etc/rc.local
login as root
nvidia-settings to setup multiple displays
</pre>
</pre>
* sensors output:
<pre>
[root@midemma02 ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
temp1:        +22.0°C  (crit = +120.0°C, hyst = +90.0°C)


fam15h_power-pci-00c4
=== Install ATI/AMD drivers ===
Adapter: PCI adapter
power1:          N/A  (crit = 25.00 W)


k10temp-pci-00c3
* yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
Adapter: PCI adapter
* check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx"
temp1:        +22.2°C  (high = +70.0°C)
* run "aticonfig --initial" to create xorg.conf if existing one is not good
                      (crit = +70.0°C, hyst = +69.0°C)
* run "amdcccle" as root to configure dual-screens, etc
  Note: 'amdcccle' is a GUI, so you must run this command from within a running X session
* killall Xorg


it8603-isa-0290
=== Install ATI/AMD drivers (CentOS7) ===
Adapter: ISA adapter
in0:          +0.96 V  (min = +2.50 V, max = +2.95 V)  ALARM
in1:          +2.23 V  (min = +0.94 V, max =  +1.22 V)  ALARM
in2:          +2.03 V  (min =  +0.74 V, max =  +0.77 V) ALARM
in3:          +2.00 V  (min = +1.26 V, max = +0.13 V)  ALARM
in4:          +2.23 V  (min =  +2.95 V, max =  +2.15 V)  ALARM
3VSB:        +3.36 V  (min =  +6.00 V, max =  +2.50 V)  ALARM
Vbat:        +3.22 V 
+3.3V:        +3.36 V 
fan1:        611 RPM  (min =  200 RPM)
fan2:        707 RPM  (min =  600 RPM)  ALARM
temp1:        +38.0°C  (low  = +122.0°C, high = +122.0°C)  sensor = thermistor
temp2:        +22.0°C  (low  = +119.0°C, high = -35.0°C)  ALARM  sensor = thermistor
temp3:      -128.0°C  (low  = +16.0°C, high = +93.0°C)  sensor = thermistor
intrusion0:  ALARM


[root@midemma02 ~]#
* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
</pre>
* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm
* AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together)
* yum install acpid
 
* rpm -vh --install kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
=== Intel SE7230NH1 ===
* amdconfig -f --initial
* grub2-mkconfig -o /boot/grub2/grub.cfg
* reboot
* login as root
* amdcccle
 
NOTE: if both drivers (radeon and fglrx) are loaded, boot will hang. The radeon driver is supposed to be blacklisted through the grub rdblacklist=radeon entry, which is installed by running grub2-mkconfig.


* front panel header connector pinout is like this:
=== Install Intel drivers for HD4600/Z87 ===
<pre>
PWR LED | 1  2|
        | 3  4|
PWR LED | 5  6|
HDD LED | 7  8|
HDD LED | 9 10|
PWR SW  |11 12| NIC1 LED
PWR SW  |13 14| NIC1 LED
RST SW  |15 16|
RST SW  |17 18|
        |19 20|
NMI SW  |21 22| NIC2 LED
NMI SW  |23 24| NIC2 LED
...    |...  |
        |33 34|
</pre>


=== ASUS H110M-A/M.2 ===
SL6.5 has the required drivers for the socket 1150 machines with Intel HD4600 graphics and Z87 chipset.


* use BIOS 2003 or later
ASUS Z87 WS motherboard has these video connections with corresponding Intel video port assignments, as reported by "xrandr":
* sensors chip is ??? for el7, use this driver:
* DisplayPort - DP1/HDMI1
<pre>
* MiniDisplayPort - DP2/HDMI2
cd ~root
* HDMI - HDMI3
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/nct6775.ko
 
echo modprobe hwmon_vid >> /etc/rc.local
Due to hardware limitations, 3 HDMI monitors using 2 passive DP-HDMI adapters (and 1 straight HDMI) cannot be used.
echo modprobe coretemp >> /etc/rc.local
 
echo insmod /root/nct6775.ko >> /etc/rc.local
To use 3 monitors do this:
. /etc/rc.local
* 1st monitor: DisplayPort - DP-to-HDMI-passive-adapter - HDMI monitor (not tried: DP-to-DP-cable - DisplayPort monitor).
</pre>
* 2nd monitor: MiniDisplayPort - MiniDP-to-DP-cable - DisplayPort monitor
* 3rd monitor: HDMI - HDMI-cable - HDMI monitor


* sensors output:
With the monitors I have (Dell 1920x1200 VGA-HDMI-DP), the software thinks that there are 4 monitors: somehow both DP2 and HDMI2 see 1 monitor each, but the hardware cannot drive 4 monitors, so everything goes blank. To fix, disable HDMI2 (xrandr -display :0 --output HDMI2 --off) and enable DP2 (xrandr -display :0 --output DP2 --auto).
<pre>
[root@daq03 ~]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +119.0°C)
temp2:       +29.8°C  (crit = +119.0°C)


nct6793-isa-0290
How to make this configuration permanent and how to assign monitor locations (left-right, etc), you figure it out.
Adapter: ISA adapter
in0:                      +0.34 V  (min =  +0.00 V, max =  +1.74 V)
in1:                      +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                      +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                      +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                      +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      +0.15 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      +0.97 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                      +3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                      +3.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      +1.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      +0.12 V  (min =  +0.00 V, max =  +0.00 V) ALARM
in12:                      +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                      +0.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                      +0.13 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                    1041 RPM  (min =    0 RPM)
fan2:                    1020 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM
SYSTIN:                  +119.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +26.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +27.5°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +111.0°C    sensor = thermistor
PECI Agent 0:              +28.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +25.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


coretemp-isa-0000
=== Manual selection of monitor, video mode and resolution ===
Adapter: ISA adapter
Physical id 0:  +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +28.0°C  (high = +80.0°C, crit = +100.0°C)


[root@daq03 ~]#
Automatic selection of monitor and video mode usually works. When it does not, configure it manually:
</pre>


=== Supermicro X11SSH-F ===
* physically go to the computer
* login as root
* run "nvidia-settings" on machines using the NVIDIA driver
* run "aticonfig" on machines with the ATI/AMD driver (use "aticonfig --initial" for initial setup, and good luck with anything more complicated)
* run "system-config-display".
** In the "hardware" tab, select monitor type: "generic LCD 1280x1024" or "generic LCD 1600x1200".
** In the "settings" tab, select "1280x1024" or "1600x1200" and "Thousands of colors".
** Press "ok", the display settings application should close.
* Logout, the new login window should use the new settings.


* blacklist the mei and mei_me drivers per http://www.supermicro.com/support/faqs/faq.cfm?faq=14537
=== Disable screen saver ===
<pre>
[root@alpha00 ~]# more /etc/modprobe.d/blacklist.conf
blacklist mei
blacklist mei_me
[root@alpha00 ~]#
</pre>
* mobo requires M.2 PCIe SSD (M.2 SATA SSD would not work. SATA SATA SSD ok)
* boot from M.2 PCIe SSD requires UEFI boot (from an MSDOS partition on the SSD)


== Configure X11 graphics ==
If the machine is booted without any monitor connected, current video cards do not enable any video outputs. If a monitor is connected later, there is no video image and no easy way to get one.


=== Special settings for DAQ ===
This can be solved by configuring X11 to always enable some video output. Because the monitor type is not known when X11 starts, one has to select some standard video mode (e.g. VESA 1280x1024) on some video output (VGA, DVI or HDMI).


* add the following at the end of /etc/X11/xorg.conf. This enables Ctrl-Alt-KP-/ and Ctrl-Alt-KP-* to unlock the keyboard after an Altera Quartus crash:
Only NVIDIA cards with the NVIDIA driver (from EPEL) are supported by these instructions.
<pre>Section "ServerFlags"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Option "AllowDeactivateGrabs" "true"
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Option "AllowClosedownGrabs" "true"
EndSection</pre>


=== Install NVIDIA drivers ===
* create default xorg.conf: nvidia-xconfig
 
* edit /etc/X11/xorg.conf
* yum --enablerepo=elrepo install nvidia-detect
* add monitor section for the fake monitor:
* run: nvidia-detect
<pre>
* as instructed by nvidia-detect, install correct driver:
Section "Monitor"
** yum --enablerepo=elrepo install kmod-nvidia
    Identifier    "Monitor0"
** yum --enablerepo=elrepo install kmod-nvidia-304xx
    VendorName    "Unknown"
** yum --enablerepo=elrepo install kmod-nvidia-173xx
    ModelName      "Unknown"
* (before SL6.x:  if it fails due to conflict with module-init-tools, run "yum --disablerepo \* --enablerepo elrepo update module-init-tools")
    HorizSync      31.0 - 83.0
* yum erase xorg-x11-glamor ### see http://elrepo.org/tiki/kmod-nvidia (search for glamor)
    VertRefresh    59.0 - 61.0
* mv /etc/X11/xorg.conf /etc/X11/xorg.conf-xxx
    Option        "DPMS" "off"
* nvidia-xconfig
    ModeLine "1280x1024"  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync
* (SL6) reboot
EndSection
* (SL5) /dev/MAKEDEV nvidia
* (SL5) restart the X11 server (Ctrl-Alt-Backspace or "killall Xorg gdm-binary")
* observe that X11 server restarts using the NVIDIA driver (big NVIDIA logo on startup)
* if needed, login as root and run "nvidia-settings" to setup dual-screen configuration, etc
 
=== Install legacy NVIDIA drivers ===
 
For old NVIDIA cards:
* GeForce FX 5500
 
<pre>
wget http://us.download.nvidia.com/XFree86/Linux-x86/173.14.31/NVIDIA-Linux-x86-173.14.31-pkg1.run
sh ./NVIDIA-Linux-x86-173.14.31-pkg1.run
</pre>
</pre>
 
* add output selection in the "Device" section:
* GeForce 6200 - NVIDIA Corporation NV44A [GeForce 6200]
<pre>
<pre>
yum install nvidia-x11-drv-304xx-304.121 --enablerepo=elrepo
Section "Device"
nvidia-xconfig
    Identifier    "Device0"
rmmod nvidia
    Driver        "nvidia"
killall gdm-binary
    VendorName    "NVIDIA Corporation"
login as root
    BoardName      "GeForce 210"
nvidia-settings to setup multiple displays
    #Option "ConnectedMonitor" "DFP"
    #Option "ConnectedMonitor" "CRT"
    Option "ConnectedMonitor" "CRT-1"
    Option "UseEDID" "no"
EndSection
</pre>
* add fake video mode to the "Screen" section:
<pre>
Section "Screen"
    Identifier    "Screen0"
    Device        "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection    "Display"
        Depth      24
        Modes      "1280x1024"
    EndSubSection
EndSection
</pre>
* disable screen saver and DPMS power off in the "ServerLayout" or "ServerFlags" section:
<pre>
Section "ServerLayout"
    Identifier    "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option        "Xinerama" "0"
    Option        "BlankTime" "0"
    Option        "StandbyTime" "0"
    Option        "SuspendTime" "0"
    Option        "OffTime" "0"
EndSection
 
Section "ServerFlags"
    Option        "BlankTime" "0"
    Option        "StandbyTime" "0"
    Option        "SuspendTime" "0"
    Option        "OffTime" "0"
EndSection
</pre>
</pre>


=== Install ATI/AMD drivers ===
== Finish installation ==
 
* yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
* check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx"
* run "aticonfig --initial" to create xorg.conf if existing one is not good
* run "amdcccle" as root to configure dual-screens, etc
  Note: 'amdcccle' is a GUI, so you must run this command from within a running X session
* killall Xorg
 
=== Install ATI/AMD drivers (CentOS7) ===


* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
* log out and reboot the computer for all the changes to take effect
* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm
* yum install acpid
* rpm -vh --install kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
* amdconfig -f --initial
* grub2-mkconfig -o /boot/grub2/grub.cfg
* reboot
* login as root
* amdcccle


NOTE: if both the radeon and fglrx drivers are loaded, boot will hang. The radeon driver is supposed to be blacklisted through the grub rdblacklist=radeon entry, which is installed by running grub2-mkconfig.
== Configure HTTPS server (CentOS7) ==


=== Install Intel drivers for HD4600/Z87 ===
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd.


SL6.5 has the required drivers for the socket 1150 machines with Intel HD4600 graphics and Z87 chipset.
First, configure apache httpd:


ASUS Z87 WS motherboard has these video connections with corresponding Intel video port assignments, as reported by "xrandr":
* execute these commands:
* DisplayPort - DP1/HDMI1
<pre>
* MiniDisplayPort - DP2/HDMI2
yum install -y mod_ssl certwatch crypto-utils
* HDMI - HDMI3
cd /etc/httpd/conf.d/
mv ssl.conf ssl.conf-not-used ### remove the stock ssl.conf which refers to the localhost certificate that will expire in 1 year
touch ssl.conf ### create a blank file to prevent automatic updates from installing a stock ssl.conf file
# this is done later: rm /etc/pki/tls/certs/localhost.crt
</pre>
* create new file ssl-daq12.conf # use actual hostname instead of daq12
<pre>
Listen 443 https
#SSLPassPhraseDialog exec:/usr/libexec/httpd-ssl-pass-dialog
SSLSessionCache        shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  300
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin


Due to hardware limitations, 3 HDMI monitors using 2 passive DP-HDMI adapters (and 1 straight HDMI) cannot be used.
<VirtualHost *:443>
 
ServerName daq12.triumf.ca
To use 3 monitors do this:
DocumentRoot /var/www/html
* 1st monitor: DisplayPort - DP-to-HDMI-passive-adapter - HDMI monitor (not tried: DP-to-DP-cable - DisplayPort monitor).
ErrorLog /var/log/httpd/daq12.log
* 2nd monitor: MiniDisplayPort - MiniDP-to-DP-cable - DisplayPort monitor
SSLEngine on
* 3rd monitor: HDMI - HDMI-cable - HDMI monitor
# note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
 
# new SSL settings: K.O. Jan 2020, SSLlabs rating "A+"
With the monitors I have (Dell 1920x1200 VGA-HDMI-DP), the software thinks that there are 4 monitors: somehow both DP2 and HDMI2 see 1 monitor each, but the hardware cannot drive 4 monitors, so everything goes blank. To fix, disable HDMI2 (xrandr -display :0 --output HDMI2 --off) and enable DP2 (xrandr -display :0 --output DP2 --auto).
SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
 
SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4:!RSA
How to make this configuration permanent and how to assign monitor locations (left-right, etc), you figure it out.
SSLHonorCipherOrder on
 
# previous SSL settings:
=== Manual selection of monitor, video mode and resolution ===
#SSLProtocol all -SSLv2 -SSLv3
 
#SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
Automatic selection of monitor and video mode usually works. When it does not, configure it manually:
SSLCertificateFile /etc/pki/tls/certs/localhost.crt
SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
#SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
#ProxyPass /elog/ http://localhost:8082/ retry=1
#ProxyPass /      http://localhost:8080/ retry=1
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
<Location />
SSLRequireSSL
AuthType Basic
AuthName "DAQ password protected site"
Require valid-user
# create password file: touch /etc/httpd/htpasswd
# to add new user or change password: htpasswd /etc/httpd/htpasswd username
AuthUserFile /etc/httpd/htpasswd
</Location>
</VirtualHost>
</pre>
* stop httpd from listening on port 80: edit /etc/httpd/conf/httpd.conf, comment-out the line "Listen 80"
* enable and start httpd:
<pre>
systemctl enable httpd
systemctl restart httpd
systemctl status httpd
</pre>
* try to access https://daq12.triumf.ca
** you should see a complaint about self-signed certificate
** you should see a request for password (do not login yet)
** if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again:
<pre>
firewall-cmd --add-port=443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
</pre>
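
To check from a script whether the firewall change took effect, the list printed by "firewall-cmd --list-ports" can be searched for the port. A minimal sketch; the helper name port_is_open is invented for this example:

```shell
# port_is_open PORT/PROTO "LIST"
# LIST is a whitespace-separated list of port specs, as printed by
# "firewall-cmd --list-ports". Succeeds if PORT/PROTO is in the list.
port_is_open() {
    local want="$1" list="$2" p
    for p in $list; do
        [ "$p" = "$want" ] && return 0
    done
    return 1
}

# Typical use (as root):
#   port_is_open 443/tcp "$(firewall-cmd --list-ports)" || echo "443/tcp is not open"
```

This only covers ports opened with --add-port; a port opened through a firewalld service definition does not show up in --list-ports.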


* physically go to the computer
Second, configure certbot:
* login as root
* run "nvidia-settings" on machines using the NVIDIA driver
* run "aticonfig" on machines with the ATI/AMD driver (use "aticonfig --initial" for initial setup, and good luck with anything more complicated)
* run "system-config-display".
** In the "hardware" tab, select monitor type: "generic LCD 1280x1024" or "generic LCD 1600x1200".
** In the "settings" tab, select "1280x1024" or "1600x1200" and "Thousands of colors".
** Press "ok", the display settings application should close.
* Logout, the new login window should use the new settings.


=== Disable screen saver ===
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate,
renewal can continue to use the https port 443)


If the machine is booted without any monitor connected, current video cards do not enable any video outputs. If a monitor is connected later, there is no video image and there is no easy way to get a video image.
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)


This can be solved by configuring X11 to always enable some video output. Because the monitor type is not known when X11 starts, one has to select some standard video mode (i.e. VESA 1280x1024) on some video output (VGA, DVI or HDMI).
* check that port 80 is not used by anything:
 
* netstat -an | grep LISTEN | grep ^tcp | grep 80
Only NVIDIA cards with the NVIDIA driver (from EPEL) are supported by these instructions.
* lsof -P | grep -i tcp | grep LISTEN | grep 80
* if lsof reports that httpd is listening on port 80, follow the httpd instructions above (remove "Listen 80" from httpd.conf)


* create default xorg.conf: nvidia-xconfig
* install certbot and open tcp port 80 in the firewall:
* edit /etc/X11/xorg.conf
* add monitor section for the fake monitor:
<pre>
<pre>
Section "Monitor"
yum install -y certbot python2-certbot-apache # (from EPEL)
    Identifier    "Monitor0"
firewall-cmd --add-port=80/tcp --permanent
    VendorName    "Unknown"
firewall-cmd --reload
    ModelName      "Unknown"
firewall-cmd --list-all
    HorizSync      31.0 - 83.0
    VertRefresh    59.0 - 61.0
    Option        "DPMS" "off"
    ModeLine "1280x1024"  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync
EndSection
</pre>
</pre>
* add output selection in the "Device" section:
* certbot certonly --standalone --installer apache # then answer questions:
* "activate HTTPS for daq12.triumf.ca" - say ok
* "enter email address" - enter your own email address
* "please read terms..." - read the terms and say "agree"
* it will take a few moments...
* "please choose..." - say "easy" (http access is disabled (a) by firewall, (b) by local configuration)
* "congratulations..." - say ok.
* certbot install --apache --cert-name daq12.triumf.ca # then answer questions:
* "choose redirect..." - say "1" (no redirect)
* look inside ssl-daq12.conf to see that SSLCertificateFile & co point to certbot certificates in /etc/letsencrypt/live/daq12.triumf.ca/
* remove self-signed localhost certificate, it will expire in 1 year and cause warnings and complaints: rm /etc/pki/tls/certs/localhost.crt
* enable automatic renewal
<pre>
<pre>
Section "Device"
systemctl enable certbot-renew.timer
    Identifier    "Device0"
systemctl start certbot-renew.timer
    Driver        "nvidia"
systemctl list-timers --all
    VendorName    "NVIDIA Corporation"
    BoardName      "GeForce 210"
    #Option "ConnectedMonitor" "DFP"
    #Option "ConnectedMonitor" "CRT"
    Option "ConnectedMonitor" "CRT-1"
    Option "UseEDID" "no"
EndSection
</pre>
</pre>
* add fake video mode to the "Screen" section:
 
* to check that renewal works correctly and to update the certbot config file in /etc/letsencrypt/renewal, run this:
<pre>
<pre>
Section "Screen"
certbot renew --standalone --installer apache --force-renewal
    Identifier    "Screen0"
</pre>
    Device        "Device0"
 
    Monitor        "Monitor0"
NOTE: this certificate will expire in 3 months, automatic renewal should work starting with certbot-0.12.0-4.el7.noarch.
    DefaultDepth    24
Certificate expiration should be automatically detected by "certwatch" and email
    SubSection    "Display"
will be sent to local root user, to be forwarded to an actual person by ~root/.forward.
        Depth      24
 
        Modes      "1280x1024"
Third, activate password protection:
    EndSubSection
 
EndSection
* as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
<pre>
touch /etc/httpd/htpasswd
htpasswd /etc/httpd/htpasswd midas
</pre>
</pre>
* disable screen saver and DPMS power off in the "ServerLayout" or "ServerFlags" section:
 
Final test:
* access https://daq12.triumf.ca - https status should be "green"
* login with password should work
* the apache httpd test page should load
* check site security using the SSLlabs https tester. (I get grade "A-"): https://www.ssllabs.com/ssltest/
 
From here:
* Configure selinux to allow proxying
<pre>
<pre>
Section "ServerLayout"
setsebool -P httpd_can_network_connect 1
    Identifier    "Layout0"
  systemctl restart httpd
    Screen      0 "Screen0" 0 0
</pre>
    InputDevice    "Keyboard0" "CoreKeyboard"
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
    InputDevice    "Mouse0" "CorePointer"
* enable proxy for ELOG - ditto
    Option        "Xinerama" "0"
    Option        "BlankTime" "0"
    Option        "StandbyTime" "0"
    Option        "SuspendTime" "0"
    Option        "OffTime" "0"
EndSection


Section "ServerFlags"
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
    Option        "BlankTime" "0"
try this: pip install requests==2.6.0
    Option        "StandbyTime" "0"
    Option        "SuspendTime" "0"
    Option        "OffTime" "0"
EndSection
</pre>


== Finish installation ==
== Configure large RAID6 arrays ==


* log out and reboot the computer for all the changes to take effect
* connect the disks
 
* check the disks health
== Configure HTTPS server (CentOS7) ==
** run smart-status.perl
* partition the disks
** yum install gdisk
** gdisk /dev/sdX
** delete all partitions: o
** create new partition: n, enter, enter, enter, fd00 (default sizes, partition type fd00)
** write and exit: w
* check presence of all partitions:
** /bin/ls -l /dev/sd*1
* prepare to use an external bitmap file
** touch /md6bitmap
** edit /etc/fstab, change entry for root filesystem from: "defaults 1 1" to "defaults 0 0"
** edit /boot/grub/grub.conf, change entry "kernel ... ro ..." to "kernel ... rw ..."
* create raid array:
** mdadm --create /dev/md6 --level=6 --bitmap=/md6bitmap --raid-devices=10 /dev/sd[b-k]1
** mdadm -Ds >> /etc/mdadm.conf
** cleanup /etc/mdadm.conf
** echo "echo 16384 > /sys/block/md6/md/stripe_cache_size" >> /etc/rc.local
** echo "echo 1    > /sys/block/md6/md/sync_speed_min" >> /etc/rc.local
** source /etc/rc.local
* observe raid array rebuild:
** watch -d -n1 "cat /proc/mdstat"
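
For planning the array size: RAID6 uses two disks' worth of space for parity, so the usable capacity is (number of disks - 2) x disk size. A small sketch of the arithmetic; the helper name and the 8 TB disk size are illustrative and are not taken from the setup above:

```shell
# raid6_usable_tb NDISKS SIZE_TB
# Prints the usable capacity (in TB) of a RAID6 array built from
# NDISKS equal disks of SIZE_TB each: two disks' worth goes to parity.
raid6_usable_tb() {
    local n="$1" size="$2"
    if [ "$n" -lt 4 ]; then
        echo "RAID6 needs at least 4 devices" >&2
        return 1
    fi
    echo $(( (n - 2) * size ))
}

# Example: the 10-device array created above, assuming 8 TB disks:
#   raid6_usable_tb 10 8    # prints 64
```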
 
== Configure ZFS ==


This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd.
=== Install ZFS ===


First, configure apache httpd:
(from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)


* yum install mod_ssl certwatch crypto-utils
Follow the instructions for "kABI-tracking kmod" - dkms modules seem to always mess up the system when upgrading to next release of zfs.
* cd /etc/httpd/conf.d/
* mv ssl.conf ssl.conf-not-used ### remove the stock ssl.conf which refers to the localhost certificate that will expire in 1 year
* touch ssl.conf ### create a blank file to prevent automatic updates from installing a stock ssl.conf file
* rm /etc/pki/tls/certs/localhost.crt
* create new file ssl-daq12.conf # use actual hostname instead of daq12
<pre>
Listen 443 https
#SSLPassPhraseDialog exec:/usr/libexec/httpd-ssl-pass-dialog
SSLSessionCache        shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  300
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin


<VirtualHost *:443>
<pre>
ServerName daq12.triumf.ca
#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
DocumentRoot /var/www/html
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
ErrorLog /var/log/httpd/daq12.log
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
SSLEngine on
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
# note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
SSLProtocol all -SSLv2 -SSLv3
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm
SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm
SSLCertificateFile /etc/pki/tls/certs/localhost.crt
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_9.noarch.rpm
SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
yum-config-manager --disable zfs
#SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
yum-config-manager --disable zfs-kmod
#ProxyPass /elog/ http://localhost:8082/ retry=1
yum --enablerepo=zfs-kmod clean all
#ProxyPass /      http://localhost:8080/ retry=1
yum --enablerepo=zfs-kmod install zfs
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
#sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config
<Location />
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
SSLRequireSSL
#systemctl enable zfs-import-cache
AuthType Basic
#systemctl enable zfs-mount
AuthName "DAQ password protected site"
#systemctl enable zfs-share
Require valid-user
#systemctl enable zfs-zed
# create password file: touch /etc/httpd/htpasswd
#shutdown -r now # required to load the zfs kernel modules and to disable selinux
# to add new user or change password: htpasswd /etc/httpd/htpasswd username
modprobe zfs # should work
AuthUserFile /etc/httpd/htpasswd
zpool status # should report no pools available
</Location>
</VirtualHost>
</pre>
</pre>
* stop httpd from listening on port 80: edit /etc/httpd/conf/httpd.conf, comment-out the line "Listen 80"
 
* systemctl enable httpd
#Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (files are gone, but "df" does not go down, zfs-0.6.5.7-1.el7.centos.x86_64), see #https://github.com/zfsonlinux/zfs/issues/4845
* systemctl restart httpd
 
* systemctl status httpd
* http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-quickstart.html
* try to access https://daq12.triumf.ca
* http://www.freebsd.org/cgi/man.cgi?query=zpool&sektion=8
** you should see a complaint about self-signed certificate
 
** you should see a request for password (do not login yet)
If ZFS kernel module does not load automatically at boot time, add this to load it manually:
** if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again:
<pre>
<pre>
firewall-cmd --add-port=443/tcp --permanent
ls -l /etc/sysconfig/modules/
firewall-cmd --reload
cat > /etc/sysconfig/modules/zfs.modules <<EOF
firewall-cmd --list-all
if [ ! -e /sys/module/zfs ] ; then
  modprobe zfs;
fi
EOF
chmod +x /etc/sysconfig/modules/zfs.modules
</pre>
</pre>


Second, configure certbot:
=== Update ZFS (CentOS-7.9) ===


(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate,
* update CentOS-7.x to latest point release
renewal can continue to use the https port 443)
* reboot to latest kernel
* check that currently installed ZFS is 0.8.x (not 0.7 or older)
* then update ZFS:
<pre>
[root@daq16 ~]# zfs version
zfs-0.8.4-1
zfs-kmod-0.8.4-1
[root@daq16 ~]# yum --enablerepo=zfs-kmod update
...
[root@daq16 ~]# zfs version ### observe mismatched version numbers: 0.8.5 userspace vs 0.8.4 kernel module
zfs-0.8.5-1
zfs-kmod-0.8.4-1
</pre>
* reboot to activate the updated kernel module
* zfs version again
<pre>
[root@daq16 ~]# zpool version
zfs-0.8.5-1
zfs-kmod-0.8.5-1
</pre>
* zpool status in case some ZFS volume needs to be updated
<pre>
[root@daq16 ~]# zpool status
  pool: z12tb
state: ONLINE
...
</pre>
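
The version mismatch shown above (userspace updated, kernel module still the old one) can also be detected in a script. A minimal sketch that parses the two-line "zfs version" output; the helper name zfs_versions_match is made up for this example:

```shell
# zfs_versions_match
# Reads "zfs version" output on stdin (lines "zfs-X" and "zfs-kmod-X")
# and succeeds only if the userspace and kernel module versions agree.
zfs_versions_match() {
    local user="" kmod="" line
    while read -r line; do
        case "$line" in
            zfs-kmod-*) kmod="${line#zfs-kmod-}" ;;
            zfs-*)      user="${line#zfs-}" ;;
        esac
    done
    [ -n "$user" ] && [ "$user" = "$kmod" ]
}

# Typical use:
#   zfs version | zfs_versions_match || echo "reboot needed to load the new zfs kernel module"
```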
 
=== Update ZFS 0.7 to 0.8 ===
 
How to identify zfs 0.7: "zfs version" does not work, also "rpm -q zfs"


(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
zfs 0.7 is obsolete.


* check that port 80 is not used by anything:
To update to zfs 0.8 or newer, remove 0.7, then install
* netstat -an | grep LISTEN | grep ^tcp | grep 80
new version per instructions above.
* lsof -P | grep -i tcp | grep LISTEN | grep 80
* if lsof reports that httpd is listening on port 80, follow the httpd instructions above (remove "Listen 80" from httpd.conf)


* yum install certbot python2-certbot-apache # (from EPEL)
* remove zfs 0.7
* firewall-cmd --add-port=80/tcp --permanent
* firewall-cmd --reload
* firewall-cmd --list-all
* certbot certonly --standalone --installer apache # then answer questions:
* "activate HTTPS for daq12.triumf.ca" - say ok
* "enter email address" - enter your own email address
* "please read terms..." - read the terms and say "agree"
* it will take a few moments...
* "please choose..." - say "easy" (http access is disabled (a) by firewall, (b) by local configuration)
* "congratulations..." - say ok.
* certbot install --apache --cert-name daq12.triumf.ca # then answer questions:
* "choose redirect..." - say "1" (no redirect)
* look inside ssl-daq12.conf to see that SSLCertificateFile & co point to certbot certificates in /etc/letsencrypt/live/daq12.triumf.ca/
* enable automatic renewal
<pre>
<pre>
systemctl enable certbot-renew.timer
yum versionlock delete zfs ### versionlock not needed anymore
systemctl start certbot-renew.timer
yum versionlock delete kernel ### versionlock not needed anymore
systemctl list-timers --all
rm /etc/yum.repos.d/zfs.repo* ### delete old repo files
</pre>
yum erase zfs spl
</pre>
* reboot
* install new zfs per instructions above
* zpool import -as
* zpool status ### check if any pool needs to be upgraded
* zpool upgrade zssd ### upgrade zfs pool features
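
Since "zfs version" does not exist in 0.7, the package name reported by "rpm -q zfs" is the easier thing to test in a script. A minimal sketch, assuming the usual name-version-release output of rpm -q (for example zfs-0.7.13-1.el7.x86_64); the helper name is_zfs_07 is invented here:

```shell
# is_zfs_07 "RPM_QUERY_OUTPUT"
# Succeeds if the string (output of "rpm -q zfs") names a 0.7.x package.
is_zfs_07() {
    case "$1" in
        zfs-0.7.*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Typical use:
#   is_zfs_07 "$(rpm -q zfs)" && echo "zfs 0.7 installed: remove it before installing 0.8"
```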


* to check that renewal works correctly and to update the certbot config file in /etc/letsencrypt/renewal, run this:
=== Lock kernel and zfs packages ===
* certbot renew --standalone --installer apache --force-renewal


NOTE: this certificate will expire in 3 months, automatic renewal should work starting with certbot-0.12.0-4.el7.noarch.
!!! THIS IS NOT NEEDED ANYMORE !!!
Certificate expiration should be automatically detected by "certwatch" and email
will be sent to local root user, to be forwarded to an actual person by ~root/.forward.
 
Third, activate password protection:


* as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
<pre>
<pre>
touch /etc/httpd/htpasswd
yum versionlock kernel
htpasswd /etc/httpd/htpasswd midas
yum versionlock zfs
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
</pre>
</pre>


Final test:
=== Follow generic ZFS instructions ===
* access https://daq12.triumf.ca - https status should be "green"
* login with password should work
* the apache httpd test page should load
* check site security using the SSLlabs https tester. (I get grade "A-"): https://www.ssllabs.com/ssltest/


From here:
Here: [[ZFS]]
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
* enable proxy for ELOG - ditto
* setsebool -P httpd_can_network_connect 1
* systemctl restart httpd


NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
== performance notes ==
try this: pip install requests==2.6.0


== Configure large RAID6 arrays ==
Go here: [[disk_benchmarks]]


* connect the disks
== Configure UEFI boot ==
* check the disks health
** run smart-status.perl
* partition the disks
** yum install gdisk
** gdisk /dev/sdX
** delete all partitions: o
** create new partition: n, enter, enter, enter, fd00 (default sizes, partition type fd00)
** write and exit: w
* check presence of all partitions:
** /bin/ls -l /dev/sd*1
* prepare to use an external bitmap file
** touch /md6bitmap
** edit /etc/fstab, change entry for root filesystem from: "defaults 1 1" to "defaults 0 0"
** edit /boot/grub/grub.conf, change entry "kernel ... ro ..." to "kernel ... rw ..."
* create raid array:
** mdadm --create /dev/md6 --level=6 --bitmap=/md6bitmap --raid-devices=10 /dev/sd[b-k]1
** mdadm -Ds >> /etc/mdadm.conf
** cleanup /etc/mdadm.conf
** echo "echo 16384 > /sys/block/md6/md/stripe_cache_size" >> /etc/rc.local
** echo "echo 1    > /sys/block/md6/md/sync_speed_min" >> /etc/rc.local
** source /etc/rc.local
* observe raid array rebuild:
** watch -d -n1 "cat /proc/mdstat"


== Configure ZFS ==
Some motherboards can boot from NVMe (PCIe) SSDs only via UEFI boot. Do this:


=== Install ZFS ===
* partition the NVME SSD using gdisk (must be GPT partition table, must have MSDOS EFI partition size 512MiB)
 
<pre>
(from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)
[root@alpha00 ~]# gdisk -l /dev/nvme0n1
 
GPT fdisk (gdisk) version 0.8.6 ...
Follow the instructions for "kABI-tracking kmod" - dkms modules seem to always mess up the system when upgrading to next release of zfs.
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1A82CC87-2757-44ED-980F-C78E3681D9D3
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500118158
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)


Number  Start (sector)    End (sector)  Size      Code  Name
  1            2048        1050623  512.0 MiB  EF00  EFI System
  2        1050624      500118158  238.0 GiB  8300  Linux filesystem
[root@alpha00 ~]#
</pre>
* create filesystems
<pre>
<pre>
#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
mkfs.msdos /dev/nvme0n1p1
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
mkfs.xfs /dev/nvme0n1p2
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
</pre>
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
* prepare EFI partition
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
<pre>
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm
mkdir /mnt/efi
yum-config-manager --disable zfs
mount /dev/nvme0n1p1 /mnt/efi
yum-config-manager --enable zfs-kmod
mkdir -p /mnt/efi/efi/boot
yum install zfs
cd /mnt/efi/efi/boot
#sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config
# with Ubuntu LTS 20.04
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
cp /boot/vmlinuz vmlinuz # copy the desired linux kernel
#shutdown -r now # required to load the zfs kernel modules and to disable selinux
#cp /boot/initramfs initramfs.img # copy the matching initramfs file
modprobe zfs # should work
cp /boot/initrd.img initrd.img # copy the matching initrd file
zpool status # should report no pools available
#from /home/olchansk/sysadm/syslinux/syslinux-6.03 copy
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/efi/syslinux.efi .
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
</pre>
</pre>
 
* create syslinux config file: syslinux.cfg
#Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (files are gone, but "df" does not go down, zfs-0.6.5.7-1.el7.centos.x86_64), see #https://github.com/zfsonlinux/zfs/issues/4845
 
* http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-quickstart.html
* http://www.freebsd.org/cgi/man.cgi?query=zpool&sektion=8
 
=== Lock kernel and zfs packages ===
 
<pre>
<pre>
yum versionlock kernel
default linux
yum versionlock zfs
label linux
kernel vmlinuz
append ro root=/dev/nvme0n1p2 nomodeset initrd=initrd.img
</pre>
* prepare system partition
<pre>
mkdir /mnt/tmp
mount /dev/nvme0n1p2 /mnt/tmp
rsync -avx / /mnt/tmp
cd /mnt/tmp
#edit etc/fstab
#edit etc/sysconfig/selinux # set selinux to permissive mode because rsync did not copy the selinux labels
</pre>
* unmount and reboot
* restore selinux labels after first boot
<pre>
#login as root
cd /
restorecon -R / # can also add "-v" to see progress, but runs much slower
#edit /etc/sysconfig/selinux # enable selinux
#shutdown -r now # reboot with selinux enabled
</pre>
</pre>


= Configure UEFI secure boot =


The above instructions do not quite work if "secure boot" is enabled.


=== Misc commands ===
These modifications are needed:


* zpool status
* ls -l /boot/efi/EFI/bootko/
* zpool get all
<pre>
* zpool iostat 1
total 140116
* zpool iostat -v 1
-rwxr-xr-x 1 root root      108 Feb 24 15:47 BOOTX64.CSV
* zpool history
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 bootx64.efi
* zpool scrub data14
-rwxr-xr-x 1 root root  217495 Feb 24 16:16 config-4.15.0-74-generic
* zpool events
-rwxr-xr-x 1 root root      105 Feb 24 15:47 grub.cfg
* arcstat.py 1
-rwxr-xr-x 1 root root  199952 Feb 24 16:16 grubx64.efi
* cat /proc/spl/kstat/zfs/arcstats
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initramfs.img
* echo 30000000000 > /sys/module/zfs/parameters/zfs_arc_meta_limit
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initrd.img-4.15.0-74-generic
* echo 32000000000 > /sys/module/zfs/parameters/zfs_arc_max
-rwxr-xr-x 1 root root  139968 Feb 24 16:16 ldlinux.e64
-rwxr-xr-x 1 root root  1269496 Feb 24 15:47 mmx64.efi
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 shimx64.efi
-rwxr-xr-x 1 root root      171 Feb 24 16:16 syslinux.cfg
-rwxr-xr-x 1 root root      102 Feb 24 16:16 syslinux.cfg~
-rwxr-xr-x 1 root root  199952 Feb 24 16:16 syslinux.efi
-rwxr-xr-x 1 root root  4068355 Feb 24 16:16 System.map-4.15.0-74-generic
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz-4.15.0-74-generic
</pre>
** shimx64.efi is a copy from /boot/efi/EFI/ubuntu
** bootx64.efi is a copy of shimx64.efi (maybe not needed?)
** grubx64.efi is a copy of syslinux.efi
* efibootmgr -c -d /dev/nvme0n1 -p 2 -w -L bootko -l '\EFI\bootko\shimx64.efi'
* efibootmgr -v
<pre>
root@daqubuntu:~# efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0001,0002
Boot0000* bootko        HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\BOOTKO\SHIMX64.EFI)
Boot0001* Hard Drive    BBS(HD,,0x0)..GO..NO........y.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7....................A.......................................<..Gd-.;.A..MQ..L.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7........BO
Boot0002* ubuntu        HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO
root@daqubuntu:~#
</pre>
* NOTE: if, after running "efibootmgr -c", the UUID is zero, then it probably did not take and the entry will vanish after reboot. In my case the mistake was to use "-p 1" instead of "-p 2".
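
The check described in the NOTE can be scripted against the "efibootmgr -v" output. A minimal sketch; the helper name boot_entry_ok is invented, and it assumes that a "zero UUID" shows up in the entry as the all-zero GUID 00000000-0000-0000-0000-000000000000 and that the label contains no regular expression metacharacters:

```shell
# boot_entry_ok LABEL
# Reads "efibootmgr -v" output on stdin. Succeeds if an entry with the
# given label exists and its partition GUID is not all zeros.
boot_entry_ok() {
    local label="$1" entry
    entry=$(grep -E "^Boot[0-9A-Fa-f]{4}\*? +${label}[[:space:]]") || return 1
    case "$entry" in
        *00000000-0000-0000-0000-000000000000*) return 1 ;;
    esac
    return 0
}

# Typical use (as root):
#   efibootmgr -v | boot_entry_ok bootko || echo "bootko entry is missing or broken"
```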


* zfs get all
Boot sequence is this:
* zfs set dedup=verify zssd/nfsroot
* shimx64.efi - Microsoft-signed boot loader is accepted by secure boot, loads and runs
* shimx64.efi loads and runs grubx64.efi, this file name is hardwired into the signed shim, cannot be changed
* grubx64.efi is syslinux.efi (could be anything)
* syslinux.efi runs, loads syslinux.cfg, loads the linux kernel, loads the initrd, runs the linux kernel with specified flags (ro root=...).


* zpool create data14 raidz2 /dev/sd[b-h]1
= UEFI syslinux kernel update =
* zfs create z8tb/data
* zfs destroy z8tb/data
* zpool add z10tb cache /dev/disk/by-id/ata-ADATA_SP550_2F4320041688
* parted /dev/sdx mklabel GPT
* blkid
* zpool iostat -v -q 1
* watch -d -n 1 "cat /proc/spl/kstat/zfs/arcstats | grep l2"
* zfs set primarycache=metadata tank/datab
* zfs set secondarycache=metadata tank/datab


* zfs userspace -p -H zssd/home1
To update the linux kernel booted by UEFI syslinux, use this script:
* zfs groupspace ...
* ~root/git/scripts/etc/update_efi.perl


=== Create raid1 (mirror) volume ===
= Update SL6 ssh =


<pre>
<pre>
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
WARNING!!!
dracut -vf
WARNING!!! original instructions used openssh 9.1, vulnerable to CVE-2024-6387
zpool create zssd mirror /dev/sdaX /dev/sdbX
WARNING!!!
zpool set cachefile=none zssd
WARNING!!! these updated instructions use OpenSSH_9.8. K.O. 3jul2024
zpool set failmode=continue zssd
WARNING!!!
zpool status
WARNING!!! see https://www.openssh.com/releasenotes.html
zpool events
WARNING!!!
zpool get all
df /zssd
ls -l /zssd
</pre>
</pre>


=== Use whole disk for zfs mirror (RAID1) ===
Stock SL6 ssh is now very old and, by default, cannot connect to current Ubuntu and MacOS sshd. Conversely, their ssh cannot connect to SL6 sshd.
 
== Workaround is to manually enable SL6-compatible settings ==


<pre>
<pre>
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
root@daq00:~# ssh -oHostKeyAlgorithms=+ssh-rsa -oPubKeyAcceptedAlgorithms=+ssh-rsa ladd00
[root@daq13 ~]# parted /dev/sdb
</pre>
(parted) mklabel GPT
(parted) q                                                               
[root@daq13 ~]# parted /dev/sdc
(parted) mklabel GPT                                                     
(parted) q                                                               
[root@daq13 ~]# blkid                                                   
/dev/sda1: UUID="ab920e4b-40ae-4551-aab8-f3e893d38830" TYPE="xfs"
/dev/sdb: PTTYPE="gpt"
/dev/sdc: PTTYPE="gpt"
[root@daq13 ~]# zpool create z10tb mirror /dev/sdb /dev/sdc
[root@daq13 ~]# zpool status
  pool: z10tb
state: ONLINE
  scan: none requested
config:


        NAME        STATE    READ WRITE CKSUM
Solution is to install newer ssh on affected SL6 machines:
        z10tb      ONLINE      0    0    0
          mirror-0  ONLINE      0    0    0
            sdb    ONLINE      0    0    0
            sdc    ONLINE      0    0    0


errors: No known data errors
== Install OpenSSH_9.8p1 per CVE-2024-6387 ==
[root@daq13 ~]#
[root@daq13 ~]# zfs create z10tb/emma
[root@daq13 ~]# df -kl
Filesystem      1K-blocks    Used  Available Use% Mounted on
pool          9426697856        0 9426697856  0% /pool
pool/daqstore  9426697856        0 9426697856  0% /pool/daqstore
[root@daq13 ~]#
</pre>
 
=== Enable ZFS at boot ===


<pre>
<pre>
systemctl enable zfs-import-cache
ssh root@sl6-machine
systemctl enable zfs-import-scan
cd /opt
systemctl enable zfs-mount
git clone https://daq00.triumf.ca/~olchansk/git/openssh.git
systemctl enable zfs-import.target
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
systemctl enable zfs.target
/bin/cp -pv /etc/ssh/*key* /opt/openssh/etc/ ### copy old ssh host keys
/opt/openssh/bin/ssh-keygen -A ### generate any missing ssh host keys
# test sshd /opt/openssh/sbin/sshd -p 2222 -d
/bin/mv /usr/sbin/sshd /usr/sbin/sshd-SL6
/bin/ln -s /opt/openssh/sbin/sshd /usr/sbin/
/bin/mv /usr/bin/ssh /usr/bin/ssh-SL6
/bin/ln -s /opt/openssh/bin/ssh /usr/bin/
service sshd restart
</pre>
</pre>


=== Replace failed disk ===
== Update openssh from 9.1 to OpenSSH_9.8p1 per CVE-2024-6387 ==
 
Check for old version:


* pull failed disk out
* zpool status # identify failed disk zfs label (it should be labeled FAULTED or OFFLINE
* safe to reboot here
* install new disk
* partition new disk, i.e. "gdisk /dev/sdh", use "o" to create new partition table, use "n" to create new partition, accept all default answers, use "w" to save and exit
* safe to reboot here
* run tests on new disk (smart, diskscrub), if unhappy go back to "install new disk"
* safe to reboot here
* identify serial number of new disk, i.e. "smartctl -a /dev/sdh | grep -i serial" yields "Serial Number:    WD-WCAVY0893313"
* identify linux id of new disk by "ls -l /dev/disk/by-id | grep -i WD-WCAVY0893313" yields "ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1"
* zpool replace data11 zfs-label-of-failed-disk ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1
* zpool status should look like this:
<pre>
<pre>
[root@daq11 ~]# zpool status
[root@muon openssh]# telnet localhost 22
  pool: data11
SSH-2.0-OpenSSH_9.1
state: DEGRADED
</pre>
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Apr 29 11:51:03 2016
    24.7G scanned out of 795G at 32.3M/s, 6h46m to go
    3.00G resilvered, 3.11% done
config:


        NAME                                                  STATE    READ WRITE CKSUM
Update:
        data11                                                DEGRADED    0    0    0
          raidz2-0                                            DEGRADED    0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1    ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1    ONLINE      0    0    0
            replacing-2                                        DEGRADED    0    0    0
              17494865033746374811                            FAULTED      0    0    0  was /dev/sdi1
              ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1  ONLINE      0    0    0  (resilvering)
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1    ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1    ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1    ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1    ONLINE      0    0    0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1    ONLINE      0    0    0


errors: No known data errors
</pre>
* wait for raid rebuild ("resilvering") to complete
* zpool status should look like this:
<pre>
<pre>
[root@daq11 ~]# zpool status
cd /opt/openssh
  pool: data11
git pull
state: ONLINE
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
  scan: resilvered 96.2G in 1h44m with 0 errors on Fri Apr 29 13:35:40 2016
service sshd restart
config:
 
        NAME                                                STATE    READ WRITE CKSUM
        data11                                              ONLINE      0    0    0
          raidz2-0                                          ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1  ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1  ONLINE      0    0    0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1  ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1  ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1  ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1  ONLINE      0    0    0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1  ONLINE      0    0    0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1  ONLINE      0    0    0
 
errors: No known data errors
</pre>
</pre>


=== Rename zfs pool ===
Check for new version:


<pre>
<pre>
zpool export oldname
telnet localhost 22
zpool import oldname z6tb
SSH-2.0-OpenSSH_9.8
</pre>
</pre>


=== Misc ===
== Build openssh ==


<pre>
<pre>
ZFS tunable parameters for hopefully speeding up resilvering:
ssh sl6-machine
 
cd git
https://www.reddit.com/r/zfs/comments/4192js/resilvering_raidz_why_so_incredibly_slow/
git clone git://anongit.mindrot.org/openssh.git
echo 0 > /sys/module/zfs/parameters/zfs_resilver_delay
cd openssh
echo 512 > /sys/module/zfs/parameters/zfs_top_maxinflight
autoreconf
echo 5000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms
xemacs -nw ./configure ### fix syntax error: line 28124 empty "if/then/else" block bombs out, fill it with "AAA=aaa"
./configure --prefix=/opt/openssh
make -j
</pre>
</pre>


Enable periodic scrub:
Install openssh:


<pre>
<pre>
cd ~/git/scripts
ssh root@sl6-machine
git pull
cd .../git/openssh
cd zfs
make install ### copies stuff to /opt/openssh
make install
/opt/openssh/sbin/sshd -p 2222 -d ### test sshd
/opt/openssh/bin/ssh -v sl6-machine ### test ssh
</pre>
</pre>


Working with ZFS snapshots:
Update for CVE-2024-6387:


* zfs list -t snapshot
* cd .../git/openssh
* cd ~/git; git clone https://github.com/zfsonlinux/zfs-auto-snapshot.git; cd zfs-auto-snapshot; make install
* git pull
 
* git checkout V_9_8_P1
If ZFS becomes 100% full, "rm" will stop working, but space can still be freed by using "echo > bigfile", afterwards "rm" works again.
* ./configure --prefix=/opt/openssh --with-ssl-dir=/opt/openssl
 
* make ### no go, wants openssl-1.1.1
== performance notes ==
* cd .../git/
 
* git clone https://github.com/openssl/openssl.git
Go here: [[disk_benchmarks]]
* cd openssl
 
* git checkout OpenSSL_1_1_1w
== Configure UEFI boot ==
* configure with prefix --prefix=/opt/openssl
 
* make, install to /opt/openssl
Some mobo can boot from NVME (PCIe) SSDs only via UEFI boot. Do this:
* cd .../openssh
 
* configure, build, does not find openssl libraries in /opt (they forgot to set RPATH for user-sepcified location of openssl)
* partition the NVME SSD using gdisk (must be GPT partition table, must have MSDOS EFI partition size 512MiB)
* LD_LIBRARY_PATH=/opt/openssl/lib, try again, now builds and installs
<pre>
* but sshd does not run, does not find libcrypto.so.1.1
[root@alpha00 ~]# gdisk -l /dev/nvme0n1
* needs ln -s .../lib/libcrypto.so.1.1 /usr/lib64, now sshd find it, everything works.
GPT fdisk (gdisk) version 0.8.6 ...
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1A82CC87-2757-44ED-980F-C78E3681D9D3
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500118158
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)
 
Number  Start (sector)    End (sector)  Size      Code  Name
  1           2048        1050623  512.0 MiB  EF00  EFI System
  2        1050624      500118158  238.0 GiB  8300  Linux filesystem
[root@alpha00 ~]#
</pre>
* create filesystems
<pre>
mkfs.msdos /dev/nvme0n1p1
mkfs.xfs /dev/nvme0n1p2
</pre>
* prepare EFI partition
<pre>
mkdir /mnt/efi
mount /dev/nvme0n1p1 /mnt/efi
mkdir -p /mnt/efi/efi/boot
cp /boot/vmlinuz... vmlinuz # copy the desired linux kernel
cp /boot/initramfs... initramfs.img # copy the matching initramfs file
#from /home/olchansk/sysadm/syslinux/syslinux-6.03 copy
cp .../efi64/efi/syslinux.efi .
cp .../efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
</pre>
* create syslinux config file: syslinux.cfg
<pre>
default linux
label linux
kernel vmlinuz
append ro root=/dev/nvme0n1p2 nomodeset initrd=initramfs.img
</pre>
* prepare system partition
<pre>
mkdir /mnt/tmp
mount /dev/nvme0n1p2 /mnt/tmp
rsync -avx / /mnt/tmp
cd /mnt/tmp
#edit etc/fstab
#edit etc/syslinux/selinux # set selinux to permissive mode because rsync did not copy the selinux labels
</pre>
* unmount and reboot
* restore selinux labels after first boot
<pre>
#login as root
cd /
restorecon -R / # can also add "-v" to see progress, but runs much slower
#edit /etc/sysconfig/selinux # enable selinux
#shutdown -r now # reboot with selinux enabled
</pre>
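
The "Check for old version" / "Check for new version" steps in the SL6 ssh instructions above compare the banner printed by sshd on port 22. A minimal sketch of that comparison as a shell function follows; the function names are illustrative assumptions (not part of the scripts repository), and the check looks only at the version number in the banner, nothing else.

```shell
#!/bin/sh
# Extract the OpenSSH version (e.g. "9.8") from an SSH identification
# string such as "SSH-2.0-OpenSSH_9.8". Prints nothing if it does not match.
banner_version() {
    printf '%s\n' "$1" |
        sed -n 's/^SSH-2\.0-OpenSSH_\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed if the banner reports OpenSSH 9.8 or newer (hypothetical helper).
banner_at_least_9_8() {
    ver="$(banner_version "$1")"
    [ -n "$ver" ] || return 1
    major="${ver%%.*}"
    minor="${ver#*.}"
    [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 8 ]; }
}
```

Usage: paste the banner line seen in the telnet session, e.g. `banner_at_least_9_8 "SSH-2.0-OpenSSH_9.1" || echo "needs update"`.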

Latest revision as of 16:00, 9 July 2024

Notes

  • these instructions are periodically updated to include items needed for older/newer versions of Linux. They are marked like this: (SL4.2+) means Scientific Linux 4.2 and newer; (SL4 is equivalent to FC3). (FC5 only) means Fedora Core 5; etc.
  • obsolete items are marked by the "#" sign at the beginning of the line and sometimes have a comment about the reason for removal.
  • typically, we do not "upgrade" machines using the Red Hat "upgrade" function. Instead, we save critical files from the old installation and do a "fresh install" from scratch
  • starting with RHEL7, the recommended OS is CentOS7 (instead of SL7).

Disk configurations

The year is 2019 and SSDs are used exclusively, except for bulk data storage, where one uses 6-8-10-12 TB HDDs.

For reliability, home directories and data disks must use redundant storage - mdadm raid1 or ZFS raid1/raid6.

For non-critical machines, a single SSD seems to be reliable enough to use as a boot and OS disk. But since any storage device can fail at any time without warning, home directories and data disks should use redundant storage.

Note: for data disks bigger than 4-6TB, mdadm raid1/raid6 is no longer recommended because raid rebuild, verification and repair time has become unreasonably long. Instead, use ZFS raid1/raid6 which implements online verification, repair and disk replacement without requiring machine shutdown or OS down time.

  • single SSD - 120GB min - single partition for "/", no swap partition (create a swap file if swap is needed) - for non-critical machine with no local data storage (OS only)
  • dual SSD - 2x240GB min - all partitions mirrored (RAID1), 30GB "/", rest for /home1 - for daq station with local user home directories and no bulk data storage
  • single SSD + 2x6-8-10-12TB HDD - SSD partition: all "/", HDD partition as ZFS raid1 (mirrored) - for daq station with small local bulk data storage
  • single SSD + 6-8x6-8-10-12TB HDD - for small storage server machines - for daq station with local home directories and large bulk data storage.

For VME processors:

  • network boot - VME-CPU#Network_boot - only option for V7648/V7750, do not use for V7805 (no netboot from GigE), optional for V7865/XVB-602
  • USB boot - 8GB USB for V7805, 16GB USB for V7865/XVB-602

Preparation

  • save /etc, /var, /root, /opt, (if needed: /usr/local, /tftpboot) by rsync to some data disk (/ladd/data0/root)
  • check that "/" partition (it will be overwritten) is different from /home1 and /data partitions
  • note the MAC addresses of all network interfaces, add them to ladd00 dhcpd.conf to enable PXE boot into the SL "network installer"
  • shutdown

Running installer (CentOS7)

CentOS7 can be installed from vanilla CentOS7 installation media or from a custom USB key built per these instructions: https://daqshare.triumf.ca/~olchansk/linux/CentOS7/

The custom installer makes it easy to use a custom kickstart file (ks.cfg).

Instructions for using the usb-installer:

  • disconnect machine from network
  • plug the usb-installer into a usb3 port (blue colour)
  • reboot machine, select booting from usb (press F8 on ASUS motherboards)
  • usb-installer boot menu offers to install CentOS7, go there
  • CentOS7 should boot (many messages scroll on screen)
  • into graphical mode
  • into installer main menu
  • all installer options should be "happy" except for the "installation destination"
  • go to the "installation destination" menu
    • unselect all disks except for the SSD where the OS will be installed
    • (MOST IMPORTANT: unselect the USB installer disk!)
    • select "I will configure..."
    • say "done"
    • the "manual partitioning" menu will open
      • use the "-" button to delete all existing partitions
      • select "standard partition"
      • click on the "+" button
      • in the "Add new partition" dialog, set mount point "/", capacity blank, click "add mount point"
      • check capacity (should be full size of SSD), check filesystem type (should be XFS)
      • say "done", there will be a warning about absent swap partition, say "done" again.
      • in the big useless dialog, say "accept changes"
      • should be back to the "installation summary" screen, "installation destination" should be happy now
  • after everything is happy, say "begin installation"
  • as the installation proceeds, set the password for the root user
  • after installation is complete, reboot the machine
  • unplug the usb-installer, CentOS7 should boot from SSD into the login screen
  • click on "not listed?", login as root
  • setup network connection:
    • open a terminal
    • start "nm-connection-editor"
    • click on "+" to create a new connection profile
    • select "wired ethernet"
    • select "add profile..."
    • in "Identity", set "name" to "static"
    • in "Identity", check that "Connect automatically" and "Make available..." are enabled
    • in "IPv4", set "Addresses" to "manual" instead of "dhcp"
    • enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19, search triumf.ca
    • say "Add", then close/quit the network settings
  • connect network cable
  • network should be up, ping ladd00 should work
  • run: yum update -y
  • check new kernel is installed: ls -l /boot
  • logout and restart (good luck finding these buttons in the gui!)
  • confirm correct linux kernel is selected during boot (-229.20, not the original installer kernel)
  • login as root, confirm network is up, proceed with the rest of these instructions
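
The network step above gives the netmask in dotted form (255.255.224.0), while command-line tools such as `ip` expect a prefix length. A minimal sketch of the conversion, counting the set bits of each octet; the function name `mask_to_prefix` is an illustrative assumption.

```shell
#!/bin/sh
# Convert a dotted-quad netmask to a CIDR prefix length by counting set bits.
mask_to_prefix() {
    prefix=0
    old_ifs="$IFS"; IFS=.
    set -- $1            # split the mask into its four octets
    IFS="$old_ifs"
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            prefix=$((prefix + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$prefix"
}
```

For the netmask used here, `mask_to_prefix 255.255.224.0` prints 19 (8 + 8 + 3 + 0 bits).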

Configure SSH

(+CentOS7)

  • Login from the console
  • restore the SSH keys from backup (/etc/ssh/*key*)
  • service sshd restart
  • ssh into the new machine as root
  • ssh root@localhost, ctrl-C
  • ### this is done later from Konstantin's git repository - scp root@ladd00:/root/authorized_keys ~root/.ssh/
  • (not needed for SL5.5 kickstart) check that /etc/ssh/ssh_config contains "ForwardX11 yes" and "ForwardX11Trusted yes":
echo "  ForwardX11 yes" >> /etc/ssh/ssh_config
echo "  ForwardX11Trusted yes" >> /etc/ssh/ssh_config
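
Running the two echo commands above a second time appends duplicate lines. A minimal sketch of an idempotent variant follows; `append_once` is a hypothetical helper name, not something from the scripts repository.

```shell
#!/bin/sh
# Append LINE to FILE only if an identical line is not already present.
# grep -x matches whole lines, -F treats LINE as a fixed string.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage: `append_once "  ForwardX11 yes" /etc/ssh/ssh_config` and the same for "  ForwardX11Trusted yes"; repeated runs leave the file unchanged.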

Set hostname

Set hostname: (use full name, i.e. daq11.triumf.ca)

emacs -nw /etc/hostname

Configure email

  • TRIUMF: use relayhost = smtp.triumf.ca
  • CERN: use relayhost = cernmx.cern.ch
  • edit /etc/postfix/main.cf, set "relayhost = smtp.triumf.ca"
  • echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca bsmith@triumf.ca" >> ~root/.forward

Make log files readable

chmod a+r /var/log/messages
chmod a+r /var/log/yum.log

Activate /etc/rc.local

Activate rc.local:

chmod a+x /etc/rc.local
chmod a+x /etc/rc.d/rc.local  # TL edit
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local

Disable "persistent network names" (DO NOT DO THIS)

/bin/touch /etc/udev/rules.d/75-persistent-net-generator.rules
/bin/rm /etc/udev/rules.d/70-persistent-net.rules
#shutdown -r now

Configure NIS client (CentOS7)

yum -y install ypbind authconfig
echo "NISTIMEOUT=5" >> /etc/sysconfig/network
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --nisserver ladd00.triumf.ca --update
ypwhich
ypcat -k passwd
systemctl restart autofs
  • On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
  • Use "system-config-users" to add local user accounts
  • enable selinux ssh key login to nfs mounted home directories:
setsebool -P use_nfs_home_dirs 1

Configure NIS client (CentOS8)

  • all the same as for CentOS7
  • ensure correct boot order for ypbind (in CentOS 8.1 ypbind is started before network is ready, service file uses "Wants" instead of "After")
mkdir /etc/systemd/system/ypbind.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ypbind.service.d/local.conf
systemctl daemon-reload
systemctl cat ypbind.service

Configure NIS secondary server (CentOS7)

Enable local NIS server, make local machine use it:

yum -y install ypserv
/usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines)
### ypinit will give lots of errors about "rpc.ypxfrd failed: RPC: Can't decode result"; can be ignored
systemctl disable ypxfrd yppasswdd
systemctl stop ypxfrd yppasswdd
systemctl enable rpcbind ypserv
systemctl start rpcbind ypserv
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
systemctl restart ypbind
ypwhich # should say "localhost"
ypcat -k auto.master # should work

Punch hole in the firewall: (or "make" on NIS master will complain)

echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network
systemctl restart ypserv
firewall-cmd --get-services
firewall-cmd --add-service rpc-bind --permanent
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent
firewall-cmd --reload
firewall-cmd --list-all
  • on the NIS master:
    • add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
      • TL (2020-09): are we not doing this anymore? I guess it doesn't work anyway...
    • if using /var/yp/securenets, copy it from NIS master to new NIS secondary server

Enable hourly NIS update cron job (DO THIS AFTER git pull scripts, see below)

cd ~/git/scripts
git pull
cd etc
cd ~/git/scripts/etc; ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly

Configure AUTOFS (CentOS7)

yum -y install autofs
systemctl enable autofs
systemctl start autofs
ls -l /daq/daqshare


Label Selinux labels

When upgrading non-selinux machines (el6) to el7 (selinux enforcing) the existing user home directories will not have the correct selinux labels and many things will not work, including ssh logins (sshd cannot access ~user/.ssh files).

semanage fcontext -a -e /home /home1 ### selinux has special rules for /home, assign them to /home1
restorecon -R -v /home1 ### apply the new rules to files in /home1
ls -Zd /home1/alpha/.ssh
# should say: drwx------. alpha users system_u:object_r:ssh_home_t:s0  /home1/alpha/.ssh

Configure time (CentOS7)

Time server ntpd was replaced by chronyd.

yum -y install chrony
echo server time1 iburst >> /etc/chrony.conf
echo server time2 iburst >> /etc/chrony.conf
echo server time3 iburst >> /etc/chrony.conf
systemctl enable chronyd
systemctl restart chronyd
chronyc sources
chronyc tracking
  • if desired, edit /etc/chrony.conf, remove non-triumf time servers

Enable automatic system updates (CentOS7)

Disable yum-cron:

rpm --erase yum-cron
/bin/rm -v /var/lock/subsys/yum-cron
/bin/rm -v /etc/cron.daily/0yum-daily.cron
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron

Enable yum-autoupdate:

yum install -y epel-release
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm
systemctl enable yum-autoupdate
systemctl start yum-autoupdate
systemctl status yum-autoupdate

Disable automatic system updates (CentOS7)

yum -y erase yum-autoupdate
/bin/rm -f /etc/sysconfig/yum-autoupdate.rpmsave
/bin/rm -f /var/lock/subsys/yum-autoupdate

Enable automatic system updates (CentOS8)

yum -y install dnf-automatic
systemctl enable --now dnf-automatic.timer
systemctl list-timers *dnf-*

edit /etc/dnf/automatic.conf

apply_updates = yes
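
The edit of /etc/dnf/automatic.conf can also be scripted. A minimal sketch, assuming the file already contains an apply_updates line; the function name and the optional path argument (handy for trying it on a copy first) are illustrative assumptions.

```shell
#!/bin/sh
# Set "apply_updates = yes" in a dnf-automatic style config file.
# Rewrites only an existing apply_updates line; returns non-zero if the
# file does not end up containing the setting.
enable_apply_updates() {
    conf="${1:-/etc/dnf/automatic.conf}"
    tmp="$(mktemp)" || return 1
    sed 's/^[[:space:]]*apply_updates[[:space:]]*=.*/apply_updates = yes/' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
    grep -q '^apply_updates = yes$' "$conf"
}
```

Usage: `enable_apply_updates` as root, or `enable_apply_updates /tmp/automatic.conf` against a copy.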

Configure system services (CentOS7)

  • systemctl list-unit-files | grep enabled | sort ### (to see enabled services)
  • disable unwanted services:
systemctl disable bluetooth
systemctl disable dm-event
systemctl disable dmraid-activation
systemctl disable iscsid
systemctl disable iscsi
systemctl disable iscsiuio
systemctl disable libvirtd
systemctl disable lvm2-lvmetad
systemctl disable lvm2-monitor
systemctl disable ModemManager
systemctl disable multipathd
systemctl disable netcf-transaction
systemctl disable lvm2-lvmetad.socket
systemctl disable lvm2-lvmpolld.socket
systemctl disable iscsid.socket
systemctl disable iscsiuio.socket
systemctl disable ksm
systemctl disable ksmtuned
#systemctl disable 
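
After running the list above, it is useful to confirm that nothing on it is still enabled. A minimal sketch that filters the output of "systemctl list-unit-files"; the function name `still_enabled` is an illustrative assumption, and names given without a suffix are assumed to be ".service" units.

```shell
#!/bin/sh
# Read "systemctl list-unit-files" output on stdin and print those units
# from the argument list that are still marked "enabled".
still_enabled() {
    listing="$(cat)"
    for unit in "$@"; do
        case "$unit" in
            *.*) name="$unit" ;;            # already has a suffix (.socket etc.)
            *)   name="$unit.service" ;;    # assume a service unit
        esac
        printf '%s\n' "$listing" |
            awk -v n="$name" '$1 == n && $2 == "enabled" { print n }'
    done
}
```

Usage: `systemctl list-unit-files | still_enabled bluetooth ksm ksmtuned iscsid.socket`; empty output means all listed units are disabled or absent.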

Erase unwanted packages (CentOS7)

  • PackageKit # bugs users about security updates, hogs yum lock
  • perl-homedir # creates unwanted $HOME/perl5
  • ModemManager # thinks that all USB-attached devices are modems
  • pcp # sends error email to itself, does not work
  • abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted
  • rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken
  • bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful.
yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion

Disable unwanted package "tracker"

The "tracker" package is part of the GNOME desktop, it scans the content of all files into a database for quick searching.

When it malfunctions, bad things happen, i.e. read through https://bugzilla.redhat.com/show_bug.cgi?id=747689

Specific problem I see is that it floods the system log with error messages. Also consumes network and filesystem bandwidth for NFS mounted home directories.

This package cannot be removed by "yum erase tracker" due to dependencies from core GNOME desktop.

Instead, do this to deactivate it:

chmod -x /usr/libexec/tracker-*
chmod -x /usr/bin/tracker
chattr +i /usr/bin/tracker
chattr +i /usr/libexec/tracker-*

Configure external package repositories (CentOS7)

EPEL: (additional packages)

yum install epel-release

ELREPO: (kernel modules and drivers) (CentOS8)

yum install elrepo-release

ELREPO: (kernel drivers)

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum -y install yum-plugin-fastestmirror

Install packages needed to continue with installation

(+CentOS7)

(these packages are sometimes missing; they are needed to follow the instructions below)

(SL6.5: libotf is a dependency of emacs - SL6.5 installer fails to install it)

yum install ed patch wget git libotf gdisk emacs perl

Configure Konstantin's scripts

(+Centos7)

mkdir ~root/git
cd ~root/git
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull

Go back to the NIS slave server and install the hourly NIS update cron job.

Enable yum version lock

yum install yum-plugin-versionlock
#yum versionlock packagename # yum versionlock rpcbind
#yum versionlock list # list locked packages
#yum versionlock delete packagename # unlock given package
#yum versionlock clear # delete all locks

Configure trusted ssh keys

(+CentOS7)

ssh localhost
interrupt by Ctrl-C
/bin/cp ~/git/scripts/etc/authorized_keys ~/.ssh/

Configure hardware sensors

  • yum -y install lm_sensors
  • sensors-detect (accept default answer to all questions - press ENTER)
  • systemctl restart lm_sensors
  • sensors (to see available sensors)

If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page.

Configure IPMI sensors

Some machines support the IPMI interface for monitoring the hardware: fan speeds, temperatures, voltages.

  • find out if IPMI is supported. Try this:
dmidecode | grep -i ipmi

if output is not blank, IPMI is maybe supported.

  • install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
service ipmi start
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
chkconfig ipmi on
chkconfig ipmievd on
service ipmi restart
service ipmievd restart
tail -100 /var/log/messages ### look at messages logged by ipmievd
  • (CentOS7) install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
systemctl start ipmi
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
systemctl list-unit-files | grep -i ipmi
systemctl enable ipmi
systemctl restart ipmi
systemctl status ipmi
systemctl enable ipmievd
systemctl restart ipmievd
systemctl status ipmievd
tail -100 /var/log/messages ### look at messages logged by ipmievd
  • if ipmievd complains about SEL buffer overflow, clear it manually:
ipmitool sel list ### show ipmi messages in raw format
ipmitool sel elist ### show ipmi messages in useful format
ipmitool sel elist > file ### save ipmi messages into a file
ipmitool sel clear  ### clear all accumulated ipmi messages
  • useful ipmi commands:
    • ipmitool sensor -- read hardware sensors
    • ipmitool sel elist -- report all accumulated messages

Configure ECC memory

  • check that machine has ECC memory: dmidecode --type memory | grep -i ecc

Configure mcelog (machine check exception)

  • yum install mcelog
  • check that mcelog is running: ps -efw | grep mcelog
  • (el6) chkconfig mcelogd on; service mcelogd restart
  • (el7) systemctl status mcelog.service; systemctl enable mcelog.service; systemctl restart mcelog.service

Check for MCE (machine check exception) messages:

  • mcelog --client
  • grep -i mce /var/log/messages*
  • grep -i ecc /var/log/messages*

Configure EDAC

yum install edac-utils
edac-ctl --mainboard
edac-ctl --status
lsmod | grep edac
modprobe ie31200_edac ### driver for Intel E3-1200 series ECC memory

[root@grsmid00 ~]# ls -l /sys/devices/system/edac/mc/
... empty

[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/
drwxr-xr-x. 15 root root    0 Oct 25 16:40 mc0
...
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_noinfo_count
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow0
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow1
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow2
drwxr-xr-x. 3 root root    0 Oct 25 16:40 csrow3
-r--r--r--. 1 root root 4096 Oct 25 16:40 max_location
-r--r--r--. 1 root root 4096 Oct 25 16:40 mc_name
drwxr-xr-x. 2 root root    0 Oct 25 16:40 power
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank0
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank1
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank2
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank3
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank4
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank5
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank6
drwxr-xr-x. 3 root root    0 Oct 25 16:40 rank7
--w-------. 1 root root 4096 Oct 25 16:40 reset_counters
-r--r--r--. 1 root root 4096 Oct 25 16:40 seconds_since_reset
-r--r--r--. 1 root root 4096 Oct 25 16:40 size_mb
lrwxrwxrwx. 1 root root    0 Oct  2 12:02 subsystem -> ../../../../../bus/mc0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_noinfo_count
-rw-r--r--. 1 root root 4096 Oct 25 16:40 uevent
[root@alpha00 ~]# 

[root@alpha00 ~]# edac-ctl --status
edac-ctl: drivers are loaded.

[root@alpha00 ~]# edac-util 
edac-util: No errors to report.

[root@alpha00 ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
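
The ce_count and ue_count files shown in the mc0 listing above can be read directly, without edac-util. A minimal sketch; the function name `edac_counts` is an illustrative assumption, and the directory argument exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# Print the correctable (ce) and uncorrectable (ue) error counts for every
# memory controller directory (mc0, mc1, ...) under the EDAC sysfs tree.
edac_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    for mc in "$base"/mc*; do
        [ -r "$mc/ce_count" ] || continue   # no controllers, or glob did not match
        printf '%s ce=%s ue=%s\n' "${mc##*/}" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

On a machine without loaded EDAC drivers (the empty directory case above) the function prints nothing.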

Configure SMARTD (CentOS7)

Default el7 smartd config files send deficient email notices about disk failures. Overwrite.

/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/
systemctl enable smartd
systemctl restart smartd
systemctl status smartd

Enable User Disk Quotas (OPTIONAL)

(+CentOS7)

[root@isdaq00 home1]# grep quota /etc/fstab
UUID=5a2aefbd-45db-475e-841e-12ec89220fbd /home1 ext4 defaults,grpquota,usrquota 1 2
  • cd /; umount /home1; mount /home1
  • quotacheck -cug /home1
  • quotacheck -avug
  • quotaon -av
  • quota system is now active
  • increase the soft quota time limit from default 7days to 30 or 60 days: edquota -t
  • set quotas for all users (see below)
  • setup warnquota:
    • create warnquota config file: emacs -nw /etc/warnquota.conf
# values can be quoted:
MAIL_CMD        = "/usr/sbin/sendmail -t"
FROM            = root
SUBJECT         = User %i@%h exceeded allocated disk quota
CC_TO           = "root"
# If you set this variable CC will be used only when user has less than
# specified grace time left (examples of possible times: 5 seconds, 1 minute,
# 12 hours, 5 days)
# CC_BEFORE = 2 days
SUPPORT         = "root"
# Text in the beginning of the mail (if not specified, default text is used)
# This way text can be split to more lines
# Line breaks are done by '|' character
# The expressions %i, %h, %d, and %% are substituted for user/group name,
# host name, domain name, and '%' respectively. For backward compatibility
# %s behaves as %i but is deprecated.
MESSAGE         = User "%i" on "%h" has exceeded the allocated disk quota.||Please delete any unnecessary files on following filesystems or|contact the system administrator to increase your quota allocation:|
SIGNATURE       = --|automated email from warnquota
    • note that %i@%h in the SUBJECT line does not seem to work
    • create cron job: emacs -nw /etc/cron.daily/warnquota
#!/bin/sh
warnquota
#end
    • chmod a+x /etc/cron.daily/warnquota
    • touch /etc/crontab

Useful commands for managing quotas:

  • repquota -a | sort -n -k3 ### show quota of all users sorted by disk usage
  • edquota -u username ### open "vi" editor to change user quotas
  • repquota -a | grep username ### report quota for given user
  • setquota -u username 0 0 0 0 /home1 ### disable quotas for given user
  • setquota -u username 50000000 100000000 0 0 /home1 ### set quotas for 50GB soft and 100GB hard
  • edquota -t ### change user quota time limits
  • edquota -tg ### change group quota time limits
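
The setquota limits above are block counts, written with the convention 50000000 blocks = 50GB. A minimal sketch that builds the command from sizes in GB using that same convention; the function names are illustrative assumptions. Quota blocks are 1 KiB, so the effective limit is slightly larger than the decimal GB figure.

```shell
#!/bin/sh
# Convert a size in GB to the block count used on this page
# (1 GB = 1000000 blocks, matching "50000000" for 50GB).
gb_to_blocks() {
    echo $(( $1 * 1000 * 1000 ))
}

# Print (not run) the setquota command for soft and hard limits given in GB.
quota_cmd() {
    user="$1"; soft_gb="$2"; hard_gb="$3"; fs="${4:-/home1}"
    echo "setquota -u $user $(gb_to_blocks "$soft_gb") $(gb_to_blocks "$hard_gb") 0 0 $fs"
}
```

Usage: `quota_cmd username 50 100` prints the command for review; pipe it to `sh` as root to apply it.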

Enable NFS V4 server (CentOS7)

  • create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...)
/home1  @home_export(rw,no_root_squash,async,fsid=1)
/data1  @data_export(rw,no_root_squash,async,fsid=2)
  • check the netgroup file
    • if using NIS: check NIS netgroup: ypcat -k netgroup
    • if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
    • if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
  • enable things, start them:
firewall-cmd --get-services
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind ### needed for ubuntu automounter
firewall-cmd --reload
firewall-cmd --list-all
systemctl enable nfs-server
systemctl start nfs-server
systemctl status nfs
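Since the fsid numbers in /etc/exports have to be unique, the export lines can be generated instead of typed by hand. A minimal sketch, assuming the same export options as in the example above; the function name make_exports and the "directory:netgroup" argument format are illustrative only:

```shell
# Print /etc/exports lines for "directory:netgroup" pairs,
# numbering fsid=1,2,3,... in the order given.
make_exports() {
    n=1
    for pair in "$@"; do
        dir="${pair%%:*}"
        group="${pair#*:}"
        printf '%s  @%s(rw,no_root_squash,async,fsid=%d)\n' "$dir" "$group" "$n"
        n=$((n + 1))
    done
}

make_exports /home1:home_export /data1:data_export
```

The output can be reviewed and then redirected into /etc/exports.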

Enable NFS V3 server (CentOS7)

ps -efw | grep rpc.mountd # should be running!
firewall-cmd --get-services
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
firewall-cmd --list-all

Enable NFS V3 server

  • edit /etc/hosts.allow, add or uncomment "mountd: 142.90.0.0/255.255.0.0"
  • create /etc/exports. example:
/home1  @home_export(rw,no_root_squash,async)
/data1  @data_export(rw,no_root_squash,async)
  • check the netgroup file
    • if using NIS: check NIS netgroup: ypcat -k netgroup
    • if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
    • if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
  • chkconfig nfs on
  • chkconfig nfslock on
  • service nfs restart

Then on ladd00 need to do

  • ssh to root@ladd00
  • edit /etc/auto.daq to add new machine...
  • make -C /var/yp

Enable NFS V4 SERVER (SL6)

  • if used with NIS, same as NFSv3
  • if used as standalone, need to edit idmapd.conf - set the "Domain" name to the same value on the NFS server and the NFS client (the default automagically determined value does not always work). More TBW.

Enable AMANDA backups

AMANDA backups are already enabled by TRIUMF kickstart installs. For non-kickstart installation, follow instructions at [http://amanda/~amanda], or look at "/triumfcs/trshare/olchansk/linux/amanda/amanda-enable.perl". As final step, use [https://helpdesk.triumf.ca] to contact TRIUMF CS to add this new machine to the amanda backup list.

  • yum install triumf-amanda

Enable AMANDA backups (CentOS7)

yum install amanda-client
systemctl list-unit-files | grep -i amanda
#systemctl enable amanda
systemctl enable amanda.socket
systemctl enable amanda-udp.socket
systemctl restart amanda.socket
systemctl restart amanda-udp.socket
firewall-cmd --get-services
firewall-cmd --permanent --add-service=amanda-client
firewall-cmd --reload
firewall-cmd --list-all
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts

On amanda server, add new machine to the disklist, then:

amcheck -c daily titan00

Enable DCACHE

DAQ dcache server is mounted as

/daq/pnfs/triumf.ca/data/

For Centos-7 machines, you need to adjust the firewall rules in order to be able to communicate with the trdata machines; this is only necessary if you are copying data to trdata. The firewall changes are

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.100.212/32" port protocol="tcp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.107.156/32" port protocol="tcp" port="0-65535" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.100.219/32" port protocol="tcp" port="0-65535" accept'
firewall-cmd --reload
firewall-cmd --list-all

These instructions are unnecessary

  • # mkdir -p /pnfs
  • # edit /etc/rc.local, add to the end of file: "mount -o intr,rw,noac,hard,nfsvers=3 trdata00:/pnfs /pnfs &"
  • # . /etc/rc.local

For more information, see the TrdataDcache dcache page.

Configure Ganglia (Centos7)

CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2)

/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf   # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf
yum erase -y ganglia-vmstat ganglia-sensors ganglia-top ganglia-smart ganglia-cpumhz
cd ~/git/scripts
git pull
/bin/cp etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
cd ganglia
./ganglia-all.perl
make install
cd ~

Configure Ganglia (Centos8)

CentOS8 Ganglia instructions (EPEL8 ganglia-3.7.2)

/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
cd ~/git/scripts/ganglia
git pull
./ganglia-all.perl
make install

Configure TRIUMF DAQ packages

(+CentOS7)

cd /etc/yum.repos.d
wget http://daq.triumf.ca/~daqweb/yum/triumf-daq.repo

Install Konstantin's packages

(+CentOS7)

yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs

Install memtest and PXE boot

!!!DO NOT DO THIS!!!

cd /boot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.10
wget http://ladd00.triumf.ca/tftpboot/gpxe-1.0.1+-gpxe.lkrn

emacs -nw /boot/grub/grub.conf
title memtest86+-5.01
      root (hd0,0)
      kernel /boot/memtest86+-5.01.bin.gz
title memtest86+-4.20
      root (hd0,0)
      kernel /boot/memtest86+-4.20.bin.gz
title memtest86+-4.10
      root (hd0,0)
      kernel /boot/memtest86+-4.10
title pxeboot
      root (hd0,0)
      kernel /boot/gpxe-1.0.1+-gpxe.lkrn

Install node monitoring

!!! OBSOLETE, DO NOT DO THIS !!!

(+CentOS7)

yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install triumf_nodeinfo
/usr/sbin/sendnodeinfo.perl --config ladd00.triumf.ca:8600
emacs -nw /etc/nodeinfo
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600

Install gonodeinfo node monitoring

(+Ubuntu, +CentOS7, +CentOS8)

go to https://bitbucket.org/dd1/gonodeinfo and follow the instructions:

yum -y install golang
mkdir ~/git
cd ~/git
git clone https://bitbucket.org/dd1/gonodeinfo.git
# or git clone https://daq.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
  • emacs -nw /etc/gonodeinfo.conf
  • change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
  • change "Servers" to read: Servers: daq00.triumf.ca:8601
  • run gonodeinfo -e
  • if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
  • on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
  • try gonodeinfo again, there should be no error
  • on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now

Install latest system updates

(+CentOS7)

yum update -y

Configure TRIUMF Printers (CentOS7)

systemctl stop cups
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a

Disable syslog spam (CentOS7)

Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this:

echo auditctl -e 0 >> /etc/rc.local
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local
/etc/rc.local
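Running "echo ... >> /etc/rc.local" a second time adds the same line again. A minimal sketch of an append that is safe to repeat, assuming a POSIX shell with grep; the helper name append_once is made up for this example:

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
    line="$1"; file="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once "auditctl -e 0" /etc/rc.local
#   append_once "/usr/bin/systemd-analyze set-log-level notice" /etc/rc.local
```

The same helper would apply to the other places on this page that append to /etc/rc.local.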

Install basic system packages (CentOS7)

(if starting from minimal system, basic system packages required:)

yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils sysstat iftop tcsh
yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools

Install packages needed for QUARTUS, ROOT, EPICS and MIDAS DAQ

(+CentOS7)

yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" "libusbx-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*-g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy sympy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" --exclude golang"*"git"*" mesa"*" xerces-c"*" diffuse clang i2c-tools texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras perl-GD"*" perl-Math"*" perl-Statistics-Basic cmake3 cmake3-gui extra-cmake-modules python2-pip mariadb-devel glibc-devel.i686 libzstd zlib-devel.i686

Install optional packages

!! DO NOT DO THIS !!

(do not install boost on 32-bit machines)

yum install --skip-broken "boost-*"

(packages for 32-bit software compilation on 64-bit machines. this is optional)

yum install --skip-broken giflib.i386 giflib.i686 compat-libf2c-34.i386 compat-libf2c-34.i686 mysql-devel.i686 openssl-devel.i686 unixODBC-devel.i686 libstdc++-devel.i386 libstdc++-devel.i686 "zlib-*.i686" "libXext-*.i686" "libXtst-*.i686" glibc-static.i686 freetype.i686 fontconfig.i686 libpng.i686 libXrender.i686 glibc-devel.i686 libX11-devel.i686 libXpm-devel.i686 libXft-devel.i686 mysql-devel.i686 dcap-devel.i686 gsl-devel.i686 pcre-devel.i686 fontconfig-devel.i686 freetype-devel.i686 libpng-devel.i686 libjpeg-devel.i686 libgfortran.i686 libxml2-devel.i686 gd-devel.i686 readline-devel.i686 ncurses-devel.i686 libXdmcp.i686 readline-static.i686 compat-readline5.i686

yum install boost-devel.i686

(separately install these packages - they collide with the big bunch above)

yum install rdesktop

yum reinstall urw-fonts

Install libraries for PHYSICA (CentOS7)

To run physica built on el6 from git sources on el7, do this:

(building physica on el7 is not supported at this time)

(see more http://www.triumf.info/wiki/DAQwiki/index.php/PHYSICA)

yum -y install libX11.i686 gd.i686 libpng12.i686 readline.i686 compat-libf2c-34.i686

Install additional desktop environments (CentOS7)

# LXQT (from EPEL)
# NOT COMPATIBLE WITH el7.7 # yum -y install "lxqt*"
# Cinnamon desktop (from EPEL)
yum -y install cinnamon
# KDE5 not available yet
# MATE (from epel)
yum -y groupinstall "MATE Desktop"
yum -y install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs
# XFCE4 (from EPEL)
yum -y groupinstall xfce
yum -y install "xfce*plugin" xfce4-about --exclude xfce4-hamster-plugin
yum -y erase bash-completion
  • make the MATE desktop as default
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
  • lightdm login manager (from EPEL)
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5
  • and switch from gdm to lightdm
systemctl disable gdm.service
systemctl enable lightdm.service
(systemctl stop gdm; systemctl restart lightdm) &

Install SMART scripts

(+CentOS7)

ln -sf ~/git/scripts/smart-status/smart-status.perl ~/

Install NTFS drivers

yum install ntfs-3g ntfsprogs ### from EPEL

Install HFS and HFS+ drivers (CentOS7)

yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus

Install Google Chrome web browser (64-bit CentOS7)

DOES NOT WORK AS OF google-chrome-stable-114 because google uses signature incompatible with CentOS-7, see https://www.reddit.com/r/chrome/comments/13s799o/googlechromebeta_1140573545_rpm_invalid_signature/

automatic updates will fail with signature check error, to defeat it lock old version of google-chrome:

yum versionlock google-chrome-stable

THIS DOES NOT WORK ANYMORE:

/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/
yum install google-chrome-stable

Enable monitoring of HTTPS certificates

On SL6, CentOS7:

yum install crypto-utils
/etc/cron.daily/certwatch
strace -f /etc/cron.daily/certwatch  |& grep open  | grep crt

Enable 100dpi fonts for EPICS

(+CentOS7)

ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/

Enable crontab @reboot for MIDAS (CentOS7)

el7 has a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).

mkdir /etc/systemd/system/crond.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/crond.service.d/local.conf
systemctl daemon-reload
systemctl cat crond.service
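The drop-in file can also be written by a small function that is safe to run more than once. A minimal sketch, assuming a POSIX shell; the function name write_crond_dropin and its optional base directory argument are illustrative only:

```shell
# Write the crond drop-in under the given systemd directory
# (default: /etc/systemd/system), creating the .d directory if needed.
write_crond_dropin() {
    base="${1:-/etc/systemd/system}"
    mkdir -p "$base/crond.service.d"
    printf '[Unit]\nAfter=ypbind.service autofs.service\n' \
        > "$base/crond.service.d/local.conf"
}

# Usage (as root): write_crond_dropin && systemctl daemon-reload
```

Using mkdir -p avoids an error when the directory already exists, and printf does not depend on how the shell's echo handles "-e".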

el7 has a second bug, sometimes it thinks the network is running when it is not, specifically, DNS is not working and autofs mount of user home directory fails. So not only cron has to wait for ypbind and autofs to be ready, we also have to wait for DNS to be ready:

cd ~/git/scripts
git pull
cp etc/wait-for-dns.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable wait-for-dns
systemctl restart wait-for-dns # should return immediately. if there is a 30 second delay, script is broken, disable it
systemctl status wait-for-dns # to see what went wrong.
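The wait-for-dns.service file from the scripts repository is not reproduced here. As a rough illustration of the general idea only (retry a check until it succeeds or a timeout expires), here is a minimal sketch; the function name wait_for and the host name in the usage comment are assumptions, not the contents of the actual script:

```shell
# Retry a command once per second until it succeeds or the
# timeout (in seconds) runs out. Returns 0 on success, 1 on timeout.
wait_for() {
    timeout="$1"; shift
    while [ "$timeout" -gt 0 ]; do
        "$@" >/dev/null 2>&1 && return 0
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1
}

# Usage: wait up to 30 seconds for DNS to resolve a host name
# (the host name is only an example):
#   wait_for 30 getent hosts ladd00.triumf.ca
```

When the check succeeds on the first try the function returns immediately, which matches the expected behaviour described above.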

Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".

Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.

Enable firewall for MIDAS (CentOS7)

Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports).

To enable access to mhttpd:

firewall-cmd --add-port=8443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all

To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host)

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept'
firewall-cmd --reload
firewall-cmd --list-all

To enable access from the private network (replace "192.168.1.0" with your private network number):

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="0-65535" accept'
firewall-cmd --reload
firewall-cmd --list-all
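The rich rules above differ only in the source address, so the command line can be generated from the address. A minimal sketch, assuming the same rule text as above; the function names rich_rule and rich_rule_cmd are made up for illustration, and the command is only printed, not run:

```shell
# Print the rich rule that opens all TCP ports to one source
# address or network (CIDR notation).
rich_rule() {
    printf 'rule family="ipv4" source address="%s" port protocol="tcp" port="0-65535" accept\n' "$1"
}

# Print (not run) the full firewall-cmd command; the outer single quotes
# keep the rule's inner double quotes intact when pasted into a shell.
rich_rule_cmd() {
    printf "firewall-cmd --permanent --add-rich-rule='%s'\n" "$(rich_rule "$1")"
}

rich_rule_cmd 192.168.1.0/24
```

After running the printed command as root, "firewall-cmd --reload" is still needed for the permanent rule to take effect.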

Enable firewall for EPICS (CentOS7)

To enable access to TRIUMF EPICS servers, do this:

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.132.0/23" accept'
firewall-cmd --reload
firewall-cmd --list-all

For UCN the controls people seem to have EPICS setup on a different server; this might be true for CMMS as well. In this case the firewall rule change should be

firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.139.0/23" accept'
firewall-cmd --reload
firewall-cmd --list-all

Disable gdm and X11 (OPTIONAL)

initctl stop prefdm
echo "start on never" > /etc/init/prefdm.override
echo "start on never" > /etc/init/splash-manager.override
initctl reload-configuration

then enable login on default console:

echo "plymouth quit" >> /etc/rc.local
echo "X_TTY=xxx/dev/tty1" >> /etc/sysconfig/init

Install JAVAWS (OPTIONAL)

  • to run Java "web start" jnlp files (EVO, SEEVOGH, etc): javaws Downloads/spider.jnlp
  • install javaws:
  • yum install icedtea-web icedtea-web-javadoc

Install firefox java plugin (OPTIONAL, DO NOT DO THIS)

This installs the Oracle Java plugin:

  • rpm -vh --install ~deap/jdk-7u15-linux-x64.rpm
  • ls -l /usr/lib64/mozilla/plugins/
  • ln -s /usr/java/jdk1.7.0_15/jre/lib/amd64/libnpjp2.so /usr/lib64/mozilla/plugins/
  • start firefox, go edit->preferences->general->manage add-ons->plugins
  • "java plugin 1.7.0_15" should be listed


Configure USB device permissions

(+CentOS7)

Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.

  • create file /etc/udev/rules.d/99-usb-chmod.rules with this contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}" 
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
  • reload udev rules: udevadm control --reload-rules
  • apply new permissions: udevadm trigger --action=add
  • watch udev activity: udevadm monitor -p
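The three tty rules in the file above follow one pattern and differ only in the device name. A minimal sketch that prints such rules, assuming the same rule text; the function name tty_rule is made up for this example:

```shell
# Print a udev rule that makes tty devices whose DEVPATH matches
# the given name pattern readable and writable by all users.
tty_rule() {
    printf 'ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*%s*", RUN+="/bin/chmod a+rw $env{DEVNAME}"\n' "$1"
}

for name in ttyUSB ttyACM ttyS; do
    tty_rule "$name"
done
```

The single quotes around the printf format keep $env{DEVNAME} literal, so it is expanded by udev and not by the shell.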

Disable modem-manager

The modem-manager will try to talk to any serial devices attached to USB serial ports. It assumes that those devices are modems and will send out modem-specific commands. If the devices are not modems and do not understand or do not like modem commands, well that's too bad. modem-manager is installed by the ModemManager package required by the NetworkManager package, and there is no configuration setting to turn modem-manager off.

One way to disable it is: chmod a= /usr/sbin/modem-manager

Another way to disable it is by forced uninstall: rpm --erase --nodeps ModemManager

Remember to kill the running copy: killall -KILL modem-manager

Caveat: it is not clear whether modem-manager will be resurrected by an update to the NetworkManager or ModemManager packages.

Configure Altera jtagd

(if needed)

mkdir /etc/jtagd
echo 'Password = "123";' > /etc/jtagd/jtagd.conf
cp -pv  /daq/daqshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts
  • start local jtagd: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagd
  • test local connection: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagconfig
  • test remote connection (add this machine to your .jtag.conf, run jtagconfig)

For more information, go to Quartus

Install EOS

Instructions from here: http://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html

rpm -vh --install https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/citrine/tag/el-7/x86_64/eos-repo-el7-generic-1.noarch.rpm
yum-config-manager --disable eos-citrine # disable auto-update because all packages are not signed
yum-config-manager --disable eos-dep # disable auto-update because all packages are not signed.
yum install eos-client eos-fuse --enablerepo=eos-citrine

Install fix for the el7 systemd dbus boot hang

Around early Summer 2018 el7 started showing a boot problem. In a nutshell, there is a problem with the dbus connection between dbus and systemd that prevents polkit, firewalld, etc from starting. The system eventually boots enough that one can ssh into it, but most things do not work. Notably, polkit is not running, firewalld is not running, ssh login takes about 15-30 seconds.

The solution is to add a special systemd service to check that dbus started correctly. It runs after dbus is started, but before it is used, and it restarts dbus in a loop with a delay until dbus starts correctly. In testing, dbus always starts correctly after the first retry.

cd ~root/git/scripts/etc
git pull
/bin/cp -vf systemd-check-dbus.perl /usr/bin/
/bin/cp -vf systemd-check-dbus.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable systemd-check-dbus
systemctl start systemd-check-dbus
systemctl status systemd-check-dbus

After linux boots, if everything was okay, the script will report this:

[root@iris01 ~]# systemctl status systemd-check-dbus
...
Feb 08 17:15:49 iris01.triumf.ca systemd[1]: Starting Check that systemd is registered with dbus...
Feb 08 17:15:49 iris01.triumf.ca sh[4283]: Starting check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:       string "org.freedesktop.DBus"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List:       string "org.freedesktop.systemd1"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: systemd1 dbus service exists, success!
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: Finished check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca systemd[1]: Started Check that systemd is registered with dbus.

If the boot problem happened, the script will report about restarting dbus.

Note: the systemd service file adjusts the start order of other services, this adjustment seems to reduce the probability of the problem.

Configure GRUB boot loader (CentOS7, CentOS8)

  • emacs -nw /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX
  • grub2-mkconfig -o /boot/grub2/grub.cfg
  • grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
  • grub2-editenv list # show contents of boot environment file
  • /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file
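Removing "rhgb" and "quiet" can also be done with sed instead of an editor. A minimal sketch, assuming GNU sed (for the \b word-boundary match) as shipped with CentOS; the function name strip_rhgb_quiet is made up for this example:

```shell
# Remove "rhgb" and "quiet" from the GRUB_CMDLINE_LINUX line.
# Reads the named file (or stdin), writes the result to stdout.
strip_rhgb_quiet() {
    sed -E '/^GRUB_CMDLINE_LINUX=/ s/[[:space:]]*\b(rhgb|quiet)\b//g' "$@"
}

# Usage (as root), keeping a backup of the original file:
#   cp -p /etc/default/grub /etc/default/grub.bak
#   strip_rhgb_quiet /etc/default/grub.bak > /etc/default/grub
```

Only the GRUB_CMDLINE_LINUX line is touched; all other lines pass through unchanged. The grub2-mkconfig step is still needed afterwards.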

Install memtest86+ (CentOS7, CentOS8)

yum -y install memtest86+
/bin/cp -vf /usr/share/memtest86+/20_memtest86+ /etc/grub.d/
/bin/chmod a+x /etc/grub.d/20_memtest86+ 
grub2-mkconfig -o /boot/grub2/grub.cfg

Disable ELREPO

sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo_triumf.repo
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo

Reduce install size (optional)

This is optional. Only do this if reducing the size of the OS image is very important.

Do this for VME processors.

yum erase "texlive*" "java*" "boost*" libreoffice"*"
#yum erase "xemacs*"
yum erase "libstdc++-docs"
yum erase firefox google-chrome"*"
yum clean all
/bin/rm -rf /usr/share/help
/bin/rm -rf /usr/share/doc

Update from el7.6 to el7.7

yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
yum-config-manager --disable zfs-testing-kmod
yum versionlock delete zfs
yum versionlock delete kernel
yum -y update "yum*" "rpm*"
yum -y erase libqtxdg lxqt-qtplugin ### LXQT is not compatible
yum update
after rebooting into el7.7, follow instructions for updating ZFS from version 0.7 to 0.8.

Update ZFS

Switch from LADD-NIS to DAQ-NIS

domainname DAQ-NIS
/usr/lib64/yp/ypinit -s daq00
ls -l /var/yp
sed -i s/LADD-NIS/DAQ-NIS/ /etc/yp.conf
sed -i s/LADD-NIS/DAQ-NIS/ /etc/sysconfig/network
systemctl restart ypserv
systemctl restart ypbind
ypwhich
ypwhich -m

Finish installation

reboot

Special hardware settings

ASUS Crosshair mobo

  • use BIOS version 1207 or newer
  • (before CentOS7) sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
  • CentOS7: installs correct drivers automatically

ASUS Crosshair-II mobo

  • use BIOS version 2607 or newer
  • for the onboard IDE to work, add "all-generic-ide" to kernel boot options in grub.conf
  • sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors

ASUS P7P55D EVO mobo

  • use BIOS version 2004 or newer
  • SL6 - install special driver for on board PCIe GigE network port and disable on board PCI GigE network port:
    • yum --enablerepo elrepo install kmod-r8168 kmod-r8169
    • # do not do this: sed 's/^blacklist/#blacklist/' -i /etc/modprobe.d/blacklist-r8169.conf
    • reboot
    • verify that correct drivers are loaded: ethtool -i eth0; ethtool -i eth1
    • note: there will be no eth1 - r8169 driver is disabled.

ASUS P6X58-E-WS mobo

  • BIOS settings
    • F1 or DEL to enter BIOS setup, F8 boot menu
    • go to POWER->HW mon, confirm CPU temperature is around 30C. (heatsink is installed correctly. Bad heatsink temperature quickly goes up to 50-70C).
    • Main menu: Storage config - SATA change IDE->AHCI
    • System information: confirm BIOS version 301, CPU type, memory size
    • AI Tweak: set DRAM frequency - AUTO->DDR3-1333
    • Advanced->Onboard devices: LAN BOOT: enabled
    • Power->HW monitor: CPU Q-FAN: enabled
    • Boot->Settings: Quick boot: enabled; Full screen logo: disabled; Wait for F1: disabled
    • Save and exit

ASUS E35M1-M PRO mobo

  • http://www.asus.com/Motherboards/E35M1M_PRO/#specifications
  • use BIOS version 1002 or newer
  • for CPU temperature: install kmod-k10temp from ELREPO (kmod-k10temp-0.0-4.el6.elrepo.x86_64.rpm)
  • for Sensors: yum --enablerepo elrepo install kmod-w83627ehf; modprobe w83627ehf; sensors
  • for Graphics: yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
  • to enable booting from USB3, edit /etc/dracut.conf, change line "add_drivers" to read: add_drivers+="xhci-hcd"
  • to use multiple monitors, run "aticonfig --initial --heads=2 --adapter=1 --xinerama=on", to change screen layout, edit /etc/X11/xorg.conf. Only dual monitors DVI+HDMI seem to work. Triple monitors do not seem to work.

Sensors instructions below are obsolete (use driver from ELREPO)

cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/groeck-w83627ehf-dd3e543/w83627ehf.ko
echo "modprobe hwmon; modprobe hwmon-vid; modprobe k10temp; rmmod w83627ehf; insmod /root/w83627ehf.ko" >> /etc/rc.local

ASUS E45M1-M PRO mobo

ASUS P9X79 WS

  • http://www.asus.com/Motherboard/P9X79_WS/
  • use BIOS version 4901. Older versions seem to be ok: 3101, 3401, 4701, 4802 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS.
  • (not needed for CentOS7) for CPU temperature, install coretemp
  • (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above.
  • BIOS Settings:
    • enter "Advanced mode"
    • Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default
    • ### NOT THIS: Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings
    • Monitor -> disable Q-fan for all fans - let all fans always run at maximum RPMs
    • Boot -> Full screen logo -> Set to "disabled"
    • Wait for F1 -> Set to "disabled"

ASUS P8B-M

  • use BIOS version 6103 or newer
  • for CPU temperature, install coretemp
  • for sensors, install driver for NCT6776F chip same as E35M1-M above.

SUPERMICRO X9SCL

  • yum install kmod-w83627ehf.x86_64 coretemp
  • xemacs -nw /etc/rc.local, add:
modprobe coretemp
modprobe w83627ehf

ASUS Z87-WS

cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
echo modprobe hwmon-vid >> /etc/rc.local
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors

ASUS Z97-WS

the nct6775 driver does not work because of conflict with ACPI.

ASUS Z170-DELUXE

  • use bios 3801
  • set XMP mode (DDR4-2400)
  • Advanced->On board devices: set sata mode to "M2", set PCIe slot 3 to "x4"
  • boot: disable f1, disable logo, disable numlock

ASUS AM1M-A

  • use BIOS 602 or later
  • SL6.5 installer cannot use USB2 ports and the network. Use USB3 ports (blue colour) to boot USB installer (memtest, rescue, etc)
  • SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seems okay)
  • install ATI/AMD video drivers from ELREPO (see below)
  • sensors chip is ITE IT8623E, for SL6, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures):
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
  • for el7 use it87.ko driver:
cd ~root
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
  • sensors output:
[root@midemma02 ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
temp1:        +22.0°C  (crit = +120.0°C, hyst = +90.0°C)

fam15h_power-pci-00c4
Adapter: PCI adapter
power1:           N/A  (crit =  25.00 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +22.2°C  (high = +70.0°C)
                       (crit = +70.0°C, hyst = +69.0°C)

it8603-isa-0290
Adapter: ISA adapter
in0:          +0.96 V  (min =  +2.50 V, max =  +2.95 V)  ALARM
in1:          +2.23 V  (min =  +0.94 V, max =  +1.22 V)  ALARM
in2:          +2.03 V  (min =  +0.74 V, max =  +0.77 V)  ALARM
in3:          +2.00 V  (min =  +1.26 V, max =  +0.13 V)  ALARM
in4:          +2.23 V  (min =  +2.95 V, max =  +2.15 V)  ALARM
3VSB:         +3.36 V  (min =  +6.00 V, max =  +2.50 V)  ALARM
Vbat:         +3.22 V  
+3.3V:        +3.36 V  
fan1:         611 RPM  (min =  200 RPM)
fan2:         707 RPM  (min =  600 RPM)  ALARM
temp1:        +38.0°C  (low  = +122.0°C, high = +122.0°C)  sensor = thermistor
temp2:        +22.0°C  (low  = +119.0°C, high = -35.0°C)  ALARM  sensor = thermistor
temp3:       -128.0°C  (low  = +16.0°C, high = +93.0°C)  sensor = thermistor
intrusion0:  ALARM

[root@midemma02 ~]# 
  • AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together)

Intel SE7230NH1

  • front panel header connector pinout is like this:
PWR LED | 1  2|
        | 3  4|
PWR LED | 5  6|
HDD LED | 7  8|
HDD LED | 9 10|
PWR SW  |11 12| NIC1 LED
PWR SW  |13 14| NIC1 LED
RST SW  |15 16|
RST SW  |17 18|
        |19 20|
NMI SW  |21 22| NIC2 LED
NMI SW  |23 24| NIC2 LED
...     |...  |
        |33 34|

ASUS H110M-A/M.2

  • use BIOS 2003 or later
  • dmidecode | grep -i nct reports: Nuvoton NCT5539D
  • sensors chip is "NCT6793D or compatible chip", for el7, use this driver:
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
echo modprobe hwmon-vid >> /etc/rc.local
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors
  • sensors output:
[root@daq03 ~]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)

nct6793-isa-0290
Adapter: ISA adapter
in0:                       +0.34 V  (min =  +0.00 V, max =  +1.74 V)
in1:                       +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                       +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                       +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                       +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                       +0.15 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                       +0.97 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                       +3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                       +3.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                       +1.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      +0.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      +0.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                      +0.12 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                      +0.13 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1041 RPM  (min =    0 RPM)
fan2:                     1020 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM
SYSTIN:                   +119.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +26.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +27.5°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +111.0°C    sensor = thermistor
PECI Agent 0:              +28.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +25.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +28.0°C  (high = +80.0°C, crit = +100.0°C)

[root@daq03 ~]# 

Supermicro X11SSH-F

[root@alpha00 ~]# more /etc/modprobe.d/blacklist.conf
blacklist mei
blacklist mei_me
[root@alpha00 ~]# 
  • mobo requires M.2 PCIe SSD (an M.2 SATA SSD will not work; an SSD on a regular SATA port is ok)
  • boot from M.2 PCIe SSD requires UEFI boot (from an MSDOS partition on the SSD)

ASUS TUF Z390M-PRO GAMING (WI-FI)

  • BIOS 2417 is okay, upgrade to this if older
  • do not set XMP memory mode
  • in the BIOS, enable the boot compatibility support module mode: BIOS (press DEL) -> Advanced mode -> BOOT -> CSM Module -> Enable CSM "yes".
  • for SL6, install e1000e driver from ELREPO:
yum install --enablerepo=elrepo kmod-e1000e
  • sensors chip appears to be "Nuvoton NCT6798D"; it is not clear what driver to use
  • dmidecode | grep -i nct reports: Nuvoton NCT6798D
  • kmod-nct6775-0.0-5.el7_7.elrepo.x86_64.rpm from ELrepo finds the chip but bombs because of conflict with ACPI

ASUS PRIME X399-A

Configure X11 graphics

Special settings for DAQ

  • add the following at the end of /etc/X11/xorg.conf. This enables Ctrl-Alt-KP-/ and Ctrl-Alt-KP-* to unlock the keyboard after an Altera Quartus crash:
Section "ServerFlags"
        Option "AllowDeactivateGrabs" "true"
        Option "AllowClosedownGrabs" "true"
EndSection
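
A minimal sketch of applying the edit above from a script, assuming it is acceptable to append the section only when no "AllowDeactivateGrabs" line is present yet. The function name add_grab_flags is made up for this example; it takes the path of the config file as its argument.

```shell
#!/bin/bash
# Hypothetical helper: append the ServerFlags section shown above to an
# xorg.conf file, unless the option is already there.
add_grab_flags() {
    local conf="$1"
    if grep -q 'AllowDeactivateGrabs' "$conf" 2>/dev/null; then
        echo "already configured: $conf"
        return 0
    fi
    cat >> "$conf" <<'EOF'

Section "ServerFlags"
        Option "AllowDeactivateGrabs" "true"
        Option "AllowClosedownGrabs" "true"
EndSection
EOF
    echo "updated: $conf"
}

# usage (as root): add_grab_flags /etc/X11/xorg.conf
```

If the file already has a "ServerFlags" section, it is probably cleaner to add the two Option lines to that section by hand instead.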

Install NVIDIA drivers

  • yum --enablerepo=elrepo install nvidia-detect
  • run: nvidia-detect
  • as instructed by nvidia-detect, install correct driver:
    • yum --enablerepo=elrepo install kmod-nvidia
    • yum --enablerepo=elrepo install kmod-nvidia-304xx
    • yum --enablerepo=elrepo install kmod-nvidia-173xx
  • (before SL6.x: if it fails due to conflict with module-init-tools, run "yum --disablerepo \* --enablerepo elrepo update module-init-tools")
  • yum erase xorg-x11-glamor ### see http://elrepo.org/tiki/kmod-nvidia (search for glamor)
  • mv /etc/X11/xorg.conf /etc/X11/xorg.conf-xxx
  • nvidia-xconfig
  • (SL6) reboot
  • (SL5) /dev/MAKEDEV nvidia
  • (SL5) restart the X11 server (Ctrl-Alt-Backspace or "killall Xorg gdm-binary")
  • observe that X11 server restarts using the NVIDIA driver (big NVIDIA logo on startup)
  • if needed, login as root and run "nvidia-settings" to setup dual-screen configuration, etc

Install legacy NVIDIA drivers

For old NVIDIA cards:

  • GeForce FX 5500
wget http://us.download.nvidia.com/XFree86/Linux-x86/173.14.31/NVIDIA-Linux-x86-173.14.31-pkg1.run
sh ./NVIDIA-Linux-x86-173.14.31-pkg1.run
  • GeForce 6200 - NVIDIA Corporation NV44A [GeForce 6200]
yum install nvidia-x11-drv-304xx-304.121 --enablerepo=elrepo
nvidia-xconfig
rmmod nvidia
killall gdm-binary
login as root
nvidia-settings to setup multiple displays

Install ATI/AMD drivers

  • yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
  • check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx"
  • run "aticonfig --initial" to create xorg.conf if existing one is not good
  • run "amdcccle" as root to configure dual-screens, etc
 Note: 'amdcccle' is a GUI, so you must run this command from within a running X session
  • killall Xorg

Install ATI/AMD drivers (CentOS7)

NOTE: if both drivers (radeon and fglrx) are loaded, boot will hang. The radeon driver is supposed to be blacklisted through the grub "rdblacklist=radeon" entry, which is installed by running grub2-mkconfig.

Install Intel drivers for HD4600/Z87

SL6.5 has the required drivers for the socket 1150 machines with Intel HD4600 graphics and Z87 chipset.

ASUS Z87 WS motherboard has these video connections with corresponding Intel video port assignments, as reported by "xrandr":

  • DisplayPort - DP1/HDMI1
  • MiniDisplayPort - DP2/HDMI2
  • HDMI - HDMI3

Due to hardware limitations, 3 HDMI monitors using 2 passive DP-HDMI adapters (and 1 straight HDMI) cannot be used.

To use 3 monitors do this:

  • 1st monitor: DisplayPort - DP-to-HDMI-passive-adapter - HDMI monitor (not tried: DP-to-DP-cable - DisplayPort monitor).
  • 2nd monitor: MiniDisplayPort - MiniDP-to-DP-cable - DisplayPort monitor
  • 3rd monitor: HDMI - HDMI-cable - HDMI monitor

With the monitors I have (Dell 1920x1200 VGA-HDMI-DP), the software thinks that there are 4 monitors: somehow both DP2 and HDMI2 see 1 monitor each, but the hardware cannot drive 4 monitors, so everything goes blank. To fix, disable HDMI2 (xrandr -display :0 --output HDMI2 --off) and enable DP2 (xrandr -display :0 --output DP2 --auto).

How to make this configuration permanent and how to assign monitor locations (left-right, etc), you figure it out.
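
One possible starting point, not tested on this hardware: put the two xrandr commands from above into a small script and run it when the X session starts. Where to hook it in (a display manager or desktop autostart script, for example) is left open and depends on the desktop in use. The XRANDR variable and the function name are additions for this sketch, so the commands can be printed instead of executed.

```shell
#!/bin/bash
# Sketch: switch the MiniDisplayPort monitor from HDMI2 to DP2.
# XRANDR is a made-up override for dry runs, e.g. XRANDR=echo.
fix_monitors() {
    local xrandr="${XRANDR:-xrandr}"
    local display="${1:-:0}"
    "$xrandr" -display "$display" --output HDMI2 --off &&
    "$xrandr" -display "$display" --output DP2 --auto
}

# usage: fix_monitors :0
# dry run: XRANDR=echo fix_monitors :0
```

Monitor locations can be set with the xrandr options --left-of and --right-of, for example by adding "--right-of DP1" to the DP2 line; which output names apply should be checked with plain "xrandr" first.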

Manual selection of monitor, video mode and resolution

Automatic selection of monitor and video mode usually works. When it does not, configure it manually:

  • physically go to the computer
  • login as root
  • run "nvidia-settings" on machines using the NVIDIA driver
  • run "aticonfig" on machines with the ATI/AMD driver (use "aticonfig --initial" for initial setup, and good luck with anything more complicated)
  • run "system-config-display".
    • In the "hardware" tab, select monitor type: "generic LCD 1280x1024" or "generic LCD 1600x1200".
    • In the "settings" tab, select "1280x1024" or "1600x1200" and "Thousands of colors".
    • Press "ok", the display settings application should close.
  • Logout, the new login window should use the new settings.

Disable screen saver

If the machine is booted without any monitor connected, current video cards do not enable any video outputs. If a monitor is connected later, there is no video image and there is no easy way to get a video image.

This can be solved by configuring X11 to always enable some video output. Because the monitor type is not known when X11 starts, one has to select some standard video mode (i.e. VESA 1280x1024) on some video output (VGA, DVI or HDMI).

Only NVIDIA cards with the NVIDIA driver (from EPEL) are supported by these instructions.

  • create default xorg.conf: nvidia-xconfig
  • edit /etc/X11/xorg.conf
  • add monitor section for the fake monitor:
Section "Monitor"
    Identifier     "Monitor0"
    VendorName     "Unknown"
    ModelName      "Unknown"
    HorizSync       31.0 - 83.0
    VertRefresh     59.0 - 61.0
    Option         "DPMS" "off"
    ModeLine "1280x1024"   108.00   1280 1328 1440 1688   1024 1025 1028 1066 +hsync +vsync
EndSection
  • add output selection in the "Device" section:
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BoardName      "GeForce 210"
    #Option "ConnectedMonitor" "DFP"
    #Option "ConnectedMonitor" "CRT"
    Option "ConnectedMonitor" "CRT-1"
    Option "UseEDID" "no"
EndSection
  • add fake video mode to the "Screen" section:
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    SubSection     "Display"
        Depth       24
        Modes       "1280x1024"
    EndSubSection
EndSection
  • disable screen saver and DPMS power off in the "ServerLayout" or "ServerFlags" section:
Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0" 0 0
    InputDevice    "Keyboard0" "CoreKeyboard"
    InputDevice    "Mouse0" "CorePointer"
    Option         "Xinerama" "0"
    Option         "BlankTime" "0"
    Option         "StandbyTime" "0"
    Option         "SuspendTime" "0"
    Option         "OffTime" "0"
EndSection

Section "ServerFlags" 
    Option         "BlankTime" "0" 
    Option         "StandbyTime" "0" 
    Option         "SuspendTime" "0" 
    Option         "OffTime" "0" 
EndSection 

Finish installation

  • logout and reboot the computer for all the changes to take effect

Configure HTTPS server (CentOS7)

This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd.

First, configure apache httpd:

  • execute these commands:
yum install -y mod_ssl certwatch crypto-utils
cd /etc/httpd/conf.d/
mv ssl.conf ssl.conf-not-used ### remove the stock ssl.conf which refers to the localhost certificate that will expire in 1 year
touch ssl.conf ### create a blank file to prevent automatic updates from installing a stock ssl.conf file
# this is done later: rm /etc/pki/tls/certs/localhost.crt
  • create new file ssl-daq12.conf # use actual hostname instead of daq12
Listen 443 https
#SSLPassPhraseDialog exec:/usr/libexec/httpd-ssl-pass-dialog
SSLSessionCache         shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  300
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin

<VirtualHost *:443>
ServerName daq12.triumf.ca
DocumentRoot /var/www/html
ErrorLog /var/log/httpd/daq12.log
SSLEngine on
# note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
# new SSL settings: K.O. Jan 2020, SSLlabs rating "A+"
SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4:!RSA
SSLHonorCipherOrder on
# previous SSL settings:
#SSLProtocol all -SSLv2 -SSLv3
#SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
SSLCertificateFile /etc/pki/tls/certs/localhost.crt
SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
#SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt
#ProxyPass /elog/ http://localhost:8082/ retry=1
#ProxyPass /      http://localhost:8080/ retry=1
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
<Location />
SSLRequireSSL
AuthType Basic
AuthName "DAQ password protected site"
Require valid-user
# create password file: touch /etc/httpd/htpasswd
# to add new user or change password: htpasswd /etc/httpd/htpasswd username
AuthUserFile /etc/httpd/htpasswd
</Location>
</VirtualHost>
  • stop httpd from listening on port 80: edit /etc/httpd/conf/httpd.conf, comment-out the line "Listen 80"
  • enable and start httpd:
systemctl enable httpd
systemctl restart httpd
systemctl status httpd
  • try to access https://daq12.triumf.ca
    • you should see a complaint about self-signed certificate
    • you should see a request for password (do not login yet)
    • if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again:
firewall-cmd --add-port=443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all

Second, configure certbot:

(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)

(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)

  • check that port 80 is not used by anything:
  • netstat -an | grep LISTEN | grep ^tcp | grep 80
  • lsof -P | grep -i tcp | grep LISTEN | grep 80
  • if lsof reports that httpd is listening on port 80, follow the httpd instructions above (remove "Listen 80" from httpd.conf)
  • install certbot and open tcp port 80 in the firewall:
yum install -y certbot python2-certbot-apache # (from EPEL)
firewall-cmd --add-port=80/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
  • certbot certonly --standalone --installer apache # then answer questions:
  • "activate HTTPS for daq12.triumf.ca" - say ok
  • "enter email address" - enter your own email address
  • "please read terms..." - read the terms and say "agree"
  • it will take a few moments...
  • "please choose..." - say "easy" (http access is disabled (a) by firewall, (b) by local configuration)
  • "congratulations..." - say ok.
  • certbot install --apache --cert-name daq12.triumf.ca # then answer questions:
  • "choose redirect..." - say "1" (no redirect)
  • look inside ssl-daq12.conf to see that SSLCertificateFile & co point to certbot certificates in /etc/letsencrypt/live/daq12.triumf.ca/
  • remove self-signed localhost certificate, it will expire in 1 year and cause warnings and complaints: rm /etc/pki/tls/certs/localhost.crt
  • enable automatic renewal
systemctl enable certbot-renew.timer
systemctl start certbot-renew.timer
systemctl list-timers --all
  • to check correct renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
certbot renew --standalone --installer apache --force-renewal
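
The "grep 80" checks near the top of this list also match unrelated ports such as 8080 or 1080. A stricter filter is sketched here, under the assumption that the listing has the usual "netstat -an" column layout (local address in the fourth column, state in the last). The function name is made up for this example.

```shell
#!/bin/bash
# Sketch: read "netstat -an" output on stdin and print only TCP sockets
# that listen on exactly the given port (default 80).
listening_on_port() {
    local port="${1:-80}"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            n = split($4, parts, ":")      # handles 0.0.0.0:80 and :::80
            if (parts[n] == port) print
        }'
}

# usage: netstat -an | listening_on_port 80
```

No output means nothing is listening on that port, which is what certbot in standalone mode needs.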

NOTE: this certificate will expire in 3 months, automatic renewal should work starting with certbot-0.12.0-4.el7.noarch. Certificate expiration should be automatically detected by "certwatch" and email will be sent to local root user, to be forwarded to an actual person by ~root/.forward.
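
To check the expiry date by hand rather than waiting for the certwatch email, openssl can read it from the certificate file. This is a sketch that assumes the certbot directory mentioned above (/etc/letsencrypt/live/daq12.triumf.ca/); the file name cert.pem inside that directory and the function name are assumptions of this example.

```shell
#!/bin/bash
# Sketch: succeed (exit 0) if the certificate expires within N days.
cert_expires_within() {
    local cert="$1" days="${2:-30}"
    # "openssl x509 -checkend S" exits 0 if the certificate is still
    # valid S seconds from now, so the result is inverted here.
    ! openssl x509 -noout -checkend "$((days * 86400))" -in "$cert" >/dev/null
}

# usage:
#   openssl x509 -noout -enddate -in /etc/letsencrypt/live/daq12.triumf.ca/cert.pem
#   cert_expires_within /etc/letsencrypt/live/daq12.triumf.ca/cert.pem 30 && echo "renew soon"
```

Note that a missing or unreadable file also counts as "expires", which seems the safer default for a warning script.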

Third, activate password protection:

  • as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/httpd/htpasswd
htpasswd /etc/httpd/htpasswd midas

Final test:

From here:

  • Configure selinux to allow proxying
 setsebool -P httpd_can_network_connect 1
 systemctl restart httpd
  • enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
  • enable proxy for ELOG - ditto

NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0

Configure large RAID6 arrays

  • connect the disks
  • check the disks health
    • run smart-status.perl
  • partition the disks
    • yum install gdisk
    • gdisk /dev/sdX
    • delete all partitions: o
    • create new partition: n, enter, enter, enter, fd00 (default sizes, partition type fd00)
    • write and exit: w
  • check presence of all partitions:
    • /bin/ls -l /dev/sd*1
  • prepare to use an external bitmap file
    • touch /md6bitmap
    • edit /etc/fstab, change entry for root filesystem from: "defaults 1 1" to "defaults 0 0"
    • edit /boot/grub/grub.conf, change entry "kernel ... ro ..." to "kernel ... rw ..."
  • create raid array:
    • mdadm --create /dev/md6 --level=6 --bitmap=/md6bitmap --raid-devices=10 /dev/sd[b-k]1
    • mdadm -Ds >> /etc/mdadm.conf
    • cleanup /etc/mdadm.conf
    • echo "echo 16384 > /sys/block/md6/md/stripe_cache_size" >> /etc/rc.local
    • echo "echo 1 > /sys/block/md6/md/sync_speed_min" >> /etc/rc.local
    • source /etc/rc.local
  • observe raid array rebuild:
    • watch -d -n1 "cat /proc/mdstat"
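
For unattended checks of the rebuild, the same /proc/mdstat text can be condensed by a script. This is a sketch only: it assumes the usual mdstat layout, where the member status appears as a bracketed string such as [UUUUUUUUUU] and a missing member shows as "_". The function name is made up.

```shell
#!/bin/bash
# Sketch: summarize /proc/mdstat style text read from stdin.
# Prints "<array> degraded" when the [UU_U] status shows a missing
# member, "<array> ok" otherwise, plus any resync/recovery progress line.
md_summary() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        name != "" && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            print name, (index(status, "_") ? "degraded" : "ok"), status
        }
        name != "" && /(resync|recovery|reshape|check) *=/ {
            sub(/^ +/, ""); print name, $0
        }'
}

# usage: md_summary < /proc/mdstat
```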

Configure ZFS

Install ZFS

(from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)

Follow the instructions for "kABI-tracking kmod" - dkms modules seem to always mess up the system when upgrading to next release of zfs.

#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_9.noarch.rpm
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
yum --enablerepo=zfs-kmod clean all
yum --enablerepo=zfs-kmod install zfs
#sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
#systemctl enable zfs-import-cache
#systemctl enable zfs-mount
#systemctl enable zfs-share
#systemctl enable zfs-zed
#shutdown -r now # required to load the zfs kernel modules and to disable selinux
modprobe zfs # should work
zpool status # should report no pools available
  1. Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (files are gone, but "df" does not go down, zfs-0.6.5.7-1.el7.centos.x86_64), see https://github.com/zfsonlinux/zfs/issues/4845

If ZFS kernel module does not load automatically at boot time, add this to load it manually:

ls -l /etc/sysconfig/modules/
cat > /etc/sysconfig/modules/zfs.modules <<EOF
if [ ! -e /sys/module/zfs ] ; then
  modprobe zfs;
fi
EOF
chmod +x /etc/sysconfig/modules/zfs.modules

Update ZFS (CentOS-7.9)

  • update CentOS-7.x to latest point release
  • reboot to latest kernel
  • check that currently installed ZFS is 0.8.x (not 0.7 or older)
  • then update ZFS:
[root@daq16 ~]# zfs version
zfs-0.8.4-1
zfs-kmod-0.8.4-1
[root@daq16 ~]# yum --enablerepo=zfs-kmod update
...
[root@daq16 ~]# zfs version ### observe mismatched version numbers: 0.8.5 userspace vs 0.8.4 kernel module
zfs-0.8.5-1
zfs-kmod-0.8.4-1
  • reboot to activate the updated kernel module
  • zfs version again
[root@daq16 ~]# zpool version
zfs-0.8.5-1
zfs-kmod-0.8.5-1
  • zpool status in case some ZFS volume needs to be updated
[root@daq16 ~]# zpool status
  pool: z12tb
 state: ONLINE
...
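
The version comparison in the steps above can be scripted. This is a sketch that assumes the two-line "zfs version" output format shown in the transcript (zfs-X and zfs-kmod-X); the function name is made up.

```shell
#!/bin/bash
# Sketch: read "zfs version" output on stdin and report whether the
# userspace tools and the loaded kernel module have the same version.
zfs_versions_match() {
    local user="" kmod="" line
    while read -r line; do
        case "$line" in
            zfs-kmod-*) kmod="${line#zfs-kmod-}" ;;
            zfs-*)      user="${line#zfs-}" ;;
        esac
    done
    if [ -n "$user" ] && [ "$user" = "$kmod" ]; then
        echo "match: $user"
    else
        echo "mismatch: userspace $user, kernel module $kmod (reboot needed?)"
        return 1
    fi
}

# usage: zfs version | zfs_versions_match
```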

Update ZFS 0.7 to 0.8

How to identify zfs 0.7: "zfs version" does not work; check the installed package with "rpm -q zfs".

zfs 0.7 is obsolete.

To update to zfs 0.8 or newer, remove 0.7, then install the new version per instructions above.

  • remove zfs 0.7
yum versionlock delete zfs ### versionlock not needed anymore
yum versionlock delete kernel ### versionlock not needed anymore
rm /etc/yum.repos.d/zfs.repo* ### delete old repo files
yum erase zfs spl
  • reboot
  • install new zfs per instructions above
  • zpool import -as
  • zpool status ### check if any pool needs to be upgraded
  • zpool upgrade zssd ### upgrade zfs pool features

Lock kernel and zfs packages

!!! THIS IS NOT NEEDED ANYMORE !!!

yum versionlock kernel
yum versionlock zfs
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod

Follow generic ZFS instructions

Here: ZFS

performance notes

Go here: disk_benchmarks

Configure UEFI boot

Some mobo can boot from NVME (PCIe) SSDs only via UEFI boot. Do this:

  • partition the NVME SSD using gdisk (must be GPT partition table, must have MSDOS EFI partition size 512MiB)
[root@alpha00 ~]# gdisk -l /dev/nvme0n1
GPT fdisk (gdisk) version 0.8.6 ...
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1A82CC87-2757-44ED-980F-C78E3681D9D3
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500118158
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         1050623   512.0 MiB   EF00  EFI System
   2         1050624       500118158   238.0 GiB   8300  Linux filesystem
[root@alpha00 ~]# 
  • create filesystems
mkfs.msdos /dev/nvme0n1p1
mkfs.xfs /dev/nvme0n1p2
  • prepare EFI partition
mkdir /mnt/efi
mount /dev/nvme0n1p1 /mnt/efi
mkdir -p /mnt/efi/efi/boot
cd /mnt/efi/efi/boot
# with Ubuntu LTS 20.04
cp /boot/vmlinuz vmlinuz # copy the desired linux kernel
#cp /boot/initramfs initramfs.img # copy the matching initramfs file
cp /boot/initrd.img initrd.img # copy the matching initrd file
#from /home/olchansk/sysadm/syslinux/syslinux-6.03 copy
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/efi/syslinux.efi .
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
  • create syslinux config file: syslinux.cfg
default linux
label linux
kernel vmlinuz
append ro root=/dev/nvme0n1p2 nomodeset initrd=initrd.img
  • prepare system partition
mkdir /mnt/tmp
mount /dev/nvme0n1p2 /mnt/tmp
rsync -avx / /mnt/tmp
cd /mnt/tmp
#edit etc/fstab
#edit etc/sysconfig/selinux # set selinux to permissive mode because rsync did not copy the selinux labels
  • unmount and reboot
  • restore selinux labels after first boot
#login as root
cd /
restorecon -R / # can also add "-v" to see progress, but runs much slower
#edit /etc/sysconfig/selinux # enable selinux
#shutdown -r now # reboot with selinux enabled

Configure UEFI secure boot

The above instructions do not quite work if "secure boot" is enabled.

These modifications are needed:

  • ls -l /boot/efi/EFI/bootko/
total 140116
-rwxr-xr-x 1 root root      108 Feb 24 15:47 BOOTX64.CSV
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 bootx64.efi
-rwxr-xr-x 1 root root   217495 Feb 24 16:16 config-4.15.0-74-generic
-rwxr-xr-x 1 root root      105 Feb 24 15:47 grub.cfg
-rwxr-xr-x 1 root root   199952 Feb 24 16:16 grubx64.efi
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initramfs.img
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initrd.img-4.15.0-74-generic
-rwxr-xr-x 1 root root   139968 Feb 24 16:16 ldlinux.e64
-rwxr-xr-x 1 root root  1269496 Feb 24 15:47 mmx64.efi
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 shimx64.efi
-rwxr-xr-x 1 root root      171 Feb 24 16:16 syslinux.cfg
-rwxr-xr-x 1 root root      102 Feb 24 16:16 syslinux.cfg~
-rwxr-xr-x 1 root root   199952 Feb 24 16:16 syslinux.efi
-rwxr-xr-x 1 root root  4068355 Feb 24 16:16 System.map-4.15.0-74-generic
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz-4.15.0-74-generic
    • shimx64.efi is a copy from /boot/efi/EFI/ubuntu
    • bootx64.efi is a copy of shimx64.efi (maybe not needed?)
    • grubx64.efi is a copy of syslinux.efi
  • efibootmgr -c -d /dev/nvme0n1 -p 2 -w -L bootko -l '\EFI\bootko\shimx64.efi'
  • efibootmgr -v
root@daqubuntu:~# efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0001,0002
Boot0000* bootko        HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\BOOTKO\SHIMX64.EFI)
Boot0001* Hard Drive    BBS(HD,,0x0)..GO..NO........y.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7....................A.......................................<..Gd-.;.A..MQ..L.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7........BO
Boot0002* ubuntu        HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO
root@daqubuntu:~# 
  • NOTE: if, after running "efibootmgr -c", the UUID is zero, then it probably did not take and the entry will vanish after reboot. In my case the mistake was to use "-p 1" instead of "-p 2".
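
The zero-UUID symptom from the note above can be checked from a script right after "efibootmgr -c". This is a sketch; it assumes the "efibootmgr -v" line format shown in the transcript and that a failed entry carries an all-zero UUID in its HD(...) field. The function name is made up.

```shell
#!/bin/bash
# Sketch: read "efibootmgr -v" output on stdin and check that the boot
# entry with the given label has a non-zero partition UUID.
check_boot_entry() {
    local label="$1" line uuid
    line=$(grep -E "^Boot[0-9A-Fa-f]{4}\*? +${label}[[:space:]]" | head -n 1)
    if [ -z "$line" ]; then
        echo "no boot entry labelled $label"
        return 1
    fi
    uuid=$(printf '%s\n' "$line" |
           sed -n 's/.*HD([0-9]*,GPT,\([0-9A-Fa-f-]*\),.*/\1/p')
    # an empty UUID, or one made only of zeros and dashes, is a failure
    if [ -z "$uuid" ] || [ -z "$(printf '%s' "$uuid" | sed 's/[0-]//g')" ]; then
        echo "entry $label has no valid UUID, it may vanish after reboot"
        return 1
    fi
    echo "entry $label ok, UUID $uuid"
}

# usage: efibootmgr -v | check_boot_entry bootko
```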

Boot sequence is this:

  • shimx64.efi - Microsoft-signed boot loader is accepted by secure boot, loads and runs
  • shimx64.efi loads and runs grubx64.efi, this file name is hardwired into the signed shim, cannot be changed
  • grubx64.efi is syslinux.efi (could be anything)
  • syslinux.efi runs, loads syslinux.cfg, loads the linux kernel, loads the initrd, runs the linux kernel with specified flags (ro root=...).

UEFI syslinux kernel update

To update the linux kernel booted by UEFI syslinux, use this script:

  • ~root/git/scripts/etc/update_efi.perl

Update SL6 ssh

WARNING!!!
WARNING!!! original instructions used openssh 9.1, vulnerable to CVE-2024-6387
WARNING!!!
WARNING!!! these updated instructions use OpenSSH_9.8. K.O. 3jul2024
WARNING!!!
WARNING!!! see https://www.openssh.com/releasenotes.html
WARNING!!!

Stock SL6 ssh is now very old and, by default, cannot connect to current Ubuntu and MacOS sshd. Conversely, their ssh cannot connect to SL6 sshd.

Workaround is to manually enable SL6-compatible settings

root@daq00:~# ssh -oHostKeyAlgorithms=+ssh-rsa -oPubKeyAcceptedAlgorithms=+ssh-rsa ladd00
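
To avoid typing these options every time, the same two settings can go into the ssh client config file as a per-host block. This is a sketch; the function name is made up, and the option name PubkeyAcceptedAlgorithms assumes a recent OpenSSH client (the same one that accepts the command line above).

```shell
#!/bin/bash
# Sketch: add a per-host block with the SL6-compatible settings to an
# ssh client config file, once. Host name and file path are arguments.
add_sl6_host() {
    local host="$1" conf="${2:-$HOME/.ssh/config}"
    if grep -qx "Host $host" "$conf" 2>/dev/null; then
        echo "Host $host already present in $conf"
        return 0
    fi
    cat >> "$conf" <<EOF

Host $host
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa
EOF
}

# usage: add_sl6_host ladd00
```

After that, a plain "ssh ladd00" should use the compatible settings for that host only.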

Solution is to install newer ssh on affected SL6 machines:

Install OpenSSH_9.8p1 per CVE-2024-6387

ssh root@sl6-machine
cd /opt
git clone https://daq00.triumf.ca/~olchansk/git/openssh.git
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
/bin/cp -pv /etc/ssh/*key* /opt/openssh/etc/ ### copy old ssh host keys
/opt/openssh/bin/ssh-keygen -A ### generate any missing ssh host keys
# test sshd: /opt/openssh/sbin/sshd -p 2222 -d
/bin/mv /usr/sbin/sshd /usr/sbin/sshd-SL6
/bin/ln -s /opt/openssh/sbin/sshd /usr/sbin/
/bin/mv /usr/bin/ssh /usr/bin/ssh-SL6
/bin/ln -s /opt/openssh/bin/ssh /usr/bin/
service sshd restart

Update openssh from 9.1 to OpenSSH_9.8p1 per CVE-2024-6387

Check for old version:

[root@muon openssh]# telnet localhost 22
SSH-2.0-OpenSSH_9.1

Update:

cd /opt/openssh
git pull
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
service sshd restart

Check for new version:

telnet localhost 22
SSH-2.0-OpenSSH_9.8
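
When many machines have to be checked, the banner comparison can be scripted. The sketch below only compares the version in the banner against 9.8, as in the manual check above; it is not a vulnerability test. The function name is made up.

```shell
#!/bin/bash
# Sketch: decide from an SSH banner line whether sshd is older than 9.8.
# Takes the banner as its argument, e.g. "SSH-2.0-OpenSSH_9.1".
# Returns 0 if an update is needed, 1 if not, 2 if the banner is unknown.
sshd_needs_update() {
    local banner="$1" ver major minor
    ver="${banner#*OpenSSH_}"
    if [ "$ver" = "$banner" ]; then
        echo "not an OpenSSH banner: $banner"
        return 2
    fi
    major="${ver%%.*}"
    minor="${ver#*.}"
    minor="${minor%%[!0-9]*}"     # "8p1" -> "8"
    if [ "$major" -lt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -lt 8 ]; }; then
        echo "OpenSSH $major.$minor is older than 9.8, update"
        return 0
    fi
    echo "OpenSSH $major.$minor is 9.8 or newer"
    return 1
}

# usage with bash, without telnet:
#   read -r -t 5 banner < /dev/tcp/localhost/22
#   sshd_needs_update "$banner"
```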

Build openssh

ssh sl6-machine
cd git
git clone git://anongit.mindrot.org/openssh.git
cd openssh
autoreconf
xemacs -nw ./configure ### fix syntax error: line 28124 empty "if/then/else" block bombs out, fill it with "AAA=aaa"
./configure --prefix=/opt/openssh
make -j

Install openssh:

ssh root@sl6-machine
cd .../git/openssh
make install ### copies stuff to /opt/openssh
/opt/openssh/sbin/sshd -p 2222 -d ### test sshd
/opt/openssh/bin/ssh -v sl6-machine ### test ssh

Update for CVE-2024-6387:

  • cd .../git/openssh
  • git pull
  • git checkout V_9_8_P1
  • ./configure --prefix=/opt/openssh --with-ssl-dir=/opt/openssl
  • make ### no go, wants openssl-1.1.1
  • cd .../git/
  • git clone https://github.com/openssl/openssl.git
  • cd openssl
  • git checkout OpenSSL_1_1_1w
  • configure with prefix --prefix=/opt/openssl
  • make, install to /opt/openssl
  • cd .../openssh
  • configure, build, does not find openssl libraries in /opt (they forgot to set RPATH for user-specified location of openssl)
  • LD_LIBRARY_PATH=/opt/openssl/lib, try again, now builds and installs
  • but sshd does not run, does not find libcrypto.so.1.1
  • needs ln -s .../lib/libcrypto.so.1.1 /usr/lib64, now sshd finds it, everything works.