SLinstall: Difference between revisions
(478 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
== Notes == | == Notes == | ||
* these instructions are periodically updated to include items needed for older/newer versions of Linux. They are marked like this: (SL4.2+) means Scientific Linux 4.2 and newer; (SL4 is equivalent to FC3). (FC5 only) means Fedora Core 5; etc. | |||
* obsolete items are marked by the "#" sign at the beginning of the line and sometimes have a comment about the reason for removal. | |||
* typically, we do not "upgrade" machines using the Red Hat "upgrade" function. Instead, we save critical files from the old installation and do a "fresh install" from scratch | |||
* starting with RHEL7, the recommended OS is CentOS7 (instead of SL7). | |||
== Disk configurations == | |||
The year is 2019 and SSDs are used exclusively, except for bulk data storage, which uses 6, 8, 10 or 12 TB HDDs. | |||
For reliability, home directories and data disks must use redundant storage - mdadm raid1 or ZFS raid1/raid6. | |||
For non-critical machines, a single SSD seems to be reliable enough to use as a boot and OS disk. But since any | |||
storage device can fail at any time without warning, home directories and data disks should use redundant storage. | |||
Note: for data disks bigger than 4-6TB, mdadm raid1/raid6 is no longer recommended because raid rebuild, | |||
verification and repair time has become unreasonably long. Instead, use ZFS raid1/raid6 which implements online verification, | |||
repair and disk replacement without requiring machine shutdown or OS down time. | |||
* single SSD - 120GB min - single partition for "/", no swap partition (create a swap file if swap is needed) - for non-critical machine with no local data storage (OS only) | |||
* dual SSD - 2x240GB min - all partitions mirrored (RAID1), 30GB "/", rest for /home1 - for daq station with local user home directories and no bulk data storage | |||
* single SSD + 2x6-8-10-12TB HDD - SSD partition: all "/", HDD partition as ZFS raid1 (mirrored) - for daq station with small local bulk data storage | |||
* single SSD + 6-8x6-8-10-12TB HDD - for small storage server machines - for daq station with local home directories and large bulk data storage. | |||
For VME processors: | |||
* network boot - [[VME-CPU#Network_boot]] - only option for V7648/V7750, do not use for V7805 (no netboot from GigE), optional for V7865/XVB-602 | |||
* USB boot - 8GB USB for V7805, 16GB USB for V7865/XVB-602 | |||
== Preparation == | == Preparation == | ||
Line 18: | Line 36: | ||
* shutdown | * shutdown | ||
== Running | == Running installer (CentOS7) == | ||
CentOS7 can be installed from vanilla CentOS7 installation media or from | |||
a custom USB key built per these instructions: | |||
https://daqshare.triumf.ca/~olchansk/linux/CentOS7/ | |||
The custom installer makes it easy to use a custom kickstart file (ks.cfg). | |||
* | |||
* ( | Instructions for using the usb-installer: | ||
* | |||
* | * disconnect machine from network | ||
* | * plug the usb-installer into a usb3 port (blue colour) | ||
* | * reboot machine, select booting from usb (press F8 on ASUS motherboards) | ||
** | * usb-installer boot menu offers to install CentOS7, go there | ||
** | * CentOS7 should boot (many messages scroll on screen) | ||
** | * into graphical mode | ||
** | * into installer main menu | ||
*** " | * all installer options should be "happy" except for the "installation destination" | ||
*** | * go to the "installation destination" menu | ||
*** " | ** unselect all disks except for the SSD where the OS will be installed | ||
** | ** (MOST IMPORTANT: unselect the USB installer disk!) | ||
* | ** select "I will configure..." | ||
* | ** say "done" | ||
* | ** the "manual partitioning" menu will open | ||
* | *** use the "-" button to delete all existing partitions | ||
* | *** select "standard partition" | ||
** | *** click on the "+" button | ||
* | *** in the "Add new partition" dialog, set mount point "/", capacity blank, click "add mount point" | ||
* | *** check capacity (should be full size of SSD), check filesystem type (should be XFS) | ||
* | *** say "done", there will be a warning about absent swap partition, say "done" again. | ||
* | *** in the big useless dialog, say "accept changes" | ||
*** should be back to the "installation summary" screen, "installation destination" should be happy now | |||
* | * after everything is happy, say "begin installation" | ||
* as the installation proceeds, set the password for the root user | |||
* after installation is complete, reboot the machine | |||
* unplug the usb-installer, CentOS7 should boot from SSD into the login screen | |||
* click on "not listed?", login as root | |||
* setup network connection: | |||
** open a terminal | |||
** start "nm-connection-editor" | |||
** click on "+" to create a new connection profile | |||
** select "wired ethernet" | |||
** select "add profile..." | |||
** in "Identity", set "name" to "static" | |||
** in "Identity", check that "Connect automatically" and "Make available..." are enabled | |||
** in "IPv4", set "Addresses" to "manual" instead of "dhcp" | |||
** enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19, search triumf.ca | |||
** say "Add", then close/quit the network settings | |||
* connect network cable | |||
* network should be up, ping ladd00 should work | |||
* run: yum update -y | |||
* check new kernel is installed: ls -l /boot | |||
* logout and restart (good luck finding these buttons in the gui!) | |||
* confirm correct linux kernel is selected during boot (-229.20, not the original installer kernel) | |||
* login as root, confirm network is up, proceed with the rest of these instructions | |||
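The static profile above can also be described on the command line. The sketch below converts the dotted netmask (255.255.224.0) to the prefix length that nmcli expects and only prints the resulting command, it does not run it. The interface name and the example address are hypothetical placeholders. The nmcli property names (ipv4.method, ipv4.addresses, etc.) are an assumption about the installed NetworkManager version, and older releases may expect different keywords.

```shell
# Convert a dotted-quad netmask to a CIDR prefix length.
# Note: octets are checked one by one; contiguity of the mask is not verified.
mask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the netmask on dots
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0) ;;
            *) echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Print (do not execute) an nmcli command for the "static" profile.
# $1 = IP address, $2 = netmask, $3 = interface name (hypothetical, e.g. eth0)
static_profile_cmd() {
    p=$(mask_to_prefix "$2") || return 1
    echo "nmcli connection add type ethernet con-name static ifname $3" \
         "ipv4.method manual ipv4.addresses $1/$p" \
         "ipv4.gateway 142.90.100.18 ipv4.dns 142.90.100.19 ipv4.dns-search triumf.ca"
}
```

Review the printed command before running it as root; the graphical nm-connection-editor procedure above remains the tested path.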
== Configure SSH == | == Configure SSH == | ||
Line 64: | Line 108: | ||
</pre> | </pre> | ||
== | == Set hostname == | ||
Set hostname: (use full name, i.e. daq11.triumf.ca) | |||
<pre> | <pre> | ||
emacs -nw /etc/hostname | |||
</pre> | </pre> | ||
== Configure email == | |||
* TRIUMF: use relayhost = smtp.triumf.ca | |||
* CERN: use relayhost = cernmx.cern.ch | |||
* edit /etc/postfix/main.cf, set "relayhost = smtp.triumf.ca" | |||
* echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca bsmith@triumf.ca" >> ~root/.forward | |||
* | |||
== Make log files readable == | |||
<pre> | <pre> | ||
chmod a+r /var/log/messages | |||
chmod a+r /var/log/yum.log | |||
</pre> | </pre> | ||
== Activate /etc/rc.local == | |||
Activate rc.local: | |||
<pre> | <pre> | ||
chmod a+x /etc/rc.local | |||
chmod a+x /etc/rc.d/rc.local # TL edit | |||
systemctl enable rc-local | |||
systemctl start rc-local | |||
systemctl status rc-local | |||
</pre> | </pre> | ||
Line 260: | Line 148: | ||
#shutdown -r now | #shutdown -r now | ||
</pre> | </pre> | ||
== Configure NIS client (CentOS7) == | == Configure NIS client (CentOS7) == | ||
<pre> | <pre> | ||
yum -y install ypbind | yum -y install ypbind authconfig | ||
echo "NISTIMEOUT=5" >> /etc/sysconfig/network | echo "NISTIMEOUT=5" >> /etc/sysconfig/network | ||
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network | echo "NETWORKWAIT=yes" >> /etc/sysconfig/network | ||
Line 299: | Line 158: | ||
ypwhich | ypwhich | ||
ypcat -k passwd | ypcat -k passwd | ||
systemctl restart autofs | |||
</pre> | </pre> | ||
* On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make) | * On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make) | ||
Line 307: | Line 167: | ||
</pre> | </pre> | ||
== Configure NIS | == Configure NIS client (CentOS8) == | ||
* all the same as for CentOS7 | |||
* ensure correct boot order for ypbind (in CentOS 8.1 ypbind is started before network is ready, service file uses "Wants" instead of "After") | |||
( | <pre> | ||
mkdir /etc/systemd/system/ypbind.service.d | |||
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ypbind.service.d/local.conf | |||
systemctl daemon-reload | |||
systemctl cat ypbind.service | |||
</pre> | |||
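The drop-in above fails on a second run because mkdir complains about the existing directory. A minimal re-runnable sketch follows, assuming the same file name (local.conf); the optional directory argument is added here only so the function can be tried outside /etc.

```shell
# Write the ypbind ordering drop-in; safe to run more than once.
# $1 = systemd unit directory (optional, defaults to /etc/systemd/system)
write_ypbind_dropin() {
    unit_dir=${1:-/etc/systemd/system}/ypbind.service.d
    mkdir -p "$unit_dir" || return 1
    printf '[Unit]\nAfter=network-online.target\n' > "$unit_dir/local.conf"
}

# On the real machine (as root):
#   write_ypbind_dropin && systemctl daemon-reload && systemctl cat ypbind.service
```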
== Configure NIS secondary server (CentOS7 | == Configure NIS secondary server (CentOS7) == | ||
Enable local NIS server, make local machine use it: | Enable local NIS server, make local machine use it: | ||
<pre> | <pre> | ||
yum install ypserv | yum -y install ypserv | ||
/usr/lib64/yp/ypinit -s ladd00 (/usr/lib/yp/ypinit on 32-bit machines) | /usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines) | ||
systemctl enable rpcbind ypserv | ### ypinit will give lots of errors about "rpc.ypxfrd failed: RPC: Can't decode result"; can be ignored | ||
systemctl start rpcbind ypserv | systemctl disable ypxfrd yppasswdd | ||
systemctl stop ypxfrd yppasswdd | |||
systemctl enable rpcbind ypserv | |||
systemctl start rpcbind ypserv | |||
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost" | |||
systemctl restart ypbind | systemctl restart ypbind | ||
ypwhich # should say "localhost" | ypwhich # should say "localhost" | ||
ypcat -k auto.master # should work | |||
</pre> | </pre> | ||
Line 341: | Line 201: | ||
echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network | echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network | ||
systemctl restart ypserv | systemctl restart ypserv | ||
firewall-cmd --get-services | |||
firewall-cmd --add-service rpc-bind --permanent | firewall-cmd --add-service rpc-bind --permanent | ||
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent | firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent | ||
firewall-cmd --reload | firewall-cmd --reload | ||
firewall-cmd --list-all | |||
</pre> | </pre> | ||
* on the NIS master: | * on the NIS master: | ||
** add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers" | ** add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers" | ||
*** TL (2020-09): are we not doing this anymore? I guess it doesn't work anyway... | |||
** if using /var/yp/securenets, copy it from NIS master to new NIS secondary server | ** if using /var/yp/securenets, copy it from NIS master to new NIS secondary server | ||
Enable hourly NIS update cron job (DO THIS AFTER git pull scripts, see below) | |||
<pre> | |||
cd ~/git/scripts | |||
git pull | |||
cd etc | |||
cd ~/git/scripts/etc; ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly | |||
</pre> | |||
== Configure AUTOFS (CentOS7) == | == Configure AUTOFS (CentOS7) == | ||
Line 365: | Line 231: | ||
</pre> | </pre> | ||
== Label Selinux labels == | |||
When upgrading non-selinux machines (el6) to el7 (selinux enforcing) the existing | |||
user home directories will not have the correct selinux labels and many things | |||
will not work, including ssh logins (sshd cannot access ~user/.ssh files). | |||
<pre> | <pre> | ||
semanage fcontext -a -e /home /home1 ### selinux has special rules for /home, assign them to /home1 | |||
restorecon -R -v /home1 ### apply the new rules to files in /home1 | |||
ls -Zd /home1/alpha/.ssh | |||
# should say: drwx------. alpha users system_u:object_r:ssh_home_t:s0 /home1/alpha/.ssh | |||
</pre> | </pre> | ||
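To check more than one user after restorecon, the "ls -Zd" test above can be looped over all home directories. This is a sketch only: the function names are made up for illustration, and it assumes the expected type is ssh_home_t as in the sample output above.

```shell
# Reads "ls -Zd" output on stdin; succeeds if the SELinux type is ssh_home_t.
has_ssh_home_label() {
    grep -q ':ssh_home_t:'
}

# Print every ~user/.ssh directory under $1 that lacks the ssh_home_t label.
# $1 = home directory root, e.g. /home1
report_unlabeled_ssh_dirs() {
    for d in "$1"/*/.ssh; do
        [ -d "$d" ] || continue
        ls -Zd "$d" | has_ssh_home_label || echo "wrong label: $d"
    done
}

# Example (as root, on an SELinux machine):
#   report_unlabeled_ssh_dirs /home1
```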
== Configure time (CentOS7) == | == Configure time (CentOS7) == | ||
Line 416: | Line 263: | ||
* if desired, edit /etc/chrony.conf, remove non-triumf time servers | * if desired, edit /etc/chrony.conf, remove non-triumf time servers | ||
== | == Enable automatic system updates (CentOS7) == | ||
Disable yum-cron: | |||
<pre> | |||
rpm --erase yum-cron | |||
/bin/rm -v /var/lock/subsys/yum-cron | |||
/bin/rm -v /etc/cron.daily/0yum-daily.cron | |||
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron | |||
</pre> | |||
Enable yum-autoupdate: | |||
<pre> | |||
yum install -y epel-release | |||
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock | |||
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm | |||
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm | |||
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm | |||
systemctl enable yum-autoupdate | |||
systemctl start yum-autoupdate | |||
systemctl status yum-autoupdate | |||
</pre> | |||
== Disable automatic system updates (CentOS7) == | |||
<pre> | <pre> | ||
yum -y erase yum-autoupdate | |||
/bin/rm -f /etc/sysconfig/yum-autoupdate.rpmsave | |||
/bin/rm -f /var/lock/subsys/yum-autoupdate | |||
</pre> | |||
== Enable automatic system updates (CentOS8) == | |||
<pre> | |||
yum -y install dnf-automatic | |||
systemctl enable --now dnf-automatic.timer | |||
systemctl list-timers '*dnf-*' | |||
</pre> | |||
edit /etc/dnf/automatic.conf | |||
<pre> | |||
apply_updates = yes | |||
</pre> | </pre> | ||
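The edit to /etc/dnf/automatic.conf can be scripted instead of done by hand. A minimal sketch, assuming the file already contains an "apply_updates = ..." line (if it does not, nothing is changed):

```shell
# Set "apply_updates = yes" in the dnf-automatic config, keeping all other lines.
# $1 = config file (optional, defaults to /etc/dnf/automatic.conf)
enable_apply_updates() {
    conf=${1:-/etc/dnf/automatic.conf}
    tmp=$(mktemp) || return 1
    # rewrite through a temp file, then copy back so the original
    # file keeps its owner and permissions
    sed 's/^apply_updates[[:space:]]*=.*/apply_updates = yes/' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root):
#   enable_apply_updates && grep apply_updates /etc/dnf/automatic.conf
```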
Line 506: | Line 325: | ||
systemctl disable multipathd | systemctl disable multipathd | ||
systemctl disable netcf-transaction | systemctl disable netcf-transaction | ||
systemctl disable lvm2-lvmetad.socket | |||
systemctl disable lvm2-lvmpolld.socket | |||
systemctl disable iscsid.socket | |||
systemctl disable iscsiuio.socket | |||
systemctl disable ksm | |||
systemctl disable ksmtuned | |||
#systemctl disable | #systemctl disable | ||
</pre> | </pre> | ||
== Erase unwanted packages == | == Erase unwanted packages (CentOS7) == | ||
* PackageKit # bugs users about security updates, hogs yum lock | |||
* perl-homedir # creates unwanted $HOME/perl5 | |||
* ModemManager # thinks that all USB-attached devices are modems | |||
* pcp # sends error email to itself, does not work | |||
* abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted | |||
* rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken | |||
* bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful. | |||
<pre> | <pre> | ||
yum erase PackageKit | yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion | ||
</pre> | </pre> | ||
== | == Disable unwanted package "tracker" == | ||
The "tracker" package is part of the GNOME desktop, it scans the content of all files | |||
into a database for quick searching. | |||
When it malfunctions, bad things happen, i.e. read through | |||
https://bugzilla.redhat.com/show_bug.cgi?id=747689 | |||
The specific problem I see is that it floods the system log with error messages. It also | |||
consumes network and filesystem bandwidth for NFS mounted home directories. | |||
This package cannot be removed by "yum erase tracker" due to dependencies | |||
from core GNOME desktop. | |||
yum erase | |||
Instead, do this to deactivate it: | |||
<pre> | <pre> | ||
chmod -x /usr/libexec/tracker-* | |||
chmod -x /usr/bin/tracker | |||
chattr +i /usr/bin/tracker | |||
chattr +i /usr/libexec/tracker-* | |||
</pre> | </pre> | ||
Line 534: | Line 376: | ||
<pre> | <pre> | ||
yum install epel-release | yum install epel-release | ||
</pre> | |||
ELREPO: (kernel modules and drivers) (CentOS8) | |||
<pre> | |||
yum install elrepo-release | |||
</pre> | </pre> | ||
Line 552: | Line 399: | ||
<pre> | <pre> | ||
yum install ed patch wget git libotf gdisk emacs | yum install ed patch wget git libotf gdisk emacs perl | ||
</pre> | </pre> | ||
Line 588: | Line 414: | ||
</pre> | </pre> | ||
Go back to the NIS slave server and install the hourly NIS update cron job. | |||
== Enable yum version lock == | |||
<pre> | |||
yum install yum-plugin-versionlock | |||
#yum versionlock packagename # yum versionlock rpcbind | |||
#yum versionlock list # list locked packages | |||
#yum versionlock delete packagename # unlock given package | |||
#yum versionlock clear # delete all locks | |||
</pre> | |||
== Configure trusted ssh keys == | == Configure trusted ssh keys == | ||
Line 608: | Line 438: | ||
== Configure hardware sensors == | == Configure hardware sensors == | ||
* yum install lm_sensors | * yum -y install lm_sensors | ||
* sensors-detect (accept default answer to all questions - press ENTER) | * sensors-detect (accept default answer to all questions - press ENTER) | ||
* | * systemctl restart lm_sensors | ||
* sensors (to see available sensors) | * sensors (to see available sensors) | ||
If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page. | If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page. | ||
== Configure IPMI sensors == | == Configure IPMI sensors == | ||
Line 650: | Line 470: | ||
systemctl start ipmi | systemctl start ipmi | ||
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further. | ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further. | ||
--- | systemctl list-unit-files | grep -i ipmi | ||
systemctl enable ipmi | |||
systemctl restart ipmi | |||
systemctl status ipmi | |||
systemctl enable ipmievd | |||
systemctl restart ipmievd | |||
systemctl status ipmievd | |||
tail -100 /var/log/messages ### look at messages logged by ipmievd | tail -100 /var/log/messages ### look at messages logged by ipmievd | ||
</pre> | </pre> | ||
Line 670: | Line 492: | ||
** ipmitool sel elist -- report all accumulated messages | ** ipmitool sel elist -- report all accumulated messages | ||
== Configure | == Configure ECC memory == | ||
* check that machine has ECC memory: dmidecode --type memory | grep -i ecc | |||
Configure mcelog (machine check exception) | |||
* yum install mcelog | |||
* check that mcelog is running: ps -efw | grep mcelog | |||
* (el6) chkconfig mcelogd on; service mcelogd restart | |||
* (el7) systemctl status mcelog.service; systemctl enable mcelog.service; systemctl restart mcelog.service | |||
Check for MCE (machine check exception) messages: | |||
* mcelog --client | |||
* grep -i mce /var/log/messages* | |||
* grep -i ecc /var/log/messages* | |||
Configure EDAC | |||
<pre> | <pre> | ||
# | yum install edac-utils | ||
edac-ctl --mainboard | |||
edac-ctl --status | |||
lsmod | grep edac | |||
modprobe ie31200_edac ### driver for Intel E3-1200 series ECC memory | |||
[root@grsmid00 ~]# ls -l /sys/devices/system/edac/mc/ | |||
... empty | |||
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/ | |||
drwxr-xr-x. 15 root root 0 Oct 25 16:40 mc0 | |||
... | |||
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/mc0 | |||
total 0 | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_count | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_noinfo_count | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow0 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow1 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow2 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow3 | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 max_location | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 mc_name | |||
drwxr-xr-x. 2 root root 0 Oct 25 16:40 power | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank0 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank1 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank2 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank3 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank4 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank5 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank6 | |||
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank7 | |||
--w-------. 1 root root 4096 Oct 25 16:40 reset_counters | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 seconds_since_reset | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 size_mb | |||
lrwxrwxrwx. 1 root root 0 Oct 2 12:02 subsystem -> ../../../../../bus/mc0 | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_count | |||
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_noinfo_count | |||
-rw-r--r--. 1 root root 4096 Oct 25 16:40 uevent | |||
[root@alpha00 ~]# | |||
[root@alpha00 ~]# edac-ctl --status | |||
edac-ctl: drivers are loaded. | |||
[root@alpha00 ~]# edac-util | |||
edac-util: No errors to report. | |||
[root@alpha00 ~]# edac-util -s | |||
edac-util: EDAC drivers are loaded. 1 MC detected | |||
</pre> | </pre> | ||
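The per-controller counters in the sysfs listing above (ce_count, ue_count) can be summed with a few lines of shell, for example for a cron job or a quick check on machines without edac-utils. This is a sketch; the function name is made up, and the optional directory argument exists only to allow trying it on a copy of the tree.

```shell
# Sum corrected (ce) and uncorrected (ue) ECC error counts over all memory controllers.
# $1 = EDAC sysfs root (optional, defaults to /sys/devices/system/edac/mc)
edac_error_totals() {
    root=${1:-/sys/devices/system/edac/mc}
    ce=0
    ue=0
    for mc in "$root"/mc*; do
        # if no mc* directory exists the glob stays literal and both tests fail
        [ -r "$mc/ce_count" ] && ce=$((ce + $(cat "$mc/ce_count")))
        [ -r "$mc/ue_count" ] && ue=$((ue + $(cat "$mc/ue_count")))
    done
    echo "corrected=$ce uncorrected=$ue"
}
```

An empty /sys/devices/system/edac/mc (as on grsmid00 above) reports zero for both totals, which means "no driver loaded", not "no errors".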
== Configure SMARTD (CentOS7) == | |||
Default el7 smartd config files send deficient email notices about disk failures. Overwrite. | |||
<pre> | <pre> | ||
/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/ | |||
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/ | |||
systemctl enable smartd | |||
systemctl restart smartd | |||
systemctl status smartd | |||
</pre> | </pre> | ||
== Enable User Disk Quotas (OPTIONAL) == | == Enable User Disk Quotas (OPTIONAL) == | ||
(+CentOS7) | |||
* read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html | * read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html | ||
Line 743: | Line 633: | ||
* edquota -tg ### change group quota time limits | * edquota -tg ### change group quota time limits | ||
== Enable NFS server (CentOS7) == | == Enable NFS V4 server (CentOS7) == | ||
* create /etc/exports. example: | * create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...) | ||
<pre> | <pre> | ||
/home1 @home_export(rw,no_root_squash,async) | /home1 @home_export(rw,no_root_squash,async,fsid=1) | ||
/data1 @data_export(rw,no_root_squash,async) | /data1 @data_export(rw,no_root_squash,async,fsid=2) | ||
</pre> | </pre> | ||
* check the netgroup file | * check the netgroup file | ||
Line 756: | Line 646: | ||
* enable things, start them: | * enable things, start them: | ||
<pre> | <pre> | ||
firewall-cmd --get-services | |||
firewall-cmd --permanent --add-service=nfs | firewall-cmd --permanent --add-service=nfs | ||
firewall-cmd --permanent --add-service=rpc-bind ### needed for ubuntu automounter | |||
firewall-cmd --reload | firewall-cmd --reload | ||
firewall-cmd --list-all | firewall-cmd --list-all | ||
Line 762: | Line 654: | ||
systemctl start nfs-server | systemctl start nfs-server | ||
systemctl status nfs | systemctl status nfs | ||
</pre> | |||
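Since the fsid numbers in /etc/exports should be unique, a quick check before restarting nfs-server can catch copy-and-paste mistakes. A minimal sketch (the function name is made up; "grep -o" is assumed to be available, as it is with GNU grep):

```shell
# Print each fsid=N value that appears more than once in an exports file.
# $1 = exports file (optional, defaults to /etc/exports)
duplicate_fsids() {
    grep -v '^[[:space:]]*#' "${1:-/etc/exports}" |   # skip comment lines
        grep -o 'fsid=[0-9][0-9]*' |
        sort | uniq -d
}

# Example: no output means all fsid values are unique
#   duplicate_fsids /etc/exports
```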
== Enable NFS V3 server (CentOS7) == | |||
<pre> | |||
ps -efw | grep rpc.mountd # should be running! | |||
firewall-cmd --get-services | |||
firewall-cmd --permanent --add-service=mountd | |||
firewall-cmd --permanent --add-service=rpc-bind | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | </pre> | ||
Line 795: | Line 698: | ||
* yum install triumf-amanda | * yum install triumf-amanda | ||
== Enable AMANDA backups (CentOS7) == | |||
<pre> | |||
yum install amanda-client | |||
systemctl list-unit-files | grep -i amanda | |||
#systemctl enable amanda | |||
systemctl enable amanda.socket | |||
systemctl enable amanda-udp.socket | |||
systemctl restart amanda.socket | |||
systemctl restart amanda-udp.socket | |||
firewall-cmd --get-services | |||
firewall-cmd --permanent --add-service=amanda-client | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts | |||
</pre> | |||
On amanda server, add new machine to the disklist, then: | |||
<pre> | |||
amcheck -c daily titan00 | |||
</pre> | |||
== Enable DCACHE == | == Enable DCACHE == | ||
Line 801: | Line 727: | ||
/daq/pnfs/triumf.ca/data/ | /daq/pnfs/triumf.ca/data/ | ||
For CentOS7 machines, adjust the firewall rules so that the machine can communicate with the trdata machines. This is only necessary if you are copying data to trdata. The firewall changes are: | |||
<pre> | |||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.100.212/32" port protocol="tcp" port="0-65535" accept' | ||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.107.156/32" port protocol="tcp" port="0-65535" accept' | ||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.100.219/32" port protocol="tcp" port="0-65535" accept' | ||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
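The three rich rules above differ only in the source address, so they can be generated from a list. The sketch below only prints the firewall-cmd lines for review (a dry run); the function names are made up for illustration, and the addresses are the ones listed above.

```shell
# Build the rich-rule text for one trdata source address.
# $1 = source address in CIDR form
trdata_rich_rule() {
    printf 'rule family="ipv4" source address="%s" port protocol="tcp" port="0-65535" accept\n' "$1"
}

# Print (do not execute) one firewall-cmd line per trdata address.
print_trdata_firewall_cmds() {
    for addr in 142.90.100.212/32 142.90.107.156/32 142.90.100.219/32; do
        echo "firewall-cmd --permanent --add-rich-rule='$(trdata_rich_rule "$addr")'"
    done
}
```

After running the printed commands as root, finish with "firewall-cmd --reload" and "firewall-cmd --list-all" as above.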
These instructions are unnecessary | These instructions are unnecessary | ||
Line 809: | Line 745: | ||
For more information, see the [[TrdataDcache]] dcache page. | For more information, see the [[TrdataDcache]] dcache page. | ||
== Configure Ganglia == | == Configure Ganglia (Centos7) == | ||
CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2) | |||
<pre> | <pre> | ||
/bin/rm /etc/gmond.conf | /bin/rm /etc/gmond.conf | ||
yum -y install "ganglia-gmond*" | |||
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf # collects useless data | |||
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog | |||
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data | |||
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf | |||
yum erase -y ganglia-vmstat ganglia-sensors ganglia-top ganglia-smart ganglia-cpumhz | |||
cd ~/git/scripts | |||
git pull | |||
/bin/cp etc/gmond.conf /etc/ganglia/gmond.conf | |||
systemctl enable gmond | |||
systemctl restart gmond | |||
systemctl status gmond | |||
cd ganglia | |||
/bin/ | ./ganglia-all.perl | ||
make install | |||
cd ~ | |||
</pre> | </pre> | ||
== Configure Ganglia ( | == Configure Ganglia (Centos8) == | ||
CentOS8 Ganglia instructions (EPEL8 ganglia-3.7.2) | |||
<pre> | <pre> | ||
/bin/rm /etc/gmond.conf | /bin/rm /etc/gmond.conf | ||
yum -y | yum -y install "ganglia-gmond*" | ||
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf | /bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf | ||
systemctl enable gmond | systemctl enable gmond | ||
systemctl | systemctl restart gmond | ||
systemctl status gmond | systemctl status gmond | ||
cd ~/git/scripts/ganglia | |||
git pull | |||
./ganglia-all.perl | |||
make install | |||
</pre> | </pre> | ||
Line 894: | Line 800: | ||
<pre> | <pre> | ||
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs | yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs | ||
</pre> | </pre> | ||
== Install memtest and PXE boot == | == Install memtest and PXE boot == | ||
!!!DO NOT DO THIS!!! | |||
<pre> | <pre> | ||
Line 922: | Line 830: | ||
== Install node monitoring == | == Install node monitoring == | ||
!!! OBSOLETE, DO NOT DO THIS !!! | |||
(+CentOS7) | (+CentOS7) | ||
Line 931: | Line 841: | ||
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600 | /usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600 | ||
</pre> | </pre> | ||
== Install gonodeinfo node monitoring == | |||
(+Ubuntu, +CentOS7, +CentOS8) | |||
go to https://bitbucket.org/dd1/gonodeinfo | |||
follow instructions: | |||
<pre> | |||
yum -y install golang | |||
mkdir ~/git | |||
cd ~/git | |||
git clone https://bitbucket.org/dd1/gonodeinfo.git | |||
# or git clone https://daq.triumf.ca/~olchansk/git/gonodeinfo.git | |||
cd gonodeinfo | |||
git pull | |||
make | |||
make install # install gonodeinfo agent | |||
cd ~ # this is important | |||
</pre> | |||
* emacs -nw /etc/gonodeinfo.conf | |||
* change "Description", "Location", "User" and "Administrator" as appropriate (or delete them) | |||
* change "Servers" to read: Servers: daq00.triumf.ca:8601 | |||
* run gonodeinfo -e | |||
* if the error is "connection refused", go to the nodeinfo server and add this client to the access control list: | |||
* on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13 | |||
* try gonodeinfo again, there should be no error | |||
* on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now | |||
== Install latest system updates == | == Install latest system updates == | ||
Line 940: | Line 878: | ||
</pre> | </pre> | ||
== Configure TRIUMF Printers == | == Configure TRIUMF Printers (CentOS7) == | ||
<pre> | <pre> | ||
systemctl stop cups | |||
systemctl disable cups | |||
yum install | echo "ServerName printers.triumf.ca" > /etc/cups/client.conf | ||
lpstat -a | |||
</pre> | |||
== Disable syslog spam (CentOS7) == | |||
Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this: | |||
<pre> | |||
echo auditctl -e 0 >> /etc/rc.local | |||
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local | |||
/etc/rc.local | |||
</pre> | |||
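Running the "echo ... >> /etc/rc.local" lines above a second time (for example when repeating these instructions) adds duplicate entries. A small helper that appends a line only when it is not already present avoids this. This is a sketch; the function name is made up for illustration.

```shell
# Append a line to a file unless an identical line is already there.
# $1 = line, $2 = file
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Example (as root):
#   append_once 'auditctl -e 0' /etc/rc.local
#   append_once '/usr/bin/systemd-analyze set-log-level notice' /etc/rc.local
```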
== Install basic system packages (CentOS7) == | |||
(if starting from minimal system, basic system packages required:) | |||
<pre> | |||
yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils sysstat iftop tcsh | |||
yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools | |||
</pre> | </pre> | ||
Line 952: | Line 910: | ||
(+CentOS7) | (+CentOS7) | ||
yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel | yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" "libusbx-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*-g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy sympy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" --exclude golang"*"git"*" mesa"*" xerces-c"*" diffuse clang i2c-tools texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras perl-GD"*" perl-Math"*" perl-Statistics-Basic cmake3 cmake3-gui extra-cmake-modules python2-pip mariadb-devel glibc-devel.i686 libzstd 
zlib-devel.i686 | ||
== Install optional packages == | |||
!! DO NOT DO THIS !! | |||
(do not install boost on 32-bit machines) | (do not install boost on 32-bit machines) | ||
Line 970: | Line 932: | ||
yum reinstall urw-fonts | yum reinstall urw-fonts | ||
== | == Install libraries for PHYSICA (CentOS7) == | ||
To run physica built on el6 from git sources on el7, do this: | |||
(building physica on el7 is not supported at this time) | |||
(see more http://www.triumf.info/wiki/DAQwiki/index.php/PHYSICA) | |||
<pre> | |||
yum -y install libX11.i686 gd.i686 libpng12.i686 readline.i686 compat-libf2c-34.i686 | |||
</pre> | |||
== Install additional desktop environments (CentOS7) == | |||
<pre> | |||
# LXQT (from EPEL) | |||
# NOT COMPATIBLE WITH el7.7 # yum -y install "lxqt*" | |||
# Cinnamon desktop (from EPEL) | |||
yum -y install cinnamon | |||
# KDE5 not available yet | |||
# MATE (from epel) | |||
yum -y groupinstall "MATE Desktop" | |||
yum -y install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils | |||
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs | |||
# XFCE4 (from EPEL) | |||
yum -y groupinstall xfce | |||
yum -y install "xfce*plugin" xfce4-about --exclude xfce4-hamster-plugin | |||
yum -y erase bash-completion | |||
</pre> | |||
* make MATE the default desktop | |||
<pre> | |||
cd ~root/git/scripts/ | |||
git pull | |||
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/ | |||
</pre> | |||
* lightdm login manager (from EPEL) | |||
<pre> | |||
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5 | |||
</pre> | |||
* and switch from gdm to lightdm | |||
<pre> | <pre> | ||
systemctl disable gdm.service | |||
systemctl enable lightdm.service | |||
(systemctl stop gdm; systemctl restart lightdm) & | |||
</pre> | </pre> | ||
Line 990: | Line 994: | ||
yum install ntfs-3g ntfsprogs (from EPEL) | yum install ntfs-3g ntfsprogs (from EPEL) | ||
== Install Google Chrome web browser (64-bit | == Install HFS and HFS+ drivers (CentOS7) == | ||
yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus | |||
== Install Google Chrome web browser (64-bit CentOS7) == | |||
DOES NOT WORK AS OF google-chrome-stable-114 because Google uses a signature that is incompatible with CentOS-7, see https://www.reddit.com/r/chrome/comments/13s799o/googlechromebeta_1140573545_rpm_invalid_signature/ | |||
Automatic updates will fail with a signature check error. To avoid this, lock the old version of google-chrome: | |||
<pre> | |||
yum versionlock google-chrome-stable | |||
</pre> | |||
THIS DOES NOT WORK ANYMORE: | |||
<pre> | <pre> | ||
/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/ | |||
yum install google-chrome-stable | |||
</pre> | </pre> | ||
== | == Enable monitoring of HTTPS certificates == | ||
On SL6, CentOS7: | |||
<pre> | <pre> | ||
yum install crypto-utils | |||
/etc/cron.daily/certwatch | |||
strace -f /etc/cron.daily/certwatch |& grep open | grep crt | |||
</pre> | </pre> | ||
== Enable 100dpi fonts for EPICS == | == Enable 100dpi fonts for EPICS == | ||
Line 1,016: | Line 1,030: | ||
<pre> | <pre> | ||
ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/ | ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/ | ||
</pre> | |||
== Enable crontab @reboot for MIDAS (CentOS7) == | |||
el7 has a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory | |||
is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be | |||
started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron). | |||
<pre> | |||
mkdir /etc/systemd/system/crond.service.d | |||
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/crond.service.d/local.conf | |||
systemctl daemon-reload | |||
systemctl cat crond.service | |||
</pre> | |||
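The drop-in creation above can also be written so that it is safe to re-run. This is a minimal sketch, not the procedure used on the DAQ machines: the function name `write_crond_dropin` is hypothetical, `mkdir -p` replaces the plain `mkdir` so a second run does not fail, and `printf` replaces `echo -e` so the result does not depend on the shell's echo behaviour.

```shell
# Hypothetical helper: write the crond ordering drop-in into the given directory.
# The default directory is the one used in the instructions above.
write_crond_dropin() {
    dir=${1:-/etc/systemd/system/crond.service.d}
    # -p: do not fail if the directory already exists
    mkdir -p "$dir" || return 1
    # same [Unit] / After= content as the echo -e command above
    printf '[Unit]\nAfter=ypbind.service autofs.service\n' > "$dir/local.conf"
}
```

After calling `write_crond_dropin` as root, run `systemctl daemon-reload` and `systemctl cat crond.service` as shown above to confirm that systemd picked up the drop-in.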
el7 has a second bug: sometimes it thinks the network is running when it is not. Specifically, | |||
DNS is not working and the autofs mount of the user home directory fails. So not only does cron have | |||
to wait for ypbind and autofs to be ready, it also has to wait for DNS to be ready: | |||
<pre> | |||
cd ~/git/scripts | |||
git pull | |||
cp etc/wait-for-dns.service /etc/systemd/system/ | |||
systemctl daemon-reload | |||
systemctl enable wait-for-dns | |||
systemctl restart wait-for-dns # should return immediately. If there is a 30 second delay, the script is broken: disable it | |||
systemctl status wait-for-dns # to see what went wrong. | |||
</pre> | |||
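The contents of wait-for-dns.service are not shown on this page. As an illustration only, the general idea of "wait until DNS answers, but give up after a timeout" can be sketched as a small polling loop. The function name `wait_until` and the one-second polling interval are assumptions, not taken from the actual script.

```shell
# Hypothetical helper: run a command once per second until it succeeds,
# or give up after the given number of seconds.
# usage: wait_until SECONDS command [args...]
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # still failing after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

A DNS check could then look like `wait_until 30 getent hosts some.host.name`, where the hostname is a placeholder for a name that only resolves through DNS.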
Explore the systemd dependency tree using "systemctl list-dependencies", maybe with "--all". | |||
Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser. | |||
== Enable firewall for MIDAS (CentOS7) == | |||
Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports). | |||
To enable access to mhttpd: | |||
<pre> | |||
firewall-cmd --add-port=8443/tcp --permanent | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host) | |||
<pre> | |||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept' | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
To enable access from the private network (replace "192.168.1.0" with your private network number): | |||
<pre> | |||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="0-65535" accept' | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
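The two mserver rules above differ only in the source address, so the rule text can be generated from the address. This is a sketch with a hypothetical helper name, `rich_rule_tcp_all`. It prints the same rule string as the commands above and does not call firewall-cmd itself.

```shell
# Hypothetical helper: print the rich rule that accepts all TCP ports
# from one IPv4 source (single host as a.b.c.d/32, or a network in CIDR form).
rich_rule_tcp_all() {
    printf 'rule family="ipv4" source address="%s" port protocol="tcp" port="0-65535" accept\n' "$1"
}
```

Usage: `firewall-cmd --permanent --add-rich-rule="$(rich_rule_tcp_all 192.168.1.0/24)"`, followed by `firewall-cmd --reload` as above.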
== Enable firewall for EPICS (CentOS7) == | |||
To enable access to TRIUMF EPICS servers, do this: | |||
<pre> | |||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.132.0/23" accept' | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
For UCN, the controls people seem to have EPICS set up on a different server; this might be true for CMMS as well. In this case, the firewall rule should be: | |||
<pre> | |||
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="142.90.139.0/23" accept' | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | </pre> | ||
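One detail that may be worth double-checking in the rules above: by plain CIDR arithmetic, 142.90.139.0/23 has a host bit set, and the /23 network that contains it is 142.90.138.0/23 (whereas 142.90.132.0/23 is already aligned). Whether firewalld accepts, normalizes, or rejects an unaligned source address is not verified here. The sketch below only does the arithmetic; the function name `cidr_network` is hypothetical.

```shell
# Hypothetical helper: print the aligned network address for an IPv4 CIDR.
# usage: cidr_network a.b.c.d/len
cidr_network() {
    addr=${1%/*}
    len=${1#*/}
    # split the dotted quad into $1..$4
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    if [ "$len" -eq 0 ]; then
        mask=0
    else
        # 4294967295 is 0xFFFFFFFF; keep only the top "len" bits
        mask=$(( (4294967295 << (32 - len)) & 4294967295 ))
    fi
    net=$(( ip & mask ))
    printf '%d.%d.%d.%d/%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 )) "$len"
}
```

If the output differs from the input, the address is not on a network boundary for that prefix length.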
Line 1,049: | Line 1,140: | ||
* "java plugin 1.7.0_15" should be listed | * "java plugin 1.7.0_15" should be listed | ||
== Configure USB device permissions == | == Configure USB device permissions == | ||
(+CentOS7) | |||
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc. | Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc. | ||
Line 1,066: | Line 1,152: | ||
<pre> | <pre> | ||
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules | emacs -nw /etc/udev/rules.d/99-usb-chmod.rules | ||
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}" | |||
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c" | ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c" | ||
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c" | ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c" | ||
Line 1,073: | Line 1,160: | ||
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}" | ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}" | ||
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | ||
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | |||
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | ||
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}" | |||
</pre> | </pre> | ||
* reload udev rules: udevadm control --reload-rules | |||
* apply new permissions: udevadm trigger --action=add | * apply new permissions: udevadm trigger --action=add | ||
* watch udev activity: udevadm monitor -p | |||
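After triggering the rules, it can be useful to confirm that a device node really ended up readable and writable by everyone. This is a small sketch using GNU `stat`; the function name `is_world_rw` is hypothetical, and the device path in the usage note is only an example.

```shell
# Hypothetical helper: succeed if "other" has both read and write permission.
# usage: is_world_rw /path/to/device
is_world_rw() {
    # %a prints the octal permission bits, e.g. 666 or 660
    mode=$(stat -c '%a' "$1") || return 2
    # the last octal digit holds the permissions for "other"
    other=$(( mode % 10 ))
    [ $(( other & 6 )) -eq 6 ]
}
```

Usage: `is_world_rw /dev/ttyUSB0 && echo ok`, assuming a USB-serial adapter is plugged in and shows up under that name.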
== Disable modem-manager == | == Disable modem-manager == | ||
Line 1,097: | Line 1,188: | ||
mkdir /etc/jtagd | mkdir /etc/jtagd | ||
echo 'Password = "123";' > /etc/jtagd/jtagd.conf | echo 'Password = "123";' > /etc/jtagd/jtagd.conf | ||
cp -pv / | cp -pv /daq/daqshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts | ||
</pre> | </pre> | ||
* start local jtagd: / | * start local jtagd: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagd | ||
* test local connection: / | * test local connection: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagconfig | ||
* test remote connection (add this machine to your .jtag.conf, run jtagconfig | * test remote connection (add this machine to your .jtag.conf, run jtagconfig | ||
For more information, go to [[Quartus]] | For more information, go to [[Quartus]] | ||
== | == Install EOS == | ||
Instructions from here: | |||
http://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html | |||
<pre> | |||
rpm -vh --install https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/citrine/tag/el-7/x86_64/eos-repo-el7-generic-1.noarch.rpm | |||
yum-config-manager --disable eos-citrine # disable auto-update because all packages are not signed | |||
yum-config-manager --disable eos-dep # disable auto-update because all packages are not signed. | |||
yum install eos-client eos-fuse --enablerepo=eos-citrine | |||
</pre> | |||
== Install fix for the el7 systemd dbus boot hang == | |||
Around early summer 2018, el7 started showing a boot problem. In a nutshell, | |||
there is a problem with the dbus connection between dbus and systemd that | |||
prevents polkit, firewalld, etc. from starting. The system eventually boots | |||
enough that one can ssh into it, but most things do not work. Notably, | |||
polkit is not running, firewalld is not running, and ssh login takes about 15-30 seconds. | |||
The solution is to add a special systemd service to check that dbus started correctly. | |||
It runs after dbus is started, but before it is used, and it restarts dbus in a loop | |||
with a delay until dbus starts correctly. In testing, dbus always starts correctly after | |||
the first retry. | |||
<pre> | <pre> | ||
cd ~root/git/scripts/etc | |||
git pull | |||
/bin/cp -vf systemd-check-dbus.perl /usr/bin/ | |||
/bin/cp -vf systemd-check-dbus.service /etc/systemd/system/ | |||
systemctl daemon-reload | |||
systemctl enable systemd-check-dbus | |||
systemctl start systemd-check-dbus | |||
systemctl status systemd-check-dbus | |||
</pre> | </pre> | ||
After Linux boots, if everything is okay, the script will report this: | |||
<pre> | |||
[root@iris01 ~]# systemctl status systemd-check-dbus | |||
... | |||
Feb 08 17:15:49 iris01.triumf.ca systemd[1]: Starting Check that systemd is registered with dbus... | |||
Feb 08 17:15:49 iris01.triumf.ca sh[4283]: Starting check for systemd dbus connection | |||
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List: string "org.freedesktop.DBus" | |||
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List: string "org.freedesktop.systemd1" | |||
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: systemd1 dbus service exists, success! | |||
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: Finished check for systemd dbus connection | |||
Feb 08 17:15:50 iris01.triumf.ca systemd[1]: Started Check that systemd is registered with dbus. | |||
</pre> | |||
* | If the boot problem happened, the script will report about restarting dbus. | ||
* | |||
* | Note: the systemd service file adjusts the start order of other services, this adjustment seems to reduce the probability of the problem. | ||
** | |||
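The log above suggests that the check looks for "org.freedesktop.systemd1" in the list of names registered on the system bus. The real systemd-check-dbus.perl script is not reproduced on this page, so the following is only a sketch of that test, with a hypothetical function name `has_systemd1`.

```shell
# Hypothetical helper: read a list of dbus names on stdin and succeed
# if systemd has registered itself on the bus.
has_systemd1() {
    # -F: match the name literally, so the dots are not regex wildcards
    grep -qF '"org.freedesktop.systemd1"'
}

# Intended use on a live system (not run here):
#   dbus-send --system --print-reply --dest=org.freedesktop.DBus \
#       /org/freedesktop/DBus org.freedesktop.DBus.ListNames | has_systemd1
```

A non-zero exit status would correspond to the failure case in which the service restarts dbus and tries again.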
== Configure GRUB boot loader (CentOS7, CentOS8) == | |||
* emacs -nw /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX | |||
* grub2-mkconfig -o /boot/grub2/grub.cfg | |||
* grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg | |||
* grub2-editenv list # show contents of boot environment file | |||
* /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file | |||
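The manual edit of /etc/default/grub in the first step can also be done non-interactively. This is a minimal sketch that assumes GNU sed (for `-i` and `\b`); the function name `strip_rhgb_quiet` is hypothetical. It only touches the GRUB_CMDLINE_LINUX line, and it is worth reviewing the file afterwards before running grub2-mkconfig.

```shell
# Hypothetical helper: remove the "rhgb" and "quiet" words from the
# GRUB_CMDLINE_LINUX line of the given file (normally /etc/default/grub).
strip_rhgb_quiet() {
    sed -i -e '/^GRUB_CMDLINE_LINUX=/ s/ *\brhgb\b//' \
           -e '/^GRUB_CMDLINE_LINUX=/ s/ *\bquiet\b//' "$1"
}
```

Usage: `cp -p /etc/default/grub /etc/default/grub.bak; strip_rhgb_quiet /etc/default/grub`, then regenerate grub.cfg as in the list above.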
== Install memtest86+ (CentOS7, CentOS8) == | |||
<pre> | |||
yum -y install memtest86+ | |||
/bin/cp -vf /usr/share/memtest86+/20_memtest86+ /etc/grub.d/ | |||
/bin/chmod a+x /etc/grub.d/20_memtest86+ | |||
grub2-mkconfig -o /boot/grub2/grub.cfg | |||
</pre> | |||
== Disable ELREPO == | == Disable ELREPO == | ||
Line 1,142: | Line 1,273: | ||
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo | sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo | ||
</pre> | </pre> | ||
== Reduce install size (optional) == | |||
This is optional. Only do this if reducing the size of the OS image is very important. | |||
Do this for VME processors. | |||
<pre> | |||
yum erase "texlive*" "java*" "boost*" libreoffice"*" | |||
#yum erase "xemacs*" | |||
yum erase "libstdc++-docs" | |||
yum erase firefox google-chrome"*" | |||
yum clean all | |||
</pre> | |||
<pre> | |||
/bin/rm -rf /usr/share/help | |||
/bin/rm -rf /usr/share/doc | |||
</pre> | |||
== Update from el7.6 to el7.7 == | |||
<pre> | |||
yum-config-manager --disable zfs | |||
yum-config-manager --disable zfs-kmod | |||
yum-config-manager --disable zfs-testing-kmod | |||
yum versionlock delete zfs | |||
yum versionlock delete kernel | |||
yum -y update "yum*" "rpm*" | |||
yum -y erase libqtxdg lxqt-qtplugin ### LXQT is not compatible | |||
yum update | |||
after rebooting into el7.7, follow instructions for updating ZFS from version 0.7 to 0.8. | |||
</pre> | |||
== Update ZFS == | |||
* CentOS-7: 0.8.5 to 2.0.7 | |||
** update kernel to latest version, reboot | |||
** check /etc/yum.repos.d/zfs.repo has [zfs-kmod] baseurl=http://download.zfsonlinux.org/epel/7.9/kmod/$basearch/ | |||
** yum --enablerepo=zfs-kmod update | |||
** reboot, login as root | |||
** run "zfs version" | |||
** run "zfs upgrade" | |||
== Switch from LADD-NIS to DAQ-NIS == | |||
<pre> | |||
domainname DAQ-NIS | |||
/usr/lib64/yp/ypinit -s daq00 | |||
ls -l /var/yp | |||
sed -i s/LADD-NIS/DAQ-NIS/ /etc/yp.conf | |||
sed -i s/LADD-NIS/DAQ-NIS/ /etc/sysconfig/network | |||
systemctl restart ypserv | |||
systemctl restart ypbind | |||
ypwhich | |||
ypwhich -m | |||
</pre> | |||
== Finish installation == | |||
reboot | |||
== Special hardware settings == | == Special hardware settings == | ||
Line 1,207: | Line 1,399: | ||
* http://www.asus.com/Motherboard/P9X79_WS/ | * http://www.asus.com/Motherboard/P9X79_WS/ | ||
* use BIOS version 3101, 3401 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS. | * use BIOS version 4901. Older versions seem to be ok: 3101, 3401, 4701, 4802 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS. | ||
* for CPU temperature, install coretemp | * (not needed for CentOS7) for CPU temperature, install coretemp | ||
* for sensors, install driver for NCT6776F chip same as E35M1-M above. | * (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above. | ||
* BIOS Settings: | * BIOS Settings: | ||
** enter "Advanced mode" | ** enter "Advanced mode" | ||
** Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default | ** Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default | ||
** Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings | ** ### NOT THIS: Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings | ||
** Monitor -> disable Q-fan on for all fans - let all fans always run at maximum RPMs | |||
** Boot -> Full screen logo -> Set to "disabled" | ** Boot -> Full screen logo -> Set to "disabled" | ||
** Wait for F1 -> Set to "disabled" | ** Wait for F1 -> Set to "disabled" | ||
Line 1,223: | Line 1,416: | ||
* for sensors, install driver for NCT6776F chip same as E35M1-M above. | * for sensors, install driver for NCT6776F chip same as E35M1-M above. | ||
=== SUPERMICRO | === SUPERMICRO X9SCL === | ||
* yum install kmod-w83627ehf.x86_64 coretemp | * yum install kmod-w83627ehf.x86_64 coretemp | ||
Line 1,240: | Line 1,429: | ||
<pre> | <pre> | ||
cd ~root | cd ~root | ||
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775/nct6775.ko | wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko | ||
echo modprobe hwmon-vid >> /etc/rc.local | |||
echo insmod /root/nct6775.ko >> /etc/rc.local | |||
/etc/rc.local | |||
sensors | |||
</pre> | </pre> | ||
=== ASUS Z97-WS === | |||
the nct6775 driver does not work because of conflict with ACPI. | |||
=== ASUS Z170-DELUXE === | |||
* use bios 3801 | |||
* set XMP mode (DDR4-2400) | |||
* Advanced->On board devices: set sata mode to "M2", set PCIe slot 3 to "x4" | |||
* boot: disable f1, disable logo, disable numlock | |||
=== ASUS AM1M-A === | === ASUS AM1M-A === | ||
Line 1,255: | Line 1,453: | ||
* SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seems okey) | * SL6.5 kernels require boot option "iommu=soft" or USB2 and network do not work. (USB3 - blue ports - seems okey) | ||
* install ATI/AMD video drivers from ELREPO (see below) | * install ATI/AMD video drivers from ELREPO (see below) | ||
* sensors chip is ITE IT8623E, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures): | * sensors chip is ITE IT8623E, for SL6, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures): | ||
<pre> | <pre> | ||
cd ~root | cd ~root | ||
Line 1,261: | Line 1,459: | ||
echo modprobe hwmon_vid >> /etc/rc.local | echo modprobe hwmon_vid >> /etc/rc.local | ||
echo insmod /root/it87.ko >> /etc/rc.local | echo insmod /root/it87.ko >> /etc/rc.local | ||
sensors- | . /etc/rc.local | ||
</pre> | |||
* for el7 use it87.ko driver: | |||
<pre> | |||
cd ~root | |||
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/it87.ko | |||
echo modprobe hwmon_vid >> /etc/rc.local | |||
echo insmod /root/it87.ko >> /etc/rc.local | |||
. /etc/rc.local | |||
</pre> | |||
* sensors output: | |||
<pre> | |||
[root@midemma02 ~]# sensors | |||
radeon-pci-0008 | |||
Adapter: PCI adapter | |||
temp1: +22.0°C (crit = +120.0°C, hyst = +90.0°C) | |||
fam15h_power-pci-00c4 | |||
Adapter: PCI adapter | |||
power1: N/A (crit = 25.00 W) | |||
k10temp-pci-00c3 | |||
Adapter: PCI adapter | |||
temp1: +22.2°C (high = +70.0°C) | |||
(crit = +70.0°C, hyst = +69.0°C) | |||
it8603-isa-0290 | |||
Adapter: ISA adapter | |||
in0: +0.96 V (min = +2.50 V, max = +2.95 V) ALARM | |||
in1: +2.23 V (min = +0.94 V, max = +1.22 V) ALARM | |||
in2: +2.03 V (min = +0.74 V, max = +0.77 V) ALARM | |||
in3: +2.00 V (min = +1.26 V, max = +0.13 V) ALARM | |||
in4: +2.23 V (min = +2.95 V, max = +2.15 V) ALARM | |||
3VSB: +3.36 V (min = +6.00 V, max = +2.50 V) ALARM | |||
Vbat: +3.22 V | |||
+3.3V: +3.36 V | |||
fan1: 611 RPM (min = 200 RPM) | |||
fan2: 707 RPM (min = 600 RPM) ALARM | |||
temp1: +38.0°C (low = +122.0°C, high = +122.0°C) sensor = thermistor | |||
temp2: +22.0°C (low = +119.0°C, high = -35.0°C) ALARM sensor = thermistor | |||
temp3: -128.0°C (low = +16.0°C, high = +93.0°C) sensor = thermistor | |||
intrusion0: ALARM | |||
[root@midemma02 ~]# | |||
</pre> | </pre> | ||
* AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together) | * AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together) | ||
Line 1,285: | Line 1,525: | ||
|33 34| | |33 34| | ||
</pre> | </pre> | ||
=== ASUS H110M-A/M.2 === | |||
* use BIOS 2003 or later | |||
* dmidecode | grep -i nct reports: Nuvoton NCT5539D | |||
* sensors chip is "NCT6793D or compatible chip", for el7, use this driver: | |||
<pre> | |||
cd ~root | |||
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko | |||
echo modprobe hwmon-vid >> /etc/rc.local | |||
echo insmod /root/nct6775.ko >> /etc/rc.local | |||
/etc/rc.local | |||
sensors | |||
</pre> | |||
* sensors output: | |||
<pre> | |||
[root@daq03 ~]# sensors | |||
acpitz-virtual-0 | |||
Adapter: Virtual device | |||
temp1: +27.8°C (crit = +119.0°C) | |||
temp2: +29.8°C (crit = +119.0°C) | |||
nct6793-isa-0290 | |||
Adapter: ISA adapter | |||
in0: +0.34 V (min = +0.00 V, max = +1.74 V) | |||
in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in4: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in5: +0.15 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in6: +0.97 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in7: +3.38 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in8: +3.12 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in9: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in10: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in11: +0.12 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in12: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in13: +0.12 V (min = +0.00 V, max = +0.00 V) ALARM | |||
in14: +0.13 V (min = +0.00 V, max = +0.00 V) ALARM | |||
fan1: 1041 RPM (min = 0 RPM) | |||
fan2: 1020 RPM (min = 0 RPM) | |||
fan5: 0 RPM (min = 0 RPM) | |||
fan6: 0 RPM | |||
SYSTIN: +119.0°C (high = +98.0°C, hyst = +95.0°C) sensor = thermistor | |||
CPUTIN: +26.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor | |||
AUXTIN0: +27.5°C sensor = thermistor | |||
AUXTIN1: +112.0°C sensor = thermistor | |||
AUXTIN2: +111.0°C sensor = thermistor | |||
AUXTIN3: +111.0°C sensor = thermistor | |||
PECI Agent 0: +28.0°C (high = +98.0°C, hyst = +95.0°C) | |||
(crit = +100.0°C) | |||
PECI Agent 0 Calibration: +25.5°C | |||
PCH_CHIP_CPU_MAX_TEMP: +0.0°C | |||
PCH_CHIP_TEMP: +0.0°C | |||
intrusion0: ALARM | |||
intrusion1: ALARM | |||
beep_enable: disabled | |||
coretemp-isa-0000 | |||
Adapter: ISA adapter | |||
Physical id 0: +31.0°C (high = +80.0°C, crit = +100.0°C) | |||
Core 0: +31.0°C (high = +80.0°C, crit = +100.0°C) | |||
Core 1: +28.0°C (high = +80.0°C, crit = +100.0°C) | |||
[root@daq03 ~]# | |||
</pre> | |||
=== Supermicro X11SSH-F === | |||
* blacklist the mei and mei_me drivers per http://www.supermicro.com/support/faqs/faq.cfm?faq=14537 | |||
<pre> | |||
[root@alpha00 ~]# more /etc/modprobe.d/blacklist.conf | |||
blacklist mei | |||
blacklist mei_me | |||
[root@alpha00 ~]# | |||
</pre> | |||
* mobo requires M.2 PCIe SSD (an M.2 SATA SSD does not work; a regular SATA SSD is ok) | |||
* boot from M.2 PCIe SSD requires UEFI boot (from an MSDOS partition on the SSD) | |||
=== ASUS TUF Z390M-PRO GAMING (WI-FI) === | |||
* BIOS 2417 is okay, upgrade to this if older | |||
* do not set XMP memory mode | |||
* in the BIOS, enable the boot compatibility support module mode: BIOS (press DEL) -> Advanced mode -> BOOT -> CSM Module -> Enable CSM "yes". | |||
* for SL6, install e1000e driver from ELREPO: | |||
<pre> | |||
yum install --enablerepo=elrepo kmod-e1000e | |||
</pre> | |||
* sensors chip appears to be "Nuvoton NCT6798D"; it is not clear what driver to use | |||
* dmidecode | grep -i nct reports: Nuvoton NCT6798D | |||
* kmod-nct6775-0.0-5.el7_7.elrepo.x86_64.rpm from ELrepo finds the chip but bombs because of conflict with ACPI | |||
=== ASUS PRIME X399-A === | |||
* BIOS 1002 | |||
* for reading temperatures and fan rotations, install driver: https://github.com/electrified/asus-wmi-sensors/issues/29 | |||
== Configure X11 graphics == | == Configure X11 graphics == | ||
Line 1,338: | Line 1,675: | ||
* yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv | * yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv | ||
* check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx" | * check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx" | ||
* run "aticonfig --initial" to create xorg.conf if existing one is not good | |||
* run "amdcccle" as root to configure dual-screens, etc | * run "amdcccle" as root to configure dual-screens, etc | ||
Note: 'amdcccle' is a GUI, so you must run this command from within a running X session | Note: 'amdcccle' is a GUI, so you must run this command from within a running X session | ||
* killall Xorg | * killall Xorg | ||
=== Install ATI/AMD drivers (CentOS7) === | |||
* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm | |||
* wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm | |||
* yum install acpid | |||
* rpm -vh --install kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm | |||
* amdconfig -f --initial | |||
* grub2-mkconfig -o /boot/grub2/grub.cfg | |||
* reboot | |||
* login as root | |||
* amdcccle | |||
NOTE: if both drivers (radeon and fglrx) are loaded, boot will hang. The radeon driver is supposed to be blacklisted through the grub rdblacklist=radeon entry, which is installed by running grub2-mkconfig. | |||
=== Install Intel drivers for HD4600/Z87 === | === Install Intel drivers for HD4600/Z87 === | ||
Line 1,449: | Line 1,801: | ||
* logout and reboot the computer to have all the changes to take effect | * logout and reboot the computer to have all the changes to take effect | ||
== Configure HTTPS server (CentOS7) == | |||
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd. | |||
First, configure apache httpd: | |||
* execute these commands: | |||
<pre> | |||
yum install -y mod_ssl certwatch crypto-utils | |||
cd /etc/httpd/conf.d/ | |||
mv ssl.conf ssl.conf-not-used ### remove the stock ssl.conf which refers to the localhost certificate that will expire in 1 year | |||
touch ssl.conf ### create a blank file to prevent automatic updates from installing a stock ssl.conf file | |||
# this is done later: rm /etc/pki/tls/certs/localhost.crt | |||
</pre> | |||
* create new file ssl-daq12.conf # use actual hostname instead of daq12 | |||
<pre> | |||
Listen 443 https | |||
#SSLPassPhraseDialog exec:/usr/libexec/httpd-ssl-pass-dialog | |||
SSLSessionCache shmcb:/run/httpd/sslcache(512000) | |||
SSLSessionCacheTimeout 300 | |||
SSLRandomSeed startup file:/dev/urandom 256 | |||
SSLRandomSeed connect builtin | |||
SSLCryptoDevice builtin | |||
<VirtualHost *:443> | |||
ServerName daq12.triumf.ca | |||
DocumentRoot /var/www/html | |||
ErrorLog /var/log/httpd/daq12.log | |||
SSLEngine on | |||
# note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf | |||
# new SSL settings: K.O. Jan 2020, SSLlabs rating "A+" | |||
SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1 | |||
SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4:!RSA | |||
SSLHonorCipherOrder on | |||
# previous SSL settings: | |||
#SSLProtocol all -SSLv2 -SSLv3 | |||
#SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4 | |||
SSLCertificateFile /etc/pki/tls/certs/localhost.crt | |||
SSLCertificateKeyFile /etc/pki/tls/private/localhost.key | |||
#SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt | |||
#ProxyPass /elog/ http://localhost:8082/ retry=1 | |||
#ProxyPass / http://localhost:8080/ retry=1 | |||
Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains" | |||
<Location /> | |||
SSLRequireSSL | |||
AuthType Basic | |||
AuthName "DAQ password protected site" | |||
Require valid-user | |||
# create password file: touch /etc/httpd/htpasswd | |||
# to add new user or change password: htpasswd /etc/httpd/htpasswd username | |||
AuthUserFile /etc/httpd/htpasswd | |||
</Location> | |||
</VirtualHost> | |||
</pre> | |||
* stop httpd from listening on port 80: edit /etc/httpd/conf/httpd.conf, comment-out the line "Listen 80" | |||
* enable and start httpd: | |||
<pre> | |||
systemctl enable httpd | |||
systemctl restart httpd | |||
systemctl status httpd | |||
</pre> | |||
* try to access https://daq12.triumf.ca | |||
** you should see a complaint about self-signed certificate | |||
** you should see a request for password (do not login yet) | |||
** if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again: | |||
<pre> | |||
firewall-cmd --add-port=443/tcp --permanent | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
Second, configure certbot: | |||
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, | |||
renewal can continue to use the https port 443) | |||
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals) | |||
* check that port 80 is not used by anything: | |||
* netstat -an | grep LISTEN | grep ^tcp | grep 80 | |||
* lsof -P | grep -i tcp | grep LISTEN | grep 80 | |||
* if lsof reports that httpd is listening on port 80, follow the httpd instructions above (remove "Listen 80" from httpd.conf) | |||
* install certbot and open tcp port 80 in the firewall: | |||
<pre> | |||
yum install -y certbot python2-certbot-apache # (from EPEL) | |||
firewall-cmd --add-port=80/tcp --permanent | |||
firewall-cmd --reload | |||
firewall-cmd --list-all | |||
</pre> | |||
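Note that the `grep 80` in the port check above also matches ports such as 8080 or 8000, which mhttpd or ELOG may be using. A stricter match on the local port can be sketched as follows; the function name `listens_on_port` is hypothetical, and it assumes the usual `address:port` column layout of `netstat -an` or `ss -ltn` output.

```shell
# Hypothetical helper: read netstat/ss output on stdin and succeed only if
# some local address ends in exactly ":PORT".
# usage: netstat -an | grep LISTEN | grep ^tcp | listens_on_port 80
listens_on_port() {
    # ":80" followed by whitespace matches port 80 but not 8080 or 8000
    grep -Eq ":$1[[:space:]]"
}
```

If the function succeeds for port 80, something is still listening there and certbot in standalone mode will not be able to bind to it.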
* certbot certonly --standalone --installer apache # then answer questions: | |||
* "activate HTTPS for daq12.triumf.ca" - say ok | |||
* "enter email address" - enter your own email address | |||
* "please read terms..." - read the terms and say "agree" | |||
* it will take a few moments... | |||
* "please choose..." - say "easy" (http access is disabled (a) by firewall, (b) by local configuration) | |||
* "congratulations..." - say ok. | |||
* certbot install --apache --cert-name daq12.triumf.ca # then answer questions: | |||
* "choose redirect..." - say "1" (no redirect) | |||
* look inside ssl-daq12.conf to see that SSLCertificateFile & co point to certbot certificates in /etc/letsencrypt/live/daq12.triumf.ca/ | |||
* remove self-signed localhost certificate, it will expire in 1 year and cause warnings and complaints: rm /etc/pki/tls/certs/localhost.crt | |||
* enable automatic renewal | |||
<pre> | |||
systemctl enable certbot-renew.timer | |||
systemctl start certbot-renew.timer | |||
systemctl list-timers --all | |||
</pre> | |||
* to check correct renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this: | |||
<pre> | |||
certbot renew --standalone --installer apache --force-renewal | |||
</pre> | |||
NOTE: this certificate will expire in 3 months, automatic renewal should work starting with certbot-0.12.0-4.el7.noarch. | |||
Certificate expiration should be automatically detected by "certwatch" and email | |||
will be sent to local root user, to be forwarded to an actual person by ~root/.forward. | |||
Third, activate password protection: | |||
* as shown in the config file above, create password file and initial user: (replace "midas" with specific username) | |||
<pre> | |||
touch /etc/httpd/htpasswd | |||
htpasswd /etc/httpd/htpasswd midas | |||
</pre> | |||
Final test: | |||
* access https://daq12.triumf.ca - https status should be "green" | |||
* login with password should work | |||
* the apache httpd test page should load | |||
* check site security using the SSLlabs https tester. (I get grade "A-"): https://www.ssllabs.com/ssltest/ | |||
From here: | |||
* Configure selinux to allow proxying | |||
<pre> | |||
setsebool -P httpd_can_network_connect 1 | |||
systemctl restart httpd | |||
</pre> | |||
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above | |||
* enable proxy for ELOG - ditto | |||
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', | |||
try this: pip install requests==2.6.0 | |||
== Configure large RAID6 arrays == | == Configure large RAID6 arrays == | ||
Line 1,477: | Line 1,972: | ||
** watch -d -n1 "cat /proc/mdstat" | ** watch -d -n1 "cat /proc/mdstat" | ||
=== performance notes === | == Configure ZFS == | ||
=== Install ZFS === | |||
(from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS) | |||
Follow the instructions for "kABI-tracking kmod" - dkms modules seem to always mess up the system when upgrading to next release of zfs. | |||
<pre> | |||
#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm | |||
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm | |||
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_9.noarch.rpm | |||
yum-config-manager --disable zfs | |||
yum-config-manager --disable zfs-kmod | |||
yum --enablerepo=zfs-kmod clean all | |||
yum --enablerepo=zfs-kmod install zfs | |||
#sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config | |||
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs | |||
#systemctl enable zfs-import-cache | |||
#systemctl enable zfs-mount | |||
#systemctl enable zfs-share | |||
#systemctl enable zfs-zed | |||
#shutdown -r now # required to load the zfs kernel modules and to disable selinux | |||
modprobe zfs # should work | |||
zpool status # should report no pools available | |||
</pre> | |||
#Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (files are gone, but "df" does not go down, zfs-0.6.5.7-1.el7.centos.x86_64), see #https://github.com/zfsonlinux/zfs/issues/4845 | |||
* http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-quickstart.html | |||
* http://www.freebsd.org/cgi/man.cgi?query=zpool&sektion=8 | |||
If ZFS kernel module does not load automatically at boot time, add this to load it manually: | |||
<pre> | |||
ls -l /etc/sysconfig/modules/ | |||
cat > /etc/sysconfig/modules/zfs.modules <<EOF | |||
if [ ! -e /sys/module/zfs ] ; then | |||
modprobe zfs; | |||
fi | |||
EOF | |||
chmod +x /etc/sysconfig/modules/zfs.modules | |||
</pre> | |||
=== Update ZFS (CentOS-7.9) === | |||
* update CentOS-7.x to latest point release | |||
* reboot to latest kernel | |||
* check that currently installed ZFS is 0.8.x (not 0.7 or older) | |||
* then update ZFS: | |||
<pre> | |||
[root@daq16 ~]# zfs version | |||
zfs-0.8.4-1 | |||
zfs-kmod-0.8.4-1 | |||
[root@daq16 ~]# yum --enablerepo=zfs-kmod update | |||
... | |||
[root@daq16 ~]# zfs version ### observe mismatched version numbers: 0.8.5 userspace vs 0.8.4 kernel module | |||
zfs-0.8.5-1 | |||
zfs-kmod-0.8.4-1 | |||
</pre> | |||
* reboot to activate the updated kernel module | |||
* zfs version again | |||
<pre> | |||
[root@daq16 ~]# zpool version | |||
zfs-0.8.5-1 | |||
zfs-kmod-0.8.5-1 | |||
</pre> | |||
* zpool status in case some ZFS volume needs to be updated | |||
<pre> | |||
[root@daq16 ~]# zpool status | |||
pool: z12tb | |||
state: ONLINE | |||
... | |||
</pre> | |||
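The version comparison in the steps above can be scripted. The following is a minimal sketch, assuming the two-line output format of "zfs version" shown in the transcripts; the function name zfs_versions_match is hypothetical and not part of ZFS.

```shell
# Hypothetical helper (not part of ZFS): reads the output of "zfs version"
# on stdin and reports whether userspace and kernel module versions agree.
zfs_versions_match() {
    local user="" kmod="" line
    while IFS= read -r line; do
        case "$line" in
            zfs-kmod-*) kmod="${line#zfs-kmod-}" ;;   # kernel module line
            zfs-*)      user="${line#zfs-}" ;;        # userspace tools line
        esac
    done
    if [ -n "$user" ] && [ "$user" = "$kmod" ]; then
        echo "match $user"
    else
        echo "mismatch userspace=$user kmod=$kmod"
        return 1
    fi
}
```

Possible usage after "yum update": zfs version | zfs_versions_match || echo "reboot to load the new kernel module"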
=== Update ZFS 0.7 to 0.8 === | |||
How to identify zfs 0.7: the "zfs version" command does not work there, check with "rpm -q zfs" instead. | |||
zfs 0.7 is obsolete. | |||
To update to zfs 0.8 or newer, remove 0.7, then install the | |||
new version per instructions above. | |||
* remove zfs 0.7 | |||
<pre> | |||
yum versionlock delete zfs ### versionlock not needed anymore | |||
yum versionlock delete kernel ### versionlock not needed anymore | |||
rm /etc/yum.repos.d/zfs.repo* ### delete old repo files | |||
yum erase zfs spl | |||
</pre> | |||
* reboot | |||
* install new zfs per instructions above | |||
* zpool import -as | |||
* zpool status ### check if any pool needs to be upgraded | |||
* zpool upgrade zssd ### upgrade zfs pool features | |||
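Since "zfs version" is not available on 0.7, the decision "remove and reinstall, or plain update" can be derived from the package name. This is a sketch only, assuming the package string has the usual zfs-VERSION-RELEASE.ARCH shape; both function names are hypothetical, and the 0.7.13 string in the usage line is an invented example.

```shell
# Hypothetical helpers: classify the string printed by "rpm -q zfs".
zfs_rpm_series() {
    # $1: package string, e.g. zfs-0.6.5.7-1.el7.centos.x86_64
    local ver="${1#zfs-}"        # drop the package name
    ver="${ver%%-*}"             # drop release and arch, leaving e.g. 0.6.5.7
    local major="${ver%%.*}"
    local rest="${ver#*.}"
    local minor="${rest%%.*}"
    echo "$major.$minor"
}

zfs_needs_reinstall() {
    # Returns 0 for 0.7 and older (remove, then install fresh), 1 otherwise.
    case "$(zfs_rpm_series "$1")" in
        0.[0-7]) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Possible usage: zfs_needs_reinstall "$(rpm -q zfs)" && echo "erase zfs 0.7 first". A string such as zfs-0.7.13-1.el7_6.x86_64 would be classified as series 0.7.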
=== Lock kernel and zfs packages === | |||
!!! THIS IS NOT NEEDED ANYMORE !!! | |||
<pre> | |||
yum versionlock kernel | |||
yum versionlock zfs | |||
yum-config-manager --disable zfs | |||
yum-config-manager --disable zfs-kmod | |||
</pre> | |||
=== Follow generic ZFS instructions === | |||
Here: [[ZFS]] | |||
== performance notes == | |||
Go here: [[disk_benchmarks]] | |||
== Configure UEFI boot == | |||
Some mobo can boot from NVME (PCIe) SSDs only via UEFI boot. Do this: | |||
* partition the NVME SSD using gdisk (must be GPT partition table, must have MSDOS EFI partition size 512MiB) | |||
<pre> | |||
[root@alpha00 ~]# gdisk -l /dev/nvme0n1 | |||
GPT fdisk (gdisk) version 0.8.6 ... | |||
Found valid GPT with protective MBR; using GPT. | |||
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB | |||
Logical sector size: 512 bytes | |||
Disk identifier (GUID): 1A82CC87-2757-44ED-980F-C78E3681D9D3 | |||
Partition table holds up to 128 entries | |||
First usable sector is 34, last usable sector is 500118158 | |||
Partitions will be aligned on 2048-sector boundaries | |||
Total free space is 2014 sectors (1007.0 KiB) | |||
Number Start (sector) End (sector) Size Code Name | ||
1 2048 1050623 512.0 MiB EF00 EFI System | |||
2 1050624 500118158 238.0 GiB 8300 Linux filesystem | |||
[root@alpha00 ~]# | |||
</pre> | |||
* create filesystems | |||
<pre> | |||
mkfs.msdos /dev/nvme0n1p1 | |||
mkfs.xfs /dev/nvme0n1p2 | |||
</pre> | |||
* prepare EFI partition | |||
<pre> | |||
mkdir /mnt/efi | |||
mount /dev/nvme0n1p1 /mnt/efi | |||
mkdir -p /mnt/efi/efi/boot | |||
cd /mnt/efi/efi/boot | |||
# with Ubuntu LTS 20.04 | |||
cp /boot/vmlinuz vmlinuz # copy the desired linux kernel | |||
#cp /boot/initramfs initramfs.img # copy the matching initramfs file | |||
cp /boot/initrd.img initrd.img # copy the matching initrd file | |||
#from /home/olchansk/sysadm/syslinux/syslinux-6.03 copy | |||
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/efi/syslinux.efi . | |||
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 . | |||
cp syslinux.efi bootx64.efi | |||
</pre> | |||
* create syslinux config file: syslinux.cfg | |||
<pre> | |||
default linux | |||
label linux | |||
kernel vmlinuz | |||
append ro root=/dev/nvme0n1p2 nomodeset initrd=initrd.img | |||
</pre> | |||
* prepare system partition | |||
<pre> | |||
mkdir /mnt/tmp | |||
mount /dev/nvme0n1p2 /mnt/tmp | |||
rsync -avx / /mnt/tmp | |||
cd /mnt/tmp | |||
#edit etc/fstab | |||
#edit etc/sysconfig/selinux # set selinux to permissive mode because rsync did not copy the selinux labels | |||
</pre> | |||
* unmount and reboot | |||
* restore selinux labels after first boot | |||
<pre> | |||
#login as root | |||
cd / | |||
restorecon -R / # can also add "-v" to see progress, but runs much slower | |||
#edit /etc/sysconfig/selinux # enable selinux | |||
#shutdown -r now # reboot with selinux enabled | |||
</pre> | |||
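The syslinux.cfg from the steps above differs between machines only in the root partition and the kernel and initrd file names, so it can be generated. A minimal sketch follows; make_syslinux_cfg is a hypothetical helper name and the defaults simply mirror the file names used above.

```shell
# Hypothetical helper: prints a syslinux.cfg with the same layout as above.
#   $1 = root partition, $2 = kernel file name, $3 = initrd file name
make_syslinux_cfg() {
    local root_dev="$1" kernel="${2:-vmlinuz}" initrd="${3:-initrd.img}"
    printf 'default linux\n'
    printf 'label linux\n'
    printf 'kernel %s\n' "$kernel"
    printf 'append ro root=%s nomodeset initrd=%s\n' "$root_dev" "$initrd"
}
```

Possible usage, with the EFI partition mounted as in the steps above: make_syslinux_cfg /dev/nvme0n1p2 > /mnt/efi/efi/boot/syslinux.cfg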
= Configure UEFI secure boot = | |||
The above instructions do not quite work if "secure boot" is enabled. | |||
These modifications are needed: | |||
* ls -l /boot/efi/EFI/bootko/ | |||
<pre> | |||
total 140116 | |||
-rwxr-xr-x 1 root root 108 Feb 24 15:47 BOOTX64.CSV | |||
-rwxr-xr-x 1 root root 1334816 Feb 24 16:16 bootx64.efi | |||
-rwxr-xr-x 1 root root 217495 Feb 24 16:16 config-4.15.0-74-generic | |||
-rwxr-xr-x 1 root root 105 Feb 24 15:47 grub.cfg | |||
-rwxr-xr-x 1 root root 199952 Feb 24 16:16 grubx64.efi | |||
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initramfs.img | |||
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initrd.img-4.15.0-74-generic | |||
-rwxr-xr-x 1 root root 139968 Feb 24 16:16 ldlinux.e64 | |||
-rwxr-xr-x 1 root root 1269496 Feb 24 15:47 mmx64.efi | |||
-rwxr-xr-x 1 root root 1334816 Feb 24 16:16 shimx64.efi | |||
-rwxr-xr-x 1 root root 171 Feb 24 16:16 syslinux.cfg | |||
-rwxr-xr-x 1 root root 102 Feb 24 16:16 syslinux.cfg~ | |||
-rwxr-xr-x 1 root root 199952 Feb 24 16:16 syslinux.efi | |||
-rwxr-xr-x 1 root root 4068355 Feb 24 16:16 System.map-4.15.0-74-generic | |||
-rwxr-xr-x 1 root root 8367768 Feb 24 16:16 vmlinuz | |||
-rwxr-xr-x 1 root root 8367768 Feb 24 16:16 vmlinuz-4.15.0-74-generic | |||
</pre> | |||
** shimx64.efi is a copy from /boot/efi/EFI/ubuntu | |||
** bootx64.efi is a copy of shimx64.efi (maybe not needed?) | |||
** grubx64.efi is a copy of syslinux.efi | |||
* efibootmgr -c -d /dev/nvme0n1 -p 2 -w -L bootko -l '\EFI\bootko\shimx64.efi' | |||
* efibootmgr -v | |||
<pre> | |||
root@daqubuntu:~# efibootmgr -v | |||
BootCurrent: 0000 | |||
Timeout: 1 seconds | |||
BootOrder: 0000,0001,0002 | |||
Boot0000* bootko HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\BOOTKO\SHIMX64.EFI) | |||
Boot0001* Hard Drive BBS(HD,,0x0)..GO..NO........y.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7....................A.......................................<..Gd-.;.A..MQ..L.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7........BO | |||
Boot0002* ubuntu HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO | |||
root@daqubuntu:~# | |||
</pre> | |||
* NOTE: if, after running "efibootmgr -c", the UUID is zero, then it probably did not take and the entry will vanish after reboot. In my case the mistake was to use "-p 1" instead of "-p 2". | |||
Boot sequence is this: | |||
* shimx64.efi - Microsoft-signed boot loader is accepted by secure boot, loads and runs | |||
* shimx64.efi loads and runs grubx64.efi, this file name is hardwired into the signed shim, cannot be changed | |||
* grubx64.efi is syslinux.efi (could be anything) | |||
* syslinux.efi runs, loads syslinux.cfg, loads the linux kernel, loads the initrd, runs the linux kernel with specified flags (ro root=...). | |||
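The "UUID is zero" failure noted above can be checked right after "efibootmgr -c" instead of after a reboot. This is a sketch under assumptions: it expects the "efibootmgr -v" line format shown above, it assumes a failed entry shows a GUID made only of zeros and dashes, it handles only labels without spaces, and both function names are hypothetical.

```shell
# Hypothetical helpers: inspect "efibootmgr -v" output supplied on stdin.
efi_entry_guid() {
    # $1: boot entry label (no spaces); prints the partition GUID of that entry
    awk -v label="$1" '
        $1 ~ /^Boot[0-9A-Fa-f]+\*?$/ && $2 == label {
            if (match($0, /GPT,[0-9a-fA-F-]+/))
                print substr($0, RSTART + 4, RLENGTH - 4)
        }'
}

efi_entry_ok() {
    # Returns 0 if the entry exists and its GUID has a non-zero digit.
    local guid
    guid=$(efi_entry_guid "$1")
    case "$guid" in
        "")      return 1 ;;   # entry not found
        *[!0-]*) return 0 ;;   # at least one character other than 0 or -
        *)       return 1 ;;   # all zeros: entry will probably vanish
    esac
}
```

Possible usage: efibootmgr -v | efi_entry_ok bootko || echo "entry looks wrong, check the -p argument"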
= UEFI syslinux kernel update = | |||
To update the linux kernel booted by UEFI syslinux, use this script: | |||
* ~root/git/scripts/etc/update_efi.perl | |||
= Update SL6 ssh = | |||
<pre> | |||
WARNING!!! | |||
WARNING!!! original instructions used openssh 9.1, vulnerable to CVE-2024-6387 | |||
WARNING!!! | |||
WARNING!!! these updated instructions use OpenSSH_9.8. K.O. 3jul2024 | |||
WARNING!!! | |||
WARNING!!! see https://www.openssh.com/releasenotes.html | |||
WARNING!!! | |||
</pre> | |||
Stock SL6 ssh is now very old and, by default, cannot connect to current Ubuntu and MacOS sshd. Conversely, their ssh cannot connect to SL6 sshd. | |||
== Workaround is to manually enable SL6-compatible settings == | |||
<pre> | |||
root@daq00:~# ssh -oHostKeyAlgorithms=+ssh-rsa -oPubKeyAcceptedAlgorithms=+ssh-rsa ladd00 | |||
</pre> | |||
Solution is to install newer ssh on affected SL6 machines: | |||
== Install OpenSSH_9.8p1 per CVE-2024-6387 == | |||
<pre> | |||
ssh root@sl6-machine | |||
cd /opt | |||
git clone https://daq00.triumf.ca/~olchansk/git/openssh.git | |||
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/ | |||
/bin/cp -pv /etc/ssh/*key* /opt/openssh/etc/ ### copy old ssh host keys | |||
/opt/openssh/bin/ssh-keygen -A ### generate any missing ssh host keys | |||
# test sshd /opt/openssh/sbin/sshd -p 2222 -d | |||
/bin/mv /usr/sbin/sshd /usr/sbin/sshd-SL6 | |||
/bin/ln -s /opt/openssh/sbin/sshd /usr/sbin/ | |||
/bin/mv /usr/bin/ssh /usr/bin/ssh-SL6 | |||
/bin/ln -s /opt/openssh/bin/ssh /usr/bin/ | |||
service sshd restart | |||
</pre> | |||
== Update openssh from 9.1 to OpenSSH_9.8p1 per CVE-2024-6387 == | |||
Check for old version: | |||
<pre> | |||
[root@muon openssh]# telnet localhost 22 | |||
SSH-2.0-OpenSSH_9.1 | |||
</pre> | |||
Update: | |||
<pre> | |||
cd /opt/openssh | |||
git pull | |||
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/ | |||
service sshd restart | |||
</pre> | |||
Check for new version: | |||
<pre> | |||
telnet localhost 22 | |||
SSH-2.0-OpenSSH_9.8 | |||
</pre> | |||
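The banner check above can be turned into a yes/no test, which is handy when several SL6 machines have to be verified. A minimal sketch, assuming the banner has the SSH-2.0-OpenSSH_X.Y form shown above; banner_needs_update is a hypothetical name, and the test is only "older than 9.8", not a full statement about which releases are affected by the CVE.

```shell
# Hypothetical helper: decide from the sshd banner whether the update is needed.
banner_needs_update() {
    # $1: banner line such as "SSH-2.0-OpenSSH_9.1"
    # Returns 0 if older than 9.8, 1 if 9.8 or newer, 2 if not parseable.
    local ver="${1##*OpenSSH_}"
    ver="${ver%%[!0-9.]*}"            # keep leading digits and dots: 9.8p1 -> 9.8
    case "$ver" in
        [0-9]*.[0-9]*) ;;
        *) return 2 ;;
    esac
    local major="${ver%%.*}" minor="${ver#*.}"
    minor="${minor%%.*}"
    [ "$major" -lt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -lt 8 ]; }
}
```

Possible usage: banner_needs_update "SSH-2.0-OpenSSH_9.1" && echo "git pull in /opt/openssh and restart sshd"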
== Build openssh == | |||
<pre> | |||
ssh sl6-machine | |||
cd git | |||
git clone git://anongit.mindrot.org/openssh.git | |||
cd openssh | |||
autoreconf | |||
xemacs -nw ./configure ### fix syntax error: line 28124 empty "if/then/else" block bombs out, fill it with "AAA=aaa" | |||
./configure --prefix=/opt/openssh | |||
make -j | |||
</pre> | |||
Install openssh: | |||
<pre> | |||
ssh root@sl6-machine | |||
cd .../git/openssh | |||
make install ### copies stuff to /opt/openssh | |||
/opt/openssh/sbin/sshd -p 2222 -d ### test sshd | |||
/opt/openssh/bin/ssh -v sl6-machine ### test ssh | |||
</pre> | |||
Update for CVE-2024-6387: | |||
* cd .../git/openssh | |||
* git pull | |||
* git checkout V_9_8_P1 | |||
* ./configure --prefix=/opt/openssh --with-ssl-dir=/opt/openssl | |||
* make ### no go, wants openssl-1.1.1 | |||
* cd .../git/ | |||
* git clone https://github.com/openssl/openssl.git | |||
* cd openssl | |||
* git checkout OpenSSL_1_1_1w | |||
* configure with prefix --prefix=/opt/openssl | |||
* make, install to /opt/openssl | |||
* cd .../openssh | |||
* configure, build, does not find openssl libraries in /opt (they forgot to set RPATH for user-specified location of openssl) | |||
* LD_LIBRARY_PATH=/opt/openssl/lib, try again, now builds and installs | |||
* but sshd does not run, does not find libcrypto.so.1.1 | |||
* needs ln -s .../lib/libcrypto.so.1.1 /usr/lib64, now sshd finds it, everything works. |
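The last steps above hinge on whether the new sshd can resolve libcrypto.so.1.1 at run time. One way to see this before restarting the service is to filter the output of ldd. This is a sketch assuming the usual "name => not found" format of ldd; missing_libs is a hypothetical helper name.

```shell
# Hypothetical helper: reads "ldd <binary>" output on stdin and prints
# the name of every shared library that the loader could not resolve.
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

Possible usage: ldd /opt/openssh/sbin/sshd | missing_libs. Empty output would mean that all libraries were found without LD_LIBRARY_PATH.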
Latest revision as of 16:00, 9 July 2024
Notes
- these instructions are periodically updated to include items needed for older/newer versions of Linux. They are marked like this: (SL4.2+) means Scientific Linux 4.2 and newer; (SL4 is equivalent to FC3). (FC5 only) means Fedora Core 5; etc.
- obsolete items are marked by the "#" sign at the beginning of the line and sometimes have a comment about the reason for removal.
- typically, we do not "upgrade" machines using the Red Hat "upgrade" function. Instead, we save critical files from the old installation and do a "fresh install" from scratch
- starting with RHEL7, the recommended OS is CentOS7 (instead of SL7).
Disk configurations
The year is 2019 and SSDs are used exclusively, except for bulk data storage, where one uses 6-8-10-12 TB HDDs.
For reliability, home directories and data disks must use redundant storage - mdadm raid1 or ZFS raid1/raid6.
For non-critical machines, a single SSD seems to be reliable enough to use as a boot and OS disk. But since any storage device can fail at any time without warning, home directories and data disks should use redundant storage.
Note: for data disks bigger than 4-6TB, mdadm raid1/raid6 is no longer recommended because raid rebuild, verification and repair time has become unreasonably long. Instead, use ZFS raid1/raid6 which implements online verification, repair and disk replacement without requiring machine shutdown or OS down time.
- single SSD - 120GB min - single partition for "/", no swap partition (create a swap file if swap is needed) - for non-critical machine with no local data storage (OS only)
- dual SSD - 2x240GB min - all partitions mirrored (RAID1), 30GB "/", rest for /home1 - for daq station with local user home directories and no bulk data storage
- single SSD + 2x6-8-10-12TB HDD - SSD partition: all "/", HDD partition as ZFS raid1 (mirrored) - for daq station with small local bulk data storage
- single SSD + 6-8x6-8-10-12TB HDD - for small storage server machines - for daq station with local home directories and large bulk data storage.
For VME processors:
- network boot - VME-CPU#Network_boot - only option for V7648/V7750, do not use for V7805 (no netboot from GigE), optional for V7865/XVB-602
- USB boot - 8GB USB for V7805, 16GB USB for V7865/XVB-602
Preparation
- save /etc, /var, /root, /opt, (if needed: /usr/local, /tftpboot) by rsync to some data disk (/ladd/data0/root)
- check that "/" partition (it will be overwritten) is different from /home1 and /data partitions
- note the MAC addresses of all network interfaces, add them to ladd00 dhcpd.conf to enable PXE boot into the SL "network installer"
- shutdown
Running installer (CentOS7)
CentOS7 can be installed from vanilla CentOS7 installation media or from a custom USB key built per these instructions: https://daqshare.triumf.ca/~olchansk/linux/CentOS7/
The custom installer makes it easy to use a custom kickstart file (ks.cfg).
Instructions for using the usb-installer:
- disconnect machine from network
- plug the usb-installer into a usb3 port (blue colour)
- reboot machine, select booting from usb (press F8 on ASUS motherboards)
- usb-installer boot menu offers to install CentOS7, go there
- CentOS7 should boot (many messages scroll on screen)
- into graphical mode
- into installer main menu
- all installer options should be "happy" except for the "installation destination"
- go to the "installation destination" menu
- unselect all disks except for the SSD where the OS will be installed
- (MOST IMPORTANT: unselect the USB installer disk!)
- select "I will configure..."
- say "done"
- the "manual partitioning" menu will open
- use the "-" button to delete all existing partitions
- select "standard partition"
- click on the "+" button
- in the "Add new partition" dialog, set mount point "/", capacity blank, click "add mount point"
- check capacity (should be full size of SSD), check filesystem type (should be XFS)
- say "done", there will be a warning about absent swap partition, say "done" again.
- in the big useless dialog, say "accept changes"
- should be back to the "installation summary" screen, "installation destination" should be happy now
- after everything is happy, say "begin installation"
- as the installation proceeds, set the password for the root user
- after installation is complete, reboot the machine
- unplug the usb-installer, CentOS7 should boot from SSD into the login screen
- click on "not listed?", login as root
- setup network connection:
- open a terminal
- start "nm-connection-editor"
- click on "+" to create a new connection profile
- select "wired ethernet"
- select "add profile..."
- in "Identity", set "name" to "static"
- in "Identity", check that "Connect automatically" and "Make available..." are enabled
- in "IPv4", set "Addresses" to "manual" instead of "dhcp"
- enter IP address, netmask 255.255.224.0, gateway 142.90.100.18, dns 142.90.100.19, search triumf.ca
- say "Add", then close/quit the network settings
- connect network cable
- network should be up, ping ladd00 should work
- run: yum update -y
- check new kernel is installed: ls -l /boot
- logout and restart (good luck finding these buttons in the gui!)
- confirm correct linux kernel is selected during boot (-229.20, not the original installer kernel)
- login as root, confirm network is up, proceed with the rest of these instructions
Configure SSH
(+CentOS7)
- Login from the console
- restore the SSH keys from backup (/etc/ssh/*key*)
- service sshd restart
- ssh into the new machine as root
- ssh root@localhost, ctrl-C
- ### this is done later from Konstantin's git repository - scp root@ladd00:/root/authorized_keys ~root/.ssh/
- (not needed for SL5.5 kickstart) check that /etc/ssh/ssh_config contains "ForwardX11 yes" and "ForwardX11Trusted yes":
echo " ForwardX11 yes" >> /etc/ssh/ssh_config
echo " ForwardX11Trusted yes" >> /etc/ssh/ssh_config
Set hostname
Set hostname: (use full name, i.e. daq11.triumf.ca)
emacs -nw /etc/hostname
Configure email
- TRIUMF: use relayhost = smtp.triumf.ca
- CERN: use relayhost = cernmx.cern.ch
- edit /etc/postfix/main.cf, set "relayhost = smtp.triumf.ca"
- echo "olchansk@triumf.ca amaudruz@triumf.ca lindner@triumf.ca bsmith@triumf.ca" >> ~root/.forward
Make log files readable
chmod a+r /var/log/messages
chmod a+r /var/log/yum.log
Activate /etc/rc.local
Activate rc.local:
chmod a+x /etc/rc.local
chmod a+x /etc/rc.d/rc.local # TL edit
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local
Disable "persistent network names" (DO NOT DO THIS)
/bin/touch /etc/udev/rules.d/75-persistent-net-generator.rules
/bin/rm /etc/udev/rules.d/70-persistent-net.rules
#shutdown -r now
Configure NIS client (CentOS7)
yum -y install ypbind authconfig
echo "NISTIMEOUT=5" >> /etc/sysconfig/network
echo "NETWORKWAIT=yes" >> /etc/sysconfig/network
authconfig --enablenis --enablepreferdns --nisdomain LADD-NIS --nisserver ladd00.triumf.ca --update
ypwhich
ypcat -k passwd
systemctl restart autofs
- On the master NIS node (ladd00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
- Use "system-config-users" to add local user accounts
- enable selinux ssh key login to nfs mounted home directories:
setsebool -P use_nfs_home_dirs 1
Configure NIS client (CentOS8)
- all the same as for CentOS7
- ensure correct boot order for ypbind (in CentOS 8.1 ypbind is started before network is ready, service file uses "Wants" instead of "After")
mkdir /etc/systemd/system/ypbind.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ypbind.service.d/local.conf
systemctl daemon-reload
systemctl cat ypbind.service
Configure NIS secondary server (CentOS7)
Enable local NIS server, make local machine use it:
yum -y install ypserv
/usr/lib64/yp/ypinit -s ladd00 ### (/usr/lib/yp/ypinit on 32-bit machines)
### ypinit will give lots of errors about "rpc.ypxfrd failed: RPC: Can't decode result"; can be ignored
systemctl disable ypxfrd yppasswdd
systemctl stop ypxfrd yppasswdd
systemctl enable rpcbind ypserv
systemctl start rpcbind ypserv
emacs -nw /etc/yp.conf # change "domain XXX server YYY.triumf.ca" to read "domain XXX server localhost"
systemctl restart ypbind
ypwhich # should say "localhost"
ypcat -k auto.master # should work
Punch hole in the firewall: (or "make" on NIS master will complain)
echo YPSERV_ARGS=\"-p 800\" >> /etc/sysconfig/network
systemctl restart ypserv
firewall-cmd --get-services
firewall-cmd --add-service rpc-bind --permanent
firewall-cmd --add-port=800/tcp --add-port=800/udp --permanent
firewall-cmd --reload
firewall-cmd --list-all
- on the NIS master:
- add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
- TL (2020-09): are we not doing this anymore? I guess it doesn't work anyway...
- if using /var/yp/securenets, copy it from NIS master to new NIS secondary server
- add the new machine to /var/yp/ypservers, run "make -C /var/yp" and also "cd /var/yp; yppush -h newmachine ypservers"
Enable hourly NIS update cron job (DO THIS AFTER git pull scripts, see below)
cd ~/git/scripts
git pull
cd etc
cd ~/git/scripts/etc; ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
Configure AUTOFS (CentOS7)
yum -y install autofs
systemctl enable autofs
systemctl start autofs
ls -l /daq/daqshare
Label Selinux labels
When upgrading non-selinux machines (el6) to el7 (selinux enforcing) the existing user home directories will not have the correct selinux labels and many things will not work, including ssh logins (sshd cannot access ~user/.ssh files).
semanage fcontext -a -e /home /home1 ### selinux has special rules for /home, assign them to /home1
restorecon -R -v /home1 ### apply the new rules to files in /home1
ls -Zd /home1/alpha/.ssh # should say: drwx------. alpha users system_u:object_r:ssh_home_t:s0 /home1/alpha/.ssh
Configure time (CentOS7)
Time server ntpd was replaced by chronyd.
yum -y install chrony
echo server time1 iburst >> /etc/chrony.conf
echo server time2 iburst >> /etc/chrony.conf
echo server time3 iburst >> /etc/chrony.conf
systemctl enable chronyd
systemctl restart chronyd
chronyc sources
chronyc tracking
- if desired, edit /etc/chrony.conf, remove non-triumf time servers
Enable automatic system updates (CentOS7)
Disable yum-cron:
rpm --erase yum-cron
/bin/rm -v /var/lock/subsys/yum-cron
/bin/rm -v /etc/cron.daily/0yum-daily.cron
/bin/rm -v /etc/cron.hourly/0yum-hourly.cron
Enable yum-autoupdate:
yum install -y epel-release
yum install -y yum-changelog yum-protectbase yum-tsflags yum-versionlock
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-kernel-module-1-5.el7.cern.noarch.rpm
rpm -vh --install http://linuxsoft.cern.ch/cern/centos/7.2/cern/x86_64/Packages/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm
#rpm -vh --install https://daqshare.triumf.ca/~olchansk/linux/yum-autoupdate-4.4.2-1.el7.cern.noarch.rpm https://daqshare.triumf.ca/~olchansk/linux/yum-kernel-module-1-5.el7.cern.noarch.rpm
systemctl enable yum-autoupdate
systemctl start yum-autoupdate
systemctl status yum-autoupdate
Disable automatic system updates (CentOS7)
yum -y erase yum-autoupdate
/bin/rm -f /etc/sysconfig/yum-autoupdate.rpmsave
/bin/rm -f /var/lock/subsys/yum-autoupdate
Enable automatic system updates (CentOS8)
yum -y install dnf-automatic
systemctl enable --now dnf-automatic.timer
systemctl list-timers *dnf-*
edit /etc/dnf/automatic.conf
apply_updates = yes
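The edit of /etc/dnf/automatic.conf can also be done non-interactively. A minimal sketch, assuming GNU sed and that the file already contains an "apply_updates = ..." line; enable_apply_updates is a hypothetical helper name.

```shell
# Hypothetical helper: set "apply_updates = yes" in a dnf-automatic config
# file and print the resulting line for confirmation.
enable_apply_updates() {
    local conf="${1:-/etc/dnf/automatic.conf}"
    sed -i 's/^apply_updates[[:space:]]*=.*/apply_updates = yes/' "$conf"
    grep '^apply_updates' "$conf"
}
```

Possible usage: enable_apply_updates /etc/dnf/automatic.conf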
Configure system services (CentOS7)
- systemctl list-unit-files | grep enabled | sort ### (to see enabled services)
- disable unwanted services:
systemctl disable bluetooth
systemctl disable dm-event
systemctl disable dmraid-activation
systemctl disable iscsid
systemctl disable iscsi
systemctl disable iscsiuio
systemctl disable libvirtd
systemctl disable lvm2-lvmetad
systemctl disable lvm2-monitor
systemctl disable ModemManager
systemctl disable multipathd
systemctl disable netcf-transaction
systemctl disable lvm2-lvmetad.socket
systemctl disable lvm2-lvmpolld.socket
systemctl disable iscsid.socket
systemctl disable iscsiuio.socket
systemctl disable ksm
systemctl disable ksmtuned
#systemctl disable
Erase unwanted packages (CentOS7)
- PackageKit # bugs users about security updates, hogs yum lock
- perl-homedir # creates unwanted $HOME/perl5
- ModemManager # thinks that all USB-attached devices are modems
- pcp # sends error email to itself, does not work
- abrt # sends email to root about useless crashes, i.e. crash of X when machine is rebooted
- rear # some kind of backup and recovery tool, not clear what it does, but it sends email complaining how it is broken
- bash-completion # "echo $HOME/<TAB>" becomes "echo \$HOME" (notice "\" added before "$") preventing tab-completion from doing anything useful.
yum -y erase PackageKit perl-homedir ModemManager pcp abrt abrt-libs abrt-gui-libs rear bash-completion
Disable unwanted package "tracker"
The "tracker" package is part of the GNOME desktop, it scans the content of all files into a database for quick searching.
When it malfunctions, bad things happen, i.e. read through https://bugzilla.redhat.com/show_bug.cgi?id=747689
Specific problem I see is that it floods the system log with error messages. Also consumes network and filesystem bandwidth for NFS mounted home directories.
This package cannot be removed by "yum erase tracker" due to dependencies from the core GNOME desktop.
Instead, do this to deactivate it:
chmod -x /usr/libexec/tracker-*
chmod -x /usr/bin/tracker
chattr +i /usr/bin/tracker
chattr +i /usr/libexec/tracker-*
Configure external package repositories (CentOS7)
EPEL: (additional packages)
yum install epel-release
ELREPO: (kernel modules and drivers) (CentOS8)
yum install elrepo-release
ELREPO: (kernel drivers)
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
yum -y install yum-plugin-fastestmirror
Install packages needed to continue with installation
(+CentOS7)
(these packages are sometimes missing, they are needed to follow the instructions below)
(SL6.5: libotf is a dependency of emacs - SL6.5 installer fails to install it)
yum install ed patch wget git libotf gdisk emacs perl
Configure Konstantin's scripts
(+Centos7)
mkdir ~root/git
cd ~root/git
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull
Go back to the NIS slave server and install the hourly NIS update cron job.
Enable yum version lock
yum install yum-plugin-versionlock
#yum versionlock packagename # yum versionlock rpcbind
#yum versionlock list # list locked packages
#yum versionlock delete packagename # unlock given package
#yum versionlock clear # delete all locks
Configure trusted ssh keys
(+CentOS7)
ssh localhost
interrupt by Ctrl-C
/bin/cp ~/git/scripts/etc/authorized_keys ~/.ssh/
Configure hardware sensors
- yum -y install lm_sensors
- sensors-detect (accept default answer to all questions - press ENTER)
- systemctl restart lm_sensors
- sensors (to see available sensors)
If no sensors are detected by standard drivers, follow motherboard-specific instructions at the bottom of this page.
Configure IPMI sensors
Some machines support the IPMI interface for monitoring the hardware: fan speeds, temperatures, voltages.
- find out if IPMI is supported. Try this:
dmidecode | grep -i ipmi
If the output is not blank, IPMI may be supported.
- install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
service ipmi start
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
chkconfig ipmi on
chkconfig ipmievd on
service ipmi restart
service ipmievd restart
tail -100 /var/log/messages ### look at messages logged by ipmievd
- (CentOS7) install and enable IPMI software:
yum install "OpenIPMI*" ipmitool
systemctl start ipmi
ipmitool sensor ### to confirm IPMI is present. If output is blank, do not go further.
systemctl list-unit-files | grep -i ipmi
systemctl enable ipmi
systemctl restart ipmi
systemctl status ipmi
systemctl enable ipmievd
systemctl restart ipmievd
systemctl status ipmievd
tail -100 /var/log/messages ### look at messages logged by ipmievd
- if ipmievd complains about SEL buffer overflow, clear it manually:
ipmitool sel list ### show ipmi messages in raw format
ipmitool sel elist ### show ipmi messages in useful format
ipmitool sel elist > file ### save ipmi messages into a file
ipmitool sel clear ### clear all accumulated ipmi messages
- useful ipmi commands:
- ipmitool sensor -- read hardware sensors
- ipmitool sel elist -- report all accumulated messages
Configure ECC memory
- check that machine has ECC memory: dmidecode --type memory | grep -i ecc
Configure mcelog (machine check exception)
- yum install mcelog
- check that mcelog is running: ps -efw | grep mcelog
- (el6) chkconfig mcelogd on; service mcelogd restart
- (el7) systemctl status mcelog.service; systemctl enable mcelog.service; systemctl restart mcelog.service
Check for MCE (machine check exception) messages:
- mcelog --client
- grep -i mce /var/log/messages*
- grep -i ecc /var/log/messages*
Configure EDAC
yum install edac-utils
edac-ctl --mainboard
edac-ctl --status
lsmod | grep edac
modprobe ie31200_edac ### driver for Intel E3-1200 series ECC memory
[root@grsmid00 ~]# ls -l /sys/devices/system/edac/mc/
... empty
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/
drwxr-xr-x. 15 root root 0 Oct 25 16:40 mc0
...
[root@alpha00 ~]# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ce_noinfo_count
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow0
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow1
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow2
drwxr-xr-x. 3 root root 0 Oct 25 16:40 csrow3
-r--r--r--. 1 root root 4096 Oct 25 16:40 max_location
-r--r--r--. 1 root root 4096 Oct 25 16:40 mc_name
drwxr-xr-x. 2 root root 0 Oct 25 16:40 power
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank0
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank1
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank2
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank3
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank4
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank5
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank6
drwxr-xr-x. 3 root root 0 Oct 25 16:40 rank7
--w-------. 1 root root 4096 Oct 25 16:40 reset_counters
-r--r--r--. 1 root root 4096 Oct 25 16:40 seconds_since_reset
-r--r--r--. 1 root root 4096 Oct 25 16:40 size_mb
lrwxrwxrwx. 1 root root 0 Oct 2 12:02 subsystem -> ../../../../../bus/mc0
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_count
-r--r--r--. 1 root root 4096 Oct 25 16:40 ue_noinfo_count
-rw-r--r--. 1 root root 4096 Oct 25 16:40 uevent
[root@alpha00 ~]#
[root@alpha00 ~]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@alpha00 ~]# edac-util
edac-util: No errors to report.
[root@alpha00 ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
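The ce_count and ue_count files in the listing above hold the corrected and uncorrected error counters of each memory controller, so they can be read directly without edac-util. A minimal sketch, assuming the sysfs layout shown above; edac_counts is a hypothetical helper name.

```shell
# Hypothetical helper: print corrected (ce) and uncorrected (ue) error
# counters for every memory controller found under the EDAC sysfs tree.
edac_counts() {
    local base="${1:-/sys/devices/system/edac/mc}" mc
    for mc in "$base"/mc*; do
        # skip when no controller is present or the counters are unreadable
        [ -r "$mc/ce_count" ] && [ -r "$mc/ue_count" ] || continue
        echo "${mc##*/} ce=$(cat "$mc/ce_count") ue=$(cat "$mc/ue_count")"
    done
}
```

Possible usage: edac_counts. On a machine without a loaded EDAC driver (empty mc/ directory, as on grsmid00 above) this prints nothing.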
Configure SMARTD (CentOS7)
Default el7 smartd config files send deficient email notices about disk failures. Overwrite.
/bin/cp ~/git/scripts/etc/smartd.conf /etc/smartmontools/
/bin/cp ~/git/scripts/etc/smartd_warning.sh /etc/smartmontools/
systemctl enable smartd
systemctl restart smartd
systemctl status smartd
Enable User Disk Quotas (OPTIONAL)
(+CentOS7)
- read http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Storage_Administration_Guide/ch-disk-quotas.html
- emacs -nw /etc/fstab, add "grpquota,usrquota" to filesystem options, e.g.:
[root@isdaq00 home1]# grep quota /etc/fstab
UUID=5a2aefbd-45db-475e-841e-12ec89220fbd /home1 ext4 defaults,grpquota,usrquota 1 2
- cd /; umount /home1; mount /home1
- quotacheck -cug /home1
- quotacheck -avug
- quotaon -av
- quota system is now active
- increase the soft quota time limit from default 7days to 30 or 60 days: edquota -t
- set quotas for all users (see below)
- setup warnquota:
- create warnquota config file: emacs -nw /etc/warnquota.conf
# values can be quoted:
MAIL_CMD = "/usr/sbin/sendmail -t"
FROM = root
SUBJECT = User %i@%h exceeded allocated disk quota
CC_TO = "root"
# If you set this variable CC will be used only when user has less than
# specified grace time left (examples of possible times: 5 seconds, 1 minute,
# 12 hours, 5 days)
# CC_BEFORE = 2 days
SUPPORT = "root"
# Text in the beginning of the mail (if not specified, default text is used)
# This way text can be split to more lines
# Line breaks are done by '|' character
# The expressions %i, %h, %d, and %% are substituted for user/group name,
# host name, domain name, and '%' respectively. For backward compatibility
# %s behaves as %i but is deprecated.
MESSAGE = User "%i" on "%h" has exceeded the allocated disk quota.||Please delete any unnecessary files on following filesystems or|contact the system administrator to increase your quota allocation:|
SIGNATURE = --|automated email from warnquota
- note that the %i@%h substitution in the SUBJECT line does not seem to work
- create cron job: emacs -nw /etc/cron.daily/warnquota
#!/bin/sh
warnquota
#end
- chmod a+x /etc/cron.daily/warnquota
- touch /etc/crontab
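Before remounting, it can help to confirm that the quota options really made it into /etc/fstab. A minimal sketch, assuming the standard six-column fstab layout; quota_mounts is a hypothetical helper name, not a tool shipped with the quota package.

```shell
#!/bin/sh
# List mount points in an fstab file whose options include both usrquota
# and grpquota. quota_mounts is a hypothetical helper name.
quota_mounts() (
    awk '$1 !~ /^#/ && $4 ~ /(^|,)usrquota(,|$)/ && $4 ~ /(^|,)grpquota(,|$)/ { print $2 }' "${1:-/etc/fstab}"
)

# Example: with the fstab entry shown above, "quota_mounts" prints /home1
```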
Useful commands for managing quotas:
- repquota -a | sort -n -k3 ### show quota of all users sorted by disk usage
- edquota -u username ### open "vi" editor to change user quotas
- repquota -a | grep username ### report quota for given user
- setquota -u username 0 0 0 0 /home1 ### disable quotas for given user
- setquota -u username 50000000 100000000 0 0 /home1 ### set quotas for 50GB soft and 100GB hard
- edquota -t ### change user quota time limits
- edquota -tg ### change group quota time limits
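The setquota limits are given in 1 KB blocks, which is why 50GB appears above as 50000000. A small sketch that does the conversion and prints the command for review; quota_cmd is a hypothetical helper name, and it follows the same rounding as the example above (1 GB = 1000000 blocks).

```shell
#!/bin/sh
# Print (do not run) the setquota command for soft/hard limits given in GB.
# Uses the same convention as the example above: 1 GB = 1000000 blocks of 1 KB.
# quota_cmd is a hypothetical helper name.
quota_cmd() (
    user="$1"; soft_gb="$2"; hard_gb="$3"; fs="${4:-/home1}"
    echo "setquota -u $user $((soft_gb * 1000000)) $((hard_gb * 1000000)) 0 0 $fs"
)

quota_cmd username 50 100
```

The printed line can be pasted into a root shell once it looks right.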
Enable NFS V4 server (CentOS7)
- create /etc/exports. example: (fsid numbers should be unique and increase 1,2,3,...)
/home1 @home_export(rw,no_root_squash,async,fsid=1)
/data1 @data_export(rw,no_root_squash,async,fsid=2)
- check the netgroup file
- if using NIS: check NIS netgroup: ypcat -k netgroup
- if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
- if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
- enable things, start them:
firewall-cmd --get-services
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind ### needed for ubuntu automounter
firewall-cmd --reload
firewall-cmd --list-all
systemctl enable nfs-server
systemctl start nfs-server
systemctl status nfs-server
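The /etc/exports lines for this section need fsid numbers that are unique and increase 1,2,3,... A minimal sketch that numbers them automatically; make_exports is a hypothetical helper name, and the export options are simply the ones used in the example in this section.

```shell
#!/bin/sh
# Print /etc/exports lines for "directory netgroup" pairs, numbering fsid
# 1,2,3,... in the order given. make_exports is a hypothetical helper name.
make_exports() (
    n=0
    while [ $# -ge 2 ]; do
        n=$((n + 1))
        echo "$1 @$2(rw,no_root_squash,async,fsid=$n)"
        shift 2
    done
)

make_exports /home1 home_export /data1 data_export
```

The output is printed rather than written to /etc/exports, so it can be compared with the existing file first.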
Enable NFS V3 server (CentOS7)
ps -efw | grep rpc.mountd # should be running!
firewall-cmd --get-services
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --reload
firewall-cmd --list-all
Enable NFS V3 server
- edit /etc/hosts.allow, add or uncomment "mountd: 142.90.0.0/255.255.0.0"
- create /etc/exports. example:
/home1 @home_export(rw,no_root_squash,async)
/data1 @data_export(rw,no_root_squash,async)
- check the netgroup file
- if using NIS: check NIS netgroup: ypcat -k netgroup
- if no NIS, create /etc/netgroup: @daqmachines (deap00,,) (deap01,,) (deap02,,)
- if no NIS, edit /etc/nsswitch.conf, make the netgroup line read: "netgroup: files"
- chkconfig nfs on
- chkconfig nfslock on
- service nfs restart
Then on ladd00 need to do
- ssh to root@ladd00
- edit /etc/auto.daq to add new machine...
- make -C /var/yp
Enable NFS V4 SERVER (SL6)
- if used with NIS, same as NFSv3
- if used as standalone, need to edit idmapd.conf - set the "Domain" name to the same value on NFS server and NFS slave (default automagically determined value does not always work). More TBW.
Enable AMANDA backups
AMANDA backups are already enabled by TRIUMF kickstart installs. For non-kickstart installation, follow instructions at [http://amanda/~amanda], or look at "/triumfcs/trshare/olchansk/linux/amanda/amanda-enable.perl". As final step, use [https://helpdesk.triumf.ca] to contact TRIUMF CS to add this new machine to the amanda backup list.
- yum install triumf-amanda
Enable AMANDA backups (CentOS7)
yum install amanda-client
systemctl list-unit-files | grep -i amanda
#systemctl enable amanda
systemctl enable amanda.socket
systemctl enable amanda-udp.socket
systemctl restart amanda.socket
systemctl restart amanda-udp.socket
firewall-cmd --get-services
firewall-cmd --permanent --add-service=amanda-client
firewall-cmd --reload
firewall-cmd --list-all
echo amanda.triumf.ca amanda amdump >> /var/lib/amanda/.amandahosts
On amanda server, add new machine to the disklist, then:
amcheck -c daily titan00
Enable DCACHE
DAQ dcache server is mounted as
/daq/pnfs/triumf.ca/data/
For Centos-7 machines, you need to adjust the firewall rules in order to be able to communicate with the trdata machines; this is only necessary if you are copying data to trdata. The firewall changes are
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.212/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.107.156/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.100.219/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
The following instructions are no longer necessary:
- # mkdir -p /pnfs
- # edit /etc/rc.local, add to the end of file: "mount -o intr,rw,noac,hard,nfsvers=3 trdata00:/pnfs /pnfs &"
- # . /etc/rc.local
For more information, see the TrdataDcache dcache page.
Configure Ganglia (Centos7)
CentOS7 Ganglia instructions (EPEL7 ganglia-3.7.2)
/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp -v /dev/null /etc/ganglia/conf.d/multicpu.conf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/netstats.pyconf # spews errors into syslog
/bin/cp -v /dev/null /etc/ganglia/conf.d/diskstat.pyconf # collects useless data
/bin/cp -v /dev/null /etc/ganglia/conf.d/procstat.pyconf # do not create /tmp/gmond.conf
yum erase -y ganglia-vmstat ganglia-sensors ganglia-top ganglia-smart ganglia-cpumhz
cd ~/git/scripts
git pull
/bin/cp etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
cd ganglia
./ganglia-all.perl
make install
cd ~
Configure Ganglia (Centos8)
CentOS8 Ganglia instructions (EPEL8 ganglia-3.7.2)
/bin/rm /etc/gmond.conf
yum -y install "ganglia-gmond*"
/bin/cp ~/git/scripts/etc/gmond.conf /etc/ganglia/gmond.conf
systemctl enable gmond
systemctl restart gmond
systemctl status gmond
cd ~/git/scripts/ganglia
git pull
./ganglia-all.perl
make install
Configure TRIUMF DAQ packages
(+CentOS7)
cd /etc/yum.repos.d
wget http://daq.triumf.ca/~daqweb/yum/triumf-daq.repo
Install Konstantin's packages
(+CentOS7)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install diskscrub emailonreboot monitor_nfs
Install memtest and PXE boot
!!!DO NOT DO THIS!!!
cd /boot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.bin.gz
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.10
wget http://ladd00.triumf.ca/tftpboot/gpxe-1.0.1+-gpxe.lkrn
emacs -nw /boot/grub/grub.conf
title memtest86+-5.01
root (hd0,0)
kernel /boot/memtest86+-5.01.bin.gz
title memtest86+-4.20
root (hd0,0)
kernel /boot/memtest86+-4.20.bin.gz
title memtest86+-4.10
root (hd0,0)
kernel /boot/memtest86+-4.10
title pxeboot
root (hd0,0)
kernel /boot/gpxe-1.0.1+-gpxe.lkrn
Install node monitoring
!!! OBSOLETE, DO NOT DO THIS !!!
(+CentOS7)
yum --disablerepo=\* --enablerepo=triumf-daq --skip-broken install triumf_nodeinfo
/usr/sbin/sendnodeinfo.perl --config ladd00.triumf.ca:8600
emacs -nw /etc/nodeinfo
/usr/sbin/sendnodeinfo.perl ladd00.triumf.ca:8600
Install gonodeinfo node monitoring
(+Ubuntu, +CentOS7, +CentOS8)
go to https://bitbucket.org/dd1/gonodeinfo and follow the instructions:
yum -y install golang
mkdir ~/git
cd ~/git
git clone https://bitbucket.org/dd1/gonodeinfo.git
# or git clone https://daq.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
- emacs -nw /etc/gonodeinfo.conf
- change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
- change "Servers" to read: Servers: daq00.triumf.ca:8601
- run gonodeinfo -e
- if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
- on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
- try gonodeinfo again, there should be no error
- on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
Install latest system updates
(+CentOS7)
yum update -y
Configure TRIUMF Printers (CentOS7)
systemctl stop cups
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
Disable syslog spam (CentOS7)
Default el7 config is spamming the syslog with useless messages "systemd: Starting Session", etc. Disable this:
echo auditctl -e 0 >> /etc/rc.local
echo /usr/bin/systemd-analyze set-log-level notice >> /etc/rc.local
/etc/rc.local
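Running the "echo ... >> /etc/rc.local" commands a second time adds duplicate lines to the file. A minimal sketch of an append that is safe to repeat; append_once is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# Append a line to a file only if the exact line is not already present.
# append_once is a hypothetical helper name.
append_once() (
    line="$1"; file="$2"
    touch "$file" || exit 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# Example (as root):
#   append_once "auditctl -e 0" /etc/rc.local
#   append_once "/usr/bin/systemd-analyze set-log-level notice" /etc/rc.local
```

The same helper would apply anywhere these notes append to /etc/rc.local or to /var/lib/amanda/.amandahosts.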
Install basic system packages (CentOS7)
(if starting from minimal system, basic system packages required:)
yum install -y which psmisc redhat-lsb-core xorg-x11-xauth xterm emacs-nox rsync tcpdump strace nfs-utils sysstat iftop tcsh
yum install -y gcc gcc-c++ gdb glibc-static libstdc++-static zlib zlib-devel openssl-devel httpd-tools
Install packages needed for QUARTUS, ROOT, EPICS and MIDAS DAQ
(+CentOS7)
yum install --skip-broken giflib.x86_64 sysstat "libusb-devel*" "libusbx-devel*" unixODBC-devel postgresql-devel libxml2-devel libXpm-devel libgfortran git compat-readline43 "graphviz*" dcap "tigervnc*" telnet glibc"*" strace "fftw*" libpng "freetype*" xpdf "xemacs*" tkcvs xterm mutt "*-g77*" joe "libXmu*" dcap-devel gsl-devel pcre-devel h5py gd-devel xorg-x11-fonts"*" minicom xfig"*" perl-BSD-Resource "net-snmp-*" readline-static git-all nasm imake tcl-devel gv xorg-x11-twm expat-devel screen compat-readline5 ImageMagick ImageMagick-devel wget alacarte scipy numpy sympy nedit gnuplot php-cli php-domxml-php4-php5 php-gd php-fpdf php-cli kdebase cmake tcpdump sqlite sqlite-devel kdegraphics gdisk lsof gconf-editor iftop tk-devel mcelog kdm blt itcl lz4 bzip2 pbzip2 apr-devel apr-util-devel net-tools golang"*" --exclude golang-cover"*"hg"*" --exclude golang"*"hg"*" --exclude golang-pkg"*" --exclude golang-github"*" --exclude golang"*"git"*" mesa"*" xerces-c"*" diffuse clang i2c-tools texlive-revtex texlive-revtex4 kile kbibtex xrdp glibc.i686 gimp gimp-data-extras perl-GD"*" perl-Math"*" perl-Statistics-Basic cmake3 cmake3-gui extra-cmake-modules python2-pip mariadb-devel glibc-devel.i686 libzstd zlib-devel.i686
Install optional packages
!! DO NOT DO THIS !!
(do not install boost on 32-bit machines)
yum install --skip-broken "boost-*"
(packages for 32-bit software compilation on 64-bit machines. this is optional)
yum install --skip-broken giflib.i386 giflib.i686 compat-libf2c-34.i386 compat-libf2c-34.i686 mysql-devel.i686 openssl-devel.i686 unixODBC-devel.i686 libstdc++-devel.i386 libstdc++-devel.i686 "zlib-*.i686" "libXext-*.i686" "libXtst-*.i686" glibc-static.i686 freetype.i686 fontconfig.i686 libpng.i686 libXrender.i686 glibc-devel.i686 libX11-devel.i686 libXpm-devel.i686 libXft-devel.i686 mysql-devel.i686 dcap-devel.i686 gsl-devel.i686 pcre-devel.i686 fontconfig-devel.i686 freetype-devel.i686 libpng-devel.i686 libjpeg-devel.i686 libgfortran.i686 libxml2-devel.i686 gd-devel.i686 readline-devel.i686 ncurses-devel.i686 libXdmcp.i686 readline-static.i686 compat-readline5.i686
yum install boost-devel.i686
(separately install these packages - they collide with the big bunch above)
yum install rdesktop
yum reinstall urw-fonts
Install libraries for PHYSICA (CentOS7)
To run physica built on el6 from git sources on el7, do this:
(building physica on el7 is not supported at this time)
(see more http://www.triumf.info/wiki/DAQwiki/index.php/PHYSICA)
yum -y install libX11.i686 gd.i686 libpng12.i686 readline.i686 compat-libf2c-34.i686
Install additional desktop environments (CentOS7)
# LXQT (from EPEL)
# NOT COMPATIBLE WITH el7.7
# yum -y install "lxqt*"
# Cinnamon desktop (from EPEL)
yum -y install cinnamon
# KDE5 not available yet
# MATE (from epel)
yum -y groupinstall "MATE Desktop"
yum -y install mate-common mate-icon-theme-faenza mate-netspeed mate-sensors-applet mate-themes-extras mate-utils
yum -y erase ModemManager abrt abrt-libs abrt-gui-libs
# XFCE4 (from EPEL)
yum -y groupinstall xfce
yum -y install "xfce*plugin" xfce4-about --exclude xfce4-hamster-plugin
yum -y erase bash-completion
- make the MATE desktop the default
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
- lightdm login manager (from EPEL)
yum install lightdm lightdm-kde lightdm-qt lightdm-qt5
- and switch from gdm to lightdm
systemctl disable gdm.service
systemctl enable lightdm.service
(systemctl stop gdm; systemctl restart lightdm) &
Install SMART scripts
(+CentOS7)
ln -sf ~/git/scripts/smart-status/smart-status.perl ~/
Install NTFS drivers
yum install ntfs-3g ntfsprogs (from EPEL)
Install HFS and HFS+ drivers (CentOS7)
yum --disablerepo=\* --enablerepo=elrepo install kmod-hfs kmod-hfsplus
Install Google Chrome web browser (64-bit CentOS7)
DOES NOT WORK AS OF google-chrome-stable-114 because google uses signature incompatible with CentOS-7, see https://www.reddit.com/r/chrome/comments/13s799o/googlechromebeta_1140573545_rpm_invalid_signature/
automatic updates will fail with signature check error, to defeat it lock old version of google-chrome:
yum versionlock google-chrome-stable
THIS DOES NOT WORK ANYMORE:
/bin/cp ~/git/scripts/etc/google-chrome-64.repo /etc/yum.repos.d/
yum install google-chrome-stable
Enable monitoring of HTTPS certificates
On SL6, CentOS7:
yum install crypto-utils
/etc/cron.daily/certwatch
strace -f /etc/cron.daily/certwatch |& grep open | grep crt
Enable 100dpi fonts for EPICS
(+CentOS7)
ln -s /usr/share/X11/fonts/100dpi /etc/X11/fontpath.d/
Enable crontab @reboot for MIDAS (CentOS7)
el7 has a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).
mkdir /etc/systemd/system/crond.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/crond.service.d/local.conf
systemctl daemon-reload
systemctl cat crond.service
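The drop-in file created above can also be written with printf, which avoids relying on "echo -e" and does not fail if the directory already exists. A minimal sketch; write_after_dropin is a hypothetical helper name, and its first argument (the systemd configuration directory) is a parameter only so the function can be tried outside /etc.

```shell
#!/bin/sh
# Write a systemd drop-in that orders a unit after other units.
# write_after_dropin is a hypothetical helper; the first argument is the
# systemd configuration directory (normally /etc/systemd/system).
write_after_dropin() (
    sysdir="$1"; unit="$2"; shift 2
    dir="$sysdir/$unit.d"
    mkdir -p "$dir" || exit 1
    printf '[Unit]\nAfter=%s\n' "$*" > "$dir/local.conf"
)

# Example (as root), equivalent to the commands above:
#   write_after_dropin /etc/systemd/system crond.service ypbind.service autofs.service
#   systemctl daemon-reload
```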
el7 has a second bug, sometimes it thinks the network is running when it is not, specifically, DNS is not working and autofs mount of user home directory fails. So not only cron has to wait for ypbind and autofs to be ready, we also have to wait for DNS to be ready:
cd ~/git/scripts
git pull
cp etc/wait-for-dns.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable wait-for-dns
systemctl restart wait-for-dns # should return immediately. if there is a 30 second delay, the script is broken, disable it
systemctl status wait-for-dns # to see what went wrong.
Explore the systemd dependency tree using "systemctl list-dependencies", maybe with "--all".
Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.
Enable firewall for MIDAS (CentOS7)
Default el7 configuration prevents all access to servers running on the local machine, including access to MIDAS mhttpd (tcp port 8443) and mserver (all tcp ports).
To enable access to mhttpd:
firewall-cmd --add-port=8443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
To enable access to the mserver from a specific host: (replace 142.90.111.175 with the IP address of the permitted host)
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.111.175/32" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
To enable access from the private network (replace "192.168.1.0" with your private network number):
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="192.168.1.0/24" port protocol="tcp" port="0-65535" accept"
firewall-cmd --reload
firewall-cmd --list-all
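The mserver rules above differ only in the source address. A minimal sketch that builds the rule for a host or a network and prints the commands for review; allow_tcp_from is a hypothetical helper name, and the single-quoted form of the rich rule is just an alternative way of quoting the same rule text.

```shell
#!/bin/sh
# Print the firewall-cmd commands that open all TCP ports to one source
# address or network. allow_tcp_from is a hypothetical helper name; the
# commands are only printed so they can be reviewed before running as root.
allow_tcp_from() (
    src="$1"
    case "$src" in
        */*) ;;                 # already has a prefix length, e.g. 192.168.1.0/24
        *)   src="$src/32" ;;   # single host
    esac
    echo "firewall-cmd --permanent --add-rich-rule='rule family=\"ipv4\" source address=\"$src\" port protocol=\"tcp\" port=\"0-65535\" accept'"
    echo "firewall-cmd --reload"
)

allow_tcp_from 142.90.111.175
allow_tcp_from 192.168.1.0/24
```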
Enable firewall for EPICS (CentOS7)
To enable access to TRIUMF EPICS servers, do this:
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.132.0/23" accept"
firewall-cmd --reload
firewall-cmd --list-all
For UCN the controls people seem to have EPICS setup on a different server; this might be true for CMMS as well. In this case the firewall rule change should be
firewall-cmd --permanent --add-rich-rule="rule family="ipv4" source address="142.90.139.0/23" accept"
firewall-cmd --reload
firewall-cmd --list-all
Disable gdm and X11 (OPTIONAL)
initctl stop prefdm
echo "start on never" > /etc/init/prefdm.override
echo "start on never" > /etc/init/splash-manager.override
initctl reload-configuration
then enable login on default console:
echo "plymouth quit" >> /etc/rc.local
echo "X_TTY=xxx/dev/tty1" >> /etc/sysconfig/init
Install JAVAWS (OPTIONAL)
- to run Java "web start" jnlp files (EVO, SEEVOGH, etc): javaws Downloads/spider.jnlp
- install javaws:
- yum install icedtea-web icedtea-web-javadoc
Install firefox java plugin (OPTIONAL, DO NOT DO THIS)
This installs the Oracle Java plugin:
- rpm -vh --install ~deap/jdk-7u15-linux-x64.rpm
- ls -l /usr/lib64/mozilla/plugins/
- ln -s /usr/java/jdk1.7.0_15/jre/lib/amd64/libnpjp2.so /usr/lib64/mozilla/plugins/
- start firefox, go edit->preferences->general->manage add-ons->plugins
- "java plugin 1.7.0_15" should be listed
Configure USB device permissions
(+CentOS7)
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
- create file /etc/udev/rules.d/99-usb-chmod.rules with this contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
- reload udev rules: udevadm control --reload-rules
- apply new permissions: udevadm trigger --action=add
- watch udev activity: udevadm monitor -p
Disable modem-manager
The modem-manager will try to talk to any serial devices attached to USB serial ports. It assumes that those devices are modems and will send out modem-specific commands. if the devices are not modems and do not understand or do not like modem commands, well that's too bad. modem-manager is installed by the ModemManager package required by the NetworkManager package, and there is no configuration setting to turn modem-manager off.
One way to disable it is: chmod a= /usr/sbin/modem-manager
Another way to disable it is by forced uninstall: rpm --erase --nodeps ModemManager
Remember to kill the running copy: killall -KILL modem-manager
Caveat: it is not clear if modem-manager would not be resurrected by an update to the NetworkManager or ModemManager packages.
Configure Altera jtagd
(if needed)
mkdir /etc/jtagd
echo 'Password = "123";' > /etc/jtagd/jtagd.conf
cp -pv /daq/daqshare/olchansk/altera/11.0/quartus/linux/pgm_parts.txt /etc/jtagd/jtagd.pgm_parts
- start local jtagd: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagd
- test local connection: /daq/daqshare/olchansk/altera/11.0/quartus/bin/jtagconfig
- test remote connection (add this machine to your .jtag.conf, run jtagconfig)
For more information, go to Quartus
Install EOS
Instructions from here: http://eos-docs.web.cern.ch/eos-docs/quickstart/setup_repo.html
rpm -vh --install https://dss-ci-repo.web.cern.ch/dss-ci-repo/eos/citrine/tag/el-7/x86_64/eos-repo-el7-generic-1.noarch.rpm
yum-config-manager --disable eos-citrine # disable auto-update because all packages are not signed
yum-config-manager --disable eos-dep # disable auto-update because all packages are not signed.
yum install eos-client eos-fuse --enablerepo=eos-citrine
Install fix for the el7 systemd dbus boot hang
Around early summer 2018, el7 started showing a boot problem. In a nutshell, a problem with the dbus connection between dbus and systemd prevents polkit, firewalld, etc. from starting. The system eventually boots far enough that one can ssh into it, but most things do not work. Notably, polkit and firewalld are not running, and ssh login takes about 15-30 seconds.
The solution is to add a special systemd service that checks that dbus started correctly. It runs after dbus is started but before it is used, and restarts dbus in a loop, with a delay, until dbus starts correctly. In testing, dbus always starts correctly after the first retry.
cd ~root/git/scripts/etc
git pull
/bin/cp -vf systemd-check-dbus.perl /usr/bin/
/bin/cp -vf systemd-check-dbus.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable systemd-check-dbus
systemctl start systemd-check-dbus
systemctl status systemd-check-dbus
After linux boots, if everything was okay, the script will report this:
[root@iris01 ~]# systemctl status systemd-check-dbus
...
Feb 08 17:15:49 iris01.triumf.ca systemd[1]: Starting Check that systemd is registered with dbus...
Feb 08 17:15:49 iris01.triumf.ca sh[4283]: Starting check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List: string "org.freedesktop.DBus"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: List: string "org.freedesktop.systemd1"
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: systemd1 dbus service exists, success!
Feb 08 17:15:50 iris01.triumf.ca sh[4283]: Finished check for systemd dbus connection
Feb 08 17:15:50 iris01.triumf.ca systemd[1]: Started Check that systemd is registered with dbus.
If the boot problem happened, the script will report about restarting dbus.
Note: the systemd service file adjusts the start order of other services, this adjustment seems to reduce the probability of the problem.
Configure GRUB boot loader (CentOS7, CentOS8)
- emacs -nw /etc/default/grub, remove "rhgb" and "quiet" from GRUB_CMDLINE_LINUX
- grub2-mkconfig -o /boot/grub2/grub.cfg
- grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg
- grub2-editenv list # show contents of boot environment file
- /bin/rm /boot/grub2/grubenv # remove stale settings, make grub2 boot from first entry in config file
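The manual edit of /etc/default/grub can also be done with sed. A minimal sketch, assuming GNU sed (as shipped with CentOS) and that "rhgb" and "quiet" appear as separate words on the GRUB_CMDLINE_LINUX line; strip_rhgb_quiet is a hypothetical helper name.

```shell
#!/bin/sh
# Remove the "rhgb" and "quiet" words from the GRUB_CMDLINE_LINUX line of
# the given file (normally /etc/default/grub). Requires GNU sed.
# strip_rhgb_quiet is a hypothetical helper name; a backup copy of the
# original file is kept as <file>.bak.
strip_rhgb_quiet() (
    f="${1:-/etc/default/grub}"
    sed -i.bak -e '/^GRUB_CMDLINE_LINUX=/ s/ *\b\(rhgb\|quiet\)\b//g' "$f"
)

# Example (as root):
#   strip_rhgb_quiet /etc/default/grub
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

Check the edited line by eye before regenerating grub.cfg; the pattern is deliberately simple and does not try to handle unusual kernel options.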
Install memtest86+ (CentOS7, CentOS8)
yum -y install memtest86+
/bin/cp -vf /usr/share/memtest86+/20_memtest86+ /etc/grub.d/
/bin/chmod a+x /etc/grub.d/20_memtest86+
grub2-mkconfig -o /boot/grub2/grub.cfg
Disable ELREPO
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo_triumf.repo
sed 's/enabled=.*/enabled=0/' -i /etc/yum.repos.d/elrepo.repo
Reduce install size (optional)
This is optional. Only do this if reducing the size of the OS image is very important.
Do this for VME processors.
yum erase "texlive*" "java*" "boost*" libreoffice"*"
#yum erase "xemacs*"
yum erase "libstdc++-docs"
yum erase firefox google-chrome"*"
yum clean all
/bin/rm -rf /usr/share/help
/bin/rm -rf /usr/share/doc
Update from el7.6 to el7.7
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
yum-config-manager --disable zfs-testing-kmod
yum versionlock delete zfs
yum versionlock delete kernel
yum -y update "yum*" "rpm*"
yum -y erase libqtxdg lxqt-qtplugin ### LXQT is not compatible
yum update
After rebooting into el7.7, follow the instructions for updating ZFS from version 0.7 to 0.8.
Update ZFS
- CentOS-7: 0.8.5 to 2.0.7
- update kernel to latest version, reboot
- check /etc/yum.repos.d/zfs.repo has [zfs-kmod] baseurl=http://download.zfsonlinux.org/epel/7.9/kmod/$basearch/
- yum --enablerepo=zfs-kmod update
- reboot, login as root
- run "zfs version"
- run "zfs upgrade"
Switch from LADD-NIS to DAQ-NIS
domainname DAQ-NIS
/usr/lib64/yp/ypinit -s daq00
ls -l /var/yp
sed -i s/LADD-NIS/DAQ-NIS/ /etc/yp.conf
sed -i s/LADD-NIS/DAQ-NIS/ /etc/sysconfig/network
systemctl restart ypserv
systemctl restart ypbind
ypwhich
ypwhich -m
Finish installation
reboot
Special hardware settings
ASUS Crosshair mobo
- use BIOS version 1207 or newer
- (before CentOS7) sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
- CentOS7: installs correct drivers automatically
ASUS Crosshair-II mobo
- use BIOS version 2607 or newer
- for the onboard IDE to work, add "all-generic-ide" to kernel boot options in grub.conf
- sensors need these drivers from ELREPO: yum install --noplugins kmod-it87 kmod-k10temp; sensors-detect; service lm_sensors restart; sensors
ASUS P7P55D EVO mobo
- use BIOS version 2004 or newer
- SL6 - install special driver for on board PCIe GigE network port and disable on board PCI GigE network port:
- yum --enablerepo elrepo install kmod-r8168 kmod-r8169
- # do not do this: sed 's/^blacklist/#blacklist/' -i /etc/modprobe.d/blacklist-r8169.conf
- reboot
- verify that correct drivers are loaded: ethtool -i eth0; ethtool -i eth1
- note: there will be no eth1 - r8169 driver is disabled.
ASUS P6X58-E-WS mobo
- BIOS settings
- F1 or DEL to enter BIOS setup, F8 boot menu
- go to POWER->HW mon, confirm CPU temperature is around 30C. (heatsink is installed correctly. Bad heatsink temperature quickly goes up to 50-70C).
- Main menu: Storage config - SATA change IDE->AHCI
- System information: confirm BIOS version 301, CPU type, memory size
- AI Tweak: set DRAM frequency - AUTO->DDR3-1333
- Advanced->Onboard devices: LAN BOOT: enabled
- Power->HW monitor: CPU Q-FAN: enabled
- Boot->Settings: Quick boot: enabled; Full screen logo: disabled; Wait for F1: disabled
- Save and exit
ASUS E35M1-M PRO mobo
- http://www.asus.com/Motherboards/E35M1M_PRO/#specifications
- use BIOS version 1002 or newer
- for CPU temperature: install kmod-k10temp from ELREPO (kmod-k10temp-0.0-4.el6.elrepo.x86_64.rpm)
- for Sensors: yum --enablerepo elrepo install kmod-w83627ehf; modprobe w83627ehf; sensors
- for Graphics: yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
- to enable booting from USB3, edit /etc/dracut.conf, change line "add_drivers" to read: add_drivers+="xhci-hcd"
- to use multiple monitors, run "aticonfig --initial --heads=2 --adapter=1 --xinerama=on"; to change the screen layout, edit /etc/X11/xorg.conf. Only dual monitors (DVI+HDMI) seem to work. Triple monitors do not seem to work.
Sensors instructions below are obsolete (use driver from ELREPO)
- for Sensors, install driver for NCT6776F chip from https://github.com/groeck/w83627ehf/archives/master (in the Makefile, change the line "KERNEL_BUILD=" to read: "KERNEL_BUILD:=/usr/src/kernels/$(TARGET)"):
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/groeck-w83627ehf-dd3e543/w83627ehf.ko
echo "modprobe hwmon; modprobe hwmon-vid; modprobe k10temp; rmmod w83627ehf; insmod /root/w83627ehf.ko" >> /etc/rc.local
ASUS E45M1-M PRO mobo
- https://www.asus.com/Motherboards/E45M1M_PRO/#specifications
- use BIOS 1202 or newer
- follow the E35M1-M PRO instructions above
ASUS P9X79 WS
- http://www.asus.com/Motherboard/P9X79_WS/
- use BIOS version 4901. Older versions seem to be ok: 3101, 3401, 4701, 4802 or newer. If BIOS is 1305 or older, install P9X79-WS-CAP-Converter.ROM (BIOS 2902/3101), then the new BIOS.
- (not needed for CentOS7) for CPU temperature, install coretemp
- (not needed for CentOS7) for sensors, install driver for NCT6776F chip same as E35M1-M above.
- BIOS Settings:
- enter "Advanced mode"
- Ai Tweaker -> Ai Overclock Tuner -> Set to "XMP" - this enables DDR3-1600 RAM speed vs DDR3-1333 by default
- ### NOT THIS: Monitor -> CPU fan speed low limit -> Set to "200 RPM" - we are using high efficiency slow turning CPU coolers and the default 600 RPM is right on the edge of firing false warnings
- Monitor -> disable Q-fan for all fans - let all fans always run at maximum RPM
- Boot -> Full screen logo -> Set to "disabled"
- Wait for F1 -> Set to "disabled"
ASUS P8B-M
- use BIOS version 6103 or newer
- for CPU temperature, install coretemp
- for sensors, install driver for NCT6776F chip same as E35M1-M above.
SUPERMICRO X9SCL
- yum install kmod-w83627ehf.x86_64 coretemp
- xemacs -nw /etc/rc.local, add:
modprobe coretemp
modprobe w83627ehf
ASUS Z87-WS
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
echo modprobe hwmon-vid >> /etc/rc.local
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors
ASUS Z97-WS
the nct6775 driver does not work because of conflict with ACPI.
ASUS Z170-DELUXE
- use bios 3801
- set XMP mode (DDR4-2400)
- Advanced->On board devices: set sata mode to "M2", set PCIe slot 3 to "x4"
- boot: disable f1, disable logo, disable numlock
ASUS AM1M-A
- use BIOS 602 or later
- SL6.5 installer cannot use USB2 ports and the network. Use USB3 ports (blue colour) to boot USB installer (memtest, rescue, etc)
- SL6.5 kernels require the boot option "iommu=soft", otherwise USB2 and the network do not work. (USB3 - blue ports - seems okay)
- install ATI/AMD video drivers from ELREPO (see below)
- sensors chip is ITE IT8623E, for SL6, use standalone driver from lm_sensors. (2 fans rpm, 2 temperatures):
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
- for el7 use it87.ko driver:
cd ~root
wget https://daqshare.triumf.ca/~olchansk/linux/CentOS7/it87.ko
echo modprobe hwmon_vid >> /etc/rc.local
echo insmod /root/it87.ko >> /etc/rc.local
. /etc/rc.local
- sensors output:
[root@midemma02 ~]# sensors
radeon-pci-0008
Adapter: PCI adapter
temp1: +22.0°C (crit = +120.0°C, hyst = +90.0°C)
fam15h_power-pci-00c4
Adapter: PCI adapter
power1: N/A (crit = 25.00 W)
k10temp-pci-00c3
Adapter: PCI adapter
temp1: +22.2°C (high = +70.0°C) (crit = +70.0°C, hyst = +69.0°C)
it8603-isa-0290
Adapter: ISA adapter
in0: +0.96 V (min = +2.50 V, max = +2.95 V) ALARM
in1: +2.23 V (min = +0.94 V, max = +1.22 V) ALARM
in2: +2.03 V (min = +0.74 V, max = +0.77 V) ALARM
in3: +2.00 V (min = +1.26 V, max = +0.13 V) ALARM
in4: +2.23 V (min = +2.95 V, max = +2.15 V) ALARM
3VSB: +3.36 V (min = +6.00 V, max = +2.50 V) ALARM
Vbat: +3.22 V
+3.3V: +3.36 V
fan1: 611 RPM (min = 200 RPM)
fan2: 707 RPM (min = 600 RPM) ALARM
temp1: +38.0°C (low = +122.0°C, high = +122.0°C) sensor = thermistor
temp2: +22.0°C (low = +119.0°C, high = -35.0°C) ALARM sensor = thermistor
temp3: -128.0°C (low = +16.0°C, high = +93.0°C) sensor = thermistor
intrusion0: ALARM
[root@midemma02 ~]#
- AMD "Athlon(tm) 5350 APU" graphics supports 2 monitors maximum (mobo has 3 video outputs, only 2 can be used together)
Intel SE7230NH1
- front panel header connector pinout is like this:
PWR LED | 1  2|
        | 3  4|
PWR LED | 5  6|
HDD LED | 7  8|
HDD LED | 9 10|
PWR SW  |11 12| NIC1 LED
PWR SW  |13 14| NIC1 LED
RST SW  |15 16|
RST SW  |17 18|
        |19 20|
NMI SW  |21 22| NIC2 LED
NMI SW  |23 24| NIC2 LED
...     |...  |
        |33 34|
ASUS H110M-A/M.2
- use BIOS 2003 or later
- dmidecode | grep -i nct reports: Nuvoton NCT5539D
- sensors chip is "NCT6793D or compatible chip", for el7, use this driver:
cd ~root
wget http://ladd00.triumf.ca/~olchansk/linux/nct6775.ko
echo modprobe hwmon-vid >> /etc/rc.local
echo insmod /root/nct6775.ko >> /etc/rc.local
/etc/rc.local
sensors
- sensors output:
[root@daq03 ~]# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +27.8°C (crit = +119.0°C)
temp2: +29.8°C (crit = +119.0°C)

nct6793-isa-0290
Adapter: ISA adapter
in0: +0.34 V (min = +0.00 V, max = +1.74 V)
in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in2: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in3: +3.39 V (min = +0.00 V, max = +0.00 V) ALARM
in4: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +0.15 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.97 V (min = +0.00 V, max = +0.00 V) ALARM
in7: +3.38 V (min = +0.00 V, max = +0.00 V) ALARM
in8: +3.12 V (min = +0.00 V, max = +0.00 V) ALARM
in9: +1.00 V (min = +0.00 V, max = +0.00 V) ALARM
in10: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in11: +0.12 V (min = +0.00 V, max = +0.00 V) ALARM
in12: +0.14 V (min = +0.00 V, max = +0.00 V) ALARM
in13: +0.12 V (min = +0.00 V, max = +0.00 V) ALARM
in14: +0.13 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 1041 RPM (min = 0 RPM)
fan2: 1020 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
fan6: 0 RPM
SYSTIN: +119.0°C (high = +98.0°C, hyst = +95.0°C) sensor = thermistor
CPUTIN: +26.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0: +27.5°C sensor = thermistor
AUXTIN1: +112.0°C sensor = thermistor
AUXTIN2: +111.0°C sensor = thermistor
AUXTIN3: +111.0°C sensor = thermistor
PECI Agent 0: +28.0°C (high = +98.0°C, hyst = +95.0°C) (crit = +100.0°C)
PECI Agent 0 Calibration: +25.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +28.0°C (high = +80.0°C, crit = +100.0°C)
[root@daq03 ~]#
Supermicro X11SSH-F
- blacklist the mei and mei_me drivers per http://www.supermicro.com/support/faqs/faq.cfm?faq=14537
[root@alpha00 ~]# more /etc/modprobe.d/blacklist.conf
blacklist mei
blacklist mei_me
[root@alpha00 ~]#
- mobo requires an M.2 PCIe SSD (an M.2 SATA SSD does not work; a SATA SSD on a regular SATA port is ok)
- boot from M.2 PCIe SSD requires UEFI boot (from an MSDOS partition on the SSD)
ASUS TUF Z390M-PRO GAMING (WI-FI)
- BIOS 2417 is okay, upgrade to this if older
- do not set XMP memory mode
- in the BIOS, enable the boot compatibility support module mode: BIOS (press DEL) -> Advanced mode -> BOOT -> CSM Module -> Enable CSM "yes".
- for SL6, install e1000e driver from ELREPO:
yum install --enablerepo=elrepo kmod-e1000e
- sensors chip appears to be "Nuvoton NCT6798D"; it is not clear what driver to use
- dmidecode | grep -i nct reports: Nuvoton NCT6798D
- kmod-nct6775-0.0-5.el7_7.elrepo.x86_64.rpm from ELrepo finds the chip but bombs because of conflict with ACPI
ASUS PRIME X399-A
- BIOS 1002
- for reading temperatures and fan rotations, install driver: https://github.com/electrified/asus-wmi-sensors/issues/29
Configure X11 graphics
Special settings for DAQ
- add the following at the end of /etc/X11/xorg.conf. This enables Ctrl-Alt-KP-/ and Ctrl-Alt-KP-* to unlock the keyboard after an Altera Quartus crash:
Section "ServerFlags"
    Option "AllowDeactivateGrabs" "true"
    Option "AllowClosedownGrabs" "true"
EndSection
Install NVIDIA drivers
- yum --enablerepo=elrepo install nvidia-detect
- run: nvidia-detect
- as instructed by nvidia-detect, install correct driver:
- yum --enablerepo=elrepo install kmod-nvidia
- yum --enablerepo=elrepo install kmod-nvidia-304xx
- yum --enablerepo=elrepo install kmod-nvidia-173xx
- (before SL6.x: if it fails due to conflict with module-init-tools, run "yum --disablerepo \* --enablerepo elrepo update module-init-tools")
- yum erase xorg-x11-glamor ### see http://elrepo.org/tiki/kmod-nvidia (search for glamor)
- mv /etc/X11/xorg.conf /etc/X11/xorg.conf-xxx
- nvidia-xconfig
- (SL6) reboot
- (SL5) /dev/MAKEDEV nvidia
- (SL5) restart the X11 server (Ctrl-Alt-Backspace or "killall Xorg gdm-binary")
- observe that X11 server restarts using the NVIDIA driver (big NVIDIA logo on startup)
- if needed, login as root and run "nvidia-settings" to setup dual-screen configuration, etc
Install legacy NVIDIA drivers
For old NVIDIA cards:
- GeForce FX 5500
wget http://us.download.nvidia.com/XFree86/Linux-x86/173.14.31/NVIDIA-Linux-x86-173.14.31-pkg1.run
sh ./NVIDIA-Linux-x86-173.14.31-pkg1.run
- GeForce 6200 - NVIDIA Corporation NV44A [GeForce 6200]
yum install nvidia-x11-drv-304xx-304.121 --enablerepo=elrepo
nvidia-xconfig
rmmod nvidia
killall gdm-binary
# login as root
nvidia-settings ### to set up multiple displays
Install ATI/AMD drivers
- yum --enablerepo elrepo install kmod-fglrx fglrx-x11-drv
- check that /etc/X11/xorg.conf section "Device" entry "Driver" says "fglrx"
- run "aticonfig --initial" to create xorg.conf if existing one is not good
- run "amdcccle" as root to configure dual-screens, etc
Note: 'amdcccle' is a GUI, so you must run this command from within a running X session
- killall Xorg
Install ATI/AMD drivers (CentOS7)
- wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
- wget http://elrepo.org/linux/testing/el7/x86_64/RPMS/kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm
- yum install acpid
- rpm -vh --install kmod-fglrx-15.12-3.el7.elrepo.x86_64.rpm fglrx-x11-drv-15.12-3.el7.elrepo.x86_64.rpm
- amdconfig -f --initial
- grub2-mkconfig -o /boot/grub2/grub.cfg
- reboot
- login as root
- amdcccle
NOTE: if both the radeon and fglrx drivers are loaded, boot will hang. The radeon driver is supposed to be blacklisted by the grub "rdblacklist=radeon" entry, which is installed by running grub2-mkconfig.
Install Intel drivers for HD4600/Z87
SL6.5 has the required drivers for the socket 1150 machines with Intel HD4600 graphics and Z87 chipset.
The ASUS Z87 WS motherboard has these video connections with the corresponding Intel video port assignments, as reported by "xrandr":
- DisplayPort - DP1/HDMI1
- MiniDisplayPort - DP2/HDMI2
- HDMI - HDMI3
Due to hardware limitations, 3 HDMI monitors using 2 passive DP-HDMI adapters (and 1 straight HDMI) cannot be used.
To use 3 monitors do this:
- 1st monitor: DisplayPort - DP-to-HDMI-passive-adapter - HDMI monitor (not tried: DP-to-DP-cable - DisplayPort monitor).
- 2nd monitor: MiniDisplayPort - MiniDP-to-DP-cable - DisplayPort monitor
- 3rd monitor: HDMI - HDMI-cable - HDMI monitor
With the monitors I have (Dell 1920x1200 VGA-HDMI-DP), the software thinks that there are 4 monitors: somehow both DP2 and HDMI2 see 1 monitor each, but the hardware cannot drive 4 monitors, so everything goes blank. To fix, disable HDMI2 (xrandr -display :0 --output HDMI2 --off) and enable DP2 (xrandr -display :0 --output DP2 --auto).
How to make this configuration permanent and how to assign monitor locations (left-right, etc), you figure it out.
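To see which outputs the software currently believes are connected (for example, to catch the phantom HDMI2 described above), the "xrandr" listing can be filtered. A minimal sketch, assuming the usual xrandr output where each output line starts with the output name followed by "connected" or "disconnected"; the function name connected_outputs is hypothetical:

```shell
# connected_outputs
# Read "xrandr" output on stdin and print the name of every output
# that is reported as connected, one per line.
# (hypothetical helper, not part of the original instructions)
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Intended use, from within the X session or with an explicit display:
#   xrandr -display :0 | connected_outputs
```

If this lists both DP2 and HDMI2 while only one monitor is attached to the MiniDisplayPort, the xrandr commands given above apply.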
Manual selection of monitor, video mode and resolution
Automatic selection of monitor and video mode usually works. When it does not, configure it manually:
- physically go to the computer
- login as root
- run "nvidia-settings" on machines using the NVIDIA driver
- run "aticonfig" on machines with the ATI/AMD driver (use "aticonfig --initial" for initial setup, and good luck with anything more complicated)
- run "system-config-display".
- In the "hardware" tab, select monitor type: "generic LCD 1280x1024" or "generic LCD 1600x1200".
- In the "settings" tab, select "1280x1024" or "1600x1200" and "Thousands of colors".
- Press "ok", the display settings application should close.
- Logout, the new login window should use the new settings.
Disable screen saver
If the machine is booted without any monitor connected, current video cards do not enable any video outputs. If a monitor is connected later, there is no video image and there is no easy way to get one.
This can be solved by configuring X11 to always enable some video output. Because the monitor type is not known when X11 starts, one has to select some standard video mode (i.e. VESA 1280x1024) on some video output (VGA, DVI or HDMI).
Only NVIDIA cards with the NVIDIA driver (from EPEL) are supported by these instructions.
- create default xorg.conf: nvidia-xconfig
- edit /etc/X11/xorg.conf
- add monitor section for the fake monitor:
Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "Unknown"
    HorizSync 31.0 - 83.0
    VertRefresh 59.0 - 61.0
    Option "DPMS" "off"
    ModeLine "1280x1024" 108.00 1280 1328 1440 1688 1024 1025 1028 1066 +hsync +vsync
EndSection
- add output selection in the "Device" section:
Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce 210"
    #Option "ConnectedMonitor" "DFP"
    #Option "ConnectedMonitor" "CRT"
    Option "ConnectedMonitor" "CRT-1"
    Option "UseEDID" "no"
EndSection
- add fake video mode to the "Screen" section:
Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    SubSection "Display"
        Depth 24
        Modes "1280x1024"
    EndSubSection
EndSection
- disable screen saver and DPMS power off in the "ServerLayout" or "ServerFlags" section:
Section "ServerLayout"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option "Xinerama" "0"
    Option "BlankTime" "0"
    Option "StandbyTime" "0"
    Option "SuspendTime" "0"
    Option "OffTime" "0"
EndSection

Section "ServerFlags"
    Option "BlankTime" "0"
    Option "StandbyTime" "0"
    Option "SuspendTime" "0"
    Option "OffTime" "0"
EndSection
Finish installation
- logout and reboot the computer to have all the changes to take effect
Configure HTTPS server (CentOS7)
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache httpd.
First, configure apache httpd:
- execute these commands:
yum install -y mod_ssl certwatch crypto-utils
cd /etc/httpd/conf.d/
mv ssl.conf ssl.conf-not-used ### remove the stock ssl.conf which refers to the localhost certificate that will expire in 1 year
touch ssl.conf ### create a blank file to prevent automatic updates from installing a stock ssl.conf file
# this is done later: rm /etc/pki/tls/certs/localhost.crt
- create new file ssl-daq12.conf # use actual hostname instead of daq12
Listen 443 https
#SSLPassPhraseDialog exec:/usr/libexec/httpd-ssl-pass-dialog
SSLSessionCache shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout 300
SSLRandomSeed startup file:/dev/urandom 256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin

<VirtualHost *:443>
    ServerName daq12.triumf.ca
    DocumentRoot /var/www/html
    ErrorLog /var/log/httpd/daq12.log
    SSLEngine on

    # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf

    # new SSL settings: K.O. Jan 2020, SSLlabs rating "A+"
    SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
    SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4:!RSA
    SSLHonorCipherOrder on

    # previous SSL settings:
    #SSLProtocol all -SSLv2 -SSLv3
    #SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4

    SSLCertificateFile /etc/pki/tls/certs/localhost.crt
    SSLCertificateKeyFile /etc/pki/tls/private/localhost.key
    #SSLCertificateChainFile /etc/pki/tls/certs/server-chain.crt

    #ProxyPass /elog/ http://localhost:8082/ retry=1
    #ProxyPass / http://localhost:8080/ retry=1

    Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"

    <Location />
        SSLRequireSSL
        AuthType Basic
        AuthName "DAQ password protected site"
        Require valid-user
        # create password file: touch /etc/httpd/htpasswd
        # to add new user or change password: htpasswd /etc/httpd/htpasswd username
        AuthUserFile /etc/httpd/htpasswd
    </Location>
</VirtualHost>
- stop httpd from listening on port 80: edit /etc/httpd/conf/httpd.conf, comment-out the line "Listen 80"
- enable and start httpd:
systemctl enable httpd
systemctl restart httpd
systemctl status httpd
- try to access https://daq12.triumf.ca
- you should see a complaint about self-signed certificate
- you should see a request for password (do not login yet)
- if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, then try again:
firewall-cmd --add-port=443/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
Second, configure certbot:
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
- check that port 80 is not used by anything:
- netstat -an | grep LISTEN | grep ^tcp | grep 80
- lsof -P | grep -i tcp | grep LISTEN | grep 80
- if lsof reports that httpd is listening on port 80, follow the httpd instructions above (comment out "Listen 80" in httpd.conf)
- install certbot and open tcp port 80 in the firewall:
yum install -y certbot python2-certbot-apache # (from EPEL)
firewall-cmd --add-port=80/tcp --permanent
firewall-cmd --reload
firewall-cmd --list-all
- certbot certonly --standalone --installer apache # then answer questions:
- "activate HTTPS for daq12.triumf.ca" - say ok
- "enter email address" - enter your own email address
- "please read terms..." - read the terms and say "agree"
- it will take a few moments...
- "please choose..." - say "easy" (http access is disabled (a) by the firewall, (b) by local configuration)
- "congratulations..." - say ok.
- certbot install --apache --cert-name daq12.triumf.ca # then answer questions:
- "choose redirect..." - say "1" (no redirect)
- look inside ssl-daq12.conf to see that SSLCertificateFile & co point to certbot certificates in /etc/letsencrypt/live/daq12.triumf.ca/
- remove self-signed localhost certificate, it will expire in 1 year and cause warnings and complaints: rm /etc/pki/tls/certs/localhost.crt
- enable automatic renewal
systemctl enable certbot-renew.timer
systemctl start certbot-renew.timer
systemctl list-timers --all
- to check for correct renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
certbot renew --standalone --installer apache --force-renewal
NOTE: this certificate will expire in 3 months, automatic renewal should work starting with certbot-0.12.0-4.el7.noarch. Certificate expiration should be automatically detected by "certwatch" and email will be sent to local root user, to be forwarded to an actual person by ~root/.forward.
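The port 80 check in the certbot steps above uses a plain "grep 80", which also matches port 8080 or any address that happens to contain "80". A minimal sketch of a stricter filter, assuming Linux "netstat -an" output where the fourth column is the local address and the last column is the socket state; the function name port_listeners is hypothetical:

```shell
# port_listeners PORT
# Read "netstat -an" output on stdin and print only the TCP lines in
# LISTEN state whose local address ends in exactly ":PORT".
# (hypothetical helper, not part of the original instructions)
port_listeners() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $NF == "LISTEN" {
            # local address looks like 0.0.0.0:80 or :::80, the port is the last field
            n = split($4, parts, ":")
            if (parts[n] == port) print
        }'
}

# Intended use:
#   netstat -an | port_listeners 80
```

No output means nothing is listening on that port, which is what certbot in standalone mode needs.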
Third, activate password protection:
- as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/httpd/htpasswd
htpasswd /etc/httpd/htpasswd midas
Final test:
- access https://daq12.triumf.ca - https status should be "green"
- login with password should work
- the apache httpd test page should load
- check site security using the SSLlabs https tester. (I get grade "A-"): https://www.ssllabs.com/ssltest/
From here:
- Configure selinux to allow proxying
setsebool -P httpd_can_network_connect 1
systemctl restart httpd
- enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
- enable proxy for ELOG - ditto
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0
Configure large RAID6 arrays
- connect the disks
- check the disks health
- run smart-status.perl
- partition the disks
- yum install gdisk
- gdisk /dev/sdX
- delete all partitions: o
- create new partition: n, enter, enter, enter, fd00 (default sizes, partition type fd00)
- write and exit: w
- check presence of all partitions:
- /bin/ls -l /dev/sd*1
- prepare to use an external bitmap file
- touch /md6bitmap
- edit /etc/fstab, change entry for root filesystem from: "defaults 1 1" to "defaults 0 0"
- edit /boot/grub/grub.conf, change entry "kernel ... ro ..." to "kernel ... rw ..."
- create raid array:
- mdadm --create /dev/md6 --level=6 --bitmap=/md6bitmap --raid-devices=10 /dev/sd[b-k]1
- mdadm -Ds >> /etc/mdadm.conf
- cleanup /etc/mdadm.conf
- echo "echo 16384 > /sys/block/md6/md/stripe_cache_size" >> /etc/rc.local
- echo "echo 1 > /sys/block/md6/md/sync_speed_min" >> /etc/rc.local
- source /etc/rc.local
- observe raid array rebuild:
- watch -d -n1 "cat /proc/mdstat"
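For planning, the usable size of the array follows from the RAID6 layout: two devices' worth of space holds parity, so N equal disks give (N - 2) disks of capacity. A minimal sketch of that arithmetic; the function name raid6_usable and the 4 TB disk size in the usage comment are hypothetical (the instructions above do not state a disk size):

```shell
# raid6_usable NDISKS DISKSIZE
# Print the usable capacity of a RAID6 array of NDISKS equal disks,
# in the same unit as DISKSIZE. RAID6 needs at least 4 devices.
# (hypothetical helper, not part of the original instructions)
raid6_usable() {
    if [ "$1" -lt 4 ]; then
        echo "RAID6 needs at least 4 devices" >&2
        return 1
    fi
    echo $(( ($1 - 2) * $2 ))
}

# Example: the 10-device array created above, assuming 4 TB disks:
#   raid6_usable 10 4
```

The filesystem created on /dev/md6 will report slightly less than this because of metadata overhead.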
Configure ZFS
Install ZFS
(from here: https://github.com/zfsonlinux/zfs/wiki/RHEL-%26-CentOS)
Follow the instructions for "kABI-tracking kmod" - dkms modules seem to always mess up the system when upgrading to next release of zfs.
#rpm -vh --install http://archive.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_3.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_4.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_5.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_6.noarch.rpm
#yum install http://download.zfsonlinux.org/epel/zfs-release.el7_7.noarch.rpm
yum install http://download.zfsonlinux.org/epel/zfs-release.el7_9.noarch.rpm
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
yum --enablerepo=zfs-kmod clean all
yum --enablerepo=zfs-kmod install zfs
#sed 's/^SELINUX=.*/SELINUX=disabled/' -i /etc/selinux/config
echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
#systemctl enable zfs-import-cache
#systemctl enable zfs-mount
#systemctl enable zfs-share
#systemctl enable zfs-zed
#shutdown -r now # required to load the zfs kernel modules and to disable selinux
modprobe zfs # should work
zpool status # should report no pools available
- Note: zfs and selinux are not compatible: with selinux enabled, files on zfs cannot be deleted (files are gone, but "df" does not go down, zfs-0.6.5.7-1.el7.centos.x86_64), see https://github.com/zfsonlinux/zfs/issues/4845
- http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/zfs-quickstart.html
- http://www.freebsd.org/cgi/man.cgi?query=zpool&sektion=8
If ZFS kernel module does not load automatically at boot time, add this to load it manually:
ls -l /etc/sysconfig/modules/
cat > /etc/sysconfig/modules/zfs.modules <<EOF
if [ ! -e /sys/module/zfs ] ; then modprobe zfs; fi
EOF
chmod +x /etc/sysconfig/modules/zfs.modules
Update ZFS (CentOS-7.9)
- update CentOS-7.x to latest point release
- reboot to latest kernel
- check that currently installed ZFS is 0.8.x (not 0.7 or older)
- then update ZFS:
[root@daq16 ~]# zfs version
zfs-0.8.4-1
zfs-kmod-0.8.4-1
[root@daq16 ~]# yum --enablerepo=zfs-kmod update
...
[root@daq16 ~]# zfs version ### observe mismatched version numbers: 0.8.5 userspace vs 0.8.4 kernel module
zfs-0.8.5-1
zfs-kmod-0.8.4-1
- reboot to activate the updated kernel module
- zfs version again
[root@daq16 ~]# zpool version
zfs-0.8.5-1
zfs-kmod-0.8.5-1
- zpool status in case some ZFS volume needs to be updated
[root@daq16 ~]# zpool status
pool: z12tb
state: ONLINE
...
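The userspace versus kernel module mismatch shown in the update steps above can be checked mechanically. A minimal sketch, assuming the two-line "zfs version" output format shown in the transcripts (a "zfs-..." line and a "zfs-kmod-..." line); the function name zfs_versions_match is hypothetical:

```shell
# zfs_versions_match
# Read "zfs version" output on stdin. Succeed (exit 0) only if the
# userspace version and the kernel module version are identical.
# (hypothetical helper, not part of the original instructions)
zfs_versions_match() {
    awk '
        /^zfs-kmod-/ { sub(/^zfs-kmod-/, ""); kmod = $0; next }
        /^zfs-/      { sub(/^zfs-/, "");      user = $0 }
        END { exit !(user != "" && user == kmod) }'
}

# Intended use:
#   zfs version | zfs_versions_match || echo "reboot needed to load the new zfs module"
```

This only compares the two strings; it does not tell which of the two is newer.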
Update ZFS 0.7 to 0.8
How to identify zfs 0.7: "zfs version" does not work; also check "rpm -q zfs".
zfs 0.7 is obsolete.
To update to zfs 0.8 or newer, remove 0.7, then install the new version per the instructions above.
- remove zfs 0.7
yum versionlock delete zfs ### versionlock not needed anymore
yum versionlock delete kernel ### versionlock not needed anymore
rm /etc/yum.repos.d/zfs.repo* ### delete old repo files
yum erase zfs spl
- reboot
- install new zfs per instructions above
- zpool import -as
- zpool status ### check if any pool needs to be upgraded
- zpool upgrade zssd ### upgrade zfs pool features
Lock kernel and zfs packages
!!! THIS IS NOT NEEDED ANYMORE !!!
yum versionlock kernel
yum versionlock zfs
yum-config-manager --disable zfs
yum-config-manager --disable zfs-kmod
Follow generic ZFS instructions
Here: ZFS
performance notes
Go here: disk_benchmarks
Configure UEFI boot
Some mobo can boot from NVME (PCIe) SSDs only via UEFI boot. Do this:
- partition the NVME SSD using gdisk (must be GPT partition table, must have MSDOS EFI partition size 512MiB)
[root@alpha00 ~]# gdisk -l /dev/nvme0n1
GPT fdisk (gdisk) version 0.8.6
...
Found valid GPT with protective MBR; using GPT.
Disk /dev/nvme0n1: 500118192 sectors, 238.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 1A82CC87-2757-44ED-980F-C78E3681D9D3
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 500118158
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)  End (sector)  Size       Code  Name
1       2048            1050623       512.0 MiB  EF00  EFI System
2       1050624         500118158     238.0 GiB  8300  Linux filesystem
[root@alpha00 ~]#
- create filesystems
mkfs.msdos /dev/nvme0n1p1
mkfs.xfs /dev/nvme0n1p2
- prepare EFI partition
mkdir /mnt/efi
mount /dev/nvme0n1p1 /mnt/efi
mkdir -p /mnt/efi/efi/boot
cd /mnt/efi/efi/boot
# with Ubuntu LTS 20.04
cp /boot/vmlinuz vmlinuz # copy the desired linux kernel
#cp /boot/initramfs initramfs.img # copy the matching initramfs file
cp /boot/initrd.img initrd.img # copy the matching initrd file
#from /home/olchansk/sysadm/syslinux/syslinux-6.03 copy
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/efi/syslinux.efi .
cp /home/olchansk/sysadm/syslinux/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
- create syslinux config file: syslinux.cfg
default linux
label linux
kernel vmlinuz
append ro root=/dev/nvme0n1p2 nomodeset initrd=initrd.img
- prepare system partition
mkdir /mnt/tmp
mount /dev/nvme0n1p2 /mnt/tmp
rsync -avx / /mnt/tmp
cd /mnt/tmp
#edit etc/fstab
#edit etc/sysconfig/selinux # set selinux to permissive mode because rsync did not copy the selinux labels
- unmount and reboot
- restore selinux labels after first boot
#login as root
cd /
restorecon -R / # can also add "-v" to see progress, but runs much slower
#edit /etc/sysconfig/selinux # enable selinux
#shutdown -r now # reboot with selinux enabled
Configure UEFI secure boot
The above instructions do not quite work if "secure boot" is enabled.
These modifications are needed:
- ls -l /boot/efi/EFI/bootko/
total 140116
-rwxr-xr-x 1 root root      108 Feb 24 15:47 BOOTX64.CSV
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 bootx64.efi
-rwxr-xr-x 1 root root   217495 Feb 24 16:16 config-4.15.0-74-generic
-rwxr-xr-x 1 root root      105 Feb 24 15:47 grub.cfg
-rwxr-xr-x 1 root root   199952 Feb 24 16:16 grubx64.efi
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initramfs.img
-rwxr-xr-x 1 root root 58986147 Feb 24 16:16 initrd.img-4.15.0-74-generic
-rwxr-xr-x 1 root root   139968 Feb 24 16:16 ldlinux.e64
-rwxr-xr-x 1 root root  1269496 Feb 24 15:47 mmx64.efi
-rwxr-xr-x 1 root root  1334816 Feb 24 16:16 shimx64.efi
-rwxr-xr-x 1 root root      171 Feb 24 16:16 syslinux.cfg
-rwxr-xr-x 1 root root      102 Feb 24 16:16 syslinux.cfg~
-rwxr-xr-x 1 root root   199952 Feb 24 16:16 syslinux.efi
-rwxr-xr-x 1 root root  4068355 Feb 24 16:16 System.map-4.15.0-74-generic
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz
-rwxr-xr-x 1 root root  8367768 Feb 24 16:16 vmlinuz-4.15.0-74-generic
- shimx64.efi is a copy from /boot/efi/EFI/ubuntu
- bootx64.efi is a copy of shimx64.efi (maybe not needed?)
- grubx64.efi is a copy of syslinux.efi
- efibootmgr -c -d /dev/nvme0n1 -p 2 -w -L bootko -l '\EFI\bootko\shimx64.efi'
- efibootmgr -v
root@daqubuntu:~# efibootmgr -v
BootCurrent: 0000
Timeout: 1 seconds
BootOrder: 0000,0001,0002
Boot0000* bootko HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\BOOTKO\SHIMX64.EFI)
Boot0001* Hard Drive BBS(HD,,0x0)..GO..NO........y.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7....................A.......................................<..Gd-.;.A..MQ..L.I.N.T.E.L. .S.S.D.P.E.K.K.W.1.2.8.G.7........BO
Boot0002* ubuntu HD(2,GPT,5d1cac95-29dd-4d8a-a56e-a8f414dd4047,0x800,0x100000)/File(\EFI\UBUNTU\SHIMX64.EFI)..BO
root@daqubuntu:~#
- NOTE: if, after running "efibootmgr -c", the UUID is zero, then it probably did not take and the entry will vanish after reboot. In my case the mistake was to use "-p 1" instead of "-p 2".
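The zero-UUID check from the note above can be scripted by pulling the partition GUID out of the "efibootmgr -v" listing. A minimal sketch, assuming entry lines of the form shown above ("BootNNNN* label HD(part,GPT,guid,...)") and a label without spaces; the function name boot_entry_guid is hypothetical:

```shell
# boot_entry_guid LABEL
# Read "efibootmgr -v" output on stdin and print the partition GUID from
# the HD(...) device path of the boot entry named LABEL.
# Labels containing spaces (e.g. "Hard Drive") are not handled.
# (hypothetical helper, not part of the original instructions)
boot_entry_guid() {
    awk -v label="$1" '
        $1 ~ /^Boot[0-9A-Fa-f]+\*?$/ && $2 == label {
            if (match($0, /HD\([^)]*\)/)) {
                # HD(2,GPT,<guid>,0x800,0x100000): the GUID is the third field
                split(substr($0, RSTART, RLENGTH), f, ",")
                print f[3]
            }
        }'
}

# Intended use:
#   guid=$(efibootmgr -v | boot_entry_guid bootko)
#   [ "$guid" = "00000000-0000-0000-0000-000000000000" ] && echo "entry did not take"
```

An empty result means no entry with that label was found at all.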
Boot sequence is this:
- shimx64.efi - Microsoft-signed boot loader is accepted by secure boot, loads and runs
- shimx64.efi loads and runs grubx64.efi, this file name is hardwired into the signed shim, cannot be changed
- grubx64.efi is syslinux.efi (could be anything)
- syslinux.efi runs, loads syslinux.cfg, loads the linux kernel, loads the initrd, runs the linux kernel with specified flags (ro root=...).
UEFI syslinux kernel update
To update the linux kernel booted by UEFI syslinux, use this script:
- ~root/git/scripts/etc/update_efi.perl
Update SL6 ssh
WARNING!!! the original instructions used openssh 9.1, which is vulnerable to CVE-2024-6387
WARNING!!! these updated instructions use OpenSSH_9.8. K.O. 3jul2024
WARNING!!! see https://www.openssh.com/releasenotes.html
Stock SL6 ssh is now very old and, by default, cannot connect to current Ubuntu and MacOS sshd. Conversely, their ssh cannot connect to SL6 sshd.
Workaround is to manually enable SL6-compatible settings
root@daq00:~# ssh -oHostKeyAlgorithms=+ssh-rsa -oPubKeyAcceptedAlgorithms=+ssh-rsa ladd00
Solution is to install newer ssh on affected SL6 machines:
Install OpenSSH_9.8p1 per CVE-2024-6387
ssh root@sl6-machine
cd /opt
git clone https://daq00.triumf.ca/~olchansk/git/openssh.git
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
/bin/cp -pv /etc/ssh/*key* /opt/openssh/etc/ ### copy old ssh host keys
/opt/openssh/bin/ssh-keygen -A ### generate any missing ssh host keys
# test sshd
/opt/openssh/sbin/sshd -p 2222 -d
/bin/mv /usr/sbin/sshd /usr/sbin/sshd-SL6
/bin/ln -s /opt/openssh/sbin/sshd /usr/sbin/
/bin/mv /usr/bin/ssh /usr/bin/ssh-SL6
/bin/ln -s /opt/openssh/bin/ssh /usr/bin/
service sshd restart
Update openssh from 9.1 to OpenSSH_9.8p1 per CVE-2024-6387
Check for old version:
[root@muon openssh]# telnet localhost 22
SSH-2.0-OpenSSH_9.1
Update:
cd /opt/openssh
git pull
ln -s /opt/openssh/lib64/libcrypto.so.1.1 /usr/lib64/
service sshd restart
Check for new version:
telnet localhost 22
SSH-2.0-OpenSSH_9.8
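The banner check above can be turned into a yes/no test when many machines need to be surveyed. A minimal sketch, assuming the banner has the form "SSH-2.0-OpenSSH_MAJOR.MINOR..." as in the transcripts and taking 9.8 as the first fixed version, as these instructions do; the function name openssh_banner_ok is hypothetical:

```shell
# openssh_banner_ok BANNER
# Succeed (exit 0) if BANNER reports OpenSSH 9.8 or newer, fail (exit 1)
# if it reports an older OpenSSH, exit 2 if the banner cannot be parsed.
# (hypothetical helper, not part of the original instructions)
openssh_banner_ok() {
    ob_ver=${1##*OpenSSH_}            # e.g. "9.8" or "9.8p1 Ubuntu-3"
    ob_major=${ob_ver%%.*}            # text before the first dot
    ob_minor=${ob_ver#*.}             # text after the first dot
    ob_minor=${ob_minor%%[!0-9]*}     # keep only the leading digits
    case "$ob_major" in ''|*[!0-9]*) return 2 ;; esac
    case "$ob_minor" in ''|*[!0-9]*) return 2 ;; esac
    [ "$ob_major" -gt 9 ] || { [ "$ob_major" -eq 9 ] && [ "$ob_minor" -ge 8 ]; }
}

# Intended use, with the banner line copied from "telnet localhost 22":
#   openssh_banner_ok "SSH-2.0-OpenSSH_9.1" || echo "sshd needs the update"
```

The stock SL6 sshd reports a much older version and fails this check too, which is expected: those machines need the install steps above rather than the update.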
Build openssh
ssh sl6-machine
cd git
git clone git://anongit.mindrot.org/openssh.git
cd openssh
autoreconf
xemacs -nw ./configure ### fix syntax error: line 28124 empty "if/then/else" block bombs out, fill it with "AAA=aaa"
./configure --prefix=/opt/openssh
make -j
Install openssh:
ssh root@sl6-machine
cd .../git/openssh
make install ### copies stuff to /opt/openssh
/opt/openssh/sbin/sshd -p 2222 -d ### test sshd
/opt/openssh/bin/ssh -v sl6-machine ### test ssh
Update for CVE-2024-6387:
- cd .../git/openssh
- git pull
- git checkout V_9_8_P1
- ./configure --prefix=/opt/openssh --with-ssl-dir=/opt/openssl
- make ### no go, wants openssl-1.1.1
- cd .../git/
- git clone https://github.com/openssl/openssl.git
- cd openssl
- git checkout OpenSSL_1_1_1w
- configure with prefix --prefix=/opt/openssl
- make, install to /opt/openssl
- cd .../openssh
- configure, build, does not find openssl libraries in /opt (they forgot to set RPATH for user-specified location of openssl)
- LD_LIBRARY_PATH=/opt/openssl/lib, try again, now builds and installs
- but sshd does not run, does not find libcrypto.so.1.1
- needs ln -s .../lib/libcrypto.so.1.1 /usr/lib64, now sshd finds it, everything works.
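The "does not find libcrypto.so.1.1" failure in the steps above can be spotted before starting sshd by looking at the "ldd" listing, which marks unresolved libraries with "=> not found". A minimal sketch that extracts those names; the function name missing_libs is hypothetical:

```shell
# missing_libs
# Read "ldd" output on stdin and print the name of every shared library
# that the dynamic linker could not resolve.
# (hypothetical helper, not part of the original instructions)
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Intended use:
#   ldd /opt/openssh/sbin/sshd | missing_libs
```

No output means every library was resolved with the current library search path.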