Ubuntu: Difference between revisions

From DaqWiki
 
(346 intermediate revisions by 6 users not shown)
Line 1: Line 1:
= About Ubuntu =
= Prerequisites =
 
AAA
 


* before setting up new machine run memory test
* prepare flash drive with free version of memtest86: https://www.memtest86.com
* boot from the flash drive and run the test, it takes a few hours
* test will end with summary page, if passed continue with Ubuntu
* number that might be worth noting is memory latency


= Ubuntu version =
= Ubuntu version =
Line 14: Line 16:
= Ubuntu installer =
= Ubuntu installer =


* updated for Ubuntu LTS 20.04.01, 22.04.1
* updated for Ubuntu LTS 20.04.01, 22.04.1, 24.04 (only minor differences)


* download the latest Ubuntu LTS desktop installer iso image
* download the latest Ubuntu LTS desktop installer iso image
Line 22: Line 24:
* if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
* if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
* power up
* power up
* boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F8 for ASUS, F11 for Supermicro)
* boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F2 or F8 for ASUS, F11 for Supermicro)
* follow the instructions:
* follow the instructions:
* "try ubuntu or install ubuntu" - choose "install"
* "try ubuntu or install ubuntu" - choose "install"
Line 31: Line 33:
* "where are you?" - select "Vancouver" (PST time zone)
* "where are you?" - select "Vancouver" (PST time zone)
* "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
* "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
* do not install third-party software
* installation runs in a few minutes, when finished, reboot
* installation runs in a few minutes, when finished, reboot
* login as user wheel
* login as user wheel
Line 58: Line 61:
== install git/scripts ==
== install git/scripts ==
<pre>
<pre>
apt -y install git
mkdir ~root/git
mkdir ~root/git
cd ~root/git
cd ~root/git
Line 64: Line 68:
git pull
git pull
</pre>
</pre>
* if needed, update git/scripts repository from ladd00 to daq00:
* git remote -v ### if it says daq00, we are good
* git remote set-url origin https://daq00.triumf.ca/~olchansk/git/scripts.git
* git pull ### check that it works


== configure hostname ==
== configure hostname ==
Line 76: Line 85:
<pre>
<pre>
vi /etc/fstab ### comment out the "swap" line
vi /etc/fstab ### comment out the "swap" line
</pre>
* on 64 GB RAM machines swap is not useful
* on machines booted from network (NFS-ROOT), swap does not work
* on machines running from flash (RPi, etc), flash is too slow for useful swap
* swap configured by linux installers invariably has wrong size and is not useful
<pre>
systemctl disable dphys-swapfile
systemctl stop dphys-swapfile
dphys-swapfile uninstall
</pre>
</pre>
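The manual "vi /etc/fstab" step could also be scripted. The following is only a sketch: comment_out_swap is a hypothetical helper name (not part of git/scripts), and it assumes the swap entry has "swap" in the third fstab field. It prints the modified file to stdout, so the result can be reviewed before it replaces /etc/fstab.

```shell
# Sketch: print an fstab with every active swap entry commented out.
# comment_out_swap is a hypothetical helper, not part of git/scripts.
comment_out_swap() {
    # Leave already commented lines alone; prefix "#" to lines whose
    # third field (the filesystem type) is "swap".
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Example use (review the output before copying it over /etc/fstab):
#   comment_out_swap /etc/fstab > /tmp/fstab.new
```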


Line 98: Line 118:


== install time synchronization ==
== install time synchronization ==
check if chrony is installed correctly and is synched to TRIUMF time servers:
<pre>
chronyc sources
chronyc tracking
</pre>
if not, remove old chrony and
<pre>
apt -y remove chrony
apt -y purge chrony
</pre>
and install it from scratch:
<pre>
<pre>
apt -y install chrony
apt -y install chrony
#echo server time1.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time2.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time3.triumf.ca iburst >> /etc/chrony/chrony.conf
cd ~/git/scripts
cd ~/git/scripts
git pull
git pull
Line 116: Line 150:
chronyc tracking
chronyc tracking
</pre>
</pre>
NOTE1: if time1, time2, time3 are already listed in /etc/chrony/chrony.conf, please remove them and restart chrony.
NOTE2: if time1, time2, time3 are not listed in "chronyc tracking" or if they are not selected by "chronyc tracking", check that /etc/chrony/chrony.conf contains "sourcedir /etc/chrony/sources.d". old versions of this file may not have it.
NOTE3: read https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
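The sourcedir check from NOTE2 can be done without opening the file. This is a sketch under assumptions: has_chrony_sourcedir is a hypothetical helper name, and the default path is the stock Ubuntu location of chrony.conf.

```shell
# Sketch: succeed if chrony.conf has an active "sourcedir /etc/chrony/sources.d" line.
# has_chrony_sourcedir is a hypothetical helper, not part of git/scripts.
has_chrony_sourcedir() {
    conf=${1:-/etc/chrony/chrony.conf}
    # Commented-out lines start with "#" and therefore do not match.
    grep -Eq '^[[:space:]]*sourcedir[[:space:]]+/etc/chrony/sources\.d' "$conf"
}

# Example use:
#   has_chrony_sourcedir || echo "sourcedir line is missing"
```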


== reenable systemd-timesyncd ==
== reenable systemd-timesyncd ==
Line 155: Line 183:
<pre>
<pre>
cd ~
cd ~
apt -y install mailutils msmtp msmtp-mta # say "no" to apparmor support
apt -y remove postfix
apt -y remove postfix
apt -y purge postfix # remove old config files
apt -y purge postfix # remove old config files
apt -y install mailutils msmtp msmtp-mta # say "no" to apparmor support
apt -y install bsd-mailx
apt -y install bsd-mailx
cd ~/git/scripts/etc
cd ~/git/scripts/etc
Line 185: Line 213:
</pre>
</pre>
<pre>
<pre>
echo olchansk@triumf.ca lindner@triumf.ca bsmith@triumf.ca >> ~root/.forward
echo olchansk@triumf.ca lindner@triumf.ca bsmith@triumf.ca dfujimoto@triumf.ca >> ~root/.forward
mailx root
mailx root
test
test
Line 214: Line 242:
</pre>
</pre>


== install missing packages ==
If on boot apparmor appears to still be confining apps (for example, mysql), edit the file /etc/default/grub and change
 
(apt eats terminal input, even the "yes |" trick does not quite work,
repeat the following commands until they report that everything
is installed)


<pre>
<pre>
yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools sysstat smartmontools lm-sensors traceroute time minicom screen git lsof debsums tmux iptables
GRUB_CMDLINE_LINUX=""
</pre>
 
to
 
<pre>
GRUB_CMDLINE_LINUX="apparmor=0"
</pre>
 
Run sudo update-grub and reboot.
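The same edit can be made with sed instead of an editor. This sketch only handles the case shown above, where GRUB_CMDLINE_LINUX is empty; if the line already has other options, edit it by hand. set_apparmor_off is a hypothetical helper name, and it prints the result to stdout rather than changing the file.

```shell
# Sketch: print /etc/default/grub with apparmor disabled on the kernel command line.
# set_apparmor_off is a hypothetical helper, not part of git/scripts.
set_apparmor_off() {
    # Only the exact empty setting is rewritten; GRUB_CMDLINE_LINUX_DEFAULT is untouched.
    sed 's/^GRUB_CMDLINE_LINUX=""$/GRUB_CMDLINE_LINUX="apparmor=0"/' "$1"
}

# Example use (review the output, then install it and run update-grub):
#   set_apparmor_off /etc/default/grub > /tmp/grub.new
```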
 
== install missing packages ==
 
(apt eats terminal input, even the "yes |" trick does not quite work,
repeat the following commands until they report that everything
is installed)
 
<pre>
yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools traceroute time minicom screen git lsof debsums tmux iptables telnet pax rpm mtools at gdisk tcpdump
yes | apt -y install sysstat smartmontools lm-sensors
yes | apt -y install lsb-release
yes | apt -y install lsb-release
yes | apt -y install flex bison
apt -y install vim # in addition to default vim-tiny, requested by IRIS
yes | apt -y install neofetch
apt -y install gedit # requested by TACTIC
yes | apt -y install snmp snmp-mibs-downloader
apt -y install tcl
yes | apt -y install git subversion g++ gfortran cmake doxygen
apt -y install mc # requested by sol
yes | apt -y install curl libcurl4 libcurl4-openssl-dev
apt -y install pax rpm alien ### package converter tools
yes | apt -y install mariadb-client ### mysql client
apt -y install flex bison
yes | apt -y install libz-dev sqlite3 libsqlite3-dev unixodbc-dev
apt -y install neofetch
apt -y install snmp snmp-mibs-downloader
apt -y install git subversion g++ gfortran cmake doxygen
apt -y install curl libcurl4 libcurl4-openssl-dev
### conflicts with mysql packages ### apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install mysql-client libmysqlclient-dev
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS
yes | apt -y install libz-dev libzstd-dev sqlite3 libsqlite3-dev unixodbc-dev
yes | apt -y install libssl-dev
yes | apt -y install libssl-dev
yes | apt -y install emacs xemacs21 joe
yes | apt -y install emacs xemacs21 joe
yes | apt -y install gnuplot dos2unix
yes | apt -y install gnuplot dos2unix
yes | apt -y install mutt bsd-mailx # email clients
yes | apt -y install mutt bsd-mailx # email clients
yes | apt -y install liblz4-tool pbzip2
yes | apt -y install liblz4-tool pbzip2 libbz2-dev
yes | apt -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt -y install libreadline-dev
yes | apt -y install libreadline-dev
Line 240: Line 290:
yes | apt -y install libmotif-dev libxmu-dev
yes | apt -y install libmotif-dev libxmu-dev
yes | apt -y install libusb-dev libusb-1.0-0-dev
yes | apt -y install libusb-dev libusb-1.0-0-dev
yes | apt -y install i2c-tools libi2c-dev libi2c0
yes | apt -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt -y install libjson-perl
yes | apt -y install libjson-perl
Line 249: Line 300:
yes | apt -y install linux-tools-common linux-tools-generic # cpupower frequency-info
yes | apt -y install linux-tools-common linux-tools-generic # cpupower frequency-info
yes | apt -y install rdesktop remmina remmina-plugin"*" # requested by POL
yes | apt -y install rdesktop remmina remmina-plugin"*" # requested by POL
yes | apt -y install nlohmann-json3-dev # required to build MIDAS with ROOT 6.30 on Ubuntu-22
apt -y install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev libxft-dev libxext-dev python3 libssl-dev libafterimage0 # from https://root.cern/install/dependencies/
apt -y install gfortran libpcre3-dev xlibmesa-glu-dev libglew-dev libftgl-dev libmysqlclient-dev libfftw3-dev libcfitsio-dev graphviz-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl0-dev qtwebengine5-dev nlohmann-json3-dev libtbb-dev libavahi-compat-libdnssd-dev # from https://root.cern/install/dependencies/
apt -y install libvdt-dev # for ROOT 6.32 on Ubuntu-24
apt -y install autoconf automake gperf # gnu package build tools
apt -y install u-boot-tools # for Xilinx petalinux
#apt -y install linux-headers-generic # to build linux kernel drivers
apt -y install htop
</pre>
</pre>
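The advice to repeat the install commands until everything is installed can be wrapped in a small retry loop. This is a sketch, not a tested replacement for the cut-and-paste procedure: retry_until_ok is a hypothetical helper name, and redirecting stdin from /dev/null is an assumption about how to keep the command from eating terminal input.

```shell
# Sketch: run a command up to N times until it succeeds.
# retry_until_ok is a hypothetical helper, not part of git/scripts.
retry_until_ok() {
    max_tries=$1
    shift
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # stdin comes from /dev/null so the command cannot consume pasted input
        if "$@" < /dev/null; then
            return 0
        fi
        try=$((try + 1))
    done
    return 1
}

# Example use:
#   retry_until_ok 5 apt -y install sysstat smartmontools lm-sensors
```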


Line 258: Line 317:
Ubuntu LTS 22.04:
Ubuntu LTS 22.04:
<pre>
<pre>
apt -y install linux-generic-hwe-22.04 # enable linux 6.2.0 series kernel
apt -y install linux-generic-hwe-22.04 # enable linux 6.8.0 series kernel
</pre>
</pre>


== disable swap (debian 11) ==
Ubuntu LTS 24.04:
<pre>
apt -y install linux-generic-hwe-24.04 # enable linux 6.14.0 series kernel
</pre>


* on 64 GB RAM machines swap is not useful
== remove snap ==
* on machines booted from network (NFS-ROOT), swap does not work
 
* on machines running from flash (RPi, etc), flash is too slow for useful swap
remove snap at this point.
* swap configured by linux installers invariably has wrong size and is not useful
 
Do


<pre>
<pre>
systemctl disable dphys-swapfile
snap list
systemctl stop dphys-swapfile
dphys-swapfile uninstall
</pre>
</pre>


== configure DNS ==
to see a list of installed snaps. Use


<pre>
<pre>
cd ~/git/scripts
snap remove <item>
git pull
mkdir /etc/systemd/resolved.conf.d
cp etc/resolved-triumf.conf /etc/systemd/resolved.conf.d/
systemctl restart systemd-resolved
resolvectl
#systemd-analyze cat-config systemd/resolved.conf
</pre>
</pre>


== install ganglia ==
to remove each. You will need to remove snapd last.


<pre>
<pre>
apt -y install ganglia-monitor
systemctl stop snapd
cd ~root/git/scripts/ganglia
systemctl disable snapd
git pull
apt purge snapd
make install
rm -rf ~/snap
./ganglia-all.perl
</pre>
</pre>
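The list of snaps to remove can be taken from the output of "snap list". This sketch assumes the usual output format (one header line, snap name in the first column); snap_removal_order is a hypothetical helper name. It only puts snapd last, it does not know about other dependencies between snaps, so "snap remove" may still refuse some of them on the first pass.

```shell
# Sketch: read "snap list" output on stdin, print snap names with snapd last.
# snap_removal_order is a hypothetical helper, not part of git/scripts.
snap_removal_order() {
    awk '
        NR == 1 { next }                   # skip the header line
        $1 == "snapd" { seen = 1; next }   # hold snapd back until the end
        { print $1 }
        END { if (seen) print "snapd" }
    '
}

# Example use:
#   snap list | snap_removal_order | while read -r s; do snap remove "$s"; done
```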
 
== install gonodeinfo ==
 
* go to https://bitbucket.org/dd1/gonodeinfo follow instructions:
<pre>
yes | apt-get -y install golang
mkdir ~/git
cd ~/git
#git clone https://bitbucket.org/dd1/gonodeinfo.git
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
</pre>
* edit /etc/gonodeinfo.conf
* change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
* change "Servers" to read: Servers: daq00.triumf.ca:8601
* run "gonodeinfo -v"
* if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
* on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
* try gonodeinfo again, there should be no error
* on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now


== install fonts for EPICS ==
== install non-snap firefox ==


* apt install xfonts-100dpi xfonts-75dpi
delay this until the very end, download of firefox-esr deb package is very slow.
* restart Xorg (i.e. "killall Xorg", this will log you out from the console)
* xlsfonts | grep -i helvetica ### should show fonts with different sizes, not just size 0 (scalable)


== install libz.so.1 for CentOS compatibility ==
See https://askubuntu.com/questions/1399383/how-to-install-firefox-as-a-traditional-deb-package-without-snap-in-ubuntu-22
 
KO - confirm which versions of Quartus need this.


<pre>
<pre>
yes | apt-get -y install zlib1g
add-apt-repository ppa:mozillateam/ppa
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386
apt install firefox-esr
echo run firefox-esr
</pre>
</pre>


== install libpng12.so.0 for Quartus compatibility ==
Enable automatic updates:


(does not work anymore!!!)
* rerun [[#Enable_automatic_updates]]


<pre>
== configure DNS ==
wget http://ftp.ca.debian.org/debian/pool/main/libp/libpng/libpng12-0_1.2.50-2+deb8u2_amd64.deb
dpkg --install libpng12-0_1.2.50-2+deb8u2_amd64.deb
</pre>
 
== install libpng12.so.0 for Quartus 13.0sp1 ==


<pre>
<pre>
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0
cd ~/git/scripts
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0.50.0
git pull
/bin/cp -pv libpng12.so.0 libpng12.so.0.50.0 /lib/x86_64-linux-gnu/
mkdir /etc/systemd/resolved.conf.d
cp etc/resolved-triumf.conf /etc/systemd/resolved.conf.d/
systemctl restart systemd-resolved
resolvectl
#systemd-analyze cat-config systemd/resolved.conf
</pre>
</pre>


== install packages for Xilinx ==
== install ganglia ==
 
ubuntu LTS 22.04 vivado 2020.1


<pre>
<pre>
apt install autoconf libtool
apt -y install ganglia-monitor
apt install libtinfo5
cd ~root/git/scripts/ganglia
apt install texinfo
git pull
apt install zlib1g:i386
make install
./ganglia-all.perl
</pre>
</pre>


== install packages for building ROOT ==
fix gmond start before network is ready:


<pre>
<pre>
apt -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake
mkdir /etc/systemd/system/ganglia-monitor.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ganglia-monitor.service.d/local.conf
systemctl daemon-reload
systemctl cat ganglia-monitor.service
</pre>
</pre>


== install 32-bit libraries for PHYSICA ==
== install ganglia server ==
 
these instructions are for running 32-bit physica executable built for SL6 on ubuntu LTS 20.04


install physica sources (cannot build, do not have g77)
'''On the main computer only! (daq00, dsdaqgw, etc)'''


* zfs create rpool/ganglia
* apt install gmetad php php-xml rrdtool
* mv /etc/ganglia/gmetad.conf /etc/ganglia/gmetad.conf-stock
* create /etc/ganglia/gmetad.conf with the following contents:
<pre>
<pre>
cd ~/packages
data_source "my cluster" 15 localhost
git clone https://bitbucket.org/ttriumfdaq/physica.git
RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
setuid_username "ganglia"
rrd_rootdir "/ganglia/rrds"
case_sensitive_hostnames 0
</pre>
</pre>
* mkdir /ganglia/rrds
* chown ganglia:ganglia /ganglia/rrds
* systemctl restart gmetad
* create /etc/ganglia/gmond-collect.conf
<pre>
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  #deaf = yes
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /* 86400 */ /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 600 /*secs */
}


install 32-bit libraries using ubuntu package manager:
/*
* The cluster attributes specified will be used as part of the <CLUSTER>
* tag that will wrap all hosts collected by this instance.
*/
cluster {
  name = "DAQ"
  owner = "TRIUMF DAQ"
  latlong = "unspecified"
  url = "https://daq.triumf.ca"
}


<pre>
/* The host section describes attributes of the host, like the location */
apt install lib32z1 # libz.so
host {
</pre>
  location = "unspecified"
}


copy 32-bit SL6 shared libraries to /lib32
udp_send_channel {
  host = daq00.triumf.ca
  bind_hostname = yes
  port = 8649
}


<pre>
udp_recv_channel {
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libX11.so.6 /lib32/
  #mcast_join = 239.2.11.71
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libgd.so.2 /lib32/
  port = 8649
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libpng12.so.0 /lib32/
  #bind = 239.2.11.71
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libreadline.so.6 /lib32/
  retry_bind = true
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libncurses.so.5 /lib32/
  # Size of the UDP buffer. If you are handling lots of metrics you really
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libg2c.so.0 /lib32/
  # should bump it up to e.g. 10MB or even higher.
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libxcb.so.1 /lib32/
  #buffer = 10485760
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXpm.so.4 /lib32/
  buffer = 200000
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libjpeg.so.62 /lib32/
  family = ipv4
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfontconfig.so.1 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfreetype.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libtinfo.so.5 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXau.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libexpat.so.1 /lib32/
</pre>


ldd should report:
  acl {
    default = "deny"
    access {
      # vlan1 daq00
      ip = 142.90.111.168
      mask = 19
      action = "allow"
    }
    access {
      # MUSR VLAN
      ip = 142.90.154.73
      mask = 8
      action = "allow"
    }
    access {
      # KVM network
      ip = 192.168.1.1
      mask = 8
      action = "allow"
    }
  }
}


<pre>
udp_recv_channel {
trinatdaq:trinat> ldd /usr/local/physica/physica.exe
  #mcast_join = 239.2.11.71                                                                                                                                                                                               
linux-gate.so.1 (0xf7fa2000)
  port = 8649
libX11.so.6 => /lib32/libX11.so.6 (0xf7e43000)
  #bind = 239.2.11.71                                                                                                                                                                                                     
libgd.so.2 => /lib32/libgd.so.2 (0xf7dfe000)
  retry_bind = true
libpng12.so.0 => /lib32/libpng12.so.0 (0xf7dd6000)
  # Size of the UDP buffer. If you are handling lots of metrics you really                                                                                                                                               
libz.so.1 => /lib32/libz.so.1 (0xf7db8000)
  # should bump it up to e.g. 10MB or even higher.                                                                                                                                                                        
libreadline.so.6 => /lib32/libreadline.so.6 (0xf7d7e000)
  #buffer = 10485760                                                                                                                                                                                                     
libncurses.so.5 => /lib32/libncurses.so.5 (0xf7d5b000)
  buffer = 200000
libg2c.so.0 => /lib32/libg2c.so.0 (0xf7d3d000)
  family = ipv6
libm.so.6 => /lib32/libm.so.6 (0xf7c39000)
libgcc_s.so.1 => /lib32/libgcc_s.so.1 (0xf7c1a000)
libc.so.6 => /lib32/libc.so.6 (0xf7a2f000)
libxcb.so.1 => /lib32/libxcb.so.1 (0xf7a05000)
libdl.so.2 => /lib32/libdl.so.2 (0xf79ff000)
libXpm.so.4 => /lib32/libXpm.so.4 (0xf79ee000)
libjpeg.so.62 => /lib32/libjpeg.so.62 (0xf7997000)
libfontconfig.so.1 => /lib32/libfontconfig.so.1 (0xf7962000)
libfreetype.so.6 => /lib32/libfreetype.so.6 (0xf78c9000)
libtinfo.so.5 => /lib32/libtinfo.so.5 (0xf78b0000)
/lib/ld-linux.so.2 (0xf7fa4000)
libXau.so.6 => /lib32/libXau.so.6 (0xf78ad000)
libexpat.so.1 => /lib32/libexpat.so.1 (0xf7885000)
trinatdaq:trinat>
</pre>


set login environment:
  acl {
    default = "deny"
    access {
      # ALPHA network                                                                                                                                                                                                     
      ip = 2001:1458:202:fd::100:aa
      mask = 8
      action = "allow"
    }
    #access {                                                                                                                                                                                                             
    #  # ALPHA-g network                                                                                                                                                                                                 
    #  ip = 192.168.1.1                                                                                                                                                                                                   
    #  mask = 8                                                                                                                                                                                                           
    #  action = "allow"                                                                                                                                                                                                   
    #}                                                                                                                                                                                                                   
  }
}


tcp_accept_channel {
  port = 8649
  bind = localhost
  # If you want to gzip XML output
  gzip_output = no
}
</pre>
* add to /etc/rc.local
<pre>
<pre>
setenv TRIUMF_FONTS $HOME/packages/physica/fonts
/usr/sbin/gmond -c /etc/ganglia/gmond-collect.conf &
setenv PHYSICA_DIR $HOME/packages/physica
systemctl restart gmetad &
alias physica $PHYSICA_DIR/physica-SL6-32
</pre>
</pre>
 
* cd /ganglia
test:
* git clone https://daq00.triumf.ca/~olchansk/git/ganglia-web.git
 
* cd ganglia-web
* git checkout KO1
* ln -s /ganglia/ganglia-web /var/www/html/ganglia
* make dwoo directories
<pre>
<pre>
cd ~/packages/physica
mkdir /var/lib/ganglia-web
physica
chown www-data:www-data /var/lib/ganglia-web
@rangauss.pcm
mkdir /var/lib/ganglia-web/dwoo
chown www-data:www-data /var/lib/ganglia-web/dwoo
mkdir /var/lib/ganglia-web/dwoo/compiled
chown www-data:www-data /var/lib/ganglia-web/dwoo/compiled
mkdir /var/lib/ganglia-web/dwoo/cache
chown www-data:www-data /var/lib/ganglia-web/dwoo/cache
</pre>
</pre>
* open https://alphacpc05.cern.ch/ganglia/


== install lightdm ==
== install gonodeinfo ==
 
unlike the default gdm login manager, lightdm shows the machine hostname and does not require an extra mouse click to switch from screen saver to login mode.


* go to https://bitbucket.org/dd1/gonodeinfo follow instructions:
<pre>
<pre>
apt -y install lightdm
apt -y install golang
# select lightdm
mkdir ~/git
cd ~/git
#git clone https://bitbucket.org/dd1/gonodeinfo.git
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git remote set-url origin https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
</pre>
</pre>
* edit /etc/gonodeinfo.conf
* change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
* change "Servers" to read: Servers: daq00.triumf.ca:8601
* run "gonodeinfo -v"
* if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
* on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
* try gonodeinfo again, there should be no error
* on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
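The "Servers" edit can be scripted. This is a sketch: set_gonodeinfo_server is a hypothetical helper name, and it assumes the config file has a line starting with "Servers:" as described above. It prints the result to stdout so it can be checked before replacing /etc/gonodeinfo.conf.

```shell
# Sketch: print gonodeinfo.conf with the Servers line pointing at daq00.
# set_gonodeinfo_server is a hypothetical helper, not part of gonodeinfo.
set_gonodeinfo_server() {
    # Lines other than "Servers:" pass through unchanged.
    sed 's/^Servers:.*/Servers: daq00.triumf.ca:8601/' "$1"
}

# Example use:
#   set_gonodeinfo_server /etc/gonodeinfo.conf > /tmp/gonodeinfo.conf.new
```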


== install desktop environments ==
== install emailonreboot ==


note: default display manager and default desktop are deficient, please do not skip this step.
send an email if computer is rebooted


note: if apt asks to choose the display manager, select "lightdm"
<pre>
ssh root
mkdir -p ~/git
cd ~/git
git clone https://daq00.triumf.ca/~olchansk/git/rpms.git
cd rpms/emailonreboot
git pull
make
make install
</pre>


note: KO - I recommend the "MATE" desktop.
== install monitor_nfs ==


note: you will have to cut-and-paste this several times because "apt" eats commands, even with "-y" and even piped from "yes".
monitor NFS mounts and complain about dead, stale and hung mounts


<pre>
<pre>
# install MATE desktop
ssh root
DEBIAN_FRONTEND=noninteractive apt -y install ubuntu-mate-core ubuntu-mate-desktop ubuntu-mate-themes
mkdir -p ~/git
# install Cinnamon desktop
cd ~/git
DEBIAN_FRONTEND=noninteractive apt -y install cinnamon
git clone https://daq00.triumf.ca/~olchansk/git/rpms.git
# install KDE desktop
cd rpms/monitor_nfs
DEBIAN_FRONTEND=noninteractive apt -y install kubuntu-desktop
git pull
# install Lxqt desktop
make
DEBIAN_FRONTEND=noninteractive apt -y install lxqt
make install
# install Xfce4 desktop
DEBIAN_FRONTEND=noninteractive apt -y install xfce4
</pre>
</pre>


== install ROOT ==
== install fonts for EPICS ==
 
<pre>
apt -y install xfonts-100dpi xfonts-75dpi
killall Xorg # restarts Xorg, this will log you out from the console
xlsfonts | grep -i helvetica ### should show fonts with different sizes, not just size 0 (scalable)
</pre>


Please install ROOT per instructions at http://root.cern.ch.
== install libz.so.1 for CentOS compatibility ==


NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA. ### DO NOT DO THIS! apt-get install root-system
KO - confirm which versions of Quartus need this.


NOTE2: as of 2017-Jan-09, ROOT binary kits for Ubuntu do not work (use GCC 5 instead of GCC6), build from source instead.
<pre>
yes | apt-get -y install zlib1g
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386
</pre>


== Install x2go ==
== install ld-lsb-x86-64.so.3 for Quartus compatibility ==


KO - is this still needed? does it cause any security problems?
Not clear which package this is supposed to come from. Copied from U-20.


x2go instructions, thanks to Art O.
Without this, Quartus lmgrd does not run.


<pre>
<pre>
add-apt-repository ppa:x2go/stable
cd /lib64
apt-get update
ln -s  ld-linux-x86-64.so.2 ld-lsb-x86-64.so.2
apt-get install x2goserver x2goserver-xsession
ln -s  ld-linux-x86-64.so.2 ld-lsb-x86-64.so.3
</pre>
</pre>


== enable root login from ladd00/daq00 ==
should look like this:
<pre>
<pre>
ssh localhost
root@daq13:/lib64# ls -l
CTRL-C
total 3
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/
lrwxrwxrwx 1 root root 44 Jan 28 09:07 ld-linux-x86-64.so.2 -> ../lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
lrwxrwxrwx 1 root root 20 Feb 18 16:45 ld-lsb-x86-64.so.2 -> ld-linux-x86-64.so.2
lrwxrwxrwx 1 root root 20 Feb 18 16:45 ld-lsb-x86-64.so.3 -> ld-linux-x86-64.so.2
root@daq13:/lib64#
</pre>
</pre>
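Instead of comparing the "ls -l" output by eye, the two links can be checked with readlink. This is a sketch: check_lsb_links is a hypothetical helper name, and it expects the links to be relative, exactly as created by the "ln -s" commands above.

```shell
# Sketch: succeed if both ld-lsb links in the given directory point at ld-linux-x86-64.so.2.
# check_lsb_links is a hypothetical helper, not part of git/scripts.
check_lsb_links() {
    dir=${1:-/lib64}
    for name in ld-lsb-x86-64.so.2 ld-lsb-x86-64.so.3; do
        # readlink prints nothing for a missing link, so the comparison fails
        [ "$(readlink "$dir/$name")" = "ld-linux-x86-64.so.2" ] || return 1
    done
}

# Example use:
#   check_lsb_links /lib64 && echo "ld-lsb links are ok"
```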


== disable ssh access from outside of TRIUMF ==
== install libpng12.so.0 for Quartus 13.0sp1 and 13.1.4.182 ==
 
to stop ssh login spam, disable ssh access from outside of TRIUMF. this can be done by requesting a firewall block through the helpdesk or by local firewall rule:


<pre>
<pre>
echo iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT >> /etc/rc.local
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0
/etc/rc.local
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0.50.0
/bin/cp -pv libpng12.so.0 libpng12.so.0.50.0 /lib/x86_64-linux-gnu/
</pre>
</pre>


== install smart-status ==
== install packages for Xilinx ==
<pre>
ln -s ~/git/scripts/smart-status/smart-status.perl ~root/
</pre>


== enable boot menu and boot messages ==
ubuntu LTS 22.04 vivado 2020.1


This will enable the grub menu (with a 10 sec timeout) and
replace black screen with exciting linux boot messages.
* emacs -nw /etc/default/grub
<pre>
<pre>
GRUB_DEFAULT=0
apt install autoconf libtool
#GRUB_TIMEOUT_STYLE=hidden
apt install libtinfo5
GRUB_TIMEOUT=10
apt install texinfo
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
apt install zlib1g:i386
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
</pre>
</pre>
* update grub config:
 
== install packages for building ROOT ==
 
<pre>
<pre>
grub-mkconfig -o /boot/grub/grub.cfg
apt -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake
</pre>
</pre>


== reboot ==
== install wine ==


this completes installation of the base system.
As far as I know, only needed for BNMR/BNQR
 
<pre>
apt install wine winetricks
</pre>


following sections modify basic ubuntu to fix known problems and to enable special stuff.
== install lightdm ==


= Enable automatic updates =
unlike the default gdm login manager, lightdm shows the machine hostname and does not require an extra mouse click to switch from screen saver to login mode.


<pre>
<pre>
apt install unattended-upgrades
apt -y install lightdm
cd ~/git/scripts
# select lightdm
git pull
/bin/cp -v etc/99apt-conf-ko /etc/apt/apt.conf.d/
apt-config dump | grep Unattended
</pre>
</pre>


Following is obsolete:
== install desktop environments ==


* emacs -nw /etc/apt/apt.conf.d/50unattended-upgrades
note: default display manager and default desktop are deficient, please do not skip this step.
** uncomment in Allowed-Origins "-security" and "-updates"
** add in Allowed-Origins: "Google LLC:stable";
** uncomment/add: "Unattended-Upgrade::Mail "root";
* emacs -nw /etc/apt/apt.conf.d/10periodic
<pre>
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
</pre>
* test: unattended-upgrade --dry-run -v


NOTE: update-on-shutdown is disabled.
note: if apt asks to choose the display manager, select "lightdm"


NOTE: there is no update-on-boot, but:
note: KO - I recommend the "MATE" desktop.


NOTE: if machine was off for a long time, the systemd update timer would have expired and it will fire soon after reboot, causing an automatic update run. this is unwanted, and there is no fix or workaround for it. K.O. June-2023.
note: you will have to cut-and-paste this several times because "apt" eats commands, even with "-y" and even piped from "yes".


= Fix bpool is full (obsolete) =
note: DF - on U24 this may re-install snap


THIS IS CAUSED BY OBSOLETE PACKAGE zsys. PLEASE: apt remove zsys
<pre>
# install MATE desktop
apt -y install ubuntu-mate-core ubuntu-mate-desktop ubuntu-mate-themes
# install Cinnamon desktop
apt -y install cinnamon
# install KDE desktop
apt -y install kde-standard kubuntu-settings-desktop
# install Lxqt desktop
# apt -y install lxqt # conflict over kubuntu-desktop, kubuntu-settings-desktop and desktop-base
# install Xfce4 desktop
apt -y install xfce4
</pre>


!!! only if ROOT on ZFS !!!
== install ROOT ==


There is an error in the zsys package that causes bpool to run out of space,
Please install ROOT per instructions at https://root.cern.ch.
see [[#Ubuntu zsys]] for more details.


To fix:
NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA. ### DO NOT DO THIS! apt-get install root-system
<pre>
cd ~/git/scripts
git pull
cp etc/zsys.conf /etc/
zsysctl service reload
zsysctl service gc
zpool list bpool
zfs list bpool
df /boot
</pre>


= IPMI instructions =
NOTE2: as of 2017-Jan-09, ROOT binary kits for Ubuntu do not work (use GCC 5 instead of GCC6), build from source instead.


IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.
== Install x2go ==


<pre>
<pre>
apt-get install ipmitool
apt-get update
systemctl enable ipmievd
apt-get install x2goserver x2goserver-xsession
systemctl restart ipmievd
</pre>
</pre>


Run:
== enable root login from ladd00/daq00 ==
* ipmitool sel list ### event list
<pre>
* ipmitool sel elist ### event list
ssh localhost
* ipmitool sel clear ### clear event list (if it becomes full)
CTRL-C
* ipmitool sensor ### report hardware sensors
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/
</pre>


= move /home/wheel =
== disable ssh access from outside of TRIUMF ==


note: this MUST be done if ZFS root and NIS/autofs with /home.
to stop ssh login spam, disable ssh access from outside of TRIUMF. this can be done by requesting a firewall block through the helpdesk or by local firewall rule:


Default location of wheel's home directory will collide with autofs /home, it has to be moved,
<pre>
for example to /wheel.
echo iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT >> /etc/rc.local
/etc/rc.local
</pre>


== install smart-status ==
<pre>
<pre>
# logout from the wheel user
ln -s ~/git/scripts/smart-status/smart-status.perl ~root/
# go to another computer
ssh root@daqubuntuxxx
zfs list | grep wheel ### identify zfs name wheel_xxxxxx
zfs set mountpoint=/wheel rpool/USERDATA/wheel_hm8fzh
emacs -nw /etc/passwd ### change wheel's home directory from /home/wheel to /wheel
su - wheel ### check that user wheel still works
</pre>
</pre>


This will break wheel's ability to run snap programs, such as firefox; install chrome as listed below.
== enable boot menu and boot messages ==


= enable NIS (ubuntu 22.04, debian 11) =
This will enable the grub menu (with a 10 sec timeout) and
replace black screen with exciting linux boot messages.


* emacs -nw /etc/default/grub
<pre>
<pre>
apt -y install rpcbind nis
GRUB_DEFAULT=0
echo DAQ-NIS >> /etc/defaultdomain
GRUB_TIMEOUT_STYLE=menu
echo ypserver daq00.triumf.ca >> /etc/yp.conf
GRUB_TIMEOUT=10
systemctl enable ypbind.service
GRUB_RECORDFAIL_TIMEOUT=10
systemctl restart ypbind.service
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
systemctl status ypbind.service
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
ypwhich -m
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
</pre>
* update grub config:
<pre>
grub-mkconfig -o /boot/grub/grub.cfg
</pre>
</pre>


enable ypserv:
== Disable welcome message on ssh ==
 
On ssh, there is a lengthy welcome message. To disable:  


<pre>
<pre>
sed -i s/NISSERVER=false/NISSERVER=slave/ /etc/default/nis
vim /etc/ssh/sshd_config
/usr/lib/yp/ypinit -s daq00
echo ypserver localhost >> /etc/yp.conf
sed -i "s/ypserver .*/ypserver localhost/" /etc/yp.conf
systemctl enable ypserv
systemctl restart ypserv
systemctl restart ypbind
</pre>
</pre>


edit /etc/nsswitch.conf to read:
Ensure that the file reads


<pre>
<pre>
# begin get data from nis
PrintMotd no
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis
</pre>
</pre>


enable hourly update of nis maps:
Then,


<pre>
<pre>
mkdir ~root/git
vim /etc/pam.d/sshd
cd ~root/git
</pre>
git clone http://daq00.triumf.ca/~olchansk/git/scripts.git
 
cd ~/git/scripts/etc
comment out the lines:  
 
<pre>
session    optional    pam_motd.so  motd=/run/motd.dynamic
session    optional    pam_motd.so noupdate
</pre>
 
= Enable automatic updates =
 
<pre>
apt install unattended-upgrades
cd ~/git/scripts
git pull
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
/bin/cp -v etc/99apt-conf-ko /etc/apt/apt.conf.d/
apt-config dump | grep Unattended
</pre>
</pre>


If this is a new machine, then on the master NIS node (daq00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
Following is obsolete:
 
* emacs -nw /etc/apt/apt.conf.d/50unattended-upgrades
** uncomment in Allowed-Origins "-security" and "-updates"
** add in Allowed-Origins: "Google LLC:stable";
** uncomment/add: Unattended-Upgrade::Mail "root";
* emacs -nw /etc/apt/apt.conf.d/10periodic
<pre>
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
</pre>
* test: unattended-upgrade --dry-run -v


= enable NIS (ubuntu 20.04) =
NOTE: update-on-shutdown is disabled.


* apt-get -y install portmap nis ### will ask for NIS domain (DAQ-NIS)
NOTE: there is no update-on-boot, but:
* dpkg-reconfigure nis ### reconfigure if already installed
* ypwhich -m
* edit /etc/default/nis
** set "NISSERVER=slave"
** Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank, remove "-no-dbus" if it is there
* #edit /etc/yp.conf, comment-out everything, add "domain DAQ-NIS server localhost"
* edit /etc/yp.conf, comment-out everything, add "ypserver localhost"
* /usr/lib/yp/ypinit -s daq00
* systemctl enable nis
* systemctl restart nis
* ypwhich
* ypwhich -m
* ypcat -k passwd
* vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
<pre>
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis
</pre>
* enable hourly update of NIS maps
<pre>
mkdir ~root/git
cd ~root/git
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
</pre>
* ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"


= enable autofs =
NOTE: if machine was off for a long time, the systemd update timer would have expired and it will fire soon after reboot, causing an automatic update run. this is unwanted, and there is no fix or workaround for it. K.O. June-2023.


<pre>
= Fix bpool is full (obsolete) =
apt -y install autofs
systemctl enable autofs
systemctl restart autofs
ls -l /home/olchansk ### test autofs, check file owner is correct
</pre>


= enable NFS server =
THIS IS CAUSED BY OBSOLETE PACKAGE zsys. PLEASE: apt remove zsys
 
= IPMI instructions =
 
IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.


<pre>
<pre>
apt install nfs-kernel-server
apt-get install ipmitool
#edit /etc/exports
systemctl enable ipmievd
systemctl enable nfs-server
systemctl restart ipmievd
systemctl restart nfs-server
</pre>
</pre>


= NIS master =
Run:
* ipmitool sel list ### event list
* ipmitool sel elist ### event list
* ipmitool sel clear ### clear event list (if it becomes full)
* ipmitool sensor ### report hardware sensors
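
A minimal sketch for picking out sensors that have crossed a threshold in the "ipmitool sensor" report. It assumes the usual pipe-separated layout with the status in the fourth column, and treats only the states "nc", "cr" and "nr" as alarms; the function name ipmi_alarms is made up for this example.

```shell
#!/bin/sh
# ipmi_alarms: read "ipmitool sensor" output on stdin and print
# "sensor name: status" for every sensor in a threshold alarm state.
# Assumption: columns are separated by "|" and column 4 is the status.
ipmi_alarms() {
    awk -F'|' '{
        name = $1; status = $4
        gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim padding around the name
        gsub(/[ \t]/, "", status)            # trim padding around the status
        if (status == "nc" || status == "cr" || status == "nr")
            print name ": " status
    }'
}
```

Usage: ipmitool sensor | ipmi_alarms ### no output means no threshold alarms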


notes for setting up the NIS master
= move /home/wheel (U-24) =


== wheel user ==
Ubuntu LTS 24 installed on ZFS has rpool/USERDATA/home_xxx mounted on /home,
has to be moved or autofs /home will not work.


"wheel" is the default administrative user. We do not want its password exported to NIS (encrypted password hash is world visible) and we do not want its home directory exported to NFS (~wheel/.ssh is world visible and potentially writable: anybody can change ~wheel/.ssh/authorized_keys).
<pre>
zfs list | grep USERDATA | grep home | cut -f1 -d" "
zfs set -u mountpoint=/home1 `zfs list | grep USERDATA | grep home | cut -f1 -d" "`
emacs -nw /etc/passwd # change wheel's home directory to /home1/wheel
</pre>


* move wheel's home directory from /home/wheel to /wheel (see special section about this)
This will not take effect until reboot. Before rebooting, confirm that ssh from root@daq00 to this computer works; if it does not and something goes wrong, you will be locked out of the wheel account. After rebooting, log in as wheel and check that the home directory is /home1/wheel.
* change wheel's UID and GID from 1000 to a value below MINUID in /var/yp/Makefile


== coherent uids ==
= move /home/wheel =


we do not want system accounts defined in /etc/passwd of the NIS master
note: this MUST be done if ZFS root and NIS/autofs with /home.
to be included in the NIS map "passwd". this causes trouble on NIS clients
where newly installed packages fail to create local system users because same
user already exists in NIS.


This is controlled by MINUID in /var/yp/Makefile.
Default location of wheel's home directory will collide with autofs /home, it has to be moved,
for example to /wheel.


Historical TRIUMF uids start from around 200, but several clusters do not have any historic TRIUMF uids below 500 and MINUID is set to:
<pre>
* DAQ-NIS: MINUID=200
# logout from the wheel user
* ISAC-NIS: MINUID=500
# go to another computer
* TITAN-NIS: MINUID=500
ssh root@daqubuntuxxx
* MUSR-NIS: MINUID=500
zfs list | grep wheel ### identify zfs name wheel_xxxxxx
* TIG-NIS: MINUID=500 (100 on SL6 mother8pi)
#zfs set mountpoint=/wheel rpool/USERDATA/wheel_hm8fzh
zfs set mountpoint=/wheel `zfs list | grep wheel | cut -f1 -d" "`
zfs list | grep wheel
emacs -nw /etc/passwd ### change wheel's home directory from /home/wheel to /wheel
su - wheel ### check that user wheel still works
</pre>


Ubuntu 20 has two programs to create users:
This will break wheel's ability to run snap programs, such as firefox; install chrome as listed below.
* adduser - creates new users with UID 1000 and up as specified in /etc/adduser.conf. No problems here.
* adduser --system - creates new system users with UID 100 and up as specified in /etc/adduser.conf. No problems here.
* useradd - creates new users with UID 1000 and up as specified in /etc/login.defs. No problems here.
* useradd --system - creates new system users with UID 999 and down (read "man useradd", section at the end about SYS_UID_MAX). This collides with NIS MINUID, these system users will be included in the NIS map and cause trouble.


This problem cannot be fixed: SYS_UID_MIN, SYS_UID_MAX and UID_MIN in /etc/login.defs do not seem
= enable NIS (ubuntu 22.04, 24.04, debian 11, 12) =
to have any effect on UIDs chosen by "useradd --system". (tested on Ubuntu LTS 20.04).


So far only these system accounts seem to be affected by this:
<pre>
* systemd-coredump
apt -y install rpcbind nis
* ganglia
echo DAQ-NIS >> /etc/defaultdomain
echo ypserver daq00.triumf.ca >> /etc/yp.conf
systemctl enable ypbind.service
systemctl restart ypbind.service
systemctl status ypbind.service
ypwhich -m
</pre>


To fix:
enable ypserv:
* run "sort -r -n -t: -k3 /etc/passwd" to identify the last unused system user uid (range 100..200)
* run "sort -r -n -t: -k3 /etc/group" to identify the last unused system user gid (range 100..200)
* systemd-coredump: manually change UID and GID (package systemd-coredump is usually not installed)
* ganglia: same thing, then change ownership on all ganglia files.
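
A rough sketch for finding accounts on the NIS master that fall into the exported range. It prints every account with a UID at or above MINUID and below 1000. The upper bound of 1000 is an assumption (the first regular-user UID in the adduser and useradd defaults), and the function name list_nis_leaks is made up for this example. On clusters with historical TRIUMF uids the list will also contain real users, so review it by eye.

```shell
#!/bin/sh
# list_nis_leaks PASSWD_FILE MINUID
# Print "name uid" for each account with MINUID <= uid < 1000.
# These are candidates for system accounts that end up in the NIS map "passwd".
# Assumption: 1000 is the first UID given to regular users.
list_nis_leaks() {
    awk -F: -v min="$2" '$3 >= min && $3 < 1000 { print $1, $3 }' "$1"
}
```

Usage: list_nis_leaks /etc/passwd 200 ### use the MINUID value from /var/yp/Makefile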


Also read systemd author's opinion on system vs user UIDs:
<pre>
https://github.com/systemd/systemd/issues/4850#issuecomment-265698275
sed -i s/NISSERVER=false/NISSERVER=slave/ /etc/default/nis
/usr/lib/yp/ypinit -s daq00
echo ypserver localhost >> /etc/yp.conf
sed -i "s/ypserver .*/ypserver localhost/" /etc/yp.conf
systemctl enable ypserv
systemctl restart ypserv
systemctl restart ypbind
</pre>


= Fix systemd-logind NIS breakage =
update /etc/nsswitch.conf and enable hourly update of NIS maps:
 
!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!
 
there is a delay in ssh logins for normal users. "ssh -v" shows the delay is after "pledge...". this
fix removes the delay.
 
systemd developers think that we should not use NIS and made sure there are
problems if we do. To give them credit, they do offer a workaround. Read this:
https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67


<pre>
<pre>
mkdir /etc/systemd/system/systemd-logind.service.d
mkdir ~root/git
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
cd ~root/git
systemctl daemon-reload
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
systemctl cat systemd-logind.service
cd ~/git/scripts/etc
git pull
cp -pv nsswitch.conf-U24 /etc/nsswitch.conf
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
</pre>
</pre>


= Fix systemd-udevd NIS breakage =
If this is a new machine, then on the master NIS node (daq00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)


see same problem as above with udev getting stuck. ubuntu lts 20.04.
= enable NIS (ubuntu 20.04) =


<pre>
* apt-get -y install portmap nis ### will ask for NIS domain (DAQ-NIS)
mkdir /etc/systemd/system/systemd-udevd.service.d
* dpkg-reconfigure nis ### reconfigure if already installed
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-udevd.service.d/local.conf
* ypwhich -m
systemctl daemon-reload
* edit /etc/default/nis
systemctl cat systemd-udevd.service
** set "NISSERVER=slave"
** Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank, remove "-no-dbus" if it is there
* #edit /etc/yp.conf, comment-out everything, add "domain DAQ-NIS server localhost"
* edit /etc/yp.conf, comment-out everything, add "ypserver localhost"
* /usr/lib/yp/ypinit -s daq00
* systemctl enable nis
* systemctl restart nis
* ypwhich
* ypwhich -m
* ypcat -k passwd
* vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
<pre>
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis
</pre>
* enable hourly update of NIS maps
<pre>
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
</pre>
</pre>
* ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"


= Configure USB device permissions =
= enable autofs =
 
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
 
* create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:


<pre>
<pre>
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
apt -y install autofs
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}"
systemctl enable autofs
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
systemctl restart autofs
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ls -l /home/olchansk ### test autofs, check file owner is correct
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
</pre>
</pre>


* reload udev rules: udevadm control --reload-rules
= enable NFS server =
* apply new permissions: udevadm trigger --action=add
* watch udev activity: udevadm monitor -p


= Configure lightdm display manager =
* enable it
<pre>
<pre>
echo lightdm | dpkg-reconfigure -fteletype lightdm
apt install nfs-kernel-server
systemctl disable gdm
#edit /etc/exports
systemctl disable sddm
systemctl enable nfs-server
systemctl enable lightdm
systemctl restart nfs-server
</pre>
</pre>


* make the MATE desktop as default
= NIS master =
<pre>
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
</pre>


* enable login by NIS users
notes for setting up the NIS master
<pre>
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
</pre>


* restart lightdm
== wheel user ==
<pre>
systemctl stop gdm
systemctl restart lightdm
</pre>


= Install libpng12.so.0 =
"wheel" is the default administrative user. We do not want its password exported to NIS (encrypted password hash is world visible) and we do not want its home directory exported to NFS (~wheel/.ssh is world visible and potentially writable: anybody can change ~wheel/.ssh/authorized_keys).


Quartus 16 needs libpng12:
* move wheel's home directory from /home/wheel to /wheel (see special section about this)
* change wheel's UID and GID from 1000 to a value below MINUID in /var/yp/Makefile


<pre>
== coherent uids ==
wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb
</pre>


= Install google-chrome =
we do not want system accounts defined in /etc/passwd of the NIS master
to be included in the NIS map "passwd". this causes trouble on NIS clients
where newly installed packages fail to create local system users because same
user already exists in NIS.


<pre>
This is controlled by MINUID in /var/yp/Makefile.
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb
</pre>


confirm autoupdate is enabled, observe dl.google.com is present in the list of repositories:
Historical TRIUMF uids start from around 200, but several clusters do not have any historic TRIUMF uids below 500 and MINUID is set to:
<pre>
* DAQ-NIS: MINUID=200
apt update
* ISAC-NIS: MINUID=500
...
* TITAN-NIS: MINUID=500
Get:5 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,094 B]
* MUSR-NIS: MINUID=500
...
* TIG-NIS: MINUID=500 (100 on SL6 mother8pi)
</pre>
 
Ubuntu 20 has two programs to create users:
* adduser - creates new users with UID 1000 and up as specified in /etc/adduser.conf. No problems here.
* adduser --system - creates new system users with UID 100 and up as specified in /etc/adduser.conf. No problems here.
* useradd - creates new users with UID 1000 and up as specified in /etc/login.defs. No problems here.
* useradd --system - creates new system users with UID 999 and down (read "man useradd", section at the end about SYS_UID_MAX). This collides with NIS MINUID, these system users will be included in the NIS map and cause trouble.


FOLLOWING IS OBSOLETE:
This problem cannot be fixed: SYS_UID_MIN, SYS_UID_MAX and UID_MIN in /etc/login.defs do not seem
to have any effect on UIDs chosen by "useradd --system". (tested on Ubuntu LTS 20.04).


Instructions from here:
So far only these system accounts seem to be affected by this:
https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable
* systemd-coredump
* ganglia


<pre>
To fix:
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
* run "sort -r -n -t: -k3 /etc/passwd" to identify the last unused system user uid (range 100..200)
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-tmp.list'
* run "sort -r -n -t: -k3 /etc/group" to identify the last unused system user gid (range 100..200)
apt update
* systemd-coredump: manually change UID and GID (package systemd-coredump is usually not installed)
apt install google-chrome-stable
* ganglia: same thing, then change ownership on all ganglia files.
/bin/rm -f /etc/apt/sources.list.d/google-tmp.list
</pre>


= Install amanda client =
Also read systemd author's opinion on system vs user UIDs:
https://github.com/systemd/systemd/issues/4850#issuecomment-265698275


ONLY ONE MACHINES THAT HOST HOME DIRECTORIES
= Fix systemd-logind NIS breakage =


* apt install amanda-client
!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!
* edit /etc/amandahosts
 
<pre>
there is a delay in ssh logins for normal users. "ssh -v" shows the delay is after "pledge...". this
amanda.triumf.ca amanda amdump
fix removes the delay.
</pre>
* check permissions on /etc/amandahosts:
<pre>
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
</pre>
* fix if needed: chown backup:backup /etc/amandahosts; chmod a= /etc/amandahosts; chmod u=wr /etc/amandahosts
* edit /etc/amanda-security.conf, add this line:
<pre>
runtar:gnutar_path=/usr/bin/tar
</pre>


On the amanda machine:
systemd developers think that we should not use NIS and made sure there are
problems if we do. To give them credit, they do offer a workaround. Read this:
https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67


* in amanda disklist, use dump type "bsdtcp-comp-user-tar"
* su - amanda and run amcheck -c daily daq00
<pre>
<pre>
-bash-4.1$ amcheck -c daily daq00
mkdir /etc/systemd/system/systemd-logind.service.d
 
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
Amanda Backup Client Hosts Check
systemctl daemon-reload
--------------------------------
systemctl cat systemd-logind.service
Client check: 1 host checked in 0.092 seconds. 0 problems found.
 
(brought to you by Amanda 3.3.7p1.git.685ff76d)
</pre>
</pre>


= Enable rc.local =
= Fix systemd-udevd NIS breakage =


For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:
see same problem as above with udev getting stuck. ubuntu lts 20.04.


<pre>
<pre>
cd ~/git/scripts
mkdir /etc/systemd/system/systemd-udevd.service.d
git pull
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-udevd.service.d/local.conf
cp -n -v etc/rc.local /etc/
chmod a+rx /etc/rc.local
cp etc/rc-local.service /etc/systemd/system/
systemctl daemon-reload
systemctl daemon-reload
systemctl enable rc-local
systemctl cat systemd-udevd.service
systemctl start rc-local
systemctl status rc-local
</pre>
</pre>


= Remove unwanted packages =
= Configure USB device permissions =
 
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
 
* create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:


<pre>
<pre>
apt remove zsys # broken, do not use
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
apt remove sddm # login manager
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}"
apt remove avahi-daemon avahi-autoipd # not sure what it does, observed using 100% CPU
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
</pre>
</pre>


= Disable unwanted services =
* reload udev rules: udevadm control --reload-rules
* apply new permissions: udevadm trigger --action=add
* watch udev activity: udevadm monitor -p
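
To confirm that the rules took effect, something like the following sketch can be used. It prints every existing path that is not readable and writable by all users. It assumes GNU stat (coreutils, as shipped with Ubuntu); the function name world_rw_missing is made up for this example.

```shell
#!/bin/sh
# world_rw_missing PATH...
# Print "path mode" for each existing path where "others" lack read+write.
# Paths that do not exist (for example an unmatched glob) are skipped.
world_rw_missing() {
    for dev in "$@"; do
        [ -e "$dev" ] || continue
        mode=$(stat -c '%A' "$dev")     # symbolic mode, for example crw-rw-rw-
        case "$mode" in
            ???????rw?) ;;              # others can read and write, nothing to report
            *) echo "$dev $mode" ;;
        esac
    done
}
```

Usage: world_rw_missing /dev/ttyUSB* /dev/ttyACM* ### no output means permissions are as intended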
 
= Configure lightdm display manager =


* enable it
<pre>
<pre>
systemctl disable mpd
echo lightdm | dpkg-reconfigure -fteletype lightdm
systemctl disable snapd
systemctl disable gdm
systemctl disable ModemManager
systemctl disable sddm
systemctl --global mask tracker-extract-3.service
systemctl enable lightdm
systemctl --global mask tracker-miner-fs-3.service
systemctl daemon-reload
</pre>
</pre>


= Disable sleep and suspend =
* make the MATE desktop as default
<pre>
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
</pre>


note: we see some computers randomly shut down or go to sleep; log files indicate that the "sleep" or "suspend" button was pushed by the user, but no such buttons actually exist. this is the fix for this:
* enable login by NIS users
<pre>
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
</pre>


* restart lightdm
<pre>
<pre>
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service
systemctl stop gdm
systemctl restart lightdm
</pre>
</pre>


= Enable crontab @reboot for MIDAS =
= Install libpng12.so.0 =


startup scripts have a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).
Quartus 16 needs libpng12:


<pre>
<pre>
mkdir /etc/systemd/system/cron.service.d
wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/cron.service.d/local.conf
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb
systemctl daemon-reload
systemctl cat cron.service
</pre>
</pre>


Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".
= Install google-chrome =


Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.
<pre>
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb
</pre>


Crontab entry to start midas: (install in the midas user crontab, not root crontab)
confirm autoupdate is enabled, observe dl.google.com is present in the list of repositories:
<pre>
apt update
...
Get:5 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,094 B]
...
</pre>
 
FOLLOWING IS OBSOLETE:
 
Instructions from here:
https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable


<pre>
<pre>
su - midasuser
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
crontab -l
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-tmp.list'
#@reboot /bin/bash -l -c "/home/trinat/bin/start-daq-applications"
apt update
#@reboot /bin/tcsh -c "/home/trinat/bin/start-daq-applications"
apt install google-chrome-stable
/bin/rm -f /etc/apt/sources.list.d/google-tmp.list
</pre>
</pre>


= Install apache httpd proxy for midas and elog =
= Install amanda client =


This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.
ONLY ON MACHINES THAT HOST HOME DIRECTORIES


First, configure apache2:
* apt install amanda-client
 
* edit /etc/amandahosts
* execute these commands:
<pre>
<pre>
apt -y install apache2
amanda.triumf.ca amanda amdump
cd /etc/apache2
</pre>
</pre>
* create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
* check permissions on /etc/amandahosts:
<pre>
<pre>
SSLSessionCache        shmcb:/run/httpd/sslcache(512000)
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
SSLSessionCacheTimeout  300
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
</pre>
</pre>
* create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
* fix if needed: chown backup:backup /etc/amandahosts; chmod a= /etc/amandahosts; chmod u=wr /etc/amandahosts
* edit /etc/amanda-security.conf, add this line:
<pre>
<pre>
<IfModule mod_ssl.c>
runtar:gnutar_path=/usr/bin/tar
    <VirtualHost *:443>
        ServerName daq14.triumf.ca
        DocumentRoot /var/www/html
        ErrorLog /var/log/apache2/daq14.log
        SSLEngine on
        # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
        SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
        SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
        ## use port specified in elogd.cfg
        #ProxyPass /elog/ http://localhost:8082/ retry=1
        ## use mhttpd port
        #ProxyPass /      http://localhost:8080/ retry=1
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
        <Location />
            SSLRequireSSL
            AuthType Basic
            AuthName "DAQ password protected site"
            Require valid-user
            # create password file: touch /etc/apache2/htpasswd
            # to add new user or change password: htpasswd /etc/apache2/htpasswd username
            AuthUserFile /etc/apache2/htpasswd
        </Location>
    </VirtualHost>
</IfModule>
</pre>
</pre>
* stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
 
* stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
On the amanda machine:
* enable ssl module
 
* enable new configurations
* in amanda disklist, use dump type "bsdtcp-comp-user-tar"
* su - amanda and run amcheck -c daily daq00
<pre>
<pre>
a2enmod ssl
-bash-4.1$ amcheck -c daily daq00
a2enmod headers
 
a2enmod proxy
Amanda Backup Client Hosts Check
a2enmod proxy_http
--------------------------------
a2enconf ssl-daq14
Client check: 1 host checked in 0.092 seconds.  0 problems found.
a2ensite daq14-ssl
 
(brought to you by Amanda 3.3.7p1.git.685ff76d)
</pre>
</pre>
* disable default ssl sites
 
= Enable rc.local =
 
For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:
 
<pre>
<pre>
a2dissite 000-default-le-ssl
cd ~/git/scripts
a2dissite 000-default
git pull
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
cp -n -v etc/rc.local /etc/
chmod a+rx /etc/rc.local
cp etc/rc-local.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local
</pre>
</pre>
* check that there are no syntax problems
 
= Remove unwanted packages =
 
<pre>
<pre>
apache2ctl configtest
apt purge  bash-completion # broken, adds unwanted "\" if "ls -l $ROOTSYS/<tab>"
apt purge  zsys # broken, do not use
apt purge  sddm # login manager
apt purge  avahi-daemon avahi-autoipd # not sure what it does, observed using 100% CPU
apt purge  modemmanager # probes all serial ports to see if it's a modem
</pre>
</pre>
* enable and start apache2:
 
= Disable unwanted services =
 
<pre>
<pre>
systemctl enable apache2
systemctl disable mpd
systemctl restart apache2
systemctl disable snapd
systemctl status apache2
systemctl disable ModemManager
systemctl --global mask tracker-extract-3.service
systemctl --global mask tracker-miner-fs-3.service
systemctl daemon-reload
</pre>
</pre>
* apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
* if it says "Failed to configure ... certificate", proceed to the step for setting certbot.
* try to access https://daq14.triumf.ca
** you should see a complaint about self-signed certificate
** you should see a request for password (do not login yet)
** if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.
Second, configure certbot:


(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate,
= Disable sleep and suspend =
renewal can continue to use the https port 443)


(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
note: we see some computers randomly shut down or go to sleep; log files indicate that the "sleep" or "suspend" button was pushed by the user, but no such buttons actually exist. this is the fix for this:


* check that port 80 is not used by anything:
* netstat -an | grep LISTEN | grep ^tcp | grep 80
* lsof -P | grep -i tcp | grep LISTEN | grep 80
* if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment out "Listen 80" in /etc/apache2/ports.conf)
* install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
<pre>
<pre>
apt install certbot python3-certbot-apache
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service
certbot certonly --standalone --installer apache
</pre>
</pre>
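The port 80 check in this section pipes through "grep 80", which also matches ports such as 8080 or 8082. A minimal sketch of a stricter filter follows; the function name filter_listen_port is hypothetical, and it assumes the local address is in column 4, as it is in the output of "ss -ltn".

```shell
# Read "ss -ltn" style output on stdin and print only the lines whose
# local address (column 4) ends in exactly the requested port.
# filter_listen_port is a hypothetical helper name, not an existing tool.
filter_listen_port() {
    awk -v port="$1" '$4 ~ (":" port "$")'
}

# Typical use:
#   ss -ltn | filter_listen_port 80
# No output means nothing is listening on TCP port 80.
```

Matching on ":80" at the end of the address column keeps 8080 and 8082 out of the result, so an empty result really means port 80 is free.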
* then answer questions:
 
* "activate HTTPS for daq14.triumf.ca" - say ok
= Enable crontab @reboot for MIDAS =
* "enter email address" - enter your own email address
 
* "please read terms..." - read the terms and say "agree"
startup scripts have a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).
* it will take a few moments...
 
* "congratulations..." - say ok.
<pre>
<pre>
certbot install --apache --cert-name daq14.triumf.ca
mkdir /etc/systemd/system/cron.service.d
</pre>
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/cron.service.d/local.conf
* then answer questions:
systemctl daemon-reload
* "choose redirect..." - say "1" (no redirect)
systemctl cat cron.service
* look inside /etc/apache2/sites-enabled/daq14-ssl.conf to see that SSLCertificateFile & co point to certbot certificates in
/etc/letsencrypt/live/daq14.triumf.ca/
* to check current renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
<pre>
certbot renew --standalone --installer apache --force-renewal
</pre>
</pre>
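The cron drop-in created above can also be written by a small re-runnable helper. This is only a sketch: the function name write_cron_dropin and its optional directory argument are assumptions added for illustration, so that the helper can be tried on a scratch directory first.

```shell
# Write a systemd drop-in that orders cron after ypbind and autofs.
# write_cron_dropin and its directory argument are hypothetical; the
# default path is the one used in the instructions above.
write_cron_dropin() {
    dropin_dir="${1:-/etc/systemd/system/cron.service.d}"
    # -p makes the helper safe to run again if the directory already exists
    mkdir -p "$dropin_dir" || return 1
    printf '[Unit]\nAfter=ypbind.service autofs.service\n' > "$dropin_dir/local.conf"
}

# Typical use as root:
#   write_cron_dropin && systemctl daemon-reload && systemctl cat cron.service
```

Using printf instead of "echo -e" avoids depending on how the shell's echo handles backslash escapes.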


NOTE: this certificate will expire in 3 months, automatic renewal should work with current version of certbot
Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".
 
Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.


Third, activate password protection:
Crontab entry to start midas: (install in the midas user crontab, not root crontab)


* as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
<pre>
<pre>
touch /etc/apache2/htpasswd
su - midasuser
htpasswd /etc/apache2/htpasswd midas
crontab -l
#@reboot /bin/bash -l -c "/home/trinat/bin/start-daq-applications"
#@reboot /bin/tcsh -c "/home/trinat/bin/start-daq-applications"
</pre>
</pre>


* restart apache2
= Install apache httpd proxy for midas and elog =
<pre>
 
systemctl restart apache2
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.
systemctl status apache2
 
</pre>
First, configure apache2:


From here:
* execute these commands:
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
* enable proxy for ELOG - ditto
<pre>
<pre>
a2enmod proxy
apt -y install apache2
a2enmod proxy_http
cd /etc/apache2
apache2ctl configtest
systemctl restart apache2
</pre>
</pre>
 
* create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
From here:
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
* enable proxy for ELOG - ditto
<pre>
<pre>
a2enmod proxy
SSLSessionCache        shmcb:/run/httpd/sslcache(512000)
a2enmod proxy_http
SSLSessionCacheTimeout  300
apache2ctl configtest
SSLRandomSeed startup file:/dev/urandom  256
systemctl restart apache2
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
</pre>
</pre>
* try accessing MIDAS https://daq14.triumf.ca/ (make sure mhttpd is running)
* create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
* if it's not working, check odb setting FIXME!
* try accessing ELog https://daq14.triumf.ca/elog/ (make sure elogd is running)
* if it's not working, check elogd.cfg file and make sure
<pre>
<pre>
SSL                  = 0
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName daq14.triumf.ca
        DocumentRoot /var/www/html
        ErrorLog /var/log/apache2/daq14.log
        SSLEngine on
        # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
        SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
        SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
        ## use port specified in elogd.cfg
        #ProxyPass /elog/ http://localhost:8082/ retry=1
        ## use mhttpd port
        #ProxyPass /      http://localhost:8080/ retry=1
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
        <Location />
            SSLRequireSSL
            AuthType Basic
            AuthName "DAQ password protected site"
            Require valid-user
            # create password file: touch /etc/apache2/htpasswd
            # to add new user or change password: htpasswd /etc/apache2/htpasswd username
            AuthUserFile /etc/apache2/htpasswd
            RequestHeader set X-Remote-User %{REMOTE_USER}s
        </Location>
        #SSLCertificateFile /root/server.cert
        #SSLCertificateKeyFile /root/server.key
    </VirtualHost>
</IfModule>
</pre>
</pre>
 
* stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
* enable ssl module and new configurations:
try this: pip install requests==2.6.0
 
= Enable elog PDF preview =
 
see https://stackoverflow.com/questions/52998331/imagemagick-security-policy-pdf-blocking-conversion
 
* xemacs -nw /etc/ImageMagick-6/policy.xml
* remove this section at the end:
<pre>
<pre>
<!-- disable ghostscript format types -->
a2enmod ssl
<policy domain="coder" rights="none" pattern="PS" />
a2enmod headers
<policy domain="coder" rights="none" pattern="PS2" />
a2enmod proxy
<policy domain="coder" rights="none" pattern="PS3" />
a2enmod proxy_http
<policy domain="coder" rights="none" pattern="EPS" />
a2enconf ssl-daq14
<policy domain="coder" rights="none" pattern="PDF" />
a2ensite daq14-ssl
<policy domain="coder" rights="none" pattern="XPS" />
</pre>
</pre>
 
* disable default ssl sites
= Install Jupyter notebook =
 
<pre>
<pre>
From https://jupyter.org/install
a2dissite 000-default-le-ssl
apt install python3-pip
a2dissite 000-default
pip install jupyterlab
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
pip install notebook
~/.local/bin/jupyter notebook
watch the http://localhost:8888 URL that it printed
say "no" to offer to start firefox (it will not work!)
URL is: http://localhost:8888/tree?token=xxx
from the machine where you are running the web browser (e.g. google-chrome), run (replace trinat@trinatdaq with the username and machine name where you started jupyter)
open a new shell and run: ssh -v trinat@trinatdaq -L 8888:localhost:8888
in the web browser, open http://localhost:8888
this gives us the login page
in the password or token entry field, put the token from the "tree?token=xxx" above (printed by jupyter on startup)
push button "login"
jupyter page should open with the list of files in the trinat home directory
congratulate Brian with full success
</pre>
</pre>
 
* check that there are no syntax problems
= Install ZFS quota report =
<pre>
 
apache2ctl configtest
If there are any ZFS volumes, install script to report disk and quota usage
</pre>
 
* enable and start apache2:
<pre>
<pre>
cd ~/git/scripts/quotareport
systemctl enable apache2
git pull
systemctl restart apache2
mkdir /var/www/html/zfsquotareport
systemctl status apache2
cp -pv ~/git/scripts/quotareport/sorttable.js /var/www/html/zfsquotareport/
ln -s $PWD/zfsquotareport.perl /etc/cron.daily/
touch /etc/crontab
</pre>
</pre>
* apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
* if it says "Failed to configure ... certificate", proceed to the step for setting certbot.
* try to access https://daq14.triumf.ca
** you should see a complaint about self-signed certificate
** you should see a request for password (do not login yet)
** if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.
Second, configure certbot:
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate,
renewal can continue to use the https port 443)


If httpd is configured to redirect "/" to MIDAS mhttpd:
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
* add following to /etc/apache2/sites-enabled/xxx-ssl.conf in front of "ProxyPass / ..."
* run "systemctl reload apache2"
<pre>
## do not proxy zfs quota report directory
ProxyPass /zfsquotareport/ !
</pre>


= Install PHP =
(Note: unsurprisingly, this requires outside access to connect with letsencrypt, so won't work if PC is only accessible from on-site network)


* apt install php libapache2-mod-php
* check that port 80 is not used by anything:
* systemctl restart apache2
* netstat -an | grep LISTEN | grep ^tcp | grep 80
* create /var/www/html/info.php
* lsof -P | grep -i tcp | grep LISTEN | grep 80
* if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment out "Listen 80" in /etc/apache2/ports.conf)
 
* install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
<pre>
<pre>
<?php
apt install certbot python3-certbot-apache
certbot certonly --standalone --installer apache
phpinfo();
</pre>
* then answer questions:
* "activate HTTPS for daq14.triumf.ca" - say ok
* "enter email address" - enter your own email address
* "please read terms..." - read the terms and say "agree"
* it will take a few moments...
* "congratulations..." - say ok.
<pre>
certbot install --apache --cert-name daq14.triumf.ca
</pre>
</pre>
* open https://daq00.triumf.ca/info.php
* then answer questions:
 
* "choose redirect..." - say "1" (no redirect)
= Configure TRIUMF printers =
* look inside /etc/apache2/sites-enabled/daq14-ssl.conf to see that SSLCertificateFile & co point to certbot certificates in
 
/etc/letsencrypt/live/daq14.triumf.ca/
* to check current renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
<pre>
<pre>
systemctl stop cups
certbot renew --standalone --installer apache --force-renewal
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
</pre>
</pre>


= Enable core dumps =
NOTE: this certificate will expire in 3 months, automatic renewal should work with current version of certbot
 
Third, activate password protection:


By default, Ubuntu LTS 20.04 installs the apport package
* as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
which disables core dumps from user applications (google it up!).
<pre>
It is not meant to do this and documentation claims that
touch /etc/apache2/htpasswd
it is not installed and not enabled by default. Oh, well...
htpasswd /etc/apache2/htpasswd midas
</pre>


* restart apache2
<pre>
<pre>
apt remove apport
systemctl restart apache2
apt autoremove ### will remove apport-symptoms and a few other packages
systemctl status apache2
</pre>
</pre>


After this, core dumps are written to file "core" in the current directory.
From here:
See /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid.
* enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
 
* enable proxy for ELOG - ditto
Enable core dump file names to include process id, add following to /etc/rc.local
<pre>
a2enmod proxy
a2enmod proxy_http
apache2ctl configtest
systemctl restart apache2
</pre>


* try accessing MIDAS https://daq14.triumf.ca/ (make sure mhttpd is running)
* if it's not working, check odb setting FIXME!
* try accessing ELog https://daq14.triumf.ca/elog/ (make sure elogd is running)
* if it's not working, check elogd.cfg file and make sure
<pre>
<pre>
echo 1 > /proc/sys/kernel/core_uses_pid
SSL                  = 0
</pre>
</pre>


= Enable debugger =
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
try this: pip install requests==2.6.0


By default, Ubuntu LTS 20.04 does not permit debugger to attach and debug
== generate self-signed certificate ==
already running programs. To enable it, add following to /etc/rc.local


<pre>
<pre>
echo 0 > /proc/sys/kernel/yama/ptrace_scope
# cd $HOME
# openssl req  -nodes -new -x509  -keyout server.key -out server.cert -days 1001
...+....+..+..........+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..+...+.........+......+.+...+...+.....+...............+.........+...+.+......+...+...........+....+...+..+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+.+...+..+.......+..+...+.......+......+...+..+...+......+....+...............+..+...+....+...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
......+......+.+..+......+.+......+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.+.....+......+.+.........+......+.....+.+..+...+.......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.......+....+......+.....+...+...+.......+..+.+........+.+...+......+..+..........+..+.+...........+...+.......+......+.....+.......+...+.........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:CH
State or Province Name (full name) [Some-State]:Geneve
Locality Name (eg, city) []:CERN
Organization Name (eg, company) [Internet Widgits Pty Ltd]:CERN
Organizational Unit Name (eg, section) []:ALPHA experiment         
Common Name (e.g. server FQDN or YOUR name) []:alphacpc05.cern.ch
Email Address []:
root@alphacpc05:~#
root@alphacpc05:~#
root@alphacpc05:~# ls -l
-rw-r--r-- 1 root root 1375 juil. 10 21:43 server.cert
-rw------- 1 root root 1708 juil. 10 21:42 server.key
root@alphacpc05:~# systemctl restart apache2
</pre>
</pre>
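The self-signed certificate can also be generated without the interactive questions by passing the subject on the command line. A minimal sketch, assuming only the Common Name needs to be set (the transcript above also fills in country and organization); the function name make_selfsigned_cert and the host name in the usage comment are hypothetical.

```shell
# Generate a self-signed certificate without interactive prompts.
#   $1 = fully qualified host name, used as the certificate Common Name
#   $2 = output directory (defaults to the current directory)
# make_selfsigned_cert is a hypothetical helper name.
make_selfsigned_cert() {
    fqdn="$1"
    outdir="${2:-.}"
    openssl req -nodes -new -x509 -days 1001 \
        -subj "/CN=$fqdn" \
        -keyout "$outdir/server.key" -out "$outdir/server.cert" 2>/dev/null
}

# Typical use as root, then inspect the result:
#   make_selfsigned_cert myhost.example.org /root
#   openssl x509 -in /root/server.cert -noout -subject -dates
```

The output file names match the commented-out SSLCertificateFile and SSLCertificateKeyFile lines in the apache2 site config.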


= Disable Ubuntu Pro nag =
= Enable elog PDF preview =
 
NOTE: looks like U-24 already has this correctly.


If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:
see https://stackoverflow.com/questions/52998331/imagemagick-security-policy-pdf-blocking-conversion
 
* xemacs -nw /etc/ImageMagick-6/policy.xml
* remove this section at the end:
<pre>
<pre>
/bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
</pre>
</pre>


= Update packages =
= Install Jupyter notebook =


* apt-get update # update package list
<pre>
* apt-get dist-upgrade # install updated packages and update "kept back" packages
From https://jupyter.org/install
* apt-get autoremove # remove packages that apt thinks should be removed
apt install python3-pip
pip install jupyterlab
pip install notebook
~/.local/bin/jupyter notebook
watch the http://localhost:8888 URL that it printed
say "no" to offer to start firefox (it will not work!)
URL is: http://localhost:8888/tree?token=xxx
from the machine where you are running the web browser (e.g. google-chrome), run (replace trinat@trinatdaq with the username and machine name where you started jupyter)
open a new shell and run: ssh -v trinat@trinatdaq -L 8888:localhost:8888
in the web browser, open http://localhost:8888
this gives us the login page
in the password or token entry field, put the token from the "tree?token=xxx" above (printed by jupyter on startup)
push button "login"
jupyter page should open with the list of files in the trinat home directory
congratulate Brian with full success
</pre>


= Finish installation =
= Install ZFS quota report =


Congratulations. There is nothing more to do!
If there are any ZFS volumes, install script to report disk and quota usage


* reboot
<pre>
<pre>
shutdown -r now
cd ~/git/scripts/quotareport
git pull
mkdir /var/www/html/zfsquotareport
cp -pv ~/git/scripts/quotareport/sorttable.js /var/www/html/zfsquotareport/
ln -s $PWD/zfsquotareport.perl /etc/cron.daily/
touch /etc/crontab
</pre>
</pre>


= Install ZFS =
If httpd is configured to redirect "/" to MIDAS mhttpd:
 
* add following to /etc/apache2/sites-enabled/xxx-ssl.conf in front of "ProxyPass / ..."
!!! after installing all the packages, after updating the system, after updating the linux kernel, after rebooting into latest kernel !!!
* run "systemctl reload apache2"
 
<pre>
<pre>
apt-get install zfsutils-linux
## do not proxy zfs quota report directory
ProxyPass /zfsquotareport/ !
</pre>
</pre>


Follow generic ZFS instructions: [[ZFS]]
= Install PHP =
 
= Update to new version of Ubuntu =


* apt install php libapache2-mod-php
* systemctl restart apache2
* create /var/www/html/info.php
<pre>
<pre>
vi /etc/update-manager/release-upgrades # set "Prompt=normal"
<?php
do-release-upgrade
phpinfo();
</pre>
</pre>
* open https://daq00.triumf.ca/info.php


Update Ubuntu LTS 20.04 to LTS 22.04:
= Configure TRIUMF printers =


<pre>
<pre>
apt remove zsys
systemctl stop cups
systemctl stop cups-browsed.service
systemctl disable cups
systemctl disable cups-browsed.service
systemctl stop snap.cups.cupsd.service
systemctl stop snap.cups.cups-browsed.service
systemctl disable snap.cups.cupsd.service
systemctl disable snap.cups.cups-browsed.service
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
</pre>
</pre>


== daqubuntu ==
= Enable core dumps =
 
By default, Ubuntu LTS 20.04 installs the apport package
which disables core dumps from user applications (google it up!).
It is not meant to do this and documentation claims that
it is not installed and not enabled by default. Oh, well...


<pre>
<pre>
# reboot to clear out all updates
apt purge apport
# vi /etc/update-manager/release-upgrades # set "Prompt=normal"
apt autoremove ### will remove apport-symptoms and a few other packages
# do-release-upgrade -c
Checking for a new Ubuntu release
New release '22.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
# do-release-upgrade
...
say yes...
...
login.defs, say "Y" (erase local changes, use packaged version)
/etc/systemd/resolved.conf, say "Y" (same as above)
firefox snap, say yes
unable to reach snap store, say "skip"
/etc/gmond.conf, say "Y"
/var/yp/Makefile, say "install the package maintainer's version"
/etc/ypserv.conf, same thing
/etc/ypserv.securenets, same thing
/etc/default/nis, same thing
/etc/speech-dispatcher/modules/mary-generic.conf, same thing
/etc/apt/apt.conf.d/50unattended-upgrades, same thing
...
278 packages are going to be removed, say yes
...
restart required, say yes
...
no ping... yes ping...
...
ssh daqubuntu, ok
apt update, fail, DNS does not work, "host security.ubuntu.com" does not resolve.
fix resolver per https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#Disable_NetworkManager
apt update, apt upgrade now works, 0 packages to update
NIS does not work.
</pre>
</pre>


== midm9a ==
After this, core dumps are written to file "core" in the current directory.
See /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid.
 
Enable core dump file names to include process id, add following to /etc/rc.local


<pre>
<pre>
login.defs
echo "echo 1 > /proc/sys/kernel/core_uses_pid" >> /etc/rc.local
firefox snap
</pre>
gmond.conf
 
ypserv
= Enable debugger =
/etc/default/nis
unattended-upgrades
amanda-security.conf
remove obsolete (no)
reboot
configure dns
reenable nis
</pre>


== daq17 ==
By default, Ubuntu LTS 20.04 does not permit debugger to attach and debug
already running programs. To enable it, add following to /etc/rc.local


<pre>
<pre>
firefox snap
echo "echo 0 > /proc/sys/kernel/yama/ptrace_scope" >> /etc/rc.local
imagemagick policy.xml
gmond.conf
chrony.conf
/var/yp/Makefile
ypserv.conf
ypserv.securenets
/etc/default/nis
50unattended-upgrades
</pre>
</pre>
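Appending to /etc/rc.local with ">>" adds a duplicate line every time the instructions are repeated. A minimal sketch of an append that is safe to re-run follows; the function name append_once is hypothetical.

```shell
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper name.
append_once() {
    line="$1"
    file="$2"
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Typical use as root:
#   append_once 'echo 1 > /proc/sys/kernel/core_uses_pid' /etc/rc.local
#   append_once 'echo 0 > /proc/sys/kernel/yama/ptrace_scope' /etc/rc.local
```

Note that if /etc/rc.local ends with "exit 0", lines appended after it are never reached, so check the end of the file after appending.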


== daq00 ==
= Disable Ubuntu Pro nag =


per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/
best I can tell, impossible at this time.
 
== do not do this ==
 
!!! does nothing !!!


<pre>
<pre>
do-release-upgrade -f DistUpgradeViewNonInteractive
pro config set apt_news=false
</pre>
</pre>


If it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. In the case of daq00, it did not like how the EFI partitions were mounted. After fixing that, the non-interactive upgrade was successful.
== do not do this ==


== isdaq08 ==
!!! breaks automatic updates because 20apt-esm-hook.conf is missing !!!


* prepare
If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:
<pre>
<pre>
cd ~/git/scripts
/bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf
git pull
cd ~
apt -y install debsums
</pre>
</pre>
* check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
 
== do not do this ==
 
!!! likely same as above, breaks automatic updates !!!
 
* comment out /etc/apt/apt.conf.d/20apt-esm-hook.conf
 
== do not do this ==
 
!!! removes too many packages !!!
 
<pre>
<pre>
root@isdaq08:~# debsums -ce
apt remove ubuntu-pro-client
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/apt/apt.conf.d/10periodic
root@isdaq08:~#
</pre>
</pre>
* restore original /etc/apt/apt.conf.d/10periodic
 
= Update packages =
 
<pre>
<pre>
APT::Periodic::Update-Package-Lists "1";
apt update # update package list
APT::Periodic::Download-Upgradeable-Packages "0";
apt upgrade # install updated packages and update "kept back" packages
APT::Periodic::AutocleanInterval "0";
apt autoremove # remove packages that apt thinks should be removed
</pre>
</pre>
* apt remove ganglia-monitor
* apt remove nis
* "debsums -ce" is now empty


Run the upgrade:
= Remove obsolete packages =


* do-release-upgrade -f DistUpgradeViewNonInteractive
DO NOT DO THIS, IT REMOVES TOO MUCH !!!
<pre>
apt list '~o'
apt purge '~o'
</pre>


Post upgrade:
= Cleanup residual configs =


* configure DNS
<pre>
* apt -y install linux-generic-hwe-22.04
apt list '~c'
* /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
apt purge '~c'
* /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
</pre>
* install missing packages
* restore ganglia
* restore nis
* check zpool status, may need zpool upgrade
* reboot


= Upgrade to new version of Debian =
= Install firefox-esr =
 
install firefox-esr here


https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html
= Finish installation =


== 32-bit VME processor Debian 11 to 12 ==
Congratulations. There is nothing more to do!


* cd git/scripts; git pull; cd ~
* reboot
* apt update
* apt upgrade
* edit /etc/apt/sources.list
<pre>
<pre>
deb http://deb.debian.org/debian/ bookworm main
shutdown -r now
#deb http://deb.debian.org/debian/ bullseye main
#deb-src http://deb.debian.org/debian/ bullseye main
</pre>
</pre>
* apt update
* apt upgrade --without-new-pkgs
* apt full-upgrade
* apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
* reboot


= Ubuntu package manager =
= Update to new version of Ubuntu =


* apt-get install xxx # install package xxx
* run "do-release-upgrade -c"
* apt-get update
* if it does not report new release Ubuntu 24, check /etc/update-manager/release-upgrades has "Prompt=lts"
* apt-get upgrade
* apt-get dist-upgrade
* apt-get autoremove # remove automatically installed packages required by a removed package
* apt-get remove xxx # remove package xxx
* apt-cache search . # list all available packages
* apt-cache show "." | grep ^Package # list all available packages
* apt-cache madison root-system # show all available versions of package root-system
* apt list # list all packages known to apt (installed and available)
* dpkg --listfiles libpng16-16 # list all files from this package
* apt list --installed # list all installed packages
* dpkg -S /bin/bash # what package provides this file?
* dpkg -L bash # what files provided by this package?
* debsums -ce # show modified config files
* apt-config dump # show apt configuration


= Ubuntu zsys =
== Update Ubuntu LTS 20.04 to LTS 22.04 ==


NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230
<pre>
 
apt remove zsys
* manual removal of old snapshots
<pre>
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
</pre>
</pre>
* apt remove zsys


NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots
=== daqubuntu ===


* manages system snapshots
* documentation: https://github.com/ubuntu/zsys
* documentation: (go to next article via link "newer" at the bottom) https://didrocks.fr/2020/05/21/zfs-focus-on-ubuntu-20.04-lts-whats-new/
* ubuntu 20.04 bug, too many snapshots cause /boot to become full and updates fail. https://github.com/ubuntu/zsys/issues/155
* solution: use custom /etc/zsys.conf, limit number of snapshots to 10, see trinatdaq:/etc/zsys.conf
* zsys commands:
<pre>
<pre>
update-grub # list of all snapshots, errors if some snapshots are broken
# reboot to clear out all updates
zsysctl state remove lnc0k7 --system # remove snapshot
# vi /etc/update-manager/release-upgrades # set "Prompt=normal"
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
# do-release-upgrade -c
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
Checking for a new Ubuntu release
zsysctl show # show snapshots
New release '22.04 LTS' available.
</pre>
Run 'do-release-upgrade' to upgrade to it.
 
# do-release-upgrade
= Ubuntu cloning =
...
 
say yes...
To clone an Ubuntu image:
...
 
login.defs, say "Y" (erase local changes, use packaged version)
<pre>
/etc/systemd/resolved.conf, say "Y" (same as above)
cd /nfsroot/lxcpet
firefox snap, say yes
emacs -nw etc/hostname ### change hostname
unable to reach snap store, say "skip"
emacs -nw etc/mailname ### change hostname (debian 11)
/etc/gmond.conf, say "Y"
emacs -nw etc/defaultdomain ### change the NIS domainname
/var/yp/Makefile, say "install the package maintainer's version"
emacs -nw etc/yp.conf ### change the NIS server
/etc/ypserv.conf, same thing
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
/etc/ypserv.securenets, same thing
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
/etc/default/nis, same thing
emacs -nw root/.ssh/authorized_keys ### update root ssh keys
/etc/speech-dispatcher/modules/mary-generic.conf, same thing
/etc/apt/apt.conf.d/50unattended-upgrades, same thing
...
278 packages are going to be removed, say yes
...
restart required, say yes
...
no ping... yes ping...
...
ssh daqubuntu, ok
apt update, fail, DNS does not work, "host security.ubuntu.com" does not resolve.
fix resolver per https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#Disable_NetworkManager
apt update, apt upgrade now works, 0 packages to update
NIS does not work.
</pre>
</pre>


= Ubuntu boot loader =
=== midm9a ===


== boot from ZFS ==
* use UEFI boot with syslinux, see here: https://daq.triumf.ca/DaqWiki/index.php/SLinstall#Configure_UEFI_boot
* apt install zfs-initramfs
* update-initramfs -v -u
* ZFS structure:
<pre>
<pre>
root@daq00:~# zfs list
login.defs
NAME                                              USED  AVAIL    REFER  MOUNTPOINT
firefox snap
rpool                                              147G  1.62T      96K  /
gmond.conf
rpool/ROOT                                        17.8G  1.62T      96K  none
ypserv
rpool/ROOT/ubuntu_00aaaa                          17.8G  1.62T    6.22G  /
/etc/default/nis
</pre>
unattended-upgrades
* copy OS image to rpool/ROOT/ubuntu_00aaaa
amanda-security.conf
* zfs set mountpoint=/ rpool
remove obsolete (no)
* zfs set mountpoint=none rpool/ROOT
reboot
* zfs set mountpoint=/ rpool/ROOT/ubuntu_00aaaa
configure dns
* zfs get all | grep mountpoint
reenable nis
<pre>
rpool                    mountpoint            /                      local
rpool/ROOT                mountpoint            none                  local
rpool/ROOT/ubuntu_00aaaa  mountpoint            /                      local
</pre>
</pre>
* in linux kernel command line (syslinux.cfg), set "root=" to "root=ZFS=rpool/ROOT/ubuntu_00aaaa"


== boot from ZFS mirror ==
=== daq17 ===


=== setup the EFI partitions ===
* assuming /dev/sdb is already setup for EFI boot, setup /dev/sda the same way:
* partition the second boot disk same as first boot disk:
<pre>
<pre>
root@grsnis01:~# gdisk -l /dev/sdb
firefox snap
Found valid GPT with protective MBR; using GPT.
imagemagick policy.xml
Number  Start (sector)    End (sector)  Size      Code  Name
gmond.conf
  1            2048        1050623  512.0 MiB  EF00  EFI system partition
chrony.conf
  2        1050624      3907029134  1.8 TiB    8300  Linux filesystem
/var/yp/Makefile
root@grsnis01:~#
ypserv.conf
</pre>
ypserv.securenets
* mkfs.msdos /dev/sdX1
/etc/default/nis
* create mount points
50unattended-upgrades
<pre>
mkdir /boot/efi-sda
mkdir /boot/efi-sdb
</pre>
</pre>
* add to /etc/fstab
<pre>
/dev/sda1 /boot/efi-sda      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
/dev/sdb1 /boot/efi-sdb      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
</pre>
* mount -a
* df | grep boot
<pre>
root@grsnis01:~# df | grep boot
/dev/sdb1                    523248    98100    425148  19% /boot/efi-sdb
/dev/sda1                    523248        4    523244  1% /boot/efi-sda
</pre>
* copy boot files to new boot disk
* cd /boot/efi-sdX; rsync -av . /boot/efi-sdY
* set BIOS to boot from "UEFI Hard drive", disable legacy boot (except for booting from USB key in legacy mode)
* if using UEFI boot syslinux per these instructions, linux kernel update has to be done manually:
* run ~/git/scripts/etc/update_efi_mirror.perl, follow instructions that it prints.


=== setup zfs partitions ===
=== daq00 ===


use partitions compatible with Ubuntu "install on ZFS"
per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/
 
* gdisk "o" to create new GPT partition table
* gdisk "n" +512M ef00 to create EFI partition
* gdisk "n" +2G 8200 to create linux swap partition (not used)
* gdisk "n" +2G BE00 to create ZFS bpool partition
* gdisk "n" xxx BF00 create ZFS rpool partition


<pre>
<pre>
# gdisk -l /dev/sda
do-release-upgrade -f DistUpgradeViewNonInteractive
Number  Start (sector)    End (sector)  Size      Code  Name
  1            2048        1050623  512.0 MiB  EF00  EFI System Partition
  2        1050624        5244927  2.0 GiB    8200 
  3        5244928        9439231  2.0 GiB    BE00 
  4        9439232      234441614  107.3 GiB  BF00 
root@midm9a:~#
</pre>
</pre>


=== setup zfs mirror ===
If it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. In the case of daq00, it did not like how the EFI partitions were mounted. After fixing that, the non-interactive upgrade was successful.
 
=== isdaq08 ===


* see here: https://daq.triumf.ca/DaqWiki/index.php/ZFS#Convert_pool_from_single_to_mirror
* prepare
<pre>
<pre>
root@grsnis01:~# ls -l /dev/disk/by-id/ata*part2
cd ~/git/scripts
lrwxrwxrwx 1 root root 10 Feb 19 16:47 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801081-part2 -> ../../sda2
git pull
lrwxrwxrwx 1 root root 10 Feb 19 16:47 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801101-part2 -> ../../sdb2
cd ~
 
apt -y install debsums
root@grsnis01:~# zpool status
</pre>
  pool: rpool
* check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
state: ONLINE
<pre>
  scan: none requested
root@isdaq08:~# debsums -ce
config:
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/apt/apt.conf.d/10periodic
root@isdaq08:~#
</pre>
* restore original /etc/apt/apt.conf.d/10periodic
<pre>
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
</pre>
* apt remove ganglia-monitor
* apt remove nis
* "debsums -ce" is now empty


        NAME                                            STATE    READ WRITE CKSUM
Run the upgrade:
        rpool                                            ONLINE      0    0    0
          ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE      0    0    0


errors: No known data errors
* do-release-upgrade -f DistUpgradeViewNonInteractive


root@grsnis01:~# zpool attach rpool ata-WDC_WDS200T2B0A-00SM50_205007801101-part2 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801081-part2
Post upgrade:


root@grsnis01:~# zpool status
* configure DNS
  pool: rpool
* apt -y install linux-generic-hwe-22.04
state: ONLINE
* /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
status: One or more devices is currently being resilvered. The pool will
* /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
        continue to function, possibly in a degraded state.
* install missing packages
action: Wait for the resilver to complete.
* restore ganglia
  scan: resilver in progress since Fri Feb 19 16:54:39 2021
* restore nis
        12.6G scanned at 3.16G/s, 1.02G issued at 262M/s, 12.6G total
* check zpool status, may need zpool upgrade
        1.02G resilvered, 8.09% done, 0 days 00:00:45 to go
* reboot
config:
 
== upgrade U-22 to U-24 ==


        NAME                                              STATE    READ WRITE CKSUM
generic instructions. below are notes from upgrades of specific machines.
        rpool                                              ONLINE      0    0    0
          mirror-0                                        ONLINE      0    0    0
            ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE      0    0    0
            ata-WDC_WDS200T2B0A-00SM50_205007801081-part2  ONLINE      0    0    0  (resilvering)


errors: No known data errors
NOTE: at CERN saw installation getting stuck on restart of autofs (automount, rsyslogd 100% CPU, huge /var/log/syslog), stop autofs before upgrade?
<pre>
systemctl stop autofs
</pre>
 
<pre>
cleanup zsys
</pre>
</pre>
* wait
 
<pre>
<pre>
root@grsnis01:~# zpool status
maybe: if snap is already removed, remove firefox to prevent snap from reinstalling.
  pool: rpool
install non-snap firefox-esr
state: ONLINE
</pre>
  scan: resilvered 12.7G in 0 days 00:00:40 with 0 errors on Fri Feb 19 16:55:19 2021
config:


        NAME                                              STATE    READ WRITE CKSUM
<pre>
        rpool                                              ONLINE      0    0    0
remove thunderbird to prevent installation of snap thunderbird in U-24
          mirror-0                                        ONLINE      0    0    0
</pre>
            ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE      0    0    0
            ata-WDC_WDS200T2B0A-00SM50_205007801081-part2  ONLINE      0    0    0


errors: No known data errors
<pre>
mount EFI partition as /boot/efi, otherwise upgrade bombs
usually mount /dev/sda1 /boot/efi
</pre>
</pre>


== maintenance commands ==
<pre>
debsums -ce
apt -y remove desktop-base # causes installer crash
apt -y remove thunderbird  # avoid forced conversion to snap
apt update
apt -y upgrade
apt -y autoremove
do-release-upgrade -c      # confirm upgrade will be to U-24
do-release-upgrade
# say "y" to all questions
# after installation starts, accept default answers to all questions
# should run for about 1 hour or so
# after upgrade finishes
apt update
apt -y upgrade
apt -y autoremove
shutdown -r now
</pre>
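The "confirm upgrade will be to U-24" step can be scripted. This is a minimal sketch, assuming the check prints a line of the form "New release '24.04.1 LTS' available." as in the daq14 notes; the function name upgrade_target is hypothetical.

```shell
#!/bin/sh
# upgrade_target (hypothetical helper): read the output of
# "do-release-upgrade -c" on stdin and print the offered release version.
# Returns non-zero if no "New release '...' available." line is present.
upgrade_target() {
    sed -n "s/^New release '\([0-9][0-9.]*\).*' available\.\$/\1/p" | grep .
}

# usage (as root):
#   target=$(do-release-upgrade -c | upgrade_target) || echo "no new release offered"
#   case "$target" in 24.04*) echo "ok, will upgrade to $target" ;; esac
```

If the function fails, check "Prompt=lts" in /etc/update-manager/release-upgrades before going further.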


* update-initramfs -v -u
post upgrade:
* grub-install /dev/sda


= Convert from single to dual mirrored ZFS SSD =
* check zpool status, may need zpool upgrade
* cd /etc/apt/sources.list.d, reenable 3rd party repos (mozilla, google, etc)


Assuming Ubuntu LTS 22.04 with "install on ZFS" option, we will
if installation bombs:
add a second SSD, configure ZFS to use both SSDs in mirrored
configuration and setup grub to boot from either SSD. This
is intended to create a full redundant system where failure
of either SSD does not break the system.


* identify first SSD
<pre>
<pre>
root@midm9b:~# ./smart-status.perl
apt update
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER Errors    Link
apt upgrade # will tell us to run "apt --fix-broken install", do it
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .      6.0
apt --fix-broken install
root@midm9b:~#  
apt update
apt upgrade # should resume the upgrade, will run for a long time
apt update
apt upgrade # should do nothing
apt autoremove
shutdown -r now
</pre>
</pre>
* connect second SSD of identical size
 
=== daqubuntu, U-24 ===
 
* prepare
<pre>
<pre>
root@midm9b:~# ./smart-status.perl
cd ~/git/scripts
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER  Errors    Link
git pull
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .      6.0
cd ~
    /dev/sdb  WD Blue SA510 2.5 250GB        22243Z803852              25        .        ?        ?        .        ?        .      6.0
apt -y install debsums
root@midm9b:~#
</pre>
</pre>
* if second SSD is not autodetected, reboot
* check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
* Clone partition table automatically
<pre>
If both SSDs are identical size, use this simpler method of duplicating the partition table:
root@daqubuntu:~# debsums -ce
/etc/ganglia/gmond.conf
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
/etc/ypserv.conf
/etc/ypserv.securenets
/var/yp/Makefile
/etc/update-manager/release-upgrades
/etc/apt/apt.conf.d/10periodic
/etc/yp.conf
root@daqubuntu:~#
* restore original /etc/apt/apt.conf.d/10periodic
<pre>
<pre>
root@midm9b:~# sfdisk -d /dev/sda > part_table
APT::Periodic::Update-Package-Lists "1";
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
</pre>
</pre>
The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
* apt remove ganglia-monitor
 
* apt remove nis
* Clone partition table manually (e.g. for different size disks)
* apt autoremove
* list partition table of first SSD:
* restore original release-upgrades: "Prompt: lts"
* "debsums -ce" is now empty
 
Check for upgrade:
 
<pre>
<pre>
root@midm9b:~# fdisk -l /dev/sda
root@daqubuntu:~# do-release-upgrade -c
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Checking for a new Ubuntu release
Disk model: WD Blue SA510 2.
There is no development version of an LTS available.
Units: sectors of 1 * 512 = 512 bytes
To upgrade to the latest non-LTS development release
Sector size (logical/physical): 512 bytes / 512 bytes
set Prompt=normal in /etc/update-manager/release-upgrades.
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@daqubuntu:~#
Disklabel type: gpt
</pre>
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E


Device      Start      End  Sectors  Size Type
Run the upgrade:
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#
</pre>
* create identical partitions on second SSD, use sector numbers from above.
<pre>
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8


Partition table scan:
* do-release-upgrade -f DistUpgradeViewNonInteractive
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present


Creating new GPT entries in memory.
Post upgrade:


Command (? for help): n
* configure DNS
Partition number (1-128, default 1):
* apt -y install linux-generic-hwe-22.04
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}:
* /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
* /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
Current type is 8300 (Linux filesystem)
* install missing packages
Hex code or GUID (L to show codes, Enter = 8300): ef00
* restore ganglia
Changed type of partition to 'EFI system partition'
* restore nis
 
* check zpool status, may need zpool upgrade
Command (? for help): n
* cd /etc/apt/sources.list.d, reenable 3rd party repos (mozilla, google, etc)
Partition number (2-128, default 2):
* reboot
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}:
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'


Command (? for help): n
=== daq14, U-20-22-24 ===
Partition number (3-128, default 3):
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}:
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'


Command (? for help): n
* apt update, apt upgrade
Partition number (4-128, default 4):
* apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 ### install kernel 5.15
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}:
* shutdown -r now
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}:  
* stuck waiting for daq14 to shutdown...
Current type is 8300 (Linux filesystem)
* reboot into kernel 5.15
Hex code or GUID (L to show codes, Enter = 8300): bf00
* ???
Changed type of partition to 'Solaris root'
<pre>
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
</pre>
* debsums -ce
<pre>
/etc/apache2/ports.conf
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/sudoers
</pre>
* apache2 restore original ports.conf, uncomment "Listen 80"
* cp -pv /etc/dnsmasq.conf.dpkg-dist /etc/dnsmasq.conf
* apt remove ganglia-monitor
* edit /etc/yp.conf, remove everything after "# ypserver ypserver.network.com"
* "debsums -ce" is now empty
* do-release-upgrade -f DistUpgradeViewNonInteractive
* runs for a long time
* stuck on "/etc/default/nis", type "Y", press enter, nothing for a bit, then resumes running
* finished
* configure DNS
* reboot
* have kernel 6.8
* apt update; apt upgrade
* apt upgrade guile-2.2-libs ### would not auto-update, "kept back", has to be done by hand
* apt autoremove
* debsums -ce
<pre>
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
</pre>
* diff /etc/default/nis.dpkg-dist  /etc/default/nis
* cp -pv /etc/default/nis.dpkg-dist  /etc/default/nis
* debsums -ce
<pre>
debsums: missing file /etc/init.d/nis (from nis package)
</pre>
* we ignore this and run the update
* do-release-upgrade -c
<pre>
Checking for a new Ubuntu release
New release '24.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
</pre>
* do-release-upgrade -f DistUpgradeViewNonInteractive
* bombs out without any error messages
* in /var/log/dist-upgrade/main.log reports "Failed to find a replacement for xapp" and other packages
* apt remove xapp usrmerge ureadahead thunderbird-gnome-support
* no go, complains about even more packages.
* apt list | grep installed | grep -v jammy ### show packages installed from non-ubuntu sources
* remove all packages marked "installed,local" ### ubuntu updater does not know where they came from and so cannot update them.
* apt remove desktop-base ### not happy about this package in /var/log/dist-upgrade/apt.log
* apt autoremove
* do-release-upgrade -f DistUpgradeViewNonInteractive
* running for a long time...
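The search for locally installed packages in the steps above can be narrowed to just the package names. This is a minimal sketch, assuming "apt list --installed" marks such packages with "[installed,local]" and prints the package name before the first "/"; the function name local_packages is hypothetical.

```shell
#!/bin/sh
# local_packages (hypothetical helper): read "apt list --installed" output
# on stdin and print the names of packages marked [installed,local],
# i.e. packages that apt cannot trace back to any configured repository.
local_packages() {
    grep '\[installed,local\]' | cut -d/ -f1
}

# usage:
#   apt list --installed 2>/dev/null | local_packages
```

Review the list before removing anything, some of these packages may be needed after the upgrade and have to be reinstalled by hand.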
 
=== alpha04 U-20-24 ===


Command (? for help): w
* apt update, apt upgrade, apt autoremove
* reboot into latest kernel (already done)
* debsums -ce
<pre>
root@alpha04:~# debsums -ce
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/default/nis
/etc/yp.conf
root@alpha04:~#
</pre>
* move /etc/dnsmasq.conf to /etc/dnsmasq.d/alpha04.conf
* apt remove dnsmasq
* apt remove ganglia-monitor
* apt remove nis
* apt autoremove
* debsums -ce ### is now empty
* do-release-upgrade -f DistUpgradeViewNonInteractive
* it runs for a long time...
* complained about /etc/fwupd config files, not sure why...
* finished
* apt update, apt upgrade, apt autoremove
* restore dnsmasq: apt install dnsmasq, systemctl status dnsmasq
* restore ganglia, per instructions
* restore NIS: apt -y install rpcbind nis, ypwhich, ypwhich -m
* zpool upgrade rpool ### also upgrade any other zfs pools, see zpool status
* remove unwanted packages, per instructions
* run gonodeinfo
* reboot
* done
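The "debsums -ce is now empty" check that precedes each upgrade in these notes can be turned into a gate. This is a minimal sketch; the function name config_files_clean is hypothetical, and it treats any line of output as a file that still needs attention.

```shell
#!/bin/sh
# config_files_clean (hypothetical helper): read "debsums -ce" output on
# stdin. Return 0 if it is empty, otherwise list the files and return 1.
config_files_clean() {
    changed=$(cat)
    if [ -z "$changed" ]; then
        return 0
    fi
    printf 'modified config files, fix before upgrade:\n%s\n' "$changed" >&2
    return 1
}

# usage (as root):
#   debsums -ce | config_files_clean && do-release-upgrade -f DistUpgradeViewNonInteractive
```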


Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
== vera00 U20-22-24 ==
PARTITIONS!!


Do you want to proceed? (Y/N): y
* everything same as daq14 for U20-22
OK; writing new GUID partition table (GPT) to /dev/sdb.
* kernel is still 5.15
The operation has completed successfully.
* U22-24 is going...
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
* stuck for a few minutes on /etc/fwupd config files
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
* have kernel 6.8.0-49
Disk model: WD Blue SA510 2.
* same steps as daq14
Units: sectors of 1 * 512 = 512 bytes
* reboot
Sector size (logical/physical): 512 bytes / 512 bytes
* same steps as daq14
I/O size (minimum/optimal): 512 bytes / 512 bytes
* done
Disklabel type: gpt
 
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
== phaarmonster U22-24 ==
 
* debsums -ce
<pre>
/etc/amandahosts
/etc/apache2/ports.conf
/etc/ganglia/gmond.conf
</pre>
* do-release-upgrade -c ### reports U24.04.1
* do-release-upgrade -f DistUpgradeViewNonInteractive ### bombs
* apt remove desktop-base; apt autoremove
* do-release-upgrade ### (interactive) runs ok
* bombed !!!
* apt upgrade spews errors and tells us to run "apt --fix-broken install"
* apt --fix-broken install ### runs
* bombs with thunderbird snap errors
* again... no go
* thunderbird snap complains about mounting /home, but /home is a symlink
* rm /home, mkdir /home
* again, runs ok
* asks about /etc/fwupd/fwupd.conf - say "Y" to install updated package version
* apt install completes
* apt update; apt upgrade ### running for a long time...
* finished
* install missing packages, etc
* reboot
* long wait... came back
* DNS does not work, systemd-resolved missing, apt install systemd-resolved, "configure DNS"
* done.


Device      Start      End  Sectors  Size Type
== isdaq10 U22-24 ==
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root


* do-release-upgrade -f DistUpgradeViewNonInteractive
* bombed on desktop-base, "apt remove desktop-base; apt autoremove"
* bombed with errors
* running "apt --fix-broken install"
* complained about thunderbird snap
* complained about /etc/fwupd/fwupd.conf (say Y)
* finished ok
* apt update
* apt upgrade ### reports "1382 upgraded, 140 newly installed, 0 to remove and 29 not upgraded"


Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
== iris01 ==
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603


Device      Start      End  Sectors  Size Type
* debsums -ce
/dev/sdb1    2048  1050623  1048576  512M EFI System
<pre>
/dev/sdb2  1050624  5244927  4194304    2G Linux swap
/etc/ganglia/gmond.conf
/dev/sdb3  5244928  9439231  4194304    2G Solaris boot
/etc/default/nis
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
/etc/yp.conf
root@midm9b:~#
</pre>
</pre>
* identify second SSD partitions
* apt remove desktop-base
* apt remove thunderbird
* apt autoremove
* do-release-upgrade -f DistUpgradeViewNonInteractive
* bombed with dpkg errors:
<pre>
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part3
hplip wants wrong python
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part3 -> ../../sda3
ganglia-monitor wants wrong libapr1
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 -> ../../sdb3
sssd-ad wants bunch of wrong libraries
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
xemacs21-mule
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
libnfsidmap1 wants libldap2
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
and so forth...
</pre>
</pre>
* convert bpool from single disk to mirrored disk:
* apt --fix-broken install -y
<pre>
* bombs on ganglia - ganglia group absent from /etc/group
root@midm9b:~# zpool status
* fix group ganglia by hand, remove ganglia group from NIS on isdaq00
  pool: bpool
* apt --fix-broken install
state: ONLINE
* apt upgrade
config:
 
== iris00 ==


NAME                                    STATE    READ WRITE CKSUM
* stuck on "autofs: restarting", login as root, kill iris midas, kill automount, systemctl restart autofs, noble got unstuck, ctrl-c systemctl restart autofs
bpool                                  ONLINE      0    0    0
* noble running...
  99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE      0    0    0
* bombed with dpkg errors
* check ganglia user, group - both ok
* apt --fix-broken install
* apt upgrade


errors: No known data errors
== tigstore01 ==


  pool: rpool
* no bomb-out
state: ONLINE
config:


NAME                                    STATE    READ WRITE CKSUM
== midm9b ==
rpool                                  ONLINE      0    0    0
  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE      0    0    0


errors: No known data errors
* apt remove desktop-base thunderbird
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
* bombed
root@midm9b:~# zpool status bpool
* apt --fix-broken install
  pool: bpool
* apt upgrade
state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:


NAME                                                STATE    READ WRITE CKSUM
= Upgrade to new version of Debian =
bpool                                              ONLINE      0    0    0
  mirror-0                                          ONLINE      0    0    0
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0    0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0    0    0


errors: No known data errors
https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html
</pre>
* convert rpool
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
root@midm9b:~# zpool status rpool
  pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
0B resilvered, 0.03% done, no estimated completion time
config:


NAME                                                STATE    READ WRITE CKSUM
== 32-bit VME processor Debian 11 to 12 to 13 ==
rpool                                              ONLINE      0    0    0
  mirror-0                                          ONLINE      0    0    0
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0


errors: No known data errors
* cd git/scripts; git pull; cd ~
root@midm9b:~#
* apt update
</pre>
* apt upgrade
* wait for resilver to complete
* edit /etc/apt/sources.list
<pre>
<pre>
root@midm9b:~# zpool status
deb http://deb.debian.org/debian/ trixie main
  pool: bpool
#deb http://deb.debian.org/debian/ bookworm main
state: ONLINE
#deb http://deb.debian.org/debian/ bullseye main
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
#deb http://deb.debian.org/debian/ buster main
config:
#deb-src http://deb.debian.org/debian/ bullseye main
</pre>
* apt update
* apt upgrade --without-new-pkgs
* apt full-upgrade
* apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
* reboot
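The sources.list edit above is done by hand, commenting out the old release and adding the new one. As an alternative, the release codename can be substituted in place. This is a minimal sketch using a different technique (sed substitution on active "deb" lines only); the function name switch_codename is hypothetical, and a backup copy is kept as FILE.bak.

```shell
#!/bin/sh
# switch_codename FILE OLD NEW (hypothetical helper): replace release
# codename OLD with NEW on active "deb"/"deb-src" lines of FILE.
# Commented-out lines are left alone. The original is kept as FILE.bak.
switch_codename() {
    file=$1; old=$2; new=$3
    cp -p "$file" "$file.bak" || return 1
    # match the codename only as a whole suite name, optionally followed
    # by a suffix such as "-security" or "-updates"
    sed "/^deb/ s|\([[:space:]]\)${old}\([-/[:space:]]\)|\1${new}\2|" "$file.bak" > "$file"
}

# usage (as root):
#   switch_codename /etc/apt/sources.list bookworm trixie
#   diff /etc/apt/sources.list.bak /etc/apt/sources.list
```

Upgrade one release at a time (11 to 12, then 12 to 13) and run the apt steps above after each change.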
 
= Ubuntu package manager =


NAME                                                STATE    READ WRITE CKSUM
* apt-get install xxx # install package xxx
bpool                                              ONLINE      0    0    0
* apt-get update
  mirror-0                                          ONLINE      0    0    0
* apt-get upgrade
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0    0
* apt-get dist-upgrade
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0    0    0
* apt-get autoremove # remove automatically installed packages that are no longer required by any installed package
* apt-get remove xxx # remove package xxx
* apt-cache search . # list all available packages
* apt-cache show "." | grep ^Package # list all available packages
* apt-cache madison root-system # show all available versions of package root-system
* apt list # list all known packages (installed and available)
* dpkg --listfiles libpng16-16 # list all files from this package
* apt list --installed # list all installed packages
* dpkg -S /bin/bash # what package provides this file?
* dpkg -L bash # what files provided by this package?
* debsums -ce # show modified config files
* apt-config dump # show apt configuration
* apt purge '~c' # purge all [residual-config] packages
* ls -l /var/lib/dpkg/info/ # show post-install scripts
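Before running "apt purge '~c'" it can be useful to see which packages are in the residual-config state. This is a minimal sketch, assuming the usual "dpkg -l" listing where such packages carry the status "rc" in the first column; the function name residual_config is hypothetical.

```shell
#!/bin/sh
# residual_config (hypothetical helper): read "dpkg -l" output on stdin
# and print the names of packages in state "rc"
# (removed, config files remain).
residual_config() {
    awk '$1 == "rc" { print $2 }'
}

# usage:
#   dpkg -l | residual_config
```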


errors: No known data errors
= Ubuntu zsys =


  pool: rpool
NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230
state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:


NAME                                                STATE    READ WRITE CKSUM
* scripted removal of old snapshots (replace "echo zfs destroy" with "zfs destroy")
rpool                                              ONLINE      0    0    0
<pre>
  mirror-0                                          ONLINE      0    0    0
zfs list -t all | cut -f1 -d " " | grep autozsys | xargs -n1 echo zfs destroy
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
</pre>
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0


errors: No known data errors
* manual removal of old snapshots
<pre>
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
</pre>
</pre>
* enable booting from second SSD: (instead of /dev/sda1, /dev/sdb1, use UUID=xxx)
* apt remove zsys
 
NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots
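The scripted removal above can be split into a "select" step and a "destroy" step, so that the list is reviewed before anything is deleted. This is a minimal sketch, assuming zsys snapshots carry "@autozsys_" in their names; the function name autozsys_destroy_commands and the dataset names in the comments are hypothetical.

```shell
#!/bin/sh
# autozsys_destroy_commands (hypothetical helper): read dataset names on
# stdin, one per line, and print a "zfs destroy" command for each zsys
# snapshot. Nothing is destroyed, the commands are only printed.
autozsys_destroy_commands() {
    grep '@autozsys_' | sed 's/^/zfs destroy /'
}

# usage (as root), review the output first:
#   zfs list -H -t snapshot -o name | autozsys_destroy_commands
# example output line:
#   zfs destroy rpool/ROOT/ubuntu_abc123@autozsys_xy69ye
# when satisfied, pipe the same output to "sh" to run the commands
```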
 
* manages system snapshots
* documentation: https://github.com/ubuntu/zsys
* documentation: (go to next article via link "newer" at the bottom) https://didrocks.fr/2020/05/21/zfs-focus-on-ubuntu-20.04-lts-whats-new/
* ubuntu 20.04 bug, too many snapshots cause /boot to become full and updates fail. https://github.com/ubuntu/zsys/issues/155
* solution: use custom /etc/zsys.conf, limit number of snapshots to 10, see trinatdaq:/etc/zsys.conf
* zsys commands:
<pre>
<pre>
root@midm9b:~# mkfs.msdos /dev/sdb1
update-grub # list of all snapshots, errors if some snapshots are broken
root@midm9b:~# mkdir /boot/efi-sda
zsysctl state remove lnc0k7 --system # remove snapshot
root@midm9b:~# mkdir /boot/efi-sdb
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
root@midm9b:~# echo "/dev/sda1 /boot/efi-sda      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1" >> /etc/fstab
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
root@midm9b:~# echo "/dev/sdb1 /boot/efi-sdb      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1" >> /etc/fstab
zsysctl show # show snapshots
root@midm9b:~# mount -a
root@midm9b:~# df -kl
Filesystem                                      1K-blocks    Used Available Use% Mounted on
...
/dev/sda1                                          523244  13720    509524  3% /boot/efi
/dev/sdb1                                          523244      4    523240  1% /boot/efi-sdb
...
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
sending incremental file list
EFI/
...
root@midm9b:~# ls -l /boot/efi-sda
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~#  
</pre>
</pre>
* setup script to update grub on second SSD, it must be run manually after every kernel update
<pre>
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
root@midm9b:~# ~/update_efi_grub.perl -u
EFI dir: /boot/efi-sda
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
building file list ... done


sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
= Ubuntu cloning =
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sda/EFI
building file list ... done


sent 216 bytes  received 11 bytes  454.00 bytes/sec
to clone a ubuntu image:
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done


sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
<pre>
total size is 7,944,644  speedup is 1,492.23
cd /nfsroot/lxcpet
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/ /boot/efi-sdb/EFI
emacs -nw etc/hostname ### change hostname
building file list ... done
emacs -nw etc/mailname ### change hostname (debian 11)
emacs -nw etc/defaultdomain ### change the NIS domainname
emacs -nw etc/yp.conf ### change the NIS server
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
emacs -nw root/.ssh/authorized_keys ### update root ssh keys
</pre>


sent 216 bytes  received 11 bytes  454.00 bytes/sec
= Ubuntu boot loader =
total size is 5,452,378  speedup is 24,019.29
root@midm9b:~#
</pre>


= Disable NetworkManager =
== maintenance commands ==


NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04
* update-initramfs -v -u
* grub-install /dev/sda


NetworkManager is useful for configuring dynamic
= Convert from single to dual mirrored ZFS SSD =
network interfaces, i.e. laptops that often move
between networks, or connect to one of
several wifi networks, etc.


For machines with statically configured network interfaces,
Assuming Ubuntu LTS 22.04 with "install on ZFS" option, we will
NetworkManager is not necessary.
add a second SSD, configure ZFS to use both SSDs in mirrored
configuration and setup grub to boot from either SSD. This
is intended to create a full redundant system where failure
of either SSD does not break the system.


As it has been observed to become confused and observed
== partition ==
to malfunction when network links go up and down (it keeps
unnecessarily reconfiguring the ip address, etc), it can
be useful to disable it.


* list all network interfaces
* identify first SSD
<pre>
<pre>
# /bin/ls -1 /sys/class/net/
root@midm9b:~# ./smart-status.perl
enp0s31f6
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER Errors    Link
lo
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .      6.0
root@midm9b:~#
</pre>
</pre>
* edit /etc/network/interfaces:
* connect second SSD of identical size
<pre>
<pre>
rename enp0s31f6=eth0
root@midm9b:~# ./smart-status.perl
auto eth0
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER  Errors    Link
iface eth0 inet static
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .       ?        .     6.0
  address 142.90.120.94/19
    /dev/sdb  WD Blue SA510 2.5 250GB        22243Z803852              25        .        ?        ?        .       ?        .     6.0
  gateway 142.90.100.18
root@midm9b:~#
</pre>
</pre>
* statically configure systemd-resolved
* if second SSD is not autodetected, reboot
** create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
* Clone partition table automatically
If both SSDs are identical size, use this simpler method of duplicating the partition table:
<pre>
<pre>
[Resolve]
root@midm9b:~# sfdisk -d /dev/sda > part_table
DNS=142.90.100.19
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb
Domains=triumf.ca
</pre>
</pre>
** systemctl restart systemd-resolved
The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
** resolvectl
 
** systemd-analyze cat-config systemd/resolved.conf
* Clone partition table manually (e.g. for different size disks)
* disable NetworkManager
* list partition table of first SSD:
<pre>
<pre>
systemctl disable NetworkManager
root@midm9b:~# fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
 
Device      Start      End  Sectors  Size Type
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#
</pre>
</pre>
* reboot
* create identical partitions on second SSD, use sector numbers from above.
<pre>
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8


= Configure ECC memory =
Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present


== Configure EDAC ==
Creating new GPT entries in memory.


* apt install edac-utils
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'


=== Intel i3-2120 ===
Command (? for help): n
<pre>
Partition number (2-128, default 2):
root@musr00:~# edac-ctl --mainboard
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}:
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
root@musr00:~# edac-ctl --status
Current type is 8300 (Linux filesystem)
edac-ctl: drivers not loaded.
Hex code or GUID (L to show codes, Enter = 8300): 8200
</pre>
Changed type of partition to 'Linux swap'
 
Command (? for help): n
Partition number (3-128, default 3):
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}:
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'


=== Intel E-2236 ===
Command (? for help): n
<pre>
Partition number (4-128, default 4):  
root@daq00:~# edac-ctl --mainboard
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}:  
edac-ctl: mainboard: Supermicro X11SCM-F
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}:  
root@daq00:~# edac-ctl --status
Current type is 8300 (Linux filesystem)
edac-ctl: drivers are loaded.
Hex code or GUID (L to show codes, Enter = 8300): bf00
root@daq00:~# edac-util
Changed type of partition to 'Solaris root'
edac-util: No errors to report.
 
root@daq00:~# edac-util -s
Command (? for help): w
edac-util: EDAC drivers are loaded. 1 MC detected
 
</pre>
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
* check edac sysfs files (Intel)
PARTITIONS!!
<pre>
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~#
</pre>
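The counters in these sysfs files can also be read directly, without edac-util. A minimal sketch, assuming the layout shown in the listing above (<code>ce_count</code> and <code>ue_count</code> under each <code>mcN</code> directory); the function name <code>edac_counts</code> is hypothetical.

```shell
#!/bin/sh
# edac_counts: print corrected (CE) and uncorrected (UE) error counts for
# every memory controller directory found under the given EDAC sysfs root.
# The root defaults to the path used in the listings above.
edac_counts() {
    root=${1:-/sys/devices/system/edac/mc}
    for mc in "$root"/mc[0-9]*; do
        # Skip when no controller directory exists (e.g. non-ECC memory).
        [ -r "$mc/ce_count" ] || continue
        printf '%s CE=%s UE=%s\n' "${mc##*/}" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

Called without arguments it reads the live counters; on a machine with no memory controller detected it prints nothing.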


=== Intel E3-1270 v6 ===
Do you want to proceed? (Y/N): y
<pre>
OK; writing new GUID partition table (GPT) to /dev/sdb.
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
The operation has completed successfully.
edac-ctl: mainboard: Supermicro X11SSH-F
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
edac-ctl: drivers are loaded.
Disk model: WD Blue SA510 2.
root@grsnis01:~# edac-util
Units: sectors of 1 * 512 = 512 bytes
edac-util: No errors to report.
Sector size (logical/physical): 512 bytes / 512 bytes
root@grsnis01:~# edac-util -s
I/O size (minimum/optimal): 512 bytes / 512 bytes
edac-util: EDAC drivers are loaded. 1 MC detected
Disklabel type: gpt
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~#
</pre>


=== Intel E3-1245 v6 ===
Device      Start      End  Sectors  Size Type
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
 
 
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603
 
Device      Start      End  Sectors  Size Type
/dev/sdb1    2048  1050623  1048576  512M EFI System
/dev/sdb2  1050624  5244927  4194304    2G Linux swap
/dev/sdb3  5244928  9439231  4194304    2G Solaris boot
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#
</pre>


== update ZFS pools ==
* identify second SSD partitions
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part2
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part2 -> ../../sdb2
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
</pre>
* convert bpool from single disk to mirrored disk:
<pre>
<pre>
[root@alphagdaq ~]# edac-ctl --mainboard
root@midm9b:~# zpool status
edac-ctl: mainboard: Supermicro X11SSH-F
   pool: bpool
[root@alphagdaq ~]# edac-ctl --mainboard
  state: ONLINE
edac-ctl: mainboard: Supermicro X11SSH-F
config:
[root@alphagdaq ~]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@alphagdaq ~]# edac-util
edac-util: No errors to report.
[root@alphagdaq ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
[root@alphagdaq ~]# ras-mc-ctl --layout
          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0  |  csrow1  |  csrow2  |  csrow3   |
----------+-----------------------------------------------+
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB |
----------+-----------------------------------------------+
[root@alphagdaq ~]# ras-mc-ctl --error-count
Label              CE UE
mc#0csrow#3channel#0 0 0
mc#0csrow#2channel#1 0 0
mc#0csrow#3channel#1 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#2channel#0 0 0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
[root@alphagdaq ~]# ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
[root@alphagdaq ~]#
</pre>


=== AMD 3700X ===
NAME                                    STATE    READ WRITE CKSUM
bpool                                  ONLINE      0    0    0
  99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE      0    0    0


(memory is non-ECC)
errors: No known data errors


<pre>
  pool: rpool
root@daq13:~# edac-ctl --mainboard
state: ONLINE
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
config:
root@daq13:~#
 
root@daq13:~#
NAME                                    STATE    READ WRITE CKSUM
root@daq13:~# edac-ctl --status
rpool                                  ONLINE      0    0    0
edac-ctl: drivers not loaded.
  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE      0     0     0
root@daq13:~# edac-util
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
</pre>


(memory is ECC)
errors: No known data errors
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
root@midm9b:~# zpool status bpool
  pool: bpool
state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:


<pre>
NAME                                                STATE    READ WRITE CKSUM
root@trinatdaq:~# edac-ctl --mainboard
bpool                                              ONLINE      0    0    0
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
  mirror-0                                         ONLINE      0     0     0
root@trinatdaq:~# edac-ctl --status
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0     0     0
edac-ctl: drivers are loaded.
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0     0     0
root@trinatdaq:~# edac-util
 
edac-util: No errors to report.
errors: No known data errors
root@trinatdaq:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
root@trinatdaq:~#
</pre>
</pre>
* convert rpool
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
root@midm9b:~# zpool status rpool
  pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
0B resilvered, 0.03% done, no estimated completion time
config:


=== AMD 5000G ===
NAME                                                STATE    READ WRITE CKSUM
 
rpool                                              ONLINE      0    0    0
* no linux driver for AMD 5000-series "G" CPU
  mirror-0                                          ONLINE      0    0    0
* no mention of ECC in the BIOS settings
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
* unclear status of ECC support in AMD documentation (says only "pro" "G" CPUs have ECC)
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0
* unclear status of ECC support in ASUS documentation (web page out of date)
 
=== AMD 5600X ===


errors: No known data errors
root@midm9b:~#
</pre>
* wait for resilver to complete
<pre>
<pre>
root@daq17:~# edac-ctl --mainboard
root@midm9b:~# zpool status
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
  pool: bpool
root@daq17:~# edac-ctl --status
state: ONLINE
edac-ctl: drivers are loaded.
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
root@daq17:~# edac-util
config:
edac-util: No errors to report.
 
root@daq17:~# edac-util -s
NAME                                                STATE    READ WRITE CKSUM
edac-util: EDAC drivers are loaded. 1 MC detected
bpool                                              ONLINE      0    0    0
root@daq17:~# ls -l /sys/devices/system/edac/mc
  mirror-0                                         ONLINE      0    0     0
total 0
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0     0     0
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0     0     0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
root@daq17:~#
</pre>


=== AMD 3955WX ===
errors: No known data errors


<pre>
  pool: rpool
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
state: ONLINE
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
config:
edac-ctl: drivers are loaded.
 
root@alphasuperdaq:~/git/scripts/quotareport# edac-util
NAME                                                STATE    READ WRITE CKSUM
edac-util: No errors to report.
rpool                                              ONLINE      0    0    0
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
  mirror-0                                          ONLINE      0    0    0
edac-util: EDAC drivers are loaded. 1 MC detected
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0
total 0
 
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
errors: No known data errors
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
</pre>
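Waiting for the resilver can be scripted by polling <code>zpool status</code> for the "resilver in progress" text shown in the rpool output. A minimal sketch; the function names and the 10 second polling interval are arbitrary choices, not part of the procedure.

```shell
#!/bin/sh
# resilver_in_progress: succeed (exit 0) if the "zpool status" text on stdin
# reports a running resilver.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# wait_for_resilver: poll "zpool status" until no pool reports a resilver.
# The 10 second interval is an arbitrary choice.
wait_for_resilver() {
    while zpool status | resilver_in_progress; do
        sleep 10
    done
}
```

Run <code>wait_for_resilver</code> as root after the <code>zpool attach</code> commands, then check <code>zpool status</code> once more for "errors: No known data errors".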
lrwxrwxrwx  1 root root    0 Dez  9 05:31 subsystem -> ../../../../bus/edac
 
-rw-r--r-- 1 root root 4096 Dez  9 05:31 uevent
== update boot loader ==
root@alphasuperdaq:~/git/scripts/quotareport#
 
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
INSTALL SYSLINUX: https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#EFI_boot_using_syslinux
total 0
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
DO *NOT* DO THE FOLLOWING:
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
* create and mount EFI partitions:
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
<pre>
drwxr-xr-x 2 root root   0 Dez 12 04:48 power
root@midm9b:~# mkfs.msdos /dev/sdb1
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank0
root@midm9b:~# mkdir /boot/efi-sda
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank1
root@midm9b:~# mkdir /boot/efi-sdb
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
root@midm20c:~# blkid | grep vfat ### identify UUID
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank12
/dev/sdc1: UUID="D970-86BA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="e6d3b5b9-a512-44a2-9205-1a4db06ed2a2"
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank13
/dev/sda1: UUID="DDA1-044C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6dc9dff0-1c13-8045-a906-7803d3074c70"
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
root@midm20c:~# cat /etc/fstab | grep vfat ### add mount points with correct UUID
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
#UUID=D970-86BA  /boot/efi      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
UUID=DDA1-044C  /boot/efi-sda      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank3
UUID=DD89-5081  /boot/efi-sdb      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank4
root@midm9b:~# mount -a
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
root@midm9b:~# df -kl
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
Filesystem                                      1K-blocks   Used Available Use% Mounted on
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank7
...
drwxr-xr-x 3 root root   0 Dez 12 04:48 rank8
/dev/sda1                                          523244  13720    509524  3% /boot/efi
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
/dev/sdb1                                          523244      4   523240  1% /boot/efi-sdb
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
...
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
sending incremental file list
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
EFI/
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
...
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
root@midm9b:~# ls -l /boot/efi-sda
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
total 8
root@alphasuperdaq:~#  
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
root@alphasuperdaq:~# ras-mc-ctl --layout
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
root@midm9b:~# ls -l /boot/efi-sdb
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
total 8
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
root@midm9b:~#
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
</pre>
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
* add the systemd "nofail" flag to /etc/fstab; without it, systemd will stop booting if one SSD is missing
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
<pre>
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
daq00:~$ cat /etc/fstab | grep vfat
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
#UUID=31A7-24BE  /boot/efi      vfat    umask=0022,fmask=0022,dmask=0022      0      1
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
/dev/sda1 /boot/efi-sda      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
/dev/sdb1 /boot/efi-sdb      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
</pre>
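The "nofail" edit can be previewed with a small filter instead of editing by hand. A minimal sketch; the function name <code>add_nofail</code> is hypothetical, and note that awk rewrites a modified line with single spaces between fields.

```shell
#!/bin/sh
# add_nofail: read fstab text on stdin and append ",nofail" to the options
# field of vfat entries that do not already have it. Comment lines and
# entries for other filesystem types are printed unchanged.
add_nofail() {
    awk '
        /^[ \t]*#/ { print; next }
        $3 == "vfat" && $4 !~ /(^|,)nofail(,|$)/ { $4 = $4 ",nofail" }
        { print }
    '
}
```

For example, <code>add_nofail < /etc/fstab > /tmp/fstab.new</code> writes a candidate file that can be compared with <code>diff</code> before anything is copied over /etc/fstab.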
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
* set up the script that updates grub on the second SSD; it must be run manually after every kernel update
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
<pre>
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
root@midm9b:~# ~/update_efi_grub.perl -u
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
EFI dir: /boot/efi-sda
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
building file list ... done
    |                                                                                              mc0                                                                                              |
 
    |                                            csrow0                                            |                                            csrow1                                            |
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
    | channel0 | channel1 | channel2 | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  |
total size is 7,944,644  speedup is 1,492.23
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/ /boot/efi-sda/EFI
building file list ... done
 
sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done
 
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sdb/EFI
building file list ... done
 
sent 216 bytes received 11 bytes 454.00 bytes/sec
total size is 5,452,378 speedup is 24,019.29
root@midm9b:~#
</pre>


0: |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |
= Disable NetworkManager =
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@alphasuperdaq:~# ras-mc-ctl --error-count
Label              CE UE
mc#0csrow#0channel#2 0 0
mc#0csrow#1channel#7 0 0
mc#0csrow#0channel#3 0 0
mc#0csrow#1channel#4 0 0
mc#0csrow#1channel#2 0 0
mc#0csrow#0channel#7 0 0
mc#0csrow#1channel#3 0 0
mc#0csrow#0channel#4 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#1channel#5 0 0
mc#0csrow#0channel#6 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#0channel#5 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#6 0 0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.


No PCIe AER errors.
== Debian-12 ==


No Extlog errors.
* use netplan, https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_modern_network_configuration_for_cloud
 
* apt install netplan.io
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
* systemctl enable systemd-networkd
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
* systemctl restart systemd-networkd
root@alphasuperdaq:~#
* create /etc/netplan/50-dhcp.yaml
<pre>
network:
  version: 2
  ethernets:
    all-en:
      match:
        name: "en*"
      dhcp4: true
      dhcp6: true
      ignore-carrier: true ### do not drop IP address if network link drops
</pre>
</pre>
* netplan apply
* netplan try
* ifconfig -a ### to check IP address settings
* netstat -rn ### to check default route
* cat /etc/resolv.conf ### to check DNS
* ls -l /run/systemd/netif/leases ### systemd-networkd dhcp leases
* NOTE: without "ignore-carrier" the IP address is dropped when the network link drops, and DHCP is redone when the link comes back
* NOTE: wait-network-online will wait for all interfaces to get an IP address


== Configure rasdaemon ==
== Ubuntu-20 ==


<pre>
NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04
apt install rasdaemon
</pre>
<pre>
systemctl enable rasdaemon
systemctl restart rasdaemon
systemctl status rasdaemon
</pre>


<pre>
NetworkManager is useful for configuring dynamic
● rasdaemon.service - RAS daemon to log the RAS events
network interfaces, i.e. laptops that often move
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
between networks, or connect to multiple choice
    Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
of wifi networks, etc.
  Main PID: 2477175 (rasdaemon)
      Tasks: 1 (limit: 76958)
    Memory: 17.1M
    CGroup: /system.slice/rasdaemon.service
            └─2477175 /usr/sbin/rasdaemon -f -r


Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
For machines with statically configured network interfaces,
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
NetworkManager is not necessary.
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
</pre>


== Get reports ==
NetworkManager has been observed to become confused and
to malfunction when network links go up and down (it keeps
unnecessarily reconfiguring the IP address, etc.), so it can
be useful to disable it.


* Intel 2x32GB ECC DIMMs
* list all network interfaces
<pre>
<pre>
root@daq00:~# ras-mc-ctl --layout
# /bin/ls -1 /sys/class/net/
          +-------------------------+
enp0s31f6
          |          mc0          |
lo
          |  csrow0  |  csrow1  |
</pre>
----------+-------------------------+
* edit /etc/network/interfaces:
channel1: |  16384 MB  |  16384 MB  |
<pre>
channel0: |  16384 MB  |  16384 MB  |
rename enp0s31f6=eth0
----------+-------------------------+
auto eth0
root@daq00:~# ras-mc-ctl --error-count
iface eth0 inet static
Label                  CE      UE
   address 142.90.120.94/19
mc#0csrow#1channel#1   0      0
   gateway 142.90.100.18
mc#0csrow#1channel#0   0      0
</pre>
mc#0csrow#0channel#0    0      0
* statically configure systemd-resolved
mc#0csrow#0channel#1    0      0
** create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
root@daq00:~#
<pre>
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
</pre>
** systemctl restart systemd-resolved
** resolvectl
** systemd-analyze cat-config systemd/resolved.conf
* disable NetworkManager
<pre>
systemctl disable NetworkManager
</pre>
</pre>
* reboot


* Intel 4x16GB ECC DIMMs
== U-22, U-24 ifup-ko ==
 
Network configuration of modern Linux is confused. There are at least 4 configuration methods, each with different shortcomings:
* the old ifup method is barely documented
* NetworkManager is well documented and tooled, but sometimes does strange things
* systemd-networkd is mysterious, and likely to do strange stuff, like all systemd stuff
* netplan is the latest method, configuration is simple but uses NetworkManager or systemd-networkd as backend.
 
This is a solution for the specific situation of a fixed computer with one fixed wired interface and possibly one or more additional interfaces for fixed wired private networks.
 
Install /etc/ifup-ko, edit it to set the IP addresses of the main and additional interfaces, and let systemd run it at the right place in the boot sequence, replacing NetworkManager and NetworkManager-wait-online.
 
As a bonus, /etc/ifup-ko waits up to 10 seconds for the main interface link to come up. If this is not needed, comment it out in the script.
 
* prepare
<pre>
cd ~/git/scripts
git pull
cd ifup-ko
make install
</pre>
* confirm interface names
<pre>
systemctl start ifup-ko ### should finish immediately or after 10 seconds
systemctl status ifup-ko -n 1000 ### observe list of interfaces is correct, name of main interface is correct
</pre>
* edit /etc/ifup-ko
** add host IP address to the "ifconfig" line
** add gateway IP address to the "ip route add" line
* test
<pre>
systemctl start ifup-ko ### should finish immediately
systemctl status ifup-ko -n 1000 ### observe everything is configured as expected
</pre>
* cut-over
<pre>
systemctl disable networkd-dispatcher
systemctl disable NetworkManager
systemctl disable wpa_supplicant # if no Wifi or Wifi not in use
</pre>
* reboot
 
<pre>
root@daq00:~# ras-mc-ctl --error-count
Label                  CE      UE
mc#0csrow#0channel#1    0      0
mc#0csrow#2channel#0    0      0
mc#0csrow#0channel#0    0      0
mc#0csrow#2channel#1    0      0
mc#0csrow#1channel#0    0      0
mc#0csrow#1channel#1    0      0
mc#0csrow#3channel#0    0      0
mc#0csrow#3channel#1    0      0
root@daq00:~#
root@daq00:~# ras-mc-ctl --layout
          +-----------------------+
          |          mc0          |
          |  csrow0  |  csrow1  |
----------+-----------------------+
channel1: |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |
----------+-----------------------+
root@daq00:~#
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
</pre>
root@daq00:~# ras-mc-ctl --summary
= Disable systemd-networkd =
No Memory errors.
 
On netbooted machines, systemd-networkd should be disabled: when "apt upgrade" runs
and needs to update and configure systemd, it will stop systemd-networkd, which
will stop the network, which will stop the NFS-mounted root filesystem, which will stop
the machine.
 
<pre>
systemctl disable systemd-networkd.service
systemctl disable systemd-networkd.socket
systemctl mask systemd-networkd.service
systemctl mask systemd-networkd.socket
</pre>
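
To confirm that the units really are masked, a small check along these lines may help. It is only a sketch: the function name networkd_state and the SYSTEMCTL override variable are assumptions introduced here, while <code>systemctl is-enabled</code> is the standard systemd query.

```shell
#!/bin/sh
# Hypothetical helper: print the enablement state of both systemd-networkd units.
# SYSTEMCTL may be set to a different command for dry runs (assumption of this sketch).
networkd_state() {
    for ns_unit in systemd-networkd.service systemd-networkd.socket; do
        # "systemctl is-enabled" prints the state (for example "masked") and
        # returns non-zero for disabled or masked units, hence the "|| true".
        ns_state=$("${SYSTEMCTL:-systemctl}" is-enabled "$ns_unit" 2>/dev/null || true)
        echo "$ns_unit: ${ns_state:-unknown}"
    done
}
```

Running <code>networkd_state</code> after the commands above should report "masked" for both units.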
 
= Configure ECC memory =


No PCIe AER errors.
== Configure EDAC ==


No Extlog errors.
* apt install edac-utils rasdaemon


DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
=== Intel i3-2120 ===
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
<pre>
root@daq00:~#  
root@musr00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
root@musr00:~# edac-ctl --status
edac-ctl: drivers not loaded.
</pre>
</pre>


Note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.
=== Intel E-2236 ===
 
= sensors =
 
== ASUS P9X79 WS ==
 
* https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
* BIOS version 4802
* modprobe nct6775
* modprobe coretemp
 
<pre>
<pre>
root@daq14:~# sensors
root@daq00:~# edac-ctl --mainboard
coretemp-isa-0000
edac-ctl: mainboard: Supermicro X11SCM-F
Adapter: ISA adapter
root@daq00:~# edac-ctl --status
Package id 0: +35.0°C  (high = +82.0°C, crit = +100.0°C)
edac-ctl: drivers are loaded.
Core 0:       +29.0°C  (high = +82.0°C, crit = +100.0°C)
root@daq00:~# edac-util
Core 1:       +24.0°C  (high = +82.0°C, crit = +100.0°C)
edac-util: No errors to report.
Core 2:       +35.0°C  (high = +82.0°C, crit = +100.0°C)
root@daq00:~# edac-util -s
Core 3:       +32.0°C  (high = +82.0°C, crit = +100.0°C)
edac-util: EDAC drivers are loaded. 1 MC detected
</pre>
* check edac sysfs files (Intel)
<pre>
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~#
</pre>


nouveau-pci-0200
=== Intel E3-1270 v6 ===
Adapter: PCI adapter
<pre>
GPU core:   900.00 mV (min =  +0.85 V, max =  +1.00 V)
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
temp1:       +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
edac-ctl: mainboard: Supermicro X11SSH-F
                      (crit = +105.0°C, hyst =  +5.0°C)
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
                      (emerg = +135.0°C, hyst =  +5.0°C)
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~#
</pre>


nct6776-isa-0290
=== Intel E3-1245 v6 ===
Adapter: ISA adapter
Vcore:          1.04 V  (min = +0.00 V, max = +1.74 V)
in1:            1.01 V  (min = +0.00 V, max =  +0.00 V)  ALARM
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:          3.33 V  (min = +0.00 V, max =  +0.00 V)  ALARM
in4:            1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:            2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:          904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:            3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:          1265 RPM  (min =    0 RPM)
fan2:          1909 RPM  (min =    0 RPM)
fan3:            0 RPM  (min =    0 RPM)
fan4:            0 RPM  (min =    0 RPM)
fan5:            0 RPM  (min =    0 RPM)
SYSTIN:        +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:        +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:        +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PECI Agent 0:  +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                        (crit = +96.0°C)
PCH_CHIP_TEMP:  +0.0°C 
PCH_CPU_TEMP:    +0.0°C 
PCH_MCH_TEMP:    +0.0°C 
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:  disabled
 
root@daq14:~#
</pre>
 
== ASUS TUF GAMING B550M-PLUS WIFI II ==
 
* BIOS 2803, 2806
* echo modprobe nct6775 >> /etc/rc.local


<pre>
<pre>
root@midm9a:~# sensors
[root@alphagdaq ~]# edac-ctl --mainboard
nct6798-isa-0290
edac-ctl: mainboard: Supermicro X11SSH-F
Adapter: ISA adapter
[root@alphagdaq ~]# edac-ctl --mainboard
in0:                     488.00 mV (min =  +0.00 V, max =  +1.74 V)
edac-ctl: mainboard: Supermicro X11SSH-F
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
[root@alphagdaq ~]# edac-ctl --status
in2:                        3.41 V (min = +0.00 V, max = +0.00 V) ALARM
edac-ctl: drivers are loaded.
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
[root@alphagdaq ~]# edac-util
in4:                       1.01 V (min = +0.00 V, max = +0.00 V) ALARM
edac-util: No errors to report.
in5:                        1.01 V (min = +0.00 V, max = +0.00 V) ALARM
[root@alphagdaq ~]# edac-util -s
in6:                     208.00 mV (min = +0.00 V, max = +0.00 V) ALARM
edac-util: EDAC drivers are loaded. 1 MC detected
in7:                        3.41 V (min = +0.00 V, max = +0.00 V) ALARM
[root@alphagdaq ~]# ras-mc-ctl --layout
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
          +-----------------------------------------------+
in9:                        1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
          |                      mc0                      |
in10:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
          | csrow0  | csrow1  | csrow2  | csrow3  |
in11:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
----------+-----------------------------------------------+
in12:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
channel1: | 8192 MB | 8192 MB | 8192 MB | 8192 MB |
in13:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
channel0: |  8192 MB | 8192 MB | 8192 MB | 8192 MB |
in14:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
----------+-----------------------------------------------+
fan1:                        0 RPM  (min =    0 RPM)
[root@alphagdaq ~]# ras-mc-ctl --error-count
fan2:                      760 RPM  (min =    0 RPM)
Label              CE UE
fan3:                        0 RPM  (min =    0 RPM)
mc#0csrow#3channel#0 0 0
fan7:                    1264 RPM  (min =    0 RPM)
mc#0csrow#2channel#1 0 0
SYSTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
mc#0csrow#3channel#1 0 0
CPUTIN:                   +22.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
mc#0csrow#0channel#0 0 0
AUXTIN0:                   +95.0°C    sensor = thermistor
mc#0csrow#1channel#1 0 0
AUXTIN1:                  +25.0°C    sensor = thermistor
mc#0csrow#0channel#1 0 0
AUXTIN2:                   +25.0°C    sensor = thermistor
mc#0csrow#1channel#0 0 0
AUXTIN3:                   +25.0°C    sensor = thermistor
mc#0csrow#2channel#0 0 0
PECI Agent 0 Calibration: +23.5°C 
[root@alphagdaq ~]# ras-mc-ctl --mainboard
PCH_CHIP_CPU_MAX_TEMP:     +0.0°C 
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
PCH_CHIP_TEMP:             +0.0°C 
[root@alphagdaq ~]# ras-mc-ctl --summary
PCH_CPU_TEMP:               +0.0°C 
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
TSI0_TEMP:                +32.4°C 
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
intrusion0:              ALARM
[root@alphagdaq ~]#
intrusion1:              ALARM
</pre>
beep_enable:              disabled


amdgpu-pci-0800
=== AMD 3700X ===
Adapter: PCI adapter
vddgfx:        1.45 V 
vddnb:      993.00 mV
edge:        +28.0°C 
PPT:          20.00 W 


k10temp-pci-00c3
(memory is non-ECC)
Adapter: PCI adapter
Tctl:        +33.4°C 


root@midm9a:~#  
<pre>
root@daq13:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@daq13:~#
root@daq13:~#
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
</pre>
</pre>


== ASUS ROG STRIX B550-XE GAMING WIFI ==
(memory is ECC)
 
* BIOS 2423, 2604
* echo modprobe nct6775 >> /etc/rc.local


<pre>
<pre>
root@daq13:~# sensors
root@trinatdaq:~# edac-ctl --mainboard
nct6798-isa-0290
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
Adapter: ISA adapter
root@trinatdaq:~# edac-ctl --status
in0:                     344.00 mV (min =  +0.00 V, max =  +1.74 V)
edac-ctl: drivers are loaded.
in1:                     992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@trinatdaq:~# edac-util
in2:                       3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
edac-util: No errors to report.
in3:                       3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
root@trinatdaq:~# edac-util -s
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
edac-util: EDAC drivers are loaded. 1 MC detected
in5:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
in6:                     216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
total 0
in7:                       3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
in8:                       3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
in9:                        1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
in10:                    960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
in11:                    960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
total 0
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
fan1:                     845 RPM  (min =    0 RPM)
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
fan2:                     998 RPM  (min =    0 RPM)
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
fan3:                       0 RPM  (min =   0 RPM)
drwxr-xr-x 2 root root   0 Dec 15 13:04 power
fan4:                        0 RPM  (min =   0 RPM)
drwxr-xr-x 3 root root   0 Dec 15 13:04 rank4
fan5:                        0 RPM  (min =   0 RPM)
drwxr-xr-x 3 root root   0 Dec 15 13:04 rank5
SYSTIN:                   +28.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
CPUTIN:                   +27.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
AUXTIN0:                   +94.0°C    sensor = thermistor
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
AUXTIN1:                   +28.0°C    sensor = thermistor
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
AUXTIN2:                   +28.0°C    sensor = thermistor
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
AUXTIN3:                   +97.0°C    sensor = thermistor
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
PECI Agent 0 Calibration: +27.5°C 
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
PCH_CHIP_CPU_MAX_TEMP:     +0.0°C 
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
PCH_CHIP_TEMP:             +0.0°C 
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
PCH_CPU_TEMP:               +0.0°C 
root@trinatdaq:~#
TSI0_TEMP:                 +33.6°C 
</pre>
intrusion0:              ALARM
 
intrusion1:              ALARM
=== AMD 5000G ===
beep_enable:              disabled


amdgpu-pci-0600
* no linux driver for AMD 5000-series "G" CPU
Adapter: PCI adapter
* no mention of ECC in the BIOS settings
vddgfx:        1.45 V 
* unclear status of ECC support in AMD documentation (says only "pro" "G" CPUs have ECC)
vddnb:      999.00 mV
* unclear status of ECC support in ASUS documentation (web page out of date)
edge:        +29.0°C 
PPT:          14.00 W 


iwlwifi_1-virtual-0
=== AMD 5600X ===
Adapter: Virtual device
temp1:        +30.0°C 


k10temp-pci-00c3
<pre>
Adapter: PCI adapter
root@daq17:~# edac-ctl --mainboard
Tctl:         +33.9°C 
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
 
root@daq17:~# edac-ctl --status
root@daq13:~#  
edac-ctl: drivers are loaded.
root@daq17:~# edac-util
edac-util: No errors to report.
root@daq17:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@daq17:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
root@daq17:~#  
</pre>
</pre>


== ASUS ROG STRIX B550-E GAMING ==
=== AMD 3955WX ===
 
* BIOS 2803
* echo modprobe jc42 >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local


<pre>
<pre>
root@daq17:~# sensors
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
jc42-i2c-1-1b
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
Adapter: SMBus PIIX4 adapter port 0 at 0b00
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
temp1:       +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
edac-ctl: drivers are loaded.
                      (high =  +0.0°C, hyst =  +0.0°C)
root@alphasuperdaq:~/git/scripts/quotareport# edac-util
                      (crit =  +0.0°C, hyst =  +0.0°C)
edac-util: No errors to report.
 
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
iwlwifi_1-virtual-0
edac-util: EDAC drivers are loaded. 1 MC detected
Adapter: Virtual device
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
temp1:       +28.0°C 
total 0
 
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
nouveau-pci-0800
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
Adapter: PCI adapter
lrwxrwxrwx  1 root root   0 Dez 9 05:31 subsystem -> ../../../../bus/edac
GPU core:   900.00 mV (min =  +0.85 V, max = +1.00 V)
-rw-r--r-- 1 root root 4096 Dez  9 05:31 uevent
temp1:        +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
root@alphasuperdaq:~/git/scripts/quotareport#
                      (crit = +105.0°C, hyst =  +5.0°C)
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
                      (emerg = +135.0°C, hyst =  +5.0°C)
total 0
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
nct6798-isa-0290
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
Adapter: ISA adapter
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
in0:                     288.00 mV (min =  +0.00 V, max =  +1.74 V)
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
in1:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
in2:                       3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
in3:                       3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
in5:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
in6:                     224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
in7:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
in8:                        3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
in9:                       1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
in10:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
in11:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
fan1:                      843 RPM  (min =    0 RPM)
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
fan2:                      629 RPM  (min =   0 RPM)
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
fan3:                      746 RPM  (min =    0 RPM)
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
fan4:                        0 RPM  (min =    0 RPM)
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
fan5:                        0 RPM  (min =    0 RPM)
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
SYSTIN:                    +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
CPUTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
AUXTIN0:                  +93.0°C    sensor = thermistor
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
AUXTIN1:                  +22.0°C    sensor = thermistor
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
AUXTIN2:                  +22.0°C    sensor = thermistor
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
AUXTIN3:                  +96.0°C    sensor = thermistor
root@alphasuperdaq:~#
PECI Agent 0 Calibration:  +25.5°C 
root@alphasuperdaq:~# ras-mc-ctl --layout
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
PCH_CHIP_TEMP:              +0.0°C 
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
PCH_CPU_TEMP:              +0.0°C 
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
TSI0_TEMP:                +27.6°C 
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
intrusion0:              ALARM
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
intrusion1:              ALARM
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
beep_enable:              disabled
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                              mc0                                                                                              |
    |                                            csrow0                                            |                                            csrow1                                            |
    | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


jc42-i2c-1-1a
0: |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB  |
Adapter: SMBus PIIX4 adapter port 0 at 0b00
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
temp1:        +23.2°C (low = +0.0°C)                  ALARM (HIGH, CRIT)
root@alphasuperdaq:~# ras-mc-ctl --error-count
                      (high = +0.0°C, hyst = +0.0°C)
Label              CE UE
                      (crit = +0.0°C, hyst = +0.0°C)
mc#0csrow#0channel#2 0 0
 
mc#0csrow#1channel#7 0 0
asusec-isa-0000
mc#0csrow#0channel#3 0 0
Adapter: ISA adapter
mc#0csrow#1channel#4 0 0
CPU_Opt:        0 RPM
mc#0csrow#1channel#2 0 0
Chipset:     +34.0°C 
mc#0csrow#0channel#7 0 0
CPU:         +25.0°C 
mc#0csrow#1channel#3 0 0
Motherboard: +22.0°C 
mc#0csrow#0channel#4 0 0
T_Sensor:     -40.0°C 
mc#0csrow#1channel#1 0 0
VRM:          +31.0°C 
mc#0csrow#1channel#0 0 0
mc#0csrow#1channel#5 0 0
mc#0csrow#0channel#6 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#0channel#5 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#6 0 0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.
 
No PCIe AER errors.


k10temp-pci-00c3
No Extlog errors.
Adapter: PCI adapter
Tctl:        +28.0°C 
Tccd1:        +27.5°C 


root@daq17:~#  
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@alphasuperdaq:~#
</pre>
</pre>


== ASUS PRIME B650-PLUS ==
=== AMD 7700X ===
 
* BIOS 1811
* echo modprobe nct6775 >> /etc/rc.local


<pre>
<pre>
root@dsdaqgw:~# sensors
root@dsfe05:~# apt install edac-utils
amdgpu-pci-0b00
root@dsfe05:~# edac-ctl --mainboard
Adapter: PCI adapter
edac-ctl: mainboard: Supermicro H13SAE-MF
vddgfx:     930.00 mV
root@dsfe05:~# edac-ctl --status
vddnb:         1.19 V 
edac-ctl: drivers are loaded.
edge:         +38.0°C 
root@dsfe05:~# edac-util
PPT:         25.10 W 
edac-util: No errors to report.
root@dsfe05:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@dsfe05:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 May 14 09:33 ce_count
-r--r--r-- 1 root root 4096 May 14 09:33 ce_noinfo_count
-r--r--r-- 1 root root 4096 May 14 09:33 max_location
-r--r--r-- 1 root root 4096 May 14 09:33 mc_name
drwxr-xr-x 2 root root    0 May 14 09:33 power
drwxr-xr-x 3 root root    0 May 14 09:33 rank4
drwxr-xr-x 3 root root    0 May 14 09:33 rank5
--w------- 1 root root 4096 May 14 09:33 reset_counters
-r--r--r-- 1 root root 4096 May 14 09:33 seconds_since_reset
-r--r--r-- 1 root root 4096 May 14 09:33 size_mb
-r--r--r-- 1 root root 4096 May 14 09:33 ue_count
-r--r--r-- 1 root root 4096 May 14 09:33 ue_noinfo_count
-rw-r--r-- 1 root root 4096 May 14 09:33 uevent
root@dsfe05:~#
</pre>
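
The ce_count and ue_count files shown in the sysfs listings above can also be read directly. The following is a minimal sketch, assuming the standard EDAC sysfs layout; the function name edac_counts and its optional root-directory argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print corrected (CE) and uncorrected (UE) error
# counts for every memory controller found under the EDAC sysfs tree.
# Usage: edac_counts [EDAC_ROOT]
edac_counts() {
    ec_root="${1:-/sys/devices/system/edac/mc}"
    for ec_mc in "$ec_root"/mc[0-9]*; do
        # Skip the unexpanded glob when no memory controller is present
        # (for example on machines with non-ECC memory).
        [ -d "$ec_mc" ] || continue
        ec_ce=$(cat "$ec_mc/ce_count")
        ec_ue=$(cat "$ec_mc/ue_count")
        echo "${ec_mc##*/}: CE=$ec_ce UE=$ec_ue"
    done
}
```

Called without arguments, <code>edac_counts</code> prints one line per memory controller, and nothing at all when no mc directory exists.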
 
== Configure rasdaemon ==
 
<pre>
apt install rasdaemon
</pre>
<pre>
systemctl enable rasdaemon
systemctl restart rasdaemon
systemctl status rasdaemon
</pre>


nct6799-isa-0290
<pre>
Adapter: ISA adapter
● rasdaemon.service - RAS daemon to log the RAS events
in0:                      920.00 mV (min =  +0.00 V, max =  +1.74 V)
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
in1:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
in2:                       3.39 V  (min =  +0.00 V, max =  +0.00 V) ALARM
  Main PID: 2477175 (rasdaemon)
in3:                       3.38 V  (min =  +0.00 V, max =  +0.00 V) ALARM
      Tasks: 1 (limit: 76958)
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
    Memory: 17.1M
in5:                       1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
     CGroup: /system.slice/rasdaemon.service
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
             └─2477175 /usr/sbin/rasdaemon -f -r
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                       3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V) ALARM
in10:                       1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.10 V  (min =  +0.00 V, max =  +0.00 V) ALARM
in12:                       1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                    1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +78.0°C    sensor = thermistor
AUXTIN1:                  +11.0°C    sensor = thermistor
AUXTIN2:                  +20.0°C    sensor = thermistor
AUXTIN3:                  +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C 
PCH_CHIP_CPU_MAX_TEMP:     +0.0°C 
PCH_CHIP_TEMP:             +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +42.6°C 
intrusion0:              ALARM
intrusion1:              OK
beep_enable:             disabled


k10temp-pci-00c3
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Adapter: PCI adapter
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Tctl:         +42.6°C 
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Tccd1:       +36.4°C 
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
 
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
root@dsdaqgw:~#
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
</pre>
</pre>


= Enable CPU turbo mode =
== Get reports ==


* An Intel CPU has a nominal CPU frequency (e.g. 3.4GHz) and a turbo-boost CPU frequency (e.g. 4.0GHz). Here we will enable this turbo-boost mode.
* Intel 2x32GB ECC DIMMs
* Find out CPU capability
<pre>
<pre>
root@daq01:~# lscpu | grep Hz
root@daq00:~# ras-mc-ctl --layout
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
          +-------------------------+
CPU MHz:                         3965.803
          |          mc0          |
CPU max MHz:                     4000.0000
          |  csrow0  |  csrow1  |
CPU min MHz:                     800.0000
----------+-------------------------+
root@daq01:~#  
channel1: |  16384 MB  |  16384 MB  |
channel0: |  16384 MB  |  16384 MB  |
----------+-------------------------+
root@daq00:~# ras-mc-ctl --error-count
Label                  CE      UE
mc#0csrow#1channel#1    0      0
mc#0csrow#1channel#0    0      0
mc#0csrow#0channel#0    0      0
mc#0csrow#0channel#1    0      0
root@daq00:~#  
</pre>
</pre>
* Look up this CPU in the Intel ARK database (search for the CPU model name), e.g.
 
https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html
* Intel 4x16GB ECC DIMMs
* Find current frequency settings:
<pre>
<pre>
root@daq01:~# cpupower frequency-info
root@daq00:~# ras-mc-ctl --error-count
analyzing CPU 0:
Label                  CE      UE
   driver: intel_pstate
mc#0csrow#0channel#1    0      0
  CPUs which run at the same hardware frequency: 0
mc#0csrow#2channel#0    0      0
  CPUs which need to have their frequency coordinated by software: 0
mc#0csrow#0channel#0    0      0
  maximum transition latency: Cannot determine or is not supported.
mc#0csrow#2channel#1    0      0
  hardware limits: 800 MHz - 4.00 GHz
mc#0csrow#1channel#0    0      0
  available cpufreq governors: performance powersave
mc#0csrow#1channel#1    0      0
  current policy: frequency should be within 800 MHz and 4.00 GHz.
mc#0csrow#3channel#0   0      0
                  The governor "powersave" may decide which speed to use
mc#0csrow#3channel#1    0      0
                  within this range.
root@daq00:~#
  current CPU frequency: Unable to call hardware
root@daq00:~# ras-mc-ctl --layout
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
          +-----------------------+
  boost state support:
          |          mc0          |
    Supported: yes
          |  csrow0   |  csrow1  |
    Active: yes
----------+-----------------------+
root@daq01:~#  
channel1: |  8192 MB  |  8192 MB  |
</pre>
channel0: |  8192 MB  |  8192 MB  |
* Note the following:
----------+-----------------------+
** current governor is "powersave"
root@daq00:~#
** "performance" governor is available
root@daq00:~#
** "boost state support" is supported and active.
root@daq00:~#
* Confirm CPU frequency governor:
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --summary
No Memory errors.
 
No PCIe AER errors.
 
No Extlog errors.
 
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@daq00:~#  
</pre>
 
note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.
 
* AMD 7700 2x32GB DDR5 ECC DIMMs
 
<pre>
<pre>
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
root@dsfe05:~# systemctl status rasdaemon
powersave
● rasdaemon.service - RAS daemon to log the RAS events
powersave
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
powersave
    Active: active (running) since Tue 2024-05-14 09:36:43 PDT; 33ms ago
powersave
    Process: 4088418 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
powersave
  Main PID: 4088417 (rasdaemon)
powersave
      Tasks: 1 (limit: 37300)
powersave
    Memory: 788.0K
powersave
        CPU: 5ms
root@daq01:~#  
    CGroup: /system.slice/rasdaemon.service
</pre>  
            └─4088417 /usr/sbin/rasdaemon -f -r
* Change governor to "performance":
 
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:aer_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:aer_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: mce:mce_record event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event mce:mce_record
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:extlog_mem_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:extlog_mem_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mc_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording aer_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording extlog_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mce_record events
root@dsfe05:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 907.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 908.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 911.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
    +-----------------------------------------------------------------------------------------------+
    |                                              mc0                                              |
    |        csrow0        |        csrow1        |        csrow2        |        csrow3        |
    | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  |
----+-----------------------------------------------------------------------------------------------+
 
0: |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |
----+-----------------------------------------------------------------------------------------------+
root@dsfe05:~# ras-mc-ctl --error-count
Label              CE UE
mc#0csrow#2channel#1 0 0
mc#0csrow#2channel#0 0 0
root@dsfe05:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --summary
No Memory errors.
 
No PCIe AER errors.
 
No Extlog errors.
 
No MCE errors.
root@dsfe05:~#  
</pre>
 
= sensors =
 
== VME CPU V7865 ==
 
add to /etc/rc.local
 
<pre>
modprobe coretemp
modprobe lm75
modprobe lm90
modprobe max1668
</pre>
 
available sensors:
 
<pre>
<pre>
root@daq01:~# cpupower frequency-set --governor performance
root@lxdaq23:~# sensors
Setting cpu: 0
max6657-i2c-0-4c
Setting cpu: 1
Adapter: SMBus I801 adapter at 0400
Setting cpu: 2
temp1:        +29.1°C  (low  = -55.0°C, high = +105.0°C)
Setting cpu: 3
                      (crit = +105.0°C, hyst = +95.0°C)
Setting cpu: 4
temp2:        +31.5°C  (low  = -55.0°C, high = +105.0°C)
Setting cpu: 5
                      (crit = +105.0°C, hyst = +95.0°C)
Setting cpu: 6
 
Setting cpu: 7
coretemp-isa-0000
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Adapter: ISA adapter
performance
Core 0:      +25.0°C  (crit = +100.0°C)
performance
Core 1:       +25.0°C  (crit = +100.0°C)
performance
 
performance
max1805-i2c-0-18
performance
Adapter: SMBus I801 adapter at 0400
performance
temp1:       +35.0°C  (low  = -55.0°C, high = +127.0°C)
performance
temp2:       +63.0°C  (low  = -55.0°C, high = +127.0°C)
performance
temp3:         FAULT  (low  = -55.0°C, high = +127.0°C)  ALARM (HIGH)
root@daq01:~# cpupower frequency-info
 
analyzing CPU 0:
lm75-i2c-0-48
  driver: intel_pstate
Adapter: SMBus I801 adapter at 0400
  CPUs which run at the same hardware frequency: 0
temp1:       +34.5°C  (high = +80.0°C, hyst = +75.0°C)
  CPUs which need to have their frequency coordinated by software: 0
 
  maximum transition latency: Cannot determine or is not supported.
root@lxdaq23:~#  
  hardware limits: 800 MHz - 4.00 GHz
</pre>
  available cpufreq governors: performance powersave
 
  current policy: frequency should be within 800 MHz and 4.00 GHz.
== ASUS P7P55D EVO ==
                  The governor "performance" may decide which speed to use
 
                  within this range.
* BIOS version 2101
  current CPU frequency: Unable to call hardware
 
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
<pre>
  boost state support:
root@iris01:~# sensors
    Supported: yes
coretemp-isa-0000
    Active: yes
Adapter: ISA adapter
Core 0:       +34.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +37.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:      +38.0°C  (high = +83.0°C, crit = +99.0°C)
Core 3:      +35.0°C  (high = +83.0°C, crit = +99.0°C)
 
nouveau-pci-0100
Adapter: PCI adapter
GPU core:   900.00 mV (min =  +0.85 V, max =  +1.05 V)
temp1:       +46.0°C  (high = +95.0°C, hyst =  +3.0°C)
                      (crit = +105.0°C, hyst = +5.0°C)
                      (emerg = +135.0°C, hyst =  +5.0°C)
 
atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      864.00 mV (min =  +0.80 V, max =  +1.60 V)
+3.3V Voltage:       3.38 V  (min =  +2.97 V, max =  +3.63 V)
+5V Voltage:         5.04 V  (min =  +4.50 V, max =  +5.50 V)
+12V Voltage:        12.15 V  (min = +10.20 V, max = +13.80 V)
CPU Fan Speed:      968 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis1 Fan Speed: 1288 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis2 Fan Speed: 1316 RPM  (min =  600 RPM, max = 7200 RPM)
Power Fan Speed:      0 RPM  (min =    0 RPM, max = 7200 RPM)
CPU Temperature:     +34.0°C  (high = +45.0°C, crit = +45.5°C)
MB Temperature:     +30.0°C  (high = +45.0°C, crit = +46.0°C)
 
root@iris01:~#
</pre>
</pre>
* monitor CPU frequency:
 
== ASUS Z97-WS ==
 
* BIOS version 2704
* load sensors drivers
<pre>
<pre>
root@daq01:~# cpupower monitor
echo modprobe coretemp >> /etc/rc.local
    | Nehalem                  || Mperf              || Idle_Stats                                   
echo modprobe nct6775 >> /etc/rc.local
CPU| C3  | C6  | PC3  | PC6  || C0  | Cx  | Freq  || POLL | C1  | C1E  | C3  | C6  | C7s  | C8   
  0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
  4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
  1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
  5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
  2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
  6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
  3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
  7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~#
</pre>
</pre>
* check that the CPU is not overheating:
* in /etc/default/grub, add: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no"
* update grub and reboot: grub-mkconfig -o /boot/grub/grub.cfg
 
<pre>
<pre>
root@daq01:~# sensors
root@isdaq08:~# sensors
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C 
temp2:        +29.8°C 
 
nct6791-isa-0290
Adapter: ISA adapter
Vcore:                888.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                    1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:                    3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:                  3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                    1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                    1.99 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)
3VSB:                    3.44 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:                    3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                    1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)
in11:                  840.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)
in13:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                  1041 RPM  (min =    0 RPM)
fan2:                  1040 RPM  (min =    0 RPM)
fan3:                    0 RPM  (min =    0 RPM)
fan4:                    0 RPM  (min =    0 RPM)
fan5:                    0 RPM  (min =    0 RPM)
fan6:                    0 RPM  (min =    0 RPM)
SYSTIN:                +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:                +41.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:              -128.0°C    sensor = thermistor
AUXTIN1:              -128.0°C    sensor = thermistor
AUXTIN2:                +35.0°C    sensor = thermistor
AUXTIN3:              +127.0°C    sensor = thermistor
PECI Agent 0:          +41.0°C 
PCH_CHIP_CPU_MAX_TEMP:  +0.0°C 
PCH_CHIP_TEMP:          +0.0°C 
PCH_CPU_TEMP:            +0.0°C 
PCH_MCH_TEMP:            +0.0°C 
PCH_DIM0_TEMP:          +0.0°C 
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:          disabled
 
coretemp-isa-0000
coretemp-isa-0000
Adapter: ISA adapter
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Package id 0:  +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +39.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +39.0°C  (high = +80.0°C, crit = +100.0°C)
 
root@isdaq08:~#
</pre>
</pre>
* congratulations, we are running at 4 GHz now!
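
To re-check the governor later, the <code>cat</code> test used above can be wrapped in a small helper. This is only a minimal sketch: the function name <code>check_governor</code> and its optional second argument (an alternative sysfs root, useful for trying it out on a scratch directory) are assumptions, not part of cpupower.

```shell
#!/bin/sh
# Sketch: report every CPU whose cpufreq governor differs from the wanted one.
# check_governor is a hypothetical helper name; the sysfs path is the one
# read with "cat" earlier in this section.
check_governor() {
    want="$1"
    root="${2:-/sys/devices/system/cpu}"   # second argument is for testing only
    bad=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue            # glob did not match or no cpufreq here
        got=$(cat "$f")
        if [ "$got" != "$want" ]; then
            echo "$f: $got (expected $want)"
            bad=1
        fi
    done
    return "$bad"
}
```

Calling <code>check_governor performance</code> prints nothing and returns 0 when every CPU uses the "performance" governor; otherwise it lists the offending files and returns 1.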


= Setup ubuntu as gateway to private network =
== ASUS Z170-DELUXE ==


See also:
* BIOS version 3801
* https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
* load sensors drivers
* http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1
<pre>
echo modprobe coretemp >> /etc/rc.local
echo modprobe jc42 >> /etc/rc.local
echo modprobe lm92 >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
</pre>
* in /etc/default/grub, add: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no"
* update grub and reboot: grub-mkconfig -o /boot/grub/grub.cfg
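
The <code>echo modprobe ... >> /etc/rc.local</code> commands above append a new line every time they are run, so repeating the setup leaves duplicate entries. A minimal sketch of an append-once helper follows; the name <code>add_line_once</code> is hypothetical.

```shell
#!/bin/sh
# Sketch: append a line to a file only if that exact line is not already there.
# add_line_once is a hypothetical helper name.
add_line_once() {
    file="$1"
    line="$2"
    touch "$file"                          # create the file if it does not exist
    # -x: match whole lines, -F: fixed string (no regex), -q: quiet
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Usage (as root): <code>add_line_once /etc/rc.local "modprobe coretemp"</code>, and likewise for the other modules listed for each board.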


== Steps to do ==
<pre>
 
root@iris00:~# sensors
* assign network numbers to the private network, e.g. 192.168.1.x, 192.168.2.x, etc
nct6793-isa-0290
* (on the gateway machine, each private network interface has to have a different network number)
Adapter: ISA adapter
* (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
in0:                      600.00 mV (min = +0.00 V, max = +1.74 V)
* assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps"
in1:                        1.02 V  (min = +0.00 V, max = +0.00 V)  ALARM
* (for simplicity, assign 192.168.1.1 to the gateway machine itself)
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
in4:                        1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* setup DHCP server (ISC dhcpd or dnsmasq) to give out the IP addresses
in5:                      144.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* setup tftp, pxelinux and NFS for diskless booting
in6:                        0.00 V  (min =  +0.00 V, max =  +0.00 V)
* setup time server (chronyd) to provide common time to all devices
in7:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* setup NAT so machines on private network can access the internet (to get OS updates, etc)
in8:                       3.14 V  (min =  +0.00 V, max =  +0.00 V) ALARM
* setup NIS and NFS so machines on the private network can use common home directories
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* setup rsync backup of machines on the private network
in10:                    600.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.02 V  (min =  +0.00 V, max =  +0.00 V) ALARM
in13:                    592.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    968.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                    1370 RPM  (min =    0 RPM)
fan2:                    1437 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +32.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +42.0°C  (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0:                  -128.0°C    sensor = thermistor
AUXTIN1:                  +50.0°C    sensor = thermistor
AUXTIN2:                  +22.0°C    sensor = thermistor
AUXTIN3:                  +28.0°C    sensor = thermistor
PECI Agent 0:              +50.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +42.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
PCH_MCH_TEMP:              +0.0°C 
TSI2_TEMP:                +3892314.0°C 
TSI3_TEMP:                +3892314.0°C 
TSI4_TEMP:                +3892314.0°C 
TSI5_TEMP:                +3892314.0°C 
TSI6_TEMP:                +3892314.0°C 
TSI7_TEMP:                +3892314.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


== setup hosts ==
jc42-i2c-0-1a
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  = +0.0°C)                  ALARM (HIGH, CRIT)
                      (high = +0.0°C, hyst = +0.0°C)
                      (crit = +0.0°C, hyst =  +0.0°C)


* edit /etc/hosts
jc42-i2c-0-18
<pre>
Adapter: SMBus I801 adapter at f040
192.168.1.101 dsfe01
temp1:        +34.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
... and so forth
                      (high =  +0.0°C, hyst =  +0.0°C)
</pre>
                      (crit =  +0.0°C, hyst =  +0.0°C)


== setup dns and dhcp ==
jc42-i2c-0-1b
Adapter: SMBus I801 adapter at f040
temp1:        +35.0°C  (low  = +0.0°C)                  ALARM (HIGH, CRIT)
                      (high = +0.0°C, hyst = +0.0°C)
                      (crit = +0.0°C, hyst =  +0.0°C)


* apt install dnsmasq
jc42-i2c-0-19
* edit /etc/dnsmasq.conf
Adapter: SMBus I801 adapter at f040
<pre>
temp1:        +36.0°C  (low  = +0.0°C)                  ALARM (HIGH, CRIT)
# /etc/dnsmasq.conf
                      (high = +0.0°C, hyst = +0.0°C)
# DNS settings
                      (crit = +0.0°C, hyst =  +0.0°C)
#port=0 # disable DNS function
port=53 # enable DNS function
domain-needed
bogus-priv
no-resolv
server=142.90.100.19
# DHCP settings
interface=enp1s0f0 # DHCP interface
#dhcp-range=192.168.1.50,192.168.1.150,infinite
dhcp-range=192.168.1.0,static
#log-dhcp
quiet-dhcp
#dhcp-ignore=tag:!known
dhcp-boot=pxelinux.0
#dhcp-host=ac:1f:6b:9e:7f:4a,192.168.1.100,10m
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
# TFTP settings
enable-tftp
tftp-root=/tftpboot
</pre>
* #mkdir /zssd/tftpboot ### per tftp-root (if no ZFS)
* zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
* systemctl stop systemd-resolved.service
* systemctl disable systemd-resolved.service
* rm /etc/resolv.conf
* create a new /etc/resolv.conf with these contents:
<pre>
nameserver 127.0.0.1
search snolab.ca
</pre>
* systemctl enable dnsmasq
* systemctl restart dnsmasq


== setup chronyd ==
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +48.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +84.0°C, crit = +100.0°C)


* enable ntp server:
root@iris00:~#
* configure and enable chronyd per instructions above
</pre>
* echo "allow 192.168.1.0/24" > /etc/chrony/conf.d/allow-localhost.conf
* systemctl restart chronyd
* chronyc tracking ### wait until time is synchronized (a few seconds)
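
The "wait until time is synchronized" step can be scripted instead of watched by hand. The sketch below assumes that <code>chronyc tracking</code> prints a line of the form "Leap status     : Normal" once the clock is synchronized; the helper name <code>chrony_is_synced</code> is hypothetical.

```shell
#!/bin/sh
# Sketch: read "chronyc tracking" output on stdin and succeed only if it
# reports a normal leap status (assumed to mean "synchronized").
# chrony_is_synced is a hypothetical helper name.
chrony_is_synced() {
    grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal'
}
```

Usage: <code>until chronyc tracking | chrony_is_synced; do sleep 1; done</code>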


== setup diskless network booting ==
== ASUS Z390M-PRO GAMING (WI-FI) ==


=== setup pxelinux ===
* BIOS 3006
* load sensors drivers
<pre>
<pre>
cd ~
echo modprobe coretemp >> /etc/rc.local
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
echo modprobe nct6775 >> /etc/rc.local
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
</pre>
</pre>
* cd /zssd/tftpboot
 
<pre>
<pre>
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
root@daq18:~# sensors
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
nct6798-isa-0290
wget http://ladd00.triumf.ca/tftpboot/modules.alias
Adapter: ISA adapter
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
in0:                      696.00 mV (min =  +0.00 V, max =  +1.74 V)
wget http://ladd00.triumf.ca/tftpboot/pci.ids
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
in2:                        3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* mkdir pxelinux.cfg
in3:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* emacs -nw pxelinux.cfg/default
in4:                        1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
<pre>
in5:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)
default menu.c32
in6:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
prompt 0
in7:                        3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.17 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.07 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                    1131 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                    1006 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +32.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN0:                   +25.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN1:                    +7.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN2:                   +8.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN3:                   +24.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN4:                   +83.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +29.0°C  (high = +80.0°C, hyst = +75.0°C)
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
PCH_MCH_TEMP:              +0.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


menu title Welcome to the DSVSLICE PXE boot menu
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +39.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +39.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +33.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +32.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +31.0°C  (high = +82.0°C, crit = +100.0°C)
Core 4:        +31.0°C  (high = +82.0°C, crit = +100.0°C)
Core 5:        +30.0°C  (high = +82.0°C, crit = +100.0°C)


timeout 50
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C 


label hdt
iwlwifi_1-virtual-0
  kernel hdt.c32
Adapter: Virtual device
temp1:        +28.0°C 


label memtest86+-5.01
root@daq18:~#
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz
</pre>


label memtest86+-4.20
== ASUS H110M-A/M.2 ==
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip


label vmlinuz-5.3.0-26-generic
* BIOS version 4202
  menu default
* echo modprobe coretemp >> /etc/rc.local
  kernel vmlinuz-5.3.0-26-generic
* echo modprobe nct6775 >> /etc/rc.local
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0


#end
<pre>
</pre>
root@midpol:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +30.0°C  (high = +80.0°C, crit = +100.0°C)


=== setup linux kernel ===
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)
 
nct6793-isa-0290
Adapter: ISA adapter
in0:                      368.00 mV (min = +0.00 V, max = +1.74 V)
in1:                        1.02 V  (min = +0.00 V, max = +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      928.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                    152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                    128.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                    136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                    1004 RPM  (min =    0 RPM)
fan2:                    1143 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                  +118.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +30.0°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +110.0°C    sensor = thermistor
PECI Agent 0:              +31.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +36.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
TSI2_TEMP:                +3892314.0°C 
TSI3_TEMP:                +3892314.0°C 
TSI4_TEMP:                +3892314.0°C 
TSI5_TEMP:                +3892314.0°C 
TSI6_TEMP:                +3892314.0°C 
TSI7_TEMP:                +3892314.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


* copy the kernel files
root@midpol:~#
<pre>
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /zssd/tftpboot/
</pre>
</pre>
* cd /zssd/tftpboot
* chmod a+r *
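
Before the first network boot it may help to confirm that every file named in pxelinux.cfg/default is present in the TFTP root and world-readable (presumably the reason for the <code>chmod a+r</code> above). A minimal sketch, with the hypothetical helper name <code>check_tftp_files</code>:

```shell
#!/bin/sh
# Sketch: report files that are missing from the TFTP root or that lack the
# "others: read" permission bit. check_tftp_files is a hypothetical helper name.
check_tftp_files() {
    root="$1"; shift
    bad=0
    for name in "$@"; do
        path="$root/$name"
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            bad=1
        elif [ -z "$(find "$path" -perm -004)" ]; then
            # find prints the path only when the o+r bit is set
            echo "not world-readable: $path"
            bad=1
        fi
    done
    return "$bad"
}
```

Usage: <code>check_tftp_files /zssd/tftpboot pxelinux.0 menu.c32 memdisk vmlinuz-5.3.0-26-generic initrd.img-5.3.0-26-generic</code>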


=== setup nfs ===
== ASUS P9X79 WS ==
 
* https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
* BIOS version 4802
* modprobe nct6775
* modprobe coretemp


* apt-get install nfs-kernel-server
* emacs -nw /etc/exports
<pre>
<pre>
/zssd/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
root@daq14:~# sensors
</pre>
coretemp-isa-0000
* enable services
Adapter: ISA adapter
<pre>
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
systemctl enable nfs-server
Core 0:        +29.0°C  (high = +82.0°C, crit = +100.0°C)
systemctl enable nfs-mountd
Core 1:        +24.0°C  (high = +82.0°C, crit = +100.0°C)
systemctl enable nfs-idmapd
Core 2:        +35.0°C  (high = +82.0°C, crit = +100.0°C)
systemctl restart nfs-server
Core 3:        +32.0°C  (high = +82.0°C, crit = +100.0°C)
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
</pre>
* after editing /etc/exports, run
<pre>
exportfs -av
</pre>


=== setup userland ===
nouveau-pci-0200
Adapter: PCI adapter
GPU core:    900.00 mV (min = +0.85 V, max = +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst = +3.0°C)
                      (crit = +105.0°C, hyst = +5.0°C)
                      (emerg = +135.0°C, hyst =  +5.0°C)


* zfs create zssd/nfsroot
nct6776-isa-0290
* zfs set dedup=verify zssd/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
Adapter: ISA adapter
* clone ubuntu
Vcore:          1.04 V  (min = +0.00 V, max =  +1.74 V)
<pre>
in1:            1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
mkdir /zssd/nfsroot/dsfe01
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
cd /
+3.3V:          3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
rsync -avx . /zssd/nfsroot/dsfe01
in4:            1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
in5:            2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* edit config files:
in6:          904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* cd /zssd/nfsroot/dsfe01
3VSB:            3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* emacs -nw etc/hostname ### change to dsfe01
Vbat:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
* emacs -nw etc/mailname ### change to dsfe01
fan1:         1265 RPM  (min =    0 RPM)
* emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
fan2:          1909 RPM  (min =    0 RPM)
* emacs -nw etc/defaultdomain ### change to MUSR-NIS
fan3:            0 RPM  (min =    0 RPM)
* cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
fan4:            0 RPM  (min =    0 RPM)
* emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
fan5:            0 RPM  (min =    0 RPM)
* emacs -nw root/.ssh/authorized_keys ### update root ssh keys
SYSTIN:        +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
* emacs -nw etc/fstab ### add this
CPUTIN:        +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
<pre>
AUXTIN:        +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
192.168.1.1:/zssd/nfsroot/dsfe01 / nfs defaults,nolock 0 0
PECI Agent 0:  +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                        (crit = +96.0°C)
PCH_CHIP_TEMP:  +0.0°C 
PCH_CPU_TEMP:    +0.0°C 
PCH_MCH_TEMP:    +0.0°C 
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:  disabled
 
root@daq14:~#
</pre>
</pre>
* emacs -nw etc/chrony/chrony.conf
** comment-out all "pool" and "server" entries
** add entry "server 192.168.1.1 iburst"


After dsfe01 is booted:
== ASUS TUF GAMING B550M-PLUS WIFI II ==
 
* BIOS 2803, 2806
* echo modprobe nct6775 >> /etc/rc.local


* disable services:
<pre>
<pre>
systemctl disable apache2
root@midm9a:~# sensors
systemctl disable dnsmasq
nct6798-isa-0290
systemctl disable zfs-import-cache
Adapter: ISA adapter
</pre>
in0:                      488.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                      760 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan7:                    1264 RPM  (min =    0 RPM)
SYSTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +22.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +95.0°C    sensor = thermistor
AUXTIN1:                  +25.0°C    sensor = thermistor
AUXTIN2:                  +25.0°C    sensor = thermistor
AUXTIN3:                  +25.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +23.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +32.4°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


To set up additional machines, clone dsfe01 instead of cloning the gateway machine.
amdgpu-pci-0800
Adapter: PCI adapter
vddgfx:        1.45 V 
vddnb:      993.00 mV
edge:        +28.0°C 
PPT:          20.00 W 


=== Allow manpages to be viewed ===
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +33.4°C 


If <code>/</code> is mounted over NFS, <code>man</code> will report a permission error. Fix it with:
root@midm9a:~#
</pre>


<pre>
== ASUS ROG STRIX B550-XE GAMING WIFI ==
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man
</pre>


== setup shared home directory ==
* BIOS 2423, 2604
* echo modprobe nct6775 >> /etc/rc.local


=== on the gateway machine ===
<pre>
* define netgroups
root@daq13:~# sensors
* emacs -nw /etc/netgroup
nct6798-isa-0290
<pre>
Adapter: ISA adapter
dsfe (dsfe01,,) (dsfe02,,)
in0:                      344.00 mV (min =  +0.00 V, max =  +1.74 V)
</pre>
in1:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
<pre>
in3:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
netgroup: files
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
in5:                      960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* export the home directories:
in6:                      216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* emacs -nw /etc/exports ### add this:
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
<pre>
in8:                        3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
in9:                        1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
in10:                    960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* exportfs -rc
in11:                    960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in12:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
=== on the frontend machine ===
in13:                    280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in14:                    208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* mkdir /home
fan1:                      845 RPM  (min =    0 RPM)
* emacs -nw /etc/fstab ### add this:
fan2:                      998 RPM  (min =    0 RPM)
<pre>
fan3:                        0 RPM  (min =    0 RPM)
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +28.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +27.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +94.0°C    sensor = thermistor
AUXTIN1:                  +28.0°C    sensor = thermistor
AUXTIN2:                  +28.0°C    sensor = thermistor
AUXTIN3:                  +97.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +27.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +33.6°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled
 
amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.45 V 
vddnb:      999.00 mV
edge:        +29.0°C 
PPT:          14.00 W 
 
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +33.9°C 
 
root@daq13:~#
</pre>
 
== ASUS ROG STRIX B550-E GAMING ==
 
* BIOS 2803
* echo modprobe jc42 >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
root@daq17:~# sensors
jc42-i2c-1-1b
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +28.0°C 
 
nouveau-pci-0800
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
                      (crit = +105.0°C, hyst =  +5.0°C)
                      (emerg = +135.0°C, hyst =  +5.0°C)
 
nct6798-isa-0290
Adapter: ISA adapter
in0:                      288.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                      843 RPM  (min =    0 RPM)
fan2:                      629 RPM  (min =    0 RPM)
fan3:                      746 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +93.0°C    sensor = thermistor
AUXTIN1:                  +22.0°C    sensor = thermistor
AUXTIN2:                  +22.0°C    sensor = thermistor
AUXTIN3:                  +96.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +25.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +27.6°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled
 
jc42-i2c-1-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +23.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
asusec-isa-0000
Adapter: ISA adapter
CPU_Opt:        0 RPM
Chipset:      +34.0°C 
CPU:          +25.0°C 
Motherboard:  +22.0°C 
T_Sensor:    -40.0°C 
VRM:          +31.0°C 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +28.0°C 
Tccd1:        +27.5°C 
 
root@daq17:~#
</pre>
 
== ASUS PRIME B650-PLUS ==
 
* BIOS 1811
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
root@dsdaqgw:~# sensors
amdgpu-pci-0b00
Adapter: PCI adapter
vddgfx:      930.00 mV
vddnb:        1.19 V 
edge:        +38.0°C 
PPT:          25.10 W 
 
nct6799-isa-0290
Adapter: ISA adapter
in0:                      920.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                    1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +78.0°C    sensor = thermistor
AUXTIN1:                  +11.0°C    sensor = thermistor
AUXTIN2:                  +20.0°C    sensor = thermistor
AUXTIN3:                  +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +42.6°C 
intrusion0:              ALARM
intrusion1:              OK
beep_enable:              disabled
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +42.6°C 
Tccd1:        +36.4°C 
 
root@dsdaqgw:~#
</pre>
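
The board sections above append <code>modprobe</code> lines to /etc/rc.local with <code>echo ... >></code>, which adds a duplicate line every time the step is repeated. A minimal sketch of an idempotent variant follows. The helper name <code>add_modprobe</code> is hypothetical, and the sketch assumes rc.local is executed at boot and does not end with an <code>exit 0</code> line (lines appended after one would never run).

```shell
# Append "modprobe MODULE" to an rc.local-style file only if the line is absent.
# Usage: add_modprobe MODULE [FILE]   (FILE defaults to /etc/rc.local)
# add_modprobe is a hypothetical helper name, not part of any package.
add_modprobe() (
    module=$1
    file=${2:-/etc/rc.local}
    line="modprobe $module"
    touch "$file"
    # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# Demonstration on a scratch file: two calls still leave a single line.
demo=$(mktemp)
add_modprobe nct6775 "$demo"
add_modprobe nct6775 "$demo"
cat "$demo"
rm -f "$demo"
```

On a real machine the call would be <code>add_modprobe nct6775</code>, run as root.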
 
= Enable CPU turbo mode =
 
* An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a turbo-boost frequency (e.g. 4.0 GHz). Here we enable the turbo-boost mode.
* Find out CPU capability
<pre>
root@daq01:~# lscpu | grep Hz
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:                        3965.803
CPU max MHz:                    4000.0000
CPU min MHz:                    800.0000
root@daq01:~#
</pre>
* Look up this CPU in the Intel ARK database (search the web for the CPU model name), e.g.
https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html
* Find current frequency settings:
<pre>
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
root@daq01:~#
</pre>
* Note the following:
** current governor is "powersave"
** "performance" governor is available
** "boost state support" is supported and active.
* Confirm CPU frequency governor:
<pre>
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
root@daq01:~#
</pre>
* Change governor to "performance":
<pre>
root@daq01:~# cpupower frequency-set --governor performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
</pre>
* monitor CPU frequency:
<pre>
root@daq01:~# cpupower monitor
    | Nehalem                  || Mperf              || Idle_Stats                                   
CPU| C3  | C6  | PC3  | PC6  || C0  | Cx  | Freq  || POLL | C1  | C1E  | C3  | C6  | C7s  | C8   
  0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
  4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
  1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
  5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
  2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
  6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
  3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
  7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~#
</pre>
* check that the CPU is not overheating:
<pre>
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
</pre>
* congratulations, we are running at 4 GHz now!
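
The governor check above reads one sysfs file per CPU. A minimal sketch that reports only the CPUs whose governor differs from the wanted one is given here. The function name <code>check_governor</code> and its optional second argument (an alternative sysfs directory, handy for trying it out) are assumptions made for illustration.

```shell
# Print each CPU whose scaling governor differs from WANTED.
# Returns 0 if all match, 1 if any differ, 2 if no governor files are found.
# Usage: check_governor [WANTED] [SYSFS_CPU_DIR]
# check_governor is a hypothetical helper name.
check_governor() (
    want=${1:-performance}
    base=${2:-/sys/devices/system/cpu}
    seen=0
    rc=0
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue
        seen=$((seen + 1))
        gov=$(cat "$f")
        if [ "$gov" != "$want" ]; then
            cpu=${f#"$base"/}          # cpuN/cpufreq/scaling_governor
            echo "${cpu%%/*}: $gov"    # keep only cpuN
            rc=1
        fi
    done
    if [ "$seen" -eq 0 ]; then
        echo "no cpufreq governor files under $base" >&2
        return 2
    fi
    return "$rc"
)
```

Typical use is <code>check_governor performance && echo "all CPUs use performance"</code>. Running it again after a reboot shows whether the <code>cpupower frequency-set</code> setting is still in effect.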
 
= Setup ubuntu as gateway to private network =
 
See also:
* https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
* http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1
 
== Steps to do ==
 
!!! UPDATED 16feb2024 Ubuntu-22.04.03 !!!
 
* assign network numbers to the private network, e.g. 192.168.1.x, 192.168.2.x, etc
* (on the gateway machine, each private network interface has to have a different network number)
* (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
* assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps" (IP address first, then the host name)
* (for simplicity, assign 192.168.1.1 to the gateway machine itself)
* (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
* set up DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
* set up DHCP server (dnsmasq) to give out the IP addresses
* set up TFTP server (dnsmasq), pxelinux and NFS for diskless booting
* set up time server (chronyd) to provide common time to all devices
* set up NAT so machines on the private network can access the internet (to get OS updates, etc)
* set up NIS and NFS so machines on the private network can use common home directories
* set up rsync backup of machines on the private network
 
== setup hosts ==
 
* edit /etc/hosts
<pre>
192.168.1.101 dsfe01
... and so forth
</pre>
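
Because dnsmasq serves /etc/hosts to the whole private network, mistakes in that file spread to every machine. The following is a rough sketch of a sanity check, limited to 192.168.x.x entries: it flags the "special" addresses ending in .0 and .255 and any host name listed with two different addresses. The function name <code>check_hosts</code> is hypothetical.

```shell
# Check the 192.168.x.x entries of a hosts file.
# Usage: check_hosts [FILE]   (FILE defaults to /etc/hosts)
# Exit status is 0 if no problem was found, 1 otherwise.
# check_hosts is a hypothetical helper name.
check_hosts() {
    awk '
        /^[ \t]*(#|$)/ { next }              # skip comments and blank lines
        $1 !~ /^192\.168\./ { next }         # only private-network entries
        {
            ip = $1
            n = split(ip, octet, ".")
            if (n == 4 && (octet[4] == 0 || octet[4] == 255)) {
                printf "reserved address %s\n", ip
                bad = 1
            }
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break         # rest of the line is a comment
                if (($i in seen) && seen[$i] != ip) {
                    printf "name %s maps to %s and %s\n", $i, seen[$i], ip
                    bad = 1
                }
                seen[$i] = ip
            }
        }
        END { exit bad }
    ' "${1:-/etc/hosts}"
}
```

Run <code>check_hosts</code> before restarting dnsmasq. No output and exit status 0 means neither problem was found.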
 
== setup dns and dhcp ==
 
!!! updated 16feb2024 for Ubuntu 22.04.3 !!!
 
!!! note: the stock systemd-resolved remains; it is configured to forward queries to dnsmasq, which in turn forwards queries to the TRIUMF DNS !!!
 
!!! note: per authors of systemd, bare hostnames are not permitted, a DNS domain name must always be used. DNS domain name "dsdaq" is used in this example !!!
 
* apt install dnsmasq
* ensure dnsmasq starts after all interfaces are up (Ubuntu-22)
<pre>
mkdir /etc/systemd/system/dnsmasq.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/dnsmasq.service.d/local.conf
</pre>
* edit /etc/dnsmasq.conf
<pre>
# /etc/dnsmasq.conf
# DNS settings
#port=0 # disable DNS function
port=53 # enable DNS function
bind-interfaces # do not collide with systemd-resolved, we use 127.0.0.1:53, they use 127.0.0.53:53
domain-needed
bogus-priv
no-resolv
#log-queries # log DNS queries
# TRIUMF DNS settings
server=142.90.100.19
expand-hosts
domain=dsdaq
local=/dsdaq/
localmx # do not forward MX queries to TRIUMF
 
# DHCP settings
interface=enp1s0f0 # VX network 192.168.0.x
#interface=missing  # FEP and TSP network 192.168.1.x
interface=enp1s0f1 # controls network 192.168.2.x
#dhcp-range=192.168.1.50,192.168.1.150,infinite
dhcp-range=192.168.0.0,static
dhcp-range=192.168.2.0,static
log-dhcp # log DHCP queries
#quiet-dhcp
dhcp-ignore=tag:!known
#dhcp-boot=pxelinux.0
dhcp-option=option:dns-server,192.168.0.248
dhcp-option=option:ntp-server,192.168.0.248
# TFTP settings
enable-tftp
tftp-root=/tftpboot
</pre>
* #mkdir /tftpboot ### per tftp-root (if no ZFS)
* zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
* create resolved-dsdaq.conf with main IP address of dnsmasq
<pre>
[Resolve]
DNS=192.168.0.248
Domains=dsdaq triumf.ca
</pre>
* mkdir -p /etc/systemd/resolved.conf.d/
* /bin/rm -f /etc/systemd/resolved.conf.d/*.conf
* cp resolved-dsdaq.conf /etc/systemd/resolved.conf.d/
* systemctl stop systemd-resolved.service
* systemctl disable systemd-resolved.service
* systemctl enable dnsmasq
* systemctl restart dnsmasq
* try to "ping" or "host" some names from /etc/hosts, it should work
* try to ping daq00, daq00.triumf.ca, all should work
* resolved-dsdaq.conf goes into /etc/systemd/resolved.conf.d/ of all machines on the private network
* if not using systemd-resolved, edit /etc/resolv.conf
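
The dnsmasq ordering drop-in in this section is written with <code>echo -e</code>. A sketch of the same step using a here-document is shown here. The function name <code>write_online_dropin</code> and its optional ROOT argument (a prefix directory, for trying it outside /etc) are assumptions. The extra <code>Wants=</code> line is also an addition that is not in the steps above; it follows the systemd recommendation to pair <code>After=network-online.target</code> with <code>Wants=network-online.target</code>.

```shell
# Write SERVICE.service.d/local.conf so the service starts after the network is online.
# Usage: write_online_dropin SERVICE [ROOT]   (ROOT defaults to the real root)
# Prints the path of the file it wrote.
# write_online_dropin is a hypothetical helper name.
write_online_dropin() (
    service=$1
    root=${2:-}
    dir="$root/etc/systemd/system/$service.service.d"
    mkdir -p "$dir"
    cat > "$dir/local.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
    echo "$dir/local.conf"
)
```

As root, <code>write_online_dropin dnsmasq</code> followed by <code>systemctl daemon-reload</code> makes systemd pick up the new drop-in.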
 
== setup chronyd ==
 
* enable ntp server:
* disable systemd-timesyncd, configure and enable chronyd per instructions above
* create dsdaq.conf
<pre>
# chrony config for dsdaq server
 
#allow 192.168.0.0
#allow 192.168.1.0
#allow 192.168.2.0
allow all
 
# end
</pre>
* cp dsdaq.conf /etc/chrony/conf.d/
* systemctl restart chronyd
* chronyc tracking ### wait until time is synchronized (a few seconds)
* create dsdaq.sources # use hostname or IP address of chronyd server
<pre>
# Put this file in /etc/chrony/sources.d
# systemctl restart chrony
# chronyc sources
# chronyc tracking
server dsdaqgw iburst prefer
# end
</pre>
* dsdaq.sources goes to /etc/chrony/sources.d of all machines on the private network
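
The "wait until time is synchronized" step can be scripted by looking at the "Leap status" line of the <code>chronyc tracking</code> output, which reads "Normal" once the clock is synchronized. This is only a sketch; the function name <code>chrony_is_synced</code> is hypothetical.

```shell
# Read `chronyc tracking` output on stdin.
# Succeed only if the "Leap status" field is "Normal".
# chrony_is_synced is a hypothetical helper name.
chrony_is_synced() {
    awk -F':' '
        /^Leap status/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim blanks around the value
            found = ($2 == "Normal")
        }
        END { exit !found }
    '
}

# Example with canned text instead of a live chronyd:
printf 'Stratum         : 3\nLeap status     : Normal\n' | chrony_is_synced && echo synchronized
```

On a live system this becomes <code>until chronyc tracking | chrony_is_synced; do sleep 2; done</code>.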
 
== setup diskless network booting ==
 
=== setup pxelinux for legacy pxe boot ===
 
* add bits in dnsmasq.conf
<pre>
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
dhcp-boot=pxelinux.0
dhcp-option=17,"192.168.0.251:/nfsroot/%s,vers=3,tcp"
</pre>
* setup pxelinux for Ubuntu-18
<pre>
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
</pre>
* cd /zssd/tftpboot
<pre>
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
</pre>
* mkdir pxelinux.cfg
* emacs -nw pxelinux.cfg/default
<pre>
default menu.c32
prompt 0
 
menu title Welcome to the DSVSLICE PXE boot menu
 
timeout 50
 
label hdt
  kernel hdt.c32
 
label memtest86+-5.01
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz
 
label memtest86+-4.20
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip
 
label vmlinuz-5.3.0-26-generic
  menu default
  kernel vmlinuz-5.3.0-26-generic
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0
 
#end
</pre>
 
=== setup pxelinux for efi pxe boot ===
 
* https://c-nergy.be/blog/?p=13808
* add dnsmasq.conf bits. note: to use dhcp root-path, see the "nfsroot=auto" patch below and make sure to use the "dhcp-option-force" command (mkinitramfs dhcp client does not ask for root-path, we have to force-feed it).
<pre>
# uefi pxe
 
dhcp-boot=tag:uefipxe,uefi/syslinux.efi
dhcp-option-force=tag:fe01,option:root-path,192.168.0.248:/nfsroot/fe01
 
# VX network 192.168.0.x
 
dhcp-host=40:a6:b7:c1:d9:c5,fe01,infinite,set:uefipxe,set:fe01
</pre>
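
Each diskless machine needs a matching pair of <code>dhcp-host</code> and <code>dhcp-option-force</code> lines that must agree on the tag name. A small sketch that prints both lines from one set of arguments is shown here. The function name <code>dnsmasq_uefi_host</code> is hypothetical, and the default /nfsroot base path is taken from the example above.

```shell
# Print the dnsmasq.conf lines for one UEFI PXE client.
# Usage: dnsmasq_uefi_host MAC NAME NFS_SERVER [NFSROOT_BASE]
# dnsmasq_uefi_host is a hypothetical helper name.
dnsmasq_uefi_host() {
    printf 'dhcp-host=%s,%s,infinite,set:uefipxe,set:%s\n' "$1" "$2" "$2"
    printf 'dhcp-option-force=tag:%s,option:root-path,%s:%s/%s\n' \
        "$2" "$3" "${4:-/nfsroot}" "$2"
}

# Reproduces the fe01 entries shown above.
dnsmasq_uefi_host 40:a6:b7:c1:d9:c5 fe01 192.168.0.248
```

The output can be appended to /etc/dnsmasq.conf, followed by <code>systemctl restart dnsmasq</code>.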
* apt install syslinux pxelinux syslinux-common syslinux-efi syslinux-utils
<pre>
mkdir /tftpboot/uefi
cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/menu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/hdt.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libutil.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libmenu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libcom32.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libgpl.c32 /tftpboot/uefi/
</pre>
* try to boot, it should bomb with "cannot load pxelinux.cfg/default"
* mkdir /tftpboot/uefi/pxelinux.cfg
* create /tftpboot/uefi/pxelinux.cfg/default, note nfsroot path is hardwired, note "http:" is used to load vmlinuz and initrd files (because tftp is super slow)
<pre>
default menu.c32
prompt 0
 
menu title Welcome to the DSDAQGW UEFI PXE boot menu
 
timeout 50
 
label vmlinuz-6.5.0-17-generic
  kernel http://192.168.0.248:8088/uefi/vmlinuz-6.5.0-17-generic
  append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60
 
# append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
 
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto ip=dhcp rw panic=60
 
#end
</pre>
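
The kernel version appears three times in the menu file above and has to match the files actually served over HTTP. A sketch that generates the menu from a single version string follows. The function name <code>uefi_pxe_menu</code> is hypothetical; the menu text itself is copied from the example above.

```shell
# Print a UEFI pxelinux menu for one kernel version.
# Usage: uefi_pxe_menu KERNEL_VERSION HTTP_BASE_URL
# uefi_pxe_menu is a hypothetical helper name.
uefi_pxe_menu() {
    cat <<EOF
default menu.c32
prompt 0

menu title Welcome to the DSDAQGW UEFI PXE boot menu

timeout 50

label vmlinuz-$1
  kernel $2/vmlinuz-$1
  append initrd=$2/initrd.img-$1 boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60

#end
EOF
}

uefi_pxe_menu 6.5.0-17-generic http://192.168.0.248:8088/uefi
```

Redirect the output to /tftpboot/uefi/pxelinux.cfg/default after a kernel update.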
* try to boot, it will bomb with "cannot load http://...."
* install mini_httpd on port 8088, see https://acme.com/software/mini_httpd/
<pre>
apt install mini-httpd
emacs -nw /etc/default/mini-httpd # set "START=1"
emacs -nw /etc/mini-httpd.conf # set "host=192.168.0.248", "port=8088", "data_dir=/tftpboot"
mkdir /etc/systemd/system/mini-httpd.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/mini-httpd.service.d/local.conf
systemctl enable mini-httpd
systemctl restart mini-httpd
systemctl status mini-httpd
wget http://192.168.0.248:8088/uefi/syslinux.efi
tail -100 /var/log/mini_httpd.log
</pre>
* fix the Ubuntu 22 initramfs bug for "nfsroot=auto"; otherwise "nfsroot=" has to be different for each machine and each machine needs its own pxelinux config file
** emacs -nw /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config
** add "echo ROOTPATH=..." if it is missing
<pre>
                echo "ROOTSERVER='${new_routers%% *}'"
                echo "ROOTPATH='$new_root_path'"
                echo "HOSTNAME='$new_host_name'"
</pre>
* fix the Ubuntu 24 initramfs bug for "nfsroot=auto"; otherwise "nfsroot=" has to be different for each machine and each machine needs its own pxelinux config file
** emacs -nw /usr/share/initramfs-tools/dhcpcd-hooks/70-net-conf
** add "ROOTPATH=..." if it is missing
<pre>
DNSDOMAIN='${new_domain_name-}'
ROOTSERVER='${new_routers-}'
ROOTPATH='${new_root_path-}'
filename='${new_filename-}'
DHCPLEASETIME='${new_dhcp_lease_time-}'
** regenerate initramfs (be careful you generate it for the right kernel!)
** see https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2054482
<pre>
mkinitramfs -o /boot/initrd.img-6.5.0-18-generic 6.5.0-18-generic
mkinitramfs -o /boot/initrd.img-6.8.0-51-generic 6.8.0-51-generic
</pre>
* copy linux kernel and initrd
<pre>
cp /boot/vmlinuz-6.5.0-18-generic /tftpboot/uefi/
cp /boot/initrd.img-6.5.0-18-generic /tftpboot/uefi/
chmod a+r /tftpboot/uefi/*
</pre>
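
The copy step can be wrapped so that it refuses to run when either the kernel or the initrd for the requested version is missing, which guards against serving a mismatched pair. This is a sketch only; the function name <code>stage_kernel</code> and its optional directory arguments are assumptions.

```shell
# Copy vmlinuz and initrd.img for one kernel version into the TFTP/HTTP tree
# and make them world-readable.
# Usage: stage_kernel VERSION [BOOT_DIR] [DEST_DIR]
# stage_kernel is a hypothetical helper name.
stage_kernel() (
    ver=$1
    boot=${2:-/boot}
    dest=${3:-/tftpboot/uefi}
    # Check both files first so a half-copied pair is never left behind.
    for f in "vmlinuz-$ver" "initrd.img-$ver"; do
        if [ ! -f "$boot/$f" ]; then
            echo "missing $boot/$f" >&2
            return 1
        fi
    done
    mkdir -p "$dest"
    for f in "vmlinuz-$ver" "initrd.img-$ver"; do
        cp "$boot/$f" "$dest/" || return 1
        chmod a+r "$dest/$f" || return 1
    done
)
```

As root, <code>stage_kernel 6.5.0-18-generic</code> performs the copy shown above; the version must match the one named in the pxelinux menu.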
* try to boot, should bomb with messages about "trying to mount root filesystem"
* tail /var/log/syslog
<pre>
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  2
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: error 8 User aborted the transfer received from 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  2
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/ldlinux.e64 to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/01-40-a6-b7-c1-d9-c5 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006E not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A800 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A80 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/menu.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/libutil.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  2
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 12 hostname  fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw rpc.mountd[3350210]: authenticated mount request from 192.168.0.110:981 for /nfsroot/fe01 (/nfsroot/fe01)
Feb 16 20:45:07 dsdaqgw rpc.mountd[3350210]: authenticated unmount request from 192.168.0.110:859 for /nfsroot/fe01/tmp/autoDY4k5u (/nfsroot/fe01)
</pre>
* tail /var/log/mini_httpd.log
<pre>
192.168.0.110 - - [16/Feb/2024:20:43:15 -0800] "GET /uefi/vmlinuz-6.5.0-17-generic HTTP/1.0" 200 14227944 "" "Syslinux/6.04"
192.168.0.110 - - [16/Feb/2024:20:43:24 -0800] "GET /uefi/initrd.img-6.5.0-17-generic HTTP/1.0" 200 137824833 "" "Syslinux/6.04"
</pre>
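
The dnsmasq-tftp log above shows the order in which syslinux looks for its config file: first a name built from the MAC address, then the IPv4 address in hexadecimal, shortened one digit at a time, then "default". A minimal sketch that prints the same list for any client, assuming an Ethernet client with an IPv4 address (the function name is hypothetical; a UUID-based name, which syslinux may also try, does not appear in the log and is not covered):

```shell
#!/bin/sh
# Hypothetical helper: list the pxelinux.cfg file names tried for one client, in order.
# Usage: pxelinux_cfg_names MAC IPV4   (MAC written as aa:bb:cc:dd:ee:ff)
pxelinux_cfg_names() {
    mac=$1
    ip=$2
    # 1) "01" (ARP hardware type for Ethernet) plus the MAC, lower case, dash separated
    echo "01-$(echo "$mac" | tr 'A-F:' 'a-f-')"
    # 2) the IPv4 address as 8 upper-case hex digits, then one digit shorter each time
    oldifs=$IFS; IFS=.
    set -- $ip
    IFS=$oldifs
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    while [ -n "$hex" ]; do
        echo "$hex"
        hex=${hex%?}
    done
    # 3) the fallback used when no per-client file exists
    echo default
}

pxelinux_cfg_names 40:a6:b7:c1:d9:c5 192.168.0.110
```

To give one frontend its own boot menu, create a file with the first name that should match under /tftpboot/uefi/pxelinux.cfg/.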
 
=== setup efi http boot ===
 
https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html
 
=== setup linux kernel ===
 
* copy the kernel files
<pre>
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /tftpboot/
</pre>
* cd /tftpboot
* chmod a+r *
 
=== setup nfs ===
 
* apt-get install nfs-kernel-server
* enable NFS over UDP: edit /etc/nfs.conf and add "udp=y" to the [nfsd] section:
<pre>
[nfsd]
udp=y
</pre>
<pre>
systemctl restart nfs-server.service
</pre>
* emacs -nw /etc/exports
<pre>
/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
</pre>
* enable services
<pre>
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
</pre>
* after editing /etc/exports, run
<pre>
exportfs -av
</pre>
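
Each additional frontend needs one more line in /etc/exports. A minimal sketch of a helper that prints that line in the same form as the dsfe01 entry above; the function name is hypothetical, and it only prints text, it does not modify /etc/exports:

```shell
#!/bin/sh
# Hypothetical helper: print the /etc/exports entry for one diskless frontend.
# Usage: nfsroot_export_line HOSTNAME
nfsroot_export_line() {
    host=$1
    # same export options as the dsfe01 entry on this page
    printf '/nfsroot/%s %s(rw,no_root_squash,async,no_subtree_check)\n' "$host" "$host"
}

# Example: review the line, append it to /etc/exports by hand, then run "exportfs -av"
nfsroot_export_line dsfe02
```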
 
=== setup userland ===
 
!!! ubuntu-18 version !!!
 
* zfs create rpool/nfsroot
* zfs set dedup=verify rpool/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
* clone ubuntu
<pre>
mkdir /nfsroot/dsfe01
cd /
rsync -avx . /nfsroot/dsfe01
</pre>
* edit config files:
* cd /nfsroot/dsfe01
* emacs -nw etc/hostname ### change to dsfe01
* emacs -nw etc/mailname ### change to dsfe01
* emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
* emacs -nw etc/defaultdomain ### change to MUSR-NIS
* cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
* emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
* emacs -nw root/.ssh/authorized_keys ### update root ssh keys
* emacs -nw etc/fstab ### add this
<pre>
192.168.1.1:/nfsroot/dsfe01 / nfs defaults,nolock 0 0
</pre>
* emacs -nw etc/chrony/chrony.conf
** comment-out all "pool" and "server" entries
** add entry "server 192.168.1.1 iburst"
 
After dsfe01 is booted:
 
* disable services:
<pre>
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache
</pre>
 
To set up additional machines, clone dsfe01 instead of cloning the gateway machine.
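
After cloning, the per-host files listed above have to be changed in the new image. A minimal sketch that handles only etc/hostname, etc/mailname and the NFS root entry in etc/fstab; the function name and its arguments are hypothetical, and the ssh keys, yp.conf, gonodeinfo.conf and chrony.conf still need the manual edits described above:

```shell
#!/bin/sh
# Hypothetical helper: apply the basic per-host edits to a freshly cloned NFS root.
# Usage: customize_clone IMAGE_DIR NEW_HOSTNAME NFS_SERVER_IP
customize_clone() {
    root=$1; host=$2; server=$3
    mkdir -p "$root/etc" || return 1
    echo "$host" > "$root/etc/hostname"
    echo "$host" > "$root/etc/mailname"
    # drop any existing NFS root entry so the function can be re-run safely
    touch "$root/etc/fstab"
    grep -v ' / nfs ' "$root/etc/fstab" > "$root/etc/fstab.new" || true
    echo "$server:/nfsroot/$host / nfs defaults,nolock 0 0" >> "$root/etc/fstab.new"
    mv "$root/etc/fstab.new" "$root/etc/fstab"
}

# Example, run on the gateway after "rsync -avx /nfsroot/dsfe01/ /nfsroot/dsfe02":
#   customize_clone /nfsroot/dsfe02 dsfe02 192.168.1.1
```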
 
=== Allow manpages to be viewed ===
 
If <code>/</code> is mounted over NFS, <code>man</code> will report a permission error. Fix it with:
 
<pre>
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man
</pre>
 
== setup shared home directory ==
 
=== on the gateway machine ===
* define netgroups
* emacs -nw /etc/netgroup
<pre>
dsfe (dsfe01,,) (dsfe02,,)
</pre>
* emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
<pre>
netgroup: files
</pre>
* export the home directories:
* emacs -nw /etc/exports ### add this:
<pre>
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
</pre>
* exportfs -rv
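
Each member of a netgroup is a (host,user,domain) triple with the user and domain fields left empty. A minimal sketch that builds the /etc/netgroup line from a list of host names, so that adding a frontend is a one-word change (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: build an /etc/netgroup entry from a list of host names.
# Usage: netgroup_line GROUP HOST...
netgroup_line() {
    line=$1
    shift
    for host in "$@"; do
        # user and domain fields are left empty, as in the entry above
        line="$line ($host,,)"
    done
    echo "$line"
}

netgroup_line dsfe dsfe01 dsfe02 dsfe03
```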
 
=== on the frontend machine ===
 
* mkdir /home
* emacs -nw /etc/fstab ### add this:
<pre>
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
</pre>
* mount -a
 
== setup NAT ==
 
NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
 
In these examples:
* replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
* replace "enp11s0" with name of the private network interface (192.168.1.x network)
 
* emacs -nw /etc/rc.local ### add this:
<pre>
#!/bin/sh
# /etc/rc.local
 
# enable NAT
 
/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v
 
# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v
 
iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward
 
sh /etc/firewall-rfc1918.sh
 
# end
</pre>
* emacs -nw /etc/firewall-rfc1918.sh
<pre>
# firewall-rfc1918.sh
 
# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.
 
ETH=eno1
 
iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT
 
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH
 
iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT
 
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH
 
iptables -D FORWARD -j out-rfc1918 -o $ETH
iptables -D FORWARD -j out-rfc1918 -o $ETH
iptables -I FORWARD -j out-rfc1918 -o $ETH
 
# allow TRIUMF-SECURE network
 
iptables -I in-rfc1918 -s 10.90.0.0/255.255.0.0 -j ACCEPT
iptables -I out-rfc1918 -d 10.90.0.0/255.255.0.0 -j ACCEPT
 
# show configuration
 
iptables -L -v
 
#end
</pre>
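
The firewall script above matches the three RFC 1918 private ranges. A minimal sketch of a helper that tells whether a given IPv4 address falls into one of them, useful when reading iptables counters or logs (the function name is hypothetical; note that the script explicitly allows 10.90.0.0/16 even though it lies inside 10.0.0.0/8):

```shell
#!/bin/sh
# Return success (0) if the IPv4 address is in one of the RFC 1918 private ranges.
# Usage: is_rfc1918 A.B.C.D
is_rfc1918() {
    oldifs=$IFS; IFS=.
    set -- $1
    IFS=$oldifs
    a=$1; b=$2
    # 10.0.0.0/8
    [ "$a" -eq 10 ] && return 0
    # 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x
    [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
    # 192.168.0.0/16
    [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0
    return 1
}

for ip in 10.90.1.1 172.20.0.5 172.32.0.1 192.168.1.2 8.8.8.8; do
    if is_rfc1918 "$ip"; then echo "$ip private"; else echo "$ip public"; fi
done
```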
 
= KVM =
 
<pre>
apt install cpu-checker
 
root@daq13:~# kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
root@daq13:~#
 
(if not, shutdown, go into BIOS settings, enable CPU virtualization)
 
apt install virtinst ### will install many packages
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils
 
root@daq13:/home1/wheel# virsh list --all
Id  Name          State
------------------------------
1    ubuntu-guest  running
 
apt install virt-manager
 
virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'
 
virtual machine will start, boot, etc
to get out of it, CTRL + Shift followed by ]
 
ssh wheel@daq13
virt-manager
 
run virt-install again, omit "--graphics none", open graphics console from virt-manager, it booted into ubuntu installer desktop
 
virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none
 
virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off
</pre>
 
build image
 
<pre>
dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### ext4 fails to mount by SL6 kernel, "unknown ext4 options"
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp
</pre>
 
on the guest, configure network: /etc/rc.local
<pre>
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
 
touch /var/lock/subsys/local
 
ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn
 
# end
</pre>
 
== virsh commands ==
 
<pre>
virsh list --all
 
virsh start kvm-el7
virsh console kvm-el7
virsh destroy kvm-el7
 
virt-install ...
virsh undefine kvm-el7
 
virsh autostart kvm-ladd00
virsh dominfo kvm-ladd00
</pre>
 
== virtualize SL6 ladd00 ==
 
* on ladd00:
<pre>
yum install dracut-network
mkinitrd /boot/initramfs-2.6.32-754.35.1.el6.x86_64-netboot.img 2.6.32-754.35.1.el6.x86_64
</pre>
* on daq00
<pre>
zfs create rpool/kvm-ladd00
cd /kvm-ladd00
rsync -avx ladd00:/ . --exclude nfsroot
brctl addbr virbr0
ifconfig virbr0 192.168.1.1
echo "/kvm-ladd00 192.168.1.2(rw,no_root_squash,no_all_squash,async,no_subtree_check)" >> /etc/exports
exportfs -rv
</pre>
* create virtual machine
<pre>
virt-install --name kvm-ladd00 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --network bridge=virbr0,model=virtio --boot kernel=/kvm-ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm-ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64-netboot.img,kernel_args="root=/dev/nfs ip=192.168.1.2:192.168.1.1:192.168.1.1:255.255.255.0:ladd00::off nfsroot=192.168.1.1:/kvm-ladd00,vers=3,tcp console=ttyS0,115200n8 serial rdshell" --graphics none --nodisks --check path_in_use=off
</pre>
* adjust kvm-ladd00 image
<pre>
disable network manager
edit fstab
edit yp.conf
edit resolv.conf
edit root/.ssh/authorized_keys
enable rngd or /dev/random does not work, sshd does not work
</pre>
* virsh shutdown test24
* virsh --connect qemu:///system start test24
* virsh console test24 ### to exit, ctrl+[ or ctrl+]
* virsh undefine test24
* virsh autostart kvm-ladd00
* virsh dominfo kvm-ladd00
<pre>
root@daq00:~# virsh dominfo kvm-ladd00
Id:            1
Name:          kvm-ladd00
UUID:          1d1f8fed-8b65-4411-a51b-e0ecf359d2f1
OS Type:        hvm
State:          running
CPU(s):        2
CPU time:      27.7s
Max memory:    2097152 KiB
Used memory:    2097152 KiB
Persistent:    yes
Autostart:      enable
Managed save:  no
Security model: apparmor
Security DOI:  0
Security label: libvirt-1d1f8fed-8b65-4411-a51b-e0ecf359d2f1 (enforcing)
root@daq00:~#
</pre>
* delete unused images in /var/lib/libvirt/images
* virsh edit kvm-ladd00 # change boot command line, etc
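
The "virsh dominfo" output above is a list of "Field: value" lines, so single fields are easy to extract in a script. A minimal sketch, assuming the output format shown above (the function name is hypothetical):

```shell
#!/bin/sh
# Print the value of one field from "virsh dominfo" output read on stdin.
# Usage: virsh dominfo kvm-ladd00 | dominfo_field State
dominfo_field() {
    # split each line at the first ": " plus padding, compare the field name exactly
    awk -v key="$1" -F': *' '$1 == key { print $2 }'
}

# Example (needs libvirt): warn if the guest is down or will not start at boot
#   info=$(virsh dominfo kvm-ladd00)
#   [ "$(echo "$info" | dominfo_field State)" = running ] || echo "kvm-ladd00 is not running"
#   [ "$(echo "$info" | dominfo_field Autostart)" = enable ] || echo "autostart is off"
```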
 
== virtualize CentOS-7 daqstore ==
 
* similar to ladd00 above:
* on daqstore, install dracut-network, already there
<pre>
yum install dracut-network
yum install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install
# yum install busybox ### no rpm package?!?
</pre>
<pre>
dracut -a nfs -v /boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img 3.10.0-1160.119.1.el7.x86_64 --force
</pre>
<pre>
scp daqstore:/boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img /kvm-el7/boot/
</pre>
* on daq00
<pre>
zfs create rpool/kvm-el7
cd /kvm-el7
rsync -avx daqstore:/ .
echo 192.168.1.3 kvm-el7 >> /etc/hosts
systemctl restart dnsmasq
echo "/kvm-el7 192.168.1.3(rw,no_root_squash,no_all_squash,async,no_subtree_check)" >> /etc/exports
exportfs -rv
</pre>
* manage virtual machine
<pre>
virsh console kvm-el7
virsh destroy kvm-el7
virsh undefine kvm-el7
</pre>
* create virtual machine
<pre>
virt-install --name kvm-el7 --os-variant centos7 --vcpus 2 --ram 2048 --import --network bridge=virbr0,model=e1000e --boot kernel=/kvm-el7/boot/vmlinuz-3.10.0-1160.119.1.el7.x86_64,initrd=/kvm-el7/boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img,kernel_args="root=/dev/nfs ip=192.168.1.3:192.168.1.1:192.168.1.1:255.255.255.0:kvm-el7::off nfsroot=192.168.1.1:/kvm-el7,vers=3,tcp rw console=ttyS0,115200n8 serial rdshell" --graphics none --nodisks --check path_in_use=off
</pre>
* adjust kvm-el7 image
<pre>
disable network manager
edit fstab
edit hostname
disable selinux in /etc/sysconfig/selinux
 
UP TO HERE --- DNS does not work!!!
 
edit yp.conf
edit resolv.conf
edit root/.ssh/authorized_keys
enable rngd or /dev/random does not work, sshd does not work
</pre>
* virsh shutdown test24
* virsh --connect qemu:///system start test24
* virsh console test24 ### to exit, ctrl+[ or ctrl+]
* virsh undefine test24
* virsh autostart kvm-ladd00
* virsh dominfo kvm-ladd00
* delete unused images in /var/lib/libvirt/images
* virsh edit kvm-ladd00 # change boot command line, etc
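
Both NFS-root virt-install commands above configure the guest network with the kernel "ip=" argument, whose colon-separated fields are client-ip:server-ip:gw-ip:netmask:hostname:device:autoconf. A minimal sketch that assembles the argument from named values, with the device left empty and autoconf set to "off" as in the commands above (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: build the kernel "ip=" argument for a static NFS-root guest.
# Usage: nfsroot_ip_arg CLIENT_IP SERVER_IP GATEWAY_IP NETMASK HOSTNAME
nfsroot_ip_arg() {
    client=$1; server=$2; gateway=$3; netmask=$4; host=$5
    # field 6 (device) is left empty, field 7 (autoconf) is "off" for a static address
    echo "ip=$client:$server:$gateway:$netmask:$host::off"
}

nfsroot_ip_arg 192.168.1.3 192.168.1.1 192.168.1.1 255.255.255.0 kvm-el7
```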
 
= ARM64 cross-compiler =
 
== Ubuntu-22 ==
 
* arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
* install packages:
<pre>
apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
</pre>
* run:
<pre>
aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static
aarch64-linux-gnu-g++-12 -o fecdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 fecdm.o dsdm.o /home/dsdaqdev/packages_common/midas/linux-aarch64-remoteonly/lib/libmidas.a -pthread -lrt -lutil /nfsroot/gdm00/usr/lib/aarch64-linux-gnu/libi2c.a -static
</pre>
 
== Ubuntu-24 ==
 
* arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
* install packages:
<pre>
apt install g++-aarch64-linux-gnu
</pre>
* build:
<pre>
aarch64-linux-gnu-g++ -c -o xvcserver_cdm.o -O2 -g -Wall -Wuninitialized -std=c++20 xvcserver_cdm.cxx
aarch64-linux-gnu-g++ -o xvcserver_cdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 xvcserver_cdm.o -pthread -lrt -lutil -static
</pre>
 
= ARM cross-compiler =
 
NOTE: updated for U-24
 
* armv7 (Cyclone-V SoC, RPi3, MityARM CAMAC) machines (Debian-12 armhf target, Ubuntu 24.04 host)
* install packages:
<pre>
apt install g++-arm-linux-gnueabihf gcc-arm-linux-gnueabihf libc6-dev-armhf-cross
</pre>
* build MIDAS frontend (static linking)
<pre>
arm-linux-gnueabihf-g++ -std=c++11 -Wall -Wuninitialized -g -O2 -I/home/dldaq/packages/midas/include -I/home/dldaq/packages/midas/mvodb -c koi2c.cxx
arm-linux-gnueabihf-g++ -o fedldb.exe -std=c++11 -Wall -Wuninitialized -g -O2 -I/home/dldaq/packages/midas/include -I/home/dldaq/packages/midas/mvodb fedldb.o koi2c.o /home/dldaq/packages/midas/linux-armv7-remoteonly/lib/libmidas.a -L/usr/arm-linux-gnueabihf/lib -L/nfsroot/dltdc/usr/lib/arm-linux-gnueabihf -static -lm -lz -lutil -lnsl -lpthread -lrt -li2c
/usr/lib/gcc-cross/arm-linux-gnueabihf/13/../../../../arm-linux-gnueabihf/bin/ld: /home/dldaq/packages/midas/linux-armv7-remoteonly/lib/libmidas.a(system.o): in function `ss_socket_connect_tcp(char const*, int, int*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
/home/dldaq/packages/midas/src/system.cxx:4984:(.text+0x252a): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
</pre>
 
= 32-bit intel cross-compiler =
 
== Ubuntu 22.04 ==
 
<pre>
apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386
</pre>
 
NOTES:
* "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
* to cross-build 32-bit MIDAS, use "make linux32".
* executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
* executables cross-built on Ubuntu-22 run on 32-bit Debian-12.
 
== Ubuntu 24.04 ==
 
<pre>
apt install gcc-i686-linux-gnu
apt install g++-i686-linux-gnu
apt install libstdc++-13-dev:i386
apt install lib32z1 lib32z1-dev
i686-linux-gnu-gcc -o ttcp.i386 ttcp.c
</pre>
 
NOTES:
* executables cross-built on Ubuntu-24 will NOT run on 32-bit Debian-12 (GLIBC mismatch; static executables may work)
* executables cross-built on Ubuntu-24 run on 32-bit Debian-13
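
On Ubuntu 24.04 the three cross-toolchains above differ only in the command prefix. A minimal sketch of a helper that maps a target name to that prefix for use in build scripts; the function name and the accepted target names are hypothetical, and the Ubuntu-22 commands above use versioned names (for example aarch64-linux-gnu-g++-12) or "g++ -m32" instead:

```shell
#!/bin/sh
# Hypothetical helper: map a target architecture to its cross-compiler prefix.
# Usage: cross_prefix TARGET
cross_prefix() {
    case $1 in
        aarch64|arm64) echo aarch64-linux-gnu- ;;
        armv7|armhf)   echo arm-linux-gnueabihf- ;;
        i386|i686)     echo i686-linux-gnu- ;;
        *) echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Example: pick the C++ compiler for an armhf build
CXX="$(cross_prefix armhf)g++"
echo "$CXX"
```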
 
= SSH settings for EPICS =
 
* TRIUMF EPICS runs obsolete version of SSH
* add this to the user's .ssh/config
<pre>
Host sbp1*
HostKeyAlgorithms +ssh-rsa
PubKeyAcceptedAlgorithms +ssh-rsa
KexAlgorithms +diffie-hellman-group1-sha1
ForwardX11 yes
ForwardX11Trusted yes
</pre>
 
= changes for VME processors =
 
<pre>
apt -y remove sysstat man-db
apt -y purge dkms
apt -y purge mdadm
apt -y purge fwupd
apt -y purge packagekit
apt -y purge accountsservice
apt -y purge plocate
apt -y purge upower power-profiles-daemon
apt -y autoremove
</pre>
 
for D-12 32-bit CPUs:
 
<pre>
apt remove "*libavahi*"
</pre>
 
= remove snap (U-24) =
 
Note: snap stores per-user data in $HOME/snap/$SNAPNAME. With shared home directories, removing a snap on one machine removes this data for all users, even those who still want to use the snap on some other machine.
 
Prepare:
 
NOTE: first remove chromium and firefox, see below.
 
NOTE: if possible, stop autofs before removing snap - otherwise it will mount all user home directories and complain that it cannot remove some snap data from them
 
<pre>
systemctl stop autofs
ls -ld /home1/*/snap/* ### remove the per-user snap directories
</pre>
 
Remove snaps:
 
<pre>
snap list
echo snap remove --purge chromium ### see below
echo snap remove --purge firefox ### see below
snap remove thunderbird
snap remove cups
snap remove hello-world
snap remove firmware-updater
snap remove gtk-common-themes
snap remove snapd-desktop-integration
snap remove snap-store
snap remove hunspell-dictionaries-1-7-2004
snap remove gnome-system-monitor
snap remove gnome-3-26-1604
snap remove gnome-3-28-1804
snap remove gnome-3-34-1804
snap remove gnome-3-38-2004
snap remove gnome-42-2204
snap remove gnome-46-2404
snap remove mesa-2404
snap remove core
snap remove core18
snap remove core20
snap remove core22
snap remove core24
snap remove bare
snap remove snapd
snap list
</pre>
<pre>
root@daqubuntu:~# snap list
No snaps are installed yet. Try 'snap install hello-world'.
root@daqubuntu:~#
</pre>
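
The list of installed snaps differs from machine to machine, so the removal commands can also be generated from "snap list". A minimal dry-run sketch that only prints the commands, in the same general order as above: applications first, then the numbered gnome, mesa and theme snaps, then core* and bare, with snapd last. The function name and the grouping rules are assumptions; review the printed plan before running it, the order may need adjusting:

```shell
#!/bin/sh
# Hypothetical helper: turn "snap list" output (stdin) into an ordered removal plan.
# Usage: snap list | snap_remove_plan
snap_remove_plan() {
    # skip the header line, rank each snap by the name in column 1
    awk 'NR > 1 && NF > 0 {
        name = $1
        if (name == "snapd")
            rank = 4
        else if (name == "bare" || name ~ /^core/)
            rank = 3
        else if (name ~ /^gnome-[0-9]/ || name ~ /^mesa-/ || name == "gtk-common-themes")
            rank = 2
        else
            rank = 1
        print rank, name
    }' | sort -k1,1n -k2,2 | while read -r rank name; do
        echo "snap remove $name"
    done
}

# Example: snap list | snap_remove_plan        (prints commands, removes nothing)
```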
 
Identify packages that install snaps:
 
<pre>
apt list | grep snap | grep installed | grep -v -e snappy -e snapshot
</pre>
 
Typical output:
 
<pre>
firefox/noble,now 1:1snap1-0ubuntu5 amd64 [installed]
gir1.2-snapd-2/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
libsnapd-glib-2-1/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
libsnapd-qt-2-1/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
plasma-discover-backend-snap/noble,now 5.27.11-0ubuntu2 amd64 [installed]
snapd/noble-updates,now 2.66.1+24.04 amd64 [installed]
</pre>
 
Remove packages that install snaps:
 
<pre>
apt remove chromium-browser
apt remove chromium-codecs-ffmpeg-extra
apt remove thunderbird
apt remove firefox
apt remove plasma-discover-backend-snap
apt remove plasma-discover-snap-backend
apt remove snapd
apt purge  snapd
# package gir1.2-snapd-2 is required by ubuntu-mate-desktop & co
# libsnapd-glib-2-1 is required by gstreamer, gnome-remote-desktop & co
ls -l /etc/systemd/system/ | grep snap ### remove unwanted stuff
</pre>
 
Remove Chromium:
 
* ls -ld /home/*/snap/chromium/*
* echo /bin/rm -rf `ls -1d /home/*/snap/chromium/*`
* snap remove chromium
 
Remove Firefox:
 
* ls -ld /home/*/snap/firefox/*
* echo /bin/rm -rf `ls -1d /home/*/snap/firefox/*` ### this will delete "snap firefox" profiles of all users!!!
* snap remove firefox ### this will also delete "snap firefox" profiles of all users!!!
 
Remove gir1.2-snapd-2:
* echo rm -vf /usr/lib/x86_64-linux-gnu/girepository-1.0/Snapd-2.typelib
 
If "snap remove" is stuck in "change in progress", reset the snapd state (this will remove all snaps and break snapd, which is ok here,
see https://forum.snapcraft.io/t/snap-remove-taking-forever-abort-wasnt-working/48915):
<pre>
rm /var/lib/snapd/state.json
systemctl restart snapd
</pre>
 
Prevent snap from reinstalling:
<pre>
cd ~/git/scripts/etc
git pull
cp etc-apt-preferencesd-disable-snap /etc/apt/preferences.d/
</pre>
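
The contents of etc-apt-preferencesd-disable-snap are not shown on this page. As an illustration only, a commonly used APT preferences stanza with the same purpose gives snapd a negative pin priority, which prevents apt from installing it. The stanza, the function name and the target file name below are assumptions, not the contents of the site file:

```shell
#!/bin/sh
# Sketch of an APT preferences stanza that keeps snapd from being installed again.
# ASSUMPTION: a commonly used recipe, not the contents of the site file named above.
disable_snap_pref() {
    cat <<'EOF'
Package: snapd
Pin: release a=*
Pin-Priority: -10
EOF
}

# Review the text, then install it as root, for example:
#   disable_snap_pref > /etc/apt/preferences.d/disable-snap
disable_snap_pref
```

Afterwards "apt-cache policy snapd" should report the negative priority.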
 
= install non-snap thunderbird =
 
from: https://ubuntuhandbook.org/index.php/2024/03/install-thunderbird-deb-ubuntu-2404/
 
* remove snap thunderbird
<pre>
snap remove --purge thunderbird
apt remove --purge thunderbird
</pre>
* add mozilla repository
<pre>
already done after installing firefox-esr
</pre>
* the PPA deb package and the Ubuntu snap package of thunderbird have the same name; raise the priority of the PPA and hide the Ubuntu package by creating /etc/apt/preferences.d/mozillateamppa
<pre>
Package: thunderbird*
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001
 
Package: thunderbird*
Pin: release o=Ubuntu
Pin-Priority: -1
</pre>
* check that it worked, it should say "build" instead of "snap"
<pre>
apt update
apt list | grep thunderbird | grep -v locale
...
thunderbird/noble 1:128.8.1+build1-0ubuntu0.24.04.1~mt1 amd64
...
</pre>
* install
<pre>
apt install thunderbird
</pre>
* run: thunderbird
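
The "build" versus "snap" check above can also be scripted. This is a minimal sketch that only looks at the version string; the function name tb_flavour is made up for this example.

```shell
# Classify a thunderbird version string as reported by apt:
# the Ubuntu snap wrapper package has "snap" in its version,
# the mozillateam PPA deb has "build".
tb_flavour() {
    case "$1" in
        *snap*)  echo snap ;;
        *build*) echo deb ;;
        *)       echo unknown ;;
    esac
}

# Example: classify the version apt would install
# tb_flavour "$(apt-cache policy thunderbird | awk '/Candidate:/ {print $2}')"
```

If this prints "snap", the pin file is not taking effect and "apt install thunderbird" would bring the snap back.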
 
= EFI boot using syslinux =
 
* rationale 1: GRUB is the stock boot loader with U-24. It is unnecessarily complicated. EFI BIOS can boot the linux kernel directly, without GRUB, but unfortunately a small shim bootloader is required to specify the initrd file and the root filesystem. the syslinux boot loader can do this in a very simple way.
* rationale 2: GRUB bootloader configuration is overcomplicated, and when it breaks, it is almost impossible to debug and to recover.
* rationale 3: GRUB bootloader scripts in U-24 have no support for booting from redundant SSDs.
* in the case of GRUB bootloader failure, it is simplest to boot the Ubuntu installer in recovery mode and convert the bootloader from GRUB to syslinux. Open firefox on this page and cut-and-paste the steps (as of this writing, only the copy of vmlinuz and initrd cannot be cut-and-pasted).
* in the case of servers with redundant SSDs for OS and home directories (ZFS mirror), it is simplest to use the syslinux bootloader to ensure that the machine boots from either SSD (the machine should boot under all combinations of SSD failures: failure of either EFI partition, failure of either ZFS mirror partition)
* check partition tables, SATA SSD
<pre>
fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A3F34DAC-DCB4-B74C-B59E-41E754807812
 
Device      Start      End  Sectors  Size Type
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
</pre>
* check partition tables, NVME SSD
<pre>
fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.75 TiB, 1920383410176 bytes, 3750748848 sectors
Disk model: SAMSUNG MZ1L21T9HCLS-00A07             
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: gpt
Disk identifier: 04ECCD46-DC2A-454C-B4A8-CCC18AA532F7
 
Device            Start        End    Sectors  Size Type
/dev/nvme0n1p1    2048    2203647    2201600    1G EFI System
/dev/nvme0n1p2  2203648    6397951    4194304    2G Linux filesystem
/dev/nvme0n1p3  6397952  23175167  16777216    8G Linux swap
/dev/nvme0n1p4 23175168 3750746111 3727570944  1.7T Linux filesystem
</pre>
* prepare boot device EFI partition, SATA SSDs
<pre>
mkfs.msdos /dev/sda1
mkfs.msdos /dev/sdb1
mkdir /boot/efi-sda
mkdir /boot/efi-sdb
mount /dev/sda1 /boot/efi-sda
mount /dev/sdb1 /boot/efi-sdb
</pre>
* prepare boot device EFI partition, NVME SSDs
<pre>
mkfs.msdos /dev/nvme0n1p1
mkfs.msdos /dev/nvme1n1p1
mkdir /boot/efi-0
mkdir /boot/efi-1
mount /dev/nvme0n1p1 /boot/efi-0
mount /dev/nvme1n1p1 /boot/efi-1
</pre>
* add them to fstab, note the "nofail" mount option
<pre>
blkid | grep vfat
/dev/sdb1: UUID="F30C-13B5" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="20e423b5-ac29-ec42-bab5-f366aefbbd2b"
/dev/sda1: UUID="F2DD-7321" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="9427646c-ce5f-fe47-9ed1-4b84cf4c348f"
grep ^UUID /etc/fstab
UUID=F2DD-7321  /boot/efi-sda      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
UUID=F30C-13B5  /boot/efi-sdb      vfat    umask=0022,fmask=0022,dmask=0022,nofail      0      1
</pre>
* prepare the EFI partitions (remove any old subdirectories, only an empty efi/boot should be there)
<pre>
cd /boot/efi-sda
mkdir -p efi/boot
</pre>
* get syslinux-6.03
<pre>
cd ~
wget https://daq00.triumf.ca/~olchansk/linux/syslinux-6.03.tar.xz
xz -d < syslinux-6.03.tar.xz | tar xvf -
</pre>
* from syslinux-6.03 copy files:
<pre>
cd /boot/efi-sda/efi/boot
cp ~/syslinux-6.03/efi64/efi/syslinux.efi .
cp ~/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
</pre>
* identify the ZFS rpool label
<pre>
zfs list | grep ROOT | grep "/$" | cut -f1 -d" "
rpool/ROOT/ubuntu_9yvb17
</pre>
* create syslinux.cfg, change the root=ZFS label to match this computer
<pre>
cat << EOF | sed "s;root=.*$;root=ZFS=`zfs list | grep ROOT | grep "/$" | cut -f1 -d" "`;" > syslinux.cfg
default linux
label linux
kernel vmlinuz
append ro initrd=initrd.img root=ZFS=rpool/ROOT/ubuntu_02ruwj
EOF
</pre>
* copy linux boot files:
<pre>
cp /boot/vmlinuz vmlinuz
cp /boot/initrd.img initrd.img
</pre>
* repeat with /boot/efi-sdb, etc
* install script to set syslinux to boot the latest kernel
<pre>
cd ~/git/scripts
git pull
ln -s ~/git/scripts/etc/update_efi_syslinux.perl ~
</pre>
* update syslinux to boot the latest kernel
** run "~/update_efi_syslinux.perl" to check that it finds the EFI partitions and finds the correct kernel
** run "~/update_efi_syslinux.perl -u" to do the actual update
* or maybe install Ubuntu syslinux 6.04 and use files from there:
<pre>
apt install "syslinux*"
</pre>
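
The syslinux.cfg step and the "repeat with /boot/efi-sdb" step above can be combined without the sed trick. This is a sketch under assumptions: the function names are invented here, and the mount points in the example are the SATA ones used on this page.

```shell
# Print a syslinux.cfg for the given ZFS root dataset.
render_syslinux_cfg() {
    cat << EOF
default linux
label linux
kernel vmlinuz
append ro initrd=initrd.img root=ZFS=$1
EOF
}

# Read "zfs list" output on stdin, print the ROOT dataset mounted at "/".
root_dataset() {
    awk '$NF == "/" && $1 ~ /ROOT/ {print $1; exit}'
}

# Example (as root), for every mounted EFI partition:
# ds=$(zfs list | root_dataset)
# for d in /boot/efi-sda /boot/efi-sdb; do
#     render_syslinux_cfg "$ds" > "$d/efi/boot/syslinux.cfg"
# done
```

Writing the same file to both EFI partitions keeps them identical, which is the point of the redundant setup.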
 
= legacy boot using syslinux =
 
* NOTE: extlinux is not compatible with ext4 "64bit" feature, it should be turned off:
<pre>
mkfs.ext4 -O ^64bit /dev/sdX1
resize2fs -s /dev/sdX1
</pre>
 
* install syslinux and extlinux (THIS DOES NOT WORK!!!)
 
<pre>
apt -y install syslinux extlinux
dd if=/usr/lib/syslinux/mbr/mbr.bin of=/dev/sdX ### NOT /dev/sdX1 NOT !!!
cd /boot
cp /usr/lib/syslinux/modules/bios/menu.c32 .
extlinux -i .
</pre>
* install syslinux and extlinux
* copy from old SL6 USB disk (this is extlinux 6.02)
<pre>
root@localhost:/boot# ls -l
-rwxr-xr-x 1 root root 218952 Jan 28 17:40 extlinux
-rw-r--r-- 1 root root    402 Jan 29 14:45 extlinux.conf
-rw-r--r-- 1 root root    496 Jan 29 14:39 extlinux.conf~
-r--r--r-- 1 root root 122044 Jan 29 14:39 ldlinux.c32
-r--r--r-- 1 root root  67072 Jan 29 14:39 ldlinux.sys
-rwxr-xr-x 1 root root  24156 Jan 28 17:40 libutil.c32
-rw-r--r-- 1 root root    304 Jan 28 17:40 mbr.bin
-rw-r--r-- 1 root root  26140 Jan 28 17:40 memdisk
-rw-r--r-- 1 root root  69043 Jan 28 17:40 memtest86+-4.20.iso.zip
-rw-r--r-- 1 root root 183012 Jan 28 17:40 memtest86+-5.01
-rw-r--r-- 1 root root  26568 Jan 28 17:40 menu.c32
</pre>
* install
<pre>
dd if=mbr.bin of=/dev/sdX ### NOT /dev/sdX1 NOT !!!
extlinux -i .
</pre>
* check that partition /dev/sdX1 is marked bootable (fdisk command "a")
* create /boot/extlinux.conf
<pre>
DEFAULT menu.c32
PROMPT 0
TIMEOUT 50

MENU TITLE TRIUMF DAQ USB BOOT32 ver K.O. 2025jan28

LABEL linux
  MENU DEFAULT
  kernel /vmlinuz
  append initrd=/initrd.img panic=60 rootdelay=5 rootwait rw root=/dev/sda1

LABEL linux-6.1.0-28-686
  kernel vmlinuz-6.1.0-28-686
  append initrd=initrd.img-6.1.0-28-686 panic=60 rootdelay=5 rootwait rw root=/dev/sda1

LABEL memtest
  kernel memtest86+-1.65
</pre>

= Add user to login splash screen =

Use this if a user cannot log in to the local machine by default. The user should already exist through the NIS service.

<pre>
sudo /bin/bash # as root
cd /var/lib/AccountsService/users
ls # should at least display the "wheel" user
cp wheel username # we're going to copy the user account settings over to our new user
</pre>

</pre>
* mount -a
== setup NAT ==
NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
In these examples:
* replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
* replace "enp11s0" with name of the private network interface (192.168.1.x network)
* emacs -nw /etc/rc.local ### add this:
<pre>
# /etc/rc.local
/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v
# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v

iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward

sh /etc/firewall-rfc1918.sh

# end
</pre>
* emacs -nw /etc/firewall-rfc1918.sh
<pre>
# firewall-rfc1918.sh

# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.

ETH=eno1

iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH

iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH

iptables -L -v

#end
</pre>

= KVM =

<pre>
apt install cpu-checker

root@daq13:~# kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
root@daq13:~#

(if not, shutdown, go into BIOS settings, enable CPU virtualization)

apt install virtinst ### will install many packages
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils

root@daq13:/home1/wheel# virsh list --all
Id  Name          State
------------------------------
1    ubuntu-guest  running

apt install virt-manager

virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'

virtual machine will start, boot, etc
to get out of it, CTRL + Shift followed by ]

ssh wheel@daq13
virt-manager

run virt-install again, omit "--graphics none", open graphics console from virt-manager, it booted into ubuntu installer desktop

virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none

virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off
</pre>

build image

<pre>
dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### ext4 fails to mount by SL6 kernel, "unknown ext4 options"
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp
</pre>

on the guest, configure network: /etc/rc.local
<pre>
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn

# end
</pre>

= ARM cross-compiler =

<pre>
apt install libgcc-9-dev-arm64-cross
apt install gcc-arm-linux-gnueabi
apt install gcc-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabi
</pre>

<pre>
arm-linux-gnueabi-gcc -o ttcp1 ttcp.c -march=armv7 -static
arm-linux-gnueabi-gcc -o memcpy.armv7 memcpy.cc -march=armv7 -static -O2
</pre>

= 32-bit intel cross-compiler =

Ubuntu 22.04:

<pre>
apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386
</pre>
NOTES:
* "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
* to cross-build 32-bit MIDAS, use "make linux32".
* executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
* executables cross-built on Ubuntu-22 run on 32-bit Debian-12.
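
To confirm that "g++ -m32" or "make linux32" really produced a 32-bit executable before copying it to a 32-bit Debian machine, the ELF header can be inspected. This is a sketch only: the function name elf_class is made up here, and it reads nothing more than the EI_CLASS byte at offset 4 of the file.

```shell
# Print 32 or 64 according to the EI_CLASS byte of an ELF file.
elf_class() (
    # the first four bytes of any ELF file are 0x7f 'E' 'L' 'F'
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; exit 1; }
    # byte 4 is EI_CLASS: 1 means 32-bit, 2 means 64-bit
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
)

# Example: elf_class ./ttcp1
```

The "file" command reports the same information with more detail, when it is installed.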

Latest revision as of 23:18, 17 April 2026

Prerequisites

  • before setting up new machine run memory test
  • prepare flash drive with free version of memtest86: https://www.memtest86.com
  • test boot from flash drive, test takes ~ few hours
  • test will end with summary page, if passed continue with Ubuntu
  • number that might be worth noting is memory latency

Ubuntu version

lsb_release -a
uname -a

Ubuntu installer

  • updated for Ubuntu LTS 20.04.01, 22.04.1, 24.04 (only minor differences)
  • download the latest Ubuntu LTS desktop installer iso image
  • dd the image to a USB key
  • power down, disconnect all disks (all HDDs, all SSDs, all M.2)
  • connect the SSD to be used as system disk
  • if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
  • power up
  • boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F2 or F8 for ASUS, F11 for Supermicro)
  • follow the installer prompts:
  • "try ubuntu or install ubuntu" - choose "install"
  • select language - accept default
  • "updates and other software" - accept default settings ("normal install")
  • "installation type" - select "advanced features" and "experimental: use ZFS"
  • accept partition choice
  • "where are you?" - select "Vancouver" (PST time zone)
  • "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
  • don't install third party sw
  • installation runs in a few minutes, when finished, reboot
  • login as user wheel
  • answer annoying questions:
  • "livepatch" - say "next"
  • "help improve" - select "do not send", say "next"
  • "privacy" - leave "location" as "off", say "next"
  • "ready to go", say "done"
  • right-click on the desktop, say "open in terminal", a shell will open
  • say "sudo /bin/bash", enter the root password, you now have the root shell
  • run nm-connection-editor to configure the network. use netmask 255.255.224.0, gateway 142.90.100.18, DNS 142.90.100.19, search path "triumf.ca"
  • after network is up (can ping ladd00), continue with post-installation steps below
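
Some tools (nmcli, netplan, "ip addr") want the netmask as a prefix length instead of dotted-quad form. A small sketch of the conversion, counting the set bits of each octet; the function name mask2cidr is only illustrative.

```shell
# Convert a dotted-quad netmask to a prefix length by counting set bits.
mask2cidr() (
    bits=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            bits=$((bits + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$bits"
)

# mask2cidr 255.255.224.0    prints 19
```

So the netmask above corresponds to a /19 network.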

Install instructions

prepare

apt update
apt upgrade

install ssh

apt install ssh

install git/scripts

apt -y install git
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull

configure hostname

vi /etc/hostname

disable swap

ubuntu installer creates a 2 GB swap partition, not useful on 32-64 GB machine, disable it:

vi /etc/fstab ### comment out the "swap" line
  • on 64 GB RAM machines swap is not useful
  • on machines booted from network (NFS-ROOT), swap does not work
  • on machines running from flash (RPi, etc), flash is too slow for useful swap
  • swap configured by linux installers invariably has wrong size and is not useful
systemctl disable dphys-swapfile
systemctl stop dphys-swapfile
dphys-swapfile uninstall
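
Instead of editing /etc/fstab by hand, the swap line can be commented out with sed. A sketch, assuming GNU sed (the Ubuntu default); the function name is invented for this example and it keeps a backup copy next to the file.

```shell
# Comment out every active swap entry in the given fstab file.
# A backup is kept as <file>.bak.
disable_swap_in_fstab() {
    cp -p "$1" "$1.bak" &&
    sed -i -E '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Example (as root):
# disable_swap_in_fstab /etc/fstab
# swapoff -a    # also turn off swap on the running system
```

Lines that are already commented out, and lines that do not mention swap, are left alone.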

maybe reboot

this is a good point to reboot the machine to boot the latest kernel and to set the correct hostname

install etckeeper

keep contents of /etc in a git repository:

apt -y install etckeeper

set timezone

timedatectl list-timezones | grep -i vancouver
timedatectl set-timezone America/Vancouver

install time synchronization

check if chrony is installed correctly and is synched to TRIUMF time servers:

chronyc sources
chronyc tracking

if not, remove old chrony and

apt -y remove chrony
apt -y purge chrony

and install it from scratch:

apt -y install chrony
cd ~/git/scripts
git pull
cd ~
cp ~/git/scripts/etc/triumf.sources /etc/chrony/sources.d/
systemctl disable systemd-timesyncd.service
systemctl stop systemd-timesyncd.service
systemctl disable ntp
systemctl stop ntp
systemctl enable chrony
systemctl restart chrony
chronyc sources
chronyc tracking
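
In the "chronyc sources" output, the source chrony is currently synchronized to is marked with "^*" at the start of its line. A minimal sketch of a scripted check based on that marker; the function name is an assumption.

```shell
# Succeed if "chronyc sources" output on stdin contains a selected
# source, i.e. a line starting with "^*".
chrony_is_synced() {
    awk 'substr($0, 1, 2) == "^*" { found = 1 } END { exit !found }'
}

# Example:
# chronyc sources | chrony_is_synced && echo "chrony is synchronized"
```

Right after a restart it can take a little while before any source is selected, so a failed check should be repeated before reinstalling chrony.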

reenable systemd-timesyncd

ONLY IF CHRONY DOES NOT WORK

To configure systemd-timesyncd, set "NTP=" in /etc/systemd/timesyncd.conf

apt remove chrony
cat /etc/systemd/timesyncd.conf
systemctl enable systemd-timesyncd.service
systemctl restart systemd-timesyncd.service
systemctl status systemd-timesyncd.service
timedatectl status
timedatectl timesync-status

enable outgoing email (debian 11)

this is different from ubuntu 20. it uses /etc/mailname and it hardwires the hostname into main.cf.

enable outgoing email

we have an unusual email configuration. outgoing email should work to deliver error messages, notices, etc. incoming email is disabled, we do not receive email for local users.

this causes problems with the TRIUMF smtp server. if our message cannot be delivered (wrong email address or recipient computer is turned off), the TRIUMF smtp server generates a delivery failure notification email and tries to send it to the "from" address of the failed message. but the "from" address does not receive any email, so this delivery fails too, another delivery failure notification is generated, and so on in a loop.

as a solution, kray created a special rule: email from scrap.triumf.ca does not generate delivery failure notices. failed messages sit in the queue for 5 days, then they are deleted. (K.O. - confirmed with kray 3jan2024).

to make this work we use the msmtp MTA package.

cd ~
apt -y install mailutils msmtp msmtp-mta # say "no" to apparmor support
apt -y remove postfix
apt -y purge postfix # remove old config files
apt -y install bsd-mailx
cd ~/git/scripts/etc
git pull
/bin/cp -fv aliases /etc/aliases
/bin/cp -fv msmtprc /etc/msmtprc
/bin/rm -vf ~root/.forward
/bin/rm -vf /etc/mailname
Mail root
Subject: test
test
^D
CC: <CR>

enable outgoing email (postfix)

THIS IS OBSOLETE!!!

  • TRIUMF: use smtp.triumf.ca
  • CERN: use cernmx.cern.ch
apt install postfix ### select "satellite system", enter full hostname "xxx.triumf.ca", enter "smtp.triumf.ca"
apt install mailutils
dpkg-reconfigure postfix ### (if postfix already installed)
echo olchansk@triumf.ca lindner@triumf.ca bsmith@triumf.ca dfujimoto@triumf.ca >> ~root/.forward
mailx root
test
^D

enable ping for all users (debian 11)

Without this tweak, Debian will report "operation not permitted" if a user tries to ping somewhere.

echo 'net.ipv4.ping_group_range = 0 1000' > /etc/sysctl.d/99-ping.conf

disable apparmor

On NFS-Root network booted machines!

If "man man" returns "permission denied" and syslog reports apparmor "sendmsg DENIED" errors, disable apparmor. This is supposedly fixed in kernel 6.0 and later (to be confirmed), see https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1784499

Disable apparmor, see https://ubuntu.com/server/docs/security-apparmor

This takes effect after a reboot.

systemctl stop apparmor.service
systemctl disable apparmor.service

If on boot apparmor appears to still be confining apps (for example, mysql), edit the file /etc/default/grub and change

GRUB_CMDLINE_LINUX=""

to

GRUB_CMDLINE_LINUX="apparmor=0"

Run sudo update-grub and reboot.
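
The edit of /etc/default/grub can be done with sed. This sketch assumes GNU sed and only touches the line when GRUB_CMDLINE_LINUX is still empty, so existing kernel options are not overwritten; the function name is made up here.

```shell
# Set GRUB_CMDLINE_LINUX="apparmor=0" in the given grub defaults file,
# but only if the variable is currently empty.
set_apparmor_off() {
    sed -i 's/^GRUB_CMDLINE_LINUX=""$/GRUB_CMDLINE_LINUX="apparmor=0"/' "$1"
}

# Example (as root):
# set_apparmor_off /etc/default/grub && update-grub
```

If the variable already has other options, add apparmor=0 to them by hand.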

install missing packages

(apt eats terminal input, even the "yes |" trick does not quite work, repeat the following commands until they report that everything is installed)

yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools traceroute time minicom screen git lsof debsums tmux iptables telnet pax rpm mtools at gdisk tcpdump
yes | apt -y install sysstat smartmontools lm-sensors
yes | apt -y install lsb-release
apt -y install vim # in addition to default vim-tiny, requested by IRIS
apt -y install gedit # requested by TACTIC
apt -y install tcl
apt -y install mc # requested by sol
apt -y install pax rpm alien ### package converter tools
apt -y install flex bison
apt -y install neofetch
apt -y install snmp snmp-mibs-downloader
apt -y install git subversion g++ gfortran cmake doxygen
apt -y install curl libcurl4 libcurl4-openssl-dev
### conflicts with mysql packages ### apt -y install mariadb-client libmariadb-dev ### mysql client for MIDAS
apt -y install mysql-client libmysqlclient-dev
apt -y install postgresql-common libpq-dev ### postgresql client for MIDAS
yes | apt -y install libz-dev libzstd-dev sqlite3 libsqlite3-dev unixodbc-dev
yes | apt -y install libssl-dev
yes | apt -y install emacs xemacs21 joe
yes | apt -y install gnuplot dos2unix
yes | apt -y install mutt bsd-mailx # email clients
yes | apt -y install liblz4-tool pbzip2 libbz2-dev
yes | apt -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt -y install libreadline-dev
yes | apt -y install ubuntu-mate-themes
yes | apt -y install libmotif-dev libxmu-dev
yes | apt -y install libusb-dev libusb-1.0-0-dev
yes | apt -y install i2c-tools libi2c-dev libi2c0
yes | apt -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt -y install libjson-perl
yes | apt -y install libgsl-dev # additional GNU Scientific Library
yes | apt -y install qt5-default # Qt development
yes | apt -y install python3-full python3-dev python3-dbg python3-pip ### for pyROOT
yes | apt -y install imagemagick imagemagick-common ckeditor # for elog
yes | apt -y install libjpeg-dev libjpeg-progs libjpeg-tools
yes | apt -y install linux-tools-common linux-tools-generic # cpupower frequency-info
yes | apt -y install rdesktop remmina remmina-plugin"*" # requested by POL
yes | apt -y install nlohmann-json3-dev # required to build MIDAS with ROOT 6.30 on Ubuntu-22
apt -y install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev libxft-dev libxext-dev python3 libssl-dev libafterimage0 # from https://root.cern/install/dependencies/
apt -y install gfortran libpcre3-dev xlibmesa-glu-dev libglew-dev libftgl-dev libmysqlclient-dev libfftw3-dev libcfitsio-dev graphviz-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl0-dev qtwebengine5-dev nlohmann-json3-dev libtbb-dev libavahi-compat-libdnssd-dev # from https://root.cern/install/dependencies/
apt -y install libvdt-dev # for ROOT 6.32 on Ubuntu-24
apt -y install autoconf automake gperf # gnu package build tools
apt -y install u-boot-tools # for Xilinx petalinux
#apt -y install linux-headers-generic # to build linux kernel drivers
apt -y install htop
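
The "repeat until everything is installed" advice can be put into a small retry loop. This is a sketch: the function name retry is invented, and redirecting stdin from /dev/null is an assumption about how to stop apt from swallowing the pasted commands.

```shell
# Run a command until it succeeds, at most $1 times.
# stdin is taken from /dev/null so the command cannot eat terminal input.
retry() (
    tries=$1
    shift
    attempt=1
    while ! "$@" < /dev/null; do
        [ "$attempt" -ge "$tries" ] && exit 1
        attempt=$((attempt + 1))
        sleep 1
    done
)

# Example (as root):
# export DEBIAN_FRONTEND=noninteractive
# retry 5 apt-get -y install ssh tcsh ethtool ncat rsync strace
```

DEBIAN_FRONTEND=noninteractive makes package configuration scripts take their default answers instead of prompting.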

Ubuntu LTS 20.04:

yes | apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 # enable linux 5.11 series kernel

Ubuntu LTS 22.04:

apt -y install linux-generic-hwe-22.04 # enable linux 6.8.0 series kernel

Ubuntu LTS 24.04:

apt -y install linux-generic-hwe-24.04 # enable linux 6.14.0 series kernel

remove snap

remove snap at this point.

Do

snap list

to see a list of installed snaps. Use

snap remove <item> 

to remove each. You will need to remove snapd last.

systemctl stop snapd
systemctl disable snapd
apt purge snapd
rm -rf ~/snap
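
The "remove each, snapd last" loop can be scripted from the "snap list" output. A sketch with an invented function name; it only parses the first column and moves snapd to the end.

```shell
# Read "snap list" output on stdin and print the snap names,
# one per line, with snapd moved to the end.
snap_removal_order() {
    awk 'NR > 1 && $1 != "snapd" { print $1 }
         NR > 1 && $1 == "snapd" { last = $1 }
         END { if (last) print last }'
}

# Example (as root):
# snap list | snap_removal_order | while read -r s; do
#     snap remove --purge "$s" < /dev/null
# done
```

Base snaps may refuse to be removed while other snaps still depend on them, so the loop may have to be run more than once.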

install non-snap firefox

delay this until the very end, download of firefox-esr deb package is very slow.

See https://askubuntu.com/questions/1399383/how-to-install-firefox-as-a-traditional-deb-package-without-snap-in-ubuntu-22

add-apt-repository ppa:mozillateam/ppa
apt install firefox-esr
echo run firefox-esr

Enable automatic updates:

configure DNS

cd ~/git/scripts
git pull
mkdir /etc/systemd/resolved.conf.d
cp etc/resolved-triumf.conf /etc/systemd/resolved.conf.d/
systemctl restart systemd-resolved
resolvectl
#systemd-analyze cat-config systemd/resolved.conf

install ganglia

apt -y install ganglia-monitor
cd ~root/git/scripts/ganglia
git pull
make install
./ganglia-all.perl

fix gmond start before network is ready:

mkdir /etc/systemd/system/ganglia-monitor.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ganglia-monitor.service.d/local.conf
systemctl daemon-reload
systemctl cat ganglia-monitor.service

install ganglia server

On the main computer only! (daq00, dsdaqgw, etc)

  • zfs create rpool/ganglia
  • apt install gmetad php php-xml rrdtool
  • mv /etc/ganglia/gmetad.conf /etc/ganglia/gmetad.conf-stock
  • create /etc/ganglia/gmetad.conf with the following contents:
data_source "my cluster" 15 localhost
RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
setuid_username "ganglia"
rrd_rootdir "/ganglia/rrds"
case_sensitive_hostnames 0
  • mkdir /ganglia/rrds
  • chown ganglia:ganglia /ganglia/rrds
  • systemctl restart gmetad
  • create /etc/ganglia/gmond-collect.conf
globals {
  daemonize = yes
  setuid = yes
  user = nobody
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  #deaf = yes
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /* 86400 */ /*secs. Expires (removes from web interface) hosts in 1 day */
  host_tmax = 20 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 600 /*secs */
}

/*
 * The cluster attributes specified will be used as part of the <CLUSTER>
 * tag that will wrap all hosts collected by this instance.
 */
cluster {
  name = "DAQ"
  owner = "TRIUMF DAQ"
  latlong = "unspecified"
  url = "https://daq.triumf.ca"
}

/* The host section describes attributes of the host, like the location */
host {
  location = "unspecified"
}

udp_send_channel {
  host = daq00.triumf.ca
  bind_hostname = yes
  port = 8649
}

udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  #buffer = 10485760
  buffer = 200000
  family = ipv4

  acl {
    default = "deny"
    access {
      # vlan1 daq00
      ip = 142.90.111.168
      mask = 19
      action = "allow"
    }
    access {
      # MUSR VLAN
      ip = 142.90.154.73
      mask = 8
      action = "allow"
    }
    access {
      # KVM network
      ip = 192.168.1.1
      mask = 8
      action = "allow"
    }
  }
}

udp_recv_channel {
  #mcast_join = 239.2.11.71
  port = 8649
  #bind = 239.2.11.71
  retry_bind = true
  # Size of the UDP buffer. If you are handling lots of metrics you really
  # should bump it up to e.g. 10MB or even higher.
  #buffer = 10485760
  buffer = 200000
  family = ipv6

  acl {
    default = "deny"
    access {
      # ALPHA network
      ip = 2001:1458:202:fd::100:aa
      mask = 8
      action = "allow"
    }
    #access {
    #  # ALPHA-g network
    #  ip = 192.168.1.1
    #  mask = 8
    #  action = "allow"
    #}
  }
}

tcp_accept_channel {
  port = 8649
  bind = localhost
  # If you want to gzip XML output
  gzip_output = no
}
  • add to /etc/rc.local
/usr/sbin/gmond -c /etc/ganglia/gmond-collect.conf &
systemctl restart gmetad &
mkdir /var/lib/ganglia-web
chown www-data:www-data /var/lib/ganglia-web
mkdir /var/lib/ganglia-web/dwoo
chown www-data:www-data /var/lib/ganglia-web/dwoo
mkdir /var/lib/ganglia-web/dwoo/compiled
chown www-data:www-data /var/lib/ganglia-web/dwoo/compiled
mkdir /var/lib/ganglia-web/dwoo/cache
chown www-data:www-data /var/lib/ganglia-web/dwoo/cache

install gonodeinfo

apt -y install golang
mkdir ~/git
cd ~/git
#git clone https://bitbucket.org/dd1/gonodeinfo.git
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git remote set-url origin https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
  • edit /etc/gonodeinfo.conf
  • change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
  • change "Servers" to read: Servers: daq00.triumf.ca:8601
  • run "gonodeinfo -v"
  • if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
  • on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
  • try gonodeinfo again, there should be no error
  • on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now

install emailonreboot

send an email if computer is rebooted

ssh root
mkdir -p ~/git
cd ~/git
git clone https://daq00.triumf.ca/~olchansk/git/rpms.git
cd rpms/emailonreboot
git pull
make
make install

install monitor_nfs

monitor NFS mounts and complain about dead, stale and hung mounts

ssh root
mkdir -p ~/git
cd ~/git
git clone https://daq00.triumf.ca/~olchansk/git/rpms.git
cd rpms/monitor_nfs
git pull
make
make install

install fonts for EPICS

apt -y install xfonts-100dpi xfonts-75dpi
killall Xorg # restarts Xorg, this will log you out from the console
xlsfonts | grep -i helvetica ### should show fonts with different sizes, not just size 0 (scalable)

install libz.so.1 for CentOS compatibility

KO - confirm which versions of Quartus need this.

yes | apt-get -y install zlib1g
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386

install ld-lsb-x86-64.so.3 for Quartus compatibility

Not clear which package this is supposed to come from. Copied from U-20.

Without this, Quartus lmgrd does not run.

cd /lib64
ln -s  ld-linux-x86-64.so.2 ld-lsb-x86-64.so.2
ln -s  ld-linux-x86-64.so.2 ld-lsb-x86-64.so.3

should look like this:

root@daq13:/lib64# ls -l
total 3
lrwxrwxrwx 1 root root 44 Jan 28 09:07 ld-linux-x86-64.so.2 -> ../lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
lrwxrwxrwx 1 root root 20 Feb 18 16:45 ld-lsb-x86-64.so.2 -> ld-linux-x86-64.so.2
lrwxrwxrwx 1 root root 20 Feb 18 16:45 ld-lsb-x86-64.so.3 -> ld-linux-x86-64.so.2
root@daq13:/lib64# 
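
The two "ln -s" commands fail if the links already exist. A sketch of a variant that can be rerun safely; the function name and the optional directory argument are assumptions added for illustration.

```shell
# Create (or refresh) the ld-lsb symlinks in the given directory,
# /lib64 by default. Safe to run more than once.
make_lsb_links() (
    cd "${1:-/lib64}" || exit 1
    for v in 2 3; do
        ln -sfn ld-linux-x86-64.so.2 "ld-lsb-x86-64.so.$v"
    done
)

# Example (as root): make_lsb_links /lib64
```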

install libpng12.so.0 for Quartus 13.0sp1 and 13.1.4.182

wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0.50.0
/bin/cp -pv libpng12.so.0 libpng12.so.0.50.0 /lib/x86_64-linux-gnu/

install packages for Xilinx

ubuntu LTS 22.04 vivado 2020.1

apt install autoconf libtool
apt install libtinfo5
apt install texinfo
apt install zlib1g:i386

install packages for building ROOT

apt -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake

install wine

As far as I know, only needed for BNMR/BNQR

apt install wine winetricks

install lightdm

unlike the default gdm login manager, lightdm shows the machine hostname and does not require an extra mouse click to switch from screen saver to login mode.

apt -y install lightdm
# select lightdm

install desktop environments

note: default display manager and default desktop are deficient, please do not skip this step.

note: if apt asks to choose the display manager, select "lightdm"

note: KO - I recommend the "MATE" desktop.

note: you will have to cut-and-paste this several times because "apt" eats commands, even with "-y" and even piped from "yes".

note: DF - on U24 this may re-install snap

# install MATE desktop
apt -y install ubuntu-mate-core ubuntu-mate-desktop ubuntu-mate-themes
# install Cinnamon desktop
apt -y install cinnamon
# install KDE desktop
apt -y install kde-standard kubuntu-settings-desktop
# install Lxqt desktop
# apt -y install lxqt # conflict over kubuntu-desktop, kubuntu-settings-desktop and desktop-base
# install Xfce4 desktop
apt -y install xfce4

install ROOT

Please install ROOT per instructions at https://root.cern.ch.

NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA. ### DO NOT DO THIS! apt-get install root-system

NOTE2: as of 2017-Jan-09, ROOT binary kits for Ubuntu do not work (use GCC 5 instead of GCC6), build from source instead.

Install x2go

apt-get update
apt-get install x2goserver x2goserver-xsession

enable root login from ladd00/daq00

ssh localhost
CTRL-C
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/

disable ssh access from outside of TRIUMF

to stop ssh login spam, disable ssh access from outside of TRIUMF. this can be done by requesting a firewall block through the helpdesk or by local firewall rule:

echo iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT >> /etc/rc.local
/etc/rc.local
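
Note that "echo ... >> /etc/rc.local" adds another copy of the rule each time it is run. A possible way to avoid duplicates is sketched below; append_once is a helper invented for this example, not something from the scripts repository.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line is already present.
append_once() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet;
    # "--" protects lines that start with "-"
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

Usage: append_once /etc/rc.local 'iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT'. The same helper would work for the other places on this page where a line is appended to /etc/rc.local.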

install smart-status

ln -s ~/git/scripts/smart-status/smart-status.perl ~root/

enable boot menu and boot messages

This will enable the grub menu (with a 10 sec timeout) and replace black screen with exciting linux boot messages.

  • emacs -nw /etc/default/grub
GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=menu
GRUB_TIMEOUT=10
GRUB_RECORDFAIL_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
  • update grub config:
grub-mkconfig -o /boot/grub/grub.cfg
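
To confirm the edit before rebooting, the active (uncommented) menu settings can be printed. A minimal sketch, with a made-up function name:

```shell
# show_grub_menu_settings [FILE]: print the uncommented menu and timeout settings
# from a grub defaults file (default /etc/default/grub).
show_grub_menu_settings() {
    grep -E '^GRUB_(TIMEOUT_STYLE|TIMEOUT|RECORDFAIL_TIMEOUT)=' "${1:-/etc/default/grub}"
}
```

With the settings listed above, this should print the three lines GRUB_TIMEOUT_STYLE=menu, GRUB_TIMEOUT=10 and GRUB_RECORDFAIL_TIMEOUT=10.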

Disable welcome message on ssh

On ssh, there is a lengthy welcome message. To disable:

vim /etc/ssh/sshd_config

Ensure that the file reads

PrintMotd no

Then,

vim /etc/pam.d/sshd

comment out the lines:

session    optional     pam_motd.so  motd=/run/motd.dynamic
session    optional     pam_motd.so noupdate
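
Both edits can be verified with a short check. This is a sketch: the function name motd_is_disabled is made up, the sshd option is spelled PrintMotd, and the check only looks at the two files named in this section (it does not follow Include directives).

```shell
# motd_is_disabled [SSHD_CONFIG] [PAM_SSHD]: succeed if PrintMotd is set to "no"
# and no uncommented pam_motd.so session line remains.
motd_is_disabled() {
    sshd_config=${1:-/etc/ssh/sshd_config}
    pam_sshd=${2:-/etc/pam.d/sshd}
    # sshd keywords are case-insensitive, hence -i
    grep -Eiq '^[[:space:]]*PrintMotd[[:space:]]+no[[:space:]]*$' "$sshd_config" || return 1
    # an active pam_motd.so line means the message is still printed
    if grep -Eq '^[[:space:]]*session[[:space:]].*pam_motd\.so' "$pam_sshd"; then
        return 1
    fi
    return 0
}
```

Usage: motd_is_disabled && echo "motd disabled". Restart sshd afterwards for the sshd_config change to take effect.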

Enable automatic updates

apt install unattended-upgrades
cd ~/git/scripts
git pull
/bin/cp -v etc/99apt-conf-ko /etc/apt/apt.conf.d/
apt-config dump | grep Unattended

Following is obsolete:

  • emacs -nw /etc/apt/apt.conf.d/50unattended-upgrades
    • uncomment in Allowed-Origins "-security" and "-updates"
    • add in Allowed-Origins: "Google LLC:stable";
    • uncomment/add: Unattended-Upgrade::Mail "root";
  • emacs -nw /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
  • test: unattended-upgrade --dry-run -v

NOTE: update-on-shutdown is disabled.

NOTE: there is no update-on-boot, but:

NOTE: if machine was off for a long time, the systemd update timer would have expired and it will fire soon after reboot, causing an automatic update run. this is unwanted, and there is no fix or workaround for it. K.O. June-2023.

Fix bpool is full (obsolete)

THIS IS CAUSED BY OBSOLETE PACKAGE zsys. PLEASE: apt remove zsys

IPMI instructions

IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.

apt-get install ipmitool
systemctl enable ipmievd
systemctl restart ipmievd

Run:

  • ipmitool sel list ### event list
  • ipmitool sel elist ### extended event list (shows sensor names)
  • ipmitool sel clear ### clear event list (if it becomes full)
  • ipmitool sensor ### report hardware sensors
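
The sensor report is long, so it can help to filter it down to sensors that are not healthy. The sketch below assumes the usual "ipmitool sensor" layout of pipe-separated columns (name | value | units | status | thresholds...); check this against the output of your board before relying on it. The function name list_sensor_alarms is made up.

```shell
# list_sensor_alarms: read "ipmitool sensor" output on stdin and print the sensors
# whose status column is neither "ok" nor "na". Discrete sensors are skipped because
# their status column holds a hex state, not a threshold status.
list_sensor_alarms() {
    awk -F'|' '
        NF >= 4 {
            name = $1; units = $3; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", units)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (units == "discrete") next
            if (status != "ok" && status != "na") print name ": " status
        }'
}
```

Usage: ipmitool sensor | list_sensor_alarms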

move /home/wheel (U-24)

Ubuntu LTS 24 installed on ZFS has rpool/USERDATA/home_xxx mounted on /home, has to be moved or autofs /home will not work.

zfs list | grep USERDATA | grep home | cut -f1 -d" "
zfs set -u mountpoint=/home1 `zfs list | grep USERDATA | grep home | cut -f1 -d" "`
emacs -nw /etc/passwd # change home directory of the wheel account to /home1/wheel

This will not take effect until rebooting. Please ensure that ssh from root@daq00 to this computer works before rebooting; if ssh from root@daq00 doesn't work and you mess this up you will be locked out of wheel account. Once you verify that this works, reboot and make sure that when you login as wheel that home directory is /home1/wheel.
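
The "grep | cut" pipeline above depends on the column layout of "zfs list". A slightly stricter selection is sketched here, assuming the dataset is named rpool/USERDATA/home_xxx as described above; pick_home_dataset is a made-up name.

```shell
# pick_home_dataset: read dataset names (one per line) on stdin and print only those
# that sit directly under a USERDATA dataset and are named home_*.
pick_home_dataset() {
    awk -F/ '$(NF-1) == "USERDATA" && $NF ~ /^home_/'
}
```

Usage: zfs list -H -o name | pick_home_dataset. Here -H suppresses the header line and -o name prints only the dataset name, so no "cut" is needed. Check that exactly one dataset is printed before passing it to "zfs set".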

move /home/wheel

note: this MUST be done if ZFS root and NIS/autofs with /home.

Default location of wheel's home directory will collide with autofs /home, it has to be moved, for example to /wheel.

# logout from the wheel user
# go to another computer
ssh root@daqubuntuxxx
zfs list | grep wheel ### identify zfs name wheel_xxxxxx
#zfs set mountpoint=/wheel rpool/USERDATA/wheel_hm8fzh
zfs set mountpoint=/wheel `zfs list | grep wheel | cut -f1 -d" "`
zfs list | grep wheel
emacs -nw /etc/passwd ### change wheel's home directory from /home/wheel to /wheel
su - wheel ### check that user wheel still works

This will break wheel's ability to run snap programs such as firefox; install chrome as listed below instead.

enable NIS (ubuntu 22.04, 24.04, debian 11, 12)

apt -y install rpcbind nis
echo DAQ-NIS >> /etc/defaultdomain
echo ypserver daq00.triumf.ca >> /etc/yp.conf
systemctl enable ypbind.service
systemctl restart ypbind.service
systemctl status ypbind.service
ypwhich -m

enable ypserv:

sed -i s/NISSERVER=false/NISSERVER=slave/ /etc/default/nis
/usr/lib/yp/ypinit -s daq00
echo ypserver localhost >> /etc/yp.conf
sed -i "s/ypserver .*/ypserver localhost/" /etc/yp.conf
systemctl enable ypserv
systemctl restart ypserv
systemctl restart ypbind

update /etc/nsswitch.conf and enable hourly update of NIS maps:

mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
cp -pv nsswitch.conf-U24 /etc/nsswitch.conf
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly

If this is a new machine, then on the master NIS node (daq00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)

enable NIS (ubuntu 20.04)

  • apt-get -y install portmap nis ### will ask for NIS domain (DAQ-NIS)
  • dpkg-reconfigure nis ### reconfigure if already installed
  • ypwhich -m
  • edit /etc/default/nis
    • set "NISSERVER=slave"
    • Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank, remove "-no-dbus" if it is there
  • #edit /etc/yp.conf, comment-out everything, add "domain DAQ-NIS server localhost"
  • edit /etc/yp.conf, comment-out everything, add "ypserver localhost"
  • /usr/lib/yp/ypinit -s daq00
  • systemctl enable nis
  • systemctl restart nis
  • ypwhich
  • ypwhich -m
  • ypcat -k passwd
  • vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis
  • enable hourly update of NIS maps
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
  • ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"
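
After editing /etc/nsswitch.conf, a quick check that no database was forgotten can look like this. It is a sketch: the list of databases is taken from the block above, and the function name nsswitch_missing_nis is made up.

```shell
# nsswitch_missing_nis [FILE]: print each database (from the list used on this page)
# that does not have "nis" among its sources. No output means all five are set.
nsswitch_missing_nis() {
    file=${1:-/etc/nsswitch.conf}
    for db in passwd group shadow automount netgroup; do
        if ! grep -Eq "^${db}:(.*[[:space:]])?nis([[:space:]]|\$)" "$file"; then
            echo "$db"
        fi
    done
}
```

Usage: nsswitch_missing_nis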

enable autofs

apt -y install autofs
systemctl enable autofs
systemctl restart autofs
ls -l /home/olchansk ### test autofs, check file owner is correct

enable NFS server

apt install nfs-kernel-server
#edit /etc/exports
systemctl enable nfs-server
systemctl restart nfs-server

NIS master

notes for setting up the NIS master

wheel user

"wheel" is the default administrative user. We do not want its password exported to NIS (encrypted password hash is world visible) and we do not want its home directory exported to NFS (~wheel/.ssh is world visible and potentially writable: anybody can change ~wheel/.ssh/authorized_keys).

  • move wheel's home directory from /home/wheel to /wheel (see special section about this)
  • change wheel's UID and GID from 1000 to a value below MINUID in /var/yp/Makefile

coherent uids

we do not want system accounts defined in /etc/passwd of the NIS master to be included in the NIS map "passwd". this causes trouble on NIS clients where newly installed packages fail to create local system users because same user already exists in NIS.

This is controlled by MINUID in /var/yp/Makefile.

Historical TRIUMF uids start from around 200, but several clusters do not have any historic TRIUMF uids below 500 and MINUID is set to:

  • DAQ-NIS: MINUID=200
  • ISAC-NIS: MINUID=500
  • TITAN-NIS: MINUID=500
  • MUSR-NIS: MINUID=500
  • TIG-NIS: MINUID=500 (100 on SL6 mother8pi)

Ubuntu 20 has two programs to create users:

  • adduser - creates new users with UID 1000 and up as specified in /etc/adduser.conf. No problems here.
  • adduser --system - creates new system users with UID 100 and up as specified in /etc/adduser.conf. No problems here.
  • useradd - creates new users with UID 1000 and up as specified in /etc/login.defs. No problems here.
  • useradd --system - creates new system users with UID 999 and down (read "man useradd", section at the end about SYS_UID_MAX). This collides with NIS MINUID, these system users will be included in the NIS map and cause trouble.

This problem cannot be fixed: SYS_UID_MIN, SYS_UID_MAX and UID_MIN in /etc/login.defs do not seem to have any effect on UIDs chosen by "useradd --system" (tested on Ubuntu LTS 20.04).

So far only these system accounts seem to be affected by this:

  • systemd-coredump
  • ganglia

To fix:

  • run "sort -r -n -t: -k3 /etc/passwd" to identify the last unused system user uid (range 100..200)
  • run "sort -r -n -t: -k3 /etc/group" to identify the last unused system user gid (range 100..200)
  • systemd-coredump: manually change UID and GID (package systemd-coredump is usually not installed)
  • ganglia: same thing, then change ownership on all ganglia files.
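
To find the affected accounts on the NIS master, one can list local accounts whose UID falls between MINUID and the first regular user UID. This is a sketch: the upper bound of 1000 is an assumption based on the adduser/useradd defaults mentioned above, the function name uids_in_nis_range is made up, and legitimate historic TRIUMF accounts in this range will be listed too.

```shell
# uids_in_nis_range [PASSWD_FILE] [MINUID]: list "name:uid" for accounts with
# MINUID <= UID < 1000, i.e. local accounts that would end up in the NIS passwd map.
uids_in_nis_range() {
    awk -F: -v min="${2:-200}" '$3 >= min && $3 < 1000 { print $1 ":" $3 }' "${1:-/etc/passwd}"
}
```

Usage on DAQ-NIS: uids_in_nis_range /etc/passwd 200. Accounts such as systemd-coredump or ganglia showing up here are the ones to renumber.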

Also read systemd author's opinion on system vs user UIDs: https://github.com/systemd/systemd/issues/4850#issuecomment-265698275

Fix systemd-logind NIS breakage

!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!

there is a delay in ssh logins for normal users. "ssh -v" shows the delay is after "pledge...". this fix removes the delay.

systemd developers think that we should not use NIS and made sure there are problems if we do. To give them credit, they do offer a workaround. Read this: https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67

mkdir /etc/systemd/system/systemd-logind.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-logind.service

Fix systemd-udevd NIS breakage

same problem as above, seen with udev getting stuck (Ubuntu LTS 20.04).

mkdir /etc/systemd/system/systemd-udevd.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-udevd.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-udevd.service

Configure USB device permissions

Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.

  • create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}" 
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
  • reload udev rules: udevadm control --reload-rules
  • apply new permissions: udevadm trigger --action=add
  • watch udev activity: udevadm monitor -p

Configure lightdm display manager

  • enable it
echo lightdm | dpkg-reconfigure -fteletype lightdm
systemctl disable gdm
systemctl disable sddm
systemctl enable lightdm
  • make the MATE desktop the default
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
  • enable login by NIS users
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
  • restart lightdm
systemctl stop gdm
systemctl restart lightdm

Install libpng12.so.0

Quartus 16 needs libpng12:

wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb

Install google-chrome

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb

confirm autoupdate is enabled, observe dl.google.com is present in the list of repositories:

apt update
...
Get:5 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,094 B]
...

FOLLOWING IS OBSOLETE:

Instructions from here: https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-tmp.list'
apt update
apt install google-chrome-stable
/bin/rm -f /etc/apt/sources.list.d/google-tmp.list

Install amanda client

ONLY ON MACHINES THAT HOST HOME DIRECTORIES

  • apt install amanda-client
  • edit /etc/amandahosts
amanda.triumf.ca amanda amdump
  • check permissions on /etc/amandahosts:
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
  • fix if needed: chown backup:backup /etc/amandahosts; chmod a= /etc/amandahosts; chmod u=wr /etc/amandahosts
  • edit /etc/amanda-security.conf, add this line:
runtar:gnutar_path=/usr/bin/tar

On the amanda machine:

  • in amanda disklist, use dump type "bsdtcp-comp-user-tar"
  • su - amanda and run amcheck -c daily daq00
-bash-4.1$ amcheck -c daily daq00

Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.092 seconds.  0 problems found.

(brought to you by Amanda 3.3.7p1.git.685ff76d)

Enable rc.local

For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:

cd ~/git/scripts
git pull
cp -n -v etc/rc.local /etc/
chmod a+rx /etc/rc.local
cp etc/rc-local.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local

Remove unwanted packages

apt purge  bash-completion # broken, adds unwanted "\" if "ls -l $ROOTSYS/<tab>"
apt purge  zsys # broken, do not use
apt purge  sddm # login manager
apt purge  avahi-daemon avahi-autoipd # not sure what it does, observed using 100% CPU
apt purge  modemmanager # probes all serial ports to see if it's a modem

Disable unwanted services

systemctl disable mpd
systemctl disable snapd
systemctl disable ModemManager
systemctl --global mask tracker-extract-3.service
systemctl --global mask tracker-miner-fs-3.service
systemctl daemon-reload

Disable sleep and suspend

note: we see some computers randomly shut down or go to sleep; log files indicate the "sleep" or "suspend" button was pushed by the user, but no such buttons actually exist. this is the fix for this:

systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service

Enable crontab @reboot for MIDAS

startup scripts have a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).

mkdir /etc/systemd/system/cron.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/cron.service.d/local.conf
systemctl daemon-reload
systemctl cat cron.service
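
The commands above fail at "mkdir" if they are run a second time, and "echo -e" behaves differently between shells. A possible rerunnable variant is sketched below; write_after_dropin is a made-up helper and its optional third argument (a root prefix) exists only so the function can be tried out in a scratch directory.

```shell
# write_after_dropin UNIT "DEPS..." [ROOT]: create ROOT/etc/systemd/system/UNIT.d/local.conf
# containing an [Unit] section with After=DEPS. Safe to run repeatedly.
write_after_dropin() {
    unit=$1
    deps=$2
    dir="${3:-}/etc/systemd/system/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nAfter=%s\n' "$deps" > "$dir/local.conf"
}
```

Usage: write_after_dropin cron.service "ypbind.service autofs.service" && systemctl daemon-reload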

Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".

Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.

Crontab entry to start midas: (install in the midas user crontab, not root crontab)

su - midasuser
crontab -l
#@reboot /bin/bash -l -c "/home/trinat/bin/start-daq-applications"
#@reboot /bin/tcsh -c "/home/trinat/bin/start-daq-applications"

Install apache httpd proxy for midas and elog

This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.

First, configure apache2:

  • execute these commands:
apt -y install apache2
cd /etc/apache2
  • create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
SSLSessionCache         shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  300
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
  • create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName daq14.triumf.ca
        DocumentRoot /var/www/html
        ErrorLog /var/log/apache2/daq14.log
        SSLEngine on
        # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
        SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
        SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
        ## use port specified in elogd.cfg
        #ProxyPass /elog/ http://localhost:8082/ retry=1 
        ## use mhttpd port
        #ProxyPass /      http://localhost:8080/ retry=1 
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
        <Location />
            SSLRequireSSL
            AuthType Basic
            AuthName "DAQ password protected site"
            Require valid-user
            # create password file: touch /etc/apache2/htpasswd
            # to add new user or change password: htpasswd /etc/apache2/htpasswd username
            AuthUserFile /etc/apache2/htpasswd
            RequestHeader set X-Remote-User %{REMOTE_USER}s
        </Location>
        #SSLCertificateFile /root/server.cert 
        #SSLCertificateKeyFile /root/server.key 
    </VirtualHost>
</IfModule>
  • stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
  • enable ssl module and new configurations:
a2enmod ssl
a2enmod headers
a2enmod proxy
a2enmod proxy_http
a2enconf ssl-daq14
a2ensite daq14-ssl
  • disable default ssl sites
a2dissite 000-default-le-ssl
a2dissite 000-default
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
  • check that there are no syntax problems
apache2ctl configtest
  • enable and start apache2:
systemctl enable apache2
systemctl restart apache2
systemctl status apache2
  • apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
  • if it says "Failed to configure ... certificate", proceed to the step for setting certbot.
  • try to access https://daq14.triumf.ca
    • you should see a complaint about self-signed certificate
    • you should see a request for password (do not login yet)
    • if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.

Second, configure certbot:

(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)

(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)

(Note: unsurprisingly, this requires outside access to connect with letsencrypt, so won't work if PC is only accessible from on-site network)

  • check that port 80 is not used by anything:
  • netstat -an | grep LISTEN | grep ^tcp | grep 80
  • lsof -P | grep -i tcp | grep LISTEN | grep 80
  • if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment out "Listen 80" in /etc/apache2/ports.conf)
  • install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
apt install certbot python3-certbot-apache
certbot certonly --standalone --installer apache
  • then answer questions:
  • "activate HTTPS for daq14.triumf.ca" - say ok
  • "enter email address" - enter your own email address
  • "please read terms..." - read the terms and say "agree"
  • it will take a few moments...
  • "congratulations..." - say ok.
certbot install --apache --cert-name daq14.triumf.ca
  • then answer questions:
  • "choose redirect..." - say "1" (no redirect)
  • look inside /etc/apache2/sites-enabled/daq14-ssl.conf to see that SSLCertificateFile & co point to certbot certificates in

/etc/letsencrypt/live/daq14.triumf.ca/

  • to check current renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
certbot renew --standalone --installer apache --force-renewal

NOTE: this certificate will expire in 3 months, automatic renewal should work with current version of certbot
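
The port 80 check in the list above uses "grep 80", which also matches ports such as 8080 and 8082 used by mhttpd and elogd. A stricter check is sketched here; it assumes the usual "ss -ltn" column layout (state in column 1, local address:port in column 4), and the function name listeners_on_port is made up.

```shell
# listeners_on_port PORT: read "ss -ltn" output on stdin and print the local
# addresses that listen on exactly PORT (so 8080 does not match 80).
listeners_on_port() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, a, ":")      # the port is the part after the last ":"
            if (a[n] == port) print $4
        }'
}
```

Usage: ss -ltn | listeners_on_port 80. No output means port 80 is free for certbot.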

Third, activate password protection:

  • as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/apache2/htpasswd
htpasswd /etc/apache2/htpasswd midas
  • restart apache2
systemctl restart apache2
systemctl status apache2

From here:

  • enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
  • enable proxy for ELOG - ditto
a2enmod proxy
a2enmod proxy_http
apache2ctl configtest
systemctl restart apache2
SSL                  = 0

NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0

generate self-signed certificate

# cd $HOME
# openssl req  -nodes -new -x509  -keyout server.key -out server.cert -days 1001
...+....+..+..........+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..+...+.........+......+.+...+...+.....+...............+.........+...+.+......+...+...........+....+...+..+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+.+...+..+.......+..+...+.......+......+...+..+...+......+....+...............+..+...+....+...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
......+......+.+..+......+.+......+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.+.....+......+.+.........+......+.....+.+..+...+.......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.......+....+......+.....+...+...+.......+..+.+........+.+...+......+..+..........+..+.+...........+...+.......+......+.....+.......+...+.........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:CH
State or Province Name (full name) [Some-State]:Geneve
Locality Name (eg, city) []:CERN
Organization Name (eg, company) [Internet Widgits Pty Ltd]:CERN
Organizational Unit Name (eg, section) []:ALPHA experiment           
Common Name (e.g. server FQDN or YOUR name) []:alphacpc05.cern.ch
Email Address []:
root@alphacpc05:~# 
root@alphacpc05:~# 
root@alphacpc05:~# ls -l
-rw-r--r-- 1 root root 1375 juil. 10 21:43 server.cert
-rw------- 1 root root 1708 juil. 10 21:42 server.key
root@alphacpc05:~# systemctl restart apache2

Enable elog PDF preview

NOTE: looks like U-24 already has this correctly.

see https://stackoverflow.com/questions/52998331/imagemagick-security-policy-pdf-blocking-conversion

  • xemacs -nw /etc/ImageMagick-6/policy.xml
  • remove this section at the end:
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />

Install Jupyter notebook

From https://jupyter.org/install

  • install and start jupyter:
apt install python3-pip
pip install jupyterlab
pip install notebook
~/.local/bin/jupyter notebook
  • watch for the http://localhost:8888 URL that it prints
  • say "no" to the offer to start firefox (it will not work!)
  • the URL is: http://localhost:8888/tree?token=xxx
  • on the machine where you are running the web browser (i.e. google-chrome), open a new shell and run this (replace trinat@trinatdaq with the username and machine name where you started jupyter):
ssh -v trinat@trinatdaq -L 8888:localhost:8888
  • in the web browser, open http://localhost:8888
  • this gives us the login page
  • in the password or token entry field, put the token from the "tree?token=xxx" above (printed by jupyter on startup)
  • push button "login"
  • the jupyter page should open with the list of files in the trinat home directory
  • congratulate Brian with full success

Install ZFS quota report

If there are any ZFS volumes, install script to report disk and quota usage

cd ~/git/scripts/quotareport
git pull
mkdir /var/www/html/zfsquotareport
cp -pv ~/git/scripts/quotareport/sorttable.js /var/www/html/zfsquotareport/
ln -s $PWD/zfsquotareport.perl /etc/cron.daily/
touch /etc/crontab

If httpd is configured to redirect "/" to MIDAS mhttpd:

  • add following to /etc/apache2/sites-enabled/xxx-ssl.conf in front of "ProxyPass / ..."
  • run "systemctl reload apache2"
## do not proxy zfs quota report directory 
ProxyPass /zfsquotareport/ ! 

Install PHP

  • apt install php libapache2-mod-php
  • systemctl restart apache2
  • create /var/www/html/info.php
<?php 
 
phpinfo(); 

Configure TRIUMF printers

systemctl stop cups
systemctl stop cups-browsed.service
systemctl disable cups
systemctl disable cups-browsed.service
systemctl stop snap.cups.cupsd.service
systemctl stop snap.cups.cups-browsed.service
systemctl disable snap.cups.cupsd.service
systemctl disable snap.cups.cups-browsed.service
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a

Enable core dumps

By default, Ubuntu LTS 20.04 installs the apport package, which disables core dumps from user applications. (google it up!). It is not meant to do this and documentation claims that it is not installed and not enabled by default. Oh, well...

apt purge apport
apt autoremove ### will remove apport-symptoms and a few other packages

After this, core dumps are written to file "core" in the current directory. See /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid.

Enable core dump file names to include process id, add following to /etc/rc.local

echo "echo 1 > /proc/sys/kernel/core_uses_pid" >> /etc/rc.local

Enable debugger

By default, Ubuntu LTS 20.04 does not permit debugger to attach and debug already running programs. To enable it, add following to /etc/rc.local

echo "echo 0 > /proc/sys/kernel/yama/ptrace_scope" >> /etc/rc.local
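
The rc.local entries only take effect after rc.local has run (on reboot, or when it is run by hand). To see the current values of the kernel settings mentioned in this and the previous section, something like the following can be used. It is a sketch; show_debug_settings is a made-up name and the optional argument exists only so the function can be pointed at a test directory instead of /proc.

```shell
# show_debug_settings [PROCROOT]: print the current core dump and ptrace settings.
show_debug_settings() {
    root=${1:-/proc}
    for f in sys/kernel/core_pattern sys/kernel/core_uses_pid sys/kernel/yama/ptrace_scope; do
        if [ -r "$root/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$root/$f")"
        else
            printf '%s = (not available)\n' "$f"
        fi
    done
}
```

Usage: show_debug_settings. After the changes above are active, core_uses_pid should read 1 and ptrace_scope should read 0.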

Disable Ubuntu Pro nag

best I can tell, impossible at this time.

do not do this

!!! does nothing !!!

pro config set apt_news=false

do not do this

!!! breaks automatic updates because 20apt-esm-hook.conf is missing !!!

If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:

/bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf

do not do this

!!! likely same as above, breaks automatic updates !!!

  • comment out /etc/apt/apt.conf.d/20apt-esm-hook.conf

do not do this

!!! removes too many packages !!!

apt remove ubuntu-pro-client

Update packages

apt update # update package list
apt upgrade # install updated packages and update "kept back" packages
apt autoremove # remove packages that apt thinks should be removed

Remove obsolete packages

DO NOT DO THIS, IT REMOVES TOO MUCH !!!

apt list '~o'
apt purge '~o'

Cleanup residual configs

apt list '~c'
apt purge '~c'
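
The same set of packages can be cross-checked with dpkg before purging: in "dpkg -l" output, state "rc" means the package was removed but its config files remain. A small sketch, with a made-up function name:

```shell
# list_residual_configs: read "dpkg -l" output on stdin and print the names of
# packages in state "rc" (removed, config files still present).
list_residual_configs() {
    awk '$1 == "rc" { print $2 }'
}
```

Usage: dpkg -l | list_residual_configs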

Install firefox-esr

install firefox-esr here

Finish installation

Congratulations. There is nothing more to do!

  • reboot
shutdown -r now

Update to new version of Ubuntu

  • run "do-release-upgrade -c"
  • if it does not report new release Ubuntu 24, check /etc/update-manager/release-upgrades has "Prompt=lts"
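
The Prompt setting can be read without opening an editor. A minimal sketch, with a made-up function name:

```shell
# release_upgrade_prompt [FILE]: print the value of the uncommented Prompt= setting
# (for example "lts", "normal" or "never").
release_upgrade_prompt() {
    sed -n 's/^[[:space:]]*Prompt[[:space:]]*=[[:space:]]*\([A-Za-z]*\).*/\1/p' \
        "${1:-/etc/update-manager/release-upgrades}"
}
```

Usage: release_upgrade_prompt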

Update Ubuntu LTS 20.04 to LTS 22.04

apt remove zsys

daqubuntu

# reboot to clear out all updates
# vi /etc/update-manager/release-upgrades # set "Prompt=normal"
# do-release-upgrade -c
Checking for a new Ubuntu release
New release '22.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
# do-release-upgrade
...
say yes...
...
login.defs, say "Y" (erase local changes, use packaged version)
/etc/systemd/resolved.conf, say "Y" (same as above)
firefox snap, say yes
unable to reach snap store, say "skip"
/etc/gmond.conf, say "Y"
/var/yp/Makefile, say "install the package maintainer's version"
/etc/ypserv.conf, same thing
/etc/ypserv.securenets, same thing
/etc/default/nis, same thing
/etc/speech-dispatcher/modules/mary-generic.conf, same thing
/etc/apt/apt.conf.d/50unattended-upgrades, same thing
...
278 packages are going to be removed, say yes
...
restart required, say yes
...
no ping... yes ping...
...
ssh daqubuntu, ok
apt update, fail, DNS does not work, "host security.ubuntu.com" does not resolve.
fix resolver per https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#Disable_NetworkManager
apt update, apt upgrade now works, 0 packages to update
NIS does not work.

midm9a

login.defs
firefox snap
gmond.conf
ypserv
/etc/default/nis
unattended-upgrades
amanda-security.conf
remove obsolete (no)
reboot
configure dns
reenable nis

daq17

firefox snap
imagemagick policy.xml
gmond.conf
chrony.conf
/var/yp/Makefile
ypserv.conf
ypserv.securenets
/etc/default/nis
50unattended-upgrades

daq00

per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/

do-release-upgrade -f DistUpgradeViewNonInteractive

if it exits "too soon" without doing anything, run it without "-f xxx", most likely it does not like something about this machine. in case of daq00 it did not like how the EFI partitions were mounted. after fixing it, non-interactive upgrade was successful.

isdaq08

  • prepare
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
root@isdaq08:~# debsums -ce
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/apt/apt.conf.d/10periodic
root@isdaq08:~# 
  • restore original /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1"; 
APT::Periodic::Download-Upgradeable-Packages "0"; 
APT::Periodic::AutocleanInterval "0"; 
  • apt remove ganglia-monitor
  • apt remove nis
  • "debsums -ce" is now empty

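The debsums triage above could be sketched as a small filter. This is only an illustration: the function name triage_debsums is hypothetical, and it assumes that a packaged copy of a config file, when dpkg left one behind, is named FILE.dpkg-dist (as seen for /etc/dnsmasq.conf and /etc/default/nis in the daq14 notes). Files without such a copy still have to be handled by hand.

```shell
# triage_debsums: read "debsums -ce" output on stdin and suggest a next step
# for every modified config file. Hypothetical helper, not part of debsums.
triage_debsums() {
    while IFS= read -r line; do
        case "$line" in
            /*) ;;             # a modified config file path
            *)  continue ;;    # e.g. "debsums: missing file ..." notices
        esac
        if [ -e "$line.dpkg-dist" ]; then
            # dpkg left the packaged version next to the local one
            echo "restore: cp -pv $line.dpkg-dist $line"
        else
            echo "review:  $line"
        fi
    done
}
```

Possible use: debsums -ce 2>&1 | triage_debsums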
Run the upgrade:

  • do-release-upgrade -f DistUpgradeViewNonInteractive

Post upgrade:

  • configure DNS
  • apt -y install linux-generic-hwe-22.04
  • /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
  • /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
  • install missing packages
  • restore ganglia
  • restore nis
  • check zpool status, may need zpool upgrade
  • reboot

upgrade U-22 to U-24

generic instructions. below are notes from upgrades of specific machines.

NOTE: at CERN saw installation getting stuck on restart of autofs (automount, rsyslogd 100% CPU, huge /var/log/syslog), stop autofs before upgrade?

systemctl stop autofs
cleanup zsys
maybe: if snap is already removed, remove firefox to prevent snap from reinstalling.
install non-snap firefox-esr
remove thunderbird to prevent installation of snap thunderbird in U-24
mount EFI partition as /boot/efi, otherwise upgrade bombs
usually mount /dev/sda1 /boot/efi
debsums -ce
apt -y remove desktop-base # causes installer crash
apt -y remove thunderbird  # avoid forced conversion to snap
apt update
apt -y upgrade
apt -y autoremove
do-release-upgrade -c      # confirm upgrade will be to U-24
do-release-upgrade
# say "y" to all questions
# after installation starts, accept default answers to all questions
# should run for about 1 hour or so
# after upgrade finishes
apt update
apt -y upgrade
apt -y autoremove
shutdown -r now

post upgrade:

  • check zpool status, may need zpool upgrade
  • cd /etc/apt/sources.list.d, reenable 3rd party repos (mozilla, google, etc)

if installation bombs:

apt update
apt upgrade # will tell us to run "apt --fix-broken install", do it
apt --fix-broken install
apt update
apt upgrade # should resume the upgrade, will run for a long time
apt update
apt upgrade # should do nothing
apt autoremove
shutdown -r now

daqubuntu, U-24

  • prepare
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
root@daqubuntu:~# debsums -ce
/etc/ganglia/gmond.conf
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
/etc/ypserv.conf
/etc/ypserv.securenets
/var/yp/Makefile
/etc/update-manager/release-upgrades
/etc/apt/apt.conf.d/10periodic
/etc/yp.conf
root@daqubuntu:~# 
  • restore original /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1"; 
APT::Periodic::Download-Upgradeable-Packages "0"; 
APT::Periodic::AutocleanInterval "0"; 
  • apt remove ganglia-monitor
  • apt remove nis
  • apt autoremove
  • restore original release-upgrades: "Prompt: lts"
  • "debsums -ce" is now empty

Check for upgrade:

root@daqubuntu:~# do-release-upgrade -c
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release 
set Prompt=normal in /etc/update-manager/release-upgrades.
root@daqubuntu:~# 

Run the upgrade:

  • do-release-upgrade -f DistUpgradeViewNonInteractive

Post upgrade:

  • configure DNS
  • apt -y install linux-generic-hwe-22.04
  • /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
  • /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
  • install missing packages
  • restore ganglia
  • restore nis
  • check zpool status, may need zpool upgrade
  • cd /etc/apt/sources.list.d, reenable 3rd party repos (mozilla, google, etc)
  • reboot

daq14, U-20-22-24

  • apt update, apt upgrade
  • apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 ### install kernel 5.15
  • shutdown -r now
  • stuck waiting for daq14 to shutdown...
  • reboot into kernel 5.15
  • ???
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • debsums -ce
/etc/apache2/ports.conf
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/sudoers
  • apache2 restore original ports.conf, uncomment "Listen 80"
  • cp -pv /etc/dnsmasq.conf.dpkg-dist /etc/dnsmasq.conf
  • apt remove ganglia-monitor
  • edit /etc/yp.conf, remove everything after "# ypserver ypserver.network.com"
  • "debsums -ce" is now empty
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • runs for a long time
  • stuck on "/etc/default/nis", type "Y", press enter, nothing for a bit, then resumes running
  • finished
  • configure DNS
  • reboot
  • have kernel 6.8
  • apt update; apt upgrade
  • apt upgrade guile-2.2-libs ### would not auto-update, "kept back", has to be done by hand
  • apt autoremove
  • debsums -ce
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
  • diff /etc/default/nis.dpkg-dist /etc/default/nis
  • cp -pv /etc/default/nis.dpkg-dist /etc/default/nis
  • debsums -ce
debsums: missing file /etc/init.d/nis (from nis package)
  • we ignore this and run the update
  • do-release-upgrade -c
Checking for a new Ubuntu release
New release '24.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • bombs out without any error messages
  • in /var/log/dist-upgrade/main.log reports "Failed to find a replacement for xapp" and other packages
  • apt remove xapp usrmerge ureadahead thunderbird-gnome-support
  • no go, complains about even more packages.
  • apt list | grep installed | grep -v jammy ### show packages installed from non-ubuntu sources
  • remove all packages marked "installed,local" ### ubuntu updater does not know where they came from and so cannot update them.
  • apt remove desktop-base ### not happy about this package in /var/log/dist-upgrade/apt.log
  • apt autoremove
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • running for a long time...
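The search for locally installed packages in the daq14 notes can be sketched as a filter over the output of "apt list --installed". The function name local_packages is hypothetical; the sketch assumes apt marks such packages with the status "[installed,local]".

```shell
# local_packages: read "apt list --installed" output on stdin and print the
# names of packages whose status is [installed,local].
# Hypothetical helper; the package name is the text before the first "/".
local_packages() {
    awk -F/ '/\[installed,local\]/ { print $1 }'
}
```

Possible use: apt list --installed 2>/dev/null | local_packages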

alpha04 U-20-24

  • apt update, apt upgrade, apt autoremove
  • reboot into latest kernel (already done)
  • debsums -ce
root@alpha04:~# debsums -ce
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/default/nis
/etc/yp.conf
root@alpha04:~# 
  • move /etc/dnsmasq.conf to /etc/dnsmasq.d/alpha04.conf
  • apt remove dnsmasq
  • apt remove ganglia-monitor
  • apt remove nis
  • apt autoremove
  • debsums -ce ### is now empty
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • it runs for a long time...
  • complained about /etc/fwupd config files, not sure why...
  • finished
  • apt update, apt upgrade, apt autoremove
  • restore dnsmasq: apt install dnsmasq, systemctl status dnsmasq
  • restore ganglia, per instructions
  • restore NIS: apt -y install rpcbind nis, ypwhich, ypwhich -m
  • zpool upgrade rpool ### also upgrade any other zfs pools, see zpool status
  • remove unwanted packages, per instructions
  • run gonodeinfo
  • reboot
  • done

vera00 U20-22-24

  • everything same as daq14 for U20-22
  • kernel is still 5.15
  • U22-24 is going...
  • stuck for a few minutes on /etc/fwupd config files
  • have kernel 6.8.0-49
  • same steps as daq14
  • reboot
  • same steps as daq14
  • done

phaarmonster U22-24

  • debsums -ce
/etc/amandahosts
/etc/apache2/ports.conf
/etc/ganglia/gmond.conf
  • do-release-upgrade -c ### reports U24.04.1
  • do-release-upgrade -f DistUpgradeViewNonInteractive ### bombs
  • apt remove desktop-base; apt autoremove
  • do-release-upgrade ### (interactive) runs ok
  • bombed !!!
  • apt upgrade spews errors and tells us to run "apt --fix-broken install"
  • apt --fix-broken install ### runs
  • bombs with thunderbird snap errors
  • again... no go
  • thunderbird snap complains about mounting /home, but /home is a symlink
  • rm /home, mkdir /home
  • again, runs ok
  • asks about /etc/fwupd/fwupd.conf - say "Y" to install updated package version
  • apt install completes
  • apt update; apt upgrade ### running for a long time...
  • finished
  • install missing packages, etc
  • reboot
  • long wait... came back
  • DNS does not work, systemd-resolved missing, apt install systemd-resolved, "configure DNS"
  • done.

isdaq10 U22-24

  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • bombed on desktop-base, "apt remove desktop-base; apt autoremove"
  • bombed with errors
  • running "apt --fix-broken install"
  • complained about thunderbird snap
  • complained about /etc/fwupd/fwupd.conf (say Y)
  • finished ok
  • apt update
  • apt upgrade ### reports "1382 upgraded, 140 newly installed, 0 to remove and 29 not upgraded"

iris01

  • debsums -ce
/etc/ganglia/gmond.conf
/etc/default/nis
/etc/yp.conf
  • apt remove desktop-base
  • apt remove thunderbird
  • apt autoremove
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • bombed with dpkg errors:
hplip wants wrong python
ganglia-monitor wants wrong libapr1
sssd-ad wants bunch of wrong libraries
xemacs21-mule
libnfsidmap1 wants libldap2
and so forth...
  • apt --fix-broken install -y
  • bombs on ganglia - ganglia group absent from /etc/group
  • fix group ganglia by hand, remove ganglia group from NIS on isdaq00
  • apt --fix-broken install
  • apt upgrade

iris00

  • stuck on "autofs: restarting", login as root, kill iris midas, kill automount, systemctl restart autofs, noble got unstuck, ctrl-c systemctl restart autofs
  • noble running...
  • bombed with dpkg errors
  • check ganglia user, group - both ok
  • apt --fix-broken install
  • apt upgrade

tigstore01

  • no bomb-out

midm9b

  • apt remove desktop-base thunderbird
  • bombed
  • apt --fix-broken install
  • apt upgrade

Upgrade to new version of Debian

https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html

32-bit VME processor Debian 11 to 12 to 13

  • cd git/scripts; git pull; cd ~
  • apt update
  • apt upgrade
  • edit /etc/apt/sources.list
deb http://deb.debian.org/debian/ trixie main
#deb http://deb.debian.org/debian/ bookworm main
#deb http://deb.debian.org/debian/ bullseye main
#deb http://deb.debian.org/debian/ buster main
#deb-src http://deb.debian.org/debian/ bullseye main
  • apt update
  • apt upgrade --without-new-pkgs
  • apt full-upgrade
  • apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
  • reboot
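The sources.list edit above can also be sketched as a filter. Unlike the manual edit, which keeps the old lines as comments, this sketch rewrites the active "deb" and "deb-src" lines in place and leaves commented lines alone. The function name switch_release is hypothetical.

```shell
# switch_release OLD NEW: read a sources.list on stdin, write it to stdout with
# active (uncommented) "deb"/"deb-src" lines moved from suite OLD to suite NEW.
# Also covers suites such as OLD-updates and OLD-security.
# Hypothetical helper, shown only to illustrate the edit.
switch_release() {
    old="$1"
    new="$2"
    sed -E "/^[[:space:]]*deb(-src)?[[:space:]]/ s#([[:space:]])${old}([-[:space:]/]|\$)#\\1${new}\\2#"
}
```

Possible use: switch_release bookworm trixie < /etc/apt/sources.list > /tmp/sources.list.new, then compare the two files with diff before copying the new one into place.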

Ubuntu package manager

  • apt-get install xxx # install package xxx
  • apt-get update
  • apt-get upgrade
  • apt-get dist-upgrade
  • apt-get autoremove # remove automatically installed packages that are no longer required by any installed package
  • apt-get remove xxx # remove package xxx
  • apt-cache search . # list all available packages
  • apt-cache show "." | grep ^Package # list all available packages
  • apt-cache madison root-system # show all available versions of package root-system
  • apt list # list all known packages (installed and available)
  • dpkg --listfiles libpng16-16 # list all files from this package
  • apt list --installed # list all installed packages
  • dpkg -S /bin/bash # what package provides this file?
  • dpkg -L bash # what files provided by this package?
  • debsums -ce # show modified config files
  • apt-config dump # show apt configuration
  • apt purge '~c' # purge all [residual-config] packages
  • ls -l /var/lib/dpkg/info/ # show post-install scripts

Ubuntu zsys

NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230

  • scripted removal of old snapshots (replace "echo zfs destroy" with "zfs destroy")
zfs list -t all | cut -f1 -d " " | grep autozsys | xargs -n1 echo zfs destroy
  • manual removal of old snapshots
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
  • apt remove zsys

NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots

update-grub # list of all snapshots, errors if some snapshots are broken
zsysctl state remove lnc0k7 --system # remove snapshot
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
zsysctl show # show snapshots

Ubuntu cloning

to clone an Ubuntu image:

cd /nfsroot/lxcpet
emacs -nw etc/hostname ### change hostname
emacs -nw etc/mailname ### change hostname (debian 11)
emacs -nw etc/defaultdomain ### change the NIS domainname
emacs -nw etc/yp.conf ### change the NIS server
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
emacs -nw root/.ssh/authorized_keys ### update root ssh keys

Ubuntu boot loader

maintenance commands

  • update-initramfs -v -u
  • grub-install /dev/sda

Convert from single to dual mirrored ZFS SSD

Assuming Ubuntu LTS 22.04 installed with the "install on ZFS" option, we will add a second SSD, configure ZFS to use both SSDs in a mirrored configuration and set up grub to boot from either SSD. This is intended to create a fully redundant system where failure of either SSD does not break the system.

partition

  • identify first SSD
root@midm9b:~# ./smart-status.perl 
        Disk                    model               serial     temperature  realloc  pending   uncorr  CRC err     RRER Errors     Link
    /dev/sda  WD Blue SA510 2.5 250GB         22243Z803769              24        .        ?        ?        .        ?        .      6.0
root@midm9b:~# 
  • connect second SSD of identical size
root@midm9b:~# ./smart-status.perl 
        Disk                    model               serial     temperature  realloc  pending   uncorr  CRC err     RRER   Errors     Link
    /dev/sda  WD Blue SA510 2.5 250GB         22243Z803769              24        .        ?        ?        .        ?        .      6.0
    /dev/sdb  WD Blue SA510 2.5 250GB         22243Z803852              25        .        ?        ?        .        ?        .      6.0
root@midm9b:~# 
  • if second SSD is not autodetected, reboot
  • Clone partition table automatically

If both SSDs are identical size, use this simpler method of duplicating the partition table:

root@midm9b:~# sfdisk -d /dev/sda > part_table
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb

The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
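The same filter, written as a function so it can be tried on a saved dump before touching a disk. The name strip_disk_ids is hypothetical. The only change from the command above is that the sed pattern also accepts lowercase hex digits, which is an extra precaution rather than something the original needed.

```shell
# strip_disk_ids: read an "sfdisk -d" dump on stdin and drop the disk
# label-id line and the per-partition uuid= fields, so that sfdisk
# generates fresh IDs when the table is written to the second disk.
# Hypothetical helper wrapping the grep and sed shown above.
strip_disk_ids() {
    grep -v '^label-id' | sed -e 's/, *uuid=[0-9A-Fa-f-]*//'
}
```

Possible use: strip_disk_ids < part_table | less to inspect the result, then strip_disk_ids < part_table | sfdisk /dev/sdb to write it.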

  • Clone partition table manually (e.g. for different size disks)
  • list partition table of first SSD:
root@midm9b:~# fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624   5244927   4194304     2G Linux swap
/dev/sda3  5244928   9439231   4194304     2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~# 
  • create identical partitions on second SSD, use sector numbers from above.
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}: 
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'

Command (? for help): n
Partition number (2-128, default 2): 
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}: 
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'

Command (? for help): n
Partition number (3-128, default 3): 
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}: 
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'

Command (? for help): n
Partition number (4-128, default 4): 
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}: 
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}: 
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): bf00
Changed type of partition to 'Solaris root'

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdb.
The operation has completed successfully.
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624   5244927   4194304     2G Linux swap
/dev/sda3  5244928   9439231   4194304     2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root


Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603

Device       Start       End   Sectors   Size Type
/dev/sdb1     2048   1050623   1048576   512M EFI System
/dev/sdb2  1050624   5244927   4194304     2G Linux swap
/dev/sdb3  5244928   9439231   4194304     2G Solaris boot
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~# 

update ZFS pools

  • identify second SSD partitions
root@midm9b:~# ls -l /dev/disk/by-id/ata*part2
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part2 -> ../../sdb2
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
  • convert bpool from single disk to mirrored disk:
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
config:

	NAME                                    STATE     READ WRITE CKSUM
	bpool                                   ONLINE       0     0     0
	  99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

	NAME                                    STATE     READ WRITE CKSUM
	rpool                                   ONLINE       0     0     0
	  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE       0     0     0

errors: No known data errors
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
root@midm9b:~# zpool status bpool
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	bpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE       0     0     0

errors: No known data errors
  • convert rpool
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
root@midm9b:~# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
	5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
	0B resilvered, 0.03% done, no estimated completion time
config:

	NAME                                                STATE     READ WRITE CKSUM
	rpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE       0     0     0

errors: No known data errors
root@midm9b:~# 
  • wait for resilver to complete
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	bpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	rpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE       0     0     0

errors: No known data errors
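Waiting for the resilver can be scripted instead of re-running zpool status by hand. A minimal sketch, assuming zpool status prints "resilver in progress" while a resilver is running, as in the rpool output above. The function names are hypothetical.

```shell
# resilver_running: read "zpool status" output on stdin; succeed while any
# pool still reports a resilver in progress. Hypothetical helper.
resilver_running() {
    grep -q 'resilver in progress'
}

# wait_for_resilver [SECONDS]: poll zpool status every SECONDS seconds
# (default 10) until no pool is resilvering.
wait_for_resilver() {
    interval="${1:-10}"
    while zpool status | resilver_running; do
        sleep "$interval"
    done
}
```

Possible use: wait_for_resilver 30 && zpool status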

update boot loader

INSTALL SYSLINUX: https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#EFI_boot_using_syslinux

DO *NOT* DO THE FOLLOWING:

  • create and mount EFI partitions:
root@midm9b:~# mkfs.msdos /dev/sdb1
root@midm9b:~# mkdir /boot/efi-sda
root@midm9b:~# mkdir /boot/efi-sdb
root@midm20c:~# blkid | grep vfat ### identify UUID
/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"
/dev/sdc1: UUID="D970-86BA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="e6d3b5b9-a512-44a2-9205-1a4db06ed2a2"
/dev/sda1: UUID="DDA1-044C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6dc9dff0-1c13-8045-a906-7803d3074c70"
root@midm20c:~# cat /etc/fstab | grep vfat ### add mount points with correct UUID
#UUID=D970-86BA  /boot/efi       vfat    umask=0022,fmask=0022,dmask=0022      0       1
UUID=DDA1-044C  /boot/efi-sda       vfat    umask=0022,fmask=0022,dmask=0022      0       1
UUID=DD89-5081  /boot/efi-sdb       vfat    umask=0022,fmask=0022,dmask=0022      0       1
root@midm9b:~# mount -a
root@midm9b:~# df -kl
Filesystem                                       1K-blocks    Used Available Use% Mounted on
...
/dev/sda1                                           523244   13720    509524   3% /boot/efi
/dev/sdb1                                           523244       4    523240   1% /boot/efi-sdb
...
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
sending incremental file list
EFI/
...
root@midm9b:~# ls -l /boot/efi-sda
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# 
  • add systemd "nofail" flag to /etc/fstab, without this, systemd will stop booting if one SSD is missing
daq00:~$ cat /etc/fstab | grep vfat
#UUID=31A7-24BE  /boot/efi       vfat    umask=0022,fmask=0022,dmask=0022      0       1
/dev/sda1 /boot/efi-sda       vfat    umask=0022,fmask=0022,dmask=0022,nofail      0       1
/dev/sdb1 /boot/efi-sdb       vfat    umask=0022,fmask=0022,dmask=0022,nofail      0       1
  • setup script to update grub on second SSD, it must be run manually after every kernel update
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
root@midm9b:~# ~/update_efi_grub.perl -u
EFI dir: /boot/efi-sda
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
building file list ... done

sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sda/EFI
building file list ... done

sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done

sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sdb/EFI
building file list ... done

sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
root@midm9b:~# 

Disable NetworkManager

Debian-12

network:
  version: 2
  ethernets:
    all-en:
      match:
        name: "en*"
      dhcp4: true
      dhcp6: true
      ignore-carrier: true ### do not drop IP address if network link drops
  • netplan apply
  • netplan try
  • ifconfig -a ### to check IP address settings
  • netstat -rn ### to check default route
  • cat /etc/resolv.conf ### to check DNS
  • ls -l /run/systemd/netif/leases ### systemd-networkd dhcp leases
  • NOTE: without "ignore-carrier" it will drop the IP address if the network link drops and re-do dhcp when the link comes back
  • NOTE: wait-network-online will wait for all interfaces to get an IP address

Ubuntu-20

NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04

NetworkManager is useful for configuring dynamic network interfaces, i.e. laptops that often move between networks or connect to a choice of several wifi networks.

For machines with statically configured network interfaces, NetworkManager is not necessary.

It has been observed to become confused and malfunction when network links go up and down (it keeps unnecessarily reconfiguring the IP address, etc), so it can be useful to disable it.

  • list all network interfaces
# /bin/ls -1 /sys/class/net/
enp0s31f6
lo
  • edit /etc/network/interfaces:
rename enp0s31f6=eth0
auto eth0
iface eth0 inet static
   address 142.90.120.94/19
   gateway 142.90.100.18
  • statically configure systemd-resolved
    • create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
    • systemctl restart systemd-resolved
    • resolvectl
    • systemd-analyze cat-config systemd/resolved.conf
  • disable NetworkManager
systemctl disable NetworkManager
  • reboot

U-22, U-24 ifup-ko

Network configuration of modern linux is confused. There are at least 4 configuration methods, each with different shortcomings:

  • the old ifup method is barely documented
  • NetworkManager is well documented and tooled, but sometimes does strange things
  • systemd-networkd is mysterious, and likely to do strange stuff, like all systemd stuff
  • netplan is the latest method, configuration is simple but uses NetworkManager or systemd-networkd as backend.

This is a solution for the specific situation of a stationary computer with one fixed wired interface and maybe one or more additional interfaces for fixed wired private networks.

Install /etc/ifup-ko, edit it with the IP addresses of the main and additional interfaces, and let systemd run it at the right place in the boot sequence, replacing NetworkManager and NetworkManager-wait-online.

As a bonus, /etc/ifup-ko waits up to 10 seconds for the main interface link to come up. If this is not needed, comment it out of the script.

  • prepare
cd ~/git/scripts
git pull
cd ifup-ko
make install
  • confirm interface names
systemctl start ifup-ko ### should finish immediately or after 10 seconds
systemctl status ifup-ko -n 1000 ### observe list of interfaces is correct, name of main interface is correct
  • edit /etc/ifup-ko
    • add host IP address to the "ifconfig" line
    • add gateway IP address to the "ip route add" line
  • test
systemctl start ifup-ko ### should finish immediately
systemctl status ifup-ko -n 1000 ### observe everything is configured as expected
  • cut-over
systemctl disable networkd-dispatcher
systemctl disable NetworkManager
systemctl disable wpa_supplicant # if no Wifi or Wifi not in use
  • reboot
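For illustration only, the "wait up to 10 seconds for the link" behaviour could look like the sketch below. This is not the contents of /etc/ifup-ko; the function name wait_for_link and its parameters are hypothetical. It polls the kernel carrier flag under /sys/class/net.

```shell
# wait_for_link IFACE [TIMEOUT] [SYSFS_NET]
# Poll the kernel "carrier" flag once per second until the link is up or
# TIMEOUT seconds (default 10) have passed. SYSFS_NET defaults to
# /sys/class/net and is a parameter only so the function can be tried on a
# scratch directory. Hypothetical helper, not taken from /etc/ifup-ko.
wait_for_link() {
    iface="$1"
    timeout="${2:-10}"
    sysfs="${3:-/sys/class/net}"
    waited=0
    while :; do
        # carrier reads "1" when the link is up; the read may fail while
        # the interface is not yet brought up, which counts as "no link"
        carrier=$(cat "$sysfs/$iface/carrier" 2>/dev/null || true)
        [ "$carrier" = "1" ] && return 0
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}
```

Possible use: wait_for_link enp0s31f6 10 || echo "no link on main interface"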

Disable systemd-networkd

On netbooted machines, systemd-networkd should be disabled - when "apt upgrade" runs and needs to update and configure systemd, it will stop systemd-networkd, which will stop the network, which will stop the NFS mounted root filesystem, which will stop the machine.

systemctl disable systemd-networkd.service
systemctl disable systemd-networkd.socket
systemctl mask systemd-networkd.service
systemctl mask systemd-networkd.socket
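To apply this only where it matters, a script could first check whether the root filesystem is NFS-mounted. A minimal sketch; the function name root_on_nfs and the optional mounts-file argument are hypothetical.

```shell
# root_on_nfs [MOUNTS_FILE]
# Succeeds if the root filesystem ("/") is mounted with an nfs* type.
# MOUNTS_FILE defaults to /proc/mounts; the argument exists only so the
# check can be tried against a saved copy (hypothetical helper).
root_on_nfs() {
    mounts_file="${1:-/proc/mounts}"
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk '$2 == "/" && $3 ~ /^nfs/ { found = 1 } END { exit !found }' "$mounts_file"
}
```

Possible use: root_on_nfs && systemctl mask systemd-networkd.service systemd-networkd.socket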

Configure ECC memory

Configure EDAC

  • apt install edac-utils rasdaemon

Intel i3-2120

root@musr00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
root@musr00:~# edac-ctl --status
edac-ctl: drivers not loaded.

Intel E-2236

root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq00:~# edac-util 
edac-util: No errors to report.
root@daq00:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
  • check edac sysfs files (Intel)
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~# 
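The ce_count and ue_count files listed above hold the corrected and uncorrected error counters of each memory controller. A minimal sketch that sums them over all controllers; the function name edac_counts is hypothetical, and edac-util already reports similar information.

```shell
# edac_counts [EDAC_MC_DIR]
# Print the total corrected (CE) and uncorrected (UE) error counts summed
# over all memory controllers. EDAC_MC_DIR defaults to
# /sys/devices/system/edac/mc. Hypothetical helper.
edac_counts() {
    mcdir="${1:-/sys/devices/system/edac/mc}"
    ce=0
    ue=0
    for mc in "$mcdir"/mc[0-9]*; do
        # if no controller exists the glob stays unexpanded; skip it
        [ -r "$mc/ce_count" ] || continue
        ce=$((ce + $(cat "$mc/ce_count")))
        ue=$((ue + $(cat "$mc/ue_count")))
    done
    echo "CE=$ce UE=$ue"
}
```

Possible use: run edac_counts from a nightly cron job and compare against the previous output.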

Intel E3-1270 v6

root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~# 

Intel E3-1245 v6

[root@alphagdaq ~]# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
[root@alphagdaq ~]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@alphagdaq ~]# edac-util
edac-util: No errors to report.
[root@alphagdaq ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
[root@alphagdaq ~]# ras-mc-ctl --layout
          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
----------+-----------------------------------------------+
[root@alphagdaq ~]# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#3channel#0	0	0
mc#0csrow#2channel#1	0	0
mc#0csrow#3channel#1	0	0
mc#0csrow#0channel#0	0	0
mc#0csrow#1channel#1	0	0
mc#0csrow#0channel#1	0	0
mc#0csrow#1channel#0	0	0
mc#0csrow#2channel#0	0	0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
[root@alphagdaq ~]# ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
[root@alphagdaq ~]# 

AMD 3700X

(memory is non-ECC)

root@daq13:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util 
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent

(memory is ECC)

root@trinatdaq:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@trinatdaq:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@trinatdaq:~# edac-util 
edac-util: No errors to report.
root@trinatdaq:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
root@trinatdaq:~# 

AMD 5000G

  • no linux driver for AMD 5000-series "G" CPU
  • no mention of ECC in the BIOS settings
  • unclear status of ECC support in AMD documentation (says only "pro" "G" CPUs have ECC)
  • unclear status of ECC support in ASUS documentation (web page out of date)

AMD 5600X

root@daq17:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
root@daq17:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq17:~# edac-util
edac-util: No errors to report.
root@daq17:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@daq17:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
root@daq17:~# 

AMD 3955WX

root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
edac-ctl: drivers are loaded.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util 
edac-util: No errors to report.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
lrwxrwxrwx  1 root root    0 Dez  9 05:31 subsystem -> ../../../../bus/edac
-rw-r--r--  1 root root 4096 Dez  9 05:31 uevent
root@alphasuperdaq:~/git/scripts/quotareport# 
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
root@alphasuperdaq:~# 
root@alphasuperdaq:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                              mc0                                                                                              |
    |                                            csrow0                                             |                                            csrow1                                             |
    | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

0: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@alphasuperdaq:~# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#0channel#2	0	0
mc#0csrow#1channel#7	0	0
mc#0csrow#0channel#3	0	0
mc#0csrow#1channel#4	0	0
mc#0csrow#1channel#2	0	0
mc#0csrow#0channel#7	0	0
mc#0csrow#1channel#3	0	0
mc#0csrow#0channel#4	0	0
mc#0csrow#1channel#1	0	0
mc#0csrow#1channel#0	0	0
mc#0csrow#1channel#5	0	0
mc#0csrow#0channel#6	0	0
mc#0csrow#0channel#1	0	0
mc#0csrow#0channel#5	0	0
mc#0csrow#0channel#0	0	0
mc#0csrow#1channel#6	0	0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@alphasuperdaq:~#

AMD 7700X

root@dsfe05:~# apt install edac-utils
root@dsfe05:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro H13SAE-MF
root@dsfe05:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@dsfe05:~# edac-util
edac-util: No errors to report.
root@dsfe05:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@dsfe05:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 May 14 09:33 ce_count
-r--r--r-- 1 root root 4096 May 14 09:33 ce_noinfo_count
-r--r--r-- 1 root root 4096 May 14 09:33 max_location
-r--r--r-- 1 root root 4096 May 14 09:33 mc_name
drwxr-xr-x 2 root root    0 May 14 09:33 power
drwxr-xr-x 3 root root    0 May 14 09:33 rank4
drwxr-xr-x 3 root root    0 May 14 09:33 rank5
--w------- 1 root root 4096 May 14 09:33 reset_counters
-r--r--r-- 1 root root 4096 May 14 09:33 seconds_since_reset
-r--r--r-- 1 root root 4096 May 14 09:33 size_mb
-r--r--r-- 1 root root 4096 May 14 09:33 ue_count
-r--r--r-- 1 root root 4096 May 14 09:33 ue_noinfo_count
-rw-r--r-- 1 root root 4096 May 14 09:33 uevent
root@dsfe05:~# 

Configure rasdaemon

apt install rasdaemon
systemctl enable rasdaemon
systemctl restart rasdaemon
systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
   Main PID: 2477175 (rasdaemon)
      Tasks: 1 (limit: 76958)
     Memory: 17.1M
     CGroup: /system.slice/rasdaemon.service
             └─2477175 /usr/sbin/rasdaemon -f -r

Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events

Get reports

  • Intel 2x32GB ECC DIMMs
root@daq00:~# ras-mc-ctl --layout
          +-------------------------+
          |           mc0           |
          |   csrow0   |   csrow1   |
----------+-------------------------+
channel1: |  16384 MB  |  16384 MB  |
channel0: |  16384 MB  |  16384 MB  |
----------+-------------------------+
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#1channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#0channel#1    0       0
root@daq00:~# 
  • Intel 4x16GB ECC DIMMs
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#0channel#1    0       0
mc#0csrow#2channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#2channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#1channel#1    0       0
mc#0csrow#3channel#0    0       0
mc#0csrow#3channel#1    0       0
root@daq00:~# 
root@daq00:~# ras-mc-ctl --layout
          +-----------------------+
          |          mc0          |
          |  csrow0   |  csrow1   |
----------+-----------------------+
channel1: |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |
----------+-----------------------+
root@daq00:~# 
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@daq00:~# 

Note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.

  • AMD 7700 2x32GB DDR5 ECC DIMMs
root@dsfe05:~# systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-05-14 09:36:43 PDT; 33ms ago
    Process: 4088418 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
   Main PID: 4088417 (rasdaemon)
      Tasks: 1 (limit: 37300)
     Memory: 788.0K
        CPU: 5ms
     CGroup: /system.slice/rasdaemon.service
             └─4088417 /usr/sbin/rasdaemon -f -r

May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:aer_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:aer_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: mce:mce_record event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event mce:mce_record
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:extlog_mem_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:extlog_mem_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mc_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording aer_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording extlog_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mce_record events
root@dsfe05:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 907.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 908.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 911.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
    +-----------------------------------------------------------------------------------------------+
    |                                              mc0                                              |
    |        csrow0         |        csrow1         |        csrow2         |        csrow3         |
    | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  |
----+-----------------------------------------------------------------------------------------------+

0: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----+-----------------------------------------------------------------------------------------------+
root@dsfe05:~# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#2channel#1	0	0
mc#0csrow#2channel#0	0	0
root@dsfe05:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.
root@dsfe05:~# 
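
The "ras-mc-ctl --error-count" table can be filtered so that only non-zero counters are shown, for example from a periodic check. This is a sketch under the assumption that the output keeps the three-column "Label CE UE" layout seen in the transcripts above; the function name nonzero_errors is made up for this example.

```shell
#!/bin/sh
# Read `ras-mc-ctl --error-count` output on stdin and print every row whose
# corrected (CE) or uncorrected (UE) count is non-zero.
# Exit status: 0 if all counters are zero, 1 if any counter is non-zero.
nonzero_errors() {
    awk '$1 != "Label" && NF >= 3 && ($2 + 0 > 0 || $3 + 0 > 0) { print; bad = 1 }
         END { exit (bad ? 1 : 0) }'
}

# Only run the real command where rasdaemon is installed.
if command -v ras-mc-ctl >/dev/null 2>&1; then
    if ras-mc-ctl --error-count | nonzero_errors; then
        echo "no memory errors counted"
    else
        echo "memory errors counted, see rows above"
    fi
fi
```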

sensors

VME CPU V7865

add to /etc/rc.local

modprobe coretemp
modprobe lm75
modprobe lm90
modprobe max1668
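
Appending these lines by hand, or with the "echo ... >> /etc/rc.local" form used for other boards on this page, adds a duplicate every time the step is repeated. A small sketch of an append that is safe to rerun; the helper name add_line_once is hypothetical.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present,
# so repeating the setup does not add the same modprobe line twice.
add_line_once() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, to be run as root:
#   for m in coretemp lm75 lm90 max1668; do
#       add_line_once /etc/rc.local "modprobe $m"
#   done
```

The example loop is left as a comment so that running the block does not modify /etc/rc.local.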

available sensors:

root@lxdaq23:~# sensors
max6657-i2c-0-4c
Adapter: SMBus I801 adapter at 0400
temp1:        +29.1°C  (low  = -55.0°C, high = +105.0°C)
                       (crit = +105.0°C, hyst = +95.0°C)
temp2:        +31.5°C  (low  = -55.0°C, high = +105.0°C)
                       (crit = +105.0°C, hyst = +95.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +25.0°C  (crit = +100.0°C)
Core 1:       +25.0°C  (crit = +100.0°C)

max1805-i2c-0-18
Adapter: SMBus I801 adapter at 0400
temp1:        +35.0°C  (low  = -55.0°C, high = +127.0°C)
temp2:        +63.0°C  (low  = -55.0°C, high = +127.0°C)
temp3:          FAULT  (low  = -55.0°C, high = +127.0°C)  ALARM (HIGH)

lm75-i2c-0-48
Adapter: SMBus I801 adapter at 0400
temp1:        +34.5°C  (high = +80.0°C, hyst = +75.0°C)

root@lxdaq23:~# 

ASUS P7P55D EVO

  • BIOS version 2101
root@iris01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +34.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +37.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:       +38.0°C  (high = +83.0°C, crit = +99.0°C)
Core 3:       +35.0°C  (high = +83.0°C, crit = +99.0°C)

nouveau-pci-0100
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.05 V)
temp1:        +46.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      864.00 mV (min =  +0.80 V, max =  +1.60 V)
+3.3V Voltage:        3.38 V  (min =  +2.97 V, max =  +3.63 V)
+5V Voltage:          5.04 V  (min =  +4.50 V, max =  +5.50 V)
+12V Voltage:        12.15 V  (min = +10.20 V, max = +13.80 V)
CPU Fan Speed:       968 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis1 Fan Speed: 1288 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis2 Fan Speed: 1316 RPM  (min =  600 RPM, max = 7200 RPM)
Power Fan Speed:       0 RPM  (min =    0 RPM, max = 7200 RPM)
CPU Temperature:     +34.0°C  (high = +45.0°C, crit = +45.5°C)
MB Temperature:      +30.0°C  (high = +45.0°C, crit = +46.0°C)

root@iris01:~# 

ASUS Z97-WS

  • BIOS version 2704
  • load sensors drivers
echo modprobe coretemp >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
  • in /etc/default/grub, add: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no"
  • update grub and reboot: grub-mkconfig -o /boot/grub/grub.cfg
root@isdaq08:~# sensors
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  
temp2:        +29.8°C  

nct6791-isa-0290
Adapter: ISA adapter
Vcore:                 888.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                     1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:                    3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:                   3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                     1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                     1.99 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                     0.00 V  (min =  +0.00 V, max =  +0.00 V)
3VSB:                    3.44 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:                    3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                     1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)
in11:                  840.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)
in13:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    0.00 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                  1041 RPM  (min =    0 RPM)
fan2:                  1040 RPM  (min =    0 RPM)
fan3:                     0 RPM  (min =    0 RPM)
fan4:                     0 RPM  (min =    0 RPM)
fan5:                     0 RPM  (min =    0 RPM)
fan6:                     0 RPM  (min =    0 RPM)
SYSTIN:                 +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:                 +41.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:               -128.0°C    sensor = thermistor
AUXTIN1:               -128.0°C    sensor = thermistor
AUXTIN2:                +35.0°C    sensor = thermistor
AUXTIN3:               +127.0°C    sensor = thermistor
PECI Agent 0:           +41.0°C  
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
PCH_CHIP_TEMP:           +0.0°C  
PCH_CPU_TEMP:            +0.0°C  
PCH_MCH_TEMP:            +0.0°C  
PCH_DIM0_TEMP:           +0.0°C  
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:           disabled

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +39.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +39.0°C  (high = +80.0°C, crit = +100.0°C)

root@isdaq08:~# 

ASUS Z170-DELUXE

  • BIOS version 3801
  • load sensors drivers
echo modprobe coretemp >> /etc/rc.local
echo modprobe jc42 >> /etc/rc.local
echo modprobe lm92 >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
  • in /etc/default/grub, add: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no"
  • update grub and reboot: grub-mkconfig -o /boot/grub/grub.cfg
root@iris00:~# sensors
nct6793-isa-0290
Adapter: ISA adapter
in0:                      600.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      144.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                        0.00 V  (min =  +0.00 V, max =  +0.00 V)
in7:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     600.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     592.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     968.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1370 RPM  (min =    0 RPM)
fan2:                     1437 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +32.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +42.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  -128.0°C    sensor = thermistor
AUXTIN1:                   +50.0°C    sensor = thermistor
AUXTIN2:                   +22.0°C    sensor = thermistor
AUXTIN3:                   +28.0°C    sensor = thermistor
PECI Agent 0:              +50.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +42.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
PCH_MCH_TEMP:               +0.0°C  
TSI2_TEMP:                +3892314.0°C  
TSI3_TEMP:                +3892314.0°C  
TSI4_TEMP:                +3892314.0°C  
TSI5_TEMP:                +3892314.0°C  
TSI6_TEMP:                +3892314.0°C  
TSI7_TEMP:                +3892314.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

jc42-i2c-0-1a
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-18
Adapter: SMBus I801 adapter at f040
temp1:        +34.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-1b
Adapter: SMBus I801 adapter at f040
temp1:        +35.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-19
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +48.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +84.0°C, crit = +100.0°C)

root@iris00:~# 
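
After the reboot, the kernel command line shows whether the acpi_enforce_resources=no setting from the grub step took effect. A minimal sketch; the function name has_kernel_param and its optional file argument are assumptions made for this example.

```shell
#!/bin/sh
# Succeed if the given parameter appears as a whole word in the kernel
# command line (default source: /proc/cmdline).
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}

if has_kernel_param acpi_enforce_resources=no; then
    echo "acpi_enforce_resources=no is active"
else
    echo "acpi_enforce_resources=no is not on the kernel command line"
fi
```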

ASUS Z390M-PRO GAMING (WI-FI)

  • BIOS 3006
  • load sensors drivers
echo modprobe coretemp >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
root@daq18:~# sensors
nct6798-isa-0290
Adapter: ISA adapter
in0:                      696.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)
in6:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.17 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.07 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                     1131 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                     1006 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +32.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN0:                   +25.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN1:                    +7.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN2:                    +8.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN3:                   +24.0°C  (high = +80.0°C, hyst = +75.0°C)
                                    (crit = +100.0°C)  sensor = thermistor
AUXTIN4:                   +83.0°C  (high = +80.0°C, hyst = +75.0°C)  ALARM
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +29.0°C  (high = +80.0°C, hyst = +75.0°C)
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
PCH_MCH_TEMP:               +0.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +39.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +39.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +33.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +32.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +31.0°C  (high = +82.0°C, crit = +100.0°C)
Core 4:        +31.0°C  (high = +82.0°C, crit = +100.0°C)
Core 5:        +30.0°C  (high = +82.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +28.0°C  

root@daq18:~# 

ASUS H110M-A/M.2

  • BIOS version 4202
  • echo modprobe coretemp >> /etc/rc.local
  • echo modprobe nct6775 >> /etc/rc.local
root@midpol:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +30.0°C  (high = +80.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)

nct6793-isa-0290
Adapter: ISA adapter
in0:                      368.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      928.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     128.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1004 RPM  (min =    0 RPM)
fan2:                     1143 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                   +118.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +30.0°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +110.0°C    sensor = thermistor
PECI Agent 0:              +31.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +36.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
TSI2_TEMP:                +3892314.0°C  
TSI3_TEMP:                +3892314.0°C  
TSI4_TEMP:                +3892314.0°C  
TSI5_TEMP:                +3892314.0°C  
TSI6_TEMP:                +3892314.0°C  
TSI7_TEMP:                +3892314.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

root@midpol:~# 

ASUS P9X79 WS

root@daq14:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +29.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +24.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +82.0°C, crit = +100.0°C)

nouveau-pci-0200
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore:           1.04 V  (min =  +0.00 V, max =  +1.74 V)
in1:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:           3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:             2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:           904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:            3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:          1265 RPM  (min =    0 RPM)
fan2:          1909 RPM  (min =    0 RPM)
fan3:             0 RPM  (min =    0 RPM)
fan4:             0 RPM  (min =    0 RPM)
fan5:             0 RPM  (min =    0 RPM)
SYSTIN:         +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:         +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:         +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PECI Agent 0:   +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit = +96.0°C)
PCH_CHIP_TEMP:   +0.0°C  
PCH_CPU_TEMP:    +0.0°C  
PCH_MCH_TEMP:    +0.0°C  
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:   disabled

root@daq14:~# 

ASUS TUF GAMING B550M-PLUS WIFI II

  • BIOS 2803, 2806
  • echo modprobe nct6775 >> /etc/rc.local
root@midm9a:~# sensors
nct6798-isa-0290
Adapter: ISA adapter
in0:                      488.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                      760 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan7:                     1264 RPM  (min =    0 RPM)
SYSTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +22.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +95.0°C    sensor = thermistor
AUXTIN1:                   +25.0°C    sensor = thermistor
AUXTIN2:                   +25.0°C    sensor = thermistor
AUXTIN3:                   +25.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +23.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +32.4°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

amdgpu-pci-0800
Adapter: PCI adapter
vddgfx:        1.45 V  
vddnb:       993.00 mV 
edge:         +28.0°C  
PPT:          20.00 W  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +33.4°C  

root@midm9a:~# 

ASUS ROG STRIX B550-XE GAMING WIFI

  • BIOS 2423, 2604
  • echo modprobe nct6775 >> /etc/rc.local
root@daq13:~# sensors
nct6798-isa-0290
Adapter: ISA adapter
in0:                      344.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                      845 RPM  (min =    0 RPM)
fan2:                      998 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +28.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +27.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +94.0°C    sensor = thermistor
AUXTIN1:                   +28.0°C    sensor = thermistor
AUXTIN2:                   +28.0°C    sensor = thermistor
AUXTIN3:                   +97.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +27.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +33.6°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.45 V  
vddnb:       999.00 mV 
edge:         +29.0°C  
PPT:          14.00 W  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +33.9°C  

root@daq13:~# 

ASUS ROG STRIX B550-E GAMING

  • BIOS 2803
  • echo modprobe jc42 >> /etc/rc.local
  • echo modprobe nct6775 >> /etc/rc.local
root@daq17:~# sensors
jc42-i2c-1-1b
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +28.0°C  

nouveau-pci-0800
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nct6798-isa-0290
Adapter: ISA adapter
in0:                      288.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                      843 RPM  (min =    0 RPM)
fan2:                      629 RPM  (min =    0 RPM)
fan3:                      746 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +93.0°C    sensor = thermistor
AUXTIN1:                   +22.0°C    sensor = thermistor
AUXTIN2:                   +22.0°C    sensor = thermistor
AUXTIN3:                   +96.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +25.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +27.6°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

jc42-i2c-1-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +23.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

asusec-isa-0000
Adapter: ISA adapter
CPU_Opt:        0 RPM
Chipset:      +34.0°C  
CPU:          +25.0°C  
Motherboard:  +22.0°C  
T_Sensor:     -40.0°C  
VRM:          +31.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +28.0°C  
Tccd1:        +27.5°C  

root@daq17:~# 

ASUS PRIME B650-PLUS

  • BIOS 1811
  • echo modprobe nct6775 >> /etc/rc.local
root@dsdaqgw:~# sensors
amdgpu-pci-0b00
Adapter: PCI adapter
vddgfx:      930.00 mV 
vddnb:         1.19 V  
edge:         +38.0°C  
PPT:          25.10 W  

nct6799-isa-0290
Adapter: ISA adapter
in0:                      920.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                     1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +78.0°C    sensor = thermistor
AUXTIN1:                   +11.0°C    sensor = thermistor
AUXTIN2:                   +20.0°C    sensor = thermistor
AUXTIN3:                   +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +42.6°C  
intrusion0:               ALARM
intrusion1:               OK
beep_enable:              disabled

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +42.6°C  
Tccd1:        +36.4°C  

root@dsdaqgw:~# 
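The "echo modprobe ... >> /etc/rc.local" lines above append a new line every time they are run, so repeating the setup leaves duplicates in the file. A minimal sketch of an idempotent variant follows; the function name add_modprobe and its optional file argument are hypothetical, not part of any package.

```shell
# Append "modprobe <module>" to an rc.local-style file,
# unless exactly that line is already present.
# usage: add_modprobe <module> [file]   (file defaults to /etc/rc.local)
add_modprobe() (
    module=$1
    file=${2:-/etc/rc.local}
    line="modprobe $module"
    # -x: match the whole line, -F: fixed string, -q: no output
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
)
```

Example: "add_modprobe coretemp; add_modprobe nct6775" can be run any number of times and leaves one line per module.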

Enable CPU turbo mode

  • An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a turbo-boost frequency (e.g. 4.0 GHz). Here we enable the turbo-boost mode.
  • Find out CPU capability
root@daq01:~# lscpu | grep Hz
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:                         3965.803
CPU max MHz:                     4000.0000
CPU min MHz:                     800.0000
root@daq01:~# 
  • Look up this CPU in the Intel ARK database (search for the CPU model name), e.g.

https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html

  • Find current frequency settings:
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
root@daq01:~# 
  • Note the following:
    • current governor is "powersave"
    • "performance" governor is available
    • "boost state support" is supported and active.
  • Confirm CPU frequency governor:
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
root@daq01:~# 
  • Change governor to "performance":
root@daq01:~# cpupower frequency-set --governor performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
  • monitor CPU frequency:
root@daq01:~# cpupower monitor
    | Nehalem                   || Mperf              || Idle_Stats                                     
 CPU| C3   | C6   | PC3  | PC6   || C0   | Cx   | Freq  || POLL | C1   | C1E  | C3   | C6   | C7s  | C8    
   0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
   4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
   1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
   5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
   2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
   6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
   3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
   7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~# 
  • check that the CPU is not overheating:
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
  • congratulations, we are running at 4 GHz now!
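"cpupower frequency-set" only changes the running system, so it is worth re-checking the governor later, for example after a reboot. The sketch below wraps the "cat .../scaling_governor" check from above in a function; the name check_governor and its two optional arguments are made up for illustration.

```shell
# Print every CPU whose governor differs from the wanted one;
# the exit status is non-zero if any CPU differs.
# usage: check_governor [wanted] [sysfs cpu dir]
check_governor() (
    wanted=${1:-performance}
    base=${2:-/sys/devices/system/cpu}
    rc=0
    for f in "$base"/cpu*/cpufreq/scaling_governor; do
        # skip entries without a governor file (cpufreq/, cpuidle/, unmatched glob)
        [ -r "$f" ] || continue
        gov=$(cat "$f")
        if [ "$gov" != "$wanted" ]; then
            echo "$f: $gov"
            rc=1
        fi
    done
    exit $rc
)
```

Example: "check_governor || cpupower frequency-set --governor performance" re-applies the setting only when some CPU is not in "performance".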

Setup ubuntu as gateway to private network

See also:

Steps to do

!!! UPDATED 16feb2024 Ubuntu-22.04.3 !!!

  • assign network numbers to the private network, i.e. 192.168.1.x, 192.168.2.x, etc
  • (on the gateway machine, each private network interface has to have a different network number)
  • (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
  • assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps"
  • (for simplicity, assign 192.168.1.1 to the gateway machine itself)
  • (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
  • setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
  • setup DHCP server (dnsmasq) to give out the IP addresses
  • setup TFTP server (dnsmasq), pxelinux and NFS for diskless booting
  • setup time server (chronyd) to provide common time to all devices
  • setup NAT so machines on private network can access the internet (to get OS updates, etc)
  • setup NIS and NFS so machines on the private network can use common home directories
  • setup rsync backup of machines on the private network

setup hosts

  • edit /etc/hosts
192.168.1.101 dsfe01
... and so forth
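A quick way to catch the "special" addresses mentioned in the steps above is to scan the hosts file for entries ending in .0 or .255. This is only a sketch: it assumes /24 networks like the 192.168.x.0 examples here, and check_hosts is a hypothetical helper name.

```shell
# Print hosts-file entries whose IPv4 address ends in .0 or .255
# (network and broadcast address of a /24 network).
# usage: check_hosts [file]   (file defaults to /etc/hosts)
check_hosts() (
    file=${1:-/etc/hosts}
    awk '
        /^[ \t]*#/ || NF < 2 { next }     # skip comments, blank and incomplete lines
        $1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            split($1, octet, ".")
            if (octet[4] == 0 || octet[4] == 255)
                print "special address: " $0
        }
    ' "$file"
)
```

No output means no entry uses a network or broadcast address.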

setup dns and dhcp

!!! updated 16feb2024 for Ubuntu 22.04.3 !!!

!!! note: stock systemd-resolved remains; it is configured to forward queries to dnsmasq, which is configured to forward queries to TRIUMF DNS !!!

!!! note: per authors of systemd, bare hostnames are not permitted, a DNS domain name must always be used. DNS domain name "dsdaq" is used in this example !!!

  • apt install dnsmasq
  • ensure dnsmasq starts after all interfaces are up (Ubuntu-22)
mkdir /etc/systemd/system/dnsmasq.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/dnsmasq.service.d/local.conf
  • edit /etc/dnsmasq.conf
# /etc/dnsmasq.conf
# DNS settings 
#port=0 # disable DNS function 
port=53 # enable DNS function 
bind-interfaces # do not collide with systemd-resolved, we use 127.0.0.1:53, they use 127.0.0.53:53 
domain-needed 
bogus-priv 
no-resolv 
#log-queries # log DNS queries
 
# TRIUMF DNS settings 
 
server=142.90.100.19 
expand-hosts 
domain=dsdaq 
local=/dsdaq/ 
localmx # do not forward MX queries to TRIUMF 

# DHCP settings 
interface=enp1s0f0 # VX network 192.168.0.x 
#interface=missing  # FEP and TSP network 192.168.1.x 
interface=enp1s0f1 # controls network 192.168.2.x 
#dhcp-range=192.168.1.50,192.168.1.150,infinite 
dhcp-range=192.168.0.0,static 
dhcp-range=192.168.2.0,static 
log-dhcp # log DHCP queries 
#quiet-dhcp 
dhcp-ignore=tag:!known 
#dhcp-boot=pxelinux.0 
 
dhcp-option=option:dns-server,192.168.0.248 
dhcp-option=option:ntp-server,192.168.0.248 
 
# TFTP settings 
 
enable-tftp 
tftp-root=/tftpboot 
  • #mkdir /tftpboot ### per tftp-root (if no ZFS)
  • zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
  • create resolved-dsdaq.conf with main IP address of dnsmasq
[Resolve]
DNS=192.168.0.248
Domains=dsdaq triumf.ca
  • mkdir -p /etc/systemd/resolved.conf.d/
  • /bin/rm -f /etc/systemd/resolved.conf.d/*.conf
  • cp resolved-dsdaq.conf /etc/systemd/resolved.conf.d/
  • systemctl stop systemd-resolved.service
  • systemctl disable systemd-resolved.service
  • systemctl enable dnsmasq
  • systemctl restart dnsmasq
  • try to "ping" or "host" some names from /etc/hosts, it should work
  • try to ping daq00, daq00.triumf.ca, all should work
  • resolved-dsdaq.conf goes into /etc/systemd/resolved.conf.d/ of all machines on the private network
  • if not using systemd-resolved, edit /etc/resolv.conf
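The "After=network-online.target" drop-in created at the start of this section is needed again for other services later on this page, so it can be wrapped in a small function. A minimal sketch; the function name write_after_online and the optional directory argument are hypothetical.

```shell
# Write a systemd drop-in that orders <service> after network-online.target.
# usage: write_after_online <service> [systemd dir]
#        (the directory defaults to /etc/systemd/system)
write_after_online() (
    service=$1
    root=${2:-/etc/systemd/system}
    dir="$root/$service.service.d"
    mkdir -p "$dir"
    printf '[Unit]\nAfter=network-online.target\n' > "$dir/local.conf"
)
```

Example: "write_after_online dnsmasq", followed by "systemctl daemon-reload" so that systemd picks up the new drop-in.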

setup chronyd

  • enable ntp server:
  • disable systemd-timesyncd, configure and enable chronyd per instructions above
  • create dsdaq.conf
# chrony config for dsdaq server

#allow 192.168.0.0
#allow 192.168.1.0
#allow 192.168.2.0
allow all

# end
  • cp dsdaq.conf /etc/chrony/conf.d/
  • systemctl restart chronyd
  • chronyc tracking ### wait until time is synchronized (a few seconds)
  • create dsdaq.sources # use hostname or IP address of chronyd server
# Put this file in /etc/chrony/sources.d
# systemctl restart chrony
# chronyc sources
# chronyc tracking
server dsdaqgw iburst prefer
# end
  • dsdaq.sources goes to /etc/chrony/sources.d of all machines on the private network
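The "wait until time is synchronized" step can be scripted by looking at the "Leap status" line of "chronyc tracking", which reads "Normal" once the clock is synchronized. A sketch under that assumption; is_synchronized is a made-up helper name and reads the tracking output on stdin.

```shell
# Read "chronyc tracking" output on stdin;
# succeed only if the leap status line reports "Normal".
is_synchronized() {
    grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal'
}

# usage (not run here):
#   until chronyc tracking | is_synchronized; do sleep 1; done
```

Reading from stdin keeps the function independent of chronyc, so it can also be applied to saved output from another machine.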

setup diskless network booting

setup pxelinux for legacy pxe boot

  • add bits in dnsmasq.conf
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
dhcp-boot=pxelinux.0
dhcp-option=17,"192.168.0.251:/nfsroot/%s,vers=3,tcp"
  • setup pxelinux for Ubuntu-18
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
  • cd /zssd/tftpboot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
  • mkdir pxelinux.cfg
  • emacs -nw pxelinux.cfg/default
default menu.c32
prompt 0

menu title Welcome to the DSVSLICE PXE boot menu

timeout 50

label hdt
  kernel hdt.c32

label memtest86+-5.01 
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz 

label memtest86+-4.20
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip

label vmlinuz-5.3.0-26-generic
  menu default
  kernel vmlinuz-5.3.0-26-generic
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0

#end
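With "dhcp-ignore=tag:!known" every machine needs its own dhcp-host line in dnsmasq.conf. When there are many machines, the lines can be generated from a list of "MAC hostname" pairs. A minimal sketch; make_dhcp_hosts is a hypothetical helper, and the "infinite" lease time is copied from the dhcp-host example above.

```shell
# Read "MAC hostname" pairs on stdin and print one dnsmasq
# dhcp-host line per pair; malformed MAC addresses are reported
# on stderr and skipped.
make_dhcp_hosts() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }     # skip comments, blank and incomplete lines
        {
            mac = tolower($1)
            n = split(mac, part, ":")
            ok = (n == 6)
            for (i = 1; i <= n; i++)
                if (part[i] !~ /^[0-9a-f][0-9a-f]$/) ok = 0
            if (!ok) {
                print "bad MAC address: " $1 > "/dev/stderr"
                next
            }
            print "dhcp-host=" mac "," $2 ",infinite"
        }
    '
}
```

Example: "echo AC:1F:6B:9E:7F:4A dsfe01 | make_dhcp_hosts" prints a line that can be pasted into dnsmasq.conf.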

setup pxelinux for efi pxe boot

  • https://c-nergy.be/blog/?p=13808
  • add dnsmasq.conf bits. note: to use dhcp root-path, see the "nfsroot=auto" patch below and make sure to use the "dhcp-option-force" command (mkinitramfs dhcp client does not ask for root-path, we have to force-feed it).
# uefi pxe

dhcp-boot=tag:uefipxe,uefi/syslinux.efi
dhcp-option-force=tag:fe01,option:root-path,192.168.0.248:/nfsroot/fe01

# VX network 192.168.0.x

dhcp-host=40:a6:b7:c1:d9:c5,fe01,infinite,set:uefipxe,set:fe01
  • apt install syslinux pxelinux syslinux-common syslinux-efi syslinux-utils
mkdir /tftpboot/uefi
cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/menu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/hdt.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libutil.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libmenu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libcom32.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libgpl.c32 /tftpboot/uefi/
  • try to boot, it should bomb with "cannot load pxelinux.cfg/default"
  • mkdir /tftpboot/uefi/pxelinux.cfg
  • create /tftpboot/uefi/pxelinux.cfg/default, note nfsroot path is hardwired, note "http:" is used to load vmlinuz and initrd files (because tftp is super slow)
default menu.c32
prompt 0

menu title Welcome to the DSDAQGW UEFI PXE boot menu

timeout 50

label vmlinuz-6.5.0-17-generic
  kernel http://192.168.0.248:8088/uefi/vmlinuz-6.5.0-17-generic
  append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60

# append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60

#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto ip=dhcp rw panic=60

#end
  • install and configure mini-httpd to serve the vmlinuz and initrd files over http
apt install mini-httpd
emacs -nw /etc/default/mini-httpd # set "START=1"
emacs -nw /etc/mini-httpd.conf # set "host=192.168.0.248", "port=8088", "data_dir=/tftpboot"
mkdir /etc/systemd/system/mini-httpd.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/mini-httpd.service.d/local.conf
systemctl enable mini-httpd
systemctl restart mini-httpd
systemctl status mini-httpd
wget http://192.168.0.248:8088/uefi/syslinux.efi
tail -100 /var/log/mini_httpd.log
  • fix U-22 initramfs bug for "nfsroot=auto", otherwise, "nfsroot=" has to be different for each machine and you have to have separate pxelinux config files for each machine
    • emacs -nw /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config
    • add "echo ROOTPATH=..." if it is missing
                echo "ROOTSERVER='${new_routers%% *}'" 
                echo "ROOTPATH='$new_root_path'" 
                echo "HOSTNAME='$new_host_name'" 
  • fix U-24 initramfs bug for "nfsroot=auto", otherwise, "nfsroot=" has to be different for each machine and you have to have separate pxelinux config files for each machine
    • emacs -nw /usr/share/initramfs-tools/dhcpcd-hooks/70-net-conf
    • add "ROOTPATH=..." if it is missing
DNSDOMAIN='${new_domain_name-}'
ROOTSERVER='${new_routers-}'
ROOTPATH='${new_root_path-}'
filename='${new_filename-}'
DHCPLEASETIME='${new_dhcp_lease_time-}'
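  • rebuild the initramfs so it includes the fix (use the installed kernel version; mkinitramfs needs the "-o" output file)
mkinitramfs -o /boot/initrd.img-6.5.0-18-generic 6.5.0-18-generic
mkinitramfs -o /boot/initrd.img-6.8.0-51-generic 6.8.0-51-generic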
  • copy linux kernel and initrd
cp /boot/vmlinuz-6.5.0-18-generic /tftpboot/uefi/
cp /boot/initrd.img-6.5.0-18-generic /tftpboot/uefi/
chmod a+r /tftpboot/uefi/*
  • try to boot, should bomb with messages about "trying to mount root filesystem"
  • tail /var/log/syslog
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  2
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: error 8 User aborted the transfer received from 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  2
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/ldlinux.e64 to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/01-40-a6-b7-c1-d9-c5 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006E not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A800 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A80 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/menu.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/libutil.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  2
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 12 hostname  fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw rpc.mountd[3350210]: authenticated mount request from 192.168.0.110:981 for /nfsroot/fe01 (/nfsroot/fe01)
Feb 16 20:45:07 dsdaqgw rpc.mountd[3350210]: authenticated unmount request from 192.168.0.110:859 for /nfsroot/fe01/tmp/autoDY4k5u (/nfsroot/fe01)
  • tail /var/log/mini_httpd.log
192.168.0.110 - - [16/Feb/2024:20:43:15 -0800] "GET /uefi/vmlinuz-6.5.0-17-generic HTTP/1.0" 200 14227944 "" "Syslinux/6.04"
192.168.0.110 - - [16/Feb/2024:20:43:24 -0800] "GET /uefi/initrd.img-6.5.0-17-generic HTTP/1.0" 200 137824833 "" "Syslinux/6.04"
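The "not found" lines in the syslog excerpt above show the order in which syslinux looks for its config file: first the MAC address, then the IP address in hex, shortened one digit at a time, then "default". A minimal sketch that prints the same list for a given machine, so that a per-machine config file can be named correctly. The function name pxelinux_cfg_names is made up for this example and is not part of syslinux; a lookup by client UUID, which syslinux may also try, is not covered.

```shell
#!/bin/sh
# pxelinux_cfg_names MAC IPV4
# Print the pxelinux.cfg file names a client tries, most specific first.
# (hypothetical helper, mirrors the lookups seen in the dnsmasq-tftp log)
pxelinux_cfg_names() {
    mac=$1
    ip=$2
    # 1. "01-" (Ethernet hardware type) + MAC in lower case, dash separated
    echo "01-$(echo "$mac" | tr 'A-F' 'a-f' | tr ':' '-')"
    # 2. IPv4 address as 8 upper-case hex digits, then drop one digit at a time
    oldifs=$IFS
    IFS=.
    set -- $ip
    IFS=$oldifs
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    while [ -n "$hex" ]; do
        echo "$hex"
        hex=${hex%?}
    done
    # 3. the fallback file
    echo default
}

# example: pxelinux_cfg_names 40:a6:b7:c1:d9:c5 192.168.0.110
```

Any of the printed names can be used as a file name under /tftpboot/uefi/pxelinux.cfg/ to give one machine its own boot menu.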

setup efi http boot

https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html

setup linux kernel

  • copy the kernel files
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /tftpboot/
  • cd /tftpboot
  • chmod a+r *

setup nfs

  • apt-get install nfs-kernel-server
  • enable NFS over UDP: edit /etc/nfs.conf and add "udp=y" to the [nfsd] section:
[nfsd]
udp=y
systemctl restart nfs-server.service
  • emacs -nw /etc/exports
/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
  • enable services
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
  • after editing /etc/exports, run
exportfs -av
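When many frontend machines are added, editing /etc/exports by hand gets repetitive. A minimal sketch of a helper that appends the export line used above only if it is not already there. The function name add_export is made up for this example; the export options are copied from the dsfe01 line.

```shell
#!/bin/sh
# add_export FILE DIR CLIENT
# Append "DIR CLIENT(options)" to FILE unless the identical line is already present.
# (hypothetical helper; options are the ones used for /nfsroot/dsfe01 above)
add_export() {
    file=$1
    dir=$2
    client=$3
    line="$dir $client(rw,no_root_squash,async,no_subtree_check)"
    # -x: whole line must match, -F: no regex interpretation of "(" and ")"
    if grep -qxF "$line" "$file" 2>/dev/null; then
        return 0
    fi
    echo "$line" >> "$file"
}

# example (as root): add_export /etc/exports /nfsroot/dsfe02 dsfe02 && exportfs -av
```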

setup userland

!!! ubuntu-18 version !!!

  • zfs create rpool/nfsroot
  • zfs set dedup=verify rpool/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
  • clone ubuntu
mkdir /nfsroot/dsfe01
cd /
rsync -avx . /nfsroot/dsfe01
  • edit config files:
  • cd /nfsroot/dsfe01
  • emacs -nw etc/hostname ### change to dsfe01
  • emacs -nw etc/mailname ### change to dsfe01
  • emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
  • emacs -nw etc/defaultdomain ### change to MUSR-NIS
  • cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
  • emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
  • emacs -nw root/.ssh/authorized_keys ### update root ssh keys
  • emacs -nw etc/fstab ### add this
192.168.1.1:/nfsroot/dsfe01 / nfs defaults,nolock 0 0
  • emacs -nw etc/chrony/chrony.conf
    • comment-out all "pool" and "server" entries
    • add entry "server 192.168.1.1 iburst"
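The chrony.conf edit above can also be scripted when cloning several images. A minimal sketch, assuming the stock file has its "pool" and "server" entries at the start of a line. The function name chrony_use_gateway is made up for this example. It is not idempotent: running it twice comments out the gateway entry and adds it again.

```shell
#!/bin/sh
# chrony_use_gateway FILE SERVER
# Comment out all active "pool" and "server" entries in FILE,
# then add "server SERVER iburst" at the end.
# (hypothetical helper for the chrony.conf edit described above)
chrony_use_gateway() {
    conf=$1
    server=$2
    tmp=$(mktemp) || return 1
    # "&" re-inserts the matched text, so "pool ..." becomes "#pool ..."
    sed -E 's/^[[:space:]]*(pool|server)[[:space:]]/#&/' "$conf" > "$tmp" || return 1
    echo "server $server iburst" >> "$tmp"
    # write back through cat so owner and mode of the original file are kept
    cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# example: cd /nfsroot/dsfe01 && chrony_use_gateway etc/chrony/chrony.conf 192.168.1.1
```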

After dsfe01 is booted:

  • disable services:
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache

To set up additional machines, clone dsfe01 instead of cloning the gateway machine.

Allow manpages to be viewed

If / is mounted over NFS, man will report a permission error. Fix it with:

ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man

setup shared home directory

on the gateway machine

  • define netgroups
  • emacs -nw /etc/netgroup
dsfe (dsfe01,,) (dsfe02,,)
  • emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
netgroup: files
  • export the home directories:
  • emacs -nw /etc/exports ### add this:
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
  • exportfs -rv
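Each netgroup member is a (host,user,domain) triple with the user and domain fields left empty. A minimal sketch that builds the /etc/netgroup line from a list of host names, which is handy when the list of frontends grows. The function name netgroup_line is made up for this example.

```shell
#!/bin/sh
# netgroup_line GROUP HOST...
# Print one /etc/netgroup line with a "(host,,)" triple per host.
# (hypothetical helper; output format follows the dsfe example above)
netgroup_line() {
    group=$1
    shift
    line=$group
    for host in "$@"; do
        line="$line ($host,,)"
    done
    echo "$line"
}

# example: netgroup_line dsfe dsfe01 dsfe02 dsfe03
```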

on the frontend machine

  • mkdir /home
  • emacs -nw /etc/fstab ### add this:
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
  • mount -a

setup NAT

NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation

In these examples:

  • replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
  • replace "enp11s0" with name of the private network interface (192.168.1.x network)
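  • emacs -nw /etc/rc.local ### add this, then make the file executable with "chmod a+x /etc/rc.local", otherwise systemd does not run it:
#!/bin/sh
# /etc/rc.local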

# enable NAT

/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v

# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v

iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward

sh /etc/firewall-rfc1918.sh

# end
  • emacs -nw /etc/firewall-rfc1918.sh
# firewall-rfc1918.sh

# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.

ETH=eno1

iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12   -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT

iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH

iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12   -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT

iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH

iptables -D FORWARD -j out-rfc1918 -o $ETH 
iptables -D FORWARD -j out-rfc1918 -o $ETH 
iptables -I FORWARD -j out-rfc1918 -o $ETH 

# allow TRIUMF-SECURE network

iptables -I in-rfc1918 -s 10.90.0.0/255.255.0.0 -j ACCEPT 
iptables -I out-rfc1918 -d 10.90.0.0/255.255.0.0 -j ACCEPT 

# show configuration

iptables -L -v

#end
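The three networks rejected by firewall-rfc1918.sh are the RFC1918 private ranges. A minimal sketch of a helper that tells whether a given IPv4 address falls into one of them, useful for checking which side of the firewall an address belongs to. The function name is_rfc1918 is made up for this example; it does not model the 10.90.0.0/16 TRIUMF-SECURE exception and does not validate its input.

```shell
#!/bin/sh
# is_rfc1918 IPV4
# Return 0 if the address is in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16.
# (hypothetical helper, same three ranges as the firewall script above)
is_rfc1918() {
    case "$1" in
        10.*|192.168.*)
            return 0
            ;;
        172.*)
            # 172.16.0.0/12 covers second octets 16 through 31
            second=${1#172.}
            second=${second%%.*}
            [ "$second" -ge 16 ] 2>/dev/null && [ "$second" -le 31 ]
            return $?
            ;;
    esac
    return 1
}

# example: is_rfc1918 192.168.1.1 && echo "private address"
```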

KVM

apt install cpu-checker

root@daq13:~# kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
root@daq13:~# 

(if not, shut down, go into the BIOS settings and enable CPU virtualization)
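Before installing cpu-checker, the CPU flags can be checked directly. A minimal sketch that looks for the Intel (vmx) or AMD (svm) virtualization flag in /proc/cpuinfo. The function name cpu_has_virt is made up for this example. Unlike kvm-ok it does not check that /dev/kvm exists, and whether the flag is shown when virtualization is disabled in the BIOS can depend on the machine, so treat it only as a first check.

```shell
#!/bin/sh
# cpu_has_virt [CPUINFO_FILE]
# Return 0 if the "flags" line lists vmx (Intel) or svm (AMD).
# (hypothetical helper; the file argument defaults to /proc/cpuinfo)
cpu_has_virt() {
    cpuinfo=${1:-/proc/cpuinfo}
    grep -Eq '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' "$cpuinfo"
}

# example: cpu_has_virt && echo "CPU supports hardware virtualization"
```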

apt install virtinst ### will install many packages
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils

root@daq13:/home1/wheel# virsh list --all
 Id   Name           State
------------------------------
 1    ubuntu-guest   running

apt install virt-manager

virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'

the virtual machine will start and boot
to leave the console, press Ctrl+] (the virsh console escape character)

ssh wheel@daq13
virt-manager

run virt-install again without "--graphics none", then open the graphical console from virt-manager; the guest boots into the Ubuntu installer desktop

virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none

virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off

build image

dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### use ext3: the SL6 kernel fails to mount ext4 made here ("unknown ext4 options")
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp

on the guest, configure network: /etc/rc.local

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn

# end

virsh commands

virsh list --all

virsh start kvm-el7
virsh console kvm-el7
virsh destroy kvm-el7

virt-install ...
virsh undefine kvm-el7

virsh autostart kvm-ladd00
virsh dominfo kvm-ladd00

virtualize SL6 ladd00

  • on ladd00:
yum install dracut-network
mkinitrd /boot/initramfs-2.6.32-754.35.1.el6.x86_64-netboot.img 2.6.32-754.35.1.el6.x86_64
  • on daq00
zfs create rpool/kvm-ladd00
cd /kvm-ladd00
rsync -avx ladd00:/ . --exclude nfsroot
brctl addbr virbr0
ifconfig virbr0 192.168.1.1
echo '/kvm-ladd00 192.168.1.2(rw,no_root_squash,no_all_squash,async,no_subtree_check)' >> /etc/exports
exportfs -rv
  • create virtual machine
virt-install --name kvm-ladd00 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --network bridge=virbr0,model=virtio --boot kernel=/kvm-ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm-ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64-netboot.img,kernel_args="root=/dev/nfs ip=192.168.1.2:192.168.1.1:192.168.1.1:255.255.255.0:ladd00::off nfsroot=192.168.1.1:/kvm-ladd00,vers=3,tcp console=ttyS0,115200n8 serial rdshell" --graphics none --nodisks --check path_in_use=off
  • adjust kvm-ladd00 image
disable network manager
edit fstab
edit yp.conf
edit resolv.conf
edit root/.ssh/authorized_keys
enable rngd or /dev/random does not work, sshd does not work
  • virsh shutdown test24
  • virsh --connect qemu:///system start test24
  • virsh console test24 ### to exit, ctrl+[ or ctrl+]
  • virsh undefine test24
  • virsh autostart kvm-ladd00
  • virsh dominfo kvm-ladd00
root@daq00:~# virsh dominfo kvm-ladd00
Id:             1
Name:           kvm-ladd00
UUID:           1d1f8fed-8b65-4411-a51b-e0ecf359d2f1
OS Type:        hvm
State:          running
CPU(s):         2
CPU time:       27.7s
Max memory:     2097152 KiB
Used memory:    2097152 KiB
Persistent:     yes
Autostart:      enable
Managed save:   no
Security model: apparmor
Security DOI:   0
Security label: libvirt-1d1f8fed-8b65-4411-a51b-e0ecf359d2f1 (enforcing)
root@daq00:~# 
  • delete unused images in /var/lib/libvirt/images
  • virsh edit kvm-ladd00 # change boot command line, etc
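The kernel_args string given to virt-install above uses the kernel's "ip=client:server:gateway:netmask:hostname:device:autoconf" and "nfsroot=server:path,options" syntax. A minimal sketch that assembles the network part from its pieces, which makes it easier to reuse for another guest. The function name nfsroot_kernel_args is made up for this example; it assumes the NFS server is also the gateway, leaves the device field empty and turns autoconfiguration off, as in the kvm-ladd00 command. Console and rdshell arguments are not included.

```shell
#!/bin/sh
# nfsroot_kernel_args CLIENT_IP SERVER_IP NETMASK HOSTNAME EXPORT_DIR
# Print the root=, ip= and nfsroot= kernel arguments for an NFS-root guest.
# (hypothetical helper; gateway is assumed to be the NFS server)
nfsroot_kernel_args() {
    client=$1
    server=$2
    netmask=$3
    host=$4
    export_dir=$5
    echo "root=/dev/nfs ip=${client}:${server}:${server}:${netmask}:${host}::off nfsroot=${server}:${export_dir},vers=3,tcp"
}

# example: nfsroot_kernel_args 192.168.1.2 192.168.1.1 255.255.255.0 ladd00 /kvm-ladd00
```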

virtualize CentOS-7 daqstore

  • similar to ladd00 above:
  • on daqstore, install dracut-network (it was already installed) and build the initramfs:
yum install dracut-network
yum install qemu-kvm libvirt libvirt-python libguestfs-tools virt-install
# yum install busybox ### no rpm package?!?
dracut -a nfs -v /boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img 3.10.0-1160.119.1.el7.x86_64 --force
scp daqstore:/boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img /kvm-el7/boot/
  • on daq00
zfs create rpool/kvm-el7
cd /kvm-el7
rsync -avx daqstore:/ .
echo 192.168.1.3 kvm-el7 >> /etc/hosts
systemctl restart dnsmasq
echo '/kvm-el7 192.168.1.3(rw,no_root_squash,no_all_squash,async,no_subtree_check)' >> /etc/exports
exportfs -rv
  • manage virtual machine
virsh console kvm-el7
virsh destroy kvm-el7
virsh undefine kvm-el7
  • create virtual machine
virt-install --name kvm-el7 --os-variant centos7 --vcpus 2 --ram 2048 --import --network bridge=virbr0,model=e1000e --boot kernel=/kvm-el7/boot/vmlinuz-3.10.0-1160.119.1.el7.x86_64,initrd=/kvm-el7/boot/initramfs-3.10.0-1160.119.1.el7.x86_64-virt.img,kernel_args="root=/dev/nfs ip=192.168.1.3:192.168.1.1:192.168.1.1:255.255.255.0:kvm-el7::off nfsroot=192.168.1.1:/kvm-el7,vers=3,tcp rw console=ttyS0,115200n8 serial rdshell" --graphics none --nodisks --check path_in_use=off
  • adjust kvm-el7 image
disable network manager
edit fstab
edit hostname
disable selinux in /etc/sysconfig/selinux

UP TO HERE --- DNS does not work!!!

edit yp.conf
edit resolv.conf
edit root/.ssh/authorized_keys
enable rngd or /dev/random does not work, sshd does not work
  • virsh shutdown test24
  • virsh --connect qemu:///system start test24
  • virsh console test24 ### to exit, ctrl+[ or ctrl+]
  • virsh undefine test24
  • virsh autostart kvm-ladd00
  • virsh dominfo kvm-ladd00
root@daq00:~# virsh dominfo kvm-ladd00
Id:             1
Name:           kvm-ladd00
UUID:           1d1f8fed-8b65-4411-a51b-e0ecf359d2f1
OS Type:        hvm
State:          running
CPU(s):         2
CPU time:       27.7s
Max memory:     2097152 KiB
Used memory:    2097152 KiB
Persistent:     yes
Autostart:      enable
Managed save:   no
Security model: apparmor
Security DOI:   0
Security label: libvirt-1d1f8fed-8b65-4411-a51b-e0ecf359d2f1 (enforcing)
root@daq00:~# 
  • delete unused images in /var/lib/libvirt/images
  • virsh edit kvm-ladd00 # change boot command line, etc

ARM64 cross-compiler

Ubuntu-22

  • arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
  • install packages:
apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
  • run:
aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static
aarch64-linux-gnu-g++-12 -o fecdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 fecdm.o dsdm.o /home/dsdaqdev/packages_common/midas/linux-aarch64-remoteonly/lib/libmidas.a -pthread -lrt -lutil /nfsroot/gdm00/usr/lib/aarch64-linux-gnu/libi2c.a -static

Ubuntu-24

  • arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
  • install packages:
apt install g++-aarch64-linux-gnu
  • build:
aarch64-linux-gnu-g++ -c -o xvcserver_cdm.o -O2 -g -Wall -Wuninitialized -std=c++20 xvcserver_cdm.cxx 
aarch64-linux-gnu-g++ -o xvcserver_cdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 xvcserver_cdm.o -pthread -lrt -lutil -static

ARM cross-compiler

NOTE: updated for U-24

  • armv7 (Cyclone-V SoC, RPi3, MityARM CAMAC) machines (Debian-12 armhf target, Ubuntu 24.04 host)
  • install packages:
apt install g++-arm-linux-gnueabihf gcc-arm-linux-gnueabihf libc6-dev-armhf-cross
  • build MIDAS frontend (static linking)
arm-linux-gnueabihf-g++ -std=c++11 -Wall -Wuninitialized -g -O2 -I/home/dldaq/packages/midas/include -I/home/dldaq/packages/midas/mvodb -c koi2c.cxx
arm-linux-gnueabihf-g++ -o fedldb.exe -std=c++11 -Wall -Wuninitialized -g -O2 -I/home/dldaq/packages/midas/include -I/home/dldaq/packages/midas/mvodb fedldb.o koi2c.o /home/dldaq/packages/midas/linux-armv7-remoteonly/lib/libmidas.a -L/usr/arm-linux-gnueabihf/lib -L/nfsroot/dltdc/usr/lib/arm-linux-gnueabihf -static -lm -lz -lutil -lnsl -lpthread -lrt -li2c
/usr/lib/gcc-cross/arm-linux-gnueabihf/13/../../../../arm-linux-gnueabihf/bin/ld: /home/dldaq/packages/midas/linux-armv7-remoteonly/lib/libmidas.a(system.o): in function `ss_socket_connect_tcp(char const*, int, int*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)':
/home/dldaq/packages/midas/src/system.cxx:4984:(.text+0x252a): warning: Using 'getaddrinfo' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

32-bit intel cross-compiler

Ubuntu 22.04

apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386

NOTES:

  • "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
  • to cross-build 32-bit MIDAS, use "make linux32".
  • executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
  • executables cross-built on Ubuntu-22 run on 32-bit Debian-12.

Ubuntu 24.04

apt install gcc-i686-linux-gnu
apt install g++-i686-linux-gnu
apt install libstdc++-13-dev:i386
apt install lib32z1 lib32z1-dev
i686-linux-gnu-gcc -o ttcp.i386 ttcp.c

NOTES:

  • executables cross-built on Ubuntu-24 will NOT run on 32-bit Debian-12 (GLIBC mismatch, static executables may work)
  • executables cross-built on Ubuntu-24 run on 32-bit Debian-13
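With several cross-compilers installed it is easy to copy the wrong binary to a target. A minimal sketch that reads the e_machine field of the ELF header and prints the architecture, using only od. The function name elf_machine is made up for this example; it looks at the low byte only, which is enough for the little-endian targets on this page (i386, armv7, x86_64, aarch64).

```shell
#!/bin/sh
# elf_machine FILE
# Print the target architecture of an ELF executable.
# (hypothetical helper; only knows the four machine codes listed below)
elf_machine() {
    # the first four bytes of an ELF file are 0x7f 'E' 'L' 'F'
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # e_machine is a 16-bit field at byte offset 18; read its low byte
    code=$(od -An -tu1 -j18 -N1 "$1" | tr -d ' \n')
    case "$code" in
        3)   echo i386 ;;
        40)  echo arm ;;
        62)  echo x86_64 ;;
        183) echo aarch64 ;;
        *)   echo "unknown($code)" ;;
    esac
}

# example: elf_machine ttcp.aarch64
```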

SSH settings for EPICS

  • TRIUMF EPICS runs obsolete version of SSH
  • add this to the user's .ssh/config
Host sbp1*
HostKeyAlgorithms +ssh-rsa
PubKeyAcceptedAlgorithms +ssh-rsa
KexAlgorithms +diffie-hellman-group1-sha1
ForwardX11 yes
ForwardX11Trusted yes

changes for VME processors

apt -y remove sysstat man-db
apt -y purge dkms
apt -y purge mdadm
apt -y purge fwupd
apt -y purge packagekit
apt -y purge accountsservice
apt -y purge plocate
apt -y purge upower power-profiles-daemon
apt -y autoremove

for D-12 32-bit CPUs:

apt remove "*libavahi*"

remove snap (U-24)

Note: snap stores per-user data in $HOME/snap/$SNAPNAME. When home directories are shared between machines, removing a snap on one machine removes this data for all users, even if they want to keep using that snap on some other machine.

Prepare:

NOTE: first remove chromium and firefox, see below.

NOTE: if possible, stop autofs before removing snap - otherwise it will mount all user home directories and complain that it cannot remove some snap data from them

systemctl stop autofs
ls -ld /home1/*/snap/* ### remove the per-user snap directories

Remove snaps:

snap list
echo snap remove --purge chromium ### see below
echo snap remove --purge firefox ### see below
snap remove thunderbird
snap remove cups
snap remove hello-world
snap remove firmware-updater
snap remove gtk-common-themes
snap remove snapd-desktop-integration
snap remove snap-store
snap remove hunspell-dictionaries-1-7-2004
snap remove gnome-system-monitor
snap remove gnome-3-26-1604
snap remove gnome-3-28-1804
snap remove gnome-3-34-1804
snap remove gnome-3-38-2004
snap remove gnome-42-2204
snap remove gnome-46-2404
snap remove mesa-2404
snap remove core
snap remove core18
snap remove core20
snap remove core22
snap remove core24
snap remove bare
snap remove snapd
snap list
root@daqubuntu:~# snap list
No snaps are installed yet. Try 'snap install hello-world'.
root@daqubuntu:~# 
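
The fixed list above follows one rule: applications first, then the shared platform snaps, then the core/bare bases, and snapd last. A minimal sketch of the same ordering, driven by whatever "snap list" reports on the machine, is shown here. The helper name snap_remove_plan and the name patterns used for ranking are assumptions; it only prints commands, in the same dry-run spirit as the "echo" lines above.

```shell
#!/bin/sh
# snap_remove_plan: read "snap list" output on stdin and print "snap remove"
# commands in dependency-friendly order (hypothetical helper).
#   rank 0: everything else (applications)
#   rank 1: gnome-<version> platforms, gtk-common-themes, mesa-*
#   rank 2: core, core18, core20, ..., bare
#   rank 3: snapd
snap_remove_plan() {
    awk 'NR > 1 && NF > 0 {
        name = $1
        rank = 0
        if (name ~ /^(gnome-[0-9]|gtk-common-themes$|mesa-)/) rank = 1
        if (name ~ /^(core[0-9]*|bare)$/) rank = 2
        if (name == "snapd") rank = 3
        print rank, NR, name
    }' | sort -k1,1n -k2,2n | awk '{ print "snap remove " $3 }'
}

# typical use (review the plan before running any of it):
#   snap list | snap_remove_plan
```

After reviewing the printed plan, the commands can be pasted into the root shell one by one. Chromium and firefox still need the per-user cleanup described below.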

Identify packages that install snaps:

apt list | grep snap | grep installed | grep -v -e snappy -e snapshot

Typical output:

firefox/noble,now 1:1snap1-0ubuntu5 amd64 [installed]
gir1.2-snapd-2/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
libsnapd-glib-2-1/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
libsnapd-qt-2-1/noble,now 1.64-0ubuntu5 amd64 [installed,automatic]
plasma-discover-backend-snap/noble,now 5.27.11-0ubuntu2 amd64 [installed]
snapd/noble-updates,now 2.66.1+24.04 amd64 [installed]

Remove packages that install snaps:

apt remove chromium-browser
apt remove chromium-codecs-ffmpeg-extra
apt remove thunderbird
apt remove firefox
apt remove plasma-discover-backend-snap
apt remove plasma-discover-snap-backend
apt remove snapd
apt purge  snapd
# package gir1.2-snapd-2 is required by ubuntu-mate-desktop & co
# libsnapd-glib-2-1 is required by gstreamer, gnome-remote-desktop & co
ls -l /etc/systemd/system/ | grep snap ### remove unwanted stuff

Remove Chromium:

  • ls -ld /home/*/snap/chromium/*
  • echo /bin/rm -rf `ls -1d /home/*/snap/chromium/*`
  • snap remove chromium

Remove Firefox:

  • ls -ld /home/*/snap/firefox/*
  • echo /bin/rm -rf `ls -1d /home/*/snap/firefox/*` ### this will delete "snap firefox" profiles of all users!!!
  • snap remove firefox ### this will also delete "snap firefox" profiles of all users!!!
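
The chromium and firefox steps above differ only in the snap name, so the dry-run listing can be written once. This is a minimal sketch; the helper name print_snap_cleanup is hypothetical. Unlike the commands above, it targets the whole per-snap directory rather than its contents, and it only prints the rm commands.

```shell
#!/bin/sh
# print_snap_cleanup BASE SNAP: print (do not run) one rm command for every
# existing BASE/<user>/snap/SNAP directory (hypothetical helper).
print_snap_cleanup() {
    base=$1
    snap=$2
    for d in "$base"/*/snap/"$snap"; do
        [ -d "$d" ] || continue   # also skips the unexpanded glob pattern
        printf "/bin/rm -rf '%s'\n" "$d"
    done
}

# typical use:
#   print_snap_cleanup /home firefox
#   print_snap_cleanup /home1 chromium
```

The printed paths are wrapped in single quotes, which is enough for ordinary user names. A path that itself contains a single quote would need manual handling.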

Remove gir1.2-snapd-2:

  • echo rm -vf /usr/lib/x86_64-linux-gnu/girepository-1.0/Snapd-2.typelib

If "snap remove" is stuck in "change in progress", delete the snapd state file and restart snapd (this removes all snaps and breaks snapd, which is ok here, see https://forum.snapcraft.io/t/snap-remove-taking-forever-abort-wasnt-working/48915):

rm /var/lib/snapd/state.json
systemctl restart snapd

Prevent snap from reinstalling:

cd ~/git/scripts/etc
git pull
cp etc-apt-preferencesd-disable-snap /etc/apt/preferences.d/
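
The contents of etc-apt-preferencesd-disable-snap are not shown on this page. As an assumption, a commonly used apt pin that keeps snapd from being installed again looks like the record written by the sketch below; the helper name write_nosnap_pref is hypothetical, and the file in the scripts repository may differ.

```shell
#!/bin/sh
# write_nosnap_pref FILE: write an apt preferences record that gives snapd a
# negative priority, so apt will not install it (assumed contents, the file
# in the scripts repository may differ).
write_nosnap_pref() {
    cat > "$1" << 'EOF'
Explanation: keep apt from pulling snapd back in as a dependency
Package: snapd
Pin: release a=*
Pin-Priority: -10
EOF
}

# typical use, as root:
#   write_nosnap_pref /etc/apt/preferences.d/disable-snap
```

After "apt update", "apt-cache policy snapd" should no longer offer an installation candidate.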

install non-snap thunderbird

from: https://ubuntuhandbook.org/index.php/2024/03/install-thunderbird-deb-ubuntu-2404/

  • remove snap thunderbird
snap remove --purge thunderbird
apt remove --purge thunderbird
  • add mozilla repository
already done after installing firefox-esr
  • the PPA deb package and the Ubuntu snap thunderbird package have the same name, so raise the PPA priority and hide the Ubuntu package: create /etc/apt/preferences.d/mozillateamppa
Package: thunderbird*
Pin: release o=LP-PPA-mozillateam
Pin-Priority: 1001

Package: thunderbird*
Pin: release o=Ubuntu
Pin-Priority: -1
  • check that it worked, it should say "build" instead of "snap"
apt update
apt list | grep thunderbird | grep -v locale
...
thunderbird/noble 1:128.8.1+build1-0ubuntu0.24.04.1~mt1 amd64
...
  • install
apt install thunderbird
  • run: thunderbird

EFI boot using syslinux

  • rationale 1: GRUB is the stock boot loader with U-24. It is unnecessarily complicated. EFI BIOS can boot the linux kernel directly, without GRUB, but unfortunately a small shim bootloader is required to specify the initrd file and the root filesystem. The syslinux boot loader can do this in a very simple way.
  • rationale 2: GRUB bootloader configuration is overcomplicated, and when it breaks, it is almost impossible to debug and to recover.
  • rationale 3: GRUB bootloader scripts in U-24 have no support for booting from redundant SSDs.
  • in the case of GRUB bootloader failure, it is simplest to boot the Ubuntu installer in recovery mode and convert the bootloader from GRUB to syslinux. Open firefox on this page and cut-and-paste the steps (only the step that copies vmlinuz and initrd cannot be cut-and-pasted as of this writing).
  • in the case of servers with redundant SSDs for OS and home directories (ZFS mirror), it is simplest to use the syslinux bootloader to ensure that the machine boots from either SSD (the machine should boot for all combinations of SSD failures: failure of either EFI partition, failure of either ZFS mirror partition)
  • check partition tables, SATA SSD
fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: A3F34DAC-DCB4-B74C-B59E-41E754807812

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624   5244927   4194304     2G Linux swap
/dev/sda3  5244928   9439231   4194304     2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
  • check partition tables, NVME SSD
fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 1.75 TiB, 1920383410176 bytes, 3750748848 sectors
Disk model: SAMSUNG MZ1L21T9HCLS-00A07              
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 131072 bytes / 131072 bytes
Disklabel type: gpt
Disk identifier: 04ECCD46-DC2A-454C-B4A8-CCC18AA532F7

Device            Start        End    Sectors  Size Type
/dev/nvme0n1p1     2048    2203647    2201600    1G EFI System
/dev/nvme0n1p2  2203648    6397951    4194304    2G Linux filesystem
/dev/nvme0n1p3  6397952   23175167   16777216    8G Linux swap
/dev/nvme0n1p4 23175168 3750746111 3727570944  1.7T Linux filesystem
  • prepare boot device EFI partition, SATA SSDs
mkfs.msdos /dev/sda1
mkfs.msdos /dev/sdb1
mkdir /boot/efi-sda
mkdir /boot/efi-sdb
mount /dev/sda1 /boot/efi-sda
mount /dev/sdb1 /boot/efi-sdb
  • prepare boot device EFI partition, NVME SSDs
mkfs.msdos /dev/nvme0n1p1
mkfs.msdos /dev/nvme1n1p1
mkdir /boot/efi-0
mkdir /boot/efi-1
mount /dev/nvme0n1p1 /boot/efi-0
mount /dev/nvme1n1p1 /boot/efi-1
  • add them to fstab, note the "nofail" mount option
blkid | grep vfat
/dev/sdb1: UUID="F30C-13B5" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="20e423b5-ac29-ec42-bab5-f366aefbbd2b"
/dev/sda1: UUID="F2DD-7321" BLOCK_SIZE="512" TYPE="vfat" PARTUUID="9427646c-ce5f-fe47-9ed1-4b84cf4c348f"
grep ^UUID /etc/fstab
UUID=F2DD-7321  /boot/efi-sda       vfat    umask=0022,fmask=0022,dmask=0022,nofail      0       1
UUID=F30C-13B5  /boot/efi-sdb       vfat    umask=0022,fmask=0022,dmask=0022,nofail      0       1
  • prepare the EFI partitions (remove any old subdirectories; only an empty efi/boot should be there)
cd /boot/efi-sda
mkdir -p efi/boot
  • get syslinux-6.03
cd ~
wget https://daq00.triumf.ca/~olchansk/linux/syslinux-6.03.tar.xz
xz -d < syslinux-6.03.tar.xz | tar xvf -
  • from syslinux-6.03 copy files:
cd /boot/efi-sda/efi/boot
cp ~/syslinux-6.03/efi64/efi/syslinux.efi .
cp ~/syslinux-6.03/efi64/com32/elflink/ldlinux/ldlinux.e64 .
cp syslinux.efi bootx64.efi
  • identify the ZFS rpool label
zfs list | grep ROOT | grep "/$" | cut -f1 -d" "
rpool/ROOT/ubuntu_9yvb17
  • create syslinux.cfg, change the root=ZFS label to match this computer
cat << EOF | sed "s;root=.*$;root=ZFS=`zfs list | grep ROOT | grep "/$" | cut -f1 -d" "`;" > syslinux.cfg
default linux
label linux
kernel vmlinuz
append ro initrd=initrd.img root=ZFS=rpool/ROOT/ubuntu_02ruwj
EOF
  • copy linux boot files:
cp /boot/vmlinuz vmlinuz
cp /boot/initrd.img initrd.img
  • repeat with /boot/efi-sdb, etc
  • install script to set syslinux to boot the latest kernel
cd ~/git/scripts
git pull
ln -s ~/git/scripts/etc/update_efi_syslinux.perl ~
  • update syslinux to boot the latest kernel
    • run "~/update_efi_syslinux.perl" to check that it finds the EFI partitions and finds the correct kernel
    • run "~/update_efi_syslinux.perl -u" to do the actual update
  • or maybe install Ubuntu syslinux 6.04 and use files from there:
apt install "syslinux*"
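
The "identify the ZFS rpool label" and "create syslinux.cfg" steps above can be sketched as two small functions, which makes the "repeat with /boot/efi-sdb" step a loop. The helper names find_root_dataset and write_syslinux_cfg are hypothetical; the sketch assumes the default "zfs list" column layout, where the mountpoint is the last column.

```shell
#!/bin/sh
# find_root_dataset: read "zfs list" output on stdin and print the first
# dataset below .../ROOT/ whose mountpoint (last column) is "/".
find_root_dataset() {
    awk '$1 ~ /\/ROOT\// && $NF == "/" { print $1; exit }'
}

# write_syslinux_cfg DIR DATASET: write DIR/syslinux.cfg with the same four
# lines as above, using DATASET as the ZFS root.
write_syslinux_cfg() {
    cat > "$1/syslinux.cfg" << EOF
default linux
label linux
kernel vmlinuz
append ro initrd=initrd.img root=ZFS=$2
EOF
}

# typical use, as root, once both EFI partitions are mounted:
#   ds=$(zfs list | find_root_dataset)
#   for d in /boot/efi-sda /boot/efi-sdb; do
#       write_syslinux_cfg "$d/efi/boot" "$ds"
#   done
```

Matching on the mountpoint column with awk avoids the nested quoting of the sed one-liner, at the cost of two helper functions.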

legacy boot using syslinux

  • NOTE: extlinux is not compatible with ext4 "64bit" feature, it should be turned off:
mkfs.ext4 -O ^64bit /dev/sdX1
resize2fs -s /dev/sdX1
  • install syslinux and extlinux (THIS DOES NOT WORK!!!)
apt -y install syslinux extlinux
dd if=/usr/lib/syslinux/mbr/mbr.bin of=/dev/sdX ### NOT /dev/sdX1 NOT !!!
cd /boot
cp /usr/lib/syslinux/modules/bios/menu.c32 .
extlinux -i .
  • install syslinux and extlinux
  • copy from old SL6 USB disk (this is extlinux 6.02)
root@localhost:/boot# ls -l
-rwxr-xr-x 1 root root   218952 Jan 28 17:40 extlinux
-rw-r--r-- 1 root root      402 Jan 29 14:45 extlinux.conf
-rw-r--r-- 1 root root      496 Jan 29 14:39 extlinux.conf~
-r--r--r-- 1 root root   122044 Jan 29 14:39 ldlinux.c32
-r--r--r-- 1 root root    67072 Jan 29 14:39 ldlinux.sys
-rwxr-xr-x 1 root root    24156 Jan 28 17:40 libutil.c32
-rw-r--r-- 1 root root      304 Jan 28 17:40 mbr.bin
-rw-r--r-- 1 root root    26140 Jan 28 17:40 memdisk
-rw-r--r-- 1 root root    69043 Jan 28 17:40 memtest86+-4.20.iso.zip
-rw-r--r-- 1 root root   183012 Jan 28 17:40 memtest86+-5.01
-rw-r--r-- 1 root root    26568 Jan 28 17:40 menu.c32
  • install
dd if=mbr.bin of=/dev/sdX ### NOT /dev/sdX1 NOT !!!
extlinux -i .
  • check that partition /dev/sdX1 is marked bootable (fdisk command "a")
  • create /boot/extlinux.conf
DEFAULT menu.c32
PROMPT 0
TIMEOUT 50

MENU TITLE TRIUMF DAQ USB BOOT32 ver K.O. 2025jan28

LABEL linux
  MENU DEFAULT
  kernel /vmlinuz
  append initrd=/initrd.img panic=60 rootdelay=5 rootwait rw root=/dev/sda1

LABEL linux-6.1.0-28-686
  kernel vmlinuz-6.1.0-28-686
  append initrd=initrd.img-6.1.0-28-686 panic=60 rootdelay=5 rootwait rw root=/dev/sda1

LABEL memtest
  kernel memtest86+-1.65

Add user to login splash screen

Use this if a user cannot log in on the local machine by default. The user should already exist through the NIS service.

sudo /bin/bash # as root
cd /var/lib/AccountsService/users
ls # should at least display the "wheel" user
cp wheel username # we're going to copy the user account settings over to our new user