Ubuntu: Difference between revisions

From DaqWiki
 
(49 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= About Ubuntu =
= Prerequisites =
 
AAA
 


* before setting up a new machine, run a memory test
* prepare a flash drive with the free version of memtest86: https://www.memtest86.com
* boot the test from the flash drive; the test takes a few hours
* the test ends with a summary page; if it passed, continue with the Ubuntu installation
* one number that might be worth noting is the memory latency


= Ubuntu version =
= Ubuntu version =
Line 14: Line 16:
= Ubuntu installer =
= Ubuntu installer =


* updated for Ubuntu LTS 20.04.01, 22.04.1
* updated for Ubuntu LTS 20.04.01, 22.04.1, 24.04 (only minor differences)


* download the latest Ubuntu LTS desktop installer iso image
* download the latest Ubuntu LTS desktop installer iso image
Line 31: Line 33:
* "where are you?" - select "Vancouver" (PST time zone)
* "where are you?" - select "Vancouver" (PST time zone)
* "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
* "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
* don't install third party sw
* installation runs in a few minutes, when finished, reboot
* installation runs in a few minutes, when finished, reboot
* login as user wheel
* login as user wheel
Line 65: Line 68:
git pull
git pull
</pre>
</pre>
* if needed, update git/scripts repository from ladd00 to daq00:
* git remote -v ### if it says daq00, we are good
* git remote set-url origin https://daq00.triumf.ca/~olchansk/git/scripts.git
* git pull ### check that it works
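The remote check above can be sketched as a small shell helper. This is only an illustration: the function name fix_remote is hypothetical and not part of the scripts repository, and it assumes a git new enough to have "git remote get-url".

```shell
#!/bin/sh
# Sketch: point a clone's "origin" at daq00 if it still uses another host.
# fix_remote is a hypothetical helper name, not part of the scripts repository.
fix_remote() {
    repo="$1"       # path to the git working copy
    new_url="$2"    # desired origin URL
    cur=$(git -C "$repo" remote get-url origin) || return 1
    case "$cur" in
        *daq00*) echo "ok: $cur" ;;      # already on daq00, nothing to do
        *)       git -C "$repo" remote set-url origin "$new_url" &&
                 echo "updated: $cur -> $new_url" ;;
    esac
}

# usage (run "git pull" afterwards to check that it works):
# fix_remote ~/git/scripts https://daq00.triumf.ca/~olchansk/git/scripts.git
```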


== configure hostname ==
== configure hostname ==
Line 120: Line 128:
NOTE1: if time1, time2, time3 are already listed in /etc/chrony/chrony.conf, please remove them and restart chrony.
NOTE1: if time1, time2, time3 are already listed in /etc/chrony/chrony.conf, please remove them and restart chrony.


NOTE2: if time1, time2, time3 are not listed in "chronyc tracking" or if they are not selected by "chronyc tracking", check that /etc/chrony/chrony.conf contains "sourcedir /etc/chrony/sources.d". old versions of this file may not have it.
NOTE2: if time1, time2, time3 are not listed in "chronyc tracking" or if they are not selected by "chronyc tracking", check that /etc/chrony/chrony.conf contains "sourcedir /etc/chrony/sources.d", see NOTE4.


NOTE3: read https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
NOTE3: read https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
NOTE4: to update a very old chrony config file, remove chrony, then install it from scratch as above
<pre>
grep sourcedir /etc/chrony/chrony.conf ### if we have it, we are good
apt remove chrony
apt purge chrony
</pre>
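The sourcedir check from NOTE4 could be wrapped in a small test so the decision to reinstall chrony is explicit. A minimal sketch, assuming a hypothetical helper name has_sourcedir and that a commented-out directive should not count:

```shell
#!/bin/sh
# has_sourcedir: succeed if the given chrony config file contains an active
# (not commented out) "sourcedir" directive. Hypothetical helper name.
has_sourcedir() {
    grep -q '^[[:space:]]*sourcedir[[:space:]]' "$1"
}

# usage:
# if has_sourcedir /etc/chrony/chrony.conf; then
#     echo "chrony.conf has sourcedir, we are good"
# else
#     echo "old chrony.conf, remove and reinstall chrony per NOTE4"
# fi
```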


== reenable systemd-timesyncd ==
== reenable systemd-timesyncd ==
Line 222: Line 237:


<pre>
<pre>
yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools sysstat smartmontools lm-sensors traceroute time minicom screen git lsof debsums tmux iptables
yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools traceroute time minicom screen git lsof debsums tmux iptables telnet
yes | apt -y install sysstat smartmontools lm-sensors
yes | apt -y install lsb-release
yes | apt -y install lsb-release
apt -y install vim # in addition to default vim-tiny, requested by IRIS
apt -y install vim # in addition to default vim-tiny, requested by IRIS
apt -y install tcl
apt -y install pax rpm alien ### package converter tools
yes | apt -y install flex bison
yes | apt -y install flex bison
yes | apt -y install neofetch
yes | apt -y install neofetch
Line 255: Line 273:
apt -y install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev libxft-dev libxext-dev python3 libssl-dev libafterimage0 # from https://root.cern/install/dependencies/
apt -y install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev libxft-dev libxext-dev python3 libssl-dev libafterimage0 # from https://root.cern/install/dependencies/
apt -y install gfortran libpcre3-dev xlibmesa-glu-dev libglew-dev libftgl-dev libmysqlclient-dev libfftw3-dev libcfitsio-dev graphviz-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl0-dev qtwebengine5-dev nlohmann-json3-dev libtbb-dev libavahi-compat-libdnssd-dev # from https://root.cern/install/dependencies/
apt -y install gfortran libpcre3-dev xlibmesa-glu-dev libglew-dev libftgl-dev libmysqlclient-dev libfftw3-dev libcfitsio-dev graphviz-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl0-dev qtwebengine5-dev nlohmann-json3-dev libtbb-dev libavahi-compat-libdnssd-dev # from https://root.cern/install/dependencies/
apt -y install libvdt-dev # for ROOT 6.32 on Ubuntu-24
apt -y install u-boot-tools # for Xilinx petalinux
#apt -y install linux-headers-generic # to build linux kernel drivers
</pre>
</pre>


Line 265: Line 286:
<pre>
<pre>
apt -y install linux-generic-hwe-22.04 # enable linux 6.2.0 series kernel
apt -y install linux-generic-hwe-22.04 # enable linux 6.2.0 series kernel
</pre>
Ubuntu LTS 24.04:
<pre>
apt -y install linux-generic-hwe-24.04 # enable linux 6.8.0 series kernel
</pre>
</pre>
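The HWE package name follows the release number, so it can be derived from /etc/os-release instead of being typed by hand. A sketch only: hwe_package is a hypothetical helper, and it assumes the linux-generic-hwe-&lt;release&gt; naming shown above also holds for the release in question.

```shell
#!/bin/sh
# hwe_package: print the HWE kernel metapackage name for the release described
# by an os-release style file (default /etc/os-release).
# Hypothetical helper; pass an absolute path, because "." searches PATH
# for names without a slash.
hwe_package() {
    osrel="${1:-/etc/os-release}"
    # source the file in a subshell so its variables do not leak
    ( . "$osrel" && echo "linux-generic-hwe-$VERSION_ID" )
}

# usage:
# apt -y install "$(hwe_package)"
```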


Line 300: Line 326:
make install
make install
./ganglia-all.perl
./ganglia-all.perl
</pre>
fix gmond start before network is ready:
<pre>
mkdir /etc/systemd/system/ganglia-monitor.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ganglia-monitor.service.d/local.conf
systemctl daemon-reload
systemctl cat ganglia-monitor.service
</pre>
</pre>
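The gmond drop-in above can also be written so that it is safe to run more than once ("mkdir" without "-p" fails if the directory already exists). A minimal sketch, assuming a hypothetical helper name write_dropin; it uses printf instead of "echo -e" and writes the same two directive lines:

```shell
#!/bin/sh
# write_dropin: create local.conf in the given systemd drop-in directory.
# Hypothetical helper; rerunning it just rewrites the same file.
write_dropin() {
    dir="$1"
    mkdir -p "$dir" &&
    printf '[Unit]\nAfter=network-online.target\n' > "$dir/local.conf"
}

# usage:
# write_dropin /etc/systemd/system/ganglia-monitor.service.d
# systemctl daemon-reload
# systemctl cat ganglia-monitor.service
```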


Line 312: Line 347:
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
cd gonodeinfo
git remote set-url origin https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
git pull
git pull
make
make
Line 453: Line 489:
physica
physica
@rangauss.pcm
@rangauss.pcm
</pre>
== install wine ==
As far as I know, only needed for BNMR/BNQR
<pre>
apt install wine winetricks
</pre>
</pre>


Line 496: Line 540:


== Install x2go ==
== Install x2go ==
KO - is this still needed? does it cause any security problems?
x2go instructions, thanks to Art O.


<pre>
<pre>
add-apt-repository ppa:x2go/stable
apt-get update
apt-get update
apt-get install x2goserver x2goserver-xsession
apt-get install x2goserver x2goserver-xsession
Line 644: Line 683:
This will break wheel's ability to run snap programs, such as firefox, install chrome as listed below.
This will break wheel's ability to run snap programs, such as firefox, install chrome as listed below.


= enable NIS (ubuntu 22.04, debian 11) =
= enable NIS (ubuntu 22.04, 24.04, debian 11, 12) =


<pre>
<pre>
Line 678: Line 717:
netgroup: files nis
netgroup: files nis
# end get data from nis
# end get data from nis
#passwd: ...
#group: ...
#shadow: ...
#netgroup: ...
#automount: ...
</pre>
</pre>


Line 685: Line 731:
mkdir ~root/git
mkdir ~root/git
cd ~root/git
cd ~root/git
git clone http://daq00.triumf.ca/~olchansk/git/scripts.git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
cd ~/git/scripts/etc
git pull
git pull
Line 969: Line 1,015:


<pre>
<pre>
apt remove bash-completion # broken, adds unwanted "\" if "ls -l $ROOTSYS/<tab>"
apt remove zsys # broken, do not use
apt remove zsys # broken, do not use
apt remove sddm # login manager
apt remove sddm # login manager
Line 1,172: Line 1,219:
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl',
try this: pip install requests==2.6.0
try this: pip install requests==2.6.0
== generate self-signed certificate ==
<pre>
root@alphacpc05:~# openssl req  -nodes -new -x509  -keyout server.key -out server.cert -days 1001
...+....+..+..........+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..+...+.........+......+.+...+...+.....+...............+.........+...+.+......+...+...........+....+...+..+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+.+...+..+.......+..+...+.......+......+...+..+...+......+....+...............+..+...+....+...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
......+......+.+..+......+.+......+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.+.....+......+.+.........+......+.....+.+..+...+.......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.......+....+......+.....+...+...+.......+..+.+........+.+...+......+..+..........+..+.+...........+...+.......+......+.....+.......+...+.........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:CH
State or Province Name (full name) [Some-State]:Geneve
Locality Name (eg, city) []:CERN
Organization Name (eg, company) [Internet Widgits Pty Ltd]:CERN
Organizational Unit Name (eg, section) []:ALPHA experiment         
Common Name (e.g. server FQDN or YOUR name) []:alphacpc05.cern.ch
Email Address []:
root@alphacpc05:~#
root@alphacpc05:~#
root@alphacpc05:~# ls -l
-rw-r--r-- 1 root root 1375 juil. 10 21:43 server.cert
-rw------- 1 root root 1708 juil. 10 21:42 server.key
root@alphacpc05:~# systemctl restart apache2
</pre>
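The same certificate can probably be generated without the interactive questions by passing the Distinguished Name with the openssl "-subj" option. A sketch under assumptions: make_selfsigned is a hypothetical helper, and the field values are copied from the session above and should be adjusted for other machines.

```shell
#!/bin/sh
# make_selfsigned: non-interactive variant of the openssl command shown above.
# Hypothetical helper; writes server.key and server.cert into outdir.
make_selfsigned() {
    cn="$1"      # server FQDN for the Common Name field
    outdir="$2"  # directory for server.key and server.cert
    openssl req -nodes -new -x509 \
        -keyout "$outdir/server.key" -out "$outdir/server.cert" \
        -days 1001 \
        -subj "/C=CH/ST=Geneve/L=CERN/O=CERN/OU=ALPHA experiment/CN=$cn"
}

# usage:
# make_selfsigned alphacpc05.cern.ch ~root
# openssl x509 -in ~root/server.cert -noout -subject -dates   # inspect the result
```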


= Enable elog PDF preview =
= Enable elog PDF preview =
Line 1,248: Line 1,324:
systemctl stop cups
systemctl stop cups
systemctl disable cups
systemctl disable cups
systemctl stop snap.cups.cupsd.service
systemctl stop snap.cups.cups-browsed.service
systemctl disable snap.cups.cupsd.service
systemctl disable snap.cups.cups-browsed.service
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
lpstat -a
Line 1,283: Line 1,363:


= Disable Ubuntu Pro nag =
= Disable Ubuntu Pro nag =
best I can tell, impossible at this time.
== do not do this ==
!!! does nothing !!!
<pre>
pro config set apt_news=false
</pre>
== do not do this ==
!!! breaks automatic updates because 20apt-esm-hook.conf is missing !!!


If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:
If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:
Line 1,289: Line 1,383:
</pre>
</pre>


= Update packages =
== do not do this ==
 
!!! likely same as above, breaks automatic updates !!!


* apt-get update # update package list
* comment out /etc/apt/apt.conf.d/20apt-esm-hook.conf
 
== do not do this ==
 
!!! removes too many packages !!!
 
<pre>
apt remove ubuntu-pro-client
</pre>
 
= Update packages =
 
* apt-get update # update package list
* apt-get dist-upgrade # install updated packages and update "kept back" packages
* apt-get dist-upgrade # install updated packages and update "kept back" packages
* apt-get autoremove # remove packages that apt thinks should be removed
* apt-get autoremove # remove packages that apt thinks should be removed
Line 1,306: Line 1,414:
= Update to new version of Ubuntu =
= Update to new version of Ubuntu =


<pre>
* run "do-release-upgrade -c"
vi /etc/update-manager/release-upgrades # set "Prompt=normal"
* if it does not report new release Ubuntu 24, check /etc/update-manager/release-upgrades has "Prompt=lts"
do-release-upgrade
</pre>


Update Ubuntu LTS 20.04 to LTS 22.04:
== Update Ubuntu LTS 20.04 to LTS 22.04 ==


<pre>
<pre>
Line 1,317: Line 1,423:
</pre>
</pre>


== daqubuntu ==
=== daqubuntu ===


<pre>
<pre>
Line 1,355: Line 1,461:
</pre>
</pre>


== midm9a ==
=== midm9a ===


<pre>
<pre>
Line 1,371: Line 1,477:
</pre>
</pre>


== daq17 ==
=== daq17 ===


<pre>
<pre>
Line 1,385: Line 1,491:
</pre>
</pre>


== daq00 ==
=== daq00 ===


per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/
per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/
Line 1,395: Line 1,501:
if it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. in the case of daq00 it did not like how the EFI partitions were mounted. after fixing that, the non-interactive upgrade was successful.
if it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. in the case of daq00 it did not like how the EFI partitions were mounted. after fixing that, the non-interactive upgrade was successful.


== isdaq08 ==
=== isdaq08 ===


* prepare
* prepare
Line 1,438: Line 1,544:
* reboot
* reboot


= Upgrade to new version of Debian =
== upgrade U-22 to U-24 ==


https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html
=== daqubuntu, U-24 ===


== 32-bit VME processor Debian 11 to 12 ==
* prepare
 
* cd git/scripts; git pull; cd ~
* apt update
* apt upgrade
* edit /etc/apt/sources.list
<pre>
<pre>
deb http://deb.debian.org/debian/ bookworm main
cd ~/git/scripts
#deb http://deb.debian.org/debian/ bullseye main
git pull
#deb-src http://deb.debian.org/debian/ bullseye main
cd ~
apt -y install debsums
</pre>
</pre>
* apt update
* check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
* apt upgrade --without-new-pkgs
<pre>
* apt full-upgrade
root@daqubuntu:~# debsums -ce
* apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
/etc/ganglia/gmond.conf
* reboot
debsums: missing file /etc/init.d/nis (from nis package)
 
/etc/default/nis
= Ubuntu package manager =
/etc/ypserv.conf
/etc/ypserv.securenets
/var/yp/Makefile
/etc/update-manager/release-upgrades
/etc/apt/apt.conf.d/10periodic
/etc/yp.conf
root@daqubuntu:~#
</pre>
* restore original /etc/apt/apt.conf.d/10periodic
<pre>
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
</pre>
* apt remove ganglia-monitor
* apt remove nis
* apt autoremove
* restore original release-upgrades: "Prompt=lts"
* "debsums -ce" is now empty
 
Check for upgrade:
 
<pre>
root@daqubuntu:~# do-release-upgrade -c
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release
set Prompt=normal in /etc/update-manager/release-upgrades.
root@daqubuntu:~#
</pre>
 
Run the upgrade:
 
* do-release-upgrade -f DistUpgradeViewNonInteractive
 
Post upgrade:
 
* configure DNS
* apt -y install linux-generic-hwe-22.04
* /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
* /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
* install missing packages
* restore ganglia
* restore nis
* check zpool status, may need zpool upgrade
* reboot
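The "debsums -ce is now empty" condition used before the upgrade can be checked mechanically. A minimal sketch, assuming a hypothetical helper count_modified that counts only lines starting with "/", so messages such as "debsums: missing file ..." are not counted:

```shell
#!/bin/sh
# count_modified: read "debsums -ce" output on stdin and print how many
# modified config files it lists (lines starting with "/").
# Hypothetical helper name.
count_modified() {
    grep -c '^/' || true   # grep -c prints 0 and returns 1 when nothing matches
}

# usage:
# n=$(debsums -ce | count_modified)
# if [ "$n" -eq 0 ]; then
#     do-release-upgrade -f DistUpgradeViewNonInteractive
# else
#     echo "$n modified config files, deal with them first"
# fi
```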
 
=== daq14, U-20-22-24 ===
 
* apt update, apt upgrade
* apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 ### install kernel 5.15
* shutdown -r now
* stuck waiting for daq14 to shutdown...
* reboot into kernel 5.15
* ???
<pre>
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
</pre>
* debsums -ce
<pre>
/etc/apache2/ports.conf
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/sudoers
</pre>
* apache2 restore original ports.conf, uncomment "Listen 80"
* cp -pv /etc/dnsmasq.conf.dpkg-dist /etc/dnsmasq.conf
* apt remove ganglia-monitor
* edit /etc/yp.conf, remove everything after "# ypserver ypserver.network.com"
* "debsums -ce" is now empty
* do-release-upgrade -f DistUpgradeViewNonInteractive
* runs for a long time
* stuck on "/etc/default/nis", type "Y", press enter, nothing for a bit, then resumes running
* finished
* configure DNS
* reboot
* have kernel 6.8
* apt update; apt upgrade
* apt upgrade guile-2.2-libs ### would not auto-update, "kept back", has to be done by hand
* apt autoremove
* debsums -ce
<pre>
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
</pre>
* diff /etc/default/nis.dpkg-dist  /etc/default/nis
* cp -pv /etc/default/nis.dpkg-dist  /etc/default/nis
* debsums -ce
<pre>
debsums: missing file /etc/init.d/nis (from nis package)
</pre>
* we ignore this and run the update
* do-release-upgrade -c
<pre>
Checking for a new Ubuntu release
New release '24.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
</pre>
* do-release-upgrade -f DistUpgradeViewNonInteractive
* bombs out without any error messages
* in /var/log/dist-upgrade/main.log reports "Failed to find a replacement for xapp" and other packages
* apt remove xapp usrmerge ureadahead thunderbird-gnome-support
* no go, complains about even more packages.
* apt list | grep installed | grep -v jammy ### show packages installed from non-ubuntu sources
* remove all packages marked "install,local" ### ubuntu updater does not know where they came from and so cannot update them.
* apt remove desktop-base ### not happy about this package in /var/log/dist-upgrade/apt.log
* apt autoremove
* do-release-upgrade -f DistUpgradeViewNonInteractive
* running for a long time...
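Finding the packages marked "install,local" can be sketched as a filter over the output of "apt list --installed". This is an assumption-laden illustration: local_packages is a hypothetical helper, and it assumes each line has the form "name/suite version arch [status]" with "local" inside the brackets for packages apt cannot trace to a repository.

```shell
#!/bin/sh
# local_packages: read "apt list --installed" output on stdin and print the
# names of packages whose bracketed status field contains "local".
# Hypothetical helper name; the line format is an assumption.
local_packages() {
    grep '\[.*local.*\]' | cut -d/ -f1
}

# usage:
# apt list --installed 2>/dev/null | local_packages
# review the list by hand before running "apt remove" on anything
```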
 
=== alpha04 U-20-24 ===
 
* apt update, apt upgrade, apt autoremove
* reboot into latest kernel (already done)
* debsums -ce
<pre>
root@alpha04:~# debsums -ce
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/default/nis
/etc/yp.conf
root@alpha04:~#
</pre>
* move /etc/dnsmasq.conf to /etc/dnsmasq.d/alpha04.conf
* apt remove dnsmasq
* apt remove ganglia-monitor
* apt remove nis
* apt autoremove
* debsums -ce ### is now empty
* do-release-upgrade -f DistUpgradeViewNonInteractive
* it runs for a long time...
* complained about /etc/fwupd config files, not sure why...
* finished
* apt update, apt upgrade, apt autoremove
* restore dnsmasq: apt install dnsmasq, systemctl status dnsmasq
* restore ganglia, per instructions
* restore NIS: apt -y install rpcbind nis, ypwhich, ypwhich -m
* zpool upgrade rpool ### also upgrade any other zfs pools, see zpool status
* remove unwanted packages, per instructions
* run gonodeinfo
* reboot
* done
 
= Upgrade to new version of Debian =
 
https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html
 
== 32-bit VME processor Debian 11 to 12 ==
 
* cd git/scripts; git pull; cd ~
* apt update
* apt upgrade
* edit /etc/apt/sources.list
<pre>
deb http://deb.debian.org/debian/ bookworm main
#deb http://deb.debian.org/debian/ bullseye main
#deb-src http://deb.debian.org/debian/ bullseye main
</pre>
* apt update
* apt upgrade --without-new-pkgs
* apt full-upgrade
* apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
* reboot
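The sources.list edit above comments out the bullseye lines and adds a bookworm line by hand. As an alternative technique (not the method used above), the suite name can be substituted with sed and the result previewed before replacing the file. A sketch with a hypothetical helper switch_suite:

```shell
#!/bin/sh
# switch_suite: print the contents of a sources.list style file with every
# occurrence of one suite name replaced by another. The file itself is not
# modified. Hypothetical helper name.
switch_suite() {
    file="$1"; old="$2"; new="$3"
    sed -e "s/$old/$new/g" "$file"
}

# usage:
# switch_suite /etc/apt/sources.list bullseye bookworm > /tmp/sources.list.new
# diff /etc/apt/sources.list /tmp/sources.list.new   # review before copying over
```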
 
= Ubuntu package manager =
 
* apt-get install xxx # install package xxx
* apt-get update
* apt-get upgrade
* apt-get dist-upgrade
* apt-get autoremove # remove automatically installed packages required by a removed package
* apt-get remove xxx # remove package xxx
* apt-cache search . # list all available packages
* apt-cache show "." | grep ^Package # list all available packages
* apt-cache madison root-system # show all available versions of package root-system
* apt list # list all known packages (installed and available)
* dpkg --listfiles libpng16-16 # list all files from this package
* apt list --installed # list all installed packages
* dpkg -S /bin/bash # what package provides this file?
* dpkg -L bash # what files provided by this package?
* debsums -ce # show modified config files
* apt-config dump # show apt configuration
 
= Ubuntu zsys =
 
NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230
 
* manual removal of old snapshots
<pre>
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
</pre>
* apt remove zsys
 
NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots
 
* manages system snapshots
* documentation: https://github.com/ubuntu/zsys
* documentation: (go to next article via link "newer" at the bottom) https://didrocks.fr/2020/05/21/zfs-focus-on-ubuntu-20.04-lts-whats-new/
* ubuntu 20.04 bug, too many snapshots cause /boot to become full and updates fail. https://github.com/ubuntu/zsys/issues/155
* solution: use custom /etc/zsys.conf, limit number of snapshots to 10, see trinatdaq:/etc/zsys.conf
* zsys commands:
<pre>
update-grub # list of all snapshots, errors if some snapshots are broken
zsysctl state remove lnc0k7 --system # remove snapshot
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
zsysctl show # show snapshots
</pre>
 
= Ubuntu cloning =
 
to clone a ubuntu image:
 
<pre>
cd /nfsroot/lxcpet
emacs -nw etc/hostname ### change hostname
emacs -nw etc/mailname ### change hostname (debian 11)
emacs -nw etc/defaultdomain ### change the NIS domainname
emacs -nw etc/yp.conf ### change the NIS server
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
emacs -nw root/.ssh/authorized_keys ### update root ssh keys
</pre>
 
= Ubuntu boot loader =
 
== maintenance commands ==
 
* update-initramfs -v -u
* grub-install /dev/sda
 
= Convert from single to dual mirrored ZFS SSD =
 
Assuming Ubuntu LTS 22.04 installed with the "install on ZFS" option, we will
add a second SSD, configure ZFS to use both SSDs in a mirrored
configuration, and set up grub to boot from either SSD. This
is intended to create a fully redundant system where failure
of either SSD does not break the system.


* apt-get install xxx # install package xxx
* identify first SSD
* apt-get update
* apt-get upgrade
* apt-get dist-upgrade
* apt-get autoremove # remove automatically installed packages required by a removed package
* apt-get remove xxx # remove package xxx
* apt-cache search . # list all available packages
* apt-cache show "." | grep ^Package # list all available packages
* apt-cache madison root-system # show all available versions of package root-system
* apt list # list all known packages (installed and available)
* dpkg --listfiles libpng16-16 # list all files from this package
* apt list --installed # list all installed packages
* dpkg -S /bin/bash # what package provides this file?
* dpkg -L bash # what files provided by this package?
* debsums -ce # show modified config files
* apt-config dump # show apt configuration
 
= Ubuntu zsys =
 
NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230
 
* manual removal of old snapshots
<pre>
<pre>
zsysctl show
root@midm9b:~# ./smart-status.perl
zsysctl state remove xy69ye -s
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER Errors    Link
zsysctl state remove xy69ye
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .      6.0
zsysctl state remove xy69ye -u wheel
root@midm9b:~#
</pre>
</pre>
* apt remove zsys
* connect second SSD of identical size
 
<pre>
NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots
root@midm9b:~# ./smart-status.perl
 
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER  Errors    Link
* manages system snapshots
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .     6.0
* documentation: https://github.com/ubuntu/zsys
    /dev/sdb  WD Blue SA510 2.5 250GB        22243Z803852              25        .       ?        ?        .       ?        .     6.0
* documentation: (go to next article via link "newer" at the bottom) https://didrocks.fr/2020/05/21/zfs-focus-on-ubuntu-20.04-lts-whats-new/
root@midm9b:~#
* ubuntu 20.04 bug, too many snapshots cause /boot to become full and updates fail. https://github.com/ubuntu/zsys/issues/155
</pre>
* solution: use custom /etc/zsys.conf, limit number of snapshots to 10, see trinatdaq:/etc/zsys.conf
* if second SSD is not autodetected, reboot
* zsys commands:
* Clone partition table automatically
If both SSDs are identical size, use this simpler method of duplicating the partition table:
<pre>
<pre>
update-grub # list of all snapshots, errors if some snapshots are broken
root@midm9b:~# sfdisk -d /dev/sda > part_table
zsysctl state remove lnc0k7 --system # remove snapshot
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
zsysctl show # show snapshots
</pre>
</pre>
The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
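To see what the grep and sed filter does before writing anything to the second disk, the filter can be run on its own and its output inspected. A minimal sketch, with the hypothetical helper name strip_ids; the filter itself is the one from the command above:

```shell
#!/bin/sh
# strip_ids: read an "sfdisk -d" dump on stdin, drop the label-id line and
# the per-partition uuid= fields, print the rest unchanged.
# Hypothetical helper name.
strip_ids() {
    grep -v '^label-id' | sed -e 's/, *uuid=[0-9A-F-]*//'
}

# usage:
# sfdisk -d /dev/sda | strip_ids              # preview only, nothing is written
# sfdisk -d /dev/sda | strip_ids | sfdisk /dev/sdb
```

With the disk and partition IDs removed, sfdisk generates fresh ones for the second disk, so the two SSDs do not end up with identical identifiers.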


= Ubuntu cloning =
* Clone partition table manually (e.g. for different size disks)
 
* list partition table of first SSD:
to clone a ubuntu image:
 
<pre>
<pre>
cd /nfsroot/lxcpet
root@midm9b:~# fdisk -l /dev/sda
emacs -nw etc/hostname ### change hostname
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
emacs -nw etc/mailname ### change hostname (debian 11)
Disk model: WD Blue SA510 2.
emacs -nw etc/defaultdomain ### change the NIS domainname
Units: sectors of 1 * 512 = 512 bytes
emacs -nw etc/yp.conf ### change the NIS server
Sector size (logical/physical): 512 bytes / 512 bytes
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
I/O size (minimum/optimal): 512 bytes / 512 bytes
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
Disklabel type: gpt
emacs -nw root/.ssh/authorized_keys ### update root ssh keys
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
 
Device      Start      End  Sectors  Size Type
/dev/sda1    2048  1050623  1048576  512M EFI System
/dev/sda2  1050624  5244927  4194304    2G Linux swap
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#  
</pre>
</pre>
* create identical partitions on second SSD, use sector numbers from above.
<pre>
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8


= Ubuntu boot loader =
Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present


== maintenance commands ==
Creating new GPT entries in memory.


* update-initramfs -v -u
Command (? for help): n
* grub-install /dev/sda
Partition number (1-128, default 1):
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'


= Convert from single to dual mirrored ZFS SSD =
Command (? for help): n
 
Partition number (2-128, default 2):
Assuming Ubuntu LTS 22.04 with "install on ZFS" option, we will
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}:
add a second SSD, configure ZFS to use both SSDs in mirrored
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
configuration and setup grub to boot from either SSD. This
Current type is 8300 (Linux filesystem)
is intended to create a full redundant system where failure
Hex code or GUID (L to show codes, Enter = 8300): 8200
of either SSD does not break the system.
Changed type of partition to 'Linux swap'


* identify first SSD
Command (? for help): n
<pre>
Partition number (3-128, default 3):  
root@midm9b:~# ./smart-status.perl
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}:  
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER Errors    Link
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?       .        ?        .      6.0
Current type is 8300 (Linux filesystem)
root@midm9b:~#
Hex code or GUID (L to show codes, Enter = 8300): be00
</pre>
Changed type of partition to 'Solaris boot'
* connect second SSD of identical size
<pre>
root@midm9b:~# ./smart-status.perl
        Disk                    model              serial    temperature  realloc  pending  uncorr  CRC err    RRER  Errors    Link
    /dev/sda  WD Blue SA510 2.5 250GB        22243Z803769              24        .        ?        ?        .        ?        .      6.0
    /dev/sdb  WD Blue SA510 2.5 250GB        22243Z803852              25        .        ?        ?        .        ?        .      6.0
root@midm9b:~#
</pre>
* if second SSD is not autodetected, reboot
* Clone partition table automatically
If both SSDs are identical size, use this simpler method of duplicating the partition table:
<pre>
root@midm9b:~# sfdisk -d /dev/sda > part_table
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb
</pre>
The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.


* Clone partition table manually (e.g. for different size disks)
Command (? for help): n
* list partition table of first SSD:
Partition number (4-128, default 4):
<pre>
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}:
root@midm9b:~# fdisk -l /dev/sda
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}:
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): bf00
Changed type of partition to 'Solaris root'
 
Command (? for help): w
 
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
 
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdb.
The operation has completed successfully.
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Disk model: WD Blue SA510 2.
Line 1,578: Line 1,916:
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603
Device      Start      End  Sectors  Size Type
/dev/sdb1    2048  1050623  1048576  512M EFI System
/dev/sdb2  1050624  5244927  4194304    2G Linux swap
/dev/sdb3  5244928  9439231  4194304    2G Solaris boot
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#  
root@midm9b:~#  
</pre>
</pre>
* create identical partitions on second SSD, use sector numbers from above.
* identify second SSD partitions
<pre>
<pre>
root@midm9b:~# gdisk /dev/sdb
root@midm9b:~# ls -l /dev/disk/by-id/ata*part3
GPT fdisk (gdisk) version 1.0.8
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 -> ../../sdb3
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
</pre>
* convert bpool from single disk to mirrored disk:
<pre>
root@midm9b:~# zpool status
  pool: bpool
state: ONLINE
config:
 
NAME                                    STATE    READ WRITE CKSUM
bpool                                  ONLINE      0    0    0
  99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE      0    0    0


Partition table scan:
errors: No known data errors
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present


Creating new GPT entries in memory.
  pool: rpool
state: ONLINE
config:


Command (? for help): n
NAME                                    STATE    READ WRITE CKSUM
Partition number (1-128, default 1):
rpool                                  ONLINE      0    0    0
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}:
  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE      0    0    0
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'


Command (? for help): n
errors: No known data errors
Partition number (2-128, default 2):
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}:  
root@midm9b:~# zpool status bpool
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
  pool: bpool
Current type is 8300 (Linux filesystem)
state: ONLINE
Hex code or GUID (L to show codes, Enter = 8300): 8200
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
Changed type of partition to 'Linux swap'
config:


Command (? for help): n
NAME                                                STATE    READ WRITE CKSUM
Partition number (3-128, default 3):
bpool                                              ONLINE      0    0    0
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}:
  mirror-0                                          ONLINE      0    0    0
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0    0
Current type is 8300 (Linux filesystem)
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0    0    0
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'


Command (? for help): n
errors: No known data errors
Partition number (4-128, default 4):  
</pre>
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}:  
* convert rpool
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}:  
<pre>
Current type is 8300 (Linux filesystem)
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
Hex code or GUID (L to show codes, Enter = 8300): bf00
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
Changed type of partition to 'Solaris root'
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
 
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
Command (? for help): w
root@midm9b:~# zpool status rpool
  pool: rpool
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
0B resilvered, 0.03% done, no estimated completion time
config:


Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
NAME                                                STATE    READ WRITE CKSUM
PARTITIONS!!
rpool                                              ONLINE      0    0    0
  mirror-0                                          ONLINE      0    0    0
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0


Do you want to proceed? (Y/N): y
errors: No known data errors
OK; writing new GUID partition table (GPT) to /dev/sdb.
root@midm9b:~#
The operation has completed successfully.
</pre>
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
* wait for resilver to complete
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
<pre>
Disk model: WD Blue SA510 2.
root@midm9b:~# zpool status
Units: sectors of 1 * 512 = 512 bytes
  pool: bpool
Sector size (logical/physical): 512 bytes / 512 bytes
state: ONLINE
I/O size (minimum/optimal): 512 bytes / 512 bytes
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
Disklabel type: gpt
config:
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E


Device       Start       End  Sectors  Size Type
NAME                                                STATE    READ WRITE CKSUM
/dev/sda1     2048  1050623  1048576  512M EFI System
bpool                                              ONLINE       0    0    0
/dev/sda2 1050624  5244927  4194304     2G Linux swap
  mirror-0                                          ONLINE       0    0    0
/dev/sda3  5244928  9439231  4194304    2G Solaris boot
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0     0
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 ONLINE      0    0     0
 
errors: No known data errors


  pool: rpool
state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:


Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
NAME                                                STATE    READ WRITE CKSUM
Disk model: WD Blue SA510 2.
rpool                                              ONLINE      0    0    0
Units: sectors of 1 * 512 = 512 bytes
  mirror-0                                          ONLINE      0    0    0
Sector size (logical/physical): 512 bytes / 512 bytes
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
I/O size (minimum/optimal): 512 bytes / 512 bytes
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603


Device      Start      End  Sectors  Size Type
errors: No known data errors
/dev/sdb1    2048  1050623  1048576  512M EFI System
/dev/sdb2  1050624  5244927  4194304    2G Linux swap
/dev/sdb3  5244928  9439231  4194304    2G Solaris boot
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~#
</pre>
</pre>
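Waiting for the resilver can be scripted by looking for the "resilver in progress" line in the <code>zpool status</code> output. A minimal sketch, assuming the output format shown above; <code>resilver_state</code> is a hypothetical helper name:

```shell
#!/bin/sh
# resilver_state: read `zpool status` output on stdin and print
# "in-progress" if any pool is still resilvering, otherwise "idle".
resilver_state() {
    if grep -q 'resilver in progress'; then
        echo "in-progress"
    else
        echo "idle"
    fi
}

# Demo on a line copied from the session above.
printf '  scan: resilver in progress since Fri Jan 20 19:40:45 2023\n' | resilver_state

# On a real machine one could poll until the resilver has finished:
#   while [ "$(zpool status | resilver_state)" = "in-progress" ]; do sleep 10; done
```

Recent OpenZFS releases may also offer <code>zpool wait -t resilver</code> for the same purpose; check the local man page before relying on it.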
* identify second SSD partitions
* enable booting from the second SSD (in /etc/fstab, use UUID=xxx instead of /dev/sda1, /dev/sdb1):
<pre>
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part3
root@midm9b:~# mkfs.msdos /dev/sdb1
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part3 -> ../../sda3
root@midm9b:~# mkdir /boot/efi-sda
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 -> ../../sdb3
root@midm9b:~# mkdir /boot/efi-sdb
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
root@midm20c:~# blkid | grep vfat ### identify UUID
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
/dev/sdc1: UUID="D970-86BA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="e6d3b5b9-a512-44a2-9205-1a4db06ed2a2"
/dev/sda1: UUID="DDA1-044C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6dc9dff0-1c13-8045-a906-7803d3074c70"
root@midm20c:~# cat /etc/fstab | grep vfat ### add mount points with correct UUID
#UUID=D970-86BA  /boot/efi      vfat    umask=0022,fmask=0022,dmask=0022      0      1
UUID=DDA1-044C  /boot/efi-sda      vfat    umask=0022,fmask=0022,dmask=0022      0      1
UUID=DD89-5081  /boot/efi-sdb      vfat    umask=0022,fmask=0022,dmask=0022      0      1
root@midm9b:~# mount -a
root@midm9b:~# df -kl
Filesystem                                      1K-blocks    Used Available Use% Mounted on
...
/dev/sda1                                          523244  13720    509524  3% /boot/efi
/dev/sdb1                                          523244      4    523240  1% /boot/efi-sdb
...
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
sending incremental file list
EFI/
...
root@midm9b:~# ls -l /boot/efi-sda
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~#
</pre>
</pre>
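The /etc/fstab entries can be derived from the <code>blkid</code> output rather than typed by hand. A minimal sketch, assuming <code>blkid</code> lines in the format shown above and the same mount options; <code>efi_fstab_line</code> is a hypothetical helper name:

```shell
#!/bin/sh
# efi_fstab_line MOUNTPOINT: read one `blkid` line on stdin and print the
# matching /etc/fstab entry, using the same mount options as above.
efi_fstab_line() {
    uuid=$(sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
    [ -n "$uuid" ] || return 1
    printf 'UUID=%s  %s  vfat  umask=0022,fmask=0022,dmask=0022  0  1\n' "$uuid" "$1"
}

# Demo on a line copied from the session above. On a real machine use:
#   blkid /dev/sdb1 | efi_fstab_line /boot/efi-sdb
printf '%s\n' '/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"' \
    | efi_fstab_line /boot/efi-sdb
```

The function only prints the line; review it before appending it to /etc/fstab. <code>blkid -s UUID -o value /dev/sdb1</code> prints the bare UUID if only that is needed.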
* convert bpool from single disk to mirrored disk:
* set up the script that updates grub on the second SSD; it must be run manually after every kernel update
<pre>
<pre>
root@midm9b:~# zpool status
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
  pool: bpool
root@midm9b:~# ~/update_efi_grub.perl -u
state: ONLINE
EFI dir: /boot/efi-sda
config:
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
building file list ... done


NAME                                    STATE    READ WRITE CKSUM
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
bpool                                  ONLINE      0    0    0
total size is 7,944,644  speedup is 1,492.23
  99e03dc0-7d4d-f24b-8fa1-f042b9f135db ONLINE      0    0    0
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/ /boot/efi-sda/EFI
building file list ... done


errors: No known data errors
sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done


  pool: rpool
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
  state: ONLINE
total size is 7,944,644 speedup is 1,492.23
config:
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sdb/EFI
building file list ... done


NAME                                    STATE    READ WRITE CKSUM
sent 216 bytes  received 11 bytes  454.00 bytes/sec
rpool                                  ONLINE      0    0    0
total size is 5,452,378  speedup is 24,019.29
  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE      0    0    0
root@midm9b:~#
</pre>
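Because the update script has to be run by hand, it can help to check whether an EFI copy has fallen behind. A minimal sketch that compares two directory trees by content; <code>efi_in_sync</code> is a hypothetical helper name and is not part of update_efi_grub.perl:

```shell
#!/bin/sh
# efi_in_sync SRC DST: succeed if the two directory trees have identical content.
efi_in_sync() {
    diff -r -q "$1" "$2" >/dev/null 2>&1
}

# Demo with two scratch directories standing in for the EFI partitions.
src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/EFI" "$dst/EFI"
echo v1 > "$src/EFI/grub.cfg"
echo v1 > "$dst/EFI/grub.cfg"
efi_in_sync "$src" "$dst" && echo "in sync"
echo v2 > "$src/EFI/grub.cfg"
efi_in_sync "$src" "$dst" || echo "out of date"
rm -rf "$src" "$dst"
```

On a real machine this could be called as <code>efi_in_sync /boot/efi/EFI /boot/efi-sdb/EFI || echo "run ~/update_efi_grub.perl -u"</code>, assuming the mount points used above.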


errors: No known data errors
= Disable NetworkManager =
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
root@midm9b:~# zpool status bpool
  pool: bpool
state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:


NAME                                                STATE    READ WRITE CKSUM
NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04
bpool                                              ONLINE      0    0    0
 
  mirror-0                                          ONLINE      0    0    0
NetworkManager is useful for configuring dynamic
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0    0
network interfaces, e.g. laptops that often move
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0    0    0
between networks, or choose among multiple
wifi networks.
 
For machines with statically configured network interfaces,
NetworkManager is not necessary.
 
NetworkManager has been observed to become confused and
to malfunction when network links go up and down (it keeps
unnecessarily reconfiguring the IP address, etc.), so it can
be useful to disable it.


errors: No known data errors
* list all network interfaces
<pre>
# /bin/ls -1 /sys/class/net/
enp0s31f6
lo
</pre>
</pre>
* convert rpool
* edit /etc/network/interfaces:
<pre>
<pre>
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
rename enp0s31f6=eth0
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
auto eth0
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
iface eth0 inet static
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
  address 142.90.120.94/19
root@midm9b:~# zpool status rpool
  gateway 142.90.100.18
  pool: rpool
</pre>
state: ONLINE
* statically configure systemd-resolved
status: One or more devices is currently being resilvered. The pool will
** create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
continue to function, possibly in a degraded state.
<pre>
action: Wait for the resilver to complete.
[Resolve]
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
DNS=142.90.100.19
5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
Domains=triumf.ca
0B resilvered, 0.03% done, no estimated completion time
config:
 
NAME                                                STATE    READ WRITE CKSUM
rpool                                              ONLINE      0    0    0
  mirror-0                                          ONLINE      0    0    0
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0
 
errors: No known data errors
root@midm9b:~#
</pre>
</pre>
* wait for resilver to complete
** systemctl restart systemd-resolved
** resolvectl
** systemd-analyze cat-config systemd/resolved.conf
* disable NetworkManager
<pre>
<pre>
root@midm9b:~# zpool status
systemctl disable NetworkManager
  pool: bpool
</pre>
state: ONLINE
* reboot
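The systemd-resolved drop-in from the steps above can also be generated by a script. A minimal sketch, assuming the DNS server and domain given above; <code>write_resolved_dropin</code> and its ROOT argument are hypothetical, added so the file can be written to a scratch directory first:

```shell
#!/bin/sh
# write_resolved_dropin ROOT DNS DOMAIN: write the drop-in file under ROOT
# (use ROOT="" for the live system, or a scratch directory for a dry run).
write_resolved_dropin() {
    dir="$1/etc/systemd/resolved.conf.d"
    mkdir -p "$dir" || return 1
    printf '[Resolve]\nDNS=%s\nDomains=%s\n' "$2" "$3" > "$dir/resolved.conf"
}

# Dry run into a scratch directory with the values used above.
scratch=$(mktemp -d)
write_resolved_dropin "$scratch" 142.90.100.19 triumf.ca
cat "$scratch/etc/systemd/resolved.conf.d/resolved.conf"
rm -rf "$scratch"
```

After writing the file on the live system, <code>systemctl restart systemd-resolved</code> and <code>resolvectl</code> confirm the result, as in the steps above.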
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:


NAME                                                STATE    READ WRITE CKSUM
= Configure ECC memory =
bpool                                              ONLINE      0    0    0
  mirror-0                                          ONLINE      0    0    0
    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE      0    0    0
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE      0    0    0


errors: No known data errors
== Configure EDAC ==


  pool: rpool
* apt install edac-utils rasdaemon
state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:


NAME                                                STATE    READ WRITE CKSUM
=== Intel i3-2120 ===
rpool                                              ONLINE      0    0    0
<pre>
  mirror-0                                          ONLINE      0    0    0
root@musr00:~# edac-ctl --mainboard
    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE      0    0    0
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE      0    0    0
root@musr00:~# edac-ctl --status
edac-ctl: drivers not loaded.
</pre>


errors: No known data errors
=== Intel E-2236 ===
<pre>
root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq00:~# edac-util
edac-util: No errors to report.
root@daq00:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
</pre>
</pre>
* enable booting from the second SSD (in /etc/fstab, use UUID=xxx instead of /dev/sda1, /dev/sdb1):
* check edac sysfs files (Intel)
<pre>
<pre>
root@midm9b:~# mkfs.msdos /dev/sdb1
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
root@midm9b:~# mkdir /boot/efi-sda
total 0
root@midm9b:~# mkdir /boot/efi-sdb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
root@midm20c:~# blkid | grep vfat ### identify UUID
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
/dev/sdc1: UUID="D970-86BA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="e6d3b5b9-a512-44a2-9205-1a4db06ed2a2"
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
/dev/sda1: UUID="DDA1-044C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6dc9dff0-1c13-8045-a906-7803d3074c70"
drwxr-xr-x 2 root root   0 Jan 25 15:10 power
root@midm20c:~# cat /etc/fstab | grep vfat ### add mount points with correct UUID
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank0
#UUID=D970-86BA  /boot/efi      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank1
UUID=DDA1-044C  /boot/efi-sda      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank2
UUID=DD89-5081  /boot/efi-sdb      vfat   umask=0022,fmask=0022,dmask=0022      0       1
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank3
root@midm9b:~# mount -a
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank4
root@midm9b:~# df -kl
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank5
Filesystem                                      1K-blocks    Used Available Use% Mounted on
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank6
...
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
/dev/sda1                                          523244  13720    509524  3% /boot/efi
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
/dev/sdb1                                          523244      4   523240  1% /boot/efi-sdb
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
...
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
sending incremental file list
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
EFI/
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
...
root@daq00:~#  
root@midm9b:~# ls -l /boot/efi-sda
</pre>
total 8
 
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
=== Intel E3-1270 v6 ===
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~#  
</pre>
* set up the script that updates grub on the second SSD; it must be run manually after every kernel update
<pre>
<pre>
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
root@midm9b:~# ~/update_efi_grub.perl -u
edac-ctl: mainboard: Supermicro X11SSH-F
EFI dir: /boot/efi-sda
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
edac-ctl: drivers are loaded.
building file list ... done
root@grsnis01:~# edac-util
 
edac-util: No errors to report.
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
root@grsnis01:~# edac-util -s
total size is 7,944,644  speedup is 1,492.23
edac-util: EDAC drivers are loaded. 1 MC detected
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sda/EFI
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
building file list ... done
total 0
 
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
sent 216 bytes  received 11 bytes  454.00 bytes/sec
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
total size is 5,452,378  speedup is 24,019.29
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
EFI dir: /boot/efi-sdb
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
building file list ... done
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
 
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
total size is 7,944,644  speedup is 1,492.23
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sdb/EFI
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
building file list ... done
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
 
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
sent 216 bytes  received 11 bytes  454.00 bytes/sec
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
total size is 5,452,378  speedup is 24,019.29
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
root@midm9b:~#  
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~#  
</pre>
</pre>
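The <code>ce_count</code> and <code>ue_count</code> files in the sysfs listings above hold the corrected and uncorrected error counters, so they can be read directly. A minimal sketch, assuming the sysfs layout shown above; <code>edac_counts</code> is a hypothetical helper name:

```shell
#!/bin/sh
# edac_counts BASE: print "mcN ce=<count> ue=<count>" for every memory
# controller directory under BASE (normally /sys/devices/system/edac/mc).
edac_counts() {
    for mc in "$1"/mc[0-9]*; do
        [ -d "$mc" ] || continue
        printf '%s ce=%s ue=%s\n' "$(basename "$mc")" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}

# Demo against a scratch tree that mimics the sysfs layout listed above.
base=$(mktemp -d)
mkdir "$base/mc0"
echo 0 > "$base/mc0/ce_count"
echo 0 > "$base/mc0/ue_count"
edac_counts "$base"
rm -rf "$base"
```

On a real machine the call would be <code>edac_counts /sys/devices/system/edac/mc</code>; a nonzero <code>ue</code> value deserves immediate attention.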


= Disable NetworkManager =
=== Intel E3-1245 v6 ===


NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04
<pre>
 
[root@alphagdaq ~]# edac-ctl --mainboard
NetworkManager is useful for configuring dynamic
edac-ctl: mainboard: Supermicro X11SSH-F
network interfaces, e.g. laptops that often move
[root@alphagdaq ~]# edac-ctl --mainboard
between networks, or connect to multiple choice
edac-ctl: mainboard: Supermicro X11SSH-F
of wifi networks, etc.
[root@alphagdaq ~]# edac-ctl --status
 
edac-ctl: drivers are loaded.
For machines with statically configured network interfaces,
[root@alphagdaq ~]# edac-util
NetworkManager is not necessary.
edac-util: No errors to report.
[root@alphagdaq ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
[root@alphagdaq ~]# ras-mc-ctl --layout
          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0  |  csrow1  |  csrow2  |  csrow3  |
----------+-----------------------------------------------+
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
----------+-----------------------------------------------+
[root@alphagdaq ~]# ras-mc-ctl --error-count
Label              CE UE
mc#0csrow#3channel#0 0 0
mc#0csrow#2channel#1 0 0
mc#0csrow#3channel#1 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#2channel#0 0 0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
[root@alphagdaq ~]# ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
[root@alphagdaq ~]#
</pre>
 
=== AMD 3700X ===


As it has been observed to become confused and observed
(memory is non-ECC)
to malfunction when network links go up and down (it keeps
unnecessarily reconfiguring the ip address, etc), it can
be useful to disable it.


* list all network interfaces
<pre>
<pre>
# /bin/ls -1 /sys/class/net/
root@daq13:~# edac-ctl --mainboard
enp0s31f6
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
lo
root@daq13:~#
root@daq13:~#
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
</pre>
</pre>
* edit /etc/network/interfaces:
 
(memory is ECC)
 
<pre>
<pre>
rename enp0s31f6=eth0
root@trinatdaq:~# edac-ctl --mainboard
auto eth0
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
iface eth0 inet static
root@trinatdaq:~# edac-ctl --status
  address 142.90.120.94/19
edac-ctl: drivers are loaded.
   gateway 142.90.100.18
root@trinatdaq:~# edac-util
</pre>
edac-util: No errors to report.
* statically configure systemd-resolved
root@trinatdaq:~# edac-util -s
** create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
edac-util: EDAC drivers are loaded. 1 MC detected
<pre>
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
[Resolve]
total 0
DNS=142.90.100.19
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
Domains=triumf.ca
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
</pre>
lrwxrwxrwx 1 root root   0 Dec 13 18:31 subsystem -> ../../../../bus/edac
** systemctl restart systemd-resolved
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
** resolvectl
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
** systemd-analyze cat-config systemd/resolved.conf
total 0
* disable NetworkManager
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
<pre>
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
systemctl disable NetworkManager
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
root@trinatdaq:~#
</pre>
</pre>
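In the two AMD 3700X listings above, the visible difference is whether an <code>mc0</code> directory exists under /sys/devices/system/edac/mc. A minimal sketch that counts the registered memory controllers, assuming that sysfs layout; <code>edac_mc_count</code> is a hypothetical helper name:

```shell
#!/bin/sh
# edac_mc_count BASE: print the number of memory controllers registered
# under BASE (normally /sys/devices/system/edac/mc).
edac_mc_count() {
    n=0
    for mc in "$1"/mc[0-9]*; do
        if [ -d "$mc" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Demo against a scratch tree that mimics the ECC machine listed above.
base=$(mktemp -d)
mkdir "$base/mc0" "$base/power"
edac_mc_count "$base"
rm -rf "$base"
```

A count of 0 matches the non-ECC machine above, where <code>edac-util</code> reports that no memory controller data was found.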
* reboot


= Configure ECC memory =
=== AMD 5000G ===


== Configure EDAC ==
* no Linux driver for AMD 5000-series "G" CPUs
* no mention of ECC in the BIOS settings
* unclear status of ECC support in AMD documentation (it says only "pro" "G" CPUs have ECC)
* unclear status of ECC support in ASUS documentation (web page out of date)


* apt install edac-utils
=== AMD 5600X ===


=== Intel i3-2120 ===
<pre>
<pre>
root@musr00:~# edac-ctl --mainboard
root@daq17:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
root@musr00:~# edac-ctl --status
root@daq17:~# edac-ctl --status
edac-ctl: drivers not loaded.
</pre>
 
=== Intel E-2236 ===
<pre>
root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
edac-ctl: drivers are loaded.
root@daq00:~# edac-util  
root@daq17:~# edac-util
edac-util: No errors to report.
edac-util: No errors to report.
root@daq00:~# edac-util -s
root@daq17:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
edac-util: EDAC drivers are loaded. 1 MC detected
</pre>
root@daq17:~# ls -l /sys/devices/system/edac/mc
* check edac sysfs files (Intel)
total 0
<pre>
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
drwxr-xr-x 3 root root   0 Jan 25 15:10 rank7
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
root@daq17:~#  
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~#  
</pre>
</pre>


=== Intel E3-1270 v6 ===
=== AMD 3955WX ===
 
<pre>
<pre>
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
edac-ctl: drivers are loaded.
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
root@alphasuperdaq:~/git/scripts/quotareport# edac-util  
edac-util: No errors to report.
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
lrwxrwxrwx  1 root root    0 Dez  9 05:31 subsystem -> ../../../../bus/edac
-rw-r--r--  1 root root 4096 Dez  9 05:31 uevent
root@alphasuperdaq:~/git/scripts/quotareport#
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
root@grsnis01:~#  
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
</pre>
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
 
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
=== Intel E3-1245 v6 ===
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
<pre>
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
[root@alphagdaq ~]# edac-ctl --mainboard
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
edac-ctl: mainboard: Supermicro X11SSH-F
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
[root@alphagdaq ~]# edac-ctl --mainboard
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
edac-ctl: mainboard: Supermicro X11SSH-F
root@alphasuperdaq:~#
[root@alphagdaq ~]# edac-ctl --status
root@alphasuperdaq:~# ras-mc-ctl --layout
edac-ctl: drivers are loaded.
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
[root@alphagdaq ~]# edac-util
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
edac-util: No errors to report.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
[root@alphagdaq ~]# edac-util -s
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
edac-util: EDAC drivers are loaded. 1 MC detected
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
[root@alphagdaq ~]# ras-mc-ctl --layout
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
          +-----------------------------------------------+
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
          |                     mc0                     |
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
----------+-----------------------------------------------+
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
----------+-----------------------------------------------+
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
[root@alphagdaq ~]# ras-mc-ctl --error-count
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                             mc0                                                                                             |
    |                                            csrow0                                            |                                            csrow1                                            |
    | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4 | channel5 | channel6 | channel7 |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 
0: |     0 MB |    0 MB  |     0 MB |    0 MB  |     0 MB |    0 MB  |     0 MB |    0 MB  |     0 MB  |     0 MB |    0 MB  |     0 MB |    0 MB  |     0 MB |    0 MB  |     0 MB  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@alphasuperdaq:~# ras-mc-ctl --error-count
Label              CE UE
Label              CE UE
mc#0csrow#3channel#0 0 0
mc#0csrow#0channel#2 0 0
mc#0csrow#2channel#1 0 0
mc#0csrow#1channel#7 0 0
mc#0csrow#3channel#1 0 0
mc#0csrow#0channel#3 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#4 0 0
mc#0csrow#1channel#2 0 0
mc#0csrow#0channel#7 0 0
mc#0csrow#1channel#3 0 0
mc#0csrow#0channel#4 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#1channel#5 0 0
mc#0csrow#0channel#6 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#0channel#5 0 0
mc#0csrow#2channel#0 0 0
mc#0csrow#0channel#0 0 0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
mc#0csrow#1channel#6 0 0
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
root@alphasuperdaq:~# ras-mc-ctl --mainboard
[root@alphagdaq ~]# ras-mc-ctl --summary
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
root@alphasuperdaq:~# ras-mc-ctl --summary
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
No Memory errors.
[root@alphagdaq ~]#  
 
No PCIe AER errors.
 
No Extlog errors.
 
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@alphasuperdaq:~#
</pre>
</pre>
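* the counters listed under /sys/devices/system/edac/mc/mcN can also be read directly. A minimal sketch, assuming the ce_count and ue_count files shown in the listings above; the function name edac_counts and its optional directory argument are made up for illustration:

```shell
#!/bin/sh
# Print correctable (CE) and uncorrectable (UE) error counters for every
# memory controller found under an EDAC sysfs root.
# The root defaults to the path used in the listings above; it is an argument
# only so the function can be tried against a scratch directory.
edac_counts() {
    root="${1:-/sys/devices/system/edac/mc}"
    for mc in "$root"/mc*; do
        # skip controllers without readable counters (also covers "no match")
        [ -r "$mc/ce_count" ] || continue
        printf '%s ce=%s ue=%s\n' "$(basename "$mc")" \
            "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

* usage: run edac_counts as root; on a machine without EDAC memory controllers it prints nothing.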


=== AMD 3700X ===
=== AMD 7700X ===
 
(memory is non-ECC)


<pre>
<pre>
root@daq13:~# edac-ctl --mainboard
root@dsfe05:~# apt install edac-utils
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@dsfe05:~# edac-ctl --mainboard
root@daq13:~#
edac-ctl: mainboard: Supermicro H13SAE-MF
root@daq13:~#
root@dsfe05:~# edac-ctl --status
root@daq13:~# edac-ctl --status
edac-ctl: drivers are loaded.
edac-ctl: drivers not loaded.
root@dsfe05:~# edac-util
root@daq13:~# edac-util  
edac-util: No errors to report.
edac-util: Error: No memory controller data found.
root@dsfe05:~# edac-util -s
root@daq13:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
edac-util: EDAC drivers loaded. No memory controllers found
root@dsfe05:~# ls -l /sys/devices/system/edac/mc/mc0
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
-r--r--r-- 1 root root 4096 May 14 09:33 ce_count
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-r--r--r-- 1 root root 4096 May 14 09:33 ce_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
-r--r--r-- 1 root root 4096 May 14 09:33 max_location
-r--r--r-- 1 root root 4096 May 14 09:33 mc_name
drwxr-xr-x 2 root root    0 May 14 09:33 power
drwxr-xr-x 3 root root    0 May 14 09:33 rank4
drwxr-xr-x 3 root root    0 May 14 09:33 rank5
--w------- 1 root root 4096 May 14 09:33 reset_counters
-r--r--r-- 1 root root 4096 May 14 09:33 seconds_since_reset
-r--r--r-- 1 root root 4096 May 14 09:33 size_mb
-r--r--r-- 1 root root 4096 May 14 09:33 ue_count
-r--r--r-- 1 root root 4096 May 14 09:33 ue_noinfo_count
-rw-r--r-- 1 root root 4096 May 14 09:33 uevent
root@dsfe05:~#
</pre>
</pre>


(memory is ECC)
== Configure rasdaemon ==


<pre>
<pre>
root@trinatdaq:~# edac-ctl --mainboard
apt install rasdaemon
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
</pre>
root@trinatdaq:~# edac-ctl --status
<pre>
edac-ctl: drivers are loaded.
systemctl enable rasdaemon
root@trinatdaq:~# edac-util
systemctl restart rasdaemon
edac-util: No errors to report.
systemctl status rasdaemon
root@trinatdaq:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
root@trinatdaq:~#
</pre>
</pre>


=== AMD 5000G ===
<pre>
 
● rasdaemon.service - RAS daemon to log the RAS events
* no Linux driver for AMD 5000-series "G" CPU
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
* no mention of ECC in the BIOS settings
    Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
* unclear status of ECC support in AMD documentation (says only "pro" "G" CPUs have ECC)
  Main PID: 2477175 (rasdaemon)
* unclear status of ECC support in ASUS documentation (web page out of date)
      Tasks: 1 (limit: 76958)
    Memory: 17.1M
    CGroup: /system.slice/rasdaemon.service
            └─2477175 /usr/sbin/rasdaemon -f -r
 
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
</pre>
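* to check from a script that rasdaemon is running, something like the following may work. It is a sketch only: check_rasdaemon and the SYSTEMCTL override variable are hypothetical names, the override exists just so the function can be tried without systemd:

```shell
#!/bin/sh
# Report whether the rasdaemon service is active.
# SYSTEMCTL is a hypothetical override for the systemctl command name;
# leave it unset on a real machine.
check_rasdaemon() {
    sc="${SYSTEMCTL:-systemctl}"
    if "$sc" is-active --quiet rasdaemon; then
        echo "rasdaemon: running"
    else
        echo "rasdaemon: NOT running"
        return 1
    fi
}
```

* usage: check_rasdaemon || systemctl restart rasdaemon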


=== AMD 5600X ===
== Get reports ==


* Intel 2x32GB ECC DIMMs
<pre>
<pre>
root@daq17:~# edac-ctl --mainboard
root@daq00:~# ras-mc-ctl --layout
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
          +-------------------------+
root@daq17:~# edac-ctl --status
          |          mc0           |
edac-ctl: drivers are loaded.
          |  csrow0  |  csrow1  |
root@daq17:~# edac-util
----------+-------------------------+
edac-util: No errors to report.
channel1: |  16384 MB  |  16384 MB  |
root@daq17:~# edac-util -s
channel0: |  16384 MB  |  16384 MB  |
edac-util: EDAC drivers are loaded. 1 MC detected
----------+-------------------------+
root@daq17:~# ls -l /sys/devices/system/edac/mc
root@daq00:~# ras-mc-ctl --error-count
total 0
Label                  CE      UE
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
mc#0csrow#1channel#1   0      0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
mc#0csrow#1channel#0    0      0
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
mc#0csrow#0channel#0    0      0
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
mc#0csrow#0channel#1   0      0
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
root@daq00:~#  
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
root@daq17:~#  
</pre>
</pre>
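* the per-channel table printed by ras-mc-ctl --error-count can be reduced to two totals. A minimal sketch, assuming the "Label CE UE" column layout shown in the reports on this page; ras_error_totals is a made-up helper name:

```shell
#!/bin/sh
# Sum the CE and UE columns of "ras-mc-ctl --error-count" output read on stdin.
# Only lines whose last two fields are plain numbers are counted, so the
# "Label CE UE" header and any warning lines are skipped.
ras_error_totals() {
    awk 'NF >= 3 && $NF ~ /^[0-9]+$/ && $(NF-1) ~ /^[0-9]+$/ {
             ce += $(NF-1); ue += $NF
         }
         END { printf "CE=%d UE=%d\n", ce, ue }'
}
```

* usage: ras-mc-ctl --error-count | ras_error_totals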


=== AMD 3955WX ===
* Intel 4x16GB ECC DIMMs
 
<pre>
<pre>
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
root@daq00:~# ras-mc-ctl --error-count
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
Label                  CE      UE
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
mc#0csrow#0channel#1    0      0
edac-ctl: drivers are loaded.
mc#0csrow#2channel#0    0      0
root@alphasuperdaq:~/git/scripts/quotareport# edac-util
mc#0csrow#0channel#0    0      0
edac-util: No errors to report.
mc#0csrow#2channel#1   0      0
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
mc#0csrow#1channel#0    0      0
edac-util: EDAC drivers are loaded. 1 MC detected
mc#0csrow#1channel#1   0       0
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
mc#0csrow#3channel#0   0       0
total 0
mc#0csrow#3channel#1    0       0
drwxr-xr-x 19 root root   0 Dez 12 04:48 mc0
root@daq00:~#  
drwxr-xr-x  2 root root   0 Dez 12 04:48 power
root@daq00:~# ras-mc-ctl --layout
lrwxrwxrwx  1 root root   0 Dez  9 05:31 subsystem -> ../../../../bus/edac
          +-----------------------+
-rw-r--r--  1 root root 4096 Dez  9 05:31 uevent
          |          mc0          |
root@alphasuperdaq:~/git/scripts/quotareport#  
          |  csrow0  |  csrow1  |
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
----------+-----------------------+
total 0
channel1: |  8192 MB  |  8192 MB  |
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
channel0: |  8192 MB  |  8192 MB  |
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
----------+-----------------------+
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
root@daq00:~#
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
root@daq00:~#
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
root@daq00:~#  
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
root@daq00:~# ras-mc-ctl --print-labels
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
root@daq00:~# ras-mc-ctl --mainboard
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
root@daq00:~# ras-mc-ctl --summary
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
No Memory errors.
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
 
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
No PCIe AER errors.
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
 
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
No Extlog errors.
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
 
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
root@daq00:~#
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
</pre>
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
 
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
* AMD 7700 2x32GB DDR5 ECC DIMMs
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
 
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
<pre>
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
root@dsfe05:~# systemctl status rasdaemon
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
● rasdaemon.service - RAS daemon to log the RAS events
root@alphasuperdaq:~#  
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
root@alphasuperdaq:~# ras-mc-ctl --layout
    Active: active (running) since Tue 2024-05-14 09:36:43 PDT; 33ms ago
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
    Process: 4088418 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
  Main PID: 4088417 (rasdaemon)
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
      Tasks: 1 (limit: 37300)
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    Memory: 788.0K
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
        CPU: 5ms
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    CGroup: /system.slice/rasdaemon.service
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
            └─4088417 /usr/sbin/rasdaemon -f -r
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                              mc0                                                                                              |
    |                                            csrow0                                            |                                            csrow1                                            |
    | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


0: |     0 MB  |     0 MB  |    0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |    0 MB  |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |     0 MB |
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:aer_event event enabled
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:aer_event
root@alphasuperdaq:~# ras-mc-ctl --error-count
May 14 09:36:43 dsfe05 rasdaemon[4088417]: mce:mce_record event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event mce:mce_record
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:extlog_mem_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:extlog_mem_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mc_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording aer_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording extlog_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mce_record events
root@dsfe05:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 907.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 908.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 911.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
    +-----------------------------------------------------------------------------------------------+
     |                                             mc0                                              |
     |       csrow0        |       csrow1        |       csrow2        |       csrow3        |
     | channel0 | channel1 | channel0 | channel1 | channel0 | channel1 | channel0 | channel1 |
----+-----------------------------------------------------------------------------------------------+
 
0: |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |    0 MB  |
----+-----------------------------------------------------------------------------------------------+
root@dsfe05:~# ras-mc-ctl --error-count
Label              CE UE
Label              CE UE
mc#0csrow#0channel#2 0 0
mc#0csrow#2channel#1 0 0
mc#0csrow#1channel#7 0 0
mc#0csrow#2channel#0 0 0
mc#0csrow#0channel#3 0 0
root@dsfe05:~# ras-mc-ctl --print-labels
mc#0csrow#1channel#4 0 0
ras-mc-ctl: Error: No dimm labels for Supermicro model H13SAE-MF
mc#0csrow#1channel#2 0 0
root@dsfe05:~# ras-mc-ctl --mainboard
mc#0csrow#0channel#7 0 0
ras-mc-ctl: mainboard: Supermicro model H13SAE-MF
mc#0csrow#1channel#3 0 0
root@dsfe05:~# ras-mc-ctl --summary
mc#0csrow#0channel#4 0 0
mc#0csrow#1channel#1 0 0
mc#0csrow#1channel#0 0 0
mc#0csrow#1channel#5 0 0
mc#0csrow#0channel#6 0 0
mc#0csrow#0channel#1 0 0
mc#0csrow#0channel#5 0 0
mc#0csrow#0channel#0 0 0
mc#0csrow#1channel#6 0 0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.
No Memory errors.


Line 2,208: Line 2,668:
No Extlog errors.
No Extlog errors.


DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
No MCE errors.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@dsfe05:~#  
root@alphasuperdaq:~#
</pre>
</pre>


=== AMD 7700X ===
= sensors =
 
== ASUS P7P55D EVO ==
 
* BIOS version 2101


<pre>
<pre>
root@dsfe05:~# apt install edac-utils
root@iris01:~# sensors
root@dsfe05:~# edac-ctl --mainboard
coretemp-isa-0000
edac-ctl: mainboard: Supermicro H13SAE-MF
Adapter: ISA adapter
root@dsfe05:~# edac-ctl --status
Core 0:       +34.0°C  (high = +83.0°C, crit = +99.0°C)
edac-ctl: drivers are loaded.
Core 1:       +37.0°C  (high = +83.0°C, crit = +99.0°C)
root@dsfe05:~# edac-util
Core 2:       +38.0°C  (high = +83.0°C, crit = +99.0°C)
edac-util: No errors to report.
Core 3:       +35.0°C  (high = +83.0°C, crit = +99.0°C)
root@dsfe05:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@dsfe05:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 May 14 09:33 ce_count
-r--r--r-- 1 root root 4096 May 14 09:33 ce_noinfo_count
-r--r--r-- 1 root root 4096 May 14 09:33 max_location
-r--r--r-- 1 root root 4096 May 14 09:33 mc_name
drwxr-xr-x 2 root root    0 May 14 09:33 power
drwxr-xr-x 3 root root    0 May 14 09:33 rank4
drwxr-xr-x 3 root root    0 May 14 09:33 rank5
--w------- 1 root root 4096 May 14 09:33 reset_counters
-r--r--r-- 1 root root 4096 May 14 09:33 seconds_since_reset
-r--r--r-- 1 root root 4096 May 14 09:33 size_mb
-r--r--r-- 1 root root 4096 May 14 09:33 ue_count
-r--r--r-- 1 root root 4096 May 14 09:33 ue_noinfo_count
-rw-r--r-- 1 root root 4096 May 14 09:33 uevent
root@dsfe05:~#
</pre>


== Configure rasdaemon ==
nouveau-pci-0100
Adapter: PCI adapter
GPU core:    900.00 mV (min = +0.85 V, max = +1.05 V)
temp1:        +46.0°C  (high = +95.0°C, hyst = +3.0°C)
                      (crit = +105.0°C, hyst =  +5.0°C)
                      (emerg = +135.0°C, hyst =  +5.0°C)


<pre>
atk0110-acpi-0
apt install rasdaemon
Adapter: ACPI interface
</pre>
Vcore Voltage:      864.00 mV (min =  +0.80 V, max =  +1.60 V)
<pre>
+3.3V Voltage:        3.38 V  (min =  +2.97 V, max =  +3.63 V)
systemctl enable rasdaemon
+5V Voltage:          5.04 V  (min =  +4.50 V, max =  +5.50 V)
systemctl restart rasdaemon
+12V Voltage:        12.15 V  (min = +10.20 V, max = +13.80 V)
systemctl status rasdaemon
CPU Fan Speed:      968 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis1 Fan Speed: 1288 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis2 Fan Speed: 1316 RPM  (min =  600 RPM, max = 7200 RPM)
Power Fan Speed:      0 RPM  (min =    0 RPM, max = 7200 RPM)
CPU Temperature:    +34.0°C  (high = +45.0°C, crit = +45.5°C)
MB Temperature:      +30.0°C  (high = +45.0°C, crit = +46.0°C)
 
root@iris01:~#
</pre>
</pre>
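* the sensors output can be filtered for monitoring. A minimal sketch, assuming the coretemp "Core N: +TT.T°C" line format shown above; max_core_temp is a made-up helper name:

```shell
#!/bin/sh
# Print the highest per-core temperature found in "sensors" output on stdin.
# Only lines starting with "Core N:" are used; the third field (e.g. +38.0°C)
# is stripped down to digits and the decimal point before comparing.
# Prints 0.0 if no "Core N:" line is present.
max_core_temp() {
    awk '/^Core [0-9]+:/ {
             t = $3
             gsub(/[^0-9.]/, "", t)
             if (t + 0 > max) max = t + 0
         }
         END { printf "%.1f\n", max }'
}
```

* usage: sensors | max_core_temp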


== ASUS Z170-DELUXE ==
* BIOS version 3801
* load sensors drivers
<pre>
<pre>
● rasdaemon.service - RAS daemon to log the RAS events
echo modprobe coretemp >> /etc/rc.local
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
echo modprobe jc42 >> /etc/rc.local
    Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
echo modprobe lm92 >> /etc/rc.local
  Main PID: 2477175 (rasdaemon)
echo modprobe nct6775 >> /etc/rc.local
      Tasks: 1 (limit: 76958)
    Memory: 17.1M
    CGroup: /system.slice/rasdaemon.service
            └─2477175 /usr/sbin/rasdaemon -f -r
 
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
</pre>
</pre>
* in /etc/default/grub, add: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no"
* update grub and reboot: grub-mkconfig -o /boot/grub/grub.cfg
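* the kernel option can also be added from a script. This is a sketch under assumptions: it assumes the option lives in /etc/default/grub (where Ubuntu normally keeps GRUB_CMDLINE_LINUX_DEFAULT), it relies on GNU sed, and add_kernel_param is a made-up helper name:

```shell
#!/bin/sh
# Append a kernel parameter to the GRUB_CMDLINE_LINUX_DEFAULT="..." line of
# the given file, unless that line already contains it.
# Meant for simple parameters only: the parameter is used inside grep/sed
# patterns, so it must not contain "/" or regex special characters.
add_kernel_param() {
    file="$1"
    param="$2"
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0    # already present, nothing to do
    fi
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}
```

* usage: add_kernel_param /etc/default/grub acpi_enforce_resources=no, then regenerate grub.cfg with grub-mkconfig as above and reboot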


== Get reports ==
* Intel 2x32GB ECC DIMMs
<pre>
<pre>
root@daq00:~# ras-mc-ctl --layout
root@iris00:~# sensors
          +-------------------------+
nct6793-isa-0290
          |          mc0          |
Adapter: ISA adapter
          |  csrow0  |  csrow1  |
in0:                      600.00 mV (min =  +0.00 V, max =  +1.74 V)
----------+-------------------------+
in1:                       1.02 V (min = +0.00 V, max = +0.00 V) ALARM
channel1: | 16384 MB | 16384 MB |
in2:                       3.39 V (min = +0.00 V, max = +0.00 V) ALARM
channel0: | 16384 MB | 16384 MB |
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
----------+-------------------------+
in4:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
root@daq00:~# ras-mc-ctl --error-count
in5:                      144.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
Label                  CE      UE
in6:                        0.00 V  (min =  +0.00 V, max =  +0.00 V)
mc#0csrow#1channel#1   0       0
in7:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
mc#0csrow#1channel#0   0      0
in8:                       3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
mc#0csrow#0channel#0   0       0
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
mc#0csrow#0channel#1    0       0
in10:                    600.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@daq00:~#
in11:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in13:                    592.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
* Intel 4x16GB ECC DIMMs
in14:                    968.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
<pre>
fan1:                    1370 RPM  (min =   0 RPM)
root@daq00:~# ras-mc-ctl --error-count
fan2:                    1437 RPM  (min =   0 RPM)
Label                  CE      UE
fan3:                        0 RPM  (min =   0 RPM)
mc#0csrow#0channel#1   0       0
fan4:                        0 RPM  (min =   0 RPM)
mc#0csrow#2channel#0   0       0
fan5:                        0 RPM  (min =   0 RPM)
mc#0csrow#0channel#0   0       0
fan6:                        0 RPM  (min =   0 RPM)
mc#0csrow#2channel#1   0       0
SYSTIN:                    +32.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
mc#0csrow#1channel#0    0       0
CPUTIN:                   +42.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
mc#0csrow#1channel#1   0       0
AUXTIN0:                 -128.0°C    sensor = thermistor
mc#0csrow#3channel#0    0       0
AUXTIN1:                  +50.0°C    sensor = thermistor
mc#0csrow#3channel#1   0       0
AUXTIN2:                  +22.0°C    sensor = thermistor
root@daq00:~#
AUXTIN3:                  +28.0°C    sensor = thermistor
root@daq00:~# ras-mc-ctl --layout
PECI Agent 0:              +50.0°C (high = +98.0°C, hyst = +95.0°C)
          +-----------------------+
                                    (crit = +100.0°C)
          |          mc0          |
PECI Agent 0 Calibration:  +42.5°C  
          | csrow0  |  csrow1  |
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
----------+-----------------------+
PCH_CHIP_TEMP:              +0.0°C  
channel1: | 8192 MB | 8192 MB |
PCH_CPU_TEMP:               +0.0°C  
channel0: | 8192 MB  |  8192 MB |
PCH_MCH_TEMP:              +0.0°C  
----------+-----------------------+
TSI2_TEMP:                +3892314.0°C 
root@daq00:~#
TSI3_TEMP:               +3892314.0°C 
root@daq00:~#
TSI4_TEMP:               +3892314.0°C 
root@daq00:~#
TSI5_TEMP:               +3892314.0°C 
root@daq00:~# ras-mc-ctl --print-labels
TSI6_TEMP:               +3892314.0°C 
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
TSI7_TEMP:               +3892314.0°C 
root@daq00:~# ras-mc-ctl --mainboard
intrusion0:               ALARM
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
intrusion1:               ALARM
root@daq00:~# ras-mc-ctl --summary
beep_enable:             disabled
No Memory errors.


No PCIe AER errors.
jc42-i2c-0-1a
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)


No Extlog errors.
jc42-i2c-0-18
Adapter: SMBus I801 adapter at f040
temp1:        +34.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
jc42-i2c-0-1b
Adapter: SMBus I801 adapter at f040
temp1:        +35.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
jc42-i2c-0-19
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +48.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +84.0°C, crit = +100.0°C)


DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
root@iris00:~#  
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@daq00:~#  
</pre>
</pre>


note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.
== ASUS H110M-A/M.2 ==


* AMD 7700 2x32GB DDR5 ECC DIMMs
* BIOS version 4202
* echo modprobe coretemp >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local


<pre>
<pre>
root@dsfe05:~# systemctl status rasdaemon
root@midpol:~# sensors
● rasdaemon.service - RAS daemon to log the RAS events
coretemp-isa-0000
    Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
Adapter: ISA adapter
    Active: active (running) since Tue 2024-05-14 09:36:43 PDT; 33ms ago
Package id 0: +33.0°C  (high = +80.0°C, crit = +100.0°C)
    Process: 4088418 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
Core 0:       +33.0°C  (high = +80.0°C, crit = +100.0°C)
  Main PID: 4088417 (rasdaemon)
Core 1:       +30.0°C  (high = +80.0°C, crit = +100.0°C)
      Tasks: 1 (limit: 37300)
    Memory: 788.0K
        CPU: 5ms
    CGroup: /system.slice/rasdaemon.service
            └─4088417 /usr/sbin/rasdaemon -f -r


May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:aer_event event enabled
acpitz-acpi-0
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:aer_event
Adapter: ACPI interface
May 14 09:36:43 dsfe05 rasdaemon[4088417]: mce:mce_record event enabled
temp1:       +27.8°C  (crit = +119.0°C)
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event mce:mce_record
temp2:        +29.8°C (crit = +119.0°C)
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:extlog_mem_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:extlog_mem_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mc_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording aer_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording extlog_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mce_record events
root@dsfe05:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 907.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 908.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 911.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
    +-----------------------------------------------------------------------------------------------+
    |                                              mc0                                              |
    |        csrow0        |        csrow1        |        csrow2        |        csrow3        |
    | channel0 | channel1  | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  |
----+-----------------------------------------------------------------------------------------------+


0: |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |    0 MB |
nct6793-isa-0290
----+-----------------------------------------------------------------------------------------------+
Adapter: ISA adapter
root@dsfe05:~# ras-mc-ctl --error-count
in0:                      368.00 mV (min =  +0.00 V, max =  +1.74 V)
Label              CE UE
in1:                       1.02 V  (min =  +0.00 V, max = +0.00 V)  ALARM
mc#0csrow#2channel#1 0 0
in2:                        3.39 V  (min = +0.00 V, max = +0.00 V) ALARM
mc#0csrow#2channel#0 0 0
in3:                        3.36 V  (min =  +0.00 V, max = +0.00 V)  ALARM
root@dsfe05:~# ras-mc-ctl --print-labels
in4:                        1.02 V (min =  +0.00 V, max = +0.00 V) ALARM
ras-mc-ctl: Error: No dimm labels for Supermicro model H13SAE-MF
in5:                      152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@dsfe05:~# ras-mc-ctl --mainboard
in6:                      928.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
ras-mc-ctl: mainboard: Supermicro model H13SAE-MF
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
root@dsfe05:~# ras-mc-ctl --summary
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
No Memory errors.
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in10:                    152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
No PCIe AER errors.
in11:                    128.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in12:                    136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
No Extlog errors.
in13:                     120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
 
in14:                    136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
No MCE errors.
fan1:                    1004 RPM  (min =    0 RPM)
root@dsfe05:~#  
fan2:                    1143 RPM  (min =    0 RPM)
</pre>
fan5:                        0 RPM  (min =    0 RPM)
 
fan6:                        0 RPM  (min =    0 RPM)
= sensors =
SYSTIN:                  +118.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +30.0°C    sensor = thermistor
AUXTIN1:                 +112.0°C    sensor = thermistor
AUXTIN2:                 +111.0°C    sensor = thermistor
AUXTIN3:                 +110.0°C    sensor = thermistor
PECI Agent 0:             +31.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration: +36.5°C 
PCH_CHIP_CPU_MAX_TEMP:     +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
TSI2_TEMP:                +3892314.0°C 
TSI3_TEMP:                +3892314.0°C 
TSI4_TEMP:                +3892314.0°C 
TSI5_TEMP:                +3892314.0°C 
TSI6_TEMP:                +3892314.0°C 
TSI7_TEMP:                +3892314.0°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled
 
root@midpol:~#  
</pre>


== ASUS P7P55D EVO ==
== ASUS P9X79 WS ==


* BIOS version 2101
* https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
* BIOS version 4802
* modprobe nct6775
* modprobe coretemp


<pre>
<pre>
root@iris01:~# sensors
root@daq14:~# sensors
coretemp-isa-0000
coretemp-isa-0000
Adapter: ISA adapter
Adapter: ISA adapter
Core 0:       +34.0°C  (high = +83.0°C, crit = +99.0°C)
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:       +37.0°C  (high = +83.0°C, crit = +99.0°C)
Core 0:       +29.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:       +38.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +24.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:       +35.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:       +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:       +32.0°C  (high = +82.0°C, crit = +100.0°C)


nouveau-pci-0100
nouveau-pci-0200
Adapter: PCI adapter
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.05 V)
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +46.0°C  (high = +95.0°C, hyst =  +3.0°C)
temp1:        +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)


atk0110-acpi-0
nct6776-isa-0290
Adapter: ACPI interface
Adapter: ISA adapter
Vcore Voltage:     864.00 mV (min =  +0.80 V, max =  +1.60 V)
Vcore:           1.04 V  (min =  +0.00 V, max =  +1.74 V)
+3.3V Voltage:       3.38 V  (min =  +2.97 V, max =  +3.63 V)
in1:            1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+5V Voltage:         5.04 V  (min =  +4.50 V, max =  +5.50 V)
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V) ALARM
+12V Voltage:       12.15 V  (min = +10.20 V, max = +13.80 V)
+3.3V:           3.33 V  (min =  +0.00 V, max =  +0.00 V) ALARM
CPU Fan Speed:       968 RPM (min =  600 RPM, max = 7200 RPM)
in4:            1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Chassis1 Fan Speed: 1288 RPM  (min =  600 RPM, max = 7200 RPM)
in5:             2.04 V  (min =  +0.00 V, max =  +0.00 V) ALARM
Chassis2 Fan Speed: 1316 RPM  (min =  600 RPM, max = 7200 RPM)
in6:          904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
Power Fan Speed:       0 RPM  (min =    0 RPM, max = 7200 RPM)
3VSB:           3.41 V  (min = +0.00 V, max = +0.00 V) ALARM
CPU Temperature:     +34.0°C  (high = +45.0°C, crit = +45.5°C)
Vbat:           3.30 V (min =  +0.00 V, max = +0.00 V) ALARM
MB Temperature:     +30.0°C  (high = +45.0°C, crit = +46.0°C)
fan1:         1265 RPM  (min =   0 RPM)
 
fan2:          1909 RPM (min =   0 RPM)
root@iris01:~#
fan3:             0 RPM  (min =   0 RPM)
</pre>
fan4:            0 RPM (min =   0 RPM)
 
fan5:             0 RPM  (min =    0 RPM)
== ASUS Z170-DELUXE ==
SYSTIN:        +34.0°C  (high =  +0.0°C, hyst = +0.0°C) ALARM  sensor = thermistor
CPUTIN:         +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:        +31.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
PECI Agent 0:   +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                        (crit = +96.0°C)
PCH_CHIP_TEMP:  +0.0°C 
PCH_CPU_TEMP:    +0.0°C 
PCH_MCH_TEMP:   +0.0°C 
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:  disabled


* BIOS version 3801
root@daq14:~#
<pre>
echo modprobe coretemp >> /etc/rc.local
echo modprobe jc42 >> /etc/rc.local
echo modprobe lm92 >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
</pre>
</pre>


<pre>
== ASUS TUF GAMING B550M-PLUS WIFI II ==
root@iris00:~# sensors
jc42-i2c-0-1a
Adapter: SMBus I801 adapter at f040
temp1:        +32.0°C  (low  = +0.0°C)                  ALARM (HIGH, CRIT)
                      (high = +0.0°C, hyst =  +0.0°C)
                      (crit = +0.0°C, hyst = +0.0°C)


jc42-i2c-0-18
* BIOS 2803, 2806
Adapter: SMBus I801 adapter at f040
* echo modprobe nct6775 >> /etc/rc.local
temp1:        +30.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)


jc42-i2c-0-1b
<pre>
Adapter: SMBus I801 adapter at f040
root@midm9a:~# sensors
temp1:        +31.5°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
nct6798-isa-0290
                      (high =  +0.0°C, hyst =  +0.0°C)
Adapter: ISA adapter
                      (crit =  +0.0°C, hyst =  +0.0°C)
in0:                      488.00 mV (min =  +0.00 V, max =  +1.74 V)
 
jc42-i2c-0-19
Adapter: SMBus I801 adapter at f040
temp1:        +31.5°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +28.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +24.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +25.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +25.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +22.0°C  (high = +84.0°C, crit = +100.0°C)
 
root@iris00:~#
</pre>
 
== ASUS H110M-A/M.2 ==
 
* BIOS version 4202
* echo modprobe coretemp >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
root@midpol:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +30.0°C  (high = +80.0°C, crit = +100.0°C)
 
acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)
 
nct6793-isa-0290
Adapter: ISA adapter
in0:                      368.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                     152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      928.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                     1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                       1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     128.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1004 RPM  (min =    0 RPM)
fan1:                       0 RPM  (min =    0 RPM)
fan2:                     1143 RPM  (min =    0 RPM)
fan2:                     760 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan6:                       0 RPM  (min =    0 RPM)
fan7:                     1264 RPM  (min =    0 RPM)
SYSTIN:                   +118.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
SYSTIN:                   +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +22.5°C (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +30.0°C    sensor = thermistor
AUXTIN0:                  +95.0°C    sensor = thermistor
AUXTIN1:                 +112.0°C    sensor = thermistor
AUXTIN1:                   +25.0°C    sensor = thermistor
AUXTIN2:                 +111.0°C    sensor = thermistor
AUXTIN2:                   +25.0°C    sensor = thermistor
AUXTIN3:                 +110.0°C    sensor = thermistor
AUXTIN3:                   +25.0°C    sensor = thermistor
PECI Agent 0:              +31.0°C  (high = +98.0°C, hyst = +95.0°C)
PECI Agent 0 Calibration:  +23.5°C   
                                    (crit = +100.0°C)
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C   
PECI Agent 0 Calibration:  +36.5°C   
PCH_CHIP_TEMP:              +0.0°C   
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C   
PCH_CPU_TEMP:               +0.0°C   
PCH_CHIP_TEMP:              +0.0°C   
TSI0_TEMP:                 +32.4°C  
TSI2_TEMP:               +3892314.0°C 
TSI3_TEMP:                +3892314.0°C 
TSI4_TEMP:                +3892314.0°C 
TSI5_TEMP:                +3892314.0°C   
TSI6_TEMP:               +3892314.0°C 
TSI7_TEMP:                +3892314.0°C  
intrusion0:              ALARM
intrusion0:              ALARM
intrusion1:              ALARM
intrusion1:              ALARM
beep_enable:              disabled
beep_enable:              disabled


root@midpol:~#  
amdgpu-pci-0800
Adapter: PCI adapter
vddgfx:        1.45 V 
vddnb:      993.00 mV
edge:        +28.0°C 
PPT:          20.00 W 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +33.4°C 
 
root@midm9a:~#  
</pre>
</pre>


== ASUS P9X79 WS ==
== ASUS ROG STRIX B550-XE GAMING WIFI ==


* https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
* BIOS 2423, 2604
* BIOS version 4802
* echo modprobe nct6775 >> /etc/rc.local
* modprobe nct6775
* modprobe coretemp


<pre>
<pre>
root@daq14:~# sensors
root@daq13:~# sensors
coretemp-isa-0000
nct6798-isa-0290
Adapter: ISA adapter
Adapter: ISA adapter
Package id 0:  +35.0°C (high = +82.0°C, crit = +100.0°C)
in0:                      344.00 mV (min =  +0.00 V, max =  +1.74 V)
Core 0:       +29.0°C (high = +82.0°C, crit = +100.0°C)
in1:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
Core 1:       +24.0°C (high = +82.0°C, crit = +100.0°C)
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Core 2:       +35.0°C  (high = +82.0°C, crit = +100.0°C)
in3:                       3.39 V  (min = +0.00 V, max =  +0.00 V)  ALARM
Core 3:       +32.0°C (high = +82.0°C, crit = +100.0°C)
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V (min = +0.00 V, max = +0.00 V) ALARM
in8:                        3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                       1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                    960.00 mV (min =  +0.00 V, max =  +0.00 V) ALARM
in11:                    960.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in12:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V) ALARM
in14:                    208.00 mV (min = +0.00 V, max = +0.00 V)  ALARM
fan1:                      845 RPM  (min =    0 RPM)
fan2:                      998 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                   +28.0°C  (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
CPUTIN:                   +27.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
AUXTIN0:                  +94.0°C    sensor = thermistor
AUXTIN1:                  +28.0°C    sensor = thermistor
AUXTIN2:                  +28.0°C    sensor = thermistor
AUXTIN3:                  +97.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +27.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +33.6°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled


nouveau-pci-0200
amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.45 V 
vddnb:      999.00 mV
edge:        +29.0°C 
PPT:          14.00 W 
 
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +33.9°C 
 
root@daq13:~#
</pre>
 
== ASUS ROG STRIX B550-E GAMING ==
 
* BIOS 2803
* echo modprobe jc42 >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
root@daq17:~# sensors
jc42-i2c-1-1b
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +28.0°C 
 
nouveau-pci-0800
Adapter: PCI adapter
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
temp1:        +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)


nct6776-isa-0290
nct6798-isa-0290
Adapter: ISA adapter
Adapter: ISA adapter
Vcore:           1.04 V  (min =  +0.00 V, max =  +1.74 V)
in0:                     288.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in1:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:           3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                       3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:           3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                       3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:             2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:           904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                     224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:           3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                       3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:           3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                       3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:         1265 RPM  (min =    0 RPM)
in9:                        1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan2:         1909 RPM  (min =    0 RPM)
in10:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan3:             0 RPM  (min =    0 RPM)
in11:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan4:             0 RPM  (min =    0 RPM)
in12:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan5:             0 RPM  (min =    0 RPM)
in13:                    280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
SYSTIN:         +34.0°C  (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
in14:                    208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
CPUTIN:         +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
fan1:                     843 RPM  (min =    0 RPM)
AUXTIN:         +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
fan2:                     629 RPM  (min =    0 RPM)
PECI Agent 0:   +31.0°C (high = +80.0°C, hyst = +75.0°C)
fan3:                     746 RPM  (min =    0 RPM)
                        (crit = +96.0°C)
fan4:                       0 RPM  (min =    0 RPM)
PCH_CHIP_TEMP:   +0.0°C   
fan5:                       0 RPM  (min =    0 RPM)
PCH_CPU_TEMP:   +0.0°C   
SYSTIN:                   +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PCH_MCH_TEMP:   +0.0°C  
CPUTIN:                   +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
intrusion0:   ALARM
AUXTIN0:                   +93.0°C   sensor = thermistor
intrusion1:   ALARM
AUXTIN1:                  +22.0°C   sensor = thermistor
beep_enable:   disabled
AUXTIN2:                   +22.0°C   sensor = thermistor
AUXTIN3:                  +96.0°C   sensor = thermistor
PECI Agent 0 Calibration:  +25.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C
PCH_CHIP_TEMP:             +0.0°C   
PCH_CPU_TEMP:               +0.0°C   
TSI0_TEMP:                 +27.6°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:             disabled


root@daq14:~#
jc42-i2c-1-1a
</pre>
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:       +23.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)


== ASUS TUF GAMING B550M-PLUS WIFI II ==
asusec-isa-0000
Adapter: ISA adapter
CPU_Opt:        0 RPM
Chipset:      +34.0°C 
CPU:          +25.0°C 
Motherboard:  +22.0°C 
T_Sensor:    -40.0°C 
VRM:          +31.0°C 


* BIOS 2803, 2806
k10temp-pci-00c3
* echo modprobe nct6775 >> /etc/rc.local
Adapter: PCI adapter
 
Tctl:        +28.0°C 
<pre>
Tccd1:        +27.5°C 
root@midm9a:~# sensors
 
nct6798-isa-0290
root@daq17:~#
</pre>
 
== ASUS PRIME B650-PLUS ==
 
* BIOS 1811
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
root@dsdaqgw:~# sensors
amdgpu-pci-0b00
Adapter: PCI adapter
vddgfx:      930.00 mV
vddnb:        1.19 V 
edge:        +38.0°C 
PPT:          25.10 W 
 
nct6799-isa-0290
Adapter: ISA adapter
Adapter: ISA adapter
in0:                      488.00 mV (min =  +0.00 V, max =  +1.74 V)
in0:                      920.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan1:                        0 RPM  (min =    0 RPM)
fan2:                     760 RPM  (min =    0 RPM)
fan2:                     1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan7:                     1264 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
fan5:                        0 RPM  (min =    0 RPM)
CPUTIN:                    +22.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
fan7:                       0 RPM  (min =    0 RPM)
AUXTIN0:                  +95.0°C    sensor = thermistor
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN1:                  +25.0°C    sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN2:                  +25.0°C    sensor = thermistor
AUXTIN0:                  +78.0°C    sensor = thermistor
AUXTIN3:                  +25.0°C    sensor = thermistor
AUXTIN1:                  +11.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +23.5°C   
AUXTIN2:                  +20.0°C    sensor = thermistor
AUXTIN3:                  +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C   
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C   
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C   
PCH_CHIP_TEMP:              +0.0°C   
PCH_CHIP_TEMP:              +0.0°C   
PCH_CPU_TEMP:              +0.0°C   
PCH_CPU_TEMP:              +0.0°C   
TSI0_TEMP:                +32.4°C  
TSI0_TEMP:                +42.6°C  
intrusion0:              ALARM
intrusion0:              ALARM
intrusion1:              ALARM
intrusion1:              OK
beep_enable:              disabled
beep_enable:              disabled


amdgpu-pci-0800
k10temp-pci-00c3
Adapter: PCI adapter
Adapter: PCI adapter
vddgfx:        1.45 V 
Tctl:        +42.6°C  
vddnb:      993.00 mV
Tccd1:       +36.4°C  
edge:        +28.0°C  
PPT:         20.00 W  


k10temp-pci-00c3
root@dsdaqgw:~#  
Adapter: PCI adapter
Tctl:        +33.4°C 
 
root@midm9a:~#  
</pre>
</pre>


== ASUS ROG STRIX B550-XE GAMING WIFI ==
= Enable CPU turbo mode =
 
* BIOS 2423, 2604
* echo modprobe nct6775 >> /etc/rc.local


* An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a higher turbo-boost frequency (e.g. 4.0 GHz). Here we enable the turbo-boost mode.
* Find out CPU capability
<pre>
root@daq01:~# lscpu | grep Hz
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:                        3965.803
CPU max MHz:                    4000.0000
CPU min MHz:                    800.0000
root@daq01:~#
</pre>
* Look up this CPU in the Intel ARK database (search for the CPU model name), e.g.:
https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html
* Find current frequency settings:
<pre>
<pre>
root@daq13:~# sensors
root@daq01:~# cpupower frequency-info
nct6798-isa-0290
analyzing CPU 0:
Adapter: ISA adapter
  driver: intel_pstate
in0:                      344.00 mV (min =  +0.00 V, max =  +1.74 V)
  CPUs which run at the same hardware frequency: 0
in1:                     992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
  CPUs which need to have their frequency coordinated by software: 0
in2:                       3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
  maximum transition latency: Cannot determine or is not supported.
in3:                       3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
  hardware limits: 800 MHz - 4.00 GHz
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
  available cpufreq governors: performance powersave
in5:                     960.00 mV (min = +0.00 V, max =  +0.00 V)  ALARM
  current policy: frequency should be within 800 MHz and 4.00 GHz.
in6:                     216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
                  The governor "powersave" may decide which speed to use
in7:                       3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
                  within this range.
in8:                       3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
  current CPU frequency: Unable to call hardware
in9:                        1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
in10:                    960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
  boost state support:
in11:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
    Supported: yes
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V) ALARM
    Active: yes
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
root@daq01:~#  
in14:                    208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
</pre>
fan1:                      845 RPM  (min =    0 RPM)
* Note the following:
fan2:                      998 RPM  (min =    0 RPM)
** current governor is "powersave"
fan3:                        0 RPM  (min =    0 RPM)
** "performance" governor is available
fan4:                        0 RPM  (min =    0 RPM)
** "boost state support" is supported and active.
fan5:                        0 RPM  (min =    0 RPM)
* Confirm CPU frequency governor:
SYSTIN:                    +28.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +27.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +94.0°C    sensor = thermistor
AUXTIN1:                  +28.0°C    sensor = thermistor
AUXTIN2:                  +28.0°C    sensor = thermistor
AUXTIN3:                  +97.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +27.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +33.6°C 
intrusion0:              ALARM
intrusion1:              ALARM
beep_enable:              disabled
 
amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:       1.45 V 
vddnb:       999.00 mV
edge:        +29.0°C 
PPT:          14.00 W 
 
iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +33.9°C 
 
root@daq13:~#  
</pre>
 
== ASUS ROG STRIX B550-E GAMING ==
 
* bios 2803
* echo modprobe jc42 >> /etc/rc.local
* echo modprobe nct6775 >> /etc/rc.local
 
<pre>
<pre>
root@daq17:~# sensors
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
jc42-i2c-1-1b
powersave
Adapter: SMBus PIIX4 adapter port 0 at 0b00
powersave
temp1:        +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
powersave
                      (high =  +0.0°C, hyst =  +0.0°C)
powersave
                      (crit =  +0.0°C, hyst =  +0.0°C)
powersave
 
powersave
iwlwifi_1-virtual-0
powersave
Adapter: Virtual device
powersave
temp1:       +28.0°C 
root@daq01:~#
 
</pre>
nouveau-pci-0800
* Change governor to "performance":
Adapter: PCI adapter
<pre>
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
root@daq01:~# cpupower frequency-set --governor performance
temp1:       +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
Setting cpu: 0
                      (crit = +105.0°C, hyst =  +5.0°C)
Setting cpu: 1
                      (emerg = +135.0°C, hyst =  +5.0°C)
Setting cpu: 2
 
Setting cpu: 3
nct6798-isa-0290
Setting cpu: 4
Adapter: ISA adapter
Setting cpu: 5
in0:                     288.00 mV (min =  +0.00 V, max =  +1.74 V)
Setting cpu: 6
in1:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Setting cpu: 7
in2:                       3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
in3:                       3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in4:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in5:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in6:                     224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in7:                       3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in8:                       3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in9:                        1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in10:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
performance
in11:                      1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
root@daq01:~# cpupower frequency-info
in12:                      1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
analyzing CPU 0:
in13:                    280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
  driver: intel_pstate
in14:                    208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
  CPUs which run at the same hardware frequency: 0
fan1:                      843 RPM  (min =    0 RPM)
  CPUs which need to have their frequency coordinated by software: 0
fan2:                      629 RPM  (min =    0 RPM)
  maximum transition latency: Cannot determine or is not supported.
fan3:                      746 RPM  (min =    0 RPM)
  hardware limits: 800 MHz - 4.00 GHz
fan4:                       0 RPM  (min =    0 RPM)
  available cpufreq governors: performance powersave
fan5:                        0 RPM  (min =    0 RPM)
  current policy: frequency should be within 800 MHz and 4.00 GHz.
SYSTIN:                   +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
                  The governor "performance" may decide which speed to use
CPUTIN:                   +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
                  within this range.
AUXTIN0:                  +93.0°C    sensor = thermistor
  current CPU frequency: Unable to call hardware
AUXTIN1:                  +22.0°C    sensor = thermistor
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
AUXTIN2:                   +22.0°C    sensor = thermistor
  boost state support:
AUXTIN3:                  +96.0°C    sensor = thermistor
     Supported: yes
PECI Agent 0 Calibration:  +25.5°C 
    Active: yes
PCH_CHIP_CPU_MAX_TEMP:     +0.0°C 
PCH_CHIP_TEMP:             +0.0°C  
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                 +27.6°C 
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled
 
jc42-i2c-1-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +23.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                      (high =  +0.0°C, hyst =  +0.0°C)
                      (crit =  +0.0°C, hyst =  +0.0°C)
 
asusec-isa-0000
Adapter: ISA adapter
CPU_Opt:       0 RPM
Chipset:      +34.0°C 
CPU:         +25.0°C 
Motherboard: +22.0°C 
T_Sensor:     -40.0°C 
VRM:         +31.0°C 
 
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:        +28.0°C 
Tccd1:        +27.5°C 
 
root@daq17:~#
</pre>
</pre>
* monitor CPU frequency:
<pre>
root@daq01:~# cpupower monitor
    | Nehalem                  || Mperf              || Idle_Stats                                   
CPU| C3  | C6  | PC3  | PC6  || C0  | Cx  | Freq  || POLL | C1  | C1E  | C3  | C6  | C7s  | C8   
  0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
  4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
  1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
  5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
  2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
  6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
  3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
  7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~#
</pre>
* check that the CPU is not overheating:
<pre>
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
</pre>
* congratulations, we are running at 4 GHz now!
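The governor check above can be wrapped in a small helper. This is only a sketch: the function name <code>check_governor</code> and the optional second argument (the sysfs directory, defaulting to the path used above) are assumptions made for illustration.

```shell
#!/bin/sh
# check_governor WANTED [CPU_DIR]
# Report every CPU whose scaling governor differs from WANTED.
# CPU_DIR defaults to /sys/devices/system/cpu (same path as the "cat" above).
# Returns 0 if all CPUs match, 1 if any differ, 2 if no governor files exist.
check_governor() {
    want="$1"
    root="${2:-/sys/devices/system/cpu}"
    bad=0
    seen=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue        # skip unmatched globs and non-CPU dirs
        seen=1
        got=$(cat "$f")
        if [ "$got" != "$want" ]; then
            echo "$f: $got (wanted $want)"
            bad=1
        fi
    done
    [ "$seen" -eq 1 ] || return 2
    return "$bad"
}
```

Usage: <code>check_governor performance && echo "all CPUs use the performance governor"</code>.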


== ASUS PRIME B650-PLUS ==
= Setup ubuntu as gateway to private network =
 
* BIOS 1811
* echo modprobe nct6775 >> /etc/rc.local


<pre>
See also:
root@dsdaqgw:~# sensors
* https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
amdgpu-pci-0b00
* http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1
Adapter: PCI adapter
vddgfx:      930.00 mV
vddnb:         1.19 V 
edge:        +38.0°C 
PPT:          25.10 W 


nct6799-isa-0290
== Steps to do ==
Adapter: ISA adapter
 
in0:                      920.00 mV (min = +0.00 V, max = +1.74 V)
!!! UPDATED 16feb2024 Ubuntu-22.04.03 !!!
in1:                        1.02 V  (min = +0.00 V, max = +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                      1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                      1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                      1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                    1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  +78.0°C    sensor = thermistor
AUXTIN1:                  +11.0°C    sensor = thermistor
AUXTIN2:                  +20.0°C    sensor = thermistor
AUXTIN3:                  +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C 
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C 
PCH_CHIP_TEMP:              +0.0°C 
PCH_CPU_TEMP:              +0.0°C 
TSI0_TEMP:                +42.6°C 
intrusion0:              ALARM
intrusion1:              OK
beep_enable:              disabled


k10temp-pci-00c3
* assign network numbers to the private networks, e.g. 192.168.1.x, 192.168.2.x, etc.
Adapter: PCI adapter
* (on the gateway machine, each private network interface has to have a different network number)
Tctl:        +42.6°C 
* (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
Tccd1:        +36.4°C 
* assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps" (IP address first, then the name)
* (for simplicity, assign 192.168.1.1 to the gateway machine itself)
* (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
* setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
* setup DHCP server (dnsmasq) to give out the IP addresses
* setup TFTP server (dnsmasq), pxelinux and NFS for diskless booting
* setup time server (chronyd) to provide common time to all devices
* setup NAT so machines on private network can access the internet (to get OS updates, etc)
* setup NIS and NFS so machines on the private network can use common home directories
* setup rsync backup of machines on the private network
 
== setup hosts ==


root@dsdaqgw:~#
* edit /etc/hosts
<pre>
192.168.1.101 dsfe01
... and so forth
</pre>
</pre>
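A quick sanity check of the hosts entries can catch two easy mistakes: name and address written in the wrong order, and use of the "special" .0 and .255 addresses. The sketch below assumes /24 networks as in the examples above; the function name <code>check_hosts</code> is made up for illustration, and IPv6 lines (as found in a stock /etc/hosts) are simply skipped.

```shell
#!/bin/sh
# check_hosts FILE
# Scan a hosts-format file and report suspicious lines.
# Returns 0 if nothing was reported, 1 otherwise.
check_hosts() {
    awk '
        /^[ \t]*(#|$)/ { next }      # skip comments and blank lines
        $1 ~ /:/       { next }      # skip IPv6 entries
        $1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
            printf "line %d: first field is not an IPv4 address: %s\n", NR, $1
            bad = 1
            next
        }
        $1 ~ /\.(0|255)$/ {          # network and broadcast address of a /24
            printf "line %d: reserved address %s\n", NR, $1
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

Usage: <code>check_hosts /etc/hosts</code>.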


= Enable CPU turbo mode =
== setup dns and dhcp ==
 
!!! updated 16feb2024 for Ubuntu 22.04.3 !!!
 
!!! note: stock systemd-resolved remains, is configured to forward queries to dnsmasq, configured to forward queries to TRIUMF DNS !!!
 
!!! note: per authors of systemd, bare hostnames are not permitted, a DNS domain name must always be used. DNS domain name "dsdaq" is used in this example !!!


* An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a higher turbo-boost frequency (e.g. 4.0 GHz). Here we enable the turbo-boost mode.
* apt install dnsmasq
* Find out CPU capability
* ensure dnsmasq starts after all interfaces are up (Ubuntu-22)
<pre>
<pre>
root@daq01:~# lscpu | grep Hz
mkdir /etc/systemd/system/dnsmasq.service.d
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/dnsmasq.service.d/local.conf
CPU MHz:                        3965.803
</pre>
CPU max MHz:                    4000.0000
* edit /etc/dnsmasq.conf
CPU min MHz:                    800.0000
root@daq01:~#
</pre>
* Look up this CPU in the Intel ARK database (search for the CPU model name), e.g.:
https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html
* Find current frequency settings:
<pre>
<pre>
root@daq01:~# cpupower frequency-info
# /etc/dnsmasq.conf
analyzing CPU 0:
# DNS settings
  driver: intel_pstate
#port=0 # disable DNS function
  CPUs which run at the same hardware frequency: 0
port=53 # enable DNS function
  CPUs which need to have their frequency coordinated by software: 0
bind-interfaces # do not collide with systemd-resolved, we use 127.0.0.1:53, they use 127.0.0.53:53
  maximum transition latency: Cannot determine or is not supported.
domain-needed
  hardware limits: 800 MHz - 4.00 GHz
bogus-priv
  available cpufreq governors: performance powersave
no-resolv
  current policy: frequency should be within 800 MHz and 4.00 GHz.
#log-queries # log DNS queries
                  The governor "powersave" may decide which speed to use
                  within this range.
# TRIUMF DNS settings
  current CPU frequency: Unable to call hardware
   
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
server=142.90.100.19
  boost state support:
expand-hosts
    Supported: yes
domain=dsdaq
    Active: yes
local=/dsdaq/
root@daq01:~#
localmx # do not forward MX queries to TRIUMF
 
# DHCP settings
interface=enp1s0f0 # VX network 192.168.0.x
#interface=missing  # FEP and TSP network 192.168.1.x
interface=enp1s0f1 # controls network 192.168.2.x
#dhcp-range=192.168.1.50,192.168.1.150,infinite
dhcp-range=192.168.0.0,static
dhcp-range=192.168.2.0,static
log-dhcp # log DHCP queries
#quiet-dhcp
dhcp-ignore=tag:!known
#dhcp-boot=pxelinux.0
dhcp-option=option:dns-server,192.168.0.248
dhcp-option=option:ntp-server,192.168.0.248
# TFTP settings
enable-tftp
tftp-root=/tftpboot
</pre>
</pre>
* Note the following:
* #mkdir /tftpboot ### per tftp-root (if no ZFS)
** current governor is "powersave"
* zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
** "performance" governor is available
* create resolved-dsdaq.conf with main IP address of dnsmasq
** "boost state support" is supported and active.
* Confirm CPU frequency governor:
<pre>
<pre>
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
[Resolve]
powersave
DNS=192.168.0.248
powersave
Domains=dsdaq triumf.ca
powersave
</pre>
powersave
* mkdir -p /etc/systemd/resolved.conf.d/
powersave
* /bin/rm -f /etc/systemd/resolved.conf.d/*.conf
powersave
* cp resolved-dsdaq.conf /etc/systemd/resolved.conf.d/
powersave
* systemctl stop systemd-resolved.service
powersave
* systemctl disable systemd-resolved.service
root@daq01:~#
* systemctl enable dnsmasq
</pre>
* systemctl restart dnsmasq
* Change governor to "performance":
* try to "ping" or "host" some names from /etc/hosts, it should work
* try to ping daq00, daq00.triumf.ca, all should work
* resolved-dsdaq.conf goes into /etc/systemd/resolved.conf.d/ of all machines on the private network
* if not using systemd-resolved, edit /etc/resolv.conf
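The "try to ping or host some names" step can be scripted so that every name in the hosts file is tested against the dnsmasq server. This is a sketch only: the function names are hypothetical, and it assumes the <code>host</code> utility is installed and that the DNS server address is passed explicitly (e.g. 127.0.0.1 on the gateway).

```shell
#!/bin/sh
# host_names FILE
# Print the first hostname of each IPv4 entry in a hosts-format file.
host_names() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && NF >= 2 { print $2 }' "$1"
}

# check_dns FILE SERVER
# Query SERVER for every name found in FILE and report the result.
# Returns 1 if at least one name could not be resolved.
check_dns() {
    rc=0
    for name in $(host_names "$1"); do
        if host "$name" "$2" >/dev/null 2>&1; then
            echo "ok   $name"
        else
            echo "FAIL $name"
            rc=1
        fi
    done
    return "$rc"
}
```

Usage: <code>check_dns /etc/hosts 127.0.0.1</code>.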
 
== setup chronyd ==
 
* enable ntp server:
* disable systemd-timesyncd, configure and enable chronyd per instructions above
* create dsdaq.conf
<pre>
<pre>
root@daq01:~# cpupower frequency-set --governor performance
# chrony config for dsdaq server
Setting cpu: 0
 
Setting cpu: 1
#allow 192.168.0.0
Setting cpu: 2
#allow 192.168.1.0
Setting cpu: 3
#allow 192.168.2.0
Setting cpu: 4
allow all
Setting cpu: 5
 
Setting cpu: 6
# end
Setting cpu: 7
</pre>
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
* cp dsdaq.conf /etc/chrony/conf.d/
performance
* systemctl restart chronyd
performance
* chronyc tracking ### wait until time is synchronized (a few seconds)
performance
* create dsdaq.sources # use hostname or IP address of chronyd server
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
</pre>
* monitor CPU frequency:
<pre>
<pre>
root@daq01:~# cpupower monitor
# Put this file in /etc/chrony/sources.d
    | Nehalem                  || Mperf              || Idle_Stats                                   
# systemctl restart chrony
CPU| C3  | C6  | PC3  | PC6  || C0  | Cx  | Freq  || POLL | C1  | C1E  | C3  | C6  | C7s  | C8   
# chronyc sources
  0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
# chronyc tracking
  4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
server dsdaqgw iburst prefer
  1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
# end
  5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
  2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
  6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
  3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
  7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~#  
</pre>
</pre>
* check that the CPU is not overheating:
* dsdaq.sources goes to /etc/chrony/sources.d of all machines on the private network
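Instead of reading the <code>chronyc tracking</code> output by eye, the "Leap status" line can be tested in a script. A minimal sketch, assuming the usual <code>chronyc tracking</code> output format where a synchronized clock reports "Leap status : Normal"; the function name <code>chrony_is_synced</code> is made up for illustration.

```shell
#!/bin/sh
# chrony_is_synced
# Reads "chronyc tracking" output on stdin.
# Succeeds only if the "Leap status" field is "Normal".
chrony_is_synced() {
    awk -F': *' '
        /^Leap status/ {
            status = $2
            sub(/[ \t]+$/, "", status)   # drop trailing whitespace, if any
        }
        END {
            if (status == "Normal") exit 0
            exit 1
        }
    '
}
```

Usage: <code>until chronyc tracking | chrony_is_synced; do sleep 1; done</code>.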
<pre>
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
</pre>
* congratulations, we are running at 4 GHz now!


= Setup ubuntu as gateway to private network =
== setup diskless network booting ==


See also:
=== setup pxelinux for legacy pxe boot ===
* https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
* http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1


== Steps to do ==
* add bits in dnsmasq.conf
 
<pre>
!!! UPDATED 16feb2024 Ubuntu-22.04.03 !!!
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
 
dhcp-boot=pxelinux.0
* assign network numbers to the private networks, e.g. 192.168.1.x, 192.168.2.x, etc.
dhcp-option=17,"192.168.0.251:/nfsroot/%s,vers=3"
* (on the gateway machine, each private network interface has to have a different network number)
</pre>
* (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
* setup pxelinux for Ubuntu-18
* assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps" (IP address first, then the name)
<pre>
* (for simplicity, assign 192.168.1.1 to the gateway machine itself)
cd ~
* (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
* setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
tar xjvf syslinux-4.03.tar.bz2
* setup DHCP server (dnsmasq) to give out the IP addresses
cd syslinux-4.03
* setup TFTP server (dnsmasq), pxelinux and NFS for diskless booting
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
* setup time server (chronyd) to provide common time to all devices
</pre>
* setup NAT so machines on private network can access the internet (to get OS updates, etc)
* cd /zssd/tftpboot
* setup NIS and NFS so machines on the private network can use common home directories
* setup rsync backup of machines on the private network
 
== setup hosts ==
 
* edit /etc/hosts
<pre>
<pre>
192.168.1.101 dsfe01
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
... and so forth
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
</pre>
</pre>
* mkdir pxelinux.cfg
* emacs -nw pxelinux.cfg/default
<pre>
default menu.c32
prompt 0


== setup dns and dhcp ==
menu title Welcome to the DSVSLICE PXE boot menu


!!! updated 16feb2024 for Ubuntu 22.04.3 !!!
timeout 50
 
label hdt
  kernel hdt.c32
 
label memtest86+-5.01
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz
 
label memtest86+-4.20
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip
 
label vmlinuz-5.3.0-26-generic
  menu default
  kernel vmlinuz-5.3.0-26-generic
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0


!!! note: stock systemd-resolved remains, is configured to forward queries to dnsmasq, configured to forward queries to TRIUMF DNS !!!
#end
</pre>
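The default config above hardwires the nfsroot of one machine (dsfe01). pxelinux also looks in pxelinux.cfg/ for a per-machine file named "01-" followed by the client MAC address in lower case with dashes, and only falls back to "default" if none is found. The sketch below computes that file name from a MAC address as written in dnsmasq.conf; the function name <code>pxe_cfg_name</code> is hypothetical.

```shell
#!/bin/sh
# pxe_cfg_name MAC
# Convert a MAC address (aa:bb:cc:dd:ee:ff, any case) into the per-machine
# pxelinux config file name: "01-" (ARP type Ethernet) + lower-case MAC
# with dashes instead of colons.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-Z:' 'a-z-')"
}
```

Usage: <code>cp pxelinux.cfg/default pxelinux.cfg/$(pxe_cfg_name ac:1f:6b:9e:7f:4a)</code>, then edit the nfsroot= path in the copy.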


!!! note: per authors of systemd, bare hostnames are not permitted, a DNS domain name must always be used. DNS domain name "dsdaq" is used in this example !!!
=== setup pxelinux for efi pxe boot ===


* apt install dnsmasq
* https://c-nergy.be/blog/?p=13808
* ensure dnsmasq starts after all interfaces are up (Ubuntu-22)
* add dnsmasq.conf bits. note: the root-path option does not actually work, the NFS root path is hardwired in the pxelinux.cfg/default file.
<pre>
<pre>
mkdir /etc/systemd/system/dnsmasq.service.d
# uefi pxe
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/dnsmasq.service.d/local.conf
 
dhcp-boot=tag:uefipxe,uefi/syslinux.efi
dhcp-option-force=tag:fe01,option:root-path,192.168.0.248:/nfsroot/fe01
 
# VX network 192.168.0.x
 
dhcp-host=40:a6:b7:c1:d9:c5,fe01,infinite,set:uefipxe,set:fe01
</pre>
</pre>
* edit /etc/dnsmasq.conf
* apt install syslinux pxelinux syslinux-common syslinux-efi syslinux-utils
<pre>
<pre>
# /etc/dnsmasq.conf
mkdir /tftpboot/uefi
# DNS settings
cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /tftpboot/uefi/
#port=0 # disable DNS function
cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /tftpboot/uefi/
port=53 # enable DNS function
cp /usr/lib/syslinux/modules/efi64/menu.c32 /tftpboot/uefi/
bind-interfaces # do not collide with systemd-resolved, we use 127.0.0.1:53, they use 127.0.0.53:53
cp /usr/lib/syslinux/modules/efi64/hdt.c32 /tftpboot/uefi/
domain-needed
cp /usr/lib/syslinux/modules/efi64/libutil.c32 /tftpboot/uefi/
bogus-priv
cp /usr/lib/syslinux/modules/efi64/libmenu.c32 /tftpboot/uefi/
no-resolv
cp /usr/lib/syslinux/modules/efi64/libcom32.c32 /tftpboot/uefi/
#log-queries # log DNS queries
cp /usr/lib/syslinux/modules/efi64/libgpl.c32 /tftpboot/uefi/
</pre>
# TRIUMF DNS settings
* try to boot, it should bomb with "cannot load pxelinux.cfg/default"
* mkdir /tftpboot/uefi/pxelinux.cfg
server=142.90.100.19
* create /tftpboot/uefi/pxelinux.cfg/default, note nfsroot path is hardwired, note "http:" is used to load vmlinuz and initrd files (because tftp is super slow)
expand-hosts
<pre>
domain=dsdaq
default menu.c32
local=/dsdaq/
prompt 0
localmx # do not forward MX queries to TRIUMF
 
menu title Welcome to the DSDAQGW UEFI PXE boot menu
 
timeout 50


# DHCP settings
label vmlinuz-6.5.0-17-generic
interface=enp1s0f0 # VX network 192.168.0.x
  kernel http://192.168.0.248:8088/uefi/vmlinuz-6.5.0-17-generic
#interface=missing  # FEP and TSP network 192.168.1.x
  append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60
interface=enp1s0f1 # controls network 192.168.2.x
 
#dhcp-range=192.168.1.50,192.168.1.150,infinite
# append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
dhcp-range=192.168.0.0,static
 
dhcp-range=192.168.2.0,static
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
log-dhcp # log DHCP queries
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto ip=dhcp rw panic=60
#quiet-dhcp
 
dhcp-ignore=tag:!known
#end
#dhcp-boot=pxelinux.0
dhcp-option=option:dns-server,192.168.0.248  
dhcp-option=option:ntp-server,192.168.0.248
# TFTP settings
enable-tftp
tftp-root=/tftpboot
</pre>
</pre>
* #mkdir /tftpboot ### per tftp-root (if no ZFS)
* try to boot, it will bomb with "cannot load http://...."
* zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
* install mini_httpd on port 8088, see https://acme.com/software/mini_httpd/
* create resolved-dsdaq.conf with main IP address of dnsmasq
<pre>
<pre>
[Resolve]
apt install mini-httpd
DNS=192.168.0.248
emacs -nw /etc/default/mini-httpd # set "START=1"
Domains=dsdaq triumf.ca
emacs -nw /etc/mini-httpd.conf # set "host=192.168.0.248", "port=8088", "data_dir=/tftpboot"
mkdir /etc/systemd/system/mini-httpd.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/mini-httpd.service.d/local.conf
systemctl enable mini-httpd
systemctl restart mini-httpd
systemctl status mini-httpd
wget http://192.168.0.248:8088/uefi/syslinux.efi
tail -100 /var/log/mini_httpd.log
</pre>
</pre>
* mkdir -p /etc/systemd/resolved.conf.d/
* fix initramfs bug for "nfsroot=auto", otherwise, "nfsroot=" has to be different for each machine and you have to have separate pxelinux config files for each machine
* /bin/rm -f /etc/systemd/resolved.conf.d/*.conf
** emacs -nw /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config
* cp resolved-dsdaq.conf /etc/systemd/resolved.conf.d/
** add "echo ROOTPATH=..." if it is missing (Ubuntu LTS 22.04)
* systemctl stop systemd-resolved.service
* systemctl disable systemd-resolved.service
* systemctl enable dnsmasq
* systemctl restart dnsmasq
* try to "ping" or "host" some names from /etc/hosts, it should work
* try to ping daq00, daq00.triumf.ca, all should work
* resolved-dsdaq.conf goes into /etc/systemd/resolved.conf.d/ of all machines on the private network
* if not using systemd-resolved, edit /etc/resolv.conf
 
== setup chronyd ==
 
* enable ntp server:
* disable systemd-timesyncd, configure and enable chronyd per instructions above
* create dsdaq.conf
<pre>
<pre>
# chrony config for dsdaq server
                echo "ROOTSERVER='${new_routers%% *}'"
 
                echo "ROOTPATH='$new_root_path'"
#allow 192.168.0.0
                echo "HOSTNAME='$new_host_name'"
#allow 192.168.1.0
#allow 192.168.2.0
allow all
 
# end
</pre>
</pre>
* cp dsdaq.conf /etc/chrony/conf.d/
** regenerate initramfs (be careful you generate it for the right kernel!)
* systemctl restart chronyd
** see https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2054482
* chronyc tracking ### wait until time is synchronized (a few seconds)
* create dsdaq.sources # use hostname or IP address of chronyd server
<pre>
<pre>
# Put this file in /etc/chrony/sources.d
mkinitramfs -o /boot/initrd.img-6.5.0-18-generic 6.5.0-18-generic
# systemctl restart chrony
</pre>
# chronyc sources
* copy linux kernel and initrd
# chronyc tracking
server dsdaqgw iburst prefer
# end
</pre>
* dsdaq.sources goes to /etc/chrony/sources.d of all machines on the private network
 
== setup diskless network booting ==
 
=== setup pxelinux for legacy pxe boot ===
 
* add bits in dnsmasq.conf
<pre>
<pre>
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
cp /boot/vmlinuz-6.5.0-18-generic /tftpboot/uefi/
dhcp-boot=pxelinux.0
cp /boot/initrd.img-6.5.0-18-generic /tftpboot/uefi/
dhcp-option=17,"192.168.0.251:/nfsroot/%s,vers=3"
chmod a+r /tftpboot/uefi/*
</pre>
</pre>
* setup pxelinux for Ubuntu-18
<pre>
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
</pre>
* cd /zssd/tftpboot
<pre>
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
</pre>
* mkdir pxelinux.cfg
* emacs -nw pxelinux.cfg/default
<pre>
default menu.c32
prompt 0

menu title Welcome to the DSVSLICE PXE boot menu

timeout 50

label hdt
  kernel hdt.c32

label memtest86+-5.01
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz

label memtest86+-4.20
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip

label vmlinuz-5.3.0-26-generic
  menu default
  kernel vmlinuz-5.3.0-26-generic
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0

#end
</pre>
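When a new kernel is installed, the "label vmlinuz-..." entry in pxelinux.cfg/default has to be rewritten by hand. A minimal sketch of generating it instead; the function name pxelinux_entry is hypothetical, and the kernel options are simply copied from the entry above with the version and NFS root turned into parameters.

```shell
# Hypothetical helper: print one pxelinux menu entry.
# $1 = kernel version (e.g. 5.3.0-26-generic), $2 = NFS root (server:/path)
pxelinux_entry() {
    local version="$1" nfsroot="$2"
    printf 'label vmlinuz-%s\n' "$version"
    printf '  kernel vmlinuz-%s\n' "$version"
    # options below mirror the hand-written entry; adjust BOOTIF for other hardware
    printf '  append initrd=initrd.img-%s boot=nfs root=/dev/nfs netboot=nfs nfsroot=%s toram ip=dhcp panic=60 BOOTIF=enp1s0f0\n' "$version" "$nfsroot"
}

# Usage (not executed here):
#   pxelinux_entry 5.3.0-26-generic 192.168.1.1:/zssd/nfsroot/dsfe01 >> pxelinux.cfg/default
```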

=== setup pxelinux for efi pxe boot ===

* https://c-nergy.be/blog/?p=13808
* add dnsmasq.conf bits. note: root-path does not actually work, it is hardwired in the pxelinux.cfg/default file.
<pre>
# uefi pxe

dhcp-boot=tag:uefipxe,uefi/syslinux.efi
dhcp-option-force=tag:fe01,option:root-path,192.168.0.248:/nfsroot/fe01

# VX network 192.168.0.x

dhcp-host=40:a6:b7:c1:d9:c5,fe01,infinite,set:uefipxe,set:fe01
</pre>
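Every additional diskless node needs the same pair of lines with its own MAC address and name. A sketch of a generator, assuming the /nfsroot/NAME layout used above; the function name dnsmasq_uefi_node is invented for this example.

```shell
# Hypothetical helper: print the dnsmasq.conf lines for one UEFI PXE node.
# $1 = MAC address, $2 = node name (also used as the dnsmasq tag), $3 = NFS server IP
dnsmasq_uefi_node() {
    local mac="$1" name="$2" server="$3"
    printf 'dhcp-host=%s,%s,infinite,set:uefipxe,set:%s\n' "$mac" "$name" "$name"
    printf 'dhcp-option-force=tag:%s,option:root-path,%s:/nfsroot/%s\n' "$name" "$server" "$name"
}

# Usage (not executed here):
#   dnsmasq_uefi_node 40:a6:b7:c1:d9:c5 fe01 192.168.0.248 >> /etc/dnsmasq.conf
```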
* apt install syslinux pxelinux syslinux-common syslinux-efi syslinux-utils
<pre>
mkdir /tftpboot/uefi
cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/menu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/hdt.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libutil.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libmenu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libcom32.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libgpl.c32 /tftpboot/uefi/
</pre>
* try to boot, it should bomb with "cannot load pxelinux.cfg/default"
* mkdir /tftpboot/uefi/pxelinux.cfg
* create /tftpboot/uefi/pxelinux.cfg/default, note nfsroot path is hardwired, note "http:" is used to load vmlinuz and initrd files (because tftp is super slow)
<pre>
default menu.c32
prompt 0

menu title Welcome to the DSDAQGW UEFI PXE boot menu

timeout 50

label vmlinuz-6.5.0-17-generic
  kernel http://192.168.0.248:8088/uefi/vmlinuz-6.5.0-17-generic
  append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60

# append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60

# append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto ip=dhcp rw panic=60

#end
</pre>
* try to boot, it will bomb with "cannot load http://...."
* install mini_httpd on port 8088, see https://acme.com/software/mini_httpd/
<pre>
apt install mini-httpd
emacs -nw /etc/default/mini-httpd # set "START=1"
emacs -nw /etc/mini-httpd.conf # set "host=192.168.0.248", "port=8088", "data_dir=/tftpboot"
mkdir /etc/systemd/system/mini-httpd.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/mini-httpd.service.d/local.conf
systemctl enable mini-httpd
systemctl restart mini-httpd
systemctl status mini-httpd
wget http://192.168.0.248:8088/uefi/syslinux.efi
tail -100 /var/log/mini_httpd.log
</pre>
* fix initramfs bug for "nfsroot=auto", otherwise, "nfsroot=" has to be different for each machine and you have to have separate pxelinux config files for each machine
** emacs -nw /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config
** add "echo ROOTPATH=..." if it is missing (Ubuntu LTS 22.04)
<pre>
                echo "ROOTSERVER='${new_routers%% *}'"
                echo "ROOTPATH='$new_root_path'"
                echo "HOSTNAME='$new_host_name'"
</pre>
** regenerate initramfs (be careful you generate it for the right kernel!)
** see https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/2054482
<pre>
mkinitramfs 6.5.0-18-generic
</pre>
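Whether the hook file already carries the ROOTPATH line can be checked before editing it. A small sketch, assuming the line has the same form as in the snippet above; the function name hook_has_rootpath is hypothetical.

```shell
# Hypothetical helper: succeed if the dhclient hook file already echoes ROOTPATH.
# $1 = path to the hook file
hook_has_rootpath() {
    grep -q 'echo "ROOTPATH=' "$1"
}

# Usage (not executed here):
#   hook_has_rootpath /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config \
#       || echo "ROOTPATH line is missing, add it and regenerate the initramfs"
```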
* copy linux kernel and initrd
<pre>
cp /boot/vmlinuz-6.5.0-18-generic /tftpboot/uefi/
cp /boot/initrd.img-6.5.0-18-generic /tftpboot/uefi/
chmod a+r /tftpboot/uefi/*
</pre>
* try to boot, should bomb with messages about "trying to mount root filesystem"
* tail /var/log/syslog
<pre>
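Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131,
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  2
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131,
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: error 8 User aborted the transfer received from 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  2
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/ldlinux.e64 to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/01-40-a6-b7-c1-d9-c5 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006E not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A800 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A80 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/menu.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/libutil.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  2
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 12 hostname  fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw rpc.mountd[3350210]: authenticated mount request from 192.168.0.110:981 for /nfsroot/fe01 (/nfsroot/fe01)
Feb 16 20:45:07 dsdaqgw rpc.mountd[3350210]: authenticated unmount request from 192.168.0.110:859 for /nfsroot/fe01/tmp/autoDY4k5u (/nfsroot/fe01)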
</pre>
* tail /var/log/mini_httpd.log
<pre>
192.168.0.110 - - [16/Feb/2024:20:43:15 -0800] "GET /uefi/vmlinuz-6.5.0-17-generic HTTP/1.0" 200 14227944 "" "Syslinux/6.04"
192.168.0.110 - - [16/Feb/2024:20:43:24 -0800] "GET /uefi/initrd.img-6.5.0-17-generic HTTP/1.0" 200 137824833 "" "Syslinux/6.04"
</pre>
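The "not found" lines in the syslog excerpt show the order in which syslinux looks for a config file: first a name built from the MAC address, then the client IP address in hex with one digit dropped at a time, then "default". The sketch below reproduces that sequence, which is handy when a per-machine config file is wanted. The function name pxelinux_config_names is hypothetical, and a lookup by client UUID, which pxelinux may try first when the firmware provides one, is left out because it does not appear in this log.

```shell
# Hypothetical helper: list the pxelinux.cfg file names a client will try, in order.
# $1 = MAC address (aa:bb:cc:dd:ee:ff), $2 = IPv4 address assigned by DHCP
pxelinux_config_names() {
    local mac="$1" ip="$2" hex
    # "01" is the ARP hardware type for Ethernet, followed by the MAC in lower case with dashes
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
    # IP address as 8 upper-case hex digits (word splitting of the octets is intended)
    hex="$(printf '%02X%02X%02X%02X' $(printf '%s' "$ip" | tr '.' ' '))"
    # drop one trailing hex digit per attempt
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex="${hex%?}"
    done
    printf 'default\n'
}

# Usage (not executed here):
#   pxelinux_config_names 40:a6:b7:c1:d9:c5 192.168.0.110
```

For the fe01 node this gives 01-40-a6-b7-c1-d9-c5, then C0A8006E down to C, then default, matching the tftp requests in the log.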

=== setup efi http boot ===

https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html
 
=== setup linux kernel ===

* copy the kernel files
<pre>
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /tftpboot/
</pre>
* cd /tftpboot
* chmod a+r *
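A quick check that the kernel and initrd for one version are both present and world-readable can catch the usual cause of boot failures after a kernel update. This is only a sketch: the function name check_boot_files is hypothetical, and it tests the "other" read bit that chmod a+r sets.

```shell
# Hypothetical helper: verify vmlinuz and initrd.img of one version are present
# and readable by everybody in the served directory.
# $1 = directory (e.g. /tftpboot), $2 = kernel version
check_boot_files() {
    local dir="$1" version="$2" f status=0
    for f in "vmlinuz-$version" "initrd.img-$version"; do
        # find prints the path only if the file exists and the world-read bit is set
        if [ -z "$(find "$dir/$f" -perm -004 2>/dev/null)" ]; then
            echo "missing or not world-readable: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (not executed here):
#   check_boot_files /tftpboot "$(uname -r)"
```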

=== setup nfs ===

* apt-get install nfs-kernel-server
* enable NFS over UDP: edit /etc/nfs.conf and add "udp=y" in the [nfsd] section:
<pre>
udp=y
</pre>
* restart the NFS server:
<pre>
systemctl restart nfs-server.service
</pre>
* emacs -nw /etc/exports
<pre>
/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
</pre>
* enable services
<pre>
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
</pre>
* after editing /etc/exports, run
<pre>
exportfs -av
</pre>
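With several diskless nodes, each one needs its own line in /etc/exports. A sketch of generating those lines, assuming every node follows the /nfsroot/NAME layout and the export options shown above; the function name nfs_export_line is hypothetical.

```shell
# Hypothetical helper: print the /etc/exports line for one diskless node.
# $1 = node name (used both as directory name and as the allowed client)
nfs_export_line() {
    printf '/nfsroot/%s %s(rw,no_root_squash,async,no_subtree_check)\n' "$1" "$1"
}

# Usage (not executed here):
#   nfs_export_line dsfe01 >> /etc/exports && exportfs -av
```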

=== setup userland ===

!!! ubuntu-18 version !!!

* zfs create rpool/nfsroot
* zfs set dedup=verify rpool/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
* clone ubuntu
<pre>
mkdir /nfsroot/dsfe01
cd /
rsync -avx . /nfsroot/dsfe01
</pre>
* edit config files:
* cd /nfsroot/dsfe01
* emacs -nw etc/hostname ### change to dsfe01
* emacs -nw etc/mailname ### change to dsfe01
* emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
* emacs -nw etc/defaultdomain ### change to MUSR-NIS
* cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
* emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
* emacs -nw root/.ssh/authorized_keys ### update root ssh keys
* emacs -nw etc/fstab ### add this
<pre>
192.168.1.1:/nfsroot/dsfe01 / nfs defaults,nolock 0 0
</pre>
* emacs -nw etc/chrony/chrony.conf
** comment-out all "pool" and "server" entries
** add entry "server 192.168.1.1 iburst"
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/menu.c32 to 192.168.0.110
 
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/libutil.c32 to 192.168.0.110
After dsfe01 is booted:
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
* disable services:
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
<pre>
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5
systemctl disable apache2
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
systemctl disable dnsmasq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
systemctl disable zfs-import-cache
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
</pre>
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
To setup additional machines, clone dsfe01 instead of cloning the gateway machine
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
=== Allow manpages to be viewed ===
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  2
If <code>/</code> is mounted over NFS, <code>man</code> will report a permission error. Fix it with:
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
<pre>
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
apparmor_parser -R /etc/apparmor.d/usr.bin.man
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
</pre>
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope,
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 12 hostname  fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw rpc.mountd[3350210]: authenticated mount request from 192.168.0.110:981 for /nfsroot/fe01 (/nfsroot/fe01)
Feb 16 20:45:07 dsdaqgw rpc.mountd[3350210]: authenticated unmount request from 192.168.0.110:859 for /nfsroot/fe01/tmp/autoDY4k5u (/nfsroot/fe01)
</pre>
* tail /var/log/mini_httpd.log
<pre>
192.168.0.110 - - [16/Feb/2024:20:43:15 -0800] "GET /uefi/vmlinuz-6.5.0-17-generic HTTP/1.0" 200 14227944 "" "Syslinux/6.04"
192.168.0.110 - - [16/Feb/2024:20:43:24 -0800] "GET /uefi/initrd.img-6.5.0-17-generic HTTP/1.0" 200 137824833 "" "Syslinux/6.04"
</pre>


=== setup efi http boot ===
== setup shared home directory ==


https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html
=== on the gateway machine ===
 
* define netgroups
=== setup linux kernel ===
* emacs -nw /etc/netgroup
 
* copy the kernel files
<pre>
<pre>
cd /boot
dsfe (dsfe01,,) (dsfe02,,)
rsync -av config* initrd* System.map* vmlinuz* /tftpboot/
</pre>
</pre>
* cd /tftpboot
* emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
* chmod a+r *
 
=== setup nfs ===
 
* apt-get install nfs-kernel-server
* enable NFS over UDP, edit /etc/nfs.conf add "udp=y":
<pre>
<pre>
udp=y
netgroup: files
</pre>
</pre>
* export the home directories:
* emacs -nw /etc/exports ### add this:
<pre>
<pre>
systemctl restart nfs-server.service
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
</pre>
</pre>
* emacs -nw /etc/exports
* exportfs -rc
 
=== on the frontend machine ===
 
* mkdir /home
* emacs -nw /etc/fstab ### add this:
<pre>
<pre>
/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
</pre>
</pre>
* enable services
* mount -a
<pre>
 
systemctl enable nfs-server
== setup NAT ==
systemctl enable nfs-mountd
 
systemctl enable nfs-idmapd
NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
systemctl restart nfs-server
 
systemctl restart nfs-mountd
In these examples:
systemctl restart nfs-idmapd
* replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
</pre>
* replace "enp11s0" with name of the private network interface (192.168.1.x network)
* after editing /etc/exports, run
 
* emacs -nw /etc/rc.local ### add this:
<pre>
<pre>
exportfs -av
# /etc/rc.local
</pre>
 
# enable NAT
 
/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v
 
# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v


=== setup userland ===
iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward


!!! ubuntu-18 version !!!
sh /etc/firewall-rfc1918.sh


* zfs create rpool/nfsroot
# end
* zfs set dedup=verify rpool/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
</pre>
* clone ubuntu
* emacs -nw /etc/firewall-rfc1918.sh
<pre>
<pre>
mkdir /nfsroot/dsfe01
# firewall-rfc1918.sh
cd /
rsync -avx . /nfsroot/dsfe01
</pre>
* edit config files:
* cd /nfsroot/dsfe01
* emacs -nw etc/hostname ### change to dsfe01
* emacs -nw etc/mailname ### change to dsfe01
* emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
* emacs -nw etc/defaultdomain ### change to MUSR-NIS
* cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
* emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
* emacs -nw root/.ssh/authorized_keys ### update root ssh keys
* emacs -nw etc/fstab ### add this
<pre>
192.168.1.1:/nfsroot/dsfe01 / nfs defaults,nolock 0 0
</pre>
* emacs -nw etc/chrony/chrony.conf
** comment-out all "pool" and "server" entries
** add entry "server 192.168.1.1 iburst"


After dsfe01 is booted:
# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.


* disable services:
ETH=eno1
<pre>
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache
</pre>


To setup additional machines, clone dsfe01 instead of cloning the gateway machine
iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT


=== Allow manpages to be viewed ===
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH


If <code>/</code> is mounted over NFS, <code>man</code> will report a permission error. Fix it with:
iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT
 
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH
 
iptables -D FORWARD -j out-rfc1918 -o $ETH
iptables -D FORWARD -j out-rfc1918 -o $ETH
iptables -I FORWARD -j out-rfc1918 -o $ETH
 
# allow TRIUMF-SECURE network
 
iptables -I in-rfc1918 -s 10.90.0.0/255.255.0.0 -j ACCEPT
iptables -I out-rfc1918 -d 10.90.0.0/255.255.0.0 -j ACCEPT
 
# show configuration
 
iptables -L -v


<pre>
#end
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man
</pre>
</pre>


== setup shared home directory ==
= KVM =


=== on the gateway machine ===
* define netgroups
* emacs -nw /etc/netgroup
<pre>
<pre>
dsfe (dsfe01,,) (dsfe02,,)
apt install cpu-checker
</pre>
 
* emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
root@daq13:~# kvm-ok
<pre>
INFO: /dev/kvm exists
netgroup: files
KVM acceleration can be used
</pre>
root@daq13:~#
* export the home directories:
* emacs -nw /etc/exports ### add this:
<pre>
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
</pre>
* exportfs -rc


=== on the frontend machine ===
(if not, shutdown, go into BIOS settings, enable CPU virtualization)


* mkdir /home
apt install virtinst ### will install many packages
* emacs -nw /etc/fstab ### add this:
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils
<pre>
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
</pre>
* mount -a


== setup NAT ==
root@daq13:/home1/wheel# virsh list --all
Id  Name          State
------------------------------
1    ubuntu-guest  running


NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
apt install virt-manager


In these examples:
virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'
* replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
* replace "enp11s0" with name of the private network interface (192.168.1.x network)


* emacs -nw /etc/rc.local ### add this:
virtual machine will start, boot, etc
<pre>
to get out of it, CTRL + Shift followed by ]
# /etc/rc.local


# enable NAT
ssh wheel@daq13
virt-manager


/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
run virt-install again, omit "--graphics none", open graphics console from virt-manager, it booted into ubuntu installer desktop
iptables -L -v


# uncomment following lines if machine has prohibitive FORWARD rules:
virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v


iptables -L -v
virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off
sysctl -w net.ipv4.ip_forward=1
</pre>
#sysctl -a | grep forward


sh /etc/firewall-rfc1918.sh
build image


# end
<pre>
dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### ext4 fails to mount by SL6 kernel, "unknown ext4 options"
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp
</pre>
</pre>
* emacs -nw /etc/firewall-rfc1918.sh
 
on the guest, configure network: /etc/rc.local
<pre>
<pre>
# firewall-rfc1918.sh
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.


# prevent RFC1918 private network IP addresses from
touch /var/lock/subsys/local
# going in and out from our uplink.


ETH=eno1
ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn


iptables -F in-rfc1918
# end
iptables -N in-rfc1918
</pre>
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12  -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT


iptables -D INPUT -j in-rfc1918 -i $ETH
= ARM64 cross-compiler =
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH


iptables -F out-rfc1918
* arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
iptables -N out-rfc1918
* install packages:
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
<pre>
iptables -A out-rfc1918 --dst 172.16.0.0/12  -j REJECT
apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT
</pre>
* run:
<pre>
aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static
aarch64-linux-gnu-g++-12 -o fecdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 fecdm.o dsdm.o /home/dsdaqdev/packages_common/midas/linux-aarch64-remoteonly/lib/libmidas.a -pthread -lrt -lutil /nfsroot/gdm00/usr/lib/aarch64-linux-gnu/libi2c.a -static
</pre>


iptables -D OUTPUT -j out-rfc1918 -o $ETH
= ARM cross-compiler =
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH


iptables -D FORWARD -j out-rfc1918 -o $ETH
NOTE!!!
iptables -D FORWARD -j out-rfc1918 -o $ETH
iptables -I FORWARD -j out-rfc1918 -o $ETH


# allow TRIUMF-SECURE network
THIS IS NOT AN AARCH64 (arm64) CROSSCOMPILER!


iptables -I in-rfc1918 -s 10.90.0.0/255.255.0.0 -j ACCEPT
NOTE!!!
iptables -I out-rfc1918 -d 10.90.0.0/255.255.0.0 -j ACCEPT


# show configuration
* install packages:
 
<pre>
iptables -L -v
apt install libgcc-9-dev-arm64-cross
 
apt install gcc-arm-linux-gnueabi
#end
apt install gcc-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabi
</pre>
* find out the correct -march setting, on the target machine, run:
<pre>
root@gdm00:~# g++ -Q --help=target | grep march
  -march=                    armv8-a
</pre>
</pre>


= KVM =


<pre>
<pre>
apt install cpu-checker
arm-linux-gnueabi-gcc -o ttcp1 ttcp.c -march=armv7 -static
arm-linux-gnueabi-gcc -o memcpy.armv7 memcpy.cc -march=armv7 -static -O2
</pre>


root@daq13:~# kvm-ok
= 32-bit intel cross-compiler =
INFO: /dev/kvm exists
KVM acceleration can be used
root@daq13:~#


(if not, shutdown, go into BIOS settings, enable CPU virtualization)
Ubuntu 22.04:


apt install virtinst ### will install many packages
<pre>
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils
apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386
</pre>


root@daq13:/home1/wheel# virsh list --all
NOTES:
Id  Name          State
* "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
------------------------------
* to cross-build 32-bit MIDAS, use "make linux32".
1    ubuntu-guest  running
* executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
* executables cross-built on Ubuntu-22 run on 32-bit Debian-12.


apt install virt-manager
= SSH settings for EPICS =
 
virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'
 
virtual machine will start, boot, etc
to get out of it, CTRL + Shift followed by ]
 
ssh wheel@daq13
virt-manager
 
run virt-install again, omit "--graphics none", open graphics console from virt-manager, it booted into ubuntu installer desktop
 
virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none
 
virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off
</pre>
 
build image


* TRIUMF EPICS runs an obsolete version of SSH
* add this to the user's .ssh/config
<pre>
<pre>
dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
Host sbp1*
mkfs.ext3 /tmp/xxx/ladd00.img ### ext4 fails to mount by SL6 kernel, "unknown ext4 options"
HostKeyAlgorithms +ssh-rsa
cd /kvm_ladd00/
PubKeyAcceptedAlgorithms +ssh-rsa
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
KexAlgorithms +diffie-hellman-group1-sha1
rsync -av . /mnt/tmp/ --delete
ForwardX11 yes
umount /mnt/tmp
ForwardX11Trusted yes
</pre>
 
on the guest, configure network: /etc/rc.local
<pre>
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
 
touch /var/lock/subsys/local
 
ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn
 
# end
</pre>
 
= ARM64 cross-compiler =
 
* arm64, aarch64 are Xilinx FPGA Cortex-A53, RPi4, RPi5 machines
* install packages:
<pre>
apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
</pre>
* run:
<pre>
aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static
aarch64-linux-gnu-g++-12 -o fecdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 fecdm.o dsdm.o /home/dsdaqdev/packages_common/midas/linux-aarch64-remoteonly/lib/libmidas.a -pthread -lrt -lutil /nfsroot/gdm00/usr/lib/aarch64-linux-gnu/libi2c.a -static
</pre>
 
= ARM cross-compiler =
 
NOTE!!!
 
THIS IS NOT AN AARCH64 (arm64) CROSSCOMPILER!
 
NOTE!!!
 
* install packages:
<pre>
apt install libgcc-9-dev-arm64-cross
apt install gcc-arm-linux-gnueabi
apt install gcc-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabi
</pre>
* find out the correct -march setting, on the target machine, run:
<pre>
root@gdm00:~# g++ -Q --help=target | grep march
  -march=                    armv8-a
</pre>
 
 
<pre>
arm-linux-gnueabi-gcc -o ttcp1 ttcp.c -march=armv7 -static
arm-linux-gnueabi-gcc -o memcpy.armv7 memcpy.cc -march=armv7 -static -O2
</pre>
 
= 32-bit intel cross-compiler =
 
Ubuntu 22.04:
 
<pre>
apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386
</pre>
</pre>


NOTES:
= changes for VME processors =
* "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
* to cross-build 32-bit MIDAS, use "make linux32".
* executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
* executables cross-built on Ubuntu-22 run on 32-bit Debian-12.
 
= SSH settings for EPICS =


* TRIUMF EPICS runs an obsolete version of SSH
* add this to the user's .ssh/config
<pre>
<pre>
Host sbp1*
apt -y remove sysstat man-db
HostKeyAlgorithms +ssh-rsa
apt -y purge dkms
PubKeyAcceptedAlgorithms +ssh-rsa
apt -y purge mdadm
KexAlgorithms +diffie-hellman-group1-sha1
apt -y autoremove
ForwardX11 yes
ForwardX11Trusted yes
</pre>
</pre>

Latest revision as of 16:57, 13 November 2024

Prerequisites

  • before setting up new machine run memory test
  • prepare flash drive with free version of memtest86: https://www.memtest86.com
  • test boot from flash drive, test takes ~ few hours
  • test will end with summary page, if passed continue with Ubuntu
  • number that might be worth noting is memory latency

Ubuntu version

lsb_release -a
uname -a

Ubuntu installer

  • updated for Ubuntu LTS 20.04.01, 22.04.1, 24.04 (only minor differences)
  • download the latest Ubuntu LTS desktop installer iso image
  • dd the image to a USB key
  • power down, disconnect all disks (all HDDs, all SSDs, all M.2)
  • connect the SSD to be used as system disk
  • if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
  • power up
  • boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F8 for ASUS, F11 for Supermicro)
  • follow the instructions:
  • "try ubuntu or install ubuntu" - choose "install"
  • select language - accept default
  • "updates and other software" - accept default settings ("normal install")
  • "installation type" - select "advanced features" and "experimental: use ZFS"
  • accept partition choice
  • "where are you?" - select "Vancouver" (PST time zone)
  • "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
  • don't install third party sw
  • installation runs in a few minutes, when finished, reboot
  • login as user wheel
  • answer annoying questions:
  • "livepatch" - say "next"
  • "help improve" - select "do not send", say "next"
  • "privacy" - leave "location" as "off", say "next"
  • "ready to go", say "done"
  • right-click on the desktop, say "open in terminal", a shell will open
  • say "sudo /bin/bash", enter the root password, you now have the root shell
  • run nm-connection-editor to configure the network. use netmask 255.255.224.0, gateway 142.90.100.18, DNS 142.90.100.19, search path "triumf.ca"
  • after network is up (can ping ladd00), continue with post-installation steps below
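
If the network is configured from the command line instead of nm-connection-editor (an assumption; for example, nmcli takes addresses in address/prefix form), the netmask above has to be written as a prefix length. A minimal sketch of the conversion, where `mask_to_prefix` is a hypothetical helper name, not part of any package:

```shell
#!/bin/sh
# Convert a dotted-quad netmask to a CIDR prefix length
# by counting the bits that are set in each octet.
mask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the netmask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        while [ "$octet" -gt 0 ]; do
            prefix=$((prefix + (octet & 1)))
            octet=$((octet >> 1))
        done
    done
    echo "$prefix"
}

mask_to_prefix 255.255.224.0    # prints 19
```

So netmask 255.255.224.0 corresponds to a /19 network.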

Install instructions

prepare

apt update
apt upgrade

install ssh

apt install ssh

install git/scripts

apt -y install git
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull

configure hostname

vi /etc/hostname

disable swap

ubuntu installer creates a 2 GB swap partition, which is not useful on a machine with 32-64 GB of memory, disable it:

vi /etc/fstab ### comment out the "swap" line
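
The manual edit above can also be sketched as a filter, assuming the usual fstab layout where the third field is the filesystem type. `comment_out_swap` is a hypothetical helper name; it only prints the result and does not modify the file:

```shell
#!/bin/sh
# Print FILE with every active swap entry commented out.
# An fstab entry is: device mountpoint type options dump pass,
# so a swap entry has "swap" in the third field.
# Lines that are already comments are passed through unchanged.
comment_out_swap() {
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' "$1"
}

# Example (as root): write to a scratch file, review, then copy into place.
# comment_out_swap /etc/fstab > /tmp/fstab.new
# diff -u /etc/fstab /tmp/fstab.new
```

The fstab change takes effect at the next boot; `swapoff -a` turns swap off in the running system.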

maybe reboot

this is a good point to reboot the machine to boot the latest kernel and to set the correct hostname

install etckeeper

keep contents of /etc in a git repository:

apt -y install etckeeper

set timezone

timedatectl list-timezones | grep -i vancouver
timedatectl set-timezone America/Vancouver

install time synchronization

apt -y install chrony
#echo server time1.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time2.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time3.triumf.ca iburst >> /etc/chrony/chrony.conf
cd ~/git/scripts
git pull
cd ~
cp ~/git/scripts/etc/triumf.sources /etc/chrony/sources.d/
systemctl disable systemd-timesyncd.service
systemctl stop systemd-timesyncd.service
systemctl disable ntp
systemctl stop ntp
systemctl enable chrony
systemctl restart chrony
chronyc sources
chronyc tracking

NOTE1: if time1, time2, time3 are already listed in /etc/chrony/chrony.conf, please remove them and restart chrony.

NOTE2: if time1, time2, time3 are not listed in "chronyc sources" or if they are not selected by "chronyc tracking", check that /etc/chrony/chrony.conf contains "sourcedir /etc/chrony/sources.d", see NOTE4.

NOTE3: read https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server

NOTE4: to update a very old chrony config file, remove chrony, then install it from scratch as above

grep sourcedir /etc/chrony/chrony.conf ### if we have it, we are good
apt remove chrony
apt purge chrony
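
The sourcedir check from NOTE2 and NOTE4 can be sketched as a small function. `has_sourcedir` is a hypothetical helper name; unlike the plain grep above, it ignores commented-out lines:

```shell
#!/bin/sh
# Succeed if the chrony config file CONF has an active (uncommented)
# "sourcedir" directive pointing at DIR.
has_sourcedir() {
    conf=$1
    dir=$2
    grep -Eq "^[[:space:]]*sourcedir[[:space:]]+${dir}/?[[:space:]]*\$" "$conf"
}

if has_sourcedir /etc/chrony/chrony.conf /etc/chrony/sources.d 2>/dev/null; then
    echo "sourcedir present, files in /etc/chrony/sources.d will be read"
else
    echo "sourcedir missing, see NOTE4"
fi
```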

reenable systemd-timesyncd

ONLY IF CHRONY DOES NOT WORK

To configure systemd-timesyncd, set "NTP=" in /etc/systemd/timesyncd.conf

apt remove chrony
cat /etc/systemd/timesyncd.conf
systemctl enable systemd-timesyncd.service
systemctl restart systemd-timesyncd.service
systemctl status systemd-timesyncd.service
timedatectl status
timedatectl timesync-status
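
A minimal sketch of the "NTP=" setting mentioned above. The server names are taken from the commented-out chrony lines earlier on this page; that they are the right servers for systemd-timesyncd is an assumption. `timesyncd_conf` is a hypothetical helper name that only prints the file contents:

```shell
#!/bin/sh
# Print a timesyncd.conf body whose "NTP=" entry lists the given servers.
# The setting lives in the [Time] section and takes a space-separated list.
timesyncd_conf() {
    printf '[Time]\nNTP=%s\n' "$*"
}

timesyncd_conf time1.triumf.ca time2.triumf.ca time3.triumf.ca
```

Compare the printed text with the output of "cat /etc/systemd/timesyncd.conf" before editing the file by hand.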

enable outgoing email (debian 11)

this is different from ubuntu 20. it uses /etc/mailname and it hardwires the hostname into main.cf.

enable outgoing email

we have an unusual email configuration. outgoing email should work to deliver error messages, notices, etc. incoming email is disabled, we do not receive email for local users.

this causes problems with the TRIUMF smtp server. if our message cannot be delivered (wrong email address or recipient computer is turned off), the TRIUMF smtp server will generate a delivery failure notification email and try to send it to the "from" address of the failed message. but the "from" address does not receive any email, so another delivery failure notification email is generated and the server attempts to deliver it, which again fails, rinse and repeat.

as a solution, kray created a special rule: email from scrap.triumf.ca does not generate delivery failure notices. failed messages sit in the queue for 5 days, then they are deleted. (K.O. - confirmed with kray 3jan2024).

to make this work we use the msmtp MTA package.

cd ~
apt -y remove postfix
apt -y purge postfix # remove old config files
apt -y install mailutils msmtp msmtp-mta # say "no" to apparmor support
apt -y install bsd-mailx
cd ~/git/scripts/etc
git pull
/bin/cp -fv aliases /etc/aliases
/bin/cp -fv msmtprc /etc/msmtprc
/bin/rm -vf ~root/.forward
/bin/rm -vf /etc/mailname
Mail root
Subject: test
test
^D
CC: <CR>

enable outgoing email (postfix)

THIS IS OBSOLETE!!!

  • TRIUMF: use smtp.triumf.ca
  • CERN: use cernmx.cern.ch
apt install postfix ### select "satellite system", enter full hostname "xxx.triumf.ca", enter "smtp.triumf.ca"
apt install mailutils
dpkg-reconfigure postfix ### (if postfix already installed)
echo olchansk@triumf.ca lindner@triumf.ca bsmith@triumf.ca >> ~root/.forward
mailx root
test
^D

enable ping for all users (debian 11)

Without this tweak, Debian will report "operation not permitted" if a user tries to ping somewhere.

echo 'net.ipv4.ping_group_range = 0 1000' > /etc/sysctl.d/99-ping.conf
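
The setting is an inclusive range of group IDs that may open ICMP echo sockets. A file in /etc/sysctl.d is read at boot; "sysctl --system" loads it without a reboot. The sketch below checks whether the current user falls inside the active range; `gid_in_ping_range` is a hypothetical helper name, and checking only the primary group is a simplification:

```shell
#!/bin/sh
# Succeed if GID lies inside the inclusive range given as "LOW HIGH".
gid_in_ping_range() {
    gid=$1
    set -- $2          # split "LOW HIGH" into $1 and $2
    [ "$gid" -ge "$1" ] && [ "$gid" -le "$2" ]
}

# Read the active range; fall back to "1 0" (an empty range) if unreadable.
range=$(cat /proc/sys/net/ipv4/ping_group_range 2>/dev/null)
if gid_in_ping_range "$(id -g)" "${range:-1 0}"; then
    echo "primary group is inside the ping range"
else
    echo "primary group is outside the ping range"
fi
```

With the range "0 1000", a user whose primary group ID is above 1000 is not covered by this check.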

disable apparmor

On NFS-Root network booted machines!

If "man man" returns "permission denied" and syslog reports apparmor "sendmsg DENIED" errors, disable apparmor. This is supposedly fixed in kernel 6.0 and later (to be confirmed), see https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/1784499

Disable apparmor, see https://ubuntu.com/server/docs/security-apparmor

This takes effect after a reboot.

systemctl stop apparmor.service
systemctl disable apparmor.service

install missing packages

(apt eats terminal input, even the "yes |" trick does not quite work, repeat the following commands until they report that everything is installed)

yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools traceroute time minicom screen git lsof debsums tmux iptables telnet
yes | apt -y install sysstat smartmontools lm-sensors
yes | apt -y install lsb-release
apt -y install vim # in addition to default vim-tiny, requested by IRIS
apt -y install tcl
apt -y install pax rpm alien ### package converter tools
yes | apt -y install flex bison
yes | apt -y install neofetch
yes | apt -y install snmp snmp-mibs-downloader
yes | apt -y install git subversion g++ gfortran cmake doxygen
yes | apt -y install curl libcurl4 libcurl4-openssl-dev
yes | apt -y install mariadb-client ### mysql client
yes | apt -y install libz-dev libzstd-dev sqlite3 libsqlite3-dev unixodbc-dev
yes | apt -y install libssl-dev
yes | apt -y install emacs xemacs21 joe
yes | apt -y install gnuplot dos2unix
yes | apt -y install mutt bsd-mailx # email clients
yes | apt -y install liblz4-tool pbzip2
yes | apt -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt -y install libreadline-dev
yes | apt -y install ubuntu-mate-themes
yes | apt -y install libmotif-dev libxmu-dev
yes | apt -y install libusb-dev libusb-1.0-0-dev
yes | apt -y install i2c-tools libi2c-dev libi2c0
yes | apt -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt -y install libjson-perl
yes | apt -y install libgsl-dev # additional GNU Scientific Library
yes | apt -y install qt5-default # Qt development
yes | apt -y install python3-full python3-dev python3-dbg python3-pip ### for pyROOT
yes | apt -y install imagemagick imagemagick-common ckeditor # for elog
yes | apt -y install libjpeg-dev libjpeg-progs libjpeg-tools
yes | apt -y install linux-tools-common linux-tools-generic # cpupower frequency-info
yes | apt -y install rdesktop remmina remmina-plugin"*" # requested by POL
yes | apt -y install nlohmann-json3-dev # required to build MIDAS with ROOT 6.30 on Ubuntu-22
apt -y install dpkg-dev cmake g++ gcc binutils libx11-dev libxpm-dev libxft-dev libxext-dev python3 libssl-dev libafterimage0 # from https://root.cern/install/dependencies/
apt -y install gfortran libpcre3-dev xlibmesa-glu-dev libglew-dev libftgl-dev libmysqlclient-dev libfftw3-dev libcfitsio-dev graphviz-dev libldap2-dev python3-dev python3-numpy libxml2-dev libkrb5-dev libgsl0-dev qtwebengine5-dev nlohmann-json3-dev libtbb-dev libavahi-compat-libdnssd-dev # from https://root.cern/install/dependencies/
apt -y install libvdt-dev # for ROOT 6.32 on Ubuntu-24
apt -y install u-boot-tools # for Xilinx petalinux
#apt -y install linux-headers-generic # to build linux kernel drivers

Ubuntu LTS 20.04:

yes | apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 # enable linux 5.11 series kernel

Ubuntu LTS 22.04:

apt -y install linux-generic-hwe-22.04 # enable linux 6.2.0 series kernel

Ubuntu LTS 24.04:

apt -y install linux-generic-hwe-24.04 # enable linux 6.8.0 series kernel
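
Because apt may need several passes, it helps to list what is still missing before repeating the commands. A sketch, assuming the installed package names are saved one per line in a text file. The helper name missing_packages is made up for this example. Note that "dpkg-query -W" also lists removed-but-not-purged packages, so treat the result as a hint.

```shell
# missing_packages INSTALLED_FILE PKG...
# prints each PKG that does not appear as a whole line in INSTALLED_FILE
# (hypothetical helper; INSTALLED_FILE holds one package name per line)
missing_packages() (
    installed="$1"
    shift
    for pkg in "$@"; do
        grep -qxF -- "$pkg" "$installed" || echo "$pkg"
    done
)

# example:
# dpkg-query -W -f='${Package}\n' > /tmp/installed.txt
# missing_packages /tmp/installed.txt ssh tcsh ethtool rsync
```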

disable swap (debian 11)

  • on 64 GB RAM machines swap is not useful
  • on machines booted from network (NFS-ROOT), swap does not work
  • on machines running from flash (RPi, etc), flash is too slow for useful swap
  • swap configured by linux installers invariably has wrong size and is not useful
systemctl disable dphys-swapfile
systemctl stop dphys-swapfile
dphys-swapfile uninstall
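
After the commands above, no swap area should remain active. A small sketch to confirm this by counting the entries in /proc/swaps (the first line of that file is a header). The helper name active_swap_count is made up for this example.

```shell
# active_swap_count [FILE]
# counts swap areas listed in FILE (default /proc/swaps), skipping the header line
active_swap_count() {
    awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "${1:-/proc/swaps}"
}

# example: [ "$(active_swap_count)" -eq 0 ] && echo "swap is off"
```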

configure DNS

cd ~/git/scripts
git pull
mkdir /etc/systemd/resolved.conf.d
cp etc/resolved-triumf.conf /etc/systemd/resolved.conf.d/
systemctl restart systemd-resolved
resolvectl
#systemd-analyze cat-config systemd/resolved.conf

install ganglia

apt -y install ganglia-monitor
cd ~root/git/scripts/ganglia
git pull
make install
./ganglia-all.perl

fix gmond start before network is ready:

mkdir /etc/systemd/system/ganglia-monitor.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/ganglia-monitor.service.d/local.conf
systemctl daemon-reload
systemctl cat ganglia-monitor.service
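
The same mkdir + echo pattern is used for several systemd drop-in files on this page. A sketch of a reusable helper, assuming the drop-in is always named local.conf. The helper name write_dropin is made up for this example. It uses "mkdir -p" so it can be repeated without error, and printf instead of "echo -e".

```shell
# write_dropin ROOT UNIT SECTION LINE
# creates ROOT/UNIT.d/local.conf containing "[SECTION]" followed by LINE
# (hypothetical helper; ROOT would normally be /etc/systemd/system)
write_dropin() (
    dir="$1/$2.d"
    mkdir -p "$dir" || exit 1
    printf '[%s]\n%s\n' "$3" "$4" > "$dir/local.conf"
)

# example, equivalent to the ganglia fix above:
# write_dropin /etc/systemd/system ganglia-monitor.service Unit "After=network-online.target"
# systemctl daemon-reload
```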

install gonodeinfo

yes | apt-get -y install golang
mkdir ~/git
cd ~/git
#git clone https://bitbucket.org/dd1/gonodeinfo.git
git clone https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
cd gonodeinfo
git remote set-url origin https://daq00.triumf.ca/~olchansk/git/gonodeinfo.git
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
  • edit /etc/gonodeinfo.conf
  • change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
  • change "Servers" to read: Servers: daq00.triumf.ca:8601
  • run "gonodeinfo -v"
  • if the error is "connection refused", go to the gonodeinfo server and add this client to the access control list:
  • on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
  • try gonodeinfo again, there should be no error
  • on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
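
The "Servers" edit above can also be scripted. A sketch, assuming the config file uses the "Key: value" layout quoted in the steps above and that the value contains no "|" or "&" characters. The helper name set_servers is made up for this example.

```shell
# set_servers FILE VALUE
# rewrites the "Servers:" line of FILE in place, or appends one if none exists
# (hypothetical helper; uses GNU sed -i)
set_servers() (
    file="$1"; value="$2"
    if grep -q '^Servers:' "$file"; then
        sed -i "s|^Servers:.*|Servers: $value|" "$file"
    else
        echo "Servers: $value" >> "$file"
    fi
)

# example: set_servers /etc/gonodeinfo.conf daq00.triumf.ca:8601
```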

install fonts for EPICS

  • apt install xfonts-100dpi xfonts-75dpi
  • restart Xorg (i.e. "killall Xorg", this will log you out from the console)
  • xlsfonts | grep -i helvetica ### should show fonts with different sizes, not just size 0 (scalable)

install libz.so.1 for CentOS compatibility

KO - confirm which versions of Quartus need this.

yes | apt-get -y install zlib1g
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386

install libpng12.so.0 for Quartus compatibility

(does not work anymore!!!)

wget http://ftp.ca.debian.org/debian/pool/main/libp/libpng/libpng12-0_1.2.50-2+deb8u2_amd64.deb
dpkg --install libpng12-0_1.2.50-2+deb8u2_amd64.deb

install libpng12.so.0 for Quartus 13.0sp1

wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0.50.0
/bin/cp -pv libpng12.so.0 libpng12.so.0.50.0 /lib/x86_64-linux-gnu/

install packages for Xilinx

ubuntu LTS 22.04 vivado 2020.1

apt install autoconf libtool
apt install libtinfo5
apt install texinfo
apt install zlib1g:i386

install packages for building ROOT

apt -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake

install 32-bit libraries for PHYSICA

these instructions are for running 32-bit physica executable built for SL6 on ubuntu LTS 20.04

install physica sources (cannot build, do not have g77)

cd ~/packages
git clone https://bitbucket.org/ttriumfdaq/physica.git

install 32-bit libraries using ubuntu package manager:

apt install lib32z1 # libz.so

copy 32-bit SL6 shared libraries to /lib32

root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libX11.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libgd.so.2 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libpng12.so.0 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libreadline.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libncurses.so.5 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libg2c.so.0 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libxcb.so.1 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXpm.so.4 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libjpeg.so.62 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfontconfig.so.1 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfreetype.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libtinfo.so.5 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXau.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libexpat.so.1 /lib32/

ldd should report:

trinatdaq:trinat> ldd /usr/local/physica/physica.exe
	linux-gate.so.1 (0xf7fa2000)
	libX11.so.6 => /lib32/libX11.so.6 (0xf7e43000)
	libgd.so.2 => /lib32/libgd.so.2 (0xf7dfe000)
	libpng12.so.0 => /lib32/libpng12.so.0 (0xf7dd6000)
	libz.so.1 => /lib32/libz.so.1 (0xf7db8000)
	libreadline.so.6 => /lib32/libreadline.so.6 (0xf7d7e000)
	libncurses.so.5 => /lib32/libncurses.so.5 (0xf7d5b000)
	libg2c.so.0 => /lib32/libg2c.so.0 (0xf7d3d000)
	libm.so.6 => /lib32/libm.so.6 (0xf7c39000)
	libgcc_s.so.1 => /lib32/libgcc_s.so.1 (0xf7c1a000)
	libc.so.6 => /lib32/libc.so.6 (0xf7a2f000)
	libxcb.so.1 => /lib32/libxcb.so.1 (0xf7a05000)
	libdl.so.2 => /lib32/libdl.so.2 (0xf79ff000)
	libXpm.so.4 => /lib32/libXpm.so.4 (0xf79ee000)
	libjpeg.so.62 => /lib32/libjpeg.so.62 (0xf7997000)
	libfontconfig.so.1 => /lib32/libfontconfig.so.1 (0xf7962000)
	libfreetype.so.6 => /lib32/libfreetype.so.6 (0xf78c9000)
	libtinfo.so.5 => /lib32/libtinfo.so.5 (0xf78b0000)
	/lib/ld-linux.so.2 (0xf7fa4000)
	libXau.so.6 => /lib32/libXau.so.6 (0xf78ad000)
	libexpat.so.1 => /lib32/libexpat.so.1 (0xf7885000)
trinatdaq:trinat> 
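
If a library was not copied, ldd marks it as "not found" instead of showing a path. A small sketch that prints only those entries. The helper name ldd_missing is made up for this example.

```shell
# ldd_missing: reads ldd output on stdin, prints names of libraries marked "not found"
ldd_missing() {
    awk '/=> not found/ { print $1 }'
}

# example: ldd /usr/local/physica/physica.exe | ldd_missing
```

No output means all shared libraries were resolved.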

set login environment:

setenv TRIUMF_FONTS $HOME/packages/physica/fonts
setenv PHYSICA_DIR $HOME/packages/physica
alias physica $PHYSICA_DIR/physica-SL6-32

test:

cd ~/packages/physica
physica
@rangauss.pcm

install wine

As far as I know, only needed for BNMR/BNQR

apt install wine winetricks

install lightdm

unlike the default gdm login manager, lightdm shows the machine hostname and does not require an extra mouse click to switch from screen saver to login mode.

apt -y install lightdm
# select lightdm

install desktop environments

note: default display manager and default desktop are deficient, please do not skip this step.

note: if apt asks to choose the display manager, select "lightdm"

note: KO - I recommend the "MATE" desktop.

note: you will have to cut-and-paste this several times because "apt" eats commands, even with "-y" and even piped from "yes".

# install MATE desktop
DEBIAN_FRONTEND=noninteractive apt -y install ubuntu-mate-core ubuntu-mate-desktop ubuntu-mate-themes
# install Cinnamon desktop
DEBIAN_FRONTEND=noninteractive apt -y install cinnamon
# install KDE desktop
DEBIAN_FRONTEND=noninteractive apt -y install kubuntu-desktop
# install Lxqt desktop
DEBIAN_FRONTEND=noninteractive apt -y install lxqt
# install Xfce4 desktop
DEBIAN_FRONTEND=noninteractive apt -y install xfce4

install ROOT

Please install ROOT per instructions at https://root.cern.ch.

NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA. ### DO NOT DO THIS! apt-get install root-system

NOTE2: as of 2017-Jan-09, ROOT binary kits for Ubuntu do not work (use GCC 5 instead of GCC6), build from source instead.

Install x2go

apt-get update
apt-get install x2goserver x2goserver-xsession

enable root login from ladd00/daq00

ssh localhost
CTRL-C
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/

disable ssh access from outside of TRIUMF

to stop ssh login spam, disable ssh access from outside of TRIUMF. this can be done by requesting a firewall block through the helpdesk or by local firewall rule:

echo iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT >> /etc/rc.local
/etc/rc.local
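
note: repeating the "echo ... >> /etc/rc.local" command adds the same rule again each time. A sketch of an append that is safe to repeat. The helper name append_once is made up for this example.

```shell
# append_once FILE LINE
# appends LINE to FILE unless an identical line is already present
append_once() (
    file="$1"; line="$2"
    touch "$file" || exit 1
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
)

# example:
# append_once /etc/rc.local 'iptables -I INPUT ! -s 142.90.0.0/255.255.0.0 -p tcp --dport 22 -j REJECT'
```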

install smart-status

ln -s ~/git/scripts/smart-status/smart-status.perl ~root/

enable boot menu and boot messages

This will enable the grub menu (with a 10 sec timeout) and replace black screen with exciting linux boot messages.

  • emacs -nw /etc/default/grub
GRUB_DEFAULT=0
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
  • update grub config:
grub-mkconfig -o /boot/grub/grub.cfg

reboot

this completes installation of the base system.

following sections modify basic ubuntu to fix known problems and to enable special stuff.

Enable automatic updates

apt install unattended-upgrades
cd ~/git/scripts
git pull
/bin/cp -v etc/99apt-conf-ko /etc/apt/apt.conf.d/
apt-config dump | grep Unattended

Following is obsolete:

  • emacs -nw /etc/apt/apt.conf.d/50unattended-upgrades
    • uncomment in Allowed-Origins "-security" and "-updates"
    • add in Allowed-Origins: "Google LLC:stable";
    • uncomment/add: Unattended-Upgrade::Mail "root";
  • emacs -nw /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
  • test: unattended-upgrade --dry-run -v

NOTE: update-on-shutdown is disabled.

NOTE: there is no update-on-boot, but:

NOTE: if machine was off for a long time, the systemd update timer would have expired and it will fire soon after reboot, causing an automatic update run. this is unwanted, and there is no fix or workaround for it. K.O. June-2023.

Fix bpool is full (obsolete)

THIS IS CAUSED BY OBSOLETE PACKAGE zsys. PLEASE: apt remove zsys

!!! only if ROOT on ZFS !!!

There is an error in the zsys package that causes bpool to run out of space, see #Ubuntu zsys for more details.

To fix:

cd ~/git/scripts
git pull
cp etc/zsys.conf /etc/
zsysctl service reload
zsysctl service gc
zpool list bpool
zfs list bpool
df /boot

IPMI instructions

IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.

apt-get install ipmitool
systemctl enable ipmievd
systemctl restart ipmievd

Run:

  • ipmitool sel list ### event list
  • ipmitool sel elist ### extended event list
  • ipmitool sel clear ### clear event list (if it becomes full)
  • ipmitool sensor ### report hardware sensors

move /home/wheel

note: this MUST be done if ZFS root and NIS/autofs with /home.

Default location of wheel's home directory will collide with autofs /home, it has to be moved, for example to /wheel.

# logout from the wheel user
# go to another computer
ssh root@daqubuntuxxx
zfs list | grep wheel ### identify zfs name wheel_xxxxxx
#zfs set mountpoint=/wheel rpool/USERDATA/wheel_hm8fzh
zfs set mountpoint=/wheel `zfs list | grep wheel | cut -f1 -d" "`
zfs list | grep wheel
emacs -nw /etc/passwd ### change wheel's home directory from /home/wheel to /wheel
su - wheel ### check that user wheel still works

This will break wheel's ability to run snap programs, such as firefox. Install chrome as listed below.
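
The backtick command above silently passes several names to "zfs set" if more than one dataset matches "wheel". A sketch of a stricter lookup that only looks at the dataset name column and fails unless exactly one dataset matches. The helper name wheel_dataset is made up for this example.

```shell
# wheel_dataset: reads "zfs list" output on stdin and prints the dataset whose
# name contains "wheel"; fails unless exactly one dataset matches
wheel_dataset() {
    awk '$1 ~ /wheel/ { name = $1; n++ }
         END {
             if (n != 1) {
                 print "expected 1 wheel dataset, found " (n + 0) > "/dev/stderr"
                 exit 1
             }
             print name
         }'
}

# example: zfs set mountpoint=/wheel "$(zfs list | wheel_dataset)"
```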

enable NIS (ubuntu 22.04, 24.04, debian 11, 12)

apt -y install rpcbind nis
echo DAQ-NIS >> /etc/defaultdomain
echo ypserver daq00.triumf.ca >> /etc/yp.conf
systemctl enable ypbind.service
systemctl restart ypbind.service
systemctl status ypbind.service
ypwhich -m

enable ypserv:

sed -i s/NISSERVER=false/NISSERVER=slave/ /etc/default/nis
/usr/lib/yp/ypinit -s daq00
echo ypserver localhost >> /etc/yp.conf
sed -i "s/ypserver .*/ypserver localhost/" /etc/yp.conf
systemctl enable ypserv
systemctl restart ypserv
systemctl restart ypbind

edit /etc/nsswitch.conf to read:

# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis

#passwd: ...
#group: ...
#shadow: ...

#netgroup: ...
#automount: ...
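
To verify the edit, a sketch that reports which of the five databases still lack the "nis" source. The helper name nsswitch_missing_nis is made up for this example; commented-out lines are not counted.

```shell
# nsswitch_missing_nis FILE
# prints each database from the list below whose line in FILE lacks the "nis" source
nsswitch_missing_nis() (
    for db in passwd group shadow automount netgroup; do
        grep -Eq "^$db:.*[[:space:]]nis([[:space:]]|\$)" "$1" || echo "$db"
    done
)

# example: nsswitch_missing_nis /etc/nsswitch.conf   # no output means all five are set
```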

enable hourly update of nis maps:

mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly

If this is a new machine, then on the master NIS node (daq00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)

enable NIS (ubuntu 20.04)

  • apt-get -y install portmap nis ### will ask for NIS domain (DAQ-NIS)
  • dpkg-reconfigure nis ### reconfigure if already installed
  • ypwhich -m
  • edit /etc/default/nis
    • set "NISSERVER=slave"
    • Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank, remove "-no-dbus" if it is there
  • #edit /etc/yp.conf, comment-out everything, add "domain DAQ-NIS server localhost"
  • edit /etc/yp.conf, comment-out everything, add "ypserver localhost"
  • /usr/lib/yp/ypinit -s daq00
  • systemctl enable nis
  • systemctl restart nis
  • ypwhich
  • ypwhich -m
  • ypcat -k passwd
  • vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount:  files nis
netgroup: files nis
# end get data from nis
  • enable hourly update of NIS maps
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
  • ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"

enable autofs

apt -y install autofs
systemctl enable autofs
systemctl restart autofs
ls -l /home/olchansk ### test autofs, check file owner is correct

enable NFS server

apt install nfs-kernel-server
#edit /etc/exports
systemctl enable nfs-server
systemctl restart nfs-server

NIS master

notes for setting up the NIS master

wheel user

"wheel" is the default administrative user. We do not want its password exported to NIS (encrypted password hash is world visible) and we do not want its home directory exported to NFS (~wheel/.ssh is world visible and potentially writable: anybody can change ~wheel/.ssh/authorized_keys).

  • move wheel's home directory from /home/wheel to /wheel (see special section about this)
  • change wheel's UID and GID from 1000 to a value below MINUID in /var/yp/Makefile

coherent uids

we do not want system accounts defined in /etc/passwd of the NIS master to be included in the NIS map "passwd". this causes trouble on NIS clients where newly installed packages fail to create local system users because same user already exists in NIS.

This is controlled by MINUID in /var/yp/Makefile.

Historical TRIUMF uids start from around 200, but several clusters do not have any historic TRIUMF uids below 500 and MINUID is set to:

  • DAQ-NIS: MINUID=200
  • ISAC-NIS: MINUID=500
  • TITAN-NIS: MINUID=500
  • MUSR-NIS: MINUID=500
  • TIG-NIS: MINUID=500 (100 on SL6 mother8pi)

Ubuntu 20 has two programs to create users:

  • adduser - creates new users with UID 1000 and up as specified in /etc/adduser.conf. No problems here.
  • adduser --system - creates new system users with UID 100 and up as specified in /etc/adduser.conf. No problems here.
  • useradd - creates new users with UID 1000 and up as specified in /etc/login.defs. No problems here.
  • useradd --system - creates new system users with UID 999 and down (read "man useradd", section at the end about SYS_UID_MAX). This collides with NIS MINUID, these system users will be included in the NIS map and cause trouble.

This problem cannot be fixed, SYS_UID_MIN, SYS_UID_MAX and UID_MIN in /etc/login.defs do not seem to have any effect on UIDs chosen by "useradd --system". (tested on Ubuntu LTS 20.04).

So far only these system accounts seem to be affected by this:

  • systemd-coredump
  • ganglia

To fix:

  • run "sort -r -n -t: -k3 /etc/passwd" to identify the last unused system user uid (range 100..200)
  • run "sort -r -n -t: -k3 /etc/group" to identify the last unused system user gid (range 100..200)
  • systemd-coredump: manually change UID and GID (package systemd-coredump is usually not installed)
  • ganglia: same thing, then change ownership on all ganglia files.
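
To find affected accounts, a sketch that lists entries with a UID at or above MINUID but below 1000 (the first normal user UID mentioned above). The helper name nis_leaking_uids is made up for this example.

```shell
# nis_leaking_uids MINUID [PASSWD_FILE]
# prints "name:uid" for accounts with MINUID <= uid < 1000, i.e. system accounts
# that would end up in the NIS passwd map
nis_leaking_uids() {
    awk -F: -v min="$1" '$3 >= min && $3 < 1000 { print $1 ":" $3 }' "${2:-/etc/passwd}"
}

# example on the DAQ-NIS master: nis_leaking_uids 200
```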

Also read systemd author's opinion on system vs user UIDs: https://github.com/systemd/systemd/issues/4850#issuecomment-265698275

Fix systemd-logind NIS breakage

!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!

there is a delay in ssh logins for normal users. "ssh -v" shows the delay is after "pledge...". this fix removes the delay.

systemd developers think that we should not use NIS and made sure there are problems if we do. To give them credit, they do offer a workaround. Read this: https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67

mkdir /etc/systemd/system/systemd-logind.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-logind.service

Fix systemd-udevd NIS breakage

same problem as above, seen with udev getting stuck on Ubuntu LTS 20.04.

mkdir /etc/systemd/system/systemd-udevd.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-udevd.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-udevd.service

Configure USB device permissions

Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.

  • create file /etc/udev/rules.d/99-usb-chmod.rules with these contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}" 
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
  • reload udev rules: udevadm control --reload-rules
  • apply new permissions: udevadm trigger --action=add
  • watch udev activity: udevadm monitor -p

Configure lightdm display manager

  • enable it
echo lightdm | dpkg-reconfigure -fteletype lightdm
systemctl disable gdm
systemctl disable sddm
systemctl enable lightdm
  • make the MATE desktop as default
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
  • enable login by NIS users
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
  • restart lightdm
systemctl stop gdm
systemctl restart lightdm

Install libpng12.so.0

Quartus 16 needs libpng12:

wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb

Install google-chrome

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb

confirm autoupdate is enabled, observe dl.google.com is present in the list of repositories:

apt update
...
Get:5 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,094 B]
...

FOLLOWING IS OBSOLETE:

Instructions from here: https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable

wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-tmp.list'
apt update
apt install google-chrome-stable
/bin/rm -f /etc/apt/sources.list.d/google-tmp.list

Install amanda client

ONLY ON MACHINES THAT HOST HOME DIRECTORIES

  • apt install amanda-client
  • edit /etc/amandahosts
amanda.triumf.ca amanda amdump
  • check permissions on /etc/amandahosts:
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
  • fix if needed: chown backup:backup /etc/amandahosts; chmod a= /etc/amandahosts; chmod u=wr /etc/amandahosts
  • edit /etc/amanda-security.conf, add this line:
runtar:gnutar_path=/usr/bin/tar
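
The permission check on /etc/amandahosts can be scripted. A sketch using GNU stat; it checks only the mode bits, not the owner. The helper name has_mode is made up for this example.

```shell
# has_mode FILE MODE
# succeeds if FILE has exactly the octal permission MODE (GNU stat)
has_mode() {
    [ "$(stat -c %a "$1")" = "$2" ]
}

# example: has_mode /etc/amandahosts 600 || echo "fix permissions on /etc/amandahosts"
```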

On the amanda machine:

  • in amanda disklist, use dump type "bsdtcp-comp-user-tar"
  • su - amanda and run amcheck -c daily daq00
-bash-4.1$ amcheck -c daily daq00

Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.092 seconds.  0 problems found.

(brought to you by Amanda 3.3.7p1.git.685ff76d)

Enable rc.local

For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:

cd ~/git/scripts
git pull
cp -n -v etc/rc.local /etc/
chmod a+rx /etc/rc.local
cp etc/rc-local.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local

Remove unwanted packages

apt remove bash-completion # broken, adds unwanted "\" if "ls -l $ROOTSYS/<tab>"
apt remove zsys # broken, do not use
apt remove sddm # login manager
apt remove avahi-daemon avahi-autoipd # not sure what it does, observed using 100% CPU
apt remove modemmanager # probes all serial ports to see if it's a modem

Disable unwanted services

systemctl disable mpd
systemctl disable snapd
systemctl disable ModemManager
systemctl --global mask tracker-extract-3.service
systemctl --global mask tracker-miner-fs-3.service
systemctl daemon-reload

Disable sleep and suspend

note: we see some computers randomly shut down or go to sleep. log files indicate the "sleep" or "suspend" button was pushed by the user, but no such buttons actually exist. this is the fix for this:

systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service

Enable crontab @reboot for MIDAS

startup scripts have a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).

mkdir /etc/systemd/system/cron.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/cron.service.d/local.conf
systemctl daemon-reload
systemctl cat cron.service

Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".

Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.

Crontab entry to start midas: (install in the midas user crontab, not root crontab)

su - midasuser
crontab -l
#@reboot /bin/bash -l -c "/home/trinat/bin/start-daq-applications"
#@reboot /bin/tcsh -c "/home/trinat/bin/start-daq-applications"

Install apache httpd proxy for midas and elog

This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.

First, configure apache2:

  • execute these commands:
apt -y install apache2
cd /etc/apache2
  • create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
SSLSessionCache         shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout  300
SSLRandomSeed startup file:/dev/urandom  256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
  • create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
<IfModule mod_ssl.c>
    <VirtualHost *:443>
        ServerName daq14.triumf.ca
        DocumentRoot /var/www/html
        ErrorLog /var/log/apache2/daq14.log
        SSLEngine on
        # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
        SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
        SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
        ## use port specified in elogd.cfg
        #ProxyPass /elog/ http://localhost:8082/ retry=1 
        ## use mhttpd port
        #ProxyPass /      http://localhost:8080/ retry=1 
        Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
        <Location />
            SSLRequireSSL
            AuthType Basic
            AuthName "DAQ password protected site"
            Require valid-user
            # create password file: touch /etc/apache2/htpasswd
            # to add new user or change password: htpasswd /etc/apache2/htpasswd username
            AuthUserFile /etc/apache2/htpasswd
        </Location>
    </VirtualHost>
</IfModule>
  • stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
  • enable ssl module
  • enable new configurations
a2enmod ssl
a2enmod headers
a2enmod proxy
a2enmod proxy_http
a2enconf ssl-daq14
a2ensite daq14-ssl
  • disable default ssl sites
a2dissite 000-default-le-ssl
a2dissite 000-default
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
  • check that there are no syntax problems
apache2ctl configtest
  • enable and start apache2:
systemctl enable apache2
systemctl restart apache2
systemctl status apache2
  • apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
  • if it says "Failed to configure ... certificate", proceed to the step for setting certbot.
  • try to access https://daq14.triumf.ca
    • you should see a complaint about self-signed certificate
    • you should see a request for password (do not login yet)
    • if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.

Second, configure certbot:

(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)

(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)

(Note: unsurprisingly, this requires outside access to connect with letsencrypt, so won't work if PC is only accessible from on-site network)

  • check that port 80 is not used by anything:
  • netstat -an | grep LISTEN | grep ^tcp | grep 80
  • lsof -P | grep -i tcp | grep LISTEN | grep 80
  • if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment-out "Listen 80" in /etc/apache2/ports.conf)
  • install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
apt install certbot python3-certbot-apache
certbot certonly --standalone --installer apache
  • then answer questions:
  • "activate HTTPS for daq14.triumf.ca" - say ok
  • "enter email address" - enter your own email address
  • "please read terms..." - read the terms and say "agree"
  • it will take a few moments...
  • "congratulations..." - say ok.
certbot install --apache --cert-name daq14.triumf.ca
  • then answer questions:
  • "choose redirect..." - say "1" (no redirect)
  • look inside /etc/apache2/sites-enabled/daq14-ssl.conf to see that SSLCertificateFile & co point to certbot certificates in

/etc/letsencrypt/live/daq14.triumf.ca/

  • to check current renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
certbot renew --standalone --installer apache --force-renewal

NOTE: this certificate will expire in 3 months, automatic renewal should work with current version of certbot
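
note: the "grep 80" in the port 80 check above also matches ports such as 8080 and 8082 (mhttpd, elog). A sketch of a stricter filter that matches the port number exactly, assuming the usual 6-column "netstat -an" layout. The helper name port_listeners is made up for this example.

```shell
# port_listeners PORT
# reads "netstat -an" output on stdin and prints the TCP LISTEN lines
# whose local address ends in exactly :PORT
port_listeners() {
    awk -v port="$1" '$1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$")'
}

# example: netstat -an | port_listeners 80
```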

Third, activate password protection:

  • as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/apache2/htpasswd
htpasswd /etc/apache2/htpasswd midas
  • restart apache2
systemctl restart apache2
systemctl status apache2

From here:

  • enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
  • enable proxy for ELOG - ditto
a2enmod proxy
a2enmod proxy_http
apache2ctl configtest
systemctl restart apache2
SSL                  = 0

NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0

generate self-signed certificate

root@alphacpc05:~# openssl req  -nodes -new -x509  -keyout server.key -out server.cert -days 1001
...+....+..+..........+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+..+...+.........+......+.+...+...+.....+...............+.........+...+.+......+...+...........+....+...+..+......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*....+......+.+...+..+.......+..+...+.......+......+...+..+...+......+....+...............+..+...+....+...........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
......+......+.+..+......+.+......+.....+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.+.....+......+.+.........+......+.....+.+..+...+.......+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++*.......+....+......+.....+...+...+.......+..+.+........+.+...+......+..+..........+..+.+...........+...+.......+......+.....+.......+...+.........+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:CH
State or Province Name (full name) [Some-State]:Geneve
Locality Name (eg, city) []:CERN
Organization Name (eg, company) [Internet Widgits Pty Ltd]:CERN
Organizational Unit Name (eg, section) []:ALPHA experiment           
Common Name (e.g. server FQDN or YOUR name) []:alphacpc05.cern.ch
Email Address []:
root@alphacpc05:~# 
root@alphacpc05:~# 
root@alphacpc05:~# ls -l
-rw-r--r-- 1 root root 1375 juil. 10 21:43 server.cert
-rw------- 1 root root 1708 juil. 10 21:42 server.key
root@alphacpc05:~# systemctl restart apache2
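
The interactive questions above can probably be skipped by passing the same answers with the openssl "-subj" option. A sketch only: the helper name build_subj is invented here, and the field values are the ones typed in the session above.

```shell
# build_subj: hypothetical helper that assembles the value for "openssl req -subj"
# arguments: country state locality organization unit common-name
build_subj() {
    printf '/C=%s/ST=%s/L=%s/O=%s/OU=%s/CN=%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Usage (same answers as in the interactive session above):
# openssl req -nodes -new -x509 -keyout server.key -out server.cert -days 1001 \
#     -subj "$(build_subj CH Geneve CERN CERN 'ALPHA experiment' alphacpc05.cern.ch)"
# openssl x509 -in server.cert -noout -subject -dates   # inspect the result
```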

Enable elog PDF preview

see https://stackoverflow.com/questions/52998331/imagemagick-security-policy-pdf-blocking-conversion

  • xemacs -nw /etc/ImageMagick-6/policy.xml
  • remove this section at the end:
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
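
To check whether the PDF rule is still present after editing, something like this could be used. It is a rough sketch: the function name pdf_blocked is hypothetical, and the check is a plain text match, so it does not notice if the rule was commented out instead of removed.

```shell
# pdf_blocked: hypothetical check, returns 0 if the given policy file
# still contains a line denying the PDF coder (rights="none" pattern="PDF")
pdf_blocked() {
    grep 'pattern="PDF"' "$1" | grep -q 'rights="none"'
}

# Usage:
# pdf_blocked /etc/ImageMagick-6/policy.xml && echo "PDF conversion still blocked"
```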

Install Jupyter notebook

From https://jupyter.org/install
apt install python3-pip
pip install jupyterlab
pip install notebook
~/.local/bin/jupyter notebook
note the http://localhost:8888 URL that it prints
say "no" to the offer to start firefox (it will not work!)
URL is: http://localhost:8888/tree?token=xxx
on the machine where you are running the web browser (e.g. google-chrome), open a new shell and run this (replace trinat@trinatdaq with the username and machine name where you started jupyter):
ssh -v trinat@trinatdaq -L 8888:localhost:8888
in the web browser, open http://localhost:8888
this gives us the login page
in the password or token entry field, put the token from the "tree?token=xxx" above (printed by jupyter on startup)
push button "login"
jupyter page should open with the list of files in the trinat home directory
congratulate Brian with full success
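
The token needed on the login page is the last part of the URL printed by jupyter. A small sketch for pulling it out of the URL; the function name token_from_url is made up for illustration.

```shell
# token_from_url: hypothetical helper, prints the token part of the URL
# that jupyter prints on startup; returns 1 if the URL has no token
token_from_url() {
    case $1 in
        *token=*) t=${1##*token=}; echo "${t%%&*}" ;;   # cut before any further "&" parameter
        *) return 1 ;;
    esac
}

# Usage:
# token_from_url "http://localhost:8888/tree?token=xxx"
```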

Install ZFS quota report

If there are any ZFS volumes, install script to report disk and quota usage

cd ~/git/scripts/quotareport
git pull
mkdir /var/www/html/zfsquotareport
cp -pv ~/git/scripts/quotareport/sorttable.js /var/www/html/zfsquotareport/
ln -s $PWD/zfsquotareport.perl /etc/cron.daily/
touch /etc/crontab

If httpd is configured to redirect "/" to MIDAS mhttpd:

  • add following to /etc/apache2/sites-enabled/xxx-ssl.conf in front of "ProxyPass / ..."
  • run "systemctl reload apache2"
## do not proxy zfs quota report directory 
ProxyPass /zfsquotareport/ ! 

Install PHP

  • apt install php libapache2-mod-php
  • systemctl restart apache2
  • create /var/www/html/info.php
<?php 
 
phpinfo(); 

Configure TRIUMF printers

systemctl stop cups
systemctl disable cups
systemctl stop snap.cups.cupsd.service
systemctl stop snap.cups.cups-browsed.service
systemctl disable snap.cups.cupsd.service
systemctl disable snap.cups.cups-browsed.service
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a

Enable core dumps

By default, Ubuntu LTS 20.04 installs the apport package, which disables core dumps from user applications (google it up!). It is not meant to do this, and the documentation claims that it is neither installed nor enabled by default. Oh, well...

apt remove apport
apt autoremove ### will remove apport-symptoms and a few other packages

After this, core dumps are written to file "core" in the current directory. See /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid.

To include the process id in core dump file names, add the following to /etc/rc.local

echo 1 > /proc/sys/kernel/core_uses_pid

Enable debugger

By default, Ubuntu LTS 20.04 does not permit a debugger to attach to already running programs. To enable it, add the following to /etc/rc.local

echo 0 > /proc/sys/kernel/yama/ptrace_scope
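
To see what the current setting means, the value can be read back and decoded. A sketch: the function name ptrace_scope_meaning is invented, and the descriptions paraphrase the kernel Yama documentation.

```shell
# ptrace_scope_meaning: hypothetical helper describing a Yama ptrace_scope value
ptrace_scope_meaning() {
    case $1 in
        0) echo "classic: any process of the same user can be attached" ;;
        1) echo "restricted: only descendants can be attached" ;;
        2) echo "admin-only: attaching requires CAP_SYS_PTRACE" ;;
        3) echo "no attach: ptrace attach is disabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Usage:
# ptrace_scope_meaning "$(cat /proc/sys/kernel/yama/ptrace_scope)"
```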

Disable Ubuntu Pro nag

best I can tell, impossible at this time.

do not do this

!!! does nothing !!!

pro config set apt_news=false

do not do this

!!! breaks automatic updates because 20apt-esm-hook.conf is missing !!!

If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:

/bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf

do not do this

!!! likely same as above, breaks automatic updates !!!

  • comment out /etc/apt/apt.conf.d/20apt-esm-hook.conf

do not do this

!!! removes too many packages !!!

apt remove ubuntu-pro-client

Update packages

  • apt-get update # update package list
  • apt-get dist-upgrade # install updated packages and update "kept back" packages
  • apt-get autoremove # remove packages that apt thinks should be removed

Finish installation

Congratulations. There is nothing more to do!

  • reboot
shutdown -r now

Update to new version of Ubuntu

  • run "do-release-upgrade -c"
  • if it does not report the new release (Ubuntu 24), check that /etc/update-manager/release-upgrades has "Prompt=lts"
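
The Prompt setting can be read without opening an editor. A minimal sketch: the function name release_prompt is hypothetical, and it assumes the usual "Prompt=value" layout of the file.

```shell
# release_prompt: hypothetical helper, prints the active Prompt= value
# $1 = file to read (defaults to /etc/update-manager/release-upgrades)
release_prompt() {
    sed -n 's/^[[:space:]]*Prompt[[:space:]]*=[[:space:]]*//p' \
        "${1:-/etc/update-manager/release-upgrades}" | tail -n 1
}

# Usage:
# [ "$(release_prompt)" = "lts" ] || echo "Prompt is not set to lts"
```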

Update Ubuntu LTS 20.04 to LTS 22.04

apt remove zsys

daqubuntu

# reboot to clear out all updates
# vi /etc/update-manager/release-upgrades # set "Prompt=normal"
# do-release-upgrade -c
Checking for a new Ubuntu release
New release '22.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
# do-release-upgrade
...
say yes...
...
login.defs, say "Y" (erase local changes, use packaged version)
/etc/systemd/resolved.conf, say "Y" (same as above)
firefox snap, say yes
unable to reach snap store, say "skip"
/etc/gmond.conf, say "Y"
/var/yp/Makefile, say "install the package maintainer's version"
/etc/ypserv.conf, same thing
/etc/ypserv.securenets, same thing
/etc/default/nis, same thing
/etc/speech-dispatcher/modules/mary-generic.conf, same thing
/etc/apt/apt.conf.d/50unattended-upgrades, same thing
...
278 packages are going to be removed, say yes
...
restart required, say yes
...
no ping... yes ping...
...
ssh daqubuntu, ok
apt update, fail, DNS does not work, "host security.ubuntu.com" does not resolve.
fix resolver per https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#Disable_NetworkManager
apt update, apt upgrade now works, 0 packages to update
NIS does not work.

midm9a

login.defs
firefox snap
gmond.conf
ypserv
/etc/default/nis
unattended-upgrades
amanda-security.conf
remove obsolete (no)
reboot
configure dns
reenable nis

daq17

firefox snap
imagemagick policy.xml
gmond.conf
chrony.conf
/var/yp/Makefile
ypserv.conf
ypserv.securenets
/etc/default/nis
50unattended-upgrades

daq00

per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/

do-release-upgrade -f DistUpgradeViewNonInteractive

if it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. In the case of daq00, it did not like how the EFI partitions were mounted. After fixing that, the non-interactive upgrade was successful.

isdaq08

  • prepare
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
root@isdaq08:~# debsums -ce
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/apt/apt.conf.d/10periodic
root@isdaq08:~# 
  • restore original /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1"; 
APT::Periodic::Download-Upgradeable-Packages "0"; 
APT::Periodic::AutocleanInterval "0"; 
  • apt remove ganglia-monitor
  • apt remove nis
  • "debsums -ce" is now empty
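
The "debsums -ce" check above mixes changed files with messages like "debsums: missing file ...". A small filter, as a sketch, keeps only the paths of the changed config files; the function name modified_conffiles is made up for this example.

```shell
# modified_conffiles: hypothetical filter, reads "debsums -ce" output on stdin
# and prints only the changed config files (lines starting with "/"),
# dropping messages such as "debsums: missing file ..."
modified_conffiles() {
    grep '^/' || true    # no matches is not an error here
}

# Usage:
# debsums -ce 2>&1 | modified_conffiles
```

An empty result means there are no modified config files left to deal with.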

Run the upgrade:

  • do-release-upgrade -f DistUpgradeViewNonInteractive

Post upgrade:

  • configure DNS
  • apt -y install linux-generic-hwe-22.04
  • /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
  • /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
  • install missing packages
  • restore ganglia
  • restore nis
  • check zpool status, may need zpool upgrade
  • reboot

upgrade U-22 to U-24

daqubuntu, U-24

  • prepare
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • check for modified config files that make upgrade unhappy, deal with all files reported by debsums.
root@daqubuntu:~# debsums -ce
/etc/ganglia/gmond.conf
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
/etc/ypserv.conf
/etc/ypserv.securenets
/var/yp/Makefile
/etc/update-manager/release-upgrades
/etc/apt/apt.conf.d/10periodic
/etc/yp.conf
root@daqubuntu:~# 
  • restore original /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1"; 
APT::Periodic::Download-Upgradeable-Packages "0"; 
APT::Periodic::AutocleanInterval "0"; 
  • apt remove ganglia-monitor
  • apt remove nis
  • apt autoremove
  • restore original release-upgrades: "Prompt=lts"
  • "debsums -ce" is now empty

Check for upgrade:

root@daqubuntu:~# do-release-upgrade -c
Checking for a new Ubuntu release
There is no development version of an LTS available.
To upgrade to the latest non-LTS development release 
set Prompt=normal in /etc/update-manager/release-upgrades.
root@daqubuntu:~# 

Run the upgrade:

  • do-release-upgrade -f DistUpgradeViewNonInteractive

Post upgrade:

  • configure DNS
  • apt -y install linux-generic-hwe-22.04
  • /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
  • /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
  • install missing packages
  • restore ganglia
  • restore nis
  • check zpool status, may need zpool upgrade
  • reboot

daq14, U-20-22-24

  • apt update, apt upgrade
  • apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 ### install kernel 5.15
  • shutdown -r now
  • stuck waiting for daq14 to shutdown...
  • reboot into kernel 5.15
  • ???
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
  • debsums -ce
/etc/apache2/ports.conf
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/sudoers
  • apache2 restore original ports.conf, uncomment "Listen 80"
  • cp -pv /etc/dnsmasq.conf.dpkg-dist /etc/dnsmasq.conf
  • apt remove ganglia-monitor
  • edit /etc/yp.conf, remove everything after "# ypserver ypserver.network.com"
  • "debsums -ce" is now empty
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • runs for a long time
  • stuck on "/etc/default/nis", type "Y", press enter, nothing for a bit, then resumes running
  • finished
  • configure DNS
  • reboot
  • have kernel 6.8
  • apt update; apt upgrade
  • apt upgrade guile-2.2-libs ### would not auto-update, "kept back", has to be done by hand
  • apt autoremove
  • debsums -ce
debsums: missing file /etc/init.d/nis (from nis package)
/etc/default/nis
  • diff /etc/default/nis.dpkg-dist /etc/default/nis
  • cp -pv /etc/default/nis.dpkg-dist /etc/default/nis
  • debsums -ce
debsums: missing file /etc/init.d/nis (from nis package)
  • we ignore this and run the update
  • do-release-upgrade -c
Checking for a new Ubuntu release
New release '24.04.1 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • bombs out without any error messages
  • in /var/log/dist-upgrade/main.log reports "Failed to find a replacement for xapp" and other packages
  • apt remove xapp usrmerge ureadahead thunderbird-gnome-support
  • no go, complains about even more packages.
  • apt list | grep installed | grep -v jammy ### show packages installed from non-ubuntu sources
  • remove all packages marked "install,local" ### ubuntu updater does not know where they came from and so cannot update them.
  • apt remove desktop-base ### not happy about this package in /var/log/dist-upgrade/apt.log
  • apt autoremove
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • running for a long time...
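
The step above that looks for packages marked "installed,local" can be narrowed to just the package names. A sketch, assuming the usual "name/suite version arch [status]" layout of "apt list" output; the function name local_packages is hypothetical.

```shell
# local_packages: hypothetical filter, reads "apt list --installed" output
# on stdin and prints the names of packages marked [installed,local]
local_packages() {
    awk -F/ '/\[installed,local\]/ { print $1 }'
}

# Usage:
# apt list --installed 2>/dev/null | local_packages
```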

alpha04 U-20-24

  • apt update, apt upgrade, apt autoremove
  • reboot into latest kernel (already done)
  • debsums -ce
root@alpha04:~# debsums -ce
/etc/dnsmasq.conf
/etc/ganglia/gmond.conf
/etc/default/nis
/etc/yp.conf
root@alpha04:~# 
  • move /etc/dnsmasq.conf to /etc/dnsmasq.d/alpha04.conf
  • apt remove dnsmasq
  • apt remove ganglia-monitor
  • apt remove nis
  • apt autoremove
  • debsums -ce ### is now empty
  • do-release-upgrade -f DistUpgradeViewNonInteractive
  • it runs for a long time...
  • complained about /etc/fwupd config files, not sure why...
  • finished
  • apt update, apt upgrade, apt autoremove
  • restore dnsmasq: apt install dnsmasq, systemctl status dnsmasq
  • restore ganglia, per instructions
  • restore NIS: apt -y install rpcbind nis, ypwhich, ypwhich -m
  • zpool upgrade rpool ### also upgrade any other zfs pools, see zpool status
  • remove unwanted packages, per instructions
  • run gonodeinfo
  • reboot
  • done

Upgrade to new version of Debian

https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html

32-bit VME processor Debian 11 to 12

  • cd git/scripts; git pull; cd ~
  • apt update
  • apt upgrade
  • edit /etc/apt/sources.list
deb http://deb.debian.org/debian/ bookworm main
#deb http://deb.debian.org/debian/ bullseye main
#deb-src http://deb.debian.org/debian/ bullseye main
  • apt update
  • apt upgrade --without-new-pkgs
  • apt full-upgrade
  • apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
  • reboot

Ubuntu package manager

  • apt-get install xxx # install package xxx
  • apt-get update
  • apt-get upgrade
  • apt-get dist-upgrade
  • apt-get autoremove # remove automatically installed packages required by a removed package
  • apt-get remove xxx # remove package xxx
  • apt-cache search . # list all available packages
  • apt-cache show "." | grep ^Package # list all available packages
  • apt-cache madison root-system # show all available versions of package root-system
  • apt list # list all known packages (installed and available)
  • dpkg --listfiles libpng16-16 # list all files from this package
  • apt list --installed # list all installed packages
  • dpkg -S /bin/bash # what package provides this file?
  • dpkg -L bash # what files provided by this package?
  • debsums -ce # show modified config files
  • apt-config dump # show apt configuration

Ubuntu zsys

NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230

  • manual removal of old snapshots
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
  • apt remove zsys

NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots

update-grub # list of all snapshots, errors if some snapshots are broken
zsysctl state remove lnc0k7 --system # remove snapshot
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
zsysctl show # show snapshots

Ubuntu cloning

to clone a ubuntu image:

cd /nfsroot/lxcpet
emacs -nw etc/hostname ### change hostname
emacs -nw etc/mailname ### change hostname (debian 11)
emacs -nw etc/defaultdomain ### change the NIS domainname
emacs -nw etc/yp.conf ### change the NIS server
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
emacs -nw root/.ssh/authorized_keys ### update root ssh keys

Ubuntu boot loader

maintenance commands

  • update-initramfs -v -u
  • grub-install /dev/sda

Convert from single to dual mirrored ZFS SSD

Assuming Ubuntu LTS 22.04 with the "install on ZFS" option, we will add a second SSD, configure ZFS to use both SSDs in a mirrored configuration and set up grub to boot from either SSD. This is intended to create a fully redundant system where failure of either SSD does not break the system.

  • identify first SSD
root@midm9b:~# ./smart-status.perl 
        Disk                    model               serial     temperature  realloc  pending   uncorr  CRC err     RRER Errors     Link
    /dev/sda  WD Blue SA510 2.5 250GB         22243Z803769              24        .        ?        ?        .        ?        .      6.0
root@midm9b:~# 
  • connect second SSD of identical size
root@midm9b:~# ./smart-status.perl 
        Disk                    model               serial     temperature  realloc  pending   uncorr  CRC err     RRER   Errors     Link
    /dev/sda  WD Blue SA510 2.5 250GB         22243Z803769              24        .        ?        ?        .        ?        .      6.0
    /dev/sdb  WD Blue SA510 2.5 250GB         22243Z803852              25        .        ?        ?        .        ?        .      6.0
root@midm9b:~# 
  • if second SSD is not autodetected, reboot
  • Clone partition table automatically

If both SSDs are identical size, use this simpler method of duplicating the partition table:

root@midm9b:~# sfdisk -d /dev/sda > part_table
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb

The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
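
The effect of the grep and sed filter can be checked before anything is written to the second disk. A sketch using the same two commands wrapped in a function; the name strip_ids is made up here.

```shell
# strip_ids: hypothetical wrapper around the grep and sed used above,
# reads an "sfdisk -d" dump on stdin, drops the disk label-id line
# and removes the per-partition uuid= entries
strip_ids() {
    grep -v '^label-id' | sed -e 's/, *uuid=[0-9A-F-]*//'
}

# Usage: review the filtered table first, nothing is written to disk
# sfdisk -d /dev/sda | strip_ids
```

With the IDs removed, sfdisk generates new ones for the second disk, so the two SSDs do not end up with identical identifiers.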

  • Clone partition table manually (e.g. for different size disks)
  • list partition table of first SSD:
root@midm9b:~# fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624   5244927   4194304     2G Linux swap
/dev/sda3  5244928   9439231   4194304     2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~# 
  • create identical partitions on second SSD, use sector numbers from above.
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): n
Partition number (1-128, default 1): 
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}: 
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'

Command (? for help): n
Partition number (2-128, default 2): 
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}: 
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'

Command (? for help): n
Partition number (3-128, default 3): 
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}: 
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'

Command (? for help): n
Partition number (4-128, default 4): 
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}: 
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}: 
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): bf00
Changed type of partition to 'Solaris root'

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdb.
The operation has completed successfully.
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E

Device       Start       End   Sectors   Size Type
/dev/sda1     2048   1050623   1048576   512M EFI System
/dev/sda2  1050624   5244927   4194304     2G Linux swap
/dev/sda3  5244928   9439231   4194304     2G Solaris boot
/dev/sda4  9439232 488397134 478957903 228.4G Solaris root


Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603

Device       Start       End   Sectors   Size Type
/dev/sdb1     2048   1050623   1048576   512M EFI System
/dev/sdb2  1050624   5244927   4194304     2G Linux swap
/dev/sdb3  5244928   9439231   4194304     2G Solaris boot
/dev/sdb4  9439232 488397134 478957903 228.4G Solaris root
root@midm9b:~# 
  • identify second SSD partitions
root@midm9b:~# ls -l /dev/disk/by-id/ata*part3
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 -> ../../sdb3
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
  • convert bpool from single disk to mirrored disk:
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
config:

	NAME                                    STATE     READ WRITE CKSUM
	bpool                                   ONLINE       0     0     0
	  99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
config:

	NAME                                    STATE     READ WRITE CKSUM
	rpool                                   ONLINE       0     0     0
	  f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE       0     0     0

errors: No known data errors
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
root@midm9b:~# zpool status bpool
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	bpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE       0     0     0

errors: No known data errors
  • convert rpool
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
root@midm9b:~# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
	5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
	0B resilvered, 0.03% done, no estimated completion time
config:

	NAME                                                STATE     READ WRITE CKSUM
	rpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE       0     0     0

errors: No known data errors
root@midm9b:~# 
  • wait for resilver to complete
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	bpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:

	NAME                                                STATE     READ WRITE CKSUM
	rpool                                               ONLINE       0     0     0
	  mirror-0                                          ONLINE       0     0     0
	    f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE       0     0     0
	    ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE       0     0     0

errors: No known data errors
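
Instead of re-running "zpool status" by hand while waiting for the resilver, the wait can be scripted. A minimal sketch that looks for the "resilver in progress" text shown in the output above; the function name resilver_in_progress is hypothetical.

```shell
# resilver_in_progress: hypothetical check, reads "zpool status" output
# on stdin and returns 0 while a resilver is still running
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# Usage: poll every 10 seconds until all pools are done
# while zpool status | resilver_in_progress; do sleep 10; done
```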
  • enable booting from second SSD: (instead of /dev/sda1, /dev/sdb1, use UUID=xxx)
root@midm9b:~# mkfs.msdos /dev/sdb1
root@midm9b:~# mkdir /boot/efi-sda
root@midm9b:~# mkdir /boot/efi-sdb
root@midm20c:~# blkid | grep vfat ### identify UUID
/dev/sdb1: UUID="DD89-5081" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="d0cb6be4-2f67-5b42-9b26-9e6905e9f774"
/dev/sdc1: UUID="D970-86BA" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="e6d3b5b9-a512-44a2-9205-1a4db06ed2a2"
/dev/sda1: UUID="DDA1-044C" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="6dc9dff0-1c13-8045-a906-7803d3074c70"
root@midm20c:~# cat /etc/fstab | grep vfat ### add mount points with correct UUID
#UUID=D970-86BA  /boot/efi       vfat    umask=0022,fmask=0022,dmask=0022      0       1
UUID=DDA1-044C  /boot/efi-sda       vfat    umask=0022,fmask=0022,dmask=0022      0       1
UUID=DD89-5081  /boot/efi-sdb       vfat    umask=0022,fmask=0022,dmask=0022      0       1
root@midm9b:~# mount -a
root@midm9b:~# df -kl
Filesystem                                       1K-blocks    Used Available Use% Mounted on
...
/dev/sda1                                           523244   13720    509524   3% /boot/efi
/dev/sdb1                                           523244       4    523240   1% /boot/efi-sdb
...
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
sending incremental file list
EFI/
...
root@midm9b:~# ls -l /boot/efi-sda
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# 
  • setup script to update grub on second SSD, it must be run manually after every kernel update
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
root@midm9b:~# ~/update_efi_grub.perl -u
EFI dir: /boot/efi-sda
/boot/efi-sda: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
building file list ... done

sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sda: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sda/EFI
building file list ... done

sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync  -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done

sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sdb: update efi:  rsync  -av --delete-after --modify-window=2 /boot/efi/EFI/  /boot/efi-sdb/EFI
building file list ... done

sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
root@midm9b:~# 

Disable NetworkManager

NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04

NetworkManager is useful for configuring dynamic network interfaces, e.g. on laptops that often move between networks or connect to a choice of several wifi networks.

For machines with statically configured network interfaces, NetworkManager is not necessary.

As it has been observed to become confused and to malfunction when network links go up and down (it keeps unnecessarily reconfiguring the IP address, etc), it can be useful to disable it.

  • list all network interfaces
# /bin/ls -1 /sys/class/net/
enp0s31f6
lo
  • edit /etc/network/interfaces:
rename enp0s31f6=eth0
auto eth0
iface eth0 inet static
   address 142.90.120.94/19
   gateway 142.90.100.18
  • statically configure systemd-resolved
    • create /etc/systemd/resolved.conf.d/resolved.conf with these contents:
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
    • systemctl restart systemd-resolved
    • resolvectl
    • systemd-analyze cat-config systemd/resolved.conf
  • disable NetworkManager
systemctl disable NetworkManager
  • reboot

Configure ECC memory

Configure EDAC

  • apt install edac-utils rasdaemon

Intel i3-2120

root@musr00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
root@musr00:~# edac-ctl --status
edac-ctl: drivers not loaded.

Intel E-2236

root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq00:~# edac-util 
edac-util: No errors to report.
root@daq00:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
  • check edac sysfs files (Intel)
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~# 
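
The ce_count and ue_count files listed above hold the corrected and uncorrected error counters, so they can be read directly. A sketch that prints them for every memory controller; the function name edac_counts is made up, and the optional directory argument exists only to make it easy to try on a test directory.

```shell
# edac_counts: hypothetical helper, prints corrected (ce) and uncorrected (ue)
# error counts for each memory controller
# $1 = EDAC directory (defaults to /sys/devices/system/edac/mc)
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    for mc in "$base"/mc*; do
        [ -r "$mc/ce_count" ] || continue    # skip if no controller is present
        echo "$(basename "$mc") ce=$(cat "$mc/ce_count") ue=$(cat "$mc/ue_count")"
    done
}

# Usage:
# edac_counts
```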

Intel E3-1270 v6

root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~# 

Intel E3-1245 v6

[root@alphagdaq ~]# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
[root@alphagdaq ~]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@alphagdaq ~]# edac-util
edac-util: No errors to report.
[root@alphagdaq ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
[root@alphagdaq ~]# ras-mc-ctl --layout
          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
----------+-----------------------------------------------+
[root@alphagdaq ~]# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#3channel#0	0	0
mc#0csrow#2channel#1	0	0
mc#0csrow#3channel#1	0	0
mc#0csrow#0channel#0	0	0
mc#0csrow#1channel#1	0	0
mc#0csrow#0channel#1	0	0
mc#0csrow#1channel#0	0	0
mc#0csrow#2channel#0	0	0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
[root@alphagdaq ~]# ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
[root@alphagdaq ~]# 

AMD 3700X

(memory is non-ECC)

root@daq13:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@daq13:~# 
root@daq13:~# 
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util 
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent

(memory is ECC)

root@trinatdaq:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@trinatdaq:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@trinatdaq:~# edac-util 
edac-util: No errors to report.
root@trinatdaq:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
root@trinatdaq:~# 

AMD 5000G

  • no linux driver for AMD 5000-series "G" CPU
  • no mention of ECC in the BIOS settings
  • unclear status of ECC support in AMD documentation (says only "pro" "G" CPUs have ECC)
  • unclear status of ECC support in ASUS documentation (web page out of date)
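
When the EDAC driver and the vendor documentation give no answer, the DMI table is one more place to look. The sketch below is an assumption-laden helper, not something from the original notes: ecc_type is a hypothetical name, and it only extracts the "Error Correction Type" field that `dmidecode -t memory` prints for the memory array. That field reflects what the firmware reports and may not prove that ECC is actually working.

```shell
# Read `dmidecode -t memory` output on stdin and print the value of each
# "Error Correction Type:" field (for example "None" or "Multi-bit ECC").
ecc_type() {
    sed -n 's/^[[:space:]]*Error Correction Type:[[:space:]]*//p'
}
```

Typical use as root: `dmidecode -t memory | ecc_type`.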

AMD 5600X

root@daq17:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
root@daq17:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq17:~# edac-util
edac-util: No errors to report.
root@daq17:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@daq17:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
root@daq17:~# 

AMD 3955WX

root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
edac-ctl: drivers are loaded.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util 
edac-util: No errors to report.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
lrwxrwxrwx  1 root root    0 Dez  9 05:31 subsystem -> ../../../../bus/edac
-rw-r--r--  1 root root 4096 Dez  9 05:31 uevent
root@alphasuperdaq:~/git/scripts/quotareport# 
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
root@alphasuperdaq:~# 
root@alphasuperdaq:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                              mc0                                                                                              |
    |                                            csrow0                                             |                                            csrow1                                             |
    | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  | channel0  | channel1  | channel2  | channel3  | channel4  | channel5  | channel6  | channel7  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

0: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@alphasuperdaq:~# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#0channel#2	0	0
mc#0csrow#1channel#7	0	0
mc#0csrow#0channel#3	0	0
mc#0csrow#1channel#4	0	0
mc#0csrow#1channel#2	0	0
mc#0csrow#0channel#7	0	0
mc#0csrow#1channel#3	0	0
mc#0csrow#0channel#4	0	0
mc#0csrow#1channel#1	0	0
mc#0csrow#1channel#0	0	0
mc#0csrow#1channel#5	0	0
mc#0csrow#0channel#6	0	0
mc#0csrow#0channel#1	0	0
mc#0csrow#0channel#5	0	0
mc#0csrow#0channel#0	0	0
mc#0csrow#1channel#6	0	0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@alphasuperdaq:~#

AMD 7700X

root@dsfe05:~# apt install edac-utils
root@dsfe05:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro H13SAE-MF
root@dsfe05:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@dsfe05:~# edac-util
edac-util: No errors to report.
root@dsfe05:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@dsfe05:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 May 14 09:33 ce_count
-r--r--r-- 1 root root 4096 May 14 09:33 ce_noinfo_count
-r--r--r-- 1 root root 4096 May 14 09:33 max_location
-r--r--r-- 1 root root 4096 May 14 09:33 mc_name
drwxr-xr-x 2 root root    0 May 14 09:33 power
drwxr-xr-x 3 root root    0 May 14 09:33 rank4
drwxr-xr-x 3 root root    0 May 14 09:33 rank5
--w------- 1 root root 4096 May 14 09:33 reset_counters
-r--r--r-- 1 root root 4096 May 14 09:33 seconds_since_reset
-r--r--r-- 1 root root 4096 May 14 09:33 size_mb
-r--r--r-- 1 root root 4096 May 14 09:33 ue_count
-r--r--r-- 1 root root 4096 May 14 09:33 ue_noinfo_count
-rw-r--r-- 1 root root 4096 May 14 09:33 uevent
root@dsfe05:~# 

Configure rasdaemon

apt install rasdaemon
systemctl enable rasdaemon
systemctl restart rasdaemon
systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
   Main PID: 2477175 (rasdaemon)
      Tasks: 1 (limit: 76958)
     Memory: 17.1M
     CGroup: /system.slice/rasdaemon.service
             └─2477175 /usr/sbin/rasdaemon -f -r

Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events

Get reports

  • Intel 2x32GB ECC DIMMs
root@daq00:~# ras-mc-ctl --layout
          +-------------------------+
          |           mc0           |
          |   csrow0   |   csrow1   |
----------+-------------------------+
channel1: |  16384 MB  |  16384 MB  |
channel0: |  16384 MB  |  16384 MB  |
----------+-------------------------+
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#1channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#0channel#1    0       0
root@daq00:~# 
  • Intel 4x16GB ECC DIMMs
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#0channel#1    0       0
mc#0csrow#2channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#2channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#1channel#1    0       0
mc#0csrow#3channel#0    0       0
mc#0csrow#3channel#1    0       0
root@daq00:~# 
root@daq00:~# ras-mc-ctl --layout
          +-----------------------+
          |          mc0          |
          |  csrow0   |  csrow1   |
----------+-----------------------+
channel1: |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |
----------+-----------------------+
root@daq00:~# 
root@daq00:~# 
root@daq00:~# 
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@daq00:~# 

note: on Ubuntu LTS 22.04 the DBD::SQLite::db error does not appear.

  • AMD 7700 2x32GB DDR5 ECC DIMMs
root@dsfe05:~# systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2024-05-14 09:36:43 PDT; 33ms ago
    Process: 4088418 ExecStartPost=/usr/sbin/rasdaemon --enable (code=exited, status=0/SUCCESS)
   Main PID: 4088417 (rasdaemon)
      Tasks: 1 (limit: 37300)
     Memory: 788.0K
        CPU: 5ms
     CGroup: /system.slice/rasdaemon.service
             └─4088417 /usr/sbin/rasdaemon -f -r

May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:aer_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:aer_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: mce:mce_record event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event mce:mce_record
May 14 09:36:43 dsfe05 rasdaemon[4088417]: ras:extlog_mem_event event enabled
May 14 09:36:43 dsfe05 rasdaemon[4088417]: Enabled event ras:extlog_mem_event
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mc_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording aer_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording extlog_event events
May 14 09:36:43 dsfe05 rasdaemon[4088417]: rasdaemon: Recording mce_record events
root@dsfe05:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 907.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 908.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 911.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 830.
    +-----------------------------------------------------------------------------------------------+
    |                                              mc0                                              |
    |        csrow0         |        csrow1         |        csrow2         |        csrow3         |
    | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  | channel0  | channel1  |
----+-----------------------------------------------------------------------------------------------+

0: |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |     0 MB  |
----+-----------------------------------------------------------------------------------------------+
root@dsfe05:~# ras-mc-ctl --error-count
Label               	CE	UE
mc#0csrow#2channel#1	0	0
mc#0csrow#2channel#0	0	0
root@dsfe05:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model H13SAE-MF
root@dsfe05:~# ras-mc-ctl --summary
No Memory errors.

No PCIe AER errors.

No Extlog errors.

No MCE errors.
root@dsfe05:~# 
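
For periodic checks, the per-DIMM table from `ras-mc-ctl --error-count` can be reduced to two totals. This is a sketch under assumptions: ras_error_totals is a hypothetical name, and it relies on the layout shown above (a header line, then one "mc#..." label per row with CE and UE as the last two columns).

```shell
# Sum the CE and UE columns of `ras-mc-ctl --error-count` output read on
# stdin. Prints "CE=<n> UE=<n>"; the exit status is 1 if any error was counted.
ras_error_totals() {
    awk '
        $1 ~ /^mc#/ { ce += $(NF - 1); ue += $NF }
        END {
            printf "CE=%d UE=%d\n", ce, ue
            exit ((ce + ue) > 0)
        }'
}
```

Typical use: `ras-mc-ctl --error-count | ras_error_totals`. The non-zero exit status makes it easy to call from a cron job or monitoring script.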

sensors

ASUS P7P55D EVO

  • BIOS version 2101
root@iris01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +34.0°C  (high = +83.0°C, crit = +99.0°C)
Core 1:       +37.0°C  (high = +83.0°C, crit = +99.0°C)
Core 2:       +38.0°C  (high = +83.0°C, crit = +99.0°C)
Core 3:       +35.0°C  (high = +83.0°C, crit = +99.0°C)

nouveau-pci-0100
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.05 V)
temp1:        +46.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

atk0110-acpi-0
Adapter: ACPI interface
Vcore Voltage:      864.00 mV (min =  +0.80 V, max =  +1.60 V)
+3.3V Voltage:        3.38 V  (min =  +2.97 V, max =  +3.63 V)
+5V Voltage:          5.04 V  (min =  +4.50 V, max =  +5.50 V)
+12V Voltage:        12.15 V  (min = +10.20 V, max = +13.80 V)
CPU Fan Speed:       968 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis1 Fan Speed: 1288 RPM  (min =  600 RPM, max = 7200 RPM)
Chassis2 Fan Speed: 1316 RPM  (min =  600 RPM, max = 7200 RPM)
Power Fan Speed:       0 RPM  (min =    0 RPM, max = 7200 RPM)
CPU Temperature:     +34.0°C  (high = +45.0°C, crit = +45.5°C)
MB Temperature:      +30.0°C  (high = +45.0°C, crit = +46.0°C)

root@iris01:~# 

ASUS Z170-DELUXE

  • BIOS version 3801
  • load sensors drivers
echo modprobe coretemp >> /etc/rc.local
echo modprobe jc42 >> /etc/rc.local
echo modprobe lm92 >> /etc/rc.local
echo modprobe nct6775 >> /etc/rc.local
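  • in /etc/default/grub, add acpi_enforce_resources=no to the kernel options: GRUB_CMDLINE_LINUX_DEFAULT="acpi_enforce_resources=no" (do not edit /boot/grub/grub.cfg by hand, it is regenerated in the next step)
  • regenerate the grub config, then reboot: grub-mkconfig -o /boot/grub/grub.cfg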
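
The kernel option can be added without opening an editor. GRUB_CMDLINE_LINUX_DEFAULT is read from /etc/default/grub when grub-mkconfig runs. The helper below is only a sketch: grub_add_param is a hypothetical name, it assumes GNU sed (as shipped with Ubuntu), a double-quoted value on a single line, and a parameter that contains no "|", "&" or backslash characters. If the variable is missing or quoted differently, the file is left unchanged.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is
# already there.
# $1: parameter, $2: file (assumption: defaults to /etc/default/grub)
grub_add_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0    # already present, leave the file alone
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $param\"|" "$file"
}
```

Typical use as root: `grub_add_param acpi_enforce_resources=no`, followed by the grub-mkconfig command and a reboot.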
root@iris00:~# sensors
nct6793-isa-0290
Adapter: ISA adapter
in0:                      600.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      144.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                        0.00 V  (min =  +0.00 V, max =  +0.00 V)
in7:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     600.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     592.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     968.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1370 RPM  (min =    0 RPM)
fan2:                     1437 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +32.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +42.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                  -128.0°C    sensor = thermistor
AUXTIN1:                   +50.0°C    sensor = thermistor
AUXTIN2:                   +22.0°C    sensor = thermistor
AUXTIN3:                   +28.0°C    sensor = thermistor
PECI Agent 0:              +50.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +42.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
PCH_MCH_TEMP:               +0.0°C  
TSI2_TEMP:                +3892314.0°C  
TSI3_TEMP:                +3892314.0°C  
TSI4_TEMP:                +3892314.0°C  
TSI5_TEMP:                +3892314.0°C  
TSI6_TEMP:                +3892314.0°C  
TSI7_TEMP:                +3892314.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

jc42-i2c-0-1a
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-18
Adapter: SMBus I801 adapter at f040
temp1:        +34.8°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-1b
Adapter: SMBus I801 adapter at f040
temp1:        +35.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

jc42-i2c-0-19
Adapter: SMBus I801 adapter at f040
temp1:        +36.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +52.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +48.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +84.0°C, crit = +100.0°C)

root@iris00:~# 

ASUS H110M-A/M.2

  • BIOS version 4202
  • echo modprobe coretemp >> /etc/rc.local
  • echo modprobe nct6775 >> /etc/rc.local
root@midpol:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +30.0°C  (high = +80.0°C, crit = +100.0°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +27.8°C  (crit = +119.0°C)
temp2:        +29.8°C  (crit = +119.0°C)

nct6793-isa-0290
Adapter: ISA adapter
in0:                      368.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      928.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.14 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                      1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     152.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     128.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     136.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     1004 RPM  (min =    0 RPM)
fan2:                     1143 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan6:                        0 RPM  (min =    0 RPM)
SYSTIN:                   +118.0°C  (high = +98.0°C, hyst = +95.0°C)  sensor = thermistor
CPUTIN:                    +29.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +30.0°C    sensor = thermistor
AUXTIN1:                  +112.0°C    sensor = thermistor
AUXTIN2:                  +111.0°C    sensor = thermistor
AUXTIN3:                  +110.0°C    sensor = thermistor
PECI Agent 0:              +31.0°C  (high = +98.0°C, hyst = +95.0°C)
                                    (crit = +100.0°C)
PECI Agent 0 Calibration:  +36.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
TSI2_TEMP:                +3892314.0°C  
TSI3_TEMP:                +3892314.0°C  
TSI4_TEMP:                +3892314.0°C  
TSI5_TEMP:                +3892314.0°C  
TSI6_TEMP:                +3892314.0°C  
TSI7_TEMP:                +3892314.0°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

root@midpol:~# 
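
Several readings in these dumps (the TSI*_TEMP values, -128.0°C on an AUXTIN input) look like unconnected inputs rather than real temperatures. The filter below is a sketch for spotting them: implausible_temps is a hypothetical name, the 0..110 °C window is an arbitrary assumption, and it assumes that a signed number directly after the label's colon is a temperature, as in the output above.

```shell
# Read `sensors` output on stdin and print the lines whose temperature
# reading falls outside a plausible window.
# $1: lower bound, $2: upper bound (assumed defaults: 0 and 110 degrees C)
implausible_temps() {
    awk -v lo="${1:-0}" -v hi="${2:-110}" '
        # Temperatures are printed with an explicit sign right after the
        # label; voltages and fan speeds are not, so they never match.
        match($0, /:[ \t]+[+-][0-9]+\.[0-9]+/) {
            m = substr($0, RSTART, RLENGTH)
            sub(/^:[ \t]+/, "", m)
            t = m + 0
            if (t < lo || t > hi) print
        }'
}
```

Typical use: `sensors | implausible_temps`, or `sensors | implausible_temps 5 95` for a tighter window. Continuation lines with only limits (crit, emerg) have no colon and are ignored.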

ASUS P9X79 WS

root@daq14:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +29.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +24.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +82.0°C, crit = +100.0°C)

nouveau-pci-0200
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore:           1.04 V  (min =  +0.00 V, max =  +1.74 V)
in1:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:           3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:             2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:           904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:            3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:          1265 RPM  (min =    0 RPM)
fan2:          1909 RPM  (min =    0 RPM)
fan3:             0 RPM  (min =    0 RPM)
fan4:             0 RPM  (min =    0 RPM)
fan5:             0 RPM  (min =    0 RPM)
SYSTIN:         +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:         +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:         +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PECI Agent 0:   +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit = +96.0°C)
PCH_CHIP_TEMP:   +0.0°C  
PCH_CPU_TEMP:    +0.0°C  
PCH_MCH_TEMP:    +0.0°C  
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:   disabled

root@daq14:~# 

ASUS TUF GAMING B550M-PLUS WIFI II

  • BIOS 2803, 2806
  • echo modprobe nct6775 >> /etc/rc.local
root@midm9a:~# sensors
nct6798-isa-0290
Adapter: ISA adapter
in0:                      488.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.82 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                       1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                      760 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan7:                     1264 RPM  (min =    0 RPM)
SYSTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +22.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +95.0°C    sensor = thermistor
AUXTIN1:                   +25.0°C    sensor = thermistor
AUXTIN2:                   +25.0°C    sensor = thermistor
AUXTIN3:                   +25.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +23.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +32.4°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

amdgpu-pci-0800
Adapter: PCI adapter
vddgfx:        1.45 V  
vddnb:       993.00 mV 
edge:         +28.0°C  
PPT:          20.00 W  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +33.4°C  

root@midm9a:~# 

ASUS ROG STRIX B550-XE GAMING WIFI

  • BIOS 2423, 2604
  • echo modprobe nct6775 >> /etc/rc.local
root@daq13:~# sensors
nct6798-isa-0290
Adapter: ISA adapter
in0:                      344.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                      992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                      960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      216.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.81 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                     960.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                      845 RPM  (min =    0 RPM)
fan2:                      998 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +28.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +27.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +94.0°C    sensor = thermistor
AUXTIN1:                   +28.0°C    sensor = thermistor
AUXTIN2:                   +28.0°C    sensor = thermistor
AUXTIN3:                   +97.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +27.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +33.6°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

amdgpu-pci-0600
Adapter: PCI adapter
vddgfx:        1.45 V  
vddnb:       999.00 mV 
edge:         +29.0°C  
PPT:          14.00 W  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +30.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +33.9°C  

root@daq13:~# 

ASUS ROG STRIX B550-E GAMING

  • BIOS 2803
  • echo modprobe jc42 >> /etc/rc.local
  • echo modprobe nct6775 >> /etc/rc.local
root@daq17:~# sensors
jc42-i2c-1-1b
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +25.0°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +28.0°C  

nouveau-pci-0800
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +34.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nct6798-isa-0290
Adapter: ISA adapter
in0:                      288.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      224.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.36 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        1.79 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     280.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     208.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                      843 RPM  (min =    0 RPM)
fan2:                      629 RPM  (min =    0 RPM)
fan3:                      746 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +22.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +25.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +93.0°C    sensor = thermistor
AUXTIN1:                   +22.0°C    sensor = thermistor
AUXTIN2:                   +22.0°C    sensor = thermistor
AUXTIN3:                   +96.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +25.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +27.6°C  
intrusion0:               ALARM
intrusion1:               ALARM
beep_enable:              disabled

jc42-i2c-1-1a
Adapter: SMBus PIIX4 adapter port 0 at 0b00
temp1:        +23.2°C  (low  =  +0.0°C)                  ALARM (HIGH, CRIT)
                       (high =  +0.0°C, hyst =  +0.0°C)
                       (crit =  +0.0°C, hyst =  +0.0°C)

asusec-isa-0000
Adapter: ISA adapter
CPU_Opt:        0 RPM
Chipset:      +34.0°C  
CPU:          +25.0°C  
Motherboard:  +22.0°C  
T_Sensor:     -40.0°C  
VRM:          +31.0°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +28.0°C  
Tccd1:        +27.5°C  

root@daq17:~# 

ASUS PRIME B650-PLUS

  • BIOS 1811
  • echo modprobe nct6775 >> /etc/rc.local
root@dsdaqgw:~# sensors
amdgpu-pci-0b00
Adapter: PCI adapter
vddgfx:      930.00 mV 
vddnb:         1.19 V  
edge:         +38.0°C  
PPT:          25.10 W  

nct6799-isa-0290
Adapter: ISA adapter
in0:                      920.00 mV (min =  +0.00 V, max =  +1.74 V)
in1:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                        1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                        1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                      320.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                        3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                        3.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                        3.38 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                       1.28 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                       1.10 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                       1.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                     416.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                     328.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                        0 RPM  (min =    0 RPM)
fan2:                     1253 RPM  (min =    0 RPM)
fan3:                        0 RPM  (min =    0 RPM)
fan4:                        0 RPM  (min =    0 RPM)
fan5:                        0 RPM  (min =    0 RPM)
fan7:                        0 RPM  (min =    0 RPM)
SYSTIN:                    +33.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
CPUTIN:                    +35.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
AUXTIN0:                   +78.0°C    sensor = thermistor
AUXTIN1:                   +11.0°C    sensor = thermistor
AUXTIN2:                   +20.0°C    sensor = thermistor
AUXTIN3:                   +82.0°C    sensor = thermistor
PECI Agent 0 Calibration:  +35.5°C  
PCH_CHIP_CPU_MAX_TEMP:      +0.0°C  
PCH_CHIP_TEMP:              +0.0°C  
PCH_CPU_TEMP:               +0.0°C  
TSI0_TEMP:                 +42.6°C  
intrusion0:               ALARM
intrusion1:               OK
beep_enable:              disabled

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +42.6°C  
Tccd1:        +36.4°C  

root@dsdaqgw:~# 
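
Several of the listings above show readings such as AUXTIN0 at +93..+95°C or T_Sensor at -40.0°C next to a cool CPU. These may simply be unconnected inputs rather than real temperatures (an assumption, check against the board documentation). A minimal sketch for pulling out the readings worth a second look; the function name hot_sensors is made up for this example and it only parses text in the format shown above:

```shell
# hot_sensors LIMIT
# Read `sensors` output on stdin and print "label value" for every
# temperature reading at or above LIMIT (degrees C).
hot_sensors() {
    awk -v limit="$1" '
        index($0, ":") {
            # text before the first colon is the sensor label
            label = $0; sub(/:.*/, "", label)
            # first word after the colon is the reading, e.g. "+94.0°C"
            rest = $0;  sub(/^[^:]*:[ \t]*/, "", rest)
            split(rest, f, /[ \t]+/)
            # keep only values that start like a number and end in "C";
            # this skips voltages, fan speeds and "Adapter:" lines
            if (f[1] ~ /^[+-]?[0-9]/ && f[1] ~ /C$/ && f[1] + 0 >= limit + 0) {
                printf "%s %.1f\n", label, f[1] + 0
            }
        }'
}
```

Typical use would be "sensors | hot_sensors 80". Continuation lines such as "(crit = +96.0°C)" have no colon and are ignored.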

Enable CPU turbo mode

  • An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a turbo-boost frequency (e.g. 4.0 GHz). Here we will enable the turbo-boost mode.
  • Find out CPU capability
root@daq01:~# lscpu | grep Hz
Model name:                      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:                         3965.803
CPU max MHz:                     4000.0000
CPU min MHz:                     800.0000
root@daq01:~# 
  • Look up this CPU in the Intel ARK database (search for the CPU model name), e.g.

https://ark.intel.com/content/www/us/en/ark/products/88196/intel-core-i7-6700-processor-8m-cache-up-to-4-00-ghz.html

  • Find current frequency settings:
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
root@daq01:~# 
  • Note the following:
    • current governor is "powersave"
    • "performance" governor is available
    • "boost state support" is supported and active.
  • Confirm CPU frequency governor:
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
root@daq01:~# 
  • Change governor to "performance":
root@daq01:~# cpupower frequency-set --governor performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
  • monitor CPU frequency:
root@daq01:~# cpupower monitor
    | Nehalem                   || Mperf              || Idle_Stats                                     
 CPU| C3   | C6   | PC3  | PC6   || C0   | Cx   | Freq  || POLL | C1   | C1E  | C3   | C6   | C7s  | C8    
   0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
   4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
   1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
   5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
   2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
   6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
   3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
   7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~# 
  • check that the CPU is not overheating:
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
  • congratulations, we are running at 4 GHz now!
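
The governor set with cpupower is a runtime setting and is normally lost on reboot, so it is worth re-checking after a restart. A minimal sketch of such a check, reading the same scaling_governor files as the cat command above; the function name check_governor and its optional second argument (an alternative root directory, used only for testing) are assumptions of this example:

```shell
# check_governor WANTED [CPU_ROOT]
# Print every CPU whose scaling governor differs from WANTED.
# Returns 0 if all CPUs match, 1 otherwise.
check_governor() {
    wanted=$1
    root=${2:-/sys/devices/system/cpu}
    bad=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        # an unmatched glob stays literal and is skipped here
        [ -r "$f" ] || continue
        current=$(cat "$f")
        if [ "$current" != "$wanted" ]; then
            echo "$f: $current"
            bad=1
        fi
    done
    return "$bad"
}
```

For example, "check_governor performance || cpupower frequency-set --governor performance" could be run from a boot-time script.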

Setup ubuntu as gateway to private network

See also:

Steps to do

!!! UPDATED 16feb2024 Ubuntu 22.04.3 !!!

  • assign network numbers to the private network, i.e. 192.168.1.x, 192.168.2.x, etc
  • (on the gateway machine, each private network interface has to have a different network number)
  • (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
  • assign IP addresses on the private network, save them in /etc/hosts, e.g. "192.168.1.10 hvps" (IP address first, then the name)
  • (for simplicity, assign 192.168.1.1 to the gateway machine itself)
  • (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
  • setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
  • setup DHCP server (dnsmasq) to give out the IP addresses
  • setup TFTP server (dnsmasq), pxelinux and NFS for diskless booting
  • setup time server (chronyd) to provide common time to all devices
  • setup NAT so machines on private network can access the internet (to get OS updates, etc)
  • setup NIS and NFS so machines on the private network can use common home directories
  • setup rsync backup of machines on the private network

setup hosts

  • edit /etc/hosts
192.168.1.101 dsfe01
... and so forth
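
When many similar machines are added, the /etc/hosts lines can be generated rather than typed. A minimal sketch, assuming the numbering pattern of the example above (dsfe01 at 192.168.1.101, dsfe02 at 192.168.1.102, and so on); the function name gen_hosts and its argument order are made up for this example:

```shell
# gen_hosts NET FIRST PREFIX COUNT
# Print COUNT lines "NET.<octet> PREFIX<nn>", starting at host octet FIRST.
gen_hosts() {
    net=$1; first=$2; prefix=$3; count=$4
    i=1
    while [ "$i" -le "$count" ]; do
        octet=$((first + i - 1))
        # .0 and .255 are "special" on a /24 network, never hand them out
        if [ "$octet" -le 0 ] || [ "$octet" -ge 255 ]; then
            echo "gen_hosts: host octet $octet out of range" >&2
            return 1
        fi
        printf '%s.%d %s%02d\n' "$net" "$octet" "$prefix" "$i"
        i=$((i + 1))
    done
}
```

Review the output of "gen_hosts 192.168.1 101 dsfe 4" before appending it to /etc/hosts.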

setup dns and dhcp

!!! updated 16feb2024 for Ubuntu 22.04.3 !!!

!!! note: stock systemd-resolved remains, is configured to forward queries to dnsmasq, configured to forward queries to TRIUMF DNS !!!

!!! note: per authors of systemd, bare hostnames are not permitted, a DNS domain name must always be used. DNS domain name "dsdaq" is used in this example !!!

  • apt install dnsmasq
  • ensure dnsmasq starts after all interfaces are up (Ubuntu-22)
mkdir /etc/systemd/system/dnsmasq.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/dnsmasq.service.d/local.conf
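
The systemd documentation suggests that a unit which needs a configured network should both pull in network-online.target (Wants=) and order itself after it (After=). A minimal sketch of writing such a drop-in with printf instead of echo -e; the function name write_online_dropin and the optional second argument (base directory, for testing) are assumptions of this example, and whether the extra Wants= line is needed on a given machine is untested here:

```shell
# write_online_dropin SERVICE [SYSTEMD_DIR]
# Create SYSTEMD_DIR/SERVICE.service.d/local.conf so that SERVICE is
# started only after the network is reported online.
write_online_dropin() {
    service=$1
    dir=${2:-/etc/systemd/system}/$service.service.d
    mkdir -p "$dir" || return 1
    printf '[Unit]\nWants=network-online.target\nAfter=network-online.target\n' \
        > "$dir/local.conf"
}
```

After "write_online_dropin dnsmasq", run "systemctl daemon-reload" so that systemd picks up the new file.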
  • edit /etc/dnsmasq.conf
# /etc/dnsmasq.conf
# DNS settings 
#port=0 # disable DNS function 
port=53 # enable DNS function 
bind-interfaces # do not collide with systemd-resolved, we use 127.0.0.1:53, they use 127.0.0.53:53 
domain-needed 
bogus-priv 
no-resolv 
#log-queries # log DNS queries 
 
# TRIUMF DNS settings 
 
server=142.90.100.19 
expand-hosts 
domain=dsdaq 
local=/dsdaq/ 
localmx # do not forward MX queries to TRIUMF 

# DHCP settings 
interface=enp1s0f0 # VX network 192.168.0.x 
#interface=missing  # FEP and TSP network 192.168.1.x 
interface=enp1s0f1 # controls network 192.168.2.x 
#dhcp-range=192.168.1.50,192.168.1.150,infinite 
dhcp-range=192.168.0.0,static 
dhcp-range=192.168.2.0,static 
log-dhcp # log DHCP queries 
#quiet-dhcp 
dhcp-ignore=tag:!known 
#dhcp-boot=pxelinux.0 
 
dhcp-option=option:dns-server,192.168.0.248 
dhcp-option=option:ntp-server,192.168.0.248 
 
# TFTP settings 
 
enable-tftp 
tftp-root=/tftpboot 
  • #mkdir /tftpboot ### per tftp-root (if no ZFS)
  • zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
  • create resolved-dsdaq.conf with main IP address of dnsmasq
[Resolve]
DNS=192.168.0.248
Domains=dsdaq triumf.ca
  • mkdir -p /etc/systemd/resolved.conf.d/
  • /bin/rm -f /etc/systemd/resolved.conf.d/*.conf
  • cp resolved-dsdaq.conf /etc/systemd/resolved.conf.d/
  • systemctl stop systemd-resolved.service
  • systemctl disable systemd-resolved.service
  • systemctl enable dnsmasq
  • systemctl restart dnsmasq
  • try to "ping" or "host" some names from /etc/hosts, it should work
  • try to ping daq00, daq00.triumf.ca, all should work
  • resolved-dsdaq.conf goes into /etc/systemd/resolved.conf.d/ of all machines on the private network
  • if not using systemd-resolved, edit /etc/resolv.conf
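
With "dhcp-range=...,static" and "dhcp-ignore=tag:!known" as configured above, dnsmasq should only answer clients that have a dhcp-host entry, and the IP address for a named entry is taken from /etc/hosts (this is my reading of the dnsmasq man page, verify on your version). A minimal sketch that turns a list of "MAC name" pairs into dhcp-host lines in the same format as the dhcp-host examples further down; the function name gen_dhcp_hosts and the input format are made up for this example:

```shell
# gen_dhcp_hosts
# Read "MAC NAME" pairs on stdin, print one dnsmasq dhcp-host line each.
gen_dhcp_hosts() {
    while read -r mac name; do
        case $mac in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # normalise the MAC address to lower case
        mac=$(printf '%s' "$mac" | tr 'A-F' 'a-f')
        if ! printf '%s\n' "$mac" | grep -Eq '^([0-9a-f]{2}:){5}[0-9a-f]{2}$'; then
            echo "gen_dhcp_hosts: bad MAC address: $mac" >&2
            return 1
        fi
        printf 'dhcp-host=%s,%s,infinite\n' "$mac" "$name"
    done
}
```

The output can be appended to /etc/dnsmasq.conf, followed by "systemctl restart dnsmasq".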

setup chronyd

  • enable ntp server:
  • disable systemd-timesyncd, configure and enable chronyd per instructions above
  • create dsdaq.conf
# chrony config for dsdaq server

#allow 192.168.0.0
#allow 192.168.1.0
#allow 192.168.2.0
allow all

# end
  • cp dsdaq.conf /etc/chrony/conf.d/
  • systemctl restart chronyd
  • chronyc tracking ### wait until time is synchronized (a few seconds)
  • create dsdaq.sources # use hostname or IP address of chronyd server
# Put this file in /etc/chrony/sources.d
# systemctl restart chrony
# chronyc sources
# chronyc tracking
server dsdaqgw iburst prefer
# end
  • dsdaq.sources goes to /etc/chrony/sources.d of all machines on the private network
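
Instead of watching "chronyc tracking" by eye, a script can wait for synchronization. A minimal sketch that checks the "Leap status" line of the tracking report, which reads "Normal" once the clock is synchronized and "Not synchronised" before that; the function name chrony_synced is made up for this example:

```shell
# chrony_synced
# Read `chronyc tracking` output on stdin; succeed only if the
# "Leap status" line reports "Normal".
chrony_synced() {
    awk -F: '
        /^Leap status/ {
            # trim blanks around the value after the colon
            gsub(/^[ \t]+|[ \t]+$/, "", $2)
            status = $2
        }
        END { exit (status == "Normal") ? 0 : 1 }'
}
```

For example: "until chronyc tracking | chrony_synced; do sleep 1; done". chronyc also has a built-in "waitsync" command that may do the job without any parsing.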

setup diskless network booting

setup pxelinux for legacy pxe boot

  • add bits in dnsmasq.conf
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite
dhcp-boot=pxelinux.0
dhcp-option=17,"192.168.0.251:/nfsroot/%s,vers=3"
  • setup pxelinux for Ubuntu-18
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
  • cd /zssd/tftpboot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
  • mkdir pxelinux.cfg
  • emacs -nw pxelinux.cfg/default
default menu.c32
prompt 0

menu title Welcome to the DSVSLICE PXE boot menu

timeout 50

label hdt
  kernel hdt.c32

label memtest86+-5.01 
  kernel memdisk iso initrd=memtest86+-5.01.iso.gz 

label memtest86+-4.20
  kernel memdisk iso initrd=memtest86+-4.20.iso.zip

label vmlinuz-5.3.0-26-generic
  menu default
  kernel vmlinuz-5.3.0-26-generic
  append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0

#end

setup pxelinux for efi pxe boot

# uefi pxe

dhcp-boot=tag:uefipxe,uefi/syslinux.efi
dhcp-option-force=tag:fe01,option:root-path,192.168.0.248:/nfsroot/fe01

# VX network 192.168.0.x

dhcp-host=40:a6:b7:c1:d9:c5,fe01,infinite,set:uefipxe,set:fe01
  • apt install syslinux pxelinux syslinux-common syslinux-efi syslinux-utils
mkdir /tftpboot/uefi
cp /usr/lib/SYSLINUX.EFI/efi64/syslinux.efi /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/ldlinux.e64 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/menu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/hdt.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libutil.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libmenu.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libcom32.c32 /tftpboot/uefi/
cp /usr/lib/syslinux/modules/efi64/libgpl.c32 /tftpboot/uefi/
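
A quick pre-flight check before the first boot attempt can save a reboot cycle. A minimal sketch that lists which of the expected files are missing or unreadable in the TFTP directory; the function name missing_files is made up for this example:

```shell
# missing_files DIR FILE...
# Print every FILE that is absent or unreadable in DIR.
# Returns 0 if all files are present, 1 otherwise.
missing_files() {
    dir=$1; shift
    rc=0
    for f in "$@"; do
        if [ ! -r "$dir/$f" ]; then
            echo "$dir/$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example: "missing_files /tftpboot/uefi syslinux.efi ldlinux.e64 menu.c32 libutil.c32 libmenu.c32 libcom32.c32".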
  • try to boot, it should bomb with "cannot load pxelinux.cfg/default"
  • mkdir /tftpboot/uefi/pxelinux.cfg
  • create /tftpboot/uefi/pxelinux.cfg/default, note nfsroot path is hardwired, note "http:" is used to load vmlinuz and initrd files (because tftp is super slow)
default menu.c32
prompt 0

menu title Welcome to the DSDAQGW UEFI PXE boot menu

timeout 50

label vmlinuz-6.5.0-17-generic
  kernel http://192.168.0.248:8088/uefi/vmlinuz-6.5.0-17-generic
  append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto rw ip=dhcp panic=60

# append initrd=http://192.168.0.248:8088/uefi/initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60

#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.0.248:/nfsroot/fe01 rw ip=dhcp panic=60
#  append initrd=initrd.img-6.5.0-17-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=auto ip=dhcp rw panic=60

#end
apt install mini-httpd
emacs -nw /etc/default/mini-httpd # set "START=1"
emacs -nw /etc/mini-httpd.conf # set "host=192.168.0.248", "port=8088", "data_dir=/tftpboot"
mkdir /etc/systemd/system/mini-httpd.service.d
echo -e "[Unit]\nAfter=network-online.target\n" > /etc/systemd/system/mini-httpd.service.d/local.conf
systemctl enable mini-httpd
systemctl restart mini-httpd
systemctl status mini-httpd
wget http://192.168.0.248:8088/uefi/syslinux.efi
tail -100 /var/log/mini_httpd.log
  • fix initramfs bug for "nfsroot=auto", otherwise, "nfsroot=" has to be different for each machine and you have to have separate pxelinux config files for each machine
    • emacs -nw /usr/lib/initramfs-tools/etc/dhcp/dhclient-enter-hooks.d/config
    • add "echo ROOTPATH=..." if it is missing (Ubuntu LTS 22.04)
                echo "ROOTSERVER='${new_routers%% *}'" 
                echo "ROOTPATH='$new_root_path'" 
                echo "HOSTNAME='$new_host_name'" 
mkinitramfs -o /boot/initrd.img-6.5.0-18-generic 6.5.0-18-generic
  • copy linux kernel and initrd
cp /boot/vmlinuz-6.5.0-18-generic /tftpboot/uefi/
cp /boot/initrd.img-6.5.0-18-generic /tftpboot/uefi/
chmod a+r /tftpboot/uefi/*
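
If separate pxelinux config files per machine are needed after all, the file names follow the search order visible in the syslog excerpt further down: first "01-" plus the MAC address, then the IP address in hex, shortened one digit at a time, and finally "default". A minimal sketch that prints these names for a given client; the function name pxe_config_names is made up for this example, and other lookups that pxelinux may try first (such as a client UUID) are not covered:

```shell
# pxe_config_names MAC IPV4
# Print the pxelinux.cfg file names tried for this client, in order.
pxe_config_names() {
    mac=$1; ip=$2
    # 1. "01-" (Ethernet) plus the MAC address, lower case, dash separated
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
    # 2. the IPv4 address as 8 upper-case hex digits ...
    oldifs=$IFS; IFS=.
    set -- $ip
    IFS=$oldifs
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    # ... then shortened by one digit per attempt
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex=${hex%?}
    done
    # 3. the shared fallback
    printf 'default\n'
}
```

For the fe01 machine, "pxe_config_names 40:a6:b7:c1:d9:c5 192.168.0.110" lists 01-40-a6-b7-c1-d9-c5, C0A8006E, C0A8006 and so on down to C, then default.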
  • try to boot, should bomb with messages about "trying to mount root filesystem"
  • tail /var/log/syslog
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131, 
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  2
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:02 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 vendor class: PXEClient:Arch:00007:UNDI:003016
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 1:netmask, 2:time-offset, 3:router, 4, 5, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 6:dns-server, 12:hostname, 13:boot-file-size, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 15:domain-name, 17:root-path, 18:extension-path, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 22:max-datagram-reassembly, 23:default-ttl, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 28:broadcast, 40:nis-domain, 41:nis-server, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 42:ntp-server, 43:vendor-encap, 50:requested-address, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 51:lease-time, 54:server-identifier, 58:T1, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 59:T2, 60:vendor-class, 66:tftp-server, 67:bootfile-name, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 97:client-machine-id, 128, 129, 130, 131, 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 requested options: 132, 133, 134, 135
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  1 option: 53 message-type  5
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 18 option: 67 bootfile-name  uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 12 hostname  fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065885 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: error 8 User aborted the transfer received from 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/syslinux.efi to 192.168.0.110
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  2
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:05 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 requested options: 1:netmask, 3:router, 6:dns-server
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 bootfile name: uefi/syslinux.efi
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 next server: 192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 broadcast response
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  1 option: 53 message-type  5
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:43:09 dsdaqgw dnsmasq-dhcp[3629416]: 2348065887 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/ldlinux.e64 to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/01-40-a6-b7-c1-d9-c5 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006E not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8006 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A800 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A80 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A8 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0A not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C0 not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: file /tftpboot/uefi/pxelinux.cfg/C not found
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:43:09 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/menu.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/libutil.c32 to 192.168.0.110
Feb 16 20:43:10 dsdaqgw dnsmasq-tftp[3629416]: sent /tftpboot/uefi/pxelinux.cfg/default to 192.168.0.110
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPDISCOVER(enp1s0f0) 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPOFFER(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  2
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 available DHCP subnet: 192.168.0.0/255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 client provides name: dsdaqgw.triumf.ca
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPREQUEST(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 tags: uefipxe, fe01, known, enp1s0f0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 DHCPACK(enp1s0f0) 192.168.0.110 40:a6:b7:c1:d9:c5 fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 1:netmask, 28:broadcast, 2:time-offset, 3:router, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 15:domain-name, 6:dns-server, 119:domain-search, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 12:hostname, 44:netbios-ns, 47:netbios-scope, 
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 requested options: 26:mtu, 121:classless-static-route, 42:ntp-server
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 bootfile name: uefi/syslinux.efi
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 next server: 192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  1 option: 53 message-type  5
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 54 server-identifier  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 51 lease-time  infinite
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  1 netmask  255.255.255.0
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 28 broadcast  192.168.0.255
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  3 router  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  5 option: 15 domain-name  dsdaq
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 12 hostname  fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size: 27 option: 17 root-path  192.168.0.248:/nfsroot/fe01
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option: 42 ntp-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw dnsmasq-dhcp[3629416]: 3693523458 sent size:  4 option:  6 dns-server  192.168.0.248
Feb 16 20:44:54 dsdaqgw rpc.mountd[3350210]: authenticated mount request from 192.168.0.110:981 for /nfsroot/fe01 (/nfsroot/fe01)
Feb 16 20:45:07 dsdaqgw rpc.mountd[3350210]: authenticated unmount request from 192.168.0.110:859 for /nfsroot/fe01/tmp/autoDY4k5u (/nfsroot/fe01)
  • tail /var/log/mini_httpd.log
192.168.0.110 - - [16/Feb/2024:20:43:15 -0800] "GET /uefi/vmlinuz-6.5.0-17-generic HTTP/1.0" 200 14227944 "" "Syslinux/6.04"
192.168.0.110 - - [16/Feb/2024:20:43:24 -0800] "GET /uefi/initrd.img-6.5.0-17-generic HTTP/1.0" 200 137824833 "" "Syslinux/6.04"
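
The run of "not found" TFTP requests in the dnsmasq log above is the usual PXELINUX config search: first the hardware type plus the MAC address, then the client IP address in uppercase hex, shortened by one digit at a time, and finally "default". A minimal sketch that reproduces those file names for one client, assuming an Ethernet client (hardware type 01); the function name pxelinux_cfg_names is hypothetical and not part of syslinux:

```shell
#!/bin/sh
# pxelinux_cfg_names MAC IPV4: print the config file names in lookup order
pxelinux_cfg_names() {
    mac="$1"; ip="$2"
    # hardware type 01 (Ethernet), then the MAC in lowercase with dashes
    printf '01-%s\n' "$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
    # split the dotted quad into four octets
    oldifs="$IFS"; IFS=.
    set -- $ip
    IFS="$oldifs"
    # IP address as 8 uppercase hex digits, then drop one digit per attempt
    hex=$(printf '%02X%02X%02X%02X' "$1" "$2" "$3" "$4")
    while [ -n "$hex" ]; do
        printf '%s\n' "$hex"
        hex="${hex%?}"
    done
    printf 'default\n'
}

pxelinux_cfg_names 40:a6:b7:c1:d9:c5 192.168.0.110
```

For fe01 the first two names are 01-40-a6-b7-c1-d9-c5 and C0A8006E, matching the log. Placing a file with one of these names in /tftpboot/uefi/pxelinux.cfg/ gives that machine its own boot menu instead of "default".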

setup efi http boot

https://documentation.suse.com/sles/15-SP2/html/SLES-all/cha-deployment-prep-uefi-httpboot.html

setup linux kernel

  • copy the kernel files
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /tftpboot/
  • cd /tftpboot
  • chmod a+r *

setup nfs

  • apt-get install nfs-kernel-server
  • enable NFS over UDP: edit /etc/nfs.conf and add "udp=y" in the [nfsd] section:
udp=y
systemctl restart nfs-server.service
  • emacs -nw /etc/exports
/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
  • enable services
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
  • after editing /etc/exports, run
exportfs -av

setup userland

!!! ubuntu-18 version !!!

  • zfs create rpool/nfsroot
  • zfs set dedup=verify rpool/nfsroot ### enable deduplication to save disk space, since most linux images consist of mostly identical files
  • clone ubuntu
mkdir /nfsroot/dsfe01
cd /
rsync -avx . /nfsroot/dsfe01
  • edit config files:
  • cd /nfsroot/dsfe01
  • emacs -nw etc/hostname ### change to dsfe01
  • emacs -nw etc/mailname ### change to dsfe01
  • emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
  • emacs -nw etc/defaultdomain ### change to MUSR-NIS
  • cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
  • emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
  • emacs -nw root/.ssh/authorized_keys ### update root ssh keys
  • emacs -nw etc/fstab ### add this
192.168.1.1:/nfsroot/dsfe01 / nfs defaults,nolock 0 0
  • emacs -nw etc/chrony/chrony.conf
    • comment-out all "pool" and "server" entries
    • add entry "server 192.168.1.1 iburst"

After dsfe01 is booted:

  • disable services:
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache

To set up additional machines, clone dsfe01 instead of cloning the gateway machine.
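
A minimal sketch of the per-machine edits after such a clone, assuming the tree layout used above. The helper name retarget_clone and the second frontend dsfe02 are hypothetical; only hostname, mailname and the NFS root line in fstab are handled here:

```shell
#!/bin/sh
# retarget_clone ROOT OLD NEW: rewrite host-specific files in a cloned tree
retarget_clone() {
    root="$1"; old="$2"; new="$3"
    printf '%s\n' "$new" > "$root/etc/hostname"
    printf '%s\n' "$new" > "$root/etc/mailname"
    # point the NFS root entry in fstab at the new tree;
    # write back with cat so the file keeps its owner and permissions
    sed "s|/nfsroot/$old |/nfsroot/$new |" "$root/etc/fstab" > "$root/etc/fstab.new"
    cat "$root/etc/fstab.new" > "$root/etc/fstab"
    rm -f "$root/etc/fstab.new"
}

# usage on the gateway machine:
#   rsync -avx /nfsroot/dsfe01/ /nfsroot/dsfe02
#   retarget_clone /nfsroot/dsfe02 dsfe01 dsfe02
```

The new machine still needs its own line in /etc/exports, followed by "exportfs -av".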

Allow manpages to be viewed

If / is mounted over NFS, man will report a permission error. Fix it with:

ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man

setup shared home directory

on the gateway machine

  • define netgroups
  • emacs -nw /etc/netgroup
dsfe (dsfe01,,) (dsfe02,,)
  • emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
netgroup: files
  • export the home directories:
  • emacs -nw /etc/exports ### add this:
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
  • exportfs -rc
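
Each netgroup member is a (host,user,domain) triple; leaving the user and domain fields empty, as above, matches any value. A small sketch that builds the /etc/netgroup line from a list of frontend names (the helper name netgroup_line is hypothetical):

```shell
#!/bin/sh
# netgroup_line GROUP HOST...: print one /etc/netgroup entry of (host,,) triples
netgroup_line() {
    group="$1"; shift
    line="$group"
    for host in "$@"; do
        line="$line ($host,,)"
    done
    printf '%s\n' "$line"
}

netgroup_line dsfe dsfe01 dsfe02
```

Adding a frontend then only means appending its name to the argument list and re-running exportfs.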

on the frontend machine

  • mkdir /home
  • emacs -nw /etc/fstab ### add this:
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
  • mount -a

setup NAT

NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation

In these examples:

  • replace "eno1" with the name of the outgoing interface (the one connected to the TRIUMF network)
  • replace "enp11s0" with the name of the private network interface (192.168.1.x network)
  • emacs -nw /etc/rc.local ### add this:
# /etc/rc.local

# enable NAT

/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v

# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v

iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward

sh /etc/firewall-rfc1918.sh

# end
  • emacs -nw /etc/firewall-rfc1918.sh
# firewall-rfc1918.sh

# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.

ETH=eno1

iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12   -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16  -j REJECT

iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH

iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8      -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12   -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16  -j REJECT

iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH

iptables -D FORWARD -j out-rfc1918 -o $ETH 
iptables -D FORWARD -j out-rfc1918 -o $ETH 
iptables -I FORWARD -j out-rfc1918 -o $ETH 

# allow TRIUMF-SECURE network

iptables -I in-rfc1918 -s 10.90.0.0/255.255.0.0 -j ACCEPT 
iptables -I out-rfc1918 -d 10.90.0.0/255.255.0.0 -j ACCEPT 

# show configuration

iptables -L -v

#end
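
To see which destinations the rfc1918 chains reject, here is a minimal sketch that tests a dotted-quad address against the three private ranges used above. The function name is_rfc1918 is hypothetical, and the sketch does not model the TRIUMF-SECURE exception for 10.90.0.0/16:

```shell
#!/bin/sh
# is_rfc1918 ADDR: succeed if the IPv4 address is in 10/8, 172.16/12 or 192.168/16
is_rfc1918() {
    oldifs="$IFS"; IFS=.
    set -- $1
    IFS="$oldifs"
    case "$1" in
        10)  return 0 ;;
        172) [ "$2" -ge 16 ] && [ "$2" -le 31 ] ;;   # 172.16.0.0 to 172.31.255.255
        192) [ "$2" -eq 168 ] ;;
        *)   return 1 ;;
    esac
}

for a in 10.1.2.3 172.20.0.1 172.32.0.1 192.168.0.110 8.8.8.8; do
    if is_rfc1918 "$a"; then echo "$a private"; else echo "$a public"; fi
done
```

Note that 172.32.0.1 is outside the /12 range and is reported as public.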

KVM

apt install cpu-checker

root@daq13:~# kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
root@daq13:~# 

(if not, shut down, go into the BIOS settings, and enable CPU virtualization)
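
A quick additional check, sketched under the assumption of an x86 machine: look for the vmx (Intel) or svm (AMD) flag in /proc/cpuinfo. The function name has_virt_flag is hypothetical. The flag only shows that the CPU supports virtualization; kvm-ok remains the real test of whether it is usable.

```shell
#!/bin/sh
# has_virt_flag [FILE]: succeed if the cpuinfo file lists the vmx or svm flag
has_virt_flag() {
    grep -E -q '^flags[[:space:]]*:.*[[:space:]](vmx|svm)([[:space:]]|$)' \
        "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_virt_flag; then
    echo "CPU virtualization flag present"
else
    echo "no vmx/svm flag found"
fi
```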

apt install virtinst ### will install many packages
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils

root@daq13:/home1/wheel# virsh list --all
 Id   Name           State
------------------------------
 1    ubuntu-guest   running

apt install virt-manager

virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'

the virtual machine will start, boot, etc.
to get out of the guest console, press CTRL + Shift followed by ] (the console escape character, shown as "^]" when the console connects)

ssh wheel@daq13
virt-manager

run virt-install again without "--graphics none", then open the graphics console from virt-manager; the guest boots into the ubuntu installer desktop

virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none

virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off

build image

dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### use ext3: the SL6 kernel fails to mount ext4, "unknown ext4 options"
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp
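
The dd command above writes all 20 GiB of zeros to disk. As an alternative sketch (an assumption, not what was used here), GNU dd can create a sparse file of the same size by seeking instead of writing; the helper name make_sparse_image is hypothetical:

```shell
#!/bin/sh
# make_sparse_image FILE SIZE_MIB: create a sparse file of SIZE_MIB mebibytes
make_sparse_image() {
    # count=0 writes no data; seek sets the file length in 1 MiB blocks
    dd if=/dev/zero of="$1" bs=1048576 count=0 seek="$2" 2>/dev/null
}

# usage, same size as the dd above (20 x 1024M = 20480 MiB):
#   make_sparse_image /tmp/xxx/ladd00.img 20480
#   ls -ls /tmp/xxx/ladd00.img   ### first column shows the blocks actually allocated
```

The remaining steps (mkfs.ext3, loop mount, rsync) are unchanged.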

on the guest, configure the network in /etc/rc.local:

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local

ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn

# end

ARM64 cross-compiler

  • arm64 (aarch64) machines are the Xilinx FPGA Cortex-A53, RPi4 and RPi5
  • install packages:
apt install g++-12-aarch64-linux-gnu gcc-12-aarch64-linux-gnu-base libstdc++-12-dev-arm64-cross
  • run:
aarch64-linux-gnu-gcc-12 -o ttcp.aarch64 ttcp.c -static
aarch64-linux-gnu-g++-12 -o fecdm.exe -O2 -g -Wall -Wuninitialized -std=c++20 fecdm.o dsdm.o /home/dsdaqdev/packages_common/midas/linux-aarch64-remoteonly/lib/libmidas.a -pthread -lrt -lutil /nfsroot/gdm00/usr/lib/aarch64-linux-gnu/libi2c.a -static

ARM cross-compiler

NOTE!!!

THIS IS NOT AN AARCH64 (arm64) CROSSCOMPILER!

NOTE!!!

  • install packages:
apt install libgcc-9-dev-arm64-cross
apt install gcc-arm-linux-gnueabi
apt install gcc-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabi
  • to find out the correct -march setting, run on the target machine:
root@gdm00:~# g++ -Q --help=target | grep march
  -march=                     		armv8-a


arm-linux-gnueabi-gcc -o ttcp1 ttcp.c -march=armv7 -static
arm-linux-gnueabi-gcc -o memcpy.armv7 memcpy.cc -march=armv7 -static -O2
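
Because the 64-bit and 32-bit ARM toolchains are easy to mix up, here is a small sketch that maps the target's "uname -m" output to the compiler commands used on this page. The function name cross_cc is hypothetical, and the fallback to the native gcc is an assumption:

```shell
#!/bin/sh
# cross_cc MACHINE: print the C compiler command for a target's `uname -m` value
cross_cc() {
    case "$1" in
        aarch64|arm64) echo aarch64-linux-gnu-gcc-12 ;;  # must be tested before arm*
        arm*)          echo arm-linux-gnueabi-gcc ;;     # 32-bit ARM
        *)             echo gcc ;;                       # assumption: native build
    esac
}

# usage:
#   $(cross_cc aarch64) -o ttcp.aarch64 ttcp.c -static
#   $(cross_cc armv7l) -o ttcp1 ttcp.c -march=armv7 -static
```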

32-bit intel cross-compiler

Ubuntu 22.04:

apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386

NOTES:

  • "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
  • to cross-build 32-bit MIDAS, use "make linux32".
  • executables cross-built on Ubuntu-22 do NOT run on 32-bit Debian-11 (GLIBC and GLIBCXX version mismatch)
  • executables cross-built on Ubuntu-22 run on 32-bit Debian-12.

SSH settings for EPICS

  • TRIUMF EPICS runs an obsolete version of SSH
  • add this to the user's .ssh/config
Host sbp1*
HostKeyAlgorithms +ssh-rsa
PubKeyAcceptedAlgorithms +ssh-rsa
KexAlgorithms +diffie-hellman-group1-sha1
ForwardX11 yes
ForwardX11Trusted yes

changes for VME processors

apt -y remove sysstat man-db
apt -y purge dkms
apt -y purge mdadm
apt -y autoremove