Ubuntu
About Ubuntu
Ubuntu version
lsb_release -a
uname -a
Ubuntu installer
- updated for Ubuntu LTS 20.04.1, 22.04.1
- download the latest Ubuntu LTS desktop installer iso image
- dd the image to a USB key
- power down, disconnect all disks (all HDDs, all SSDs, all M.2)
- connect the SSD to be used as system disk
- if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
- power up
- boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F8 for ASUS, F11 for Supermicro)
- follow the instruction:
- "try ubuntu or install ubuntu" - choose "install"
- select language - accept default
- "updates and other software" - accept default settings ("normal install")
- "installation type" - select "advanced features" and "experimental: use ZFS"
- accept partition choice
- "where are you?" - select "Vancouver" (PST time zone)
- "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
- installation runs in a few minutes, when finished, reboot
- login as user wheel
- answer annoying questions:
- "livepatch" - say "next"
- "help improve" - select "do not send", say "next"
- "privacy" - leave "location" as "off", say "next"
- "ready to go", say "done"
- right-click on the desktop, say "open in terminal", a shell will open
- say "sudo /bin/bash", enter the root password, you now have the root shell
- run nm-connection-editor to configure the network. use netmask 255.255.224.0, gateway 142.90.100.18, DNS 142.90.100.19, search path "triumf.ca"
- after network is up (can ping ladd00), continue with post-installation steps below
Install instructions
prepare
apt update
apt upgrade
install ssh
apt install ssh
configure hostname
vi /etc/hostname
disable swap
the ubuntu installer creates a 2 GB swap partition, which is not useful on a 32-64 GB machine; disable it:
vi /etc/fstab ### comment out the "swap" line
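The fstab edit can also be done without an editor. A minimal sketch, assuming the swap entry is an ordinary fstab line whose third field is "swap" (as written by the installer); the helper name comment_out_swap is hypothetical. "swapoff -a" (as root) turns swap off right away, without waiting for a reboot.

```shell
# comment_out_swap FSTAB: prefix "#" to every active fstab entry of type "swap".
# Hypothetical helper; keeps a backup copy in FSTAB.bak before editing.
comment_out_swap() {
    local fstab="${1:-/etc/fstab}"
    cp -p "$fstab" "$fstab.bak" || return 1
    # Field 3 of an fstab line is the filesystem type; lines already commented out are kept as-is.
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}

# Usage (as root):
#   comment_out_swap /etc/fstab
#   swapoff -a      # disable swap now
#   swapon --show   # prints nothing once swap is off
```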
maybe reboot
this is a good point to reboot the machine to boot the latest kernel and to set the correct hostname
install etckeeper
keep contents of /etc in a git repository:
apt -y install etckeeper
set timezone
timedatectl list-timezones | grep -i vancouver
timedatectl set-timezone America/Vancouver
install time synchronization
apt -y install chrony
#echo server time1.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time2.triumf.ca iburst >> /etc/chrony/chrony.conf
#echo server time3.triumf.ca iburst >> /etc/chrony/chrony.conf
cp ~/git/scripts/etc/triumf.sources /etc/chrony/sources.d/
systemctl disable systemd-timesyncd.service
systemctl stop systemd-timesyncd.service
systemctl disable ntp
systemctl stop ntp
systemctl enable chrony
systemctl restart chrony
chronyc sources
chronyc tracking
NOTE1: if time1, time2, time3 are already listed in /etc/chrony/chrony.conf, please remove them and restart chrony.
NOTE2: if time1, time2, time3 are not listed by "chronyc sources" or are not selected in "chronyc tracking", check that /etc/chrony/chrony.conf contains "sourcedir /etc/chrony/sources.d". Old versions of this file may not have it.
NOTE3: read https://chrony-project.org/faq.html#_should_i_prefer_chrony_over_timesyncd_if_i_do_not_need_to_run_a_server
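NOTE2 can be checked and fixed by a script. A sketch, assuming that triumf.sources holds the same three "server timeN.triumf.ca iburst" lines that are commented out in the install commands above (the copy in git/scripts is authoritative); both helper names are hypothetical. With chrony 4.0 and later, "chronyc reload sources" re-reads sources.d without a restart.

```shell
# write_triumf_sources DIR: write the TRIUMF NTP servers into DIR/triumf.sources.
# Assumption: mirrors the commented-out "server timeN.triumf.ca iburst" lines above.
write_triumf_sources() {
    local dir="${1:-/etc/chrony/sources.d}"
    mkdir -p "$dir" || return 1
    printf 'server %s iburst\n' time1.triumf.ca time2.triumf.ca time3.triumf.ca \
        > "$dir/triumf.sources"
}

# ensure_sourcedir CONF [DIR]: append "sourcedir DIR" to chrony.conf if it is missing
# (old versions of chrony.conf do not have it, see NOTE2).
ensure_sourcedir() {
    local conf="${1:-/etc/chrony/chrony.conf}"
    local dir="${2:-/etc/chrony/sources.d}"
    if ! grep -Eq "^[[:space:]]*sourcedir[[:space:]]+$dir[[:space:]]*$" "$conf"; then
        echo "sourcedir $dir" >> "$conf"
    fi
}

# Usage (as root):
#   write_triumf_sources && ensure_sourcedir
#   systemctl restart chrony    # or: chronyc reload sources
#   chronyc sources             # timeN.triumf.ca should be listed, one marked "*"
```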
re-enable systemd-timesyncd
ONLY IF CHRONY DOES NOT WORK
apt remove chrony
systemctl enable systemd-timesyncd.service
systemctl restart systemd-timesyncd.service
systemctl status systemd-timesyncd.service
timedatectl status
timedatectl timesync-status
enable outgoing email (debian 11)
this is different from ubuntu 20. it uses /etc/mailname and it hardwires the hostname into main.cf.
enable outgoing email
- TRIUMF: use smtp.triumf.ca
- CERN: use cernmx.cern.ch
apt install postfix ### select "satellite system", enter full hostname "xxx.triumf.ca", enter "smtp.triumf.ca"
apt install mailutils
dpkg-reconfigure postfix ### (if postfix already installed)
echo olchansk@triumf.ca lindner@triumf.ca bsmith@triumf.ca >> ~root/.forward
mailx root
test
^D
enable ping for all users (debian 11)
Without this tweak, Debian will report "operation not permitted" if a user tries to ping somewhere.
echo 'net.ipv4.ping_group_range = 0 1000' > /etc/sysctl.d/99-ping.conf
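The two numbers in ping_group_range are an inclusive range of group IDs that may send ICMP echo without privileges, so a user whose groups all lie outside "0 1000" (for example an NIS group with a larger GID) will still get "operation not permitted". A sketch of a check, with a hypothetical helper name; "sysctl --system" (as root) loads the new file without a reboot.

```shell
# can_ping_unprivileged [RANGE_FILE]: succeed if any group of the current user lies
# inside the inclusive GID range stored in RANGE_FILE (default: the live kernel value).
# Hypothetical helper for checking the net.ipv4.ping_group_range setting above.
can_ping_unprivileged() {
    local range_file="${1:-/proc/sys/net/ipv4/ping_group_range}"
    local lo hi gid
    read -r lo hi < "$range_file" || return 1
    for gid in $(id -G); do
        if [ "$gid" -ge "$lo" ] && [ "$gid" -le "$hi" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
#   sysctl --system     # (as root) load /etc/sysctl.d/99-ping.conf now
#   can_ping_unprivileged && echo "ping allowed" || echo "ping not allowed"
```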
install missing packages
(apt eats terminal input; even the "yes |" trick does not quite work. Repeat the following commands until they report that everything is installed.)
yes | apt -y install ssh tcsh ethtool ncat rsync strace net-tools sysstat smartmontools lm-sensors traceroute time minicom screen git lsof debsums tmux
yes | apt -y install lsb-release
yes | apt -y install flex bison
yes | apt -y install neofetch
yes | apt -y install snmp snmp-mibs-downloader
yes | apt -y install git subversion g++ gfortran cmake doxygen
yes | apt -y install curl libcurl4 libcurl4-openssl-dev
yes | apt -y install mariadb-client ### mysql client
yes | apt -y install libz-dev sqlite3 libsqlite3-dev unixodbc-dev
yes | apt -y install libssl-dev
yes | apt -y install emacs xemacs21 joe
yes | apt -y install gnuplot dos2unix
yes | apt -y install mutt bsd-mailx # email clients
yes | apt -y install liblz4-tool pbzip2
yes | apt -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt -y install libreadline-dev
yes | apt -y install ubuntu-mate-themes
yes | apt -y install libmotif-dev libxmu-dev
yes | apt -y install libusb-dev libusb-1.0-0-dev
yes | apt -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt -y install libjson-perl
yes | apt -y install libgsl-dev # additional GNU Scientific Library
yes | apt -y install qt5-default # Qt development
yes | apt -y install python3-full python3-dev python3-dbg python3-pip ### for pyROOT
yes | apt -y install imagemagick imagemagick-common ckeditor # for elog
yes | apt -y install libjpeg-dev libjpeg-progs libjpeg-tools
yes | apt -y install linux-tools-common linux-tools-generic # cpupower frequency-info
yes | apt -y install rdesktop remmina remmina-plugin"*" # requested by POL
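The most likely reason apt "eats" pasted input is that apt, or a package script it runs, reads the rest of the paste from the terminal. A sketch that avoids this by running apt-get with stdin redirected from /dev/null and non-interactive debconf, and that reports which packages are still missing. The helper names are hypothetical, and wildcard arguments such as remmina-plugin"*" cannot be checked by missing_pkgs.

```shell
# missing_pkgs PKG...: print the packages that dpkg does not report as installed.
missing_pkgs() {
    local p
    for p in "$@"; do
        if [ "$(dpkg-query -W -f='${Status}' "$p" 2>/dev/null)" != "install ok installed" ]; then
            echo "$p"
        fi
    done
}

# install_pkgs PKG...: install only the missing packages, never reading the terminal.
install_pkgs() {
    local missing
    missing=$(missing_pkgs "$@")
    [ -z "$missing" ] && return 0
    # $missing is unquoted on purpose: one word per package name.
    DEBIAN_FRONTEND=noninteractive apt-get -y install $missing < /dev/null
}

# Usage (as root), safe to paste together with other commands:
#   install_pkgs tcsh ethtool ncat rsync strace net-tools sysstat
#   missing_pkgs tcsh ethtool ncat rsync strace net-tools sysstat   # prints nothing when done
```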
Ubuntu LTS 20.04:
yes | apt -y install linux-image-generic-hwe-20.04 linux-tools-virtual-hwe-20.04 # enable linux 5.11 series kernel
Ubuntu LTS 22.04:
apt -y install linux-generic-hwe-22.04 # enable linux 6.2.0 series kernel
install git/scripts
mkdir ~root/git
cd ~root/git
git clone https://daq00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull
disable swap (debian 11)
- on 64 GB RAM machines swap is not useful
- on machines booted from network (NFS-ROOT), swap does not work
- on machines running from flash (RPi, etc), flash is too slow for useful swap
- swap configured by linux installers invariably has wrong size and is not useful
systemctl disable dphys-swapfile
systemctl stop dphys-swapfile
dphys-swapfile uninstall
configure DNS
cd ~/git/scripts
git pull
mkdir /etc/systemd/resolved.conf.d
cp etc/resolved-triumf.conf /etc/systemd/resolved.conf.d/
systemctl restart systemd-resolved
resolvectl
#systemd-analyze cat-config systemd/resolved.conf
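The contents of etc/resolved-triumf.conf live in git/scripts and are not reproduced here. As a sketch only, assuming that file sets the DNS server and search domain listed in the installer notes (DNS 142.90.100.19, search path triumf.ca), an equivalent drop-in could be written like this; the helper name is hypothetical and the git/scripts copy remains the reference.

```shell
# write_resolved_dropin DIR: write a systemd-resolved drop-in with the TRIUMF DNS
# server and search domain. Assumption: values taken from the network settings in
# the installer notes, not from the actual resolved-triumf.conf.
write_resolved_dropin() {
    local dir="${1:-/etc/systemd/resolved.conf.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/resolved-triumf.conf" <<'EOF'
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
EOF
}

# Usage (as root):
#   write_resolved_dropin
#   systemctl restart systemd-resolved
#   resolvectl status    # "DNS Servers" and "DNS Domain" should show the values above
```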
install ganglia
yes | apt-get -y install ganglia-monitor
systemctl enable ganglia-monitor
cd ~root/git/scripts
git pull
cp etc/gmond-ubuntu.conf /etc/ganglia/gmond.conf
ln -s ganglia/gmond.conf /etc
# ln -s arm-linux-gnueabihf/ganglia /usr/lib/ganglia ### fix path for ARM CPU
# ln -s i386-linux-gnu/ganglia /usr/lib/ganglia ### fix path for 32-bit Intel CPU
systemctl restart ganglia-monitor
systemctl status ganglia-monitor
ps -efw | grep gmond
cd ~root/git/scripts/ganglia
make install
./ganglia-all.perl
install gonodeinfo
- go to https://bitbucket.org/dd1/gonodeinfo follow instructions:
yes | apt-get -y install golang
mkdir ~/git
cd ~/git
git clone https://bitbucket.org/dd1/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
- edit /etc/gonodeinfo.conf
- change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
- change "Servers" to read: Servers: daq00.triumf.ca:8601
- run "gonodeinfo -v"
- if the error is "connection refused", go to the gonodeinfo server to add this client to the access control list:
- on the gonodeinfo server: run /opt/gonodeinfo/gonodereceive.exe -a daq13
- try gonodeinfo again, there should be no error
- on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
install fonts for EPICS
- apt install xfonts-100dpi xfonts-75dpi
- restart Xorg (i.e. "killall Xorg", this will log you out from the console)
- xlsfonts | grep -i helvetica ### should show fonts with different sizes, not just size 0 (scalable)
install libz.so.1 for CentOS compatibility
KO - confirm which versions of Quartus need this.
yes | apt-get -y install zlib1g
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386
install libpng12.so.0 for Quartus compatibility
(does not work anymore!!!)
wget http://ftp.ca.debian.org/debian/pool/main/libp/libpng/libpng12-0_1.2.50-2+deb8u2_amd64.deb
dpkg --install libpng12-0_1.2.50-2+deb8u2_amd64.deb
install libpng12.so.0 for Quartus 13.0sp1
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0
wget https://daq00.triumf.ca/~olchansk/linux/libpng12.so.0.50.0
/bin/cp -pv libpng12.so.0 libpng12.so.0.50.0 /lib/x86_64-linux-gnu/
install packages for Xilinx
ubuntu LTS 22.04 vivado 2020.1
apt install autoconf libtool
apt install libtinfo5
apt install texinfo
apt install zlib1g:i386
install packages for building ROOT
apt -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake
install 32-bit libraries for PHYSICA
these instructions are for running 32-bit physica executable built for SL6 on ubuntu LTS 20.04
install physica sources (cannot build, do not have g77)
cd ~/packages
git clone https://bitbucket.org/ttriumfdaq/physica.git
install 32-bit libraries using ubuntu package manager:
apt install lib32z1 # libz.so
copy 32-bit SL6 shared libraries to /lib32
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libX11.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libgd.so.2 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libpng12.so.0 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libreadline.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libncurses.so.5 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libg2c.so.0 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libxcb.so.1 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXpm.so.4 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libjpeg.so.62 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfontconfig.so.1 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libfreetype.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libtinfo.so.5 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libXau.so.6 /lib32/
root@trinatdaq:~# cp /daq/daqstore/olchansk/daq/physica-SL6/libexpat.so.1 /lib32/
ldd should report:
trinatdaq:trinat> ldd /usr/local/physica/physica.exe
	linux-gate.so.1 (0xf7fa2000)
	libX11.so.6 => /lib32/libX11.so.6 (0xf7e43000)
	libgd.so.2 => /lib32/libgd.so.2 (0xf7dfe000)
	libpng12.so.0 => /lib32/libpng12.so.0 (0xf7dd6000)
	libz.so.1 => /lib32/libz.so.1 (0xf7db8000)
	libreadline.so.6 => /lib32/libreadline.so.6 (0xf7d7e000)
	libncurses.so.5 => /lib32/libncurses.so.5 (0xf7d5b000)
	libg2c.so.0 => /lib32/libg2c.so.0 (0xf7d3d000)
	libm.so.6 => /lib32/libm.so.6 (0xf7c39000)
	libgcc_s.so.1 => /lib32/libgcc_s.so.1 (0xf7c1a000)
	libc.so.6 => /lib32/libc.so.6 (0xf7a2f000)
	libxcb.so.1 => /lib32/libxcb.so.1 (0xf7a05000)
	libdl.so.2 => /lib32/libdl.so.2 (0xf79ff000)
	libXpm.so.4 => /lib32/libXpm.so.4 (0xf79ee000)
	libjpeg.so.62 => /lib32/libjpeg.so.62 (0xf7997000)
	libfontconfig.so.1 => /lib32/libfontconfig.so.1 (0xf7962000)
	libfreetype.so.6 => /lib32/libfreetype.so.6 (0xf78c9000)
	libtinfo.so.5 => /lib32/libtinfo.so.5 (0xf78b0000)
	/lib/ld-linux.so.2 (0xf7fa4000)
	libXau.so.6 => /lib32/libXau.so.6 (0xf78ad000)
	libexpat.so.1 => /lib32/libexpat.so.1 (0xf7885000)
trinatdaq:trinat>
set login environment:
setenv TRIUMF_FONTS $HOME/packages/physica/fonts
setenv PHYSICA_DIR $HOME/packages/physica
alias physica $PHYSICA_DIR/physica-SL6-32
test:
cd ~/packages/physica
physica @rangauss.pcm
install lightdm
unlike the default gdm login manager, lightdm shows the machine hostname and does not require an extra mouse click to switch from the screen saver to login mode.
apt -y install lightdm # select lightdm
install desktop environments
note: default display manager and default desktop are deficient, please do not skip this step.
note: if apt asks to choose the display manager, select "lightdm"
note: KO - I recommend the "MATE" desktop.
note: you will have to cut-and-paste this several times because "apt" eats commands, even with "-y" and even piped from "yes".
# install MATE desktop
DEBIAN_FRONTEND=noninteractive apt -y install ubuntu-mate-core ubuntu-mate-desktop ubuntu-mate-themes
# install Cinnamon desktop
DEBIAN_FRONTEND=noninteractive apt -y install cinnamon
# install KDE desktop
DEBIAN_FRONTEND=noninteractive apt -y install kubuntu-desktop
# install Lxqt desktop
DEBIAN_FRONTEND=noninteractive apt -y install lxqt
# install Xfce4 desktop
DEBIAN_FRONTEND=noninteractive apt -y install xfce4
install ROOT
Please install ROOT per instructions at http://root.cern.ch.
NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA.
### DO NOT DO THIS! apt-get install root-system
NOTE2: as of 2017-Jan-09, ROOT binary kits for Ubuntu do not work (use GCC 5 instead of GCC6), build from source instead.
Install x2go
KO - is this still needed? does it cause any security problems?
x2go instructions, thanks to Art O.
add-apt-repository ppa:x2go/stable
apt-get update
apt-get install x2goserver x2goserver-xsession
enable root login from ladd00/daq00
ssh localhost
CTRL-C
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/
install smart-status
ln -s ~/git/scripts/smart-status/smart-status.perl ~root/
enable grub boot menu
This will enable the grub menu (with a 10 sec timeout) and replace the black screen with exciting linux boot messages.
- emacs -nw /etc/default/grub
GRUB_DEFAULT=0
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
GRUB_CMDLINE_LINUX_DEFAULT=""
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
- update grub config:
grub-mkconfig -o /boot/grub/grub.cfg
reboot
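The same GRUB settings can be applied with sed instead of an editor. A sketch, assuming the stock /etc/default/grub where GRUB_TIMEOUT_STYLE, GRUB_TIMEOUT and GRUB_CMDLINE_LINUX_DEFAULT are already defined on their own lines; the helper name grub_show_menu is hypothetical. On Ubuntu, "update-grub" runs the same "grub-mkconfig -o /boot/grub/grub.cfg".

```shell
# grub_show_menu FILE: show the GRUB menu for 10 seconds and show boot messages
# instead of the splash screen, matching the settings listed above.
grub_show_menu() {
    local f="${1:-/etc/default/grub}"
    sed -i \
        -e 's/^GRUB_TIMEOUT_STYLE=/#&/' \
        -e 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=10/' \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT=.*/GRUB_CMDLINE_LINUX_DEFAULT=""/' \
        "$f"
}

# Usage (as root):
#   grub_show_menu /etc/default/grub
#   grub-mkconfig -o /boot/grub/grub.cfg
#   reboot
```

Running it twice is harmless: the commented-out GRUB_TIMEOUT_STYLE line no longer matches.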
this completes installation of the base system.
following sections modify basic ubuntu to fix known problems and to enable special stuff.
Enable automatic updates
apt install unattended-upgrades
cd ~/git/scripts
git pull
/bin/cp -v etc/99apt-conf-ko /etc/apt/apt.conf.d/
apt-config dump | grep Unattended
Following is obsolete:
- emacs -nw /etc/apt/apt.conf.d/50unattended-upgrades
- uncomment in Allowed-Origins "-security" and "-updates"
- add in Allowed-Origins: "Google LLC:stable";
- uncomment/add: "Unattended-Upgrade::Mail "root";
- emacs -nw /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
- test: unattended-upgrade --dry-run -v
NOTE: update-on-shutdown is disabled.
NOTE: there is no update-on-boot, but:
NOTE: if machine was off for a long time, the systemd update timer would have expired and it will fire soon after reboot, causing an automatic update run. this is unwanted, and there is no fix or workaround for it. K.O. June-2023.
Fix bpool is full
THIS IS CAUSED BY OBSOLETE PACKAGE zsys. PLEASE: apt remove zsys
!!! only if ROOT on ZFS !!!
There is an error in the zsys package that causes bpool to run out of space, see #Ubuntu zsys for more details.
To fix:
cd ~/git/scripts
git pull
cp etc/zsys.conf /etc/
zsysctl service reload
zsysctl service gc
zpool list bpool
zfs list bpool
df /boot
IPMI instructions
IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.
apt-get install ipmitool
systemctl enable ipmievd
systemctl restart ipmievd
Run:
- ipmitool sel list ### event list
- ipmitool sel elist ### extended event list (includes sensor names)
- ipmitool sel clear ### clear event list (if it becomes full)
- ipmitool sensor ### report hardware sensors
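"ipmitool sensor" prints one sensor per line with "|"-separated columns, the fourth being the status ("ok" for a good reading, "na" for absent sensors, a threshold code otherwise). A sketch of a filter that shows only the sensors that need attention; the helper name is hypothetical, and because the exact status codes vary between ipmitool versions it prints anything that is not "ok" or "na", skipping discrete sensors that report a hex state such as 0x0100.

```shell
# ipmi_problems: read "ipmitool sensor" output on stdin and print the sensors whose
# status column (field 4) is neither "ok" nor "na". Discrete sensors, which report a
# hex state in that column, are skipped.
ipmi_problems() {
    awk -F'|' '
        NF < 4 { next }
        {
            status = $4
            gsub(/[ \t]/, "", status)   # strip column padding
            if (status != "ok" && status != "na" && status !~ /^0x/) print
        }'
}

# Usage:
#   ipmitool sensor | ipmi_problems
```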
move /home/wheel
note: this MUST be done if ZFS root and NIS/autofs with /home.
Default location of wheel's home directory will collide with autofs /home, it has to be moved, for example to /wheel.
# logout from the wheel user
# go to another computer
ssh root@daqubuntuxxx
zfs list | grep wheel ### identify zfs name wheel_xxxxxx
zfs set mountpoint=/wheel rpool/USERDATA/wheel_hm8fzh
emacs -nw /etc/passwd ### change wheel's home directory from /home/wheel to /wheel
su - wheel ### check that user wheel still works
This will break wheel's ability to run snap programs, such as firefox; install google-chrome instead, as described below.
enable NIS (ubuntu 22.04, debian 11)
apt -y install rpcbind nis
echo DAQ-NIS >> /etc/defaultdomain
echo ypserver daq00.triumf.ca >> /etc/yp.conf
systemctl enable ypbind.service
systemctl restart ypbind.service
systemctl status ypbind.service
ypwhich -m
enable ypserv:
sed -i s/NISSERVER=false/NISSERVER=slave/ /etc/default/nis
/usr/lib/yp/ypinit -s daq00
echo ypserver localhost >> /etc/yp.conf
sed -i "s/ypserver .*/ypserver localhost/" /etc/yp.conf
systemctl enable ypserv
systemctl restart ypserv
systemctl restart ypbind
edit /etc/nsswitch.conf to read:
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount: files nis
netgroup: files nis
# end get data from nis
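The nsswitch.conf edit can be scripted. A sketch with a hypothetical helper name: it rewrites the passwd, group, shadow, automount and netgroup lines to "files nis" and appends any that are missing, keeping a backup. Note that it replaces any other sources listed on those lines, which matches the listing above but is a choice worth checking on each machine.

```shell
# nsswitch_use_nis FILE: make passwd, group, shadow, automount and netgroup read
# "files nis", as listed above. Keeps FILE.bak; appends lines that are missing.
nsswitch_use_nis() {
    local f="${1:-/etc/nsswitch.conf}"
    cp -p "$f" "$f.bak" || return 1
    awk '
        BEGIN { n = split("passwd group shadow automount netgroup", db, " ") }
        {
            for (i = 1; i <= n; i++)
                if ($1 == db[i] ":") { print db[i] ": files nis"; seen[db[i]] = 1; next }
            print
        }
        END { for (i = 1; i <= n; i++) if (!(db[i] in seen)) print db[i] ": files nis" }
    ' "$f.bak" > "$f"
}

# Usage (as root):
#   nsswitch_use_nis /etc/nsswitch.conf
#   getent passwd olchansk    # should now return the NIS entry
```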
enable hourly update of nis maps:
mkdir ~root/git
cd ~root/git
git clone http://daq00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
If this is a new machine, then on the master NIS node (daq00), add this new node to /etc/netgroup, and update NIS maps (cd /var/yp; make)
enable NIS (ubuntu 20.04)
- apt-get -y install portmap nis ### will ask for NIS domain (DAQ-NIS)
- dpkg-reconfigure nis ### reconfigure if already installed
- ypwhich -m
- edit /etc/default/nis
- set "NISSERVER=slave"
- Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank, remove "-no-dbus" if it is there
- #edit /etc/yp.conf, comment-out everything, add "domain DAQ-NIS server localhost"
- edit /etc/yp.conf, comment-out everything, add "ypserver localhost"
- /usr/lib/yp/ypinit -s daq00
- systemctl enable nis
- systemctl restart nis
- ypwhich
- ypwhich -m
- ypcat -k passwd
- vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount: files nis
netgroup: files nis
# end get data from nis
- enable hourly update of NIS maps
mkdir ~root/git
cd ~root/git
git clone http://ladd00.triumf.ca/~olchansk/git/scripts.git
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
- ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"
enable autofs
apt -y install autofs
systemctl enable autofs
systemctl restart autofs
ls -l /home/olchansk ### test autofs, check file owner is correct
enable NFS server
apt install nfs-kernel-server
#edit /etc/exports
systemctl enable nfs-server
systemctl restart nfs-server
NIS master
notes for setting up the NIS master
wheel user
"wheel" is the default administrative user. We do not want its password exported to NIS (the encrypted password hash is world visible) and we do not want its home directory exported to NFS (~wheel/.ssh is world visible and potentially writable: anybody can change ~wheel/.ssh/authorized_keys).
- move wheel's home directory from /home/wheel to /wheel (see special section about this)
- change wheel's UID and GID from 1000 to a value below MINUID in /var/yp/Makefile
coherent uids
we do not want system accounts defined in /etc/passwd of the NIS master to be included in the NIS map "passwd". this causes trouble on NIS clients where newly installed packages fail to create local system users because same user already exists in NIS.
This is controlled by MINUID in /var/yp/Makefile.
Historical TRIUMF uids start from around 200, but several clusters do not have any historic TRIUMF uids below 500 and MINUID is set to:
- DAQ-NIS: MINUID=200
- ISAC-NIS: MINUID=500
- TITAN-NIS: MINUID=500
- MUSR-NIS: MINUID=500
- TIG-NIS: MINUID=500 (100 on SL6 mother8pi)
Ubuntu 20 has two programs to create users:
- adduser - creates new users with UID 1000 and up as specified in /etc/adduser.conf. No problems here.
- adduser --system - creates new system users with UID 100 and up as specified in /etc/adduser.conf. No problems here.
- useradd - creates new users with UID 1000 and up as specified in /etc/login.defs. No problems here.
- useradd --system - creates new system users with UID 999 and down (read "man useradd", section at the end about SYS_UID_MAX). This collides with NIS MINUID, these system users will be included in the NIS map and cause trouble.
This problem cannot be fixed, SYS_UID_MIN, SYS_UID_MAX and UID_MIN in /etc/login.defs do not seem to have any effect on UIDs chosen by "useradd --system". (tested on Ubuntu LTS 20.04).
So far only these system accounts seem to be affected by this:
- systemd-coredump
- ganglia
To fix:
- run "sort -r -n -t: -k3 /etc/passwd" to find an unused system user uid (range 100..200)
- run "sort -r -n -t: -k3 /etc/group" to find an unused system group gid (range 100..200)
- systemd-coredump: manually change UID and GID (package systemd-coredump is usually not installed)
- ganglia: same thing, then change ownership on all ganglia files.
Also read systemd author's opinion on system vs user UIDs: https://github.com/systemd/systemd/issues/4850#issuecomment-265698275
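The checks above can be combined into a small report. A sketch with hypothetical helper names: nis_exported_system_ids lists entries of a passwd or group file whose ID lies in the window [MINUID, 1000), i.e. local system accounts (such as those created by "useradd --system") that the NIS Makefile would export, and free_ids lists unused IDs in a range such as 100..200.

```shell
# nis_exported_system_ids MINUID FILE: print name:id for entries of a passwd or
# group file (ID in field 3) with MINUID <= id < 1000.
nis_exported_system_ids() {
    awk -F: -v min="$1" '$3 >= min && $3 < 1000 { print $1 ":" $3 }' "${2:-/etc/passwd}"
}

# free_ids LO HI FILE: print the IDs in LO..HI that are not used in field 3 of FILE.
free_ids() {
    awk -F: -v lo="$1" -v hi="$2" '
        { used[$3] = 1 }
        END { for (i = lo; i <= hi; i++) if (!(i in used)) print i }' "${3:-/etc/passwd}"
}

# Usage on a DAQ-NIS machine (MINUID=200):
#   nis_exported_system_ids 200 /etc/passwd   # e.g. ganglia, systemd-coredump
#   nis_exported_system_ids 200 /etc/group
#   free_ids 100 200 /etc/passwd | head -n 1  # candidate new uid
```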
Fix systemd-logind NIS breakage
!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!
there is a delay in ssh logins for normal users. "ssh -v" shows the delay is after "pledge...". this fix removes the delay.
systemd developers think that we should not use NIS and made sure there are problems if we do. To give them credit, they do offer a workaround. Read this: https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67
mkdir /etc/systemd/system/systemd-logind.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-logind.service
Fix systemd-udevd NIS breakage
Same problem as above, with udev getting stuck (Ubuntu LTS 20.04).
mkdir /etc/systemd/system/systemd-udevd.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-udevd.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-udevd.service
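Both overrides follow the same pattern, so a small helper can create either one. A sketch; the function name and the optional directory argument (useful for testing) are hypothetical. An empty "IPAddressDeny=" assignment in a drop-in resets the list set by the unit file. "systemctl edit UNIT" is the interactive alternative and writes UNIT.d/override.conf in the same directory.

```shell
# clear_ipaddressdeny UNIT [DIR]: write DIR/UNIT.d/local.conf that resets the unit's
# IPAddressDeny= list, as in the commands above.
clear_ipaddressdeny() {
    local unit="$1" dir="${2:-/etc/systemd/system}"
    mkdir -p "$dir/$unit.d" || return 1
    printf '[Service]\nIPAddressDeny=\n' > "$dir/$unit.d/local.conf"
}

# Usage (as root):
#   clear_ipaddressdeny systemd-logind.service
#   clear_ipaddressdeny systemd-udevd.service
#   systemctl daemon-reload
#   systemctl cat systemd-udevd.service   # local.conf should be shown at the end
```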
Configure USB device permissions
Configure USB device permissions for user access to USB-serial devices, Altera USB Blaster, etc.
- create file /etc/udev/rules.d/99-usb-chmod.rules with this contents:
emacs -nw /etc/udev/rules.d/99-usb-chmod.rules
ACTION=="add", SUBSYSTEM=="usbmisc", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /dev/%c"
ACTION=="add", SUBSYSTEM=="usb_device", RUN+="/bin/chmod a+wr /proc/%c"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVTYPE}=="usb_device", RUN+="/bin/chmod a+wr $env{DEVICE}"
ACTION=="add", ENV{PHYSDEVBUS}=="usb-serial", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", ENV{DEVPATH}=="/class/tty/ttyS*", RUN+="/bin/chmod a+wr $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyUSB*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyACM*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", SUBSYSTEM=="tty", DEVPATH=="*ttyS*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
ACTION=="add", DEVPATH=="*video*", RUN+="/bin/chmod a+rw $env{DEVNAME}"
- reload udev rules: udevadm control --reload-rules
- apply new permissions: udevadm trigger --action=add
- watch udev activity: udevadm monitor -p
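An alternative sketch, assuming udev's standard MODE= assignment is acceptable here: instead of running chmod from RUN+=, the rules can set the device node mode directly. It covers only the USB-serial (ttyUSB, ttyACM) and raw USB device cases from the rules above; the file name 99-usb-mode.rules and the helper name are hypothetical, and like the original rules it makes these devices world-writable.

```shell
# write_usb_mode_rules FILE: write udev rules that make USB-serial ports and raw USB
# device nodes world read/write via MODE= instead of RUN+="/bin/chmod ...".
write_usb_mode_rules() {
    local f="${1:-/etc/udev/rules.d/99-usb-mode.rules}"
    cat > "$f" <<'EOF'
# USB-serial adapters and CDC-ACM devices
SUBSYSTEM=="tty", KERNEL=="ttyUSB[0-9]*", MODE="0666"
SUBSYSTEM=="tty", KERNEL=="ttyACM[0-9]*", MODE="0666"
# raw USB device nodes (/dev/bus/usb/...), e.g. USB Blaster
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", MODE="0666"
EOF
}

# Usage (as root):
#   write_usb_mode_rules
#   udevadm control --reload-rules
#   udevadm trigger --action=add
#   ls -l /dev/ttyUSB0     # should show crw-rw-rw-
```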
Configure lightdm display manager
- enable it
echo lightdm | dpkg-reconfigure -fteletype lightdm
systemctl disable gdm
systemctl disable sddm
systemctl enable lightdm
- make the MATE desktop as default
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
- enable login by NIS users
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
- restart lightdm
systemctl stop gdm
systemctl restart lightdm
Install libpng12.so.0
Quartus 16 needs libpng12:
wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb
Install google-chrome
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
dpkg -i google-chrome-stable_current_amd64.deb
confirm autoupdate is enabled, observe dl.google.com is present in the list of repositories:
apt update
...
Get:5 https://dl.google.com/linux/chrome/deb stable/main amd64 Packages [1,094 B]
...
FOLLOWING IS OBSOLETE:
Instructions from here: https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-tmp.list'
apt update
apt install google-chrome-stable
/bin/rm -f /etc/apt/sources.list.d/google-tmp.list
Install amanda client
ONLY ON MACHINES THAT HOST HOME DIRECTORIES
- apt install amanda-client
- edit /etc/amandahosts
amanda.triumf.ca amanda amdump
- check permissions on /etc/amandahosts:
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
- fix if needed: chown backup:backup /etc/amandahosts; chmod a= /etc/amandahosts; chmod u=wr /etc/amandahosts
- edit /etc/amanda-security.conf, add this line:
runtar:gnutar_path=/usr/bin/tar
On the amanda machine:
- in amanda disklist, use dump type "bsdtcp-comp-user-tar"
- su - amanda and run amcheck -c daily daq00
-bash-4.1$ amcheck -c daily daq00
Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.092 seconds.  0 problems found.

(brought to you by Amanda 3.3.7p1.git.685ff76d)
Enable rc.local
For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:
cd ~/git/scripts
git pull
cp -n -v etc/rc.local /etc/
chmod a+rx /etc/rc.local
cp etc/rc-local.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable rc-local
systemctl start rc-local
systemctl status rc-local
Remove unwanted packages
apt remove zsys
apt remove sddm
Disable unwanted services
systemctl disable mpd
systemctl disable snapd
systemctl disable ModemManager
systemctl --global mask tracker-extract-3.service
systemctl --global mask tracker-miner-fs-3.service
systemctl daemon-reload
Disable sleep and suspend
note: we see some computers randomly shut down or go to sleep; log files indicate that the "sleep" or "suspend" button was pushed by a user, but no such buttons actually exist. This is the fix:
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service
Enable crontab @reboot for MIDAS
startup scripts have a bug - cron @reboot entries for normal users can run before autofs is ready, so if the home directory is on autofs/NFS, it cannot be accessed and the cron job fails. If MIDAS is supposed to be started by cron @reboot, it will not start (there *will* be an error message in /var/log/cron).
mkdir /etc/systemd/system/cron.service.d
echo -e "[Unit]\nAfter=ypbind.service autofs.service\n" > /etc/systemd/system/cron.service.d/local.conf
systemctl daemon-reload
systemctl cat cron.service
Explore the systemd dependency tree using "systemctl list-dependencies" maybe with "--all".
Visualize the exact boot sequence from previous boot: "systemd-analyze plot > xxx.svg", look at the svg file using a web browser.
Crontab entry to start midas: (install in the midas user crontab, not root crontab)
su - midasuser
crontab -l
#@reboot /bin/bash -l -c "/home/trinat/bin/start-daq-applications"
#@reboot /bin/tcsh -c "/home/trinat/bin/start-daq-applications"
Install apache httpd proxy for midas and elog
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.
First, configure apache2:
- execute these commands:
apt -y install apache2
cd /etc/apache2
- create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
SSLSessionCache shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout 300
SSLRandomSeed startup file:/dev/urandom 256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
- create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
<IfModule mod_ssl.c>
<VirtualHost *:443>
  ServerName daq14.triumf.ca
  DocumentRoot /var/www/html
  ErrorLog /var/log/apache2/daq14.log
  SSLEngine on
  # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
  SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
  SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
  ## use port specified in elogd.cfg
  #ProxyPass /elog/ http://localhost:8082/ retry=1
  ## use mhttpd port
  #ProxyPass / http://localhost:8080/ retry=1
  Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
  <Location />
    SSLRequireSSL
    AuthType Basic
    AuthName "DAQ password protected site"
    Require valid-user
    # create password file: touch /etc/apache2/htpasswd
    # to add new user or change password: htpasswd /etc/apache2/htpasswd username
    AuthUserFile /etc/apache2/htpasswd
  </Location>
</VirtualHost>
</IfModule>
- stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
- enable ssl module
- enable new configurations
a2enmod ssl
a2enmod headers
a2enmod proxy
a2enmod proxy_http
a2enconf ssl-daq14
a2ensite daq14-ssl
- disable default ssl sites
a2dissite 000-default-le-ssl
a2dissite 000-default
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
- check that there are no syntax problems
apache2ctl configtest
- enable and start apache2:
systemctl enable apache2
systemctl restart apache2
systemctl status apache2
- apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
- if it says "Failed to configure ... certificate", proceed to the step for setting certbot.
- try to access https://daq14.triumf.ca
- you should see a complaint about self-signed certificate
- you should see a request for password (do not login yet)
- if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.
Second, configure certbot:
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
- check that port 80 is not used by anything:
- netstat -an | grep LISTEN | grep ^tcp | grep 80
- lsof -P | grep -i tcp | grep LISTEN | grep 80
- if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment out "Listen 80" in /etc/apache2/ports.conf)
- install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
apt install certbot python3-certbot-apache
certbot certonly --standalone --installer apache
- then answer questions:
- "activate HTTPS for daq14.triumf.ca" - say ok
- "enter email address" - enter your own email address
- "please read terms..." - read the terms and say "agree"
- it will take a few moments...
- "congratulations..." - say ok.
certbot install --apache --cert-name daq14.triumf.ca
- then answer questions:
- "choose redirect..." - say "1" (no redirect)
- look inside /etc/apache2/sites-enabled/daq14-ssl.conf to check that SSLCertificateFile and related directives point to the certbot certificates in /etc/letsencrypt/live/daq14.triumf.ca/
- to test renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this (--force-renewal renews the certificate even if it is not yet due):
certbot renew --standalone --installer apache --force-renewal
NOTE: this certificate expires after 3 months; automatic renewal should work with the current version of certbot.
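Since these certificates are short-lived, it is worth checking how close the live certificate is to expiry ("certbot certificates" also reports this). A minimal sketch using "openssl x509 -checkend", assuming the certificate is in the certbot live directory named above; cert_expires_within is a hypothetical helper name.

```shell
# cert_expires_within DAYS CERTFILE
# Succeeds (exit 0) if CERTFILE expires within DAYS days.
# openssl's -checkend exits 0 when the certificate is still valid after the
# given number of seconds, so the result is inverted here. An unreadable
# file also counts as "expiring", which errs on the side of a warning.
cert_expires_within() {
    if openssl x509 -checkend $(( $1 * 86400 )) -noout -in "$2" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example: cert_expires_within 14 /etc/letsencrypt/live/daq14.triumf.ca/cert.pem && echo "renewal is overdue, check certbot"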
Third, activate password protection:
- as shown in the config file above, create the password file and the initial user (replace "midas" with the specific username):
touch /etc/apache2/htpasswd
htpasswd /etc/apache2/htpasswd midas
- restart apache2
systemctl restart apache2
systemctl status apache2
From here:
- enable proxy for MIDAS mhttpd - uncomment redirect in the config file above
- enable proxy for ELOG - ditto
a2enmod proxy
a2enmod proxy_http
apache2ctl configtest
systemctl restart apache2
- try accessing MIDAS https://daq14.triumf.ca/ (make sure mhttpd is running)
- if it's not working, check odb setting FIXME!
- try accessing ELog https://daq14.triumf.ca/elog/ (make sure elogd is running)
- if it's not working, check elogd.cfg file and make sure
SSL = 0
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0
Enable elog PDF preview
see https://stackoverflow.com/questions/52998331/imagemagick-security-policy-pdf-blocking-conversion
- xemacs -nw /etc/ImageMagick-6/policy.xml
- remove this section at the end:
<!-- disable ghostscript format types -->
<policy domain="coder" rights="none" pattern="PS" />
<policy domain="coder" rights="none" pattern="PS2" />
<policy domain="coder" rights="none" pattern="PS3" />
<policy domain="coder" rights="none" pattern="EPS" />
<policy domain="coder" rights="none" pattern="PDF" />
<policy domain="coder" rights="none" pattern="XPS" />
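The same policy.xml edit can be done with sed. A minimal sketch, assuming the Ubuntu ImageMagick 6 path and that the ghostscript entries look like the lines above; allow_ghostscript_formats is a hypothetical name. sed keeps a copy of the original file as policy.xml.bak, and the leftover comment line is harmless.

```shell
# allow_ghostscript_formats [POLICY_FILE]
# Delete the <policy domain="coder" rights="none" .../> entries that block
# PS, PS2, PS3, EPS, PDF and XPS; all other policies are kept.
# The original file is saved as POLICY_FILE.bak.
allow_ghostscript_formats() {
    local policy=${1:-/etc/ImageMagick-6/policy.xml}
    sed -i.bak -E '/<policy domain="coder" rights="none" pattern="(PS|PS2|PS3|EPS|PDF|XPS)"[[:space:]]*\/>/d' "$policy"
}
```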
Install Jupyter notebook
From https://jupyter.org/install:
apt install python3-pip
pip install jupyterlab
pip install notebook
~/.local/bin/jupyter notebook
- watch for the http://localhost:8888 URL that it prints
- say "no" to the offer to start firefox (it will not work!)
- the URL is: http://localhost:8888/tree?token=xxx
- from the machine where you are running the web browser (e.g. google-chrome), open a new shell and run (replace trinat@trinatdaq with the username and machine name where you started jupyter):
ssh -v trinat@trinatdaq -L 8888:localhost:8888
- in the web browser, open http://localhost:8888, this gives the login page
- in the password or token entry field, put the token from the "tree?token=xxx" URL above (printed by jupyter on startup)
- push the "login" button
- the jupyter page should open with the list of files in the trinat home directory
- congratulate Brian with full success
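Copying the token by hand is error-prone. The sketch below, with the hypothetical helper name jupyter_token, extracts the first token from any text containing a URL like the "tree?token=xxx" one printed at startup. It assumes the token consists of letters and digits, as in the default hex tokens.

```shell
# jupyter_token
# Read Jupyter output on stdin and print the first token found in a URL
# such as http://localhost:8888/tree?token=xxx
jupyter_token() {
    sed -n 's/.*[?&]token=\([A-Za-z0-9]*\).*/\1/p' | head -n 1
}
```

For example, start jupyter with "~/.local/bin/jupyter notebook 2>&1 | tee ~/jupyter.log" and run "jupyter_token < ~/jupyter.log" in a second shell on the same machine.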
Install ZFS quota report
If there are any ZFS volumes, install script to report disk and quota usage
cd ~/git/scripts/quotareport
git pull
mkdir /var/www/html/zfsquotareport
cp -pv ~/git/scripts/quotareport/sorttable.js /var/www/html/zfsquotareport/
ln -s $PWD/zfsquotareport.perl /etc/cron.daily/zfsquotareport ### no ".perl" in the link name: run-parts skips file names that contain a dot
touch /etc/crontab
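On Debian and Ubuntu, /etc/cron.daily is executed by run-parts, which (without --lsbsysinit) silently skips files whose names contain anything other than ASCII letters, digits, underscores and hyphens, so a link named zfsquotareport.perl would never run. A small check, with the hypothetical helper name valid_run_parts_name; "run-parts --test /etc/cron.daily" lists the scripts that would actually be run.

```shell
# valid_run_parts_name NAME
# Succeeds if run-parts (without --lsbsysinit) would run a file called NAME:
# only ASCII letters, digits, underscores and hyphens are allowed.
valid_run_parts_name() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9_-]+$'
}
```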
If httpd is configured to redirect "/" to MIDAS mhttpd:
- add the following to /etc/apache2/sites-enabled/xxx-ssl.conf in front of "ProxyPass / ...":
## do not proxy zfs quota report directory
ProxyPass /zfsquotareport/ !
- run "systemctl reload apache2"
Install PHP
- apt install php libapache2-mod-php
- systemctl restart apache2
- create /var/www/html/info.php
<?php phpinfo();
Configure TRIUMF printers
systemctl stop cups
systemctl disable cups
echo "ServerName printers.triumf.ca" > /etc/cups/client.conf
lpstat -a
Enable core dumps
By default, Ubuntu LTS 20.04 installs the apport package, which disables core dumps from user applications (google it!). It is not meant to do this, and the documentation claims that apport is not installed and not enabled by default. Oh, well...
apt remove apport
apt autoremove ### will remove apport-symptoms and a few other packages
After this, core dumps are written to file "core" in the current directory. See /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid.
To include the process id in core dump file names, add the following to /etc/rc.local:
echo 1 > /proc/sys/kernel/core_uses_pid
Enable debugger
By default, Ubuntu LTS 20.04 does not permit a debugger to attach to and debug already running programs. To enable this, add the following to /etc/rc.local:
echo 0 > /proc/sys/kernel/yama/ptrace_scope
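The two /etc/rc.local settings only take effect at boot, and systemd runs /etc/rc.local only if the file is executable and starts with an interpreter line such as "#!/bin/sh". A minimal sketch to confirm the values after a reboot (expected: core_uses_pid=1, ptrace_scope=0); show_core_settings is a hypothetical helper. It also prints the shell's core size limit, because no "core" file is written in the current directory while "ulimit -c" is 0.

```shell
# show_core_settings
# Print the kernel settings from the two sections above plus the core size
# limit of the current shell ("n/a" if a /proc file does not exist, e.g.
# when the Yama security module is not enabled).
show_core_settings() {
    local f
    for f in /proc/sys/kernel/core_pattern /proc/sys/kernel/core_uses_pid \
             /proc/sys/kernel/yama/ptrace_scope; do
        if [ -r "$f" ]; then
            printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
        else
            printf '%s=n/a\n' "${f##*/}"
        fi
    done
    # with a limit of 0 no "core" file is written in the current directory
    printf 'ulimit_c=%s\n' "$(ulimit -c)"
}
```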
Disable Ubuntu Pro nag
If "apt upgrade" requests Ubuntu Pro or esm-apps, disable the nag:
/bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf
Update packages
- apt-get update # update package list
- apt-get dist-upgrade # install updated packages and update "kept back" packages
- apt-get autoremove # remove packages that apt thinks should be removed
Finish installation
Congratulations. There is nothing more to do!
- reboot
shutdown -r now
Install ZFS
!!! after installing all the packages, after updating the system, after updating the linux kernel, after rebooting into latest kernel !!!
apt-get install zfsutils-linux
Follow generic ZFS instructions: ZFS
Update to new version of Ubuntu
vi /etc/update-manager/release-upgrades # set "Prompt=normal"
do-release-upgrade
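The Prompt= edit can also be done non-interactively. A minimal sketch, assuming GNU sed; set_release_prompt is a hypothetical name, and the values accepted by update-manager are never, normal and lts.

```shell
# set_release_prompt [FILE] [VALUE]
# Replace the Prompt= line in FILE with Prompt=VALUE and keep FILE.bak.
set_release_prompt() {
    local file=${1:-/etc/update-manager/release-upgrades}
    local value=${2:-normal}
    sed -i.bak -E "s/^Prompt=.*/Prompt=$value/" "$file"
}
```

For example: set_release_prompt /etc/update-manager/release-upgrades normal && do-release-upgrade -c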
Update Ubuntu LTS 20.04 to LTS 22.04:
apt remove zsys
daqubuntu
# reboot to clear out all updates
# vi /etc/update-manager/release-upgrades # set "Prompt=normal"
# do-release-upgrade -c
Checking for a new Ubuntu release
New release '22.04 LTS' available.
Run 'do-release-upgrade' to upgrade to it.
# do-release-upgrade
... say yes...
... login.defs, say "Y" (erase local changes, use packaged version)
/etc/systemd/resolved.conf, say "Y" (same as above)
firefox snap, say yes
unable to reach snap store, say "skip"
/etc/gmond.conf, say "Y"
/var/yp/Makefile, say "install the package maintainer's version"
/etc/ypserv.conf, same thing
/etc/ypserv.securenets, same thing
/etc/default/nis, same thing
/etc/speech-dispatcher/modules/mary-generic.conf, same thing
/etc/apt/apt.conf.d/50unattended-upgrades, same thing
... 278 packages are going to be removed, say yes
... restart required, say yes
... no ping... yes ping...
...
ssh daqubuntu, ok
apt update, fail, DNS does not work, "host security.ubuntu.com" does not resolve.
fix resolver per https://daq00.triumf.ca/DaqWiki/index.php/Ubuntu#Disable_NetworkManager
apt update, apt upgrade now works, 0 packages to update
NIS does not work.
midm9a
login.defs
firefox snap
gmond.conf
ypserv
/etc/default/nis
unattended-upgrades
amanda-security.conf
remove obsolete (no)
reboot
configure dns
reenable nis
daq17
firefox snap
imagemagick policy.xml
gmond.conf
chrony.conf
/var/yp/Makefile
ypserv.conf
ypserv.securenets
/etc/default/nis
50unattended-upgrades
daq00
per https://serverpilot.io/docs/how-to-upgrade-ubuntu-20.04-to-22.04/
do-release-upgrade -f DistUpgradeViewNonInteractive
if it exits "too soon" without doing anything, run it without "-f xxx"; most likely it does not like something about this machine. In the case of daq00 it did not like how the EFI partitions were mounted. After fixing that, the non-interactive upgrade was successful.
isdaq08
- prepare
cd ~/git/scripts
git pull
cd ~
apt -y install debsums
- check for modified config files that make the upgrade unhappy; deal with all files reported by debsums.
root@isdaq08:~# debsums -ce
/etc/ganglia/gmond.conf
/etc/yp.conf
/etc/apt/apt.conf.d/10periodic
root@isdaq08:~#
- restore original /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
- apt remove ganglia-monitor
- apt remove nis
- "debsums -ce" is now empty
Run the upgrade:
- do-release-upgrade -f DistUpgradeViewNonInteractive
Post upgrade:
- configure DNS
- apt -y install linux-generic-hwe-22.04
- /bin/cp -v ~/git/scripts/etc/99apt-conf-ko /etc/apt/apt.conf.d/ # restore nightly updates
- /bin/rm /etc/apt/apt.conf.d/20apt-esm-hook.conf # remove the ubuntu-pro nag
- install missing packages
- restore ganglia
- restore nis
- check zpool status, may need zpool upgrade
- reboot
Upgrade to new version of Debian
https://www.debian.org/releases/bookworm/amd64/release-notes/ch-upgrading.en.html
32-bit VME processor Debian 11 to 12
- cd git/scripts; git pull; cd ~
- apt update
- apt upgrade
- edit /etc/apt/sources.list
deb http://deb.debian.org/debian/ bookworm main
#deb http://deb.debian.org/debian/ bullseye main
#deb-src http://deb.debian.org/debian/ bullseye main
- apt update
- apt upgrade --without-new-pkgs
- apt full-upgrade
- apt list '~c'; apt purge '~c' # purge left-over config files [residual-config]
- reboot
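Instead of editing sources.list by hand, the suite names can be switched with sed. A minimal sketch, assuming GNU sed (for the \b word boundary); switch_debian_suite is a hypothetical name. It also renames suites such as bullseye-security and commented-out lines, so review the result before running "apt update".

```shell
# switch_debian_suite FILE [FROM] [TO]
# Replace the suite name everywhere in FILE (so "bullseye-security" becomes
# "bookworm-security" too); the original is kept as FILE.bak.
switch_debian_suite() {
    local file=$1 from=${2:-bullseye} to=${3:-bookworm}
    sed -i.bak "s/\b$from\b/$to/g" "$file"
}
```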
Ubuntu package manager
- apt-get install xxx # install package xxx
- apt-get update
- apt-get upgrade
- apt-get dist-upgrade
- apt-get autoremove # remove automatically installed packages that are no longer needed (e.g. dependencies of a removed package)
- apt-get remove xxx # remove package xxx
- apt-cache search . # list all available packages
- apt-cache show "." | grep ^Package # list all available packages
- apt-cache madison root-system # show all available versions of package root-system
- apt list # list all known packages (installed ones are marked [installed])
- dpkg --listfiles libpng16-16 # list all files from this package
- apt list --installed # list all installed packages
- dpkg -S /bin/bash # what package provides this file?
- dpkg -L bash # what files provided by this package?
- debsums -ce # show modified config files
- apt-config dump # show apt configuration
Ubuntu zsys
NOTE: DO NOT USE ZSYS, see https://github.com/ubuntu/zsys/issues/218 and https://github.com/ubuntu/zsys/issues/230
- manual removal of old snapshots
zsysctl show
zsysctl state remove xy69ye -s
zsysctl state remove xy69ye
zsysctl state remove xy69ye -u wheel
- apt remove zsys
NOTE: old zsys snapshots must be cleaned manually, "zsysctl state remove xxx --system" is broken and does not remove user data snapshots
- manages system snapshots
- documentation: https://github.com/ubuntu/zsys
- documentation: (go to next article via link "newer" at the bottom) https://didrocks.fr/2020/05/21/zfs-focus-on-ubuntu-20.04-lts-whats-new/
- ubuntu 20.04 bug, too many snapshots cause /boot to become full and updates fail. https://github.com/ubuntu/zsys/issues/155
- solution: use custom /etc/zsys.conf, limit number of snapshots to 10, see trinatdaq:/etc/zsys.conf
- zsys commands:
update-grub # list of all snapshots, errors if some snapshots are broken
zsysctl state remove lnc0k7 --system # remove snapshot
xemacs -nw /etc/zsys.conf; zsysctl service reload; zsysctl service gc # cause gc to run with new settings in zsys.conf
zfs list -r -t snapshot -o name,used,referenced,creation bpool/BOOT # list snapshots
zsysctl show # show snapshots
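Since the zsys problem above is too many snapshots filling bpool, it helps to count snapshots per dataset. A minimal sketch that reads the output of "zfs list -H -t snapshot -o name" (-H drops the header line); count_snapshots is a hypothetical helper.

```shell
# count_snapshots
# Read snapshot names (dataset@snapshot, one per line) on stdin and print
# "COUNT DATASET" for each dataset, sorted by dataset name.
count_snapshots() {
    awk -F@ 'NF == 2 { n[$1]++ } END { for (d in n) print n[d], d }' | sort -k2,2
}
```

For example: zfs list -H -t snapshot -o name | count_snapshots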
Ubuntu cloning
To clone an Ubuntu image:
cd /nfsroot/lxcpet
emacs -nw etc/hostname ### change hostname
emacs -nw etc/mailname ### change hostname (debian 11)
emacs -nw etc/defaultdomain ### change the NIS domainname
emacs -nw etc/yp.conf ### change the NIS server
cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
emacs -nw root/.ssh/authorized_keys ### update root ssh keys
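The two hostname edits can be scripted when the clone only needs the old name replaced. A minimal sketch, assuming GNU sed and that the old hostname appears as a whole word in etc/hostname and etc/mailname; rename_clone and its arguments are hypothetical. The NIS, ssh key and other edits above still have to be done by hand.

```shell
# rename_clone ROOT OLD NEW
# Replace the whole word OLD by NEW in ROOT/etc/hostname and
# ROOT/etc/mailname (e.g. "lxcpet.triumf.ca" -> "NEW.triumf.ca").
rename_clone() {
    local root=$1 old=$2 new=$3 f old_re
    # escape dots so that they match literally in the sed pattern
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    for f in etc/hostname etc/mailname; do
        [ -f "$root/$f" ] || continue
        sed -i "s/\b$old_re\b/$new/g" "$root/$f"
    done
}
```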
Ubuntu boot loader
boot from ZFS
- use UEFI boot with syslinux, see here: https://daq.triumf.ca/DaqWiki/index.php/SLinstall#Configure_UEFI_boot
- apt install zfs-initramfs
- update-initramfs -v -u
- ZFS structure:
root@daq00:~# zfs list
NAME                      USED   AVAIL  REFER  MOUNTPOINT
rpool                     147G   1.62T  96K    /
rpool/ROOT                17.8G  1.62T  96K    none
rpool/ROOT/ubuntu_00aaaa  17.8G  1.62T  6.22G  /
- copy OS image to rpool/ROOT/ubuntu_00aaaa
- zfs set mountpoint=/ rpool
- zfs set mountpoint=none rpool/ROOT
- zfs set mountpoint=/ rpool/ROOT/ubuntu_00aaaa
- zfs get all | grep mountpoint
rpool                     mountpoint  /     local
rpool/ROOT                mountpoint  none  local
rpool/ROOT/ubuntu_00aaaa  mountpoint  /     local
- in linux kernel command line (syslinux.cfg), set "root=" to "root=ZFS=rpool/ROOT/ubuntu_00aaaa"
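The root= change can be applied with sed. A minimal sketch, assuming GNU sed and a syslinux.cfg whose kernel arguments are on APPEND lines (syslinux keywords are case-insensitive, hence the I address flag, a GNU extension); set_zfs_root is a hypothetical name, and APPEND lines without a root= argument are left unchanged.

```shell
# set_zfs_root CFG DATASET
# On every APPEND line of CFG, replace the root=... kernel argument with
# root=ZFS=DATASET. "#" is used as the s delimiter because DATASET has "/".
set_zfs_root() {
    local cfg=$1 dataset=$2
    sed -i -E "/^[[:space:]]*append/I s#(^|[[:space:]])root=[^[:space:]]+#\1root=ZFS=$dataset#" "$cfg"
}
```

For example: set_zfs_root path/to/syslinux.cfg rpool/ROOT/ubuntu_00aaaa (the location of syslinux.cfg depends on the UEFI setup).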
boot from ZFS mirror
setup the EFI partitions
- assuming /dev/sdb is already set up for EFI boot, set up /dev/sda the same way:
- partition the second boot disk same as first boot disk:
root@grsnis01:~# gdisk -l /dev/sdb
Found valid GPT with protective MBR; using GPT.
Number  Start (sector)  End (sector)  Size       Code  Name
   1    2048            1050623       512.0 MiB  EF00  EFI system partition
   2    1050624         3907029134    1.8 TiB    8300  Linux filesystem
root@grsnis01:~#
- mkfs.msdos /dev/sdX1
- create mount points
mkdir /boot/efi-sda
mkdir /boot/efi-sdb
- add to /etc/fstab
/dev/sda1 /boot/efi-sda vfat umask=0022,fmask=0022,dmask=0022,nofail 0 1
/dev/sdb1 /boot/efi-sdb vfat umask=0022,fmask=0022,dmask=0022,nofail 0 1
- mount -a
- df | grep boot
root@grsnis01:~# df | grep boot
/dev/sdb1  523248  98100  425148  19%  /boot/efi-sdb
/dev/sda1  523248      4  523244   1%  /boot/efi-sda
- copy boot files to new boot disk
- cd /boot/efi-sdX; rsync -av . /boot/efi-sdY
- set BIOS to boot from "UEFI Hard drive", disable legacy boot (except for booting from USB key in legacy mode)
- if booting via UEFI with syslinux per these instructions, linux kernel updates have to be done manually:
- run ~/git/scripts/etc/update_efi_mirror.perl, follow instructions that it prints.
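After a kernel update it is easy to forget one of the two copies. A minimal sketch that compares the two EFI partitions with "diff -rq", assuming they are mounted at /boot/efi-sda and /boot/efi-sdb as above; efi_mirror_check is a hypothetical name and does not replace update_efi_mirror.perl.

```shell
# efi_mirror_check [DIR_A] [DIR_B]
# Compare two mounted EFI partitions; diff -rq lists any file that differs
# or exists on only one side, and exits 0 only if the trees are identical.
efi_mirror_check() {
    local a=${1:-/boot/efi-sda} b=${2:-/boot/efi-sdb}
    if diff -rq "$a" "$b"; then
        echo "EFI partitions in sync: $a $b"
    else
        echo "EFI partitions differ: $a $b" >&2
        return 1
    fi
}
```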
setup zfs partitions
use partitions compatible with Ubuntu "install on ZFS"
- gdisk "o" to create new GPT partition table
- gdisk "n" +512M ef00 to create EFI partition
- gdisk "n" +2G 8200 to create linux swap partition (not used)
- gdisk "n" +2G BE00 to create ZFS bpool partition
- gdisk "n" xxx BF00 to create ZFS rpool partition
# gdisk -l /dev/sda
Number  Start (sector)  End (sector)  Size       Code  Name
   1    2048            1050623       512.0 MiB  EF00  EFI System Partition
   2    1050624         5244927       2.0 GiB    8200
   3    5244928         9439231       2.0 GiB    BE00
   4    9439232         234441614     107.3 GiB  BF00
root@midm9a:~#
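The same layout can be created non-interactively with sgdisk (from the gdisk package). A minimal sketch; partition_zfs_disk is a hypothetical name. It wipes the partition table of the target, so double-check the device name first.

```shell
# partition_zfs_disk DEVICE
# Create the layout above: 512M EFI (EF00), 2G swap (8200),
# 2G ZFS bpool (BE00) and the rest of the disk as ZFS rpool (BF00).
# WARNING: -o (--clear) replaces the existing partition table on DEVICE.
partition_zfs_disk() {
    sgdisk -o \
        -n 1:0:+512M -t 1:EF00 \
        -n 2:0:+2G   -t 2:8200 \
        -n 3:0:+2G   -t 3:BE00 \
        -n 4:0:0     -t 4:BF00 \
        "$1"
}
```

Afterwards, "gdisk -l /dev/sdX" should show the same four partitions as the listing above.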
setup zfs mirror
root@grsnis01:~# ls -l /dev/disk/by-id/ata*part2
lrwxrwxrwx 1 root root 10 Feb 19 16:47 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801081-part2 -> ../../sda2
lrwxrwxrwx 1 root root 10 Feb 19 16:47 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801101-part2 -> ../../sdb2
root@grsnis01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:
    NAME                                             STATE   READ WRITE CKSUM
    rpool                                            ONLINE     0     0     0
      ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE     0     0     0
errors: No known data errors
root@grsnis01:~# zpool attach rpool ata-WDC_WDS200T2B0A-00SM50_205007801101-part2 /dev/disk/by-id/ata-WDC_WDS200T2B0A-00SM50_205007801081-part2
root@grsnis01:~# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Feb 19 16:54:39 2021
    12.6G scanned at 3.16G/s, 1.02G issued at 262M/s, 12.6G total
    1.02G resilvered, 8.09% done, 0 days 00:00:45 to go
config:
    NAME                                               STATE   READ WRITE CKSUM
    rpool                                              ONLINE     0     0     0
      mirror-0                                         ONLINE     0     0     0
        ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE     0     0     0
        ata-WDC_WDS200T2B0A-00SM50_205007801081-part2  ONLINE     0     0     0  (resilvering)
errors: No known data errors
- wait
root@grsnis01:~# zpool status
  pool: rpool
 state: ONLINE
  scan: resilvered 12.7G in 0 days 00:00:40 with 0 errors on Fri Feb 19 16:55:19 2021
config:
    NAME                                               STATE   READ WRITE CKSUM
    rpool                                              ONLINE     0     0     0
      mirror-0                                         ONLINE     0     0     0
        ata-WDC_WDS200T2B0A-00SM50_205007801101-part2  ONLINE     0     0     0
        ata-WDC_WDS200T2B0A-00SM50_205007801081-part2  ONLINE     0     0     0
errors: No known data errors
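Instead of re-running "zpool status" by hand, a loop can wait for the resilver to finish. A minimal sketch that relies on the "resilver in progress" text shown above; resilver_in_progress and wait_for_resilver are hypothetical names. Newer OpenZFS releases (such as the 2.1 series in Ubuntu 22.04) also provide "zpool wait -t resilver POOL".

```shell
# resilver_in_progress
# Read "zpool status" output on stdin; succeed while a resilver is running.
# (grep without -q reads all input, so zpool never sees a broken pipe.)
resilver_in_progress() {
    grep 'resilver in progress' >/dev/null
}

# wait_for_resilver POOL
# Poll every 10 seconds, then show the final pool status.
wait_for_resilver() {
    while zpool status "$1" | resilver_in_progress; do
        sleep 10
    done
    zpool status "$1"
}
```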
maintenance commands
- update-initramfs -v -u
- grub-install /dev/sda
Convert from single to dual mirrored ZFS SSD
Assuming Ubuntu LTS 22.04 installed with the "install on ZFS" option, we will add a second SSD, configure ZFS to use both SSDs in a mirrored configuration and set up grub to boot from either SSD. This is intended to create a fully redundant system where failure of either SSD does not break the system.
- identify first SSD
root@midm9b:~# ./smart-status.perl
Disk      model                    serial        temperature  realloc  pending  uncorr  CRC err  RRER  Errors  Link
/dev/sda  WD Blue SA510 2.5 250GB  22243Z803769  24           .        ?        ?       .        ?     .       6.0
root@midm9b:~#
- connect second SSD of identical size
root@midm9b:~# ./smart-status.perl
Disk      model                    serial        temperature  realloc  pending  uncorr  CRC err  RRER  Errors  Link
/dev/sda  WD Blue SA510 2.5 250GB  22243Z803769  24           .        ?        ?       .        ?     .       6.0
/dev/sdb  WD Blue SA510 2.5 250GB  22243Z803852  25           .        ?        ?       .        ?     .       6.0
root@midm9b:~#
- if second SSD is not autodetected, reboot
- Clone partition table automatically
If both SSDs are identical size, use this simpler method of duplicating the partition table:
root@midm9b:~# sfdisk -d /dev/sda > part_table
root@midm9b:~# grep -v ^label-id part_table | sed -e 's/, *uuid=[0-9A-F-]*//' | sfdisk /dev/sdb
The grep and sed in the second command are there to prevent disk ID and partition IDs from being cloned. Alternatively the part_table file can be edited manually to remove the label-id line and the uuid entries from the individual partitions.
- Clone partition table manually (e.g. for different size disks)
- list partition table of first SSD:
root@midm9b:~# fdisk -l /dev/sda
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
Device     Start    End        Sectors    Size    Type
/dev/sda1  2048     1050623    1048576    512M    EFI System
/dev/sda2  1050624  5244927    4194304    2G      Linux swap
/dev/sda3  5244928  9439231    4194304    2G      Solaris boot
/dev/sda4  9439232  488397134  478957903  228.4G  Solaris root
root@midm9b:~#
- create identical partitions on second SSD, use sector numbers from above.
root@midm9b:~# gdisk /dev/sdb
GPT fdisk (gdisk) version 1.0.8
Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present
Creating new GPT entries in memory.
Command (? for help): n
Partition number (1-128, default 1):
First sector (34-488397134, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-488397134, default = 488397134) or {+-}size{KMGTP}: 1050623
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): ef00
Changed type of partition to 'EFI system partition'
Command (? for help): n
Partition number (2-128, default 2):
First sector (34-488397134, default = 1050624) or {+-}size{KMGTP}:
Last sector (1050624-488397134, default = 488397134) or {+-}size{KMGTP}: 5244927
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 8200
Changed type of partition to 'Linux swap'
Command (? for help): n
Partition number (3-128, default 3):
First sector (34-488397134, default = 5244928) or {+-}size{KMGTP}:
Last sector (5244928-488397134, default = 488397134) or {+-}size{KMGTP}: 9439231
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): be00
Changed type of partition to 'Solaris boot'
Command (? for help): n
Partition number (4-128, default 4):
First sector (34-488397134, default = 9439232) or {+-}size{KMGTP}:
Last sector (9439232-488397134, default = 488397134) or {+-}size{KMGTP}:
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): bf00
Changed type of partition to 'Solaris root'
Command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdb.
The operation has completed successfully.
root@midm9b:~# fdisk -l /dev/sda /dev/sdb
Disk /dev/sda: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 951A4174-B4C6-400D-99F5-BE9B5627FA8E
Device     Start    End        Sectors    Size    Type
/dev/sda1  2048     1050623    1048576    512M    EFI System
/dev/sda2  1050624  5244927    4194304    2G      Linux swap
/dev/sda3  5244928  9439231    4194304    2G      Solaris boot
/dev/sda4  9439232  488397134  478957903  228.4G  Solaris root
Disk /dev/sdb: 232.89 GiB, 250059350016 bytes, 488397168 sectors
Disk model: WD Blue SA510 2.
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: EB251739-30C6-422F-A505-5887B5A0B603
Device     Start    End        Sectors    Size    Type
/dev/sdb1  2048     1050623    1048576    512M    EFI System
/dev/sdb2  1050624  5244927    4194304    2G      Linux swap
/dev/sdb3  5244928  9439231    4194304    2G      Solaris boot
/dev/sdb4  9439232  488397134  478957903  228.4G  Solaris root
root@midm9b:~#
- identify second SSD partitions
root@midm9b:~# ls -l /dev/disk/by-id/ata*part3
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3 -> ../../sdb3
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
- convert bpool from single disk to mirrored disk:
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
config:
    NAME                                    STATE   READ WRITE CKSUM
    bpool                                   ONLINE     0     0     0
      99e03dc0-7d4d-f24b-8fa1-f042b9f135db  ONLINE     0     0     0
errors: No known data errors
  pool: rpool
 state: ONLINE
config:
    NAME                                    STATE   READ WRITE CKSUM
    rpool                                   ONLINE     0     0     0
      f6fd54f8-3af7-b943-ae3d-a4e480537fb9  ONLINE     0     0     0
errors: No known data errors
root@midm9b:~# zpool attach bpool 99e03dc0-7d4d-f24b-8fa1-f042b9f135db /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3
root@midm9b:~# zpool status bpool
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:
    NAME                                                STATE   READ WRITE CKSUM
    bpool                                               ONLINE     0     0     0
      mirror-0                                          ONLINE     0     0     0
        99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE     0     0     0
        ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE     0     0     0
errors: No known data errors
- convert rpool
root@midm9b:~# ls -l /dev/disk/by-id/ata*part4
lrwxrwxrwx 1 root root 10 Jan 20 18:37 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803769-part4 -> ../../sda4
lrwxrwxrwx 1 root root 10 Jan 20 19:34 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4 -> ../../sdb4
root@midm9b:~# zpool attach rpool f6fd54f8-3af7-b943-ae3d-a4e480537fb9 /dev/disk/by-id/ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4
root@midm9b:~# zpool status rpool
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jan 20 19:40:45 2023
    5.83G scanned at 664M/s, 2.92M issued at 332K/s, 9.11G total
    0B resilvered, 0.03% done, no estimated completion time
config:
    NAME                                                STATE   READ WRITE CKSUM
    rpool                                               ONLINE     0     0     0
      mirror-0                                          ONLINE     0     0     0
        f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE     0     0     0
        ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE     0     0     0
errors: No known data errors
root@midm9b:~#
- wait for resilver to complete
root@midm9b:~# zpool status
  pool: bpool
 state: ONLINE
  scan: resilvered 247M in 00:00:00 with 0 errors on Fri Jan 20 19:39:40 2023
config:
    NAME                                                STATE   READ WRITE CKSUM
    bpool                                               ONLINE     0     0     0
      mirror-0                                          ONLINE     0     0     0
        99e03dc0-7d4d-f24b-8fa1-f042b9f135db            ONLINE     0     0     0
        ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part3  ONLINE     0     0     0
errors: No known data errors
  pool: rpool
 state: ONLINE
  scan: resilvered 9.65G in 00:00:36 with 0 errors on Fri Jan 20 19:41:21 2023
config:
    NAME                                                STATE   READ WRITE CKSUM
    rpool                                               ONLINE     0     0     0
      mirror-0                                          ONLINE     0     0     0
        f6fd54f8-3af7-b943-ae3d-a4e480537fb9            ONLINE     0     0     0
        ata-WD_Blue_SA510_2.5_250GB_22243Z803852-part4  ONLINE     0     0     0
errors: No known data errors
- enable booting from the second SSD (in /etc/fstab, use UUID=xxx instead of /dev/sda1, /dev/sdb1):
root@midm9b:~# mkfs.msdos /dev/sdb1
root@midm9b:~# mkdir /boot/efi-sda
root@midm9b:~# mkdir /boot/efi-sdb
root@midm9b:~# echo "/dev/sda1 /boot/efi-sda vfat umask=0022,fmask=0022,dmask=0022,nofail 0 1" >> /etc/fstab
root@midm9b:~# echo "/dev/sdb1 /boot/efi-sdb vfat umask=0022,fmask=0022,dmask=0022,nofail 0 1" >> /etc/fstab
root@midm9b:~# mount -a
root@midm9b:~# df -kl
Filesystem  1K-blocks  Used   Available  Use%  Mounted on
...
/dev/sda1   523244     13720  509524     3%    /boot/efi
/dev/sdb1   523244     4      523240     1%    /boot/efi-sdb
...
root@midm9b:~# rsync -av /boot/efi/ /boot/efi-sdb/
sending incremental file list
EFI/
...
root@midm9b:~# ls -l /boot/efi-sda
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~# ls -l /boot/efi-sdb
total 8
drwxr-xr-x 4 root root 4096 Jan 19 23:26 EFI
drwxr-xr-x 5 root root 4096 Jan 19 23:26 grub
root@midm9b:~#
- set up the script that updates grub on the second SSD; it must be run manually after every kernel update
root@midm9b:~# ln -s ~/git/scripts/etc/update_efi_grub.perl ~/
root@midm9b:~# ~/update_efi_grub.perl -u
EFI dir: /boot/efi-sda
/boot/efi-sda: update grub: rsync -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sda/grub
building file list ... done
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sda: update efi: rsync -av --delete-after --modify-window=2 /boot/efi/EFI/ /boot/efi-sda/EFI
building file list ... done
sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
EFI dir: /boot/efi-sdb
/boot/efi-sdb: update grub: rsync -av --delete-after --modify-window=2 /boot/efi/grub/ /boot/efi-sdb/grub
building file list ... done
sent 5,313 bytes  received 11 bytes  10,648.00 bytes/sec
total size is 7,944,644  speedup is 1,492.23
/boot/efi-sdb: update efi: rsync -av --delete-after --modify-window=2 /boot/efi/EFI/ /boot/efi-sdb/EFI
building file list ... done
sent 216 bytes  received 11 bytes  454.00 bytes/sec
total size is 5,452,378  speedup is 24,019.29
root@midm9b:~#
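For the UUID-based /etc/fstab entries suggested above, the lines can be generated from blkid. A minimal sketch; efi_fstab_line is a hypothetical name, the mount options are copied from the entries above, and the optional third argument skips the blkid lookup.

```shell
# efi_fstab_line DEVICE MOUNTPOINT [UUID]
# Print an /etc/fstab entry for an EFI (vfat) partition that refers to the
# filesystem UUID instead of the /dev/sdX name, which is not guaranteed to
# stay the same between boots. Without UUID, it is looked up with blkid
# (may need root).
efi_fstab_line() {
    local dev=$1 mnt=$2 uuid=${3:-}
    [ -n "$uuid" ] || uuid=$(blkid -s UUID -o value "$dev")
    printf 'UUID=%s %s vfat umask=0022,fmask=0022,dmask=0022,nofail 0 1\n' "$uuid" "$mnt"
}
```

For example: efi_fstab_line /dev/sdb1 /boot/efi-sdb >> /etc/fstab (after removing the corresponding /dev/sdb1 line).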
Disable NetworkManager
NOTE: THIS IS BROKEN IN UBUNTU LTS 22.04
NetworkManager is useful for configuring dynamic network interfaces, e.g. laptops that often move between networks or choose among multiple wifi networks.
For machines with statically configured network interfaces, NetworkManager is not necessary.
As it has been observed to become confused and malfunction when network links go up and down (it keeps unnecessarily reconfiguring the ip address, etc), it can be useful to disable it.
- list all network interfaces
# /bin/ls -1 /sys/class/net/
enp0s31f6
lo
- edit /etc/network/interfaces:
rename enp0s31f6=eth0
auto eth0
iface eth0 inet static
    address 142.90.120.94/19
    gateway 142.90.100.18
- statically configure systemd-resolved
- create /etc/systemd/resolved.conf.d/resolved.conf with this contents:
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
- systemctl restart systemd-resolved
- resolvectl
- systemd-analyze cat-config systemd/resolved.conf
- disable NetworkManager
systemctl disable NetworkManager
- reboot
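The /19 prefix in /etc/network/interfaces corresponds to the netmask 255.255.224.0 used in the installer section, and the gateway must lie in the same subnet: in the third octet, 120 & 224 = 96 and 100 & 224 = 96, so both 142.90.120.94 and 142.90.100.18 are in 142.90.96.0/19. A minimal sketch of this arithmetic in shell; prefix_to_netmask, ip_to_int and same_subnet are hypothetical helper names.

```shell
# prefix_to_netmask PREFIX: e.g. 19 -> 255.255.224.0
prefix_to_netmask() {
    local mask
    mask=$(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
    printf '%d.%d.%d.%d\n' $(( (mask >> 24) & 255 )) $(( (mask >> 16) & 255 )) \
        $(( (mask >> 8) & 255 )) $(( mask & 255 ))
}

# ip_to_int A.B.C.D: dotted quad -> 32-bit integer
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# same_subnet IP1 IP2 PREFIX: succeed if both addresses share the network part
same_subnet() {
    local mask a b
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}
```

For example, "same_subnet 142.90.120.94 142.90.100.18 19 && echo gateway ok" prints "gateway ok".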
Configure ECC memory
Configure EDAC
- apt install edac-utils
Intel i3-2120
root@musr00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X9SCL/X9SCM
root@musr00:~# edac-ctl --status
edac-ctl: drivers not loaded.
Intel E-2236
root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq00:~# edac-util
edac-util: No errors to report.
root@daq00:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
- check edac sysfs files (Intel)
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root 0 Jan 25 15:10 power
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root 0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~#
Intel E3-1270 v6
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root 0 Feb 19 12:35 power
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root 0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~#
Intel E3-1245 v6
[root@alphagdaq ~]# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
[root@alphagdaq ~]# edac-ctl --status
edac-ctl: drivers are loaded.
[root@alphagdaq ~]# edac-util
edac-util: No errors to report.
[root@alphagdaq ~]# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
[root@alphagdaq ~]# ras-mc-ctl --layout
          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |  8192 MB  |  8192 MB  |
----------+-----------------------------------------------+
[root@alphagdaq ~]# ras-mc-ctl --error-count
Label                 CE  UE
mc#0csrow#3channel#0  0   0
mc#0csrow#2channel#1  0   0
mc#0csrow#3channel#1  0   0
mc#0csrow#0channel#0  0   0
mc#0csrow#1channel#1  0   0
mc#0csrow#0channel#1  0   0
mc#0csrow#1channel#0  0   0
mc#0csrow#2channel#0  0   0
[root@alphagdaq ~]# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SSH-F
[root@alphagdaq ~]# ras-mc-ctl --summary
DBD::SQLite::db prepare failed: no such table: mc_event at /usr/sbin/ras-mc-ctl line 1129.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1130.
[root@alphagdaq ~]#
AMD 3700X
(memory is non-ECC)
root@daq13:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root 0 Jan 25 15:26 power
lrwxrwxrwx 1 root root 0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
(memory is ECC)
root@trinatdaq:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@trinatdaq:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@trinatdaq:~# edac-util
edac-util: No errors to report.
root@trinatdaq:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Dec 15 13:04 mc0
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
lrwxrwxrwx 1 root root    0 Dec 13 18:31 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Dec 13 18:31 uevent
root@trinatdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ce_noinfo_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 max_location
-r--r--r-- 1 root root 4096 Dec 15 13:04 mc_name
drwxr-xr-x 2 root root    0 Dec 15 13:04 power
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank4
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank5
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank6
drwxr-xr-x 3 root root    0 Dec 15 13:04 rank7
--w------- 1 root root 4096 Dec 15 13:04 reset_counters
-rw-r--r-- 1 root root 4096 Dec 15 13:04 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Dec 15 13:04 seconds_since_reset
-r--r--r-- 1 root root 4096 Dec 15 13:04 size_mb
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_count
-r--r--r-- 1 root root 4096 Dec 15 13:04 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Dec 15 13:04 uevent
AMD 5000G
- no Linux driver for the AMD 5000-series "G" CPUs
- no mention of ECC in the BIOS settings
- unclear status of ECC support in AMD documentation (it says only "Pro" "G" CPUs have ECC)
- unclear status of ECC support in ASUS documentation (web page out of date)
AMD 5600X
root@daq17:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-XE GAMING WIFI
root@daq17:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq17:~# edac-util
edac-util: No errors to report.
root@daq17:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@daq17:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 7 root root    0 Aug 19 19:27 mc0
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
lrwxrwxrwx 1 root root    0 May 10 10:11 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 May 10 10:11 uevent
root@daq17:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ce_noinfo_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 max_location
-r--r--r-- 1 root root 4096 Aug 19 19:27 mc_name
drwxr-xr-x 2 root root    0 Aug 19 19:27 power
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank4
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank5
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank6
drwxr-xr-x 3 root root    0 Aug 19 19:27 rank7
--w------- 1 root root 4096 Aug 19 19:27 reset_counters
-rw-r--r-- 1 root root 4096 Aug 19 19:27 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Aug 19 19:27 seconds_since_reset
-r--r--r-- 1 root root 4096 Aug 19 19:27 size_mb
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_count
-r--r--r-- 1 root root 4096 Aug 19 19:27 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Aug 19 19:27 uevent
AMD 3955WX
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~/git/scripts/quotareport# edac-ctl --status
edac-ctl: drivers are loaded.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util
edac-util: No errors to report.
root@alphasuperdaq:~/git/scripts/quotareport# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@alphasuperdaq:~/git/scripts/quotareport# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 19 root root    0 Dez 12 04:48 mc0
drwxr-xr-x  2 root root    0 Dez 12 04:48 power
lrwxrwxrwx  1 root root    0 Dez  9 05:31 subsystem -> ../../../../bus/edac
-rw-r--r--  1 root root 4096 Dez  9 05:31 uevent
root@alphasuperdaq:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 max_location
-r--r--r-- 1 root root 4096 Feb 28 22:19 mc_name
drwxr-xr-x 2 root root    0 Dez 12 04:48 power
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank0
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank1
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank10
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank11
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank12
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank13
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank14
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank15
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank2
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank3
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank4
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank5
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank6
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank7
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank8
drwxr-xr-x 3 root root    0 Dez 12 04:48 rank9
--w------- 1 root root 4096 Feb 28 22:19 reset_counters
-rw-r--r-- 1 root root 4096 Feb 28 22:19 sdram_scrub_rate
-r--r--r-- 1 root root 4096 Feb 28 22:19 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 28 22:19 size_mb
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_count
-r--r--r-- 1 root root 4096 Feb 28 22:19 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 28 22:19 uevent
root@alphasuperdaq:~# ras-mc-ctl --layout
Use of uninitialized value $max_pos[3] in modulus (%) at /usr/sbin/ras-mc-ctl line 868.
Use of uninitialized value $d in numeric ge (>=) at /usr/sbin/ras-mc-ctl line 869.
Use of uninitialized value $d in sprintf at /usr/sbin/ras-mc-ctl line 872.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
Use of uninitialized value $pos[3] in join or string at /usr/sbin/ras-mc-ctl line 791.
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    |                                                                                         mc0                                                                                         |
    |                                               csrow0                                              |                                               csrow1                            |
    | channel0 | channel1 | channel2 | channel3 | channel4 | channel5 | channel6 | channel7 | channel0 | channel1 | channel2 | channel3 | channel4 | channel5 | channel6 | channel7 |
----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
0:  |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |   0 MB   |
----+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
root@alphasuperdaq:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#0channel#2    0       0
mc#0csrow#1channel#7    0       0
mc#0csrow#0channel#3    0       0
mc#0csrow#1channel#4    0       0
mc#0csrow#1channel#2    0       0
mc#0csrow#0channel#7    0       0
mc#0csrow#1channel#3    0       0
mc#0csrow#0channel#4    0       0
mc#0csrow#1channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#1channel#5    0       0
mc#0csrow#0channel#6    0       0
mc#0csrow#0channel#1    0       0
mc#0csrow#0channel#5    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#1channel#6    0       0
root@alphasuperdaq:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: ASUSTeK COMPUTER INC. model Pro WS WRX80E-SAGE SE WIFI
root@alphasuperdaq:~# ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
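All ECC-capable machines above expose a corrected-error counter (ce_count) and an uncorrected-error counter (ue_count) per memory controller under /sys/devices/system/edac/mc/mc*. The following is a minimal bash sketch that summarizes them, assuming only those two files are read. The function name is ours, and the optional directory argument exists only so it can be tried on a fake tree:

```bash
# Sketch: summarize EDAC error counters for every memory controller (mc0, mc1, ...).
# Exit status: 0 = all counters zero, 1 = some errors counted,
#              2 = no controller found (non-ECC memory or no EDAC driver, like daq13 above).
edac_counts() {
    local base="${1:-/sys/devices/system/edac/mc}"
    local mc ce ue found=0 bad=0
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue            # glob matched nothing
        found=1
        ce=$(cat "$mc/ce_count")
        ue=$(cat "$mc/ue_count")
        echo "$(basename "$mc"): CE=$ce UE=$ue"
        if [ "$ce" -ne 0 ] || [ "$ue" -ne 0 ]; then
            bad=1
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controllers found"
        return 2
    fi
    return "$bad"
}
```

On a machine like trinatdaq it prints one `mc0: CE=... UE=...` line. On a machine like daq13, where edac-util finds no memory controller, it returns 2.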
Configure rasdaemon
apt install rasdaemon
systemctl enable rasdaemon
systemctl restart rasdaemon
systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
   Main PID: 2477175 (rasdaemon)
      Tasks: 1 (limit: 76958)
     Memory: 17.1M
     CGroup: /system.slice/rasdaemon.service
             └─2477175 /usr/sbin/rasdaemon -f -r

Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
Get reports
- Intel 2x32GB ECC DIMMs
root@daq00:~# ras-mc-ctl --layout
          +-------------------------+
          |           mc0           |
          |  csrow0    |  csrow1    |
----------+-------------------------+
channel1: |  16384 MB  |  16384 MB  |
channel0: |  16384 MB  |  16384 MB  |
----------+-------------------------+
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#1channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#0channel#1    0       0
- Intel 4x16GB ECC DIMMs
root@daq00:~# ras-mc-ctl --error-count
Label                   CE      UE
mc#0csrow#0channel#1    0       0
mc#0csrow#2channel#0    0       0
mc#0csrow#0channel#0    0       0
mc#0csrow#2channel#1    0       0
mc#0csrow#1channel#0    0       0
mc#0csrow#1channel#1    0       0
mc#0csrow#3channel#0    0       0
mc#0csrow#3channel#1    0       0
root@daq00:~# ras-mc-ctl --layout
          +-----------------------+
          |          mc0          |
          |  csrow0   |  csrow1   |
----------+-----------------------+
channel1: |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |
----------+-----------------------+
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
Note: on Ubuntu 22.04 LTS, the DBD::SQLite::db error does not appear.
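When checking many machines, only the non-zero counts matter. The following is a small sketch of a filter for `ras-mc-ctl --error-count` output read from stdin. It is our own helper, not part of rasdaemon, and it assumes the "Label CE UE" column layout shown above:

```bash
# Sketch: print every DIMM location whose corrected (CE) or uncorrected (UE)
# count is non-zero. Input: `ras-mc-ctl --error-count` output on stdin.
# Exit status: 0 if all counts are zero, 1 otherwise.
ras_nonzero() {
    awk 'NR > 1 && NF >= 3 && ($2 != 0 || $3 != 0) { print; bad = 1 }
         END { exit bad }'
}
```

Usage: `ras-mc-ctl --error-count | ras_nonzero && echo "no memory errors"`.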
sensors
ASUS P9X79 WS
- https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
- BIOS version 4802
- modprobe nct6775
- modprobe coretemp
root@daq14:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +29.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +24.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +82.0°C, crit = +100.0°C)

nouveau-pci-0200
Adapter: PCI adapter
GPU core:    900.00 mV (min =  +0.85 V, max =  +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst =  +3.0°C)
                       (crit = +105.0°C, hyst =  +5.0°C)
                       (emerg = +135.0°C, hyst =  +5.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore:           1.04 V  (min =  +0.00 V, max =  +1.74 V)
in1:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
AVCC:            3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
+3.3V:           3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:             1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:             2.04 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:           904.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
3VSB:            3.41 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
Vbat:            3.30 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:          1265 RPM  (min =    0 RPM)
fan2:          1909 RPM  (min =    0 RPM)
fan3:             0 RPM  (min =    0 RPM)
fan4:             0 RPM  (min =    0 RPM)
fan5:             0 RPM  (min =    0 RPM)
SYSTIN:         +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:         +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:         +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PECI Agent 0:   +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                         (crit = +96.0°C)
PCH_CHIP_TEMP:   +0.0°C
PCH_CPU_TEMP:    +0.0°C
PCH_MCH_TEMP:    +0.0°C
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:   disabled
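`modprobe` loads nct6775 and coretemp only until the next reboot. One way to make this permanent is a drop-in file for systemd-modules-load, which reads one module name per line from /etc/modules-load.d/*.conf. The following sketch writes such a file. The file name sensors.conf is our own choice, and the directory argument exists only for testing:

```bash
# Sketch: write a modules-load.d drop-in so the sensor drivers load at boot.
# Lines starting with "#" are comments in these files.
write_sensor_modules() {
    local dir="${1:-/etc/modules-load.d}"
    mkdir -p "$dir"
    printf '%s\n' "# hardware monitoring drivers used by lm-sensors" \
        nct6775 coretemp > "$dir/sensors.conf"
}
```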
Enable CPU turbo mode
- An Intel CPU has a nominal CPU frequency (e.g. 3.4 GHz) and a turbo-boost CPU frequency (e.g. 4.0 GHz). Here we will enable this turbo-boost mode.
- Find out CPU capability
root@daq01:~# lscpu | grep Hz
Model name:          Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:             3965.803
CPU max MHz:         4000.0000
CPU min MHz:         800.0000
- Look up this CPU in the Intel ARK database (search the web for the CPU model name, e.g. "i7-6700")
- Find current frequency settings:
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
- Note the following:
- current governor is "powersave"
- "performance" governor is available
- "boost state support" is supported and active.
- Confirm CPU frequency governor:
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
- Change governor to "performance":
root@daq01:~# cpupower frequency-set --governor performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
- monitor CPU frequency:
root@daq01:~# cpupower monitor
    | Nehalem                   || Mperf              || Idle_Stats
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1   | C1E  | C3   | C6   | C7s  | C8
   0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
   4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
   1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
   5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
   2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
   6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
   3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
   7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
- check that the CPU is not overheating:
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
- congratulations, we are running at close to 4 GHz now!
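The governor set with `cpupower frequency-set` is not kept across reboots, so it has to be reapplied at boot time, for example from /etc/rc.local. The following sketch has the same effect by writing the cpufreq sysfs files directly. It also reads the intel_pstate `no_turbo` flag, where 0 means turbo boost is allowed. The sysfs paths are the standard ones. The function names are ours, and the root-directory argument exists only for testing:

```bash
# Sketch: set the cpufreq scaling governor on every CPU and verify it took effect.
# Usage: set_governor performance
set_governor() {
    local gov="$1" root="${2:-/sys/devices/system/cpu}" f rc=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        echo "$gov" > "$f"
        if [ "$(cat "$f")" != "$gov" ]; then
            echo "could not set $gov in $f" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Sketch: succeed if intel_pstate reports that turbo boost is allowed (no_turbo = 0).
turbo_enabled() {
    local root="${1:-/sys/devices/system/cpu}"
    [ "$(cat "$root/intel_pstate/no_turbo" 2>/dev/null)" = "0" ]
}
```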
Setup ubuntu as gateway to private network
See also:
- https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
- http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1
Steps to do
- assign network numbers to the private networks, e.g. 192.168.1.x, 192.168.2.x, etc.
- (on the gateway machine, each private network interface has to have a different network number)
- (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
- assign IP addresses on the private network and save them in /etc/hosts, e.g. "192.168.1.10 hvps" (address first, then host name)
- (for simplicity, assign 192.168.1.1 to the gateway machine itself)
- (IP addresses 192.168.1.0 and 192.168.1.255 are "special", do not use them)
- setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
- setup DHCP server (ISC dhcpd or dnsmasq) to give out the IP addresses
- setup tftp, pxelinux and NFS for diskless booting
- setup time server (chronyd) to provide common time to all devices
- setup NAT so machines on private network can access the internet (to get OS updates, etc)
- setup NIS and NFS so machines on the private network can use common home directories
- setup rsync backup of machines on the private network
setup hosts
- edit /etc/hosts
192.168.1.101 dsfe01 ... and so forth
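Following the rules listed above (one /24 network per interface, with .0 and .255 reserved), the following is a sketch that sanity-checks the private entries in /etc/hosts. It assumes the 192.168.1.x network used on this page and treats an address listed twice as a mistake. The function name is ours:

```bash
# Sketch: check hosts-file entries on one private /24 network.
# Usage: check_hosts [file] [prefix]   (defaults: /etc/hosts, "192.168.1.")
# Prints offending lines; exit status 1 if any problem was found.
check_hosts() {
    local file="${1:-/etc/hosts}" prefix="${2:-192.168.1.}"
    awk -v p="$prefix" '
        /^[ \t]*#/ || NF < 2 { next }               # skip comments and blank lines
        index($1, p) == 1 {
            last = substr($1, length(p) + 1)
            if (last == "0" || last == "255") { print "reserved address: " $0; bad = 1 }
            if ($1 in seen) { print "duplicate address: " $0; bad = 1 }
            seen[$1] = 1
        }
        END { exit bad }' "$file"
}
```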
setup dns and dhcp
- apt install dnsmasq
- edit /etc/dnsmasq.conf
# /etc/dnsmasq.conf

# DNS settings
#port=0 # disable DNS function
port=53 # enable DNS function
domain-needed
bogus-priv
no-resolv
server=142.90.100.19

# DHCP settings
interface=enp1s0f0 # DHCP interface
#dhcp-range=192.168.1.50,192.168.1.150,infinite
dhcp-range=192.168.1.0,static
#log-dhcp
quiet-dhcp
#dhcp-ignore=tag:!known
dhcp-boot=pxelinux.0
#dhcp-host=ac:1f:6b:9e:7f:4a,192.168.1.100,10m
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite

# TFTP settings
enable-tftp
tftp-root=/tftpboot
- #mkdir /zssd/tftpboot ### per tftp-root (if no ZFS)
- zfs create -o mountpoint=/tftpboot rpool/tftpboot ### (if root is ZFS)
- systemctl stop systemd-resolved.service
- systemctl disable systemd-resolved.service
- rm /etc/resolv.conf
- create a new /etc/resolv.conf with these contents:
nameserver 127.0.0.1
search snolab.ca
- systemctl enable dnsmasq
- systemctl restart dnsmasq
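With `dhcp-range=192.168.1.0,static`, dnsmasq only answers clients that have a dhcp-host entry, and it takes the IP address for each named host from /etc/hosts. The following sketch generates such entries from a plain "MAC hostname" list. That list format is made up for illustration, and the function name is ours:

```bash
# Sketch: convert "MAC hostname" pairs (stdin or file arguments) into dnsmasq
# dhcp-host lines in the style of /etc/dnsmasq.conf above.
# Lines starting with "#" are ignored; malformed MACs are reported on stderr.
make_dhcp_hosts() {
    awk '
        NF < 2 || $1 ~ /^#/ { next }
        {
            mac = tolower($1)
            n = split(mac, part, ":")
            ok = (n == 6)
            for (i = 1; i <= n; i++)
                if (part[i] !~ /^[0-9a-f][0-9a-f]$/) ok = 0
            if (!ok) { print "bad MAC address: " $1 > "/dev/stderr"; bad = 1; next }
            printf "dhcp-host=%s,%s,infinite\n", mac, $2
        }
        END { exit bad }' "$@"
}
```

The output can be appended to /etc/dnsmasq.conf, followed by `systemctl restart dnsmasq`.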
setup chronyd
- enable ntp server:
- configure and enable chronyd per instructions above
- echo "allow 192.168.1.0/24" > /etc/chrony/conf.d/allow-localhost.conf
- systemctl restart chronyd
- chronyc tracking ### wait until time is synchronized (a few seconds)
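Instead of rerunning `chronyc tracking` by hand, you can script the wait. chrony reports "Leap status : Not synchronised" until it has a usable time source, so the sketch below checks that line. chronyc also has a built-in `waitsync` command for the same purpose. The function names are ours:

```bash
# Sketch: succeed if `chronyc tracking` output (on stdin) shows a synchronized clock.
chrony_synced() {
    awk -F': *' '/^Leap status/ { s = $2 }
        END { exit (s != "" && s != "Not synchronised") ? 0 : 1 }'
}

# Sketch: poll chronyd for up to about one minute.
wait_for_chrony() {
    local i
    for i in $(seq 30); do
        chronyc tracking | chrony_synced && return 0
        sleep 2
    done
    return 1
}
```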
setup diskless network booting
setup pxelinux
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
- cd /zssd/tftpboot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
- mkdir pxelinux.cfg
- emacs -nw pxelinux.cfg/default
default menu.c32
prompt 0
menu title Welcome to the DSVSLICE PXE boot menu
timeout 50

label hdt
kernel hdt.c32

label memtest86+-5.01
kernel memdisk iso initrd=memtest86+-5.01.iso.gz

label memtest86+-4.20
kernel memdisk iso initrd=memtest86+-4.20.iso.zip

label vmlinuz-5.3.0-26-generic
menu default
kernel vmlinuz-5.3.0-26-generic
append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0

#end
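All clients above share pxelinux.cfg/default. Before falling back to that file, PXELINUX looks for more specific file names: first the client UUID, then "01-" followed by the MAC address in lower case with dashes, then the IP address in upper-case hex, trying shorter and shorter prefixes of it. The following sketch prints the MAC-based name and the full hex-IP name for one client, so that machine can be given its own menu. The function name is ours:

```bash
# Sketch: print the MAC-based and IP-based pxelinux.cfg file names for one client.
# Usage: pxe_cfg_names ac:1f:6b:9e:7f:4a 192.168.1.101
pxe_cfg_names() {
    local mac="${1,,}" ip="$2" a b c d     # ${1,,} lower-cases the MAC
    # "01" is the ARP hardware type for Ethernet; colons become dashes.
    echo "01-${mac//:/-}"
    IFS=. read -r a b c d <<< "$ip"
    printf '%02X%02X%02X%02X\n' "$a" "$b" "$c" "$d"
}
```

For example, `cp pxelinux.cfg/default pxelinux.cfg/01-ac-1f-6b-9e-7f-4a` creates a menu file that only dsfe01 will load.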
setup linux kernel
- copy the kernel files
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /zssd/tftpboot/
- cd /zssd/tftpboot
- chmod a+r *
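After a kernel update, the new vmlinuz and initrd files must be copied again, and the menu entry in pxelinux.cfg/default must be edited to match. The following is a sketch of a hypothetical helper (not an existing tool) that prints such a label stanza by copying the append line of the vmlinuz entry above. The gateway address and boot interface default to this page's values:

```bash
# Sketch (hypothetical helper): print a pxelinux "label" stanza for an
# NFS-root diskless host, modelled on the vmlinuz entry in pxelinux.cfg/default.
# Usage: pxe_label <kernel-version> <host> [gateway-ip] [boot-interface]
pxe_label() {
    local kver="$1" host="$2" gw="${3:-192.168.1.1}" iface="${4:-enp1s0f0}"
    cat <<EOF
label vmlinuz-$kver
kernel vmlinuz-$kver
append initrd=initrd.img-$kver boot=nfs root=/dev/nfs netboot=nfs nfsroot=$gw:/zssd/nfsroot/$host toram ip=dhcp panic=60 BOOTIF=$iface
EOF
}
```

Usage: `pxe_label "$(uname -r)" dsfe01`. Paste the output into pxelinux.cfg/default.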
setup nfs
- apt-get install nfs-kernel-server
- emacs -nw /etc/exports
/zssd/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
- enable services
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
- after editing /etc/exports, run
exportfs -av
setup userland
- zfs create zssd/nfsroot
- zfs set dedup=verify zssd/nfsroot ### enable deduplication to save disk space because most linux images have mostly identical files
- clone ubuntu
mkdir /zssd/nfsroot/dsfe01
cd /
rsync -avx . /zssd/nfsroot/dsfe01
- edit config files:
- cd /zssd/nfsroot/dsfe01
- emacs -nw etc/hostname ### change to dsfe01
- emacs -nw etc/mailname ### change to dsfe01
- emacs -nw etc/yp.conf ### change daq00.triumf.ca to musr00.triumf.ca
- emacs -nw etc/defaultdomain ### change to MUSR-NIS
- cp -pvf ../lxcpet-SL610/etc/ssh/*key* etc/ssh/ ### preserve the ssh keys
- emacs -nw opt/gonodeinfo/gonodeinfo.conf ### update information
- emacs -nw root/.ssh/authorized_keys ### update root ssh keys
- emacs -nw etc/fstab ### add this
192.168.1.1:/zssd/nfsroot/dsfe01 / nfs defaults,nolock 0 0
- emacs -nw etc/chrony/chrony.conf
- comment-out all "pool" and "server" entries
- add entry "server 192.168.1.1 iburst"
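The chrony.conf edits above can also be done non-interactively. The following sketch uses sed and assumes a stock Ubuntu chrony.conf, where every time-source line starts with "pool" or "server". It is meant to be run once on a freshly cloned image. The function name is ours:

```bash
# Sketch: comment out all "pool"/"server" lines and make the gateway the only
# time source. The path defaults to the dsfe01 image; run once per image.
point_chrony_at_gateway() {
    local conf="${1:-/zssd/nfsroot/dsfe01/etc/chrony/chrony.conf}" gw="${2:-192.168.1.1}"
    sed -i -E 's/^(pool|server)[[:space:]]/#&/' "$conf"
    echo "server $gw iburst" >> "$conf"
}
```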
After dsfe01 is booted:
- disable services:
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache
To set up additional machines, clone dsfe01 instead of cloning the gateway machine.
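The following sketch shows one way to make such a clone. The helper name and arguments are ours. It copies the dsfe01 image and patches etc/hostname, etc/mailname, and the NFS root path in etc/fstab. Because deduplication is enabled on zssd/nfsroot, the copy should take little extra space. The gateway still needs /etc/exports, /etc/hosts, dnsmasq dhcp-host, and netgroup entries for the new host, and the clone shares dsfe01's ssh host keys unless you regenerate them:

```bash
# Sketch (hypothetical helper): clone one diskless root image into another
# and patch the per-host files edited above.
# Usage: clone_frontend dsfe01 dsfe02 [/zssd/nfsroot]
clone_frontend() {
    local src="$1" dst="$2" base="${3:-/zssd/nfsroot}"
    if [ ! -d "$base/$src" ]; then
        echo "no such image: $base/$src" >&2; return 1
    fi
    if [ -e "$base/$dst" ]; then
        echo "refusing to overwrite $base/$dst" >&2; return 1
    fi
    cp -a "$base/$src" "$base/$dst"          # preserves owners, modes, links
    echo "$dst" > "$base/$dst/etc/hostname"
    echo "$dst" > "$base/$dst/etc/mailname"
    # Point the NFS root mount in fstab at the new image directory.
    sed -i "s|/nfsroot/$src |/nfsroot/$dst |" "$base/$dst/etc/fstab"
}
```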
Allow manpages to be viewed
If / is mounted over NFS, man will report a permission error. Fix it with:
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man
on the gateway machine
- define netgroups
- emacs -nw /etc/netgroup
dsfe (dsfe01,,) (dsfe02,,)
- emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
netgroup: files
- export the home directories:
- emacs -nw /etc/exports ### add this:
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
- exportfs -rc
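Each new frontend adds one (host,user,domain) triple to the netgroup line. The following trivial sketch builds that line, leaving the user and domain fields empty as above. The function name is ours. With "netgroup: files" in nsswitch.conf, `getent netgroup dsfe` shows what the system sees:

```bash
# Sketch: build an /etc/netgroup line from a group name and host names.
# Usage: make_netgroup dsfe dsfe01 dsfe02
make_netgroup() {
    local group="$1" line h
    shift
    line="$group"
    for h in "$@"; do
        line="$line ($h,,)"                  # (host,user,domain) with empty user/domain
    done
    echo "$line"
}
```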
on the frontend machine
- mkdir /home
- emacs -nw /etc/fstab ### add this:
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
- mount -a
setup NAT
NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
In these examples:
- replace "eno1" with name of the outgoing interface (the one connected to the TRIUMF network).
- replace "enp11s0" with name of the private network interface (192.168.1.x network)
- emacs -nw /etc/rc.local ### add this, then make it executable with "chmod +x /etc/rc.local" (systemd only runs /etc/rc.local if it is executable):
#!/bin/sh
# /etc/rc.local
/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v
# uncomment following lines if machine has prohibitive FORWARD rules:
#/sbin/iptables -I FORWARD -i eno1 -o enp11s0 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -I FORWARD -i enp11s0 -o eno1 -j ACCEPT
#iptables -L -v
iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward
sh /etc/firewall-rfc1918.sh
# end
- emacs -nw /etc/firewall-rfc1918.sh
# firewall-rfc1918.sh
# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.
ETH=eno1
# -F fails on the first run (no chain yet) and -N fails on later runs
# (chain already exists); both errors are harmless.
iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8 -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12 -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16 -j REJECT
# remove jumps left over from previous runs, then insert one
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH
iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8 -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12 -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16 -j REJECT
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH
iptables -L -v
#end
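firewall-rfc1918.sh rejects the three RFC 1918 private ranges on the uplink. As a quick reference for which addresses those rules match, here is a bash sketch of the same range test. The function name is ours:

```bash
# Sketch: succeed if an IPv4 address is in 10.0.0.0/8, 172.16.0.0/12
# (172.16.x.x to 172.31.x.x) or 192.168.0.0/16.
is_rfc1918() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    [ "$a" -eq 10 ] && return 0
    [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
    [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0
    return 1
}
```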
KVM
apt install cpu-checker
root@daq13:~# kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
(if not, shut down, go into the BIOS settings and enable CPU virtualization)
apt install virtinst ### will install many packages
apt install libvirt-clients libvirt-daemon-system-systemd libvirt-daemon qemu qemu-kvm libvirt-daemon-system virtinst bridge-utils
root@daq13:/home1/wheel# virsh list --all
 Id   Name           State
------------------------------
 1    ubuntu-guest   running
apt install virt-manager
virt-install --name ubuntu-guest --os-variant ubuntu20.04 --vcpus 2 --ram 2048 --location /daq/daqstore/olchansk/linux/Ubuntu/ubuntu-20.04.3-desktop-amd64.iso --network bridge=virbr0,model=virtio --graphics none --extra-args='console=ttyS0,115200n8 serial'
The virtual machine will start and boot on the serial console. To get out of the console, press Ctrl+] (the virsh console escape key).
ssh wheel@daq13
virt-manager
Run virt-install again without "--graphics none" and open the graphics console from virt-manager: it boots into the Ubuntu installer desktop.
virt-install --name test10 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --filesystem /kvm_ladd00,/ --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial" --graphics none
virt-install --name test14 --os-variant centos6.10 --vcpus 2 --ram 2048 --import --disk /tmp/xxx/ladd00.img,bus=sata --network bridge=virbr0,model=virtio --boot kernel=/kvm_ladd00/boot/vmlinuz-2.6.32-754.35.1.el6.x86_64,initrd=/kvm_ladd00/boot/initramfs-2.6.32-754.35.1.el6.x86_64.img,kernel_args="root=/dev/sda console=ttyS0,115200n8 serial rdshell" --graphics none --check path_in_use=off
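One of the things kvm-ok checks is whether the CPU advertises hardware virtualization: the vmx flag on Intel, the svm flag on AMD in /proc/cpuinfo. The following is a rough sketch of just that part. A CPU without the flag cannot run KVM. The flag alone does not prove that virtualization is enabled in the BIOS, which kvm-ok also verifies. The function name is ours:

```bash
# Sketch: succeed if the CPU flags include vmx (Intel VT-x) or svm (AMD-V).
# The cpuinfo path argument exists only for testing.
has_hw_virt() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    grep -Eqw 'vmx|svm' "$cpuinfo"
}
```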
build image
dd if=/dev/zero of=/tmp/xxx/ladd00.img bs=1024M count=20
mkfs.ext3 /tmp/xxx/ladd00.img ### the SL6 kernel fails to mount ext4 here ("unknown ext4 options")
cd /kvm_ladd00/
mount -o loop /tmp/xxx/ladd00.img /mnt/tmp
rsync -av . /mnt/tmp/ --delete
umount /mnt/tmp
on the guest, configure network: /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local
ifconfig eth2 192.168.122.2
route add -net 0.0.0.0 gw 192.168.122.1
ifconfig -a
netstat -rn
# end
ARM cross-compiler
apt install libgcc-9-dev-arm64-cross
apt install gcc-arm-linux-gnueabi
apt install gcc-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabihf
apt install g++-arm-linux-gnueabi
arm-linux-gnueabi-gcc -o ttcp1 ttcp.c -march=armv7 -static
arm-linux-gnueabi-gcc -o memcpy.armv7 memcpy.cc -march=armv7 -static -O2
32-bit intel cross-compiler
Ubuntu 22.04:
apt install libstdc++-11-dev:i386
apt install zlib1g-dev:i386
NOTE: "g++ -m32" does not find libstdc++, please use "g++ -m32 -L/usr/lib/gcc/i686-linux-gnu/11/"
To cross-build 32-bit MIDAS, use "make linux32".
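To confirm that the ARM or 32-bit Intel cross-compilers above produced a binary for the intended target, `file` gives the answer. The following sketch gets it directly from the ELF header: EI_CLASS (byte 4) gives 32- or 64-bit, and e_machine (bytes 18-19) gives the CPU. It assumes a little-endian ELF file, which is what the arm-linux-gnueabi(hf) and i386 toolchains produce. The function name is ours:

```bash
# Sketch: print "<class> <machine>" for an ELF binary, e.g. "32-bit ARM".
elf_target() {
    local f="$1" class lo hi machine
    if [ "$(head -c 4 "$f")" != $'\x7fELF' ]; then
        echo "not an ELF file: $f" >&2
        return 1
    fi
    class=$(od -An -tu1 -j4 -N1 "$f" | tr -d ' ')
    read -r lo hi <<< "$(od -An -tu1 -j18 -N2 "$f")"
    machine=$(( lo + 256 * hi ))            # little-endian e_machine
    case "$class" in
        1) class=32-bit ;;
        2) class=64-bit ;;
        *) class=unknown ;;
    esac
    case "$machine" in
        3)   machine=i386 ;;
        40)  machine=ARM ;;
        62)  machine=x86-64 ;;
        183) machine=AArch64 ;;
        *)   machine="e_machine=$machine" ;;
    esac
    echo "$class $machine"
}
```

Usage: `elf_target ttcp1` should report `32-bit ARM` for the ARM build above.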