Ubuntu
Revision as of 16:12, 19 February 2021
About Ubuntu
AAA
Ubuntu version
lsb_release -a
uname -a
Ubuntu installer
- updated for Ubuntu LTS 20.04.1
- download the latest Ubuntu LTS desktop installer iso image
- dd the image to a USB key
- power down, disconnect all disks (all HDDs, all SSDs, all M.2)
- connect the SSD to be used as system disk
- if system will use mirrored SSDs (using ZFS mirror), leave second SSD disconnected, we will activate it later
- power up
- boot from USB key in legacy mode or UEFI mode (select this in the BIOS boot menu - F8 for ASUS, F11 for Supermicro)
- follow the instructions:
- "try ubuntu or install ubuntu" - choose "install"
- select language - accept default
- "updates and other software" - accept default settings ("normal install")
- "installation type" - select "advanced features" and "experimental: use ZFS"
- accept partition choice
- "where are you?" - select "Vancouver" (PST time zone)
- "who are you?" - leave all fields blank, except "username" set to "wheel", "password" set to the root password. hostname will be set later after configuring the network
- installation runs in a few minutes, when finished, reboot
- login as user wheel
- answer annoying questions:
- "livepatch" - say "next"
- "help improve" - select "do not send", say "next"
- "privacy" - leave "location" as "off", say "next"
- "ready to go", say "done"
- right-click on the desktop, say "open in terminal", a shell will open
- say "sudo /bin/bash", enter the root password, you now have the root shell
- run nm-connection-editor to configure the network
- after network is up (can ping ladd00), continue with post-installation steps below
Install instructions
prepare
apt-get update
apt-get upgrade
install ssh
apt install ssh
configure hostname
vi /etc/hostname
disable swap
The Ubuntu installer creates a 2 GB swap partition. This is not useful on a machine with 32-64 GB of RAM, so disable it:
vi /etc/fstab ### comment out the "swap" line
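The fstab edit can also be scripted. This is a minimal sketch, not part of the original procedure: it assumes the swap entry is an ordinary uncommented line whose third field (filesystem type) is "swap", and the function name comment_out_swap is made up for this example. The original file is kept as FILE.bak.

```shell
# comment_out_swap FILE: put "#" in front of every active (uncommented)
# fstab entry whose filesystem type, the 3rd field, is "swap".
# The original file is kept as FILE.bak.
comment_out_swap() {
    local fstab="$1"
    cp -p "$fstab" "$fstab.bak"
    awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
}
```

Usage, as root: comment_out_swap /etc/fstab, then swapoff -a to turn swap off right away instead of waiting for the next reboot.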
maybe reboot
this is a good point to reboot the machine to boot the latest kernel and to set the correct hostname
install time synchronization
apt-get -y install chrony
echo server time1 iburst >> /etc/chrony/chrony.conf
echo server time2 iburst >> /etc/chrony/chrony.conf
echo server time3 iburst >> /etc/chrony/chrony.conf
systemctl disable systemd-timesyncd.service
systemctl stop systemd-timesyncd.service
systemctl disable ntp
systemctl stop ntp
systemctl enable chrony
systemctl restart chrony
chronyc sources
chronyc tracking
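The echo >> commands above add duplicate server lines if this step is repeated. A sketch of an idempotent variant that only appends servers not yet listed; the function name add_chrony_servers is hypothetical, and time1, time2, time3 are the same server names used above.

```shell
# add_chrony_servers CONF SERVER...: append "server NAME iburst" for each
# SERVER that CONF does not already contain as a "server" line, so the
# step can be re-run safely.
add_chrony_servers() {
    local conf="$1"; shift
    local s
    for s in "$@"; do
        # awk exits 0 only if a line "server <s> ..." already exists
        if ! awk -v s="$s" '$1 == "server" && $2 == s { found = 1 } END { exit !found }' "$conf"; then
            echo "server $s iburst" >> "$conf"
        fi
    done
}
```

Usage: add_chrony_servers /etc/chrony/chrony.conf time1 time2 time3, then systemctl restart chrony and chronyc sources as above.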
install email server
dpkg-reconfigure postfix ### or apt-get install postfix
### select "satellite system", enter full hostname "xxx.triumf.ca", enter "smtp.triumf.ca"
echo olchansk@triumf.ca >> ~root/.forward
apt-get install -y mailutils
mailx root
test
^D
install missing packages
(apt consumes terminal input, so even the "yes |" trick does not always work; repeat the following commands until apt reports that everything is already installed)
yes | apt-get -y install lsb ssh tcsh ethtool
yes | apt-get -y install git subversion g++ cmake
yes | apt-get -y install libz-dev sqlite sqlite3 libsqlite3-dev libmysqlclient-dev unixodbc-dev
yes | apt-get -y install sqliteman
yes | apt-get -y install libssl-dev
yes | apt-get -y install sysstat smartmontools lm-sensors # also installs postfix
yes | apt-get -y install emacs xemacs21
yes | apt-get -y install mutt bsd-mailx # email clients
yes | apt-get -y install liblz4-tool pbzip2
yes | apt-get -y install libc6-dev-i386 # otherwise no /usr/include/sys/types.h
yes | apt-get -y install libreadline-dev
yes | apt-get -y install chromium-browser chromium-codecs-ffmpeg-extra
yes | apt-get -y install ubuntu-mate-themes
yes | apt-get -y install minicom
yes | apt-get -y install screen
yes | apt-get -y install rsync strace net-tools
yes | apt-get -y install xfig gsfonts-x11 gsfonts-other # install fonts for xfig
yes | apt-get -y install time # /usr/bin/time
yes | apt-get -y install libgsl-dev # additional GNU Scientific Library
yes | apt-get -y install linux-tools-common linux-tools-generic linux-tools-$(uname -r) # cpupower frequency-info, tools for the running kernel
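Instead of re-running the whole list until apt stops complaining, the packages that are still missing can be listed with dpkg-query. A sketch; missing_packages is a made-up helper name, and it prints nothing when every package given is installed.

```shell
# missing_packages PKG...: print each package that dpkg does not report
# as "install ok installed". Unknown packages are printed too.
missing_packages() {
    local p status
    for p in "$@"; do
        # ${Status} is a dpkg-query format field, not a shell variable
        status=$(dpkg-query -W -f='${Status}\n' "$p" 2>/dev/null || true)
        case "$status" in
            *"install ok installed"*) ;;   # installed: print nothing
            *) echo "$p" ;;
        esac
    done
}
```

For example, apt-get -y install $(missing_packages tcsh ethtool cmake) installs only what is still missing. Setting DEBIAN_FRONTEND=noninteractive in the environment of apt-get suppresses debconf questions and may make the "yes |" trick unnecessary; this has not been tested with this package list.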
install git/scripts
mkdir ~root/git
cd ~root/git
git clone https://ladd00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull
install ganglia
yes | apt-get -y install ganglia-monitor
systemctl enable ganglia-monitor
cd ~root/git/scripts
git pull
cp etc/gmond-ubuntu.conf /etc/ganglia/gmond.conf
systemctl restart ganglia-monitor
systemctl status ganglia-monitor
ps -efw | grep gmond
cd ~root/git/scripts/ganglia
make install
./ganglia-all.perl
install gonodeinfo
- go to https://bitbucket.org/dd1/gonodeinfo and follow the instructions:
yes | apt-get -y install golang
mkdir ~/git
cd ~/git
git clone https://bitbucket.org/dd1/gonodeinfo.git
cd gonodeinfo
git pull
make
make install # install gonodeinfo agent
cd ~ # this is important
- edit /etc/gonodeinfo.conf
- change "Description", "Location", "User" and "Administrator" as appropriate (or delete them)
- change "Servers" to read: Servers: ladd00.triumf.ca:8601
- run gonodeinfo
- if the error is "connection refused", go to the nodeinfo server to add this client to the access control list:
- on the gonodeinfo server: run gonodereceive -a daq13
- try gonodeinfo again, there should be no error
- on the gonodeinfo server: run gonodereport, look at the web pages, the new machine should be listed now
install libz.so.1 for CentOS compatibility
yes | apt-get -y install zlib1g
yes | apt-get -y install zlib1g:i386 libc6:i386 libgcc1:i386 gcc-6-base:i386
install libpng12.so.0 for Quartus compatibility
(does not work anymore!!!)
wget http://ftp.ca.debian.org/debian/pool/main/libp/libpng/libpng12-0_1.2.50-2+deb8u2_amd64.deb
dpkg --install libpng12-0_1.2.50-2+deb8u2_amd64.deb
install packages for building ROOT
apt-get -y install libx11-dev libxpm-dev libxft-dev libxext-dev libpng-dev libjpeg-dev xlibmesa-glu-dev libxml2-dev libgsl-dev cmake
install desktop environments
- install MATE desktop
yes | apt-get -y install ubuntu-mate-core ubuntu-mate-desktop
yes | apt-get -y install ubuntu-mate-themes
- install Cinnamon desktop
### not needed since 18.04 LTS ### add-apt-repository ppa:embrosyn/cinnamon
yes | apt update
yes | apt-get -y install cinnamon
- install KDE desktop
yes | apt-get -y install kubuntu-desktop
- install Lxqt desktop
yes | apt-get -y install lxqt
- install Xfce4 desktop
yes | apt-get -y install xfce4
install ROOT
Please install ROOT per instructions at http://root.cern.ch.
NOTE1: The ROOT package available from Ubuntu repositories is severely out of date and cannot be used with MIDAS and ROOTANA.
### DO NOT DO THIS! apt-get install root-system
NOTE2: as of 2017-Jan-09, the ROOT binary kits for Ubuntu do not work (they use GCC 5 instead of GCC 6); build from source instead.
Install x2go
x2go instructions, thanks to Art O.
add-apt-repository ppa:x2go/stable
apt-get update
apt-get install x2goserver x2goserver-xsession
Post installation
- setup hostname
xemacs -nw /etc/hostname ### add .triumf.ca to the hostname if it is missing
- install Konstantin's scripts
mkdir ~root/git
cd ~root/git
git clone https://ladd00.triumf.ca/~olchansk/git/scripts.git
cd scripts
git pull
- enable root login from ladd00
ssh localhost ### creates ~root/.ssh, press CTRL-C at the prompt
/bin/cp ~root/git/scripts/etc/authorized_keys ~root/.ssh/
- install smart-status
ln -s ~/git/scripts/smart-status/smart-status.perl .
- install ganglia additional data
cd ~/git/scripts/ganglia
make
- enable automatic updates 1
apt-get install unattended-upgrades
xemacs -nw /etc/apt/apt.conf.d/50unattended-upgrades ### in Allowed-Origins, uncomment the "-security" and "-updates" lines; uncomment or add: Unattended-Upgrade::Mail "root";
- enable automatic updates 2
add the following lines to /etc/apt/apt.conf.d/10periodic (xemacs -nw /etc/apt/apt.conf.d/10periodic):
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
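The same 10periodic settings can be written non-interactively. A sketch; write_apt_periodic is a hypothetical name, and note that it replaces the whole file rather than adding to it.

```shell
# write_apt_periodic FILE: write the four APT::Periodic settings listed
# above into FILE, replacing its previous contents.
write_apt_periodic() {
    cat > "$1" <<'EOF'
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "1";
APT::Periodic::AutocleanInterval "7";
APT::Periodic::Unattended-Upgrade "1";
EOF
}
```

Run it as write_apt_periodic /etc/apt/apt.conf.d/10periodic. Since apt merges every file in /etc/apt/apt.conf.d, apt-config dump | grep Periodic shows the values that are actually in effect.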
IPMI instructions
IPMI is the board management hardware on Supermicro and other server motherboards. This includes hardware sensors - fan rotation speed, temperatures and power supply voltages.
apt-get install ipmitool
systemctl enable ipmievd
systemctl restart ipmievd
Run:
- ipmitool sel list ### event list
- ipmitool sel elist ### extended event list, with sensor names
- ipmitool sel clear ### clear event list (if it becomes full)
- ipmitool sensor ### report hardware sensors
NIS instructions
- apt-get -y install portmap nis ### will ask for NIS domain (LADD-NIS)
- dpkg-reconfigure nis ### reconfigure if already installed
- ypwhich -m
- edit /etc/default/nis
- set "NISSERVER=slave"
- set "YPSERVARGS=-p800"
- on Ubuntu LTS 20.04, check that "YPBINDARGS=" is blank; remove "-no-dbus" if it is there
- edit /etc/yp.conf, comment-out everything, add "domain LADD-NIS server localhost"
- /usr/lib/yp/ypinit -s ladd00
- systemctl enable nis
- systemctl restart nis
- ypwhich -m
- ypcat -k passwd
- apt-get -y install autofs
- systemctl enable autofs
- vi /etc/nsswitch.conf ### add the automount line, modify the passwd, group and shadow lines to read this:
# begin get data from nis
passwd: files nis
group: files nis
shadow: files nis
automount: files nis
netgroup: files nis
# end get data from nis
- systemctl restart autofs
- enable hourly update of NIS maps
cd ~/git/scripts/etc
git pull
ln -s $PWD/ypxfr-cron-hourly /etc/cron.hourly
- ### NOT NEEDED sudo vi /etc/idmapd.conf ### add line: "Domain = triumf.ca"
- reboot
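The /etc/nsswitch.conf edit from the list above can be sketched as a function (hypothetical name set_nsswitch_nis). It rewrites the passwd, group, shadow, automount and netgroup lines to "files nis" and appends any that are missing. It assumes GNU sed (sed -i), as on Ubuntu; keep a backup of the file first.

```shell
# set_nsswitch_nis FILE: make the passwd, group, shadow, automount and
# netgroup databases read "files nis"; missing lines are appended.
# Other lines (hosts, etc.) are left unchanged.
set_nsswitch_nis() {
    local conf="$1" db
    for db in passwd group shadow automount netgroup; do
        if grep -q "^${db}:" "$conf"; then
            sed -i "s/^${db}:.*/${db}: files nis/" "$conf"
        else
            echo "${db}: files nis" >> "$conf"
        fi
    done
}
```

After cp /etc/nsswitch.conf /etc/nsswitch.conf.orig and set_nsswitch_nis /etc/nsswitch.conf, getent passwd should list the NIS users as well as the local ones.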
Fix systemd NIS breakage
!!! THIS IS NOT NEEDED FOR UBUNTU LTS 20.04 !!!
Symptom: ssh logins by normal users are delayed; "ssh -v" shows that the delay happens after "pledge...". This fix removes the delay.
systemd developers think that we should not use NIS and made sure there are problems if we do. To give them credit, they do offer a workaround. Read this: https://github.com/poettering/systemd/commit/695fe4078f0df6564a1be1c4a6a9e8a640d23b67
mkdir /etc/systemd/system/systemd-logind.service.d
echo -e "[Service]\nIPAddressDeny=\n" > /etc/systemd/system/systemd-logind.service.d/local.conf
systemctl daemon-reload
systemctl cat systemd-logind.service
Install sddm display manager (DO NOT DO THIS)
- apt-get install sddm
- apt-get install sddm-theme-"*"
- create sddm.conf:
root@daqubuntu:~# more /etc/sddm.conf
[Theme]
Current=maldives
root@daqubuntu:~#
- dpkg-reconfigure lightdm (select sddm)
- reboot
Configure lightdm display manager
- enable it
systemctl disable gdm
systemctl disable sddm
systemctl enable lightdm
- make MATE the default desktop
cd ~root/git/scripts/
git pull
/bin/cp -v etc/lightdm_default_mate.conf /etc/lightdm/lightdm.conf.d/
- enable login by NIS users
/bin/cp -v etc/lightdm_enable_nis_login.conf /etc/lightdm/lightdm.conf.d/
- restart lightdm
systemctl restart lightdm
Install libpng12.so.0
Quartus 16 needs libpng12:
wget http://mirrors.kernel.org/ubuntu/pool/main/libp/libpng/libpng12-0_1.2.54-1ubuntu1_amd64.deb
dpkg --install libpng12-0_1.2.54-1ubuntu1_amd64.deb
Install google-chrome
Instructions from here: https://www.ubuntuupdates.org/ppa/google_chrome?dist=stable
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | apt-key add -
sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
apt-get update
apt-get install google-chrome-stable
Install amanda client
- apt install amanda-client
- edit /etc/amandahosts
amanda.triumf.ca amanda amdump
- check permissions on /etc/amandahosts:
root@daq00:/var/log/amanda# ls -l /etc/amandahosts
-rw------- 1 backup backup 49 Jan 27 10:48 /etc/amandahosts
- fix if needed: chown backup:backup /etc/amandahosts; chmod 600 /etc/amandahosts
- edit /etc/amanda-security.conf, add this line:
runtar:gnutar_path=/usr/bin/tar
On the amanda machine:
- in amanda disklist, use dump type "bsdtcp-comp-user-tar"
- su - amanda and run amcheck -c daily daq00 (here "daily" is the amanda configuration name)
-bash-4.1$ amcheck -c daily daq00
Amanda Backup Client Hosts Check
--------------------------------
Client check: 1 host checked in 0.092 seconds. 0 problems found.
(brought to you by Amanda 3.3.7p1.git.685ff76d)
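The ownership and mode check for /etc/amandahosts can be scripted with GNU stat. A sketch; the helper name check_file_mode is made up, and the expected value "backup:backup 600" is taken from the listing above.

```shell
# check_file_mode FILE "OWNER:GROUP MODE": compare FILE's owner, group and
# octal mode (as printed by GNU stat) with the expected string.
# Returns 0 if they match, 1 if not, 2 if FILE cannot be examined.
check_file_mode() {
    local f="$1" want="$2" have
    have=$(stat -c '%U:%G %a' "$f") || return 2
    if [ "$have" != "$want" ]; then
        echo "$f: have '$have', want '$want'" >&2
        return 1
    fi
}
```

check_file_mode /etc/amandahosts "backup:backup 600" prints nothing and returns 0 when the file is correct; otherwise it prints what it found and returns 1.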
Enable rc.local
For reasons unknown, Ubuntu LTS 20.04 does not enable /etc/rc.local. Do this:
- create /etc/rc.local with this content
#!/bin/bash
exit 0
- chmod a+rx /etc/rc.local
- create systemd service file /etc/systemd/system/rc-local.service
[Unit]
Description=/etc/rc.local Support
ConditionPathExists=/etc/rc.local
After=network.target nis.service autofs.service

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
- systemctl daemon-reload
- systemctl enable rc-local
- systemctl start rc-local
- systemctl status rc-local
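The rc.local steps above can be collected into one function. A sketch with a made-up name, install_rc_local; it does not overwrite an existing /etc/rc.local, and the optional ROOT argument exists only so the function can be tried on a scratch directory.

```shell
# install_rc_local [ROOT]: create ROOT/etc/rc.local (only if missing) and
# ROOT/etc/systemd/system/rc-local.service with the unit shown above.
# Leave ROOT empty to write to the real /etc.
install_rc_local() {
    local root="${1:-}"
    mkdir -p "$root/etc/systemd/system"
    if [ ! -e "$root/etc/rc.local" ]; then
        printf '#!/bin/bash\nexit 0\n' > "$root/etc/rc.local"
    fi
    chmod a+rx "$root/etc/rc.local"
    cat > "$root/etc/systemd/system/rc-local.service" <<'EOF'
[Unit]
Description=/etc/rc.local Support
ConditionPathExists=/etc/rc.local
After=network.target nis.service autofs.service

[Service]
Type=forking
ExecStart=/etc/rc.local start
TimeoutSec=0
StandardOutput=tty
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
}
```

As root: install_rc_local, then systemctl daemon-reload and systemctl enable --now rc-local.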
Disable unwanted services
systemctl disable mpd
systemctl disable snapd
systemctl disable ModemManager
Disable sleep and suspend
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target systemd-suspend.service systemd-hybrid-sleep.service
Install apache httpd proxy for midas and elog
This will configure the HTTPS/SSL certificate using "certbot" and "letsencrypt" and configure an HTTPS web server using apache2.
First, configure apache2:
- execute these commands:
apt install apache2
cd /etc/apache2
- create new file conf-available/ssl-daq14.conf # use actual hostname instead of daq14
SSLSessionCache shmcb:/run/httpd/sslcache(512000)
SSLSessionCacheTimeout 300
SSLRandomSeed startup file:/dev/urandom 256
SSLRandomSeed connect builtin
SSLCryptoDevice builtin
- create new file sites-available/daq14-ssl.conf # use actual hostname instead of daq14
<IfModule mod_ssl.c>
<VirtualHost *:443>
  ServerName daq14.triumf.ca
  DocumentRoot /var/www/html
  ErrorLog /var/log/apache2/daq14.log
  SSLEngine on
  # note SSLProtocol, SSLCipherSuite and some other settings are overwritten by /etc/letsencrypt/options-ssl-apache.conf
  SSLProtocol all -SSLv2 -SSLv3 -TLSv1 -TLSv1.1
  SSLCipherSuite HIGH:MEDIUM:!aNULL:!MD5:!SEED:!IDEA:!RC4
  ## use port specified in elogd.cfg
  #ProxyPass /elog/ http://localhost:8082/ retry=1
  ## use mhttpd port
  #ProxyPass / http://localhost:8080/ retry=1
  Header always set Strict-Transport-Security "max-age=31536000; includeSubDomains"
  <Location />
    SSLRequireSSL
    AuthType Basic
    AuthName "DAQ password protected site"
    Require valid-user
    # create password file: touch /etc/apache2/htpasswd
    # to add new user or change password: htpasswd /etc/apache2/htpasswd username
    AuthUserFile /etc/apache2/htpasswd
  </Location>
</VirtualHost>
</IfModule>
- stop apache2 from listening on port 80: edit /etc/apache2/ports.conf, comment-out the line "Listen 80"
- enable ssl module
- enable new configurations
a2enmod ssl
a2enmod headers
a2enmod proxy
a2enmod proxy_http
a2enconf ssl-daq14
a2ensite daq14-ssl
- disable default ssl sites
a2dissite 000-default-le-ssl
a2dissite 000-default
ls -l /etc/apache2/sites-enabled/ ### should show only daq14-ssl.conf
- check that there are no syntax problems
apache2ctl configtest
- enable and start apache2:
systemctl enable apache2
systemctl restart apache2
systemctl status apache2
- apache2 may fail to start, look in /var/log/apache2/error.log and /var/log/apache2/daq14.log
- if it says "Failed to configure ... certificate", proceed to the certbot setup below.
- try to access https://daq14.triumf.ca
- you should see a complaint about self-signed certificate
- you should see a request for password (do not login yet)
- if you get "connection refused", HTTPS port 443 may need to be enabled in the local firewall, look at documentation for ufw.
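The "Listen 80" edit in /etc/apache2/ports.conf can be done with sed. A sketch; the function name disable_listen_80 is hypothetical and GNU sed is assumed. It comments out only a line that reads exactly "Listen 80", so "Listen 443" and "Listen 8080" are left alone.

```shell
# disable_listen_80 FILE: comment out a "Listen 80" line (leading
# whitespace allowed, nothing after the port number) in FILE.
disable_listen_80() {
    sed -i -E 's/^([[:space:]]*Listen[[:space:]]+80)[[:space:]]*$/#\1/' "$1"
}
```

Usage: disable_listen_80 /etc/apache2/ports.conf, then apache2ctl configtest and systemctl restart apache2.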
Second, configure certbot:
(Note: as of 2018-01-18 certbot requires use of http port 80 to get the initial https certificate, renewal can continue to use the https port 443)
(Note: as of 2019-01-?? certbot requires use of port 80 for renewals)
- check that port 80 is not used by anything:
- netstat -an | grep LISTEN | grep ^tcp | grep ':80 '
- lsof -nP -iTCP:80 -sTCP:LISTEN
- if lsof reports that apache2 is listening on port 80, follow the apache2 instructions above (comment out "Listen 80" in /etc/apache2/ports.conf)
- install certbot (if necessary open tcp port 80 in the firewall, see documentation for ufw):
apt install certbot python3-certbot-apache
certbot certonly --standalone --installer apache
- then answer questions:
- "activate HTTPS for daq14.triumf.ca" - say ok
- "enter email address" - enter your own email address
- "please read terms..." - read the terms and say "agree"
- it will take a few moments...
- "congratulations..." - say ok.
certbot install --apache --cert-name daq14.triumf.ca
- then answer questions:
- "choose redirect..." - say "1" (no redirect)
- look inside /etc/apache2/sites-enabled/daq14-ssl.conf to see that SSLCertificateFile & co. point to the certbot certificates in /etc/letsencrypt/live/daq14.triumf.ca/
- to check current renewal and to update the certbot config file in /etc/letsencrypt/renewal, run this:
certbot renew --standalone --installer apache --force-renewal
NOTE: this certificate will expire in 3 months, automatic renewal should work with current version of certbot
Third, activate password protection:
- as shown in the config file above, create password file and initial user: (replace "midas" with specific username)
touch /etc/apache2/htpasswd
htpasswd /etc/apache2/htpasswd midas
- restart apache2
systemctl restart apache2
systemctl status apache2
From here:
- enable proxy for MIDAS mhttpd - uncomment the "ProxyPass /" line in the config file above
- enable proxy for ELOG - uncomment the "ProxyPass /elog/" line
a2enmod proxy
a2enmod proxy_http
apache2ctl configtest
systemctl restart apache2
- try accessing MIDAS https://daq14.triumf.ca/ (make sure mhttpd is running)
- if it's not working, check odb setting FIXME!
- try accessing ELog https://daq14.triumf.ca/elog/ (make sure elogd is running)
- if it's not working, check elogd.cfg file and make sure
SSL = 0
NOTE: if certbot fails with errors about 'module' object has no attribute 'pyopenssl', try this: pip install requests==2.6.0
Install PHP
- apt install php libapache2-mod-php
- systemctl restart apache2
- create /var/www/html/info.php
<?php phpinfo();
Update packages
- apt-get update # update package list
- apt-get dist-upgrade # install updated packages and update "kept back" packages
- apt-get autoremove # remove packages that apt thinks should be removed
Finish installation
- reboot
shutdown -r now
Install ZFS
!!! after installing all the packages, after updating the system, after updating the linux kernel, after rebooting into latest kernel !!!
apt-get install zfsutils-linux
Follow generic ZFS instructions: ZFS
Update to new version of Ubuntu
vi /etc/update-manager/release-upgrades # set "Prompt=normal"
do-release-upgrade
Ubuntu package manager
- apt-get install xxx # install package xxx
- apt-get update
- apt-get upgrade
- apt-get dist-upgrade
- apt-get autoremove # remove automatically installed packages required by a removed package
- apt-get remove xxx # remove package xxx
- apt-cache search . # list all available packages
- apt-cache show "." | grep ^Package # list all available packages
- apt-cache madison root-system # show all available versions of package root-system
- apt list # list all available packages (installed ones are marked [installed])
- dpkg --listfiles libpng16-16 # list all files from this package
- apt list --installed # list all installed packages
Ubuntu grub boot loader
This will enable the grub menu (with a 10 sec timeout) and replace black screen with exciting linux boot messages.
- edit /etc/default/grub
GRUB_DEFAULT=0
#GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
#GRUB_CMDLINE_LINUX_DEFAULT="vga=769 video=640x480"
GRUB_CMDLINE_LINUX=""
#GRUB_GFXMODE=640x480
- update grub config:
grub-mkconfig -o /boot/grub/grub.cfg
To boot from ZFS:
- apt install zfs-initramfs
More grub maintenance commands:
- update-initramfs -v -u
- grub-install /dev/sda
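The /etc/default/grub changes above (menu shown for 10 seconds, no hidden menu, no kernel options that hide the boot messages) can be applied with sed. A sketch; set_grub_menu is a hypothetical name and GNU sed is assumed. On a stock Ubuntu install GRUB_CMDLINE_LINUX_DEFAULT holds "quiet splash", which is what hides the messages, so the sketch comments that line out as in the file shown above.

```shell
# set_grub_menu FILE [SECONDS]: comment out GRUB_TIMEOUT_STYLE=hidden,
# set GRUB_TIMEOUT (default 10) and comment out GRUB_CMDLINE_LINUX_DEFAULT.
# GRUB_CMDLINE_LINUX and other lines are not touched.
set_grub_menu() {
    local conf="$1" timeout="${2:-10}"
    sed -i -E \
        -e 's/^GRUB_TIMEOUT_STYLE=hidden/#&/' \
        -e "s/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=${timeout}/" \
        -e 's/^GRUB_CMDLINE_LINUX_DEFAULT=/#&/' \
        "$conf"
}
```

Usage: set_grub_menu /etc/default/grub 10, then grub-mkconfig -o /boot/grub/grub.cfg as above.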
Disable NetworkManager
NetworkManager is useful for configuring dynamic network interfaces, i.e. laptops that often move between networks, or connect to multiple choice of wifi networks, etc.
For machines with statically configured network interfaces, NetworkManager is not necessary.
Because it has been observed to get confused and malfunction when network links go up and down (it keeps reconfiguring the IP address unnecessarily, etc.), it can be useful to disable it.
- list all network interfaces
# /bin/ls -1 /sys/class/net/
enp0s31f6
lo
- edit /etc/network/interfaces:
rename enp0s31f6=eth0
auto eth0
iface eth0 inet static
    address 142.90.120.94/22
    gateway 142.90.100.18
- statically configure systemd-resolved
xemacs -nw /etc/systemd/resolved.conf ### to read this:
XXX
[Resolve]
DNS=142.90.100.19
Domains=triumf.ca
XXX
systemctl restart systemd-resolved
resolvectl
- disable NetworkManager
systemctl disable NetworkManager
- reboot
Configure ECC memory
Configure EDAC
- apt install edac-utils
Intel E-2236
root@daq00:~# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SCM-F
root@daq00:~# edac-ctl --status
edac-ctl: drivers are loaded.
root@daq00:~# edac-util
edac-util: No errors to report.
root@daq00:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
- check edac sysfs files (Intel)
root@daq00:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ce_noinfo_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 max_location
-r--r--r-- 1 root root 4096 Jan 25 15:10 mc_name
drwxr-xr-x 2 root root    0 Jan 25 15:10 power
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank0
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank1
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank2
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank3
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank4
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank5
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank6
drwxr-xr-x 3 root root    0 Jan 25 15:10 rank7
--w------- 1 root root 4096 Jan 25 15:10 reset_counters
-r--r--r-- 1 root root 4096 Jan 25 15:10 seconds_since_reset
-r--r--r-- 1 root root 4096 Jan 25 15:10 size_mb
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_count
-r--r--r-- 1 root root 4096 Jan 25 15:10 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Jan 25 15:10 uevent
root@daq00:~#
Intel E3-1270 v6
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --mainboard
edac-ctl: mainboard: Supermicro X11SSH-F
root@wheel-SYS-5019S-M:~/git/scripts# edac-ctl --status
edac-ctl: drivers are loaded.
root@grsnis01:~# edac-util
edac-util: No errors to report.
root@grsnis01:~# edac-util -s
edac-util: EDAC drivers are loaded. 1 MC detected
root@grsnis01:~# ls -l /sys/devices/system/edac/mc/mc0
total 0
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ce_noinfo_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 max_location
-r--r--r-- 1 root root 4096 Feb 19 12:35 mc_name
drwxr-xr-x 2 root root    0 Feb 19 12:35 power
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank0
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank1
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank2
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank3
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank4
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank5
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank6
drwxr-xr-x 3 root root    0 Feb 19 12:35 rank7
--w------- 1 root root 4096 Feb 19 12:35 reset_counters
-r--r--r-- 1 root root 4096 Feb 19 12:35 seconds_since_reset
-r--r--r-- 1 root root 4096 Feb 19 12:35 size_mb
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_count
-r--r--r-- 1 root root 4096 Feb 19 12:35 ue_noinfo_count
-rw-r--r-- 1 root root 4096 Feb 19 12:35 uevent
root@grsnis01:~#
AMD 3700X
(memory is non-ECC)
root@daq13:~# edac-ctl --mainboard
edac-ctl: mainboard: ASUSTeK COMPUTER INC. ROG STRIX B550-E GAMING
root@daq13:~# edac-ctl --status
edac-ctl: drivers not loaded.
root@daq13:~# edac-util
edac-util: Error: No memory controller data found.
root@daq13:~# edac-util -s
edac-util: EDAC drivers loaded. No memory controllers found
root@daq13:~# ls -l /sys/devices/system/edac/mc
total 0
drwxr-xr-x 2 root root    0 Jan 25 15:26 power
lrwxrwxrwx 1 root root    0 Jan 21 16:16 subsystem -> ../../../../bus/edac
-rw-r--r-- 1 root root 4096 Jan 21 16:16 uevent
Configure rasdaemon
- apt install rasdaemon
- systemctl enable rasdaemon
- systemctl start rasdaemon
- systemctl status rasdaemon
● rasdaemon.service - RAS daemon to log the RAS events
     Loaded: loaded (/lib/systemd/system/rasdaemon.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2021-01-25 15:16:37 PST; 3min 5s ago
   Main PID: 2477175 (rasdaemon)
      Tasks: 1 (limit: 76958)
     Memory: 17.1M
     CGroup: /system.slice/rasdaemon.service
             └─2477175 /usr/sbin/rasdaemon -f -r
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: ras:extlog_mem_event event enabled
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Listening to events for cpus 0 to 11
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: Enabled event ras:extlog_mem_event
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mc_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording aer_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording extlog_event events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording mce_record events
Jan 25 15:16:37 daq00.triumf.ca rasdaemon[2477175]: rasdaemon: Recording arm_event events
Get reports
- Intel 2x32GB ECC DIMMs
root@daq00:~# ras-mc-ctl --layout
          +-------------------------+
          |           mc0           |
          |  csrow0    |  csrow1    |
----------+-------------------------+
channel1: |  16384 MB  |  16384 MB  |
channel0: |  16384 MB  |  16384 MB  |
----------+-------------------------+
root@daq00:~# ras-mc-ctl --error-count
Label                   CE  UE
mc#0csrow#1channel#1    0   0
mc#0csrow#1channel#0    0   0
mc#0csrow#0channel#0    0   0
mc#0csrow#0channel#1    0   0
root@daq00:~#
- Intel 4x16GB ECC DIMMs
root@daq00:~# ras-mc-ctl --error-count
Label                   CE  UE
mc#0csrow#0channel#1    0   0
mc#0csrow#2channel#0    0   0
mc#0csrow#0channel#0    0   0
mc#0csrow#2channel#1    0   0
mc#0csrow#1channel#0    0   0
mc#0csrow#1channel#1    0   0
mc#0csrow#3channel#0    0   0
mc#0csrow#3channel#1    0   0
root@daq00:~# ras-mc-ctl --layout
          +-----------------------+
          |          mc0          |
          |  csrow0   |  csrow1   |
----------+-----------------------+
channel1: |  8192 MB  |  8192 MB  |
channel0: |  8192 MB  |  8192 MB  |
----------+-----------------------+
root@daq00:~# ras-mc-ctl --print-labels
ras-mc-ctl: Error: No dimm labels for Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --mainboard
ras-mc-ctl: mainboard: Supermicro model X11SCM-F
root@daq00:~# ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1181.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1182.
root@daq00:~#
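The corrected (CE) and uncorrected (UE) error counters that appear in the sysfs listings above can also be read directly, for example from a cron job that mails root. A sketch; edac_counts is a made-up name, and it returns 1 when no memory controller is found, as on the non-ECC AMD machine above.

```shell
# edac_counts [DIR]: print "mcN ce=X ue=Y" for every EDAC memory controller
# under DIR (default: /sys/devices/system/edac/mc). Returns 1 if none found.
edac_counts() {
    local base="${1:-/sys/devices/system/edac/mc}"
    local mc found=0
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue    # glob did not match anything
        found=1
        echo "$(basename "$mc") ce=$(cat "$mc/ce_count") ue=$(cat "$mc/ue_count")"
    done
    if [ "$found" -eq 0 ]; then
        echo "no EDAC memory controllers under $base" >&2
        return 1
    fi
}
```

Run without arguments, edac_counts prints one line per memory controller, in the form mc0 ce=0 ue=0.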
sensors
ASUS P9X79 WS
- https://www.asus.com/supportonly/P9X79%20WS/HelpDesk_Manual/
- BIOS version 4802
- modprobe nct6775
- modprobe coretemp
root@daq14:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +29.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +24.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +35.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +82.0°C, crit = +100.0°C)

nouveau-pci-0200
Adapter: PCI adapter
GPU core:    900.00 mV (min = +0.85 V, max = +1.00 V)
temp1:        +39.0°C  (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

nct6776-isa-0290
Adapter: ISA adapter
Vcore:          1.04 V  (min = +0.00 V, max = +1.74 V)
in1:            1.01 V  (min = +0.00 V, max = +0.00 V)  ALARM
AVCC:           3.33 V  (min = +0.00 V, max = +0.00 V)  ALARM
+3.3V:          3.33 V  (min = +0.00 V, max = +0.00 V)  ALARM
in4:            1.01 V  (min = +0.00 V, max = +0.00 V)  ALARM
in5:            2.04 V  (min = +0.00 V, max = +0.00 V)  ALARM
in6:          904.00 mV (min = +0.00 V, max = +0.00 V)  ALARM
3VSB:           3.41 V  (min = +0.00 V, max = +0.00 V)  ALARM
Vbat:           3.30 V  (min = +0.00 V, max = +0.00 V)  ALARM
fan1:         1265 RPM  (min = 0 RPM)
fan2:         1909 RPM  (min = 0 RPM)
fan3:            0 RPM  (min = 0 RPM)
fan4:            0 RPM  (min = 0 RPM)
fan5:            0 RPM  (min = 0 RPM)
SYSTIN:        +34.0°C  (high = +0.0°C, hyst = +0.0°C)  ALARM  sensor = thermistor
CPUTIN:        +58.0°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermal diode
AUXTIN:        +31.5°C  (high = +80.0°C, hyst = +75.0°C)  sensor = thermistor
PECI Agent 0:  +31.0°C  (high = +80.0°C, hyst = +75.0°C)
                        (crit = +96.0°C)
PCH_CHIP_TEMP:  +0.0°C
PCH_CPU_TEMP:   +0.0°C
PCH_MCH_TEMP:   +0.0°C
intrusion0:    ALARM
intrusion1:    ALARM
beep_enable:   disabled
root@daq14:~#
Enable CPU turbo mode
- An Intel CPU has a nominal frequency (e.g. 3.4 GHz) and a turbo-boost frequency (e.g. 4.0 GHz). Here we enable the turbo-boost mode.
- Find out CPU capability
root@daq01:~# lscpu | grep Hz
Model name:      Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
CPU MHz:         3965.803
CPU max MHz:     4000.0000
CPU min MHz:     800.0000
root@daq01:~#
- Look up this CPU in the Intel ARK database - google for the CPU model name, i.e.
- Find current frequency settings:
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "powersave" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 2.72 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
root@daq01:~#
- Note the following:
- current governor is "powersave"
- "performance" governor is available
- "boost state support" is supported and active.
- Confirm CPU frequency governor:
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
powersave
powersave
powersave
powersave
powersave
powersave
powersave
powersave
root@daq01:~#
- Change governor to "performance":
root@daq01:~# cpupower frequency-set --governor performance
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
root@daq01:~# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
performance
performance
performance
performance
performance
performance
performance
performance
root@daq01:~# cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency: Cannot determine or is not supported.
  hardware limits: 800 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 800 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.93 GHz (asserted by call to kernel)
  boost state support:
    Supported: yes
    Active: yes
- monitor CPU frequency:
root@daq01:~# cpupower monitor
    | Nehalem                   || Mperf              || Idle_Stats
 CPU| C3   | C6   | PC3  | PC6  || C0   | Cx   | Freq || POLL | C1   | C1E  | C3   | C6   | C7s  | C8
   0|  0.00|  0.00|  0.00|  0.00|| 88.80| 11.20|  3973||  0.00|  0.00|  0.01|  0.02|  0.31|  0.00|  4.25
   4|  0.00|  0.00|  0.00|  0.00||  4.70| 95.30|  3945||  0.00|  0.00|  0.00|  0.00|  0.00|  0.00| 95.03
   1|  0.73|  3.70|  0.00|  0.00||  4.52| 95.48|  3864||  0.00|  0.01|  1.19|  0.44|  2.82|  0.00| 90.23
   5|  0.73|  3.70|  0.00|  0.00||  0.37| 99.63|  3807||  0.00|  0.00|  0.03|  0.09|  1.70|  0.00| 97.64
   2|  2.28| 12.86|  0.00|  0.00||  1.41| 98.59|  3829||  0.00|  0.86|  3.17|  0.46|  7.70|  0.00| 85.87
   6|  2.28| 12.86|  0.00|  0.00||  2.88| 97.12|  3856||  0.00|  0.11|  4.56|  2.15| 10.31|  0.00| 78.99
   3|  1.33|  4.81|  0.00|  0.00||  0.99| 99.01|  3804||  0.00|  0.49|  0.79|  0.01|  1.03|  0.00| 96.12
   7|  1.34|  4.81|  0.00|  0.00||  1.26| 98.74|  3818||  0.00|  0.01|  2.32|  0.47|  5.02|  0.00| 90.06
root@daq01:~#
- check that the CPU is not overheating:
root@daq01:~# sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 0:        +51.0°C  (high = +84.0°C, crit = +100.0°C)
Core 1:        +38.0°C  (high = +84.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +84.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +84.0°C, crit = +100.0°C)
- congratulations, we are running at 4 GHz now!
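The governor set with cpupower is not kept across reboots, so it is handy to wrap the sysfs check above in a small function. This is a sketch: `check_governors` is a hypothetical helper name, and the default directory is the sysfs path read above; the directory is a parameter only so the function can be tried on a fake tree.

```shell
# Print "cpuN: governor" for every CPU and return non-zero if any CPU is
# not using the wanted governor (default: performance).
# Returns 2 if no cpufreq information is found at all.
check_governors() {
    local root=${1:-/sys/devices/system/cpu} want=${2:-performance}
    local f gov cpu found=0 bad=0
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$f" ] || continue          # glob matched nothing, or no cpufreq
        found=1
        gov=$(cat "$f")
        cpu=${f#"$root"/}                # e.g. cpu3/cpufreq/scaling_governor
        cpu=${cpu%%/*}                   # e.g. cpu3
        echo "$cpu: $gov"
        [ "$gov" = "$want" ] || bad=1
    done
    if [ "$found" != 1 ]; then
        echo "no cpufreq information under $root" >&2
        return 2
    fi
    return $bad
}
```

For example, `check_governors || cpupower frequency-set --governor performance` re-applies the setting only when some CPU has fallen back to another governor.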
Set up Ubuntu as a gateway to a private network
See also:
- https://daq.triumf.ca/DaqWiki/index.php/VME-CPU#Setup_the_boot_host_computer_.28el7.29
- http://www.triumf.info/wiki/DAQwiki/index.php/Dhcpd_on_eth1
Steps to do
- assign network numbers to the private networks, e.g. 192.168.1.x, 192.168.2.x, etc.
- (on the gateway machine, each private network interface has to have a different network number)
- (each network interface can have multiple networks attached, via VLANs or via eth0:0, eth0:1 constructs)
- assign IP addresses on the private network and save them in /etc/hosts, e.g. "192.168.1.10 hvps" (address first, then name)
- (for simplicity, assign 192.168.1.1 to the gateway machine itself)
- (192.168.1.0 and 192.168.1.255 are the network and broadcast addresses of 192.168.1.0/24, do not assign them to any machine)
- setup DNS server (dnsmasq) to serve contents of /etc/hosts via DNS (otherwise, many programs will see inconsistent name to IP address mapping)
- setup DHCP server (ISC dhcpd or dnsmasq) to give out the IP addresses
- setup tftp, pxelinux and NFS for diskless booting
- setup time server (chronyd) to provide common time to all devices
- setup NAT so machines on private network can access the internet (to get OS updates, etc)
- setup NIS and NFS so machines on the private network can use common home directories
- setup rsync backup of machines on the private network
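A quick way to catch the "special" addresses before they go into /etc/hosts, as a sketch assuming the 192.168.1.x/24 network used on this page (`usable_host_ip` is a hypothetical helper name):

```shell
# Succeed if IP is a usable host address on the /24 network NET
# (default 192.168.1): same first three octets, last octet 1..254.
# .0 is the network address and .255 the broadcast address of a /24.
usable_host_ip() {
    local ip=$1 net=${2:-192.168.1} last
    case $ip in
        "$net".*) last=${ip#"$net".} ;;
        *) return 1 ;;                   # not on this network at all
    esac
    case $last in
        ''|*[!0-9]*) return 1 ;;         # empty or not a plain number
    esac
    [ "$last" -ge 1 ] && [ "$last" -le 254 ]
}
```

For example, `usable_host_ip 192.168.1.10 && echo ok` prints "ok", while 192.168.1.0 and 192.168.1.255 are rejected.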
setup hosts
- edit /etc/hosts
192.168.1.101 dsfe01 ... and so forth
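Since the point of serving /etc/hosts through DNS is a consistent name to IP address mapping, it helps to check the file for addresses or names listed twice. A sketch (`check_hosts_dups` is a hypothetical name); note that a name deliberately listed with both an IPv4 and an IPv6 address will also be reported.

```shell
# Print IP addresses and host names that appear on more than one line of a
# hosts file (default /etc/hosts); return 1 if any duplicate is found.
# Comments and blank lines are ignored; aliases after the first name count too.
check_hosts_dups() {
    awk '
        { sub(/#.*/, "") }               # strip comments
        NF < 2 { next }                  # skip blank or malformed lines
        {
            if (seen_ip[$1]++) { print "duplicate IP: " $1; dup = 1 }
            for (i = 2; i <= NF; i++)
                if (seen_name[$i]++) { print "duplicate name: " $i; dup = 1 }
        }
        END { exit dup }
    ' "${1:-/etc/hosts}"
}
```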
setup dns and dhcp
- apt-get install dnsmasq
- edit /etc/dnsmasq.conf
# /etc/dnsmasq.conf

# DNS settings
#port=0 # disable DNS function
port=53
domain-needed
bogus-priv
no-resolv
server=142.90.100.19

# DHCP settings
# DHCP interface
interface=enp1s0f0
#dhcp-range=192.168.1.50,192.168.1.150,infinite
dhcp-range=192.168.1.0,static
dhcp-boot=pxelinux.0
#dhcp-host=ac:1f:6b:9e:7f:4a,192.168.1.100,10m
dhcp-host=ac:1f:6b:9e:7f:4a,dsfe01,infinite

# TFTP settings
enable-tftp
tftp-root=/zssd/tftpboot
- mkdir /zssd/tftpboot ### per tftp-root
- systemctl enable dnsmasq
- systemctl restart dnsmasq
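With more than a few frontends, the dhcp-host lines can be generated from a plain list of MAC addresses and host names. A sketch, assuming a hypothetical list file (called macs.txt below) with one "MAC hostname" pair per line; `make_dhcp_hosts` is also a hypothetical name. Like the dsfe01 line above, each generated line gives only the host name, so dnsmasq takes the IP address from /etc/hosts (per the dnsmasq man page, an /etc/hosts address is handed out by DHCP only when a matching dhcp-host line exists).

```shell
# Convert "MAC hostname" lines ('#' comments allowed) into dnsmasq
# dhcp-host lines. Malformed lines are reported on stderr and make the
# function return 1.
make_dhcp_hosts() {
    awk '
        { sub(/#.*/, "") }               # strip comments
        NF == 0 { next }                 # skip blank lines
        NF != 2 { print "bad line " NR ": " $0 > "/dev/stderr"; bad = 1; next }
        { printf "dhcp-host=%s,%s,infinite\n", tolower($1), $2 }
        END { exit bad }
    ' "$@"
}
```

For example: `make_dhcp_hosts macs.txt >> /etc/dnsmasq.conf`, then `dnsmasq --test` to syntax-check the configuration before `systemctl restart dnsmasq`.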
setup chronyd
- enable ntp server:
- configure and enable chronyd per instructions above
- emacs -nw /etc/chrony/chrony.conf
- add "allow 192.168.1.0/24" at the end
- systemctl restart chronyd
- chronyc tracking ### wait until time is synchronized (a few seconds)
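Appending "allow 192.168.1.0/24" by hand is easy to do twice when these steps are repeated. A sketch of an append-only-if-missing helper (`add_line_once` is a hypothetical name, not part of chrony):

```shell
# Append LINE to FILE unless an identical line is already present.
add_line_once() {
    local line=$1 file=$2
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        return 0                         # already there, nothing to do
    fi
    # do not glue the new line onto an unterminated last line
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
}
```

For example: `add_line_once "allow 192.168.1.0/24" /etc/chrony/chrony.conf && systemctl restart chronyd`.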
setup diskless network booting
setup pxelinux
cd ~
wget https://www.kernel.org/pub/linux/utils/boot/syslinux/4.xx/syslinux-4.03.tar.bz2
tar xjvf syslinux-4.03.tar.bz2
cd syslinux-4.03
cp -pv ./core/pxelinux.0 ./com32/hdt/hdt.c32 ./memdisk/memdisk ./com32/menu/menu.c32 /zssd/tftpboot/
- cd /zssd/tftpboot
wget http://ladd00.triumf.ca/tftpboot/memtest86+-4.20.iso.zip
wget http://ladd00.triumf.ca/tftpboot/memtest86+-5.01.iso.gz
wget http://ladd00.triumf.ca/tftpboot/modules.alias
wget http://ladd00.triumf.ca/tftpboot/modules.pcimap
wget http://ladd00.triumf.ca/tftpboot/pci.ids
- mkdir pxelinux.cfg
- emacs -nw pxelinux.cfg/default
default menu.c32
prompt 0
menu title Welcome to the DSVSLICE PXE boot menu
timeout 50

label hdt
kernel hdt.c32

label memtest86+-5.01
kernel memdisk iso initrd=memtest86+-5.01.iso.gz

label memtest86+-4.20
kernel memdisk iso initrd=memtest86+-4.20.iso.zip

label vmlinuz-5.3.0-26-generic
menu default
kernel vmlinuz-5.3.0-26-generic
append initrd=initrd.img-5.3.0-26-generic boot=nfs root=/dev/nfs netboot=nfs nfsroot=192.168.1.1:/zssd/nfsroot/dsfe01 toram ip=dhcp panic=60 BOOTIF=enp1s0f0

#end
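When the gateway gets a kernel update, the menu needs a new label pointing at the new vmlinuz/initrd.img pair. A sketch that prints such a stanza with the same append options as the vmlinuz-5.3.0-26-generic entry above; `pxe_nfs_label` and the GATEWAY and NFSBASE overrides are hypothetical. It leaves out "menu default", since only one label should carry it.

```shell
# Print a pxelinux label for kernel version KVER with an NFS root for CLIENT.
# GATEWAY and NFSBASE default to the values used on this page.
pxe_nfs_label() {
    local kver=$1 client=$2
    local gw=${GATEWAY:-192.168.1.1} base=${NFSBASE:-/zssd/nfsroot}
    cat <<EOF
label vmlinuz-$kver
kernel vmlinuz-$kver
append initrd=initrd.img-$kver boot=nfs root=/dev/nfs netboot=nfs nfsroot=$gw:$base/$client toram ip=dhcp panic=60 BOOTIF=enp1s0f0
EOF
}
```

For example, `pxe_nfs_label "$(uname -r)" dsfe01 >> /zssd/tftpboot/pxelinux.cfg/default`; the matching vmlinuz and initrd.img files must also be present in /zssd/tftpboot.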
setup linux kernel
- copy the kernel files
cd /boot
rsync -av config* initrd* System.map* vmlinuz* /zssd/tftpboot/
- cd /zssd/tftpboot
- chmod a+r *
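Before booting a client, it is worth checking that each vmlinuz in the TFTP directory has its initrd.img and that both are world-readable: on Ubuntu the vmlinuz files in /boot are typically readable by root only, and dnsmasq normally runs as an unprivileged user after startup, hence the chmod above. A sketch; `check_tftp_kernels` is a hypothetical name.

```shell
# Report kernels in DIR (default /zssd/tftpboot) that lack a matching
# initrd.img or are not world-readable; return 1 if anything is wrong.
check_tftp_kernels() {
    local dir=${1:-/zssd/tftpboot} rc=0 k v f
    for k in "$dir"/vmlinuz-*; do
        [ -e "$k" ] || { echo "no kernels in $dir"; return 1; }
        v=${k##*/vmlinuz-}               # kernel version, e.g. 5.3.0-26-generic
        for f in "$k" "$dir/initrd.img-$v"; do
            if [ ! -e "$f" ]; then
                echo "missing: $f"; rc=1
            elif [ -z "$(find "$f" -perm -004)" ]; then
                echo "not world-readable: $f"; rc=1
            fi
        done
    done
    return $rc
}
```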
setup nfs
- apt-get install nfs-kernel-server
- emacs -nw /etc/exports
/zssd/nfsroot/dsfe01 dsfe01(rw,no_root_squash,async,no_subtree_check)
- enable services
systemctl enable nfs-server
systemctl enable nfs-mountd
systemctl enable nfs-idmapd
systemctl restart nfs-server
systemctl restart nfs-mountd
systemctl restart nfs-idmapd
- after editing /etc/exports, run
exportfs -av
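Each diskless client gets its own export line, with the same options as the dsfe01 entry above. A sketch that adds the line only if that directory is not exported yet (`add_nfsroot_export` is a hypothetical name; /zssd/nfsroot is the path used on this page):

```shell
# Add "/zssd/nfsroot/CLIENT CLIENT(options)" to EXPORTS (default
# /etc/exports) unless that directory is already listed.
add_nfsroot_export() {
    local client=$1 exports=${2:-/etc/exports}
    local dir=/zssd/nfsroot/$client
    if grep -q "^$dir[[:space:]]" "$exports" 2>/dev/null; then
        echo "already exported: $dir"
        return 0
    fi
    printf '%s %s(rw,no_root_squash,async,no_subtree_check)\n' "$dir" "$client" >> "$exports"
}
```

For example: `add_nfsroot_export dsfe02 && exportfs -av`.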
setup userland
- zfs create zssd/nfsroot
- zfs set dedup=verify zssd/nfsroot ### enable deduplication to save disk space: the root file systems of the diskless machines are mostly identical
- clone ubuntu
mkdir /zssd/nfsroot/dsfe01
cd /
rsync -avx . /zssd/nfsroot/dsfe01
- edit config files:
- cd /zssd/nfsroot/dsfe01
- emacs -nw etc/hostname ### change to dsfe01
- emacs -nw etc/fstab ### add this
192.168.1.1:/zssd/nfsroot/dsfe01 / nfs defaults,nolock 0 0
- emacs -nw etc/chrony/chrony.conf
- comment out all "pool" and "server" entries
- add entry "server 192.168.1.1 iburst"
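The two chrony.conf edits above can also be scripted. A sketch using GNU sed (as shipped with Ubuntu); `chrony_use_gateway` is a hypothetical name, and running it a second time changes nothing.

```shell
# In chrony.conf CONF, comment out all "pool" and "server" lines except the
# gateway's own, then add "server GW iburst" if it is not there yet.
chrony_use_gateway() {
    local conf=$1 gw=${2:-192.168.1.1}
    sed -i -E '/^server '"$gw"'[[:space:]]/!s/^[[:space:]]*(pool|server)[[:space:]]/#&/' "$conf"
    grep -qxF "server $gw iburst" "$conf" || printf 'server %s iburst\n' "$gw" >> "$conf"
}
```

For example: `chrony_use_gateway /zssd/nfsroot/dsfe01/etc/chrony/chrony.conf`.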
After dsfe01 is booted:
- disable services:
systemctl disable apache2
systemctl disable dnsmasq
systemctl disable zfs-import-cache
To set up additional machines, clone dsfe01 instead of cloning the gateway machine.
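When cloning dsfe01 for a new machine, the per-machine settings from the "edit config files" step above have to be changed in the copy: etc/hostname and the NFS root path in etc/fstab. A sketch, assuming the copy already exists under /zssd/nfsroot (for example made with `rsync -avx /zssd/nfsroot/dsfe01/ /zssd/nfsroot/dsfe02/`). `customize_clone` and the NFSBASE override are hypothetical, and the new machine still needs its own entries on the gateway (/etc/hosts, dnsmasq.conf, /etc/exports, PXE menu).

```shell
# Fix hostname and NFS root path in a cloned root NFSBASE/NEW that was
# copied from NFSBASE/OLD.
customize_clone() {
    local old=$1 new=$2 base=${NFSBASE:-/zssd/nfsroot}
    local root=$base/$new
    [ -d "$root/etc" ] || { echo "no such root: $root" >&2; return 1; }
    printf '%s\n' "$new" > "$root/etc/hostname"
    # 192.168.1.1:/zssd/nfsroot/OLD / nfs ...  ->  .../NEW / nfs ...
    sed -i "s|:$base/$old[[:space:]]|:$base/$new |" "$root/etc/fstab"
}
```

For example: `customize_clone dsfe01 dsfe02`.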
Allow manpages to be viewed
If / is mounted over NFS, man will report a permission error. Fix it with:
ln -s /etc/apparmor.d/usr.bin.man /etc/apparmor.d/disable/
apparmor_parser -R /etc/apparmor.d/usr.bin.man
on the gateway machine
- define netgroups
- emacs -nw /etc/netgroup
dsfe (dsfe01,,) (dsfe02,,)
- emacs -nw /etc/nsswitch.conf ### edit the netgroup line to read:
netgroup: files
- export the home directories:
- emacs -nw /etc/exports ### add this:
/zssd/home1 @dsfe(rw,no_root_squash,async,no_subtree_check)
- exportfs -rc
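Each netgroup member is a (host,user,domain) triple, and an empty field matches anything, so (dsfe01,,) stands for host dsfe01. A sketch that builds the line for any number of hosts (`make_netgroup` is a hypothetical name):

```shell
# Print an /etc/netgroup line: GROUP followed by one (host,,) triple per HOST.
make_netgroup() {
    local group=$1 h line
    shift
    line=$group
    for h in "$@"; do
        line="$line ($h,,)"
    done
    printf '%s\n' "$line"
}
```

For example, `make_netgroup dsfe dsfe01 dsfe02` prints the dsfe line used above; after replacing it in /etc/netgroup, `getent netgroup dsfe` shows what the system sees.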
on the frontend machine
- mkdir /home
- emacs -nw /etc/fstab ### add this:
192.168.1.1:/zssd/home1 /home nfs defaults 0 0
- mount -a
setup NAT
NAT allows machines on the private network to connect to the internet: https://en.wikipedia.org/wiki/Network_address_translation
- emacs -nw /etc/rc.local ### add this, then chmod +x /etc/rc.local (systemd only runs /etc/rc.local at boot if it is executable and starts with a #! line)
#!/bin/sh
# /etc/rc.local
/sbin/iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
iptables -L -v
#/sbin/iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
#/sbin/iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
#iptables -L -v
iptables -L -v
sysctl -w net.ipv4.ip_forward=1
#sysctl -a | grep forward
/etc/firewall-rfc1918.sh
# end
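- emacs -nw /etc/firewall-rfc1918.sh ### add this, then chmod +x /etc/firewall-rfc1918.sh
#!/bin/sh
# firewall-rfc1918.sh
# prevent RFC1918 private network IP addresses from
# going in and out from our uplink.

ETH=eno1

# incoming: reject packets arriving on the uplink addressed to private networks.
# -F fails harmlessly on the first run (chain does not exist yet),
# -N fails harmlessly on later runs (chain already exists).
iptables -F in-rfc1918
iptables -N in-rfc1918
iptables -A in-rfc1918 --dst 10.0.0.0/8 -j REJECT
iptables -A in-rfc1918 --dst 172.16.0.0/12 -j REJECT
iptables -A in-rfc1918 --dst 192.168.0.0/16 -j REJECT
# remove jump rules left by earlier runs, then insert exactly one
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -D INPUT -j in-rfc1918 -i $ETH
iptables -I INPUT -j in-rfc1918 -i $ETH

# outgoing: reject packets leaving through the uplink addressed to private networks
iptables -F out-rfc1918
iptables -N out-rfc1918
iptables -A out-rfc1918 --dst 10.0.0.0/8 -j REJECT
iptables -A out-rfc1918 --dst 172.16.0.0/12 -j REJECT
iptables -A out-rfc1918 --dst 192.168.0.0/16 -j REJECT
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -D OUTPUT -j out-rfc1918 -o $ETH
iptables -I OUTPUT -j out-rfc1918 -o $ETH

iptables -L -v
#end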
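The three ranges rejected by firewall-rfc1918.sh are the RFC 1918 private blocks 10.0.0.0/8, 172.16.0.0/12 (172.16.x.x to 172.31.x.x) and 192.168.0.0/16. A sketch of a membership test that makes the /12 boundary explicit; `is_rfc1918` is a hypothetical name, and it only looks at the first two octets without validating the rest of the address.

```shell
# Succeed if an IPv4 address lies in 10/8, 172.16/12 or 192.168/16.
is_rfc1918() {
    local a rest b
    a=${1%%.*}                           # first octet
    rest=${1#*.}
    b=${rest%%.*}                        # second octet
    case $a in
        10) return 0 ;;
        192) [ "$b" = 168 ] ;;
        172) [ "$b" -ge 16 ] 2>/dev/null && [ "$b" -le 31 ] ;;
        *) return 1 ;;
    esac
}
```

For example, `is_rfc1918 192.168.1.101` succeeds, while the upstream DNS server 142.90.100.19 from dnsmasq.conf is not in any of the blocked ranges.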