ZFS

From DaqWiki

Revision as of 15:42, 3 March 2020

Documentation

Misc commands

  • zpool status
  • zpool get all
  • zpool iostat 1
  • zpool iostat -v 1
  • zpool history
  • zpool scrub data14
  • zpool events
  • arcstat.py 1
  • cat /proc/spl/kstat/zfs/arcstats
  • echo 30000000000 > /sys/module/zfs/parameters/zfs_arc_meta_limit
  • echo 32000000000 > /sys/module/zfs/parameters/zfs_arc_max
  • zfs get all
  • zfs set dedup=verify zssd/nfsroot
  • zpool create data14 raidz2 /dev/sd[b-h]1
  • zfs create z8tb/data
  • zfs destroy z8tb/data
  • zpool add z10tb cache /dev/disk/by-id/ata-ADATA_SP550_2F4320041688
  • parted /dev/sdx mklabel GPT
  • blkid
  • zpool iostat -v -q 1
  • watch -d -n 1 "cat /proc/spl/kstat/zfs/arcstats | grep l2"
  • zfs set primarycache=metadata tank/datab
  • zfs set secondarycache=metadata tank/datab
  • zfs userspace -p -H zssd/home1
  • zfs groupspace ...
  • zdb -vvv -O pool/gobackup/titan00__home1 data/home1/titan/packages/elog/logbooks/titan/2017
  • zdb -C pool | grep ashift ### find the real value of ashift
  • zfs snapshot -r pool_A@migrate
  • zfs send -R pool_A@migrate | zfs receive -F pool_B
  • echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data # zfs send should not stop on i/o errors
  • zpool create test raidz2 `ls -1 /dev/disk/by-id/ata-WDC_WD40EZRX-00SPEB0_WD* | grep -v part`
  • zpool add -f test special mirror /dev/disk/by-id/ata-WDC_WDS120G2G0A-00JH30_1843A2802212 /dev/disk/by-id/ata-KINGSTON_SV300S37A120G_50026B77630CCB2C ### without -f this is refused with:
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool and new vdev with different redundancy, raidz and mirror vdevs, 2 vs. 1 (2-way)

Create raid1 (mirror) volume

echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
dracut -vf
zpool create zssd mirror /dev/sdaX /dev/sdbX
zpool set cachefile=none zssd
zpool set failmode=continue zssd
zpool status
zpool events
zpool get all
df /zssd
ls -l /zssd

Use whole disk for zfs mirror (RAID1)

echo USE_DISK_BY_ID=\'yes\' >> /etc/default/zfs
[root@daq13 ~]# parted /dev/sdb
(parted) mklabel GPT
(parted) q                                                                
[root@daq13 ~]# parted /dev/sdc
(parted) mklabel GPT                                                      
(parted) q                                                                
[root@daq13 ~]# blkid                                                     
/dev/sda1: UUID="ab920e4b-40ae-4551-aab8-f3e893d38830" TYPE="xfs" 
/dev/sdb: PTTYPE="gpt" 
/dev/sdc: PTTYPE="gpt" 
[root@daq13 ~]# zpool create z10tb mirror /dev/sdb /dev/sdc
[root@daq13 ~]# zpool status
  pool: z10tb
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        z10tb       ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0     0

errors: No known data errors
[root@daq13 ~]# 
[root@daq13 ~]# zfs create z10tb/emma
[root@daq13 ~]# df -kl
Filesystem      1K-blocks     Used  Available Use% Mounted on
pool           9426697856        0 9426697856   0% /pool
pool/daqstore  9426697856        0 9426697856   0% /pool/daqstore
[root@daq13 ~]# 

Enable ZFS at boot

systemctl enable zfs-import-cache
systemctl enable zfs-import-scan
systemctl enable zfs-mount
systemctl enable zfs-import.target
systemctl enable zfs.target
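A quick way to confirm the units took effect (a sketch; unit names as in the list above, output depends on the installed ZFS packages):

```shell
# Print the enablement state of each ZFS boot unit enabled above.
for u in zfs-import-cache zfs-import-scan zfs-mount zfs-import.target zfs.target; do
    printf '%-20s %s\n' "$u" "$(systemctl is-enabled "$u" 2>/dev/null || echo missing)"
done
```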

Replace failed disk

  • pull failed disk out
  • zpool status # identify failed disk zfs label (it should be labeled FAULTED or OFFLINE)
  • safe to reboot here
  • install new disk
  • partition new disk, i.e. "gdisk /dev/sdh", use "o" to create new partition table, use "n" to create new partition, accept all default answers, use "w" to save and exit
  • safe to reboot here
  • run tests on new disk (smart, diskscrub), if unhappy go back to "install new disk"
  • safe to reboot here
  • identify serial number of new disk, i.e. "smartctl -a /dev/sdh | grep -i serial" yields "Serial Number: WD-WCAVY0893313"
  • identify linux id of new disk by "ls -l /dev/disk/by-id | grep -i WD-WCAVY0893313" yields "ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1"
  • zpool replace data11 zfs-label-of-failed-disk ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1
  • zpool status should look like this:
[root@daq11 ~]# zpool status
  pool: data11
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Apr 29 11:51:03 2016
    24.7G scanned out of 795G at 32.3M/s, 6h46m to go
    3.00G resilvered, 3.11% done
config:

        NAME                                                   STATE     READ WRITE CKSUM
        data11                                                 DEGRADED     0     0     0
          raidz2-0                                             DEGRADED     0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1     ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1     ONLINE       0     0     0
            replacing-2                                        DEGRADED     0     0     0
              17494865033746374811                             FAULTED      0     0     0  was /dev/sdi1
              ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1  ONLINE       0     0     0  (resilvering)
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1     ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1     ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1     ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1     ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1    ONLINE       0     0     0

errors: No known data errors
  • wait for raid rebuild ("resilvering") to complete
  • zpool status should look like this:
[root@daq11 ~]# zpool status
  pool: data11
 state: ONLINE
  scan: resilvered 96.2G in 1h44m with 0 errors on Fri Apr 29 13:35:40 2016
config:

        NAME                                                 STATE     READ WRITE CKSUM
        data11                                               ONLINE       0     0     0
          raidz2-0                                           ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA3872943-part1   ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973466-part1   ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1  ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WCAZA1973369-part1   ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0858733-part1   ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0819555-part1   ONLINE       0     0     0
            ata-WDC_WD20EARS-00MVWB0_WD-WMAZA0857075-part1   ONLINE       0     0     0
            ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0347413-part1  ONLINE       0     0     0

errors: No known data errors
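The replacement steps above can be condensed into a sketch; /dev/sdh, the pool name data11 and the WD serial number are the examples used on this page, so substitute the actual failed disk:

```shell
# Sketch of the disk-replacement procedure (example names from this page).
gdisk /dev/sdh                          # "o" = new GPT table, "n" = new partition, "w" = write
smartctl -a /dev/sdh | grep -i serial   # e.g. "Serial Number: WD-WCAVY0893313"
ls -l /dev/disk/by-id | grep -i WD-WCAVY0893313
zpool replace data11 zfs-label-of-failed-disk ata-WDC_WD2002FYPS-01U1B0_WD-WCAVY0893313-part1
watch zpool status data11               # wait for resilvering to complete
```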

Expand zfs pool

replacing 250GB mirrored SSDs with 1TB mirrored SSDs:
zpool scrub zssd ### ensure both mirror halves are consistent and have good data
# confirm have backups of pool contents (amanda and daqbackup)
# pull one 250GB SSD
# insert replacement 1TB SSD
# follow instructions for replacing failed disk:
parted /dev/sda ...
ls -l /dev/disk/by-id/...
zpool replace zssd sda1 ata-WDC_WDS100T2B0A_192872803056
# wait for resilvering to complete
zpool scrub zssd # confirm resilver was ok
# do the same with the second 1TB disk
parted /dev/sdb
ls -l /dev/disk/by-id/...
zpool replace zssd sdb1 ata-WDC_WDS100T2B0A_192872802193
zpool online -e zssd ata-WDC_WDS100T2B0A_192872803056
zpool list -v ### observe EXPANDSZ is now non-zero
# wait for resilver to finish
zpool online -e zssd ata-WDC_WDS100T2B0A_192872802193
zpool list -v ### observe EXPANDSZ is now zero, but SIZE and FREE have changed
[root@alpha00 ~]# zpool list -v zssd
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zssd   222G   202G  20.1G      706G    56%    90%  1.00x  DEGRADED  -
  mirror   222G   202G  20.1G      708G    56%    90%
    ata-WDC_WDS100T2B0A_192872803056      -      -      -         -      -      -
    replacing      -      -      -      708G      -      -
      sdb1      -      -      -      708G      -      -
      ata-WDC_WDS100T2B0A_192872802193      -      -      -         -      -      -
[root@alpha00 ~]# zpool list -v zssd
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
zssd   930G   202G   728G         -    13%    21%  1.00x  ONLINE  -
  mirror   930G   202G   728G         -    13%    21%
    ata-WDC_WDS100T2B0A_192872803056      -      -      -         -      -      -
    ata-WDC_WDS100T2B0A_192872802193      -      -      -         -      -      -
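As an alternative to running "zpool online -e" by hand, the pool can grow automatically once every device in a vdev has been replaced with a larger one; autoexpand is a standard pool property, though it is not part of the procedure shown above:

```shell
zpool set autoexpand=on zssd   # grow the pool automatically after all mirror halves are replaced
zpool get autoexpand zssd      # verify the property is set
zpool list -v zssd             # SIZE/FREE reflect the new capacity once both disks have resilvered
```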

Rename zfs pool

zpool export oldname
zpool import oldname z6tb
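If the old pool name is forgotten after the export, "zpool import" with no arguments scans for exported pools and prints their names and ids:

```shell
zpool import                # no arguments: list pools available for import
zpool import oldname z6tb   # import under the new name
zpool status z6tb           # confirm the pool is back under its new name
```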

Quotas and disk use

  • zfs userspace zssd/home1 -s used
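Per-user quotas can also be set and inspected; userquota@ is a standard ZFS dataset property (the user name "alice" below is an example):

```shell
zfs userspace -s used zssd/home1          # per-user disk use, sorted by usage
zfs set userquota@alice=100G zssd/home1   # set a 100 GB quota for user "alice" (example name)
zfs get userquota@alice zssd/home1        # inspect the quota
zfs set userquota@alice=none zssd/home1   # remove it again
```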

Misc

ZFS tunable parameters for hopefully speeding up resilvering:

https://www.reddit.com/r/zfs/comments/4192js/resilvering_raidz_why_so_incredibly_slow/
echo 0 > /sys/module/zfs/parameters/zfs_resilver_delay
echo 512 > /sys/module/zfs/parameters/zfs_top_maxinflight
echo 5000 > /sys/module/zfs/parameters/zfs_resilver_min_time_ms
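The three echo lines above assume the parameters exist on the running module; zfs_resilver_delay and zfs_top_maxinflight are present in older ZFS-on-Linux releases but were removed in later OpenZFS versions. A guarded sketch that only writes parameters the module actually exposes:

```shell
# Apply a resilver tunable only if the running zfs module exposes it.
set_param() {
    f=/sys/module/zfs/parameters/$1
    if [ -w "$f" ]; then
        echo "$2" > "$f" && echo "set $1=$2"
    else
        echo "skip $1 (not present on this module)"
    fi
}
set_param zfs_resilver_delay 0
set_param zfs_top_maxinflight 512
set_param zfs_resilver_min_time_ms 5000
```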

Enable periodic scrub:

cd ~/git/scripts
git pull
cd zfs
make install

Working with ZFS snapshots:

If a ZFS filesystem becomes 100% full, "rm" may stop working (removing a file can itself require allocating space), but space can still be freed by truncating a large file with "echo > bigfile"; afterwards "rm" works again. Note that if the file is also referenced by a snapshot, truncating it does not return the space until the snapshot is destroyed.
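The recovery trick can be demonstrated with plain files (a sketch; "bigfile" is an example name, and on a 100%-full pool the truncation is what makes the subsequent "rm" possible):

```shell
# Truncating a file in place releases its blocks without needing rm.
dd if=/dev/zero of=bigfile bs=1M count=8 2>/dev/null
ls -l bigfile            # 8 MB
echo > bigfile           # truncate in place: frees the space immediately
ls -l bigfile            # now a single newline byte
rm bigfile               # rm works again once space is available
```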