1. The theory (zpool/zfs/du/ls – different tools for different results)
(main source of information -> here)
zpool: shows the total bytes of storage available in the pool (physical disk capacity).
How many bytes are in use on the storage device? How many unallocated bytes are there?
Use case: you want to upgrade your storage to get more room.
zfs: shows the total bytes of storage available to the filesystem, i.e. disk space minus the ZFS redundancy and metadata overhead (the usable space available).
If I have to ship this filesystem to another box (uncompressed and without deduplication) how many bytes is that?
Use case: you need to know whether accounting or engineering is using more space.
du: shows the total bytes of storage space used by a directory, after compression and deduplication are taken into account.
How many bytes are used to store the contents of the files in this directory?
Use case: you look at a sparse or compressed file and want to know how many bytes are actually allocated for it.
ls -l: shows the length of a file in bytes (its addressable size), regardless of compression, dedupe, thin-provisioning, sparseness, etc.
How many bytes are addressable in this file?
Use case: you plan to email someone a file and want to know if it will fit within the 10MB quota.
df: use the zpool / zfs commands instead to identify available pool space and available file system space. "df" doesn't understand descendant filesystems or whether snapshots exist, and it is not deduplication-aware.
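To see the difference on a live system, here is a minimal sketch (the pool name tank is just an example; mkfile -n creates a sparse file on Solaris, and the exact figures will differ on your machine):

# zpool list tank                     # raw pool capacity: SIZE / ALLOC / FREE
# zfs list tank                       # usable space: USED / AVAIL, redundancy overhead excluded
# mkfile -n 100m /tank/sparsefile     # a 100MB sparse file
# ls -l /tank/sparsefile              # ~104857600 bytes: the addressable length of the file
# du -h /tank/sparsefile              # only the handful of blocks actually allocated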
2. Let’s see some practical situations
a) Mirror, 2 disks x 500m
# mkfile 500m /dev/dsk/diskA
# mkfile 500m /dev/dsk/diskB
# zpool create datapool_mirror mirror diskA diskB
# zpool list datapool_mirror
NAME              SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
datapool_mirror   492M    91K   492M   0%  1.00x  ONLINE  -
# zfs list datapool_mirror
NAME              USED  AVAIL  REFER  MOUNTPOINT
datapool_mirror    91K   460M    31K  /datapool_mirror
Mirroring 2+ devices means the data is replicated in an identical fashion across all components of a mirror. A mirror with N disks of size X can hold X bytes and can withstand (N-1) devices failing before data integrity is compromised. (zpool(1M))
Since the disks are mirrored we have 492M of disk space, and 460M is the amount of data you can actually store on them.
But why does "zpool list" show 492M free while "zfs list" shows only 460M available?
Short answer: internal accounting (and, for raidz pools, differences in the raidz configuration).
Long answer: The physical space can be different from the total amount of space that any contained datasets can actually use. The amount of space used in a raidz configuration depends on the characteristics of the data being written. In addition, ZFS reserves some space for internal accounting that the zfs command takes into account, but the zpool command does not. For non-full pools of a reasonable size, these effects should be invisible. For small pools, or pools that are close to being completely full, these discrepancies may become more noticeable.
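If you prefer properties to the list output, the same numbers can be read directly from the pool and the dataset (shown here for the mirror pool created above; the values should match the listings):

# zpool get size,allocated,free datapool_mirror    # raw pool capacity, as in "zpool list"
# zfs get used,available datapool_mirror           # usable space, as in "zfs list"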
b) Raid-Z, 4 disks x 500m
# mkfile 500m /dev/dsk/disk1
# mkfile 500m /dev/dsk/disk2
# mkfile 500m /dev/dsk/disk3
# mkfile 500m /dev/dsk/disk4
# zpool create datapool raidz disk1 disk2 disk3 disk4
Check our Diskpools article for further explanations on what Raid-Z means.
This is a single-parity configuration (one disk can fail), also called 3+1 (3 data disks + 1 parity disk).
Now let’s take a look at the space usage.
# zpool list datapool
NAME       SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
datapool  1.92G   163K  1.92G   0%  1.00x  ONLINE  -
The 1.92G SIZE is the disk space you have.
# zfs list datapool
NAME      USED  AVAIL  REFER  MOUNTPOINT
datapool  122K  1.41G  43.4K  /datapool
The 1.41G AVAIL is how much you can store.
Basically zpool list shows you how much disk space you have and zfs list (or df) shows you how much you can store.
A raidz group with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised. (zpool(1M))
In this case: N=4, X=500M, P=1 => (4-1)*500M = 1.5G; factoring in the filesystem overhead we arrive at the 1.41G reported above.
c) Raid-Z2, 8 disks x 502.11G
# zpool list tank
NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
tank  3.62T  1.85T  1.78T  50%  1.00x  ONLINE  -
N=8, X=502.11G, P=2
(N-P)*X ≈ 3T of usable space; the 3.62T reported by zpool list is about 620G (~17%) more, because for raidz pools zpool list counts the raw size including the parity space.
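To double-check the usable figure from the shell (plain arithmetic, nothing ZFS-specific):

# echo "scale=2; (8-2)*502.11/1024" | bc    # (N-P)*X, converted from G to T
2.94

which is the ~3T used above.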
3. The REFER field
(main source of information -> here)
REFER identifies the amount of data accessible by the dataset (which could be shared with other datasets in the pool). A snapshot (and, initially, a clone) will refer to the same data as its source.
USED identifies the amount of space consumed by the dataset and its descendants. It takes into account the reservation of any descendant datasets.
# zfs list test1
NAME    USED  AVAIL  REFER  MOUNTPOINT
test1  82.3G   329G  61.4G  /test1
# zfs list -t snapshot -r test1
NAME              USED  AVAIL  REFER  MOUNTPOINT
test1@snapshot1  20.9G      -  74.5G  -
test1@snapshot2  25.7M      -  61.3G  -
Here we can see that the dataset test1 refers to 61.4G. It also has 2 snapshots, using 20.9G and 25.7M respectively. Total: 82.3G.
Deleting snapshot1 would free the 20.9G that only it references (some previously shared space may then show up as unique to snapshot2).
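Another convenient view of the same accounting is zfs list -o space (also listed in the notes at the end of this page), which splits USED into the parts consumed by snapshots, by the dataset itself, by refreservations and by children. Only the column layout is shown here, since the exact figures depend on the pool:

# zfs list -o space test1
NAME  AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD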
A good way of finding out how much space the snapshots use is this script:
$ ./snapspace.sh mainpool/storage
SNAPSHOT OLDREFS UNIQUE UNIQUE%
zfs-auto-snap_monthly-2011-11-14-18h59 34.67G 11.0G 31%
zfs-auto-snap_monthly-2011-12-14-18h59 33.41G 8.95G 26%
zfs-auto-snap_monthly-2012-02-09-16h05 123.75G 70.3G 56%
OLDREFS is how much of the referenced space of the snapshot is not referenced by the current filesystem. UNIQUE is the amount that is referenced by that snapshot and nothing else (the used property). Finally, I thought it would be nice to have the unique amount as a percentage of OLDREFS, hence UNIQUE%.
The script:
#!/bin/bash
# For each snapshot of the given filesystem, print:
#   OLDREFS - referenced space of the snapshot that the current filesystem no longer references
#   UNIQUE  - space referenced by that snapshot and nothing else (the "used" property)
#   UNIQUE% - UNIQUE as a percentage of OLDREFS

if (($# < 1))
then
    echo "usage: $0 <filesystem>"
    exit 1
fi

if [[ $1 == *@* || $1 == /* ]]
then
    echo "Snapshots and paths are not supported"
    echo "usage: $0 <filesystem>"
    exit 1
fi

echo -e "SNAPSHOT OLDREFS \tUNIQUE\tUNIQUE%"

# Space currently referenced by the live filesystem, in raw bytes (-p)
fullref=`zfs get -Hp referenced "$1" | awk '{print $3}'`

for snap in `zfs list -Hd 1 -t snapshot -o name "$1" | cut -f2- -d@`
do
    # referenced = data visible in the snapshot; written@snap = data written since the snapshot was taken
    snapref=`zfs get -Hp referenced "$1"@"$snap" | awk '{print $3}'`
    snapwritten=`zfs get -Hp written@"$snap" "$1" | awk '{print $3}'`
    # Snapshot data that is no longer referenced by the current filesystem
    olddata=$((snapref + snapwritten - fullref))
    snapused=`zfs get -H used "$1"@"$snap" | awk '{print $3}'`
    snapusedraw=`zfs get -Hp used "$1"@"$snap" | awk '{print $3}'`

    # Scale olddata to a human-readable suffix (K/M/G/T/P)
    suffix=""
    divisor="1"
    testnum=$olddata
    if ((testnum > 1024))
    then
        suffix="K"
        divisor="1024"
        testnum=`echo "$olddata/$divisor" | bc`
    fi
    if ((testnum > 1024))
    then
        suffix="M"
        divisor="(1024*1024)"
        testnum=`echo "$olddata/$divisor" | bc`
    fi
    if ((testnum > 1024))
    then
        suffix="G"
        divisor="(1024*1024*1024)"
        testnum=`echo "$olddata/$divisor" | bc`
    fi
    if ((testnum > 1024))
    then
        suffix="T"
        divisor="(1024*1024*1024*1024)"
        testnum=`echo "$olddata/$divisor" | bc`
    fi
    if ((testnum > 1024))
    then
        suffix="P"
        divisor="(1024*1024*1024*1024*1024)"
    fi
    displaydata=`echo "scale = 2; $olddata/$divisor" | bc -l`

    if ((olddata > 0))
    then
        displaypercent=`echo "100*$snapusedraw/$olddata" | bc`
    else
        displaypercent=0
    fi

    # Pad the snapshot name so the columns line up
    chars=`echo "$snap" | wc -m | awk '{print $1}'`
    spacing=""
    while ((++chars < 44))
    do
        spacing="$spacing "
    done
    echo -e "$snap $spacing $displaydata$suffix \t$snapused\t$displaypercent%"
done
4. Freeing space
Because of ZFS snapshots, removing a file from a full file system will sometimes not free the expected space. To actually free it, you need to remove all the snapshots that reference the file.
From http://docs.oracle.com/cd/E23823_01/html/819-5461/gbciq.html:
When a snapshot is created, its disk space is initially shared between the snapshot and the file system, and possibly with previous snapshots. As the file system changes, disk space that was previously shared becomes unique to the snapshot, and thus is counted in the snapshot’s used property. Additionally, deleting snapshots can increase the amount of disk space unique to (and thus used by) other snapshots.
Note: As a result, deleting a file can actually consume more disk space, because the file is kept in the snapshot and a new version of the directory needs to be created.
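In practice, the workflow looks roughly like this (dataset and snapshot names are just examples):

# zfs list -t snapshot -r tank/data              # find the snapshots that may still reference the file
# zfs destroy tank/data@zfs-auto-snap_2012-01    # destroy them one by one...
# zfs list -o space tank/data                    # ...and watch USEDSNAP shrink and AVAIL grow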
In the making
- zfs list -o space
- setting reservation and refreservation (see the sketch below)
- http://docs.oracle.com/cd/E19253-01/819-5461/gbdbb/index.html
- snapshots: how much they occupy http://lildude.co.uk/zfs-cheatsheet
- quota: how a dataset with a quota has AVAIL < total AVAIL, because it can't grow to fill the pool while others can
- http://docs.oracle.com/cd/E19082-01/817-2271/gazud/index.html
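For the reservation/refreservation item above, a minimal sketch of the commands involved (the dataset name tank/projects is just an example):

# zfs set reservation=10G tank/projects       # space guaranteed to the dataset and all of its descendants
# zfs set refreservation=10G tank/projects    # space guaranteed to the dataset's own data, excluding snapshots and descendants
# zfs get reservation,refreservation tank/projects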