[colug-432] Linux Storage Performance Mystery

Joseph Beard joseph at josephbeard.net
Sun Sep 25 09:18:21 EDT 2016


Greetings!
Apologies in advance for the long mail. I've recently encountered some strange behavior that I hope our experts in residence might be able to help with.

First, some background: I have a home server with multiple internal HDDs. I use rsnapshot to take hourly, daily, weekly, and monthly snapshots of my data. Rsnapshot uses hard links between snapshots so that mostly identical data consumes minimal disk space. Every day I back up the most recent daily snapshot to a set of 2.5" HDDs in a hot swap bay that I periodically rotate offsite. I do this with a custom script that writes a tar archive of the snapshot files directly to the raw disk (I believe this technique is called disk-as-tape).

This setup had been working well for four years, but I recently decided to upgrade my storage and am now experiencing two instances of significantly degraded performance that I believe are related.
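For concreteness, the core of the disk-as-tape script is roughly this (the snapshot path and device name below are placeholders, not my real ones):

```shell
# Sketch of the disk-as-tape backup: stream a tar archive of the latest
# daily snapshot straight onto the raw block device; there is no
# filesystem on the destination disk. Paths are placeholders.
SNAPSHOT=/mnt/backup/daily.0   # most recent daily snapshot (assumed path)
DEST=/dev/sdX                  # raw hot-swap disk, not a partition

tar -cf - -C "$SNAPSHOT" . | dd of="$DEST" bs=1M conv=fsync
```

Restoring is just the reverse: `tar -xf /dev/sdX -C /some/restore/path`, since tar is happy to read an archive directly off a raw device.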

Old Arrangement 
---------------
- Seagate Barracuda 7200.11 1 TB [ST31000333AS]
  - /boot: XFS
  - /boot/efi: VFAT
  - /: XFS (LVM)
  - /var: XFS (LVM)
  - /home: XFS (LVM)
  - swap (LVM)

- Western Digital Red 2 TB [WD20EFRX]
  - /srv: XFS (LVM)

- Western Digital Red 3 TB [WD30EFRX]
  - /mnt/backup: XFS (LVM)

- Western Digital Blue 2.5" 1 TB [WD10JPVX]
  - offsite tar archives

All of these drives were on the motherboard's Intel SATA controller. I purchased two more 3 TB WD Red drives (same model number), a Highpoint RR642L controller, and an eSATA drive enclosure. I moved the existing 3 TB WD Red, along with the two new ones, into the eSATA enclosure on the Highpoint controller. I rearranged the filesystems (converting some of them to ext4) and built a RAID 5 array from the three 3 TB drives using Linux mdadm (rather than the card's soft RAID), so now I have the following:
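The array creation was roughly equivalent to this (device names from memory, so treat them as placeholders). One thing I know is worth checking is whether the initial parity resync has finished, since a RAID 5 array performs poorly until it does:

```shell
# Hypothetical reconstruction of the array setup (device names assumed).
# With no --chunk given, mdadm uses its default 512 KiB chunk size.
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sd[bcd]

cat /proc/mdstat          # look for a "resync = ...%" line under md0
mdadm --detail /dev/md0   # State should read "clean", not "resyncing"
```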

New Arrangement
---------------
- Seagate Barracuda 7200.11 1 TB [ST31000333AS] -- UNCHANGED
  - /boot: XFS
  - /boot/efi: VFAT
  - /: XFS (LVM)
  - /var: XFS (LVM)
  - /home: XFS (LVM)
  - swap (LVM)

- Western Digital Red 2 TB [WD20EFRX]
  - /mnt/backup: ext4 (LVM)
  - Still on Intel SATA controller

- Western Digital Red 3 TB [WD30EFRX] x3, RAID 5
  - /srv: ext4 (LVM)
  - Highpoint eSATA controller

- Western Digital Blue 2.5" 1 TB [WD10JPVX] -- UNCHANGED
  - offsite tar archives

crontab
-------
    15    23       1       *       *        /usr/bin/rsnapshot -c /etc/rsnapshot.conf monthly
    30    23       *       *       Sat      /usr/bin/rsnapshot -c /etc/rsnapshot.conf weekly
    45    23       *       *       *        /usr/bin/rsnapshot -c /etc/rsnapshot.conf daily
    0     4-23     *       *       *        /usr/bin/rsnapshot -c /etc/rsnapshot.conf hourly
    15    0        *       *       *        /srv/backup/fsbackup.sh > /dev/null

The first problem is that rsnapshot now takes so long to rotate the snapshots that the weekly rotation is still running when the daily rotation is due to start, so rsnapshot skips the daily snapshot every week. I do not know exactly how long the weekly rotation took before, but it must have been under 15 minutes, because this never happened once in the past four years. I timed an hourly snapshot under the new arrangement at about 40 minutes, whereas (anecdotally) it took less than 10 before.

The second problem is the performance of the daily script (`fsbackup.sh` in the crontab above). Each daily snapshot is about 600 GB, and the script used to write that 600 GB to the 2.5" HDD consistently in 3.5 hours. Under the new arrangement it takes 6.25 hours: same amount of data, same destination disk, same Intel controller, but reading from a different source disk (of the same class) formatted ext4 instead of XFS. The backup now runs long enough to collide with the first hourly rsnapshot run, and I get warnings about file contents changing underneath tar (i.e., the sort of thing that makes me worry about the integrity of the backup). Given that the disk specs are so similar, could the switch from XFS to ext4 really make such a dramatic difference?

I suspect both issues are symptomatic of the same underlying cause. Is XFS that much more performant than ext4? Are there gotchas that I likely fell into when creating the RAID? Is it possible that adding the Highpoint controller has caused my Intel controller to get jealous and halve its performance?
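One gotcha I have read about is ext4 stripe alignment: if mkfs.ext4 is not told the RAID geometry, block allocation can straddle stripes and trigger extra read-modify-write cycles on RAID 5. If that applies here, the math would be something like this (assuming mdadm's default 512 KiB chunk and ext4's 4 KiB blocks):

```shell
# Stripe alignment hints for ext4 on a 3-disk RAID 5 (assumed geometry).
CHUNK_KB=512; BLOCK_KB=4; DATA_DISKS=2    # 3 drives minus 1 parity
STRIDE=$((CHUNK_KB / BLOCK_KB))           # 512 / 4 = 128
STRIPE_WIDTH=$((STRIDE * DATA_DISKS))     # 128 * 2 = 256
echo "mkfs.ext4 -E stride=$STRIDE,stripe-width=$STRIPE_WIDTH /dev/md0"

# To see what the existing filesystem was actually created with:
# tune2fs -l /dev/md0 | grep -i -E 'stride|stripe'
```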

I've made a few attempts to measure the drive performance. Both the 2 TB drive and the RAID array seem to be performing reasonably when reading and writing. I'm happy to post the verbose output of those tests. I'm also happy to post a gist of my rsnapshot config and `fsbackup.sh` script.
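The measurements were simple sequential throughput checks along these lines (the test file path is a placeholder, and dropping caches needs root):

```shell
# Rough sequential write-then-read throughput on a mounted filesystem.
TESTFILE=/srv/ddtest.img   # placeholder path on the filesystem under test

dd if=/dev/zero of="$TESTFILE" bs=1M count=4096 conv=fsync   # ~4 GiB write
sync
echo 3 > /proc/sys/vm/drop_caches   # flush page cache so the read hits disk
dd if="$TESTFILE" of=/dev/null bs=1M                         # read it back
rm -f "$TESTFILE"
```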

Thanks for any advice!


Joe

[WD20EFRX]: http://www.wdc.com/global/products/specs/?driveID=1086&language=1
[WD30EFRX]: http://www.wdc.com/global/products/specs/?driveID=1087&language=1


--
Joseph Beard
joseph at josephbeard.net

