Btrfs: RAID 5 Rsync Freeze

My server's /home/ directory is a btrfs RAID 5 array spanning three drives (I did a blog post about it here). Everything worked fine until I used rsync to sync my files from my laptop to my server. At that point, the sync would go well for a little while and then slow to a crawl. I couldn't cancel the sync with Ctrl+C. If I could get onto my server over SSH, I'd find one of my CPUs pegged at 100%. Sometimes, though, it got so bogged down I couldn't reach the server at all. If I was already on the server and did a kill -9 on rsync, the process would just go defunct.

I checked my logs after trying to umount /home/ and found:

```
Nov 03 12:01:18 zion kernel: device label home devid 1 transid 1173 /dev/sdb
Nov 03 12:01:19 zion kernel: btrfs: disk space caching is enabled
Nov 03 12:11:53 zion kernel: INFO: task umount:1668 blocked for more than 120 seconds.
Nov 03 12:11:53 zion kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 03 12:11:53 zion kernel: umount         D ffff880037afbc60     0  1668   1653 0x00000000
Nov 03 12:11:53 zion kernel: ffff880037afbbd0 0000000000000086 0000000000014500 ffff880037afbfd8
Nov 03 12:11:53 zion kernel: ffff880037afbfd8 0000000000014500 ffff8800aa0caa30 0000000000000010
Nov 03 12:11:53 zion kernel: 000000000d6fffff ffff880037afbb98 ffffffff8113a911 ffff8800afedb728
Nov 03 12:11:53 zion kernel: Call Trace:
Nov 03 12:11:53 zion kernel: [ ] ? free_pcppages_bulk+0x3b1/0x3f0
Nov 03 12:11:53 zion kernel: [ ] ? filemap_fdatawait+0x30/0x30
Nov 03 12:11:53 zion kernel: [ ] schedule+0x29/0x70
Nov 03 12:11:53 zion kernel: [ ] io_schedule+0x8f/0xe0
Nov 03 12:11:53 zion kernel: [ ] sleep_on_page+0xe/0x20
Nov 03 12:11:53 zion kernel: [ ] __wait_on_bit_lock+0x5b/0xc0
Nov 03 12:11:53 zion kernel: [ ] __lock_page+0x6a/0x70
Nov 03 12:11:53 zion kernel: [ ] ? wake_atomic_t_function+0x40/0x40
Nov 03 12:11:53 zion kernel: [ ] truncate_inode_pages_range+0x613/0x660
Nov 03 12:11:53 zion kernel: [ ] truncate_inode_pages+0x15/0x20
Nov 03 12:11:53 zion kernel: [ ] btrfs_evict_inode+0x42/0x380 [btrfs]
Nov 03 12:11:53 zion kernel: [ ] evict+0xb0/0x1b0
Nov 03 12:11:53 zion kernel: [ ] dispose_list+0x39/0x50
Nov 03 12:11:53 zion kernel: [ ] evict_inodes+0x11c/0x130
Nov 03 12:11:53 zion kernel: [ ] generic_shutdown_super+0x48/0xe0
Nov 03 12:11:53 zion kernel: [ ] kill_anon_super+0x12/0x20
Nov 03 12:11:53 zion kernel: [ ] btrfs_kill_super+0x16/0x90 [btrfs]
Nov 03 12:11:53 zion kernel: [ ] deactivate_locked_super+0x3d/0x60
Nov 03 12:11:53 zion kernel: [ ] deactivate_super+0x46/0x60
Nov 03 12:11:53 zion kernel: [ ] mntput_no_expire+0xef/0x150
Nov 03 12:11:53 zion kernel: [ ] SyS_umount+0x91/0x3b0
Nov 03 12:11:53 zion kernel: [ ] system_call_fastpath+0x1a/0x1f
```

The only way to recover was a reboot, and the problem came right back as soon as I started rsync again.

The Solution
I hunted around for a while until I finally just searched for the name of the pegged process, btrfs-endio-wri, along with "cpu time". It turns out the btrfs folks maintain a page detailing a list of current "gotchas" btrfs has, and this issue is one of them. They describe it as:

> Files with a lot of random writes can become heavily fragmented (10000+ extents) causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or large amount of RAM. ... Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files.
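Their mention of filefrag is worth a concrete example. Here's a sketch of how you might check extent counts (the temp file and the 100-extent threshold are illustrative, not from my actual session; filefrag ships with e2fsprogs on most distributions):

```shell
# Demo: create a file and report its extent count with filefrag.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=4 status=none
filefrag "$tmp"    # e.g. "/tmp/tmp.XXXXXX: 1 extent found"
rm -f "$tmp"

# To hunt for heavily fragmented files under a directory
# (100 extents is an arbitrary threshold for illustration):
# find /home -xdev -type f -print0 | xargs -0 filefrag 2>/dev/null \
#   | awk -F': ' '$2+0 > 100'
```

A healthy, mostly sequentially written file reports a handful of extents; the pathological files the gotchas page describes report thousands.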

One of the best parts of rsync is that it syncs deltas instead of resyncing the entire file. What does that result in? Lots of little random writes. Sounds like a match to me.

To fix this, I defragged all of /home/ (with lzo compression, of course :)) and remounted using the autodefrag option.
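For reference, the commands involved look roughly like this (run as root; the mount point is mine, but treat the exact invocations as a sketch rather than my shell history):

```shell
# Defragment everything under /home recursively (-r), with verbose
# output (-v), recompressing with lzo as it goes (-clzo).
btrfs filesystem defragment -r -v -clzo /home

# Remount the filesystem with autodefrag enabled...
mount -o remount,autodefrag /home

# ...and make it persist across reboots via /etc/fstab
# (the UUID here is a placeholder):
# UUID=xxxx  /home  btrfs  defaults,compress=lzo,autodefrag  0  0
```

With autodefrag, btrfs watches for small random writes and queues the affected files for background defragmentation, which is exactly the write pattern rsync's delta syncing produces.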

Now I can run rsync with no problems.

One last thing to note: the gotchas page says that once a few potential kinks with the autodefrag mount option have been worked out, it will become the default, which should prevent this from being an issue in future versions.