aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/volumes.c
AgeCommit message (Collapse)Author
2012-02-24Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Quoth Chris: "This is later than I wanted because I got backed up running through btrfs bugs from the Oracle QA teams. But they are all bug fixes that we've queued and tested since rc1. Nothing in particular stands out, this just reflects bug fixing and QA done in parallel by all the btrfs developers. The most user visible of these is: Btrfs: clear the extent uptodate bits during parent transid failures Because that helps deal with out of date drives (say an iscsi disk that has gone away and come back). The old code wasn't always properly retrying the other mirror for this type of failure." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (24 commits) Btrfs: fix compiler warnings on 32 bit systems Btrfs: increase the global block reserve estimates Btrfs: clear the extent uptodate bits during parent transid failures Btrfs: add extra sanity checks on the path names in btrfs_mksubvol Btrfs: make sure we update latest_bdev Btrfs: improve error handling for btrfs_insert_dir_item callers Btrfs: be less strict on finding next node in clear_extent_bit Btrfs: fix a bug on overcommit stuff Btrfs: kick out redundant stuff in convert_extent_bit Btrfs: skip states when they does not contain bits to clear Btrfs: check return value of lookup_extent_mapping() correctly Btrfs: fix deadlock on page lock when doing auto-defragment Btrfs: fix return value check of extent_io_ops btrfs: honor umask when creating subvol root btrfs: silence warning in raid array setup btrfs: fix structs where bitfields and spinlock/atomic share 8B word btrfs: delalloc for page dirtied out-of-band in fixup worker Btrfs: fix memory leak in load_free_space_cache() btrfs: don't check DUP chunks twice Btrfs: fix trim 0 bytes after a device delete ...
2012-02-23Btrfs: make sure we update latest_bdevChris Mason
When we are setting up the mount, we close all the devices that were not actually part of the metadata we found. But, we don't make sure that one of those devices wasn't fs_devices->latest_bdev, which means we can do a use after free on the one we closed. This updates latest_bdev as it goes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-02-16Btrfs: check return value of lookup_extent_mapping() correctlyTsutomu Itoh
This patch corrects error checking of lookup_extent_mapping(). Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
2012-02-15btrfs: silence warning in raid array setupDavid Sterba
Raid array setup code creates an extent buffer in an usual way. When the PAGE_CACHE_SIZE is > super block size, the extent pages are not marked up-to-date, which triggers a WARN_ON in the following write_extent_buffer call. Add an explicit up-to-date call to silence the warning. Signed-off-by: David Sterba <dsterba@suse.cz>
2012-01-17Merge branch 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
* 'btrfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: btrfs: take allocation of ->tree_root into open_ctree() btrfs: let ->s_fs_info point to fs_info, not root... btrfs: consolidate failure exits in btrfs_mount() a bit btrfs: make free_fs_info() call ->kill_sb() unconditional btrfs: merge free_fs_info() calls on fill_super failures btrfs: kill pointless reassignment of ->s_fs_info in btrfs_fill_super() btrfs: make open_ctree() return int btrfs: sanitizing ->fs_info, part 5 btrfs: sanitizing ->fs_info, part 4 btrfs: sanitizing ->fs_info, part 3 btrfs: sanitizing ->fs_info, part 2 btrfs: sanitizing ->fs_info, part 1 btrfs: fix a deadlock in btrfs_scan_one_device() btrfs: fix mount/umount race btrfs: get ->kill_sb() of its own btrfs: preparation to fixing mount/umount race
2012-01-16Btrfs: use larger system chunksChris Mason
system chunks by default are very small. This makes them slightly larger and also fixes the conditional checks to make sure we don't allocate a billion of them at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Merge branch 'integrity-check-patch-v2' of ↵Chris Mason
git://btrfs.giantdisaster.de/git/btrfs into integration Conflicts: fs/btrfs/ctree.h fs/btrfs/super.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Merge branch 'for-chris' of git://repo.or.cz/linux-btrfs-devel into integrationChris Mason
Conflicts: fs/btrfs/volumes.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2012-01-16Merge branch 'restriper' of git://github.com/idryomov/btrfs-unstable into ↵Chris Mason
integration
2012-01-16Btrfs: add balance progress reportingIlya Dryomov
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: allow for canceling restriperIlya Dryomov
Implement an ioctl for canceling restriper. Currently we wait until relocation of the current block group is finished, in future this can be done by triggering a commit. Balance item is deleted and no memory about the interrupted balance is kept. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: allow for pausing restriperIlya Dryomov
Implement an ioctl for pausing restriper. This pauses the relocation, but balance is still considered to be "in progress": balance item is not deleted, other volume operations cannot be started, etc. If paused in the middle of profile changing operation we will continue making allocations with the target profile. Add a hook to close_ctree() to pause restriper and free its data structures on unmount. (It's safe to unmount when restriper is in "paused" state, we will resume with the same parameters on the next mount) Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: add skip_balance mount optionIlya Dryomov
Since restriper kthread starts involuntarily on mount and can suck cpu and memory bandwidth add a mount option to forcefully skip it. The restriper in that case hangs around in paused state and can be resumed from userspace when it's convenient. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: recover balance on mountIlya Dryomov
On mount, if balance item is found, resume balance in a separate kernel thread. Try to be smart to continue roughly where previous balance (or convert) was interrupted. For chunk types that were being converted to some profile we turn on soft convert, in case of a simple balance we turn on usage filter and relocate only less-than-90%-full chunks of that type. These are just heuristics but they help quite a bit, and can be improved in future. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: save balance parameters to diskIlya Dryomov
Introduce a new btree objectid for storing balance item. The reason is to be able to resume restriper after a crash with the same parameters. Balance item has a very high objectid and goes into tree of tree roots. The key for the new item is as follows: [ BTRFS_BALANCE_OBJECTID ; BTRFS_BALANCE_ITEM_KEY ; 0 ] Older kernels simply ignore it so it's safe to mount with an older kernel and then go back to the newer one. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: soft profile changing mode (aka soft convert)Ilya Dryomov
When doing convert from one profile to another if soft mode is on restriper won't touch chunks that already have the profile we are converting to. This is useful if e.g. half of the FS was converted earlier. The soft mode switch is (like every other filter) per-type. This means that we can convert for example meta chunks the "hard" way while converting data chunks selectively with soft switch. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: implement online profile changingIlya Dryomov
Profile changing is done by launching a balance with BTRFS_BALANCE_CONVERT bits set and target fields of respective btrfs_balance_args structs initialized. Profile reducing code in this case will pick restriper's target profile if it's available instead of doing a blind reduce. If target profile is not yet available it goes back to a plain reduce. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: virtual address space subset filterIlya Dryomov
Select chunks which have at least one byte located inside a given [vstart, vend) virtual address space range. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: devid subset filterIlya Dryomov
Select chunks which have at least one byte of at least one stripe located on a device with devid X in a given [pstart,pend) physical address range. This filter only works when devid filter is turned on. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: devid filterIlya Dryomov
Relocate chunks which have at least one stripe located on a device with devid X. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: usage filterIlya Dryomov
Select chunks that are less than X percent full. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: profiles filterIlya Dryomov
Select chunks based on a given profile mask. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: add basic infrastructure for selective balancingIlya Dryomov
This allows to have a separate set of filters for each chunk type (data,meta,sys). The code however is generic and switch on chunk type is only done once. This commit also adds a type filter: it allows to balance for example meta and system chunks w/o touching data ones. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: add basic restriper infrastructureIlya Dryomov
Add basic restriper infrastructure: extended balancing ioctl and all related ioctl data structures, add data structure for tracking restriper's state to fs_info, etc. The semantics of the old balancing ioctl are fully preserved. Explicitly disallow any volume operations when balance is in progress. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: introduce masks for chunk type and profileIlya Dryomov
Chunk's type and profile are encoded in u64 flags field. Introduce masks to easily access them. Also fix the type of BTRFS_BLOCK_GROUP_* constants, it should be ULL. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-16Btrfs: get rid of *_alloc_profile fieldsIlya Dryomov
{data,metadata,system}_alloc_profile fields have been unused for a long time now. Get rid of them. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-01-11Btrfs: fix possible deadlock when opening a seed deviceLi Zefan
The correct lock order is uuid_mutex -> volume_mutex -> chunk_mutex, but when we mount a filesystem which has backing seed devices, we have this lock chain: open_ctree() lock(chunk_mutex); read_chunk_tree(); read_one_dev(); open_seed_devices(); lock(uuid_mutex); and then we hit a lockdep splat. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-01-11Btrfs: simplfy calculation of stripe length for discard operationLi Zefan
For btrfs raid, while discarding a range of space, we'll need to know the start offset and length to discard for each device, and it's done in btrfs_map_block(). However the calculation is a bit complex for raid0 and raid10, so I reimplement it based on a fact that: dev1 dev2 dev3 (raid0) ----------------------------------- s0 s3 s6 s1 s4 s7 s2 s5 Each device has (total_stripes / nr_dev) stripes, or plus one. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-01-11Btrfs: don't pre-allocate btrfs bioLi Zefan
We pre-allocate a btrfs bio with fixed size, and then may re-allocate memory if we find stripes are bigger than the fixed size. But this pre-allocation is not necessary. Also we don't have to calcuate the stripe number twice. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-01-11Btrfs: don't pass a trans handle unnecessarily in volumes.cLi Zefan
Some functions never use the transaction handle passed to them. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
2012-01-08btrfs: fix a deadlock in btrfs_scan_one_device()Al Viro
pathname resolution under a global mutex, taken on some paths in ->mount() is a Bad Idea(tm) - think what happens if said pathname resolution triggers automount of some btrfs instance and walks into attempt to grab the same mutex. Deadlock - we are waiting for daemon to finish walking the path, daemon is waiting for us to release the mutex... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-06Btrfs: use bigger metadata chunks on bigger filesystemsChris Mason
The 256MB chunk is a little small on a huge FS. This scales up the chunk size. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-21Btrfs: integrate integrity check module into btrfsStefan Behrens
This is the last part of the patch series. It modifies the btrfs code to use the integrity check module if configured to do so with the define BTRFS_FS_CHECK_INTEGRITY. If this define is not set, the only effective change is that code is added that handles the mount option to activate the integrity check. If the mount option is set and the define BTRFS_FS_CHECK_INTEGRITY is not set, that code complains in the log and the mount fails with EINVAL. Add the mount option to activate the usage of the integrity check code. Add invocation of btrfs integrity check code init and cleanup function on mount and umount, respectively. Add hook to call btrfs integrity check code version of submit_bh/submit_bio. Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2011-12-15Btrfs: unplug every once and a whileChris Mason
The btrfs io submission threads can build up massive plug lists. This keeps things more reasonable so we don't hand over huge dumps of IO at once. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-09Btrfs: fix btrfs_end_bio to deal with write errors to a single mirrorChris Mason
btrfs_end_bio checks the number of errors on a bio against the max number of errors allowed before sending any EIOs up to the higher levels. If we got enough copies of the bio done for a given raid level, it is supposed to clear the bio error flag and return success. We have pointers to the original bio sent down by the higher layers and pointers to any cloned bios we made for raid purposes. If the original bio happens to be the one that got an io error, but not the last one to finish, it might not have the BIO_UPTODATE bit set. Then, when the last bio does finish, we'll call bio_end_io on the original bio. It won't have the uptodate bit set and we'll end up sending EIO to the higher layers. We already had a check for this, it just was conditional on getting the IO error on the very last bio. Make the check unconditional so we eat the EIOs properly. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-12-08Btrfs: check if the to-be-added device is writableLi Zefan
If we call ioctl(BTRFS_IOC_ADD_DEV) directly, we'll succeed in adding a readonly device to a btrfs filesystem, and btrfs will write to that device, emitting kernel errors: [ 3109.833692] lost page write due to I/O error on loop2 [ 3109.833720] lost page write due to I/O error on loop2 ... Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-10Btrfs: fix nocow when deleting the itemMiao Xie
btrfs_previous_item() just search the b+ tree, do not COW the nodes or leaves, if we modify the result of it, the meta-data will be broken. fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Merge git://git.jan-o-sch.net/btrfs-unstable into integrationChris Mason
Conflicts: fs/btrfs/Makefile fs/btrfs/extent_io.c fs/btrfs/extent_io.h fs/btrfs/scrub.c Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06Merge branch 'for-chris' of git://github.com/sensille/linux into integrationChris Mason
Conflicts: fs/btrfs/ctree.h Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-11-06btrfs: separate superblock items out of fs_infoDavid Sterba
fs_info has now ~9kb, more than fits into one page. This will cause mount failure when memory is too fragmented. Top space consumers are super block structures super_copy and super_for_commit, ~2.8kb each. Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64) Add a wrapper for freeing fs_info and all of it's dynamically allocated members. Signed-off-by: David Sterba <dsterba@suse.cz>
2011-10-20Btrfs: close all bdevs on mount failureIlya Dryomov
Fix a bug introduced by 20b45077. We have to return EINVAL on mount failure, but doing that too early in the sequence leaves all of the devices opened exclusively. This also fixes an issue where under some scenarios only a second mount -o degraded <devices> command would succeed. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2011-10-19Btrfs: allow us to overcommit our enospc reservationsJosef Bacik
One of the things that kills us is the fact that our ENOSPC reservations are horribly over the top in most normal cases. There isn't too much that can be done about this because when we are completely full we really need them to work like this so we don't under reserve. However if there is plenty of unallocated chunks on the disk we can use that to gauge how much we can overcommit. So this patch adds chunk free space accounting so we always know how much unallocated space we have. Then if we fail to make a reservation within our allocated space, check to see if we can overcommit. In the normal flushing case (like with delalloc metadata reservations) we'll take the free space and divide it by 2 if our metadata profile is setup for DUP or any of those, and then divide it by 8 to make sure we don't overcommit too much. Then if we're in a non-flushing case (we really need this reservation now!) we only limit ourselves to half of the free space. This makes this fio test [torrent] filename=torrent-test rw=randwrite size=4g ioengine=sync directory=/mnt/btrfs-test go from taking around 45 minutes to 10 seconds on my freshly formatted 3 TiB file system. This doesn't seem to break my other enospc tests, but could really use some more testing as this is a super scary change. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com>
2011-10-02btrfs: state information for readaheadArne Jansen
Add state information for readahead to btrfs_fs_info and btrfs_device Changes v2: - don't wait in radix_trees - add own set of workers for readahead Reviewed-by: Josef Bacik <josef@redhat.com> Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-09-29btrfs: Put mirror_num in bi_bdevJan Schmidt
The error correction code wants to make sure that only the bad mirror is rewritten. Thus, we need to know which mirror is the bad one. I did not find a more apropriate field than bi_bdev. But I think using this is fine, because it is modified by the block layer, anyway, and should not be read after the bio returned. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29btrfs: btrfs_multi_bio replaced with btrfs_bioJan Schmidt
btrfs_bio is a bio abstraction able to split and not complete after the last bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio tracks the mirror_num used to read data which can be used for error correction purposes. Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-08-16Btrfs: fix uninitialized sync_pendingMiao Xie
sync_pending is uninitialized before it be used, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16Btrfs: fix a bug of balance on full multi-disk partitionsliubo
When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-16Btrfs: detect wether a device supports discardJosef Bacik
We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-05Btrfs: force unplugs when switching from high to regular priority biosChris Mason
Btrfs does bio submissions from a worker thread, and each device has a list of high priority bios and regular priority bios. Synchronous writes go to the high priority thread while async writes go to regular list. This commit brings back an explicit unplug any time we switch from high to regular priority, which makes it easier for the block layer to give us low latencies. Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-08-01Merge branch 'alloc_path' of ↵Chris Mason
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/btrfs-error-handling into for-linus