aboutsummaryrefslogtreecommitdiff
path: root/fs/ceph
AgeCommit message (Collapse)Author
2010-10-07ceph: update issue_seq on cap grantSage Weil
We need to update the issue_seq on any grant operation, be it via an MDS reply or a separate grant message. The update in the grant path was missing. This broke cap release for inodes in which the MDS sent an explicit grant message that was not soon after followed by a successful MDS reply on the same inode. Also fix the signedness on seq locals. Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: send cap release message early on failed revoke.Greg Farnum
If an MDS tries to revoke caps that we don't have, we want to send releases early since they probably contain the caps message the MDS is looking for. Previously, we only sent the messages if we didn't have the inode either. But in a multi-mds system we can retain the inode after dropping all caps for a single MDS. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: Update max_len with minimum required sizeAneesh Kumar K.V
encode_fh on error should update max_len with minimum required size, so that caller can redo the call with the reallocated buffer. This is required with open by handle patch series Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: Fix return value of encode_fh functionAneesh Kumar K.V
encode_fh function should return 255 on error as done by other file system to indicate EOVERFLOW. Also max_len is in sizeof(u32) units and not in bytes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: avoid null deref in osd request error pathSage Weil
If we interrupt an osd request, we call __cancel_request, but it wasn't verifying that req->r_osd was non-NULL before dereferencing it. This could cause a crash if osds were flapping and we aborted a request on said osd. Reported-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-10-07ceph: fix list_add usage on unsafe_writes listHenry C Chang
Fix argument order. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17ceph: select CRYPTOSage Weil
We select CRYPTO_AES, but not CRYPTO. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17ceph: check mapping to determine if FILE_CACHE cap is usedSage Weil
See if the i_data mapping has any pages to determine if the FILE_CACHE capability is currently in use, instead of assuming it is any time the rdcache_gen value is set (i.e., issued -> used). This allows the MDS RECALL_STATE process work for inodes that have cached pages. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-17ceph: only send one flushsnap per cap_snap per mds sessionSage Weil
Sending multiple flushsnap messages is problematic because we ignore the response if the tid doesn't match, and the server may only respond to each one once. It's also a waste. So, skip cap_snaps that are already on the flushing list, unless the caller tells us to resend (because we are reconnecting). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-16ceph: fix cap_snap and realm splitSage Weil
The cap_snap creation/queueing relies on both the current i_head_snapc _and_ the i_snap_realm pointers being correct, so that the new cap_snap can properly reference the old context and the new i_head_snapc can be updated to reference the new snaprealm's context. To fix this, we: - move inodes completely to the new (split) realm so that i_snap_realm is correct, and - generate the new snapc's _before_ queueing the cap_snaps in ceph_update_snap_trace(). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-14ceph: stop sending FLUSHSNAPs when we hit a dirty capsnapSage Weil
Stop sending FLUSHSNAP messages when we hit a capsnap that has dirty_pages or is still writing. We'll send the newer capsnaps only after the older ones complete. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-14ceph: correctly set 'follows' in flushsnap messagesSage Weil
The 'follows' should match the seq for the snap context for the given snap cap, which is the context under which we have been dirtying and writing data and metadata. The snapshot that _contains_ those updates thus _follows_ that context's seq #. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-13ceph: fix dn offset during readdir_prepopulateSage Weil
When adding the readdir results to the cache, ceph_set_dentry_offset was clobbered our just-set offset. This can cause the readdir result offsets to get out of sync with the server. Add an argument to the helper so that it does not. This bug was introduced by 1cd3935bedccf592d44343890251452a6dd74fc4. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix file offset wrapping at 4GB on 32-bit archsSage Weil
Cast the value before shifting so that we don't run out of bits with a 32-bit unsigned long. This fixes wrapping of high file offsets into the low 4GB of a file on disk, and the subsequent data corruption for large files. Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix reconnect encoding for old serversSage Weil
Fix the reconnect encoding to encode the cap record when the MDS does not have the FLOCK capability (i.e., pre v0.22). Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix pagelist kunmap tailYehuda Sadeh
A wrong parameter was passed to the kunmap. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-09-11ceph: fix null pointer deref on anon root dentry releaseSage Weil
When we release a root dentry, particularly after a splice, the parent (actually our) inode was evaluating to NULL and was getting dereferenced by ceph_snap(). This is reproduced by something as simple as mount -t ceph monhost:/a/b mnt mount -t ceph monhost:/a mnt2 ls mnt2 A splice_dentry() would kill the old 'b' inode's root dentry, and we'd crash while releasing it. Fix by checking for both the ROOT and NULL cases explicitly. We only need to invalidate the parent dir when we have a correct parent to invalidate. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26ceph: fix get_ticket_handler() error handlingDan Carpenter
get_ticket_handler() returns a valid pointer or it returns ERR_PTR(-ENOMEM) if kzalloc() fails. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26ceph: don't BUG on ENOMEM during mds reconnectSage Weil
We are in a position to return an error; do that instead. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-26ceph: ceph_mdsc_build_path() returns an ERR_PTRDan Carpenter
ceph_mdsc_build_path() returns an ERR_PTR but this code is set up to handle NULL returns. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-25ceph: Fix warningsAlan Cox
Just scrubbing some warnings so I can see real problem ones in the build noise. For 32bit we need to coax gcc politely into believing we really honestly intend to the casts. Using (u64)(unsigned long) means we cast from a pointer to a type of the right size and then extend it. This stops the warning spew. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-25ceph: ceph_get_inode() returns an ERR_PTRDan Carpenter
ceph_get_inode() returns an ERR_PTR and it doesn't return a NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-24ceph: initialize fields on new dentry_infosSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-24ceph: maintain i_head_snapc when any caps are dirty, not just for dataSage Weil
We used to use i_head_snapc to keep track of which snapc the current epoch of dirty data was dirtied under. It is used by queue_cap_snap to set up the cap_snap. However, since we queue cap snaps for any dirty caps, not just for dirty file data, we need to keep a valid i_head_snapc anytime we have dirty|flushing caps. This fixes a NULL pointer deref in queue_cap_snap when writing back dirty caps without data (e.g., snaptest-authwb.sh). Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: fix osd request lru adjustment when sending requestHenry C Chang
Fix argument order. We want to move the item to the end of the list, not change the position of the head. Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: don't improperly set dir complete when holding EXCL capSage Weil
If we hold the EXCL cap, we cannot trust the dir stats from the MDS (num files, subdirs) and must not incorrectly conclude that the directory is empty. If we do, we get can bad results from lookup (bad ENOENT) and bad readdir results. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22mm: exporting account_page_dirtyMichael Rubin
This allows code outside of the mm core to safely manipulate page state and not worry about the other accounting. Not using these routines means that some code will lose track of the accounting and we get bugs. This has happened once already. Signed-off-by: Michael Rubin <mrubin@google.com> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: direct requests in snapped namespace based on nonsnap parentSage Weil
When making a request in the virtual snapdir or a snapped portion of the namespace, we should choose the MDS based on the first nonsnap parent (and its caps). If that is not the best place, we will get forward hints to find the right MDS in the cluster. This fixes ESTALE errors when using the .snap directory and namespace with multiple MDSs. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: queue cap snap writeback for realm children on snap updateSage Weil
When a realm is updated, we need to queue writeback on inodes in that realm _and_ its children. Otherwise, if the inode gets cowed on the server, we can get a hang later due to out-of-sync cap/snap state. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: include dirty xattrs state in snapped capsSage Weil
When we snapshot dirty metadata that needs to be written back to the MDS, include dirty xattr metadata. Make the capsnap reference the encoded xattr blob so that it will be written back in the FLUSHSNAP op. Also fix the capsnap creation guard to include dirty auth or file bits, not just tests specific to dirty file data or file writes in progress (this fixes auth metadata writeback). Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: fix xattr cap writebackSage Weil
We should include the xattr metadata blob in the cap update message any time we are flushing dirty state, NOT just when we are also dropping the cap. This fixes async xattr writeback. Also, clean up the code slightly to avoid duplicating the bit test. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-22ceph: fix multiple mds session shutdownSage Weil
The use of a completion when waiting for session shutdown during umount is inappropriate, given the complexity of the condition. For multiple MDS's, this resulted in the umount thread spinning, often preventing the session close message from being processed in some cases. Switch to a waitqueue and defined a condition helper. This cleans things up nicely. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-10ceph: generalize mon requests, add pool op supportYehuda Sadeh
Generalize the current statfs synchronous requests, and support pool_ops. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-05ceph: only queue async writeback on cap revocation if there is dirty dataSage Weil
Normally, if the Fb cap bit is being revoked, we queue an async writeback. If there is no dirty data but we still hold the cap, this leaves the client sitting around doing nothing until the cap timeouts expire and the cap is released on its own (as it would have been without the revocation). Instead, only queue writeback if the bit is actually used (i.e., we have dirty data). If not, we can reply to the revocation immediately. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03ceph: do not ignore osd_idle_ttl mount optionSage Weil
Actually apply the mount option to the mount_args struct. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03ceph: constify dentry_operationsSage Weil
This makes checkpatch happy. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-03ceph: whitespace cleanupSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: add flock/fcntl lock supportGreg Farnum
Implement flock inode operation to support advisory file locking. All lock/unlock operations are synchronous with the MDS. Lock state is sent when reconnecting to a recovering MDS to restore the shared lock state. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: define on-wire types, constants for file locking supportGreg Farnum
Define the MDS operations and data types for doing file advisory locking with the MDS. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: add CEPH_FEATURE_FLOCK to the supported feature bitsGreg Farnum
This informs the server that we will accept v2 client_caps format and v2 client_reconnect format messages. Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: support v2 reconnect encodingSage Weil
Encode either old or v2 encoding of client_reconnect message, depending on whether the peer has the FLOCK feature bit. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: support v2 client_caps encodingSage Weil
Add support for v2 encoding of MClientCaps, which includes a flock blob. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: move AES iv definition to shared headerSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-02ceph: fix decoding of pool snap infoSage Weil
The pool info contains a vector for snap_info_t, not snap ids. This fixes the broken decoding, which would declare teh update corrupt when a pool snapshot was created. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: make ->sync_fs not wait if wait==0Sage Weil
The ->sync_fs() super op only needs to wait if wait is true. Otherwise, just get some dirty cap writeback started. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: warn on missing snap realmSage Weil
Well, this Shouldn't Happen, so it would be helpful to know the caller when it does. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: print useful error message when crush rule not foundSage Weil
Include the crush_ruleset in the error message. Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: use %pU to print uuid (fsid)Sage Weil
Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: sync header defs with server codeSage Weil
Define ROLLBACK op, IFLOCK inode lock (for advisory file locking). Signed-off-by: Sage Weil <sage@newdream.net>
2010-08-01ceph: clean up header guardsSage Weil
Signed-off-by: Sage Weil <sage@newdream.net>