aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2013-06-01powerpc/mm: Always invalidate tlb on hpte invalidate and updateAneesh Kumar K.V
If a hash bucket gets full, we "evict" a more/less random entry from it. When we do that we don't invalidate the TLB (hpte_remove) because we assume the old translation is still technically "valid". This implies that when we are invalidating or updating pte, even if HPTE entry is not valid we should do a tlb invalidate. This was a regression introduced by b1022fbd293564de91596b8775340cf41ad5214c Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/pseries: Improve stream generation comments in copypage/userMichael Neuling
No code changes, just documenting what's happening a little better. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/pseries: Kill all prefetch streams on context switchMichael Neuling
On context switch, we should have no prefetch streams leak from one userspace process to another. This frees up prefetch resources for the next process. Based on patch from Milton Miller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/cputable: Fix oprofile_cpu_type on power8Nishanth Aravamudan
Maynard informed me that neither the oprofile kernel module nor oprofile userspace has been updated to support that "legacy" oprofile module interface for power8, which is indicated by "ppc64/power8." This results in no samples. The solution is to default to the "timer" type, instead. The raw entry also should be updated, as "ppc64/ibm-compat-v1" indicates to oprofile userspace to use "compatibility events" which are obsolete in ISA 2.07. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/mpic: Fix irq distribution problem when MPIC_SINGLE_DEST_CPUchenhui zhao
For the mpic with a flag MPIC_SINGLE_DEST_CPU, only one bit should be set in interrupt destination registers. The code is applicable to 64-bit platforms as well as 32-bit. Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Fix userspace stack corruption on signal delivery for active ↵Michael Neuling
transactions When in an active transaction that takes a signal, we need to be careful with the stack. It's possible that the stack has moved back up after the tbegin. The obvious case here is when the tbegin is called inside a function that returns before a tend. In this case, the stack is part of the checkpointed transactional memory state. If we write over this non transactionally or in suspend, we are in trouble because if we get a tm abort, the program counter and stack pointer will be back at the tbegin but our in memory stack won't be valid anymore. To avoid this, when taking a signal in an active transaction, we need to use the stack pointer from the checkpointed state, rather than the speculated state. This ensures that the signal context (written tm suspended) will be written below the stack required for the rollback. The transaction is aborted becuase of the treclaim, so any memory written between the tbegin and the signal will be rolled back anyway. For signals taken in non-TM or suspended mode, we use the normal/non-checkpointed stack pointer. Tested with 64 and 32 bit signals Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Move TM abort cause codes to uapiMichael Neuling
These cause codes are usable by userspace, so let's export to uapi. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Abort on emulation and alignment faultsMichael Neuling
If we are emulating an instruction inside an active user transaction that touches memory, the kernel can't emulate it as it operates in transactional suspend context. We need to abort these transactions and send them back to userspace for the hardware to rollback. We can service these if the user transaction is in suspend mode, since the kernel will operate in the same suspend context. This adds a check to all alignment faults and to specific instruction emulations (only string instructions for now). If the user process is in an active (non-suspended) transaction, we abort the transaction go back to userspace allowing the HW to roll back the transaction and tell the user of the failure. This also adds new tm abort cause codes to report the reason of the persistent error to the user. Crappy test case here http://neuling.org/devel/junkcode/aligntm.c Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # v3.9 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Update cause codes documentationMichael Neuling
Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # 3.9 only Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01powerpc/tm: Make room for hypervisor in abort cause codesMichael Neuling
PAPR carves out 0xff-0xe0 for hypervisor use of transactional memory software abort cause codes. Unfortunately we don't respect this currently. Below fixes this to move our cause codes to below this region. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> # 3.9 only Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2013-06-01Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull reiserfs fixes from Jan Kara: "Three reiserfs fixes. They fix real problems spotted by users so I hope they are ok even at this stage." * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: fix deadlock with nfs racing on create/lookup reiserfs: fix problems with chowning setuid file w/ xattrs reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentry
2013-06-01Merge tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs extended attribute fixes for CRCs from Ben Myers: "Here are several fixes that are relevant on CRC enabled XFS filesystems. They are followed by a rework of the remote attribute code so that each block of the attribute contains a header with a CRC. Previously there was a CRC header per extent in the remote attribute code, but this was untenable because it was not possible to know how many extents would be allocated for the attribute until after the allocation has completed, due to the fragmentation of free space. This became complicated because the size of the headers needs to be added to the length of the payload to get the overall length required for the allocation. With a header per block, things are less complicated at the cost of a little space. I would have preferred to defer this and the rest of the CRC queue to 3.11 to mitigate risk for existing non-crc users in 3.10. Doing so would require setting a feature bit for the on-disk changes, and so I have been pressured into sending this pull request by Eric Sandeen and David Chinner from Red Hat. I'll send another pull request or two with the rest of the CRC queue next week. - Remove assert on count of remote attribute CRC headers - Fix the number of blocks read in for remote attributes - Zero remote attribute tails properly - Fix mapping of remote attribute buffers to have correct length - initialize temp leaf properly in xfs_attr3_leaf_unbalance, and xfs_attr3_leaf_compact - Rework remote atttributes to have a header per block" * tag 'for-linus-v3.10-rc4-crc-xattr-fixes' of git://oss.sgi.com/xfs/xfs: xfs: rework remote attr CRCs xfs: fully initialise temp leaf in xfs_attr3_leaf_compact xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalance xfs: correctly map remote attr buffers during removal xfs: remote attribute tail zeroing does too much xfs: remote attribute read too short xfs: remote attribute allocation may be contiguous
2013-06-01Merge tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfsLinus Torvalds
Pull xfs fixes from Ben Myers: - Fix nested transactions in xfs_qm_scall_setqlim - Clear suid/sgid bits when we truncate with size update - Fix recovery for split buffers - Fix block count on remote symlinks - Add fsgeom flag for v5 superblock support - Disable XFS_IOC_SWAPEXT for CRC enabled filesystems - Fix dirv3 freespace block corruption * tag 'for-linus-v3.10-rc4' of git://oss.sgi.com/xfs/xfs: xfs: fix dir3 freespace block corruption xfs: disable swap extents ioctl on CRC enabled filesystems xfs: add fsgeom flag for v5 superblock support. xfs: fix incorrect remote symlink block count xfs: fix split buffer vector log recovery support xfs: kill suid/sgid through the truncate path. xfs: avoid nesting transactions in xfs_qm_scall_setqlim()
2013-06-01Merge tag 'please-pull-aertracefix' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull aer error logging fix from Tony Luck: "Can't call pci_get_domain_bus_and_slot() from interupt context" * tag 'please-pull-aertracefix' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: aerdrv: Move cper_print_aer() call out of interrupt context
2013-06-01Merge tag 'arm64-stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull arm64 fixes from Catalin Marinas: - Module compilation issues (symbol not exported). - Plug a hole where user space can bring the kernel down. * tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: arm64: don't kill the kernel on a bad esr from el0 arm64: treat unhandled compat el0 traps as undef arm64: Do not report user faults for handled signals arm64: kernel: compiling issue, need 'EXPORT_SYMBOL(clear_page)'
2013-05-31reiserfs: fix deadlock with nfs racing on create/lookupJeff Mahoney
Reiserfs is currently able to be deadlocked by having two NFS clients where one has removed and recreated a file and another is accessing the file with an open file handle. If one client deletes and recreates a file with timing such that the recreated file obtains the same [dirid, objectid] pair as the original file while another client accesses the file via file handle, the create and lookup can race and deadlock if the lookup manages to create the in-memory inode first. The create thread, in insert_inode_locked4, will hold the write lock while waiting on the other inode to be unlocked. The lookup thread, anywhere in the iget path, will release and reacquire the write lock while it schedules. If it needs to reacquire the lock while the create thread has it, it will never be able to make forward progress because it needs to reacquire the lock before ultimately unlocking the inode. This patch drops the write lock across the insert_inode_locked4 call so that the ordering of inode_wait -> write lock is retained. Since this would have been the case before the BKL push-down, this is safe. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31reiserfs: fix problems with chowning setuid file w/ xattrsJeff Mahoney
reiserfs_chown_xattrs() takes the iattr struct passed into ->setattr and uses it to iterate over all the attrs associated with a file to change ownership of xattrs (and transfer quota associated with the xattr files). When the setuid bit is cleared during chown, ATTR_MODE and iattr->ia_mode are passed to all the xattrs as well. This means that the xattr directory will have S_IFREG added to its mode bits. This has been prevented in practice by a missing IS_PRIVATE check in reiserfs_acl_chmod, which caused a double-lock to occur while holding the write lock. Since the file system was completely locked up, the writeout of the corrupted mode never happened. This patch temporarily clears everything but ATTR_UID|ATTR_GID for the calls to reiserfs_setattr and adds the missing IS_PRIVATE check. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31reiserfs: fix spurious multiple-fill in reiserfs_readdir_dentryJeff Mahoney
After sleeping for filldir(), we check to see if the file system has changed and research. The next_pos pointer is updated but its value isn't pushed into the key used for the search itself. As a result, the search returns the same item that the last cycle of the loop did and filldir() is called multiple times with the same data. The end result is that the buffer can contain the same name multiple times. This can be returned to userspace or used internally in the xattr code where it can manifest with the following warning: jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2) reiserfs_for_each_xattr uses reiserfs_readdir_dentry to iterate over the xattr names and ends up trying to unlink the same name twice. The second attempt fails with -ENOENT and the error is returned. At some point I'll need to add support into reiserfsck to remove the orphaned directories left behind when this occurs. The fix is to push the value into the key before researching. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz>
2013-05-31zoran: racy refcount handling in vm_ops ->open()/->close()Al Viro
worse, we lock ->resource_lock too late when we are destroying the final clonal VMA; the check for lack of other mappings of the same opened file can race with mmap(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-06-01atmel_lcdfb: blank the backlight on removeRichard Genoud
When removing atmel_lcdfb module, the backlight is unregistered but not blanked. (only for CONFIG_BACKLIGHT_ATMEL_LCDC case). This can result in the screen going full white depending on how the PWM is wired. Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
2013-06-01trivial: atmel_lcdfb: add missing error messageRichard Genoud
When a too small framebuffer is given, the atmel_lcdfb_check_var silently fails. Adding an error message will save some head scratching. Signed-off-by: Richard Genoud <richard.genoud@gmail.com> Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
2013-05-31befs_readdir(): do not increment ->f_pos if filldir tells us to stopAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31hpfs: deadlock and race in directory lseek()Al Viro
For one thing, there's an ABBA deadlock on hpfs fs-wide lock and i_mutex in hpfs_dir_lseek() - there's a lot of methods that grab the former with the caller already holding the latter, so it must take i_mutex first. For another, locking the damn thing, carefully validating the offset, then dropping locks and assigning the offset is obviously racy. Moreover, we _must_ do hpfs_add_pos(), or the machinery in dnode.c won't modify the sucker on B-tree surgeries. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31qnx6: qnx6_readdir() has a braino in pos calculationAl Viro
We want to mask lower 5 bits out, not leave only those and clear the rest... As it is, we end up always starting to read from the beginning of directory, no matter what the current position had been. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31fix buffer leak after "scsi: saner replacements for ->proc_info()"Jan Beulich
That patch failed to set proc_scsi_fops' .release method. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31vfs: Fix invalid ida_remove() callTakashi Iwai
When the group id of a shared mount is not allocated, the umount still tries to call mnt_release_group_id(), which eventually hits a kernel warning at ida_remove() spewing a message like: ida_remove called for id=0 which is not allocated. This patch fixes the bug simply checking the group id in the caller. Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-05-31arm64: don't kill the kernel on a bad esr from el0Mark Rutland
Rather than completely killing the kernel if we receive an esr value we can't deal with in the el0 handlers, send the process a SIGILL and log the esr value in the hope that we can debug it. If we receive a bad esr from el1, we'll die() as before. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: stable@vger.kernel.org
2013-05-31arm64: treat unhandled compat el0 traps as undefMark Rutland
Currently, if a compat process reads or writes from/to a disabled cp15/cp14 register, the trap is not handled by the el0_sync_compat handler, and the kernel will head to bad_mode, where it will die(), and oops(). For 64 bit processes, disabled system register accesses are currently treated as unhandled instructions. This patch modifies entry.S to treat these unhandled traps as undefined instructions, sending a SIGILL to userspace. This gives processes a chance to handle this and stop using inaccessible registers, and prevents further issues in the kernel as a result of the die(). Reported-by: Johannes Jensen <Johannes.Jensen@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-05-31iscsi-target: Fix iscsit_free_cmd() se_cmd->cmd_kref shutdown handlingNicholas Bellinger
With the introduction of target_get_sess_cmd() referencing counting for ISCSI_OP_SCSI_CMD processing with iser-target, iscsit_free_cmd() usage in traditional iscsi-target driver code now needs to be aware of the active I/O shutdown case when a remaining se_cmd->cmd_kref reference may exist after transport_generic_free_cmd() completes, requiring a final target_put_sess_cmd() to release iscsi_cmd descriptor memory. This patch changes iscsit_free_cmd() to invoke __iscsit_free_cmd() before transport_generic_free_cmd() -> target_put_sess_cmd(), and also avoids aquiring the per-connection queue locks for typical fast-path calls during normal ISTATE_REMOVE operation. Also update iscsit_free_cmd() usage throughout iscsi-target to use the new 'bool shutdown' parameter. This patch fixes a regression bug introduced during v3.10-rc1 in commit 3e1c81a95, that was causing the following WARNING to appear: [ 257.235153] ------------[ cut here]------------ [ 257.240314] WARNING: at kernel/softirq.c:160 local_bh_enable_ip+0x3c/0x86() [ 257.248089] Modules linked in: vhost_scsi ib_srpt ib_cm ib_sa ib_mad ib_core tcm_qla2xxx tcm_loop tcm_fc libfc iscsi_target_mod target_core_pscsi target_core_file target_core_iblock target_core_mod configfs ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi loop acpi_cpufreq freq_table mperf kvm_intel kvm crc32c_intel button ehci_pci pcspkr joydev i2c_i801 microcode ext3 jbd raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 linear igb hwmon i2c_algo_bit i2c_core ptp ata_piix libata qla2xxx uhci_hcd ehci_hcd mlx4_core scsi_transport_fc scsi_tgt pps_core [ 257.308748] CPU: 1 PID: 3295 Comm: iscsi_ttx Not tainted 3.10.0-rc2+ #103 [ 257.316329] Hardware name: Intel Corporation S5520HC/S5520HC, BIOS S5500.86B.01.00.0057.031020111721 03/10/2011 [ 257.327597] ffffffff814c24b7 ffff880458331b58 ffffffff8138eef2 ffff880458331b98 [ 257.335892] ffffffff8102c052 ffff880400000008 0000000000000000 ffff88085bdf0000 [ 257.344191] ffff88085bdf00d8 ffff88085bdf00e0 ffff88085bdf00f8 ffff880458331ba8 [ 257.352488] Call Trace: [ 257.355223] [<ffffffff8138eef2>] dump_stack+0x19/0x1f [ 257.360963] [<ffffffff8102c052>] warn_slowpath_common+0x62/0x7b [ 257.367669] [<ffffffff8102c080>] warn_slowpath_null+0x15/0x17 [ 257.374181] [<ffffffff81032345>] local_bh_enable_ip+0x3c/0x86 [ 257.380697] [<ffffffff813917fd>] _raw_spin_unlock_bh+0x10/0x12 [ 257.387311] [<ffffffffa029069c>] iscsit_free_r2ts_from_list+0x5e/0x67 [iscsi_target_mod] [ 257.396438] [<ffffffffa02906c5>] iscsit_release_cmd+0x20/0x223 [iscsi_target_mod] [ 257.404893] [<ffffffffa02977a4>] lio_release_cmd+0x3a/0x3e [iscsi_target_mod] [ 257.412964] [<ffffffffa01d59a1>] target_release_cmd_kref+0x7a/0x7c [target_core_mod] [ 257.421712] [<ffffffffa01d69bc>] target_put_sess_cmd+0x5f/0x7f [target_core_mod] [ 257.430071] [<ffffffffa01d6d6d>] transport_release_cmd+0x59/0x6f [target_core_mod] [ 257.438625] [<ffffffffa01d6eb4>] transport_put_cmd+0x131/0x140 [target_core_mod] [ 257.446985] [<ffffffffa01d6192>] ? transport_wait_for_tasks+0xfa/0x1d5 [target_core_mod] [ 257.456121] [<ffffffffa01d6f11>] transport_generic_free_cmd+0x4e/0x52 [target_core_mod] [ 257.465159] [<ffffffff81050537>] ? __migrate_task+0x110/0x110 [ 257.471674] [<ffffffffa02904ba>] iscsit_free_cmd+0x46/0x55 [iscsi_target_mod] [ 257.479741] [<ffffffffa0291edb>] iscsit_immediate_queue+0x301/0x353 [iscsi_target_mod] [ 257.488683] [<ffffffffa0292f7e>] iscsi_target_tx_thread+0x1c6/0x2a8 [iscsi_target_mod] [ 257.497623] [<ffffffff81047486>] ? wake_up_bit+0x25/0x25 [ 257.503652] [<ffffffffa0292db8>] ? iscsit_ack_from_expstatsn+0xd5/0xd5 [iscsi_target_mod] [ 257.512882] [<ffffffff81046f89>] kthread+0xb0/0xb8 [ 257.518329] [<ffffffff81046ed9>] ? kthread_freezable_should_stop+0x60/0x60 [ 257.526105] [<ffffffff81396fec>] ret_from_fork+0x7c/0xb0 [ 257.532133] [<ffffffff81046ed9>] ? kthread_freezable_should_stop+0x60/0x60 [ 257.539906] ---[ end trace 5520397d0f2e0800 ]--- Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31target: Propigate up ->cmd_kref put return via transport_generic_free_cmdNicholas Bellinger
Go ahead and propigate up the ->cmd_kref put return value from target_put_sess_cmd() -> transport_release_cmd() -> transport_put_cmd() -> transport_generic_free_cmd(). This is useful for certain fabrics when determining the active I/O shutdown case with SCF_ACK_KREF where a final target_put_sess_cmd() is still required by the caller. Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "One qxl 32-bit warning fix, the rest is a bunch of radeon fixes from Alex for some issues we've been seeing." * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/qxl: fix build warnings on 32-bit radeon: use max_bus_speed to activate gen2 speeds drm/radeon: narrow scope of Apple re-POST hack drm/radeon: don't check crtcs in card_posted() on cards without DCE drm/radeon: fix card_posted check for newer asics drm/radeon: fix typo in cu_per_sh on verde drm/radeon: UVD block on SUMO2 is the same as on SUMO
2013-05-31drm/qxl: fix build warnings on 32-bitDave Airlie
Just the usual printk related warnings. Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-05-30clk: mxs: Include clk mxs header fileFabio Estevam
Fix the following sparse warnings: drivers/clk/mxs/clk-imx28.c:72:5: warning: symbol 'mxs_saif_clkmux_select' was not declared. Should it be static? drivers/clk/mxs/clk-imx28.c:156:12: warning: symbol 'mx28_clocks_init' was not declared. Should it be static? Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org> [mturquette@linaro.org: fixed $SUBJECT line]
2013-05-30iscsi-target: fix heap buffer overflow on errorKees Cook
If a key was larger than 64 bytes, as checked by iscsi_check_key(), the error response packet, generated by iscsi_add_notunderstood_response(), would still attempt to copy the entire key into the packet, overflowing the structure on the heap. Remote preauthentication kernel memory corruption was possible if a target was configured and listening on the network. CVE-2013-2850 Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "A couple minor fixes for the (new to 3.10) gss-proxy code. And one regression from user-namespace changes. (XBMC clients were doing something admittedly weird--sending -1 gid's--but something that we used to allow.)" * 'for-3.10' of git://linux-nfs.org/~bfields/linux: svcrpc: fix failures to handle -1 uid's and gid's svcrpc: implement O_NONBLOCK behavior for use-gss-proxy svcauth_gss: fix error code in use_gss_proxy()
2013-05-30target/file: Fix off-by-one READ_CAPACITY bug for !S_ISBLK exportNicholas Bellinger
This patch fixes a bug where FILEIO was incorrectly reporting the number of logical blocks (+ 1) when using non struct block_device export mode. It changes fd_get_blocks() to follow all other backend ->get_blocks() cases, and reduces the calculated dev_size by one dev->dev_attrib.block_size number of bytes, and also fixes initial fd_block_size assignment at fd_configure_device() time introduced in commit 0fd97ccf4. Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com> Reported-by: Badari Pulavarty <pbadari@us.ibm.com> Tested-by: Badari Pulavarty <pbadari@us.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: - Three EFI-related fixes - Two early memory initialization fixes - build fix for older binutils - fix for an eager FPU performance regression -- currently we don't allow the use of the FPU at interrupt time *at all* in eager mode, which is clearly wrong. * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Allow FPU to be used at interrupt time even with eagerfpu x86, crc32-pclmul: Fix build with older binutils x86-64, init: Fix a possible wraparound bug in switchover in head_64.S x86, range: fix missing merge during add range x86, efi: initial the local variable of DataSize to zero efivar: fix oops in efivar_update_sysfs_entries() caused by memory reuse efivarfs: Never return ENOENT from firmware again
2013-05-30x86: Allow FPU to be used at interrupt time even with eagerfpuPekka Riikonen
With the addition of eagerfpu the irq_fpu_usable() now returns false negatives especially in the case of ksoftirqd and interrupted idle task, two common cases for FPU use for example in networking/crypto. With eagerfpu=off FPU use is possible in those contexts. This is because of the eagerfpu check in interrupted_kernel_fpu_idle(): ... * For now, with eagerfpu we will return interrupted kernel FPU * state as not-idle. TBD: Ideally we can change the return value * to something like __thread_has_fpu(current). But we need to * be careful of doing __thread_clear_has_fpu() before saving * the FPU etc for supporting nested uses etc. For now, take * the simple route! ... if (use_eager_fpu()) return 0; As eagerfpu is automatically "on" on those CPUs that also have the features like AES-NI this patch changes the eagerfpu check to return 1 in case the kernel_fpu_begin() has not been said yet. Once it has been the __thread_has_fpu() will start returning 0. Notice that with eagerfpu the __thread_has_fpu is always true initially. FPU use is thus always possible no matter what task is under us, unless the state has already been saved with kernel_fpu_begin(). [ hpa: this is a performance regression, not a correctness regression, but since it can be quite serious on CPUs which need encryption at interrupt time I am marking this for urgent/stable. ] Signed-off-by: Pekka Riikonen <priikone@iki.fi> Link: http://lkml.kernel.org/r/alpine.GSO.2.00.1305131356320.18@git.silcnet.org Cc: <stable@vger.kernel.org> v3.7+ Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30x86, crc32-pclmul: Fix build with older binutilsJan Beulich
binutils prior to 2.18 (e.g. the ones found on SLE10) don't support assembling PEXTRD, so a macro based approach like the one for PCLMULQDQ in the same file should be used. This requires making the helper macros capable of recognizing 32-bit general purpose register operands. [ hpa: tagging for stable as it is a low risk build fix ] Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/51A6142A02000078000D99D8@nat28.tlf.novell.com Cc: Alexander Boyko <alexander_boyko@xyratex.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Huang Ying <ying.huang@intel.com> Cc: <stable@vger.kernel.org> v3.9 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-05-30xfs: rework remote attr CRCsDave Chinner
Note: this changes the on-disk remote attribute format. I assert that this is OK to do as CRCs are marked experimental and the first kernel it is included in has not yet reached release yet. Further, the userspace utilities are still evolving and so anyone using this stuff right now is a developer or tester using volatile filesystems for testing this feature. Hence changing the format right now to save longer term pain is the right thing to do. The fundamental change is to move from a header per extent in the attribute to a header per filesytem block in the attribute. This means there are more header blocks and the parsing of the attribute data is slightly more complex, but it has the advantage that we always know the size of the attribute on disk based on the length of the data it contains. This is where the header-per-extent method has problems. We don't know the size of the attribute on disk without first knowing how many extents are used to hold it. And we can't tell from a mapping lookup, either, because remote attributes can be allocated contiguously with other attribute blocks and so there is no obvious way of determining the actual size of the atribute on disk short of walking and mapping buffers. The problem with this approach is that if we map a buffer incorrectly (e.g. we make the last buffer for the attribute data too long), we then get buffer cache lookup failure when we map it correctly. i.e. we get a size mismatch on lookup. This is not necessarily fatal, but it's a cache coherency problem that can lead to returning the wrong data to userspace or writing the wrong data to disk. And debug kernels will assert fail if this occurs. I found lots of niggly little problems trying to fix this issue on a 4k block size filesystem, finally getting it to pass with lots of fixes. The thing is, 1024 byte filesystems still failed, and it was getting really complex handling all the corner cases that were showing up. And there were clearly more that I hadn't found yet. It is complex, fragile code, and if we don't fix it now, it will be complex, fragile code forever more. Hence the simple fix is to add a header to each filesystem block. This gives us the same relationship between the attribute data length and the number of blocks on disk as we have without CRCs - it's a linear mapping and doesn't require us to guess anything. It is simple to implement, too - the remote block count calculated at lookup time can be used by the remote attribute set/get/remove code without modification for both CRC and non-CRC filesystems. The world becomes sane again. Because the copy-in and copy-out now need to iterate over each filesystem block, I moved them into helper functions so we separate the block mapping and buffer manupulations from the attribute data and CRC header manipulations. The code becomes much clearer as a result, and it is a lot easier to understand and debug. It also appears to be much more robust - once it worked on 4k block size filesystems, it has worked without failure on 1k block size filesystems, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit ad1858d77771172e08016890f0eb2faedec3ecee)
2013-05-30xfs: fully initialise temp leaf in xfs_attr3_leaf_compactDave Chinner
xfs_attr3_leaf_compact() uses a temporary buffer for compacting the the entries in a leaf. It copies the the original buffer into the temporary buffer, then zeros the original buffer completely. It then copies the entries back into the original buffer. However, the original buffer has not been correctly initialised, and so the movement of the entries goes horribly wrong. Make sure the zeroed destination buffer is fully initialised, and once we've set up the destination incore header appropriately, write is back to the buffer before starting to move entries around. While debugging this, the _d/_s prefixes weren't sufficient to remind me what buffer was what, so rename then all _src/_dst. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit d4c712bcf26a25c2b67c90e44e0b74c7993b5334)
2013-05-30xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalanceDave Chinner
xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining the entries in two leaves when the destination leaf requires compaction. The temporary buffer ends up being copied back over the original destination buffer, so the header in the temporary buffer needs to contain all the information that is in the destination buffer. To make sure the temporary buffer is fully initialised, once we've set up the temporary incore header appropriately, write is back to the temporary buffer before starting to move entries around. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 8517de2a81da830f5d90da66b4799f4040c76dc9)
2013-05-30xfs: correctly map remote attr buffers during removalDave Chinner
If we don't map the buffers correctly (same as for get/set operations) then the incore buffer lookup will fail. If a block number matches but a length is wrong, then debug kernels will ASSERT fail in _xfs_buf_find() due to the length mismatch. Ensure that we map the buffers correctly by basing the length of the buffer on the attribute data length rather than the remote block count. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 6863ef8449f1908c19f43db572e4474f24a1e9da)
2013-05-30xfs: remote attribute tail zeroing does too muchDave Chinner
When an attribute data does not fill then entire remote block, we zero the remaining part of the buffer. This, however, needs to take into account that the buffer has a header, and so the offset where zeroing starts and the length of zeroing need to take this into account. Otherwise we end up with zeros over the end of the attribute value when CRCs are enabled. While there, make sure we only ask to map an extent that covers the remaining range of the attribute, rather than asking every time for the full length of remote data. If the remote attribute blocks are contiguous with other parts of the attribute tree, it will map those blocks as well and we can potentially zero them incorrectly. We can also get buffer size mistmatches when trying to read or remove the remote attribute, and this can lead to not finding the correct buffer when looking it up in cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 4af3644c9a53eb2f1ecf69cc53576561b64be4c6)
2013-05-30xfs: remote attribute read too shortDave Chinner
Reading a maximally size remote attribute fails when CRCs are enabled with this verification error: XFS (vdb): remote attribute header does not match required off/len/owner) There are two reasons for this, the first being that the length of the buffer being read is determined from the args->rmtblkcnt which doesn't take into account CRC headers. Hence the mapped length ends up being too short and so we need to calculate it directly from the value length. The second is that the byte count of valid data within a buffer is capped by the length of the data and so doesn't take into account that the buffer might be longer due to headers. Hence we need to calculate the data space in the buffer first before calculating the actual byte count of data. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 913e96bc292e1bb248854686c79d6545ef3ee720)
2013-05-30xfs: remote attribute allocation may be contiguousDave Chinner
When CRCs are enabled, there may be multiple allocations made if the headers cause a length overflow. This, however, does not mean that the number of headers required increases, as the second and subsequent extents may be contiguous with the previous extent. Hence when we map the extents to write the attribute data, we may end up with less extents than allocations made. Hence the assertion that we consume the number of headers we calculated in the allocation loop is incorrect and needs to be removed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 90253cf142469a40f89f989904abf0a1e500e1a6)
2013-05-30xfs: fix dir3 freespace block corruptionDave Chinner
When the directory freespace index grows to a second block (2017 4k data blocks in the directory), the initialisation of the second new block header goes wrong. The write verifier fires a corruption error indicating that the block number in the header is zero. This was being tripped by xfs/110. The problem is that the initialisation of the new block is done just fine in xfs_dir3_free_get_buf(), but the caller then users a dirv2 structure to zero on-disk header fields that xfs_dir3_free_get_buf() has already zeroed. These lined up with the block number in the dir v3 header format. While looking at this, I noticed that the struct xfs_dir3_free_hdr() had 4 bytes of padding in it that wasn't defined as padding or being zeroed by the initialisation. Add a pad field declaration and fully zero the on disk and in-core headers in xfs_dir3_free_get_buf() so that this is never an issue in the future. Note that this doesn't change the on-disk layout, just makes the 32 bits of padding in the layout explicit. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 5ae6e6a401957698f2bd8c9f4a86d86d02199fea)
2013-05-30xfs: disable swap extents ioctl on CRC enabled filesystemsDave Chinner
Currently, swapping extents from one inode to another is a simple act of switching data and attribute forks from one inode to another. This, unfortunately in no longer so simple with CRC enabled filesystems as there is owner information embedded into the BMBT blocks that are swapped between inodes. Hence swapping the forks between inodes results in the inodes having mapping blocks that point to the wrong owner and hence are considered corrupt. To fix this we need an extent tree block or record based swap algorithm so that the BMBT block owner information can be updated atomically in the swap transaction. This is a significant piece of new work, so for the moment simply don't allow swap extent operations to succeed on CRC enabled filesystems. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 02f75405a75eadfb072609f6bf839e027de6a29a)
2013-05-30xfs: add fsgeom flag for v5 superblock support.Dave Chinner
Currently userspace has no way of determining that a filesystem is CRC enabled. Add a flag to the XFS_IOC_FSGEOMETRY ioctl output to indicate that the filesystem has v5 superblock support enabled. This will allow xfs_info to correctly report the state of the filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 74137fff067961c9aca1e14d073805c3de8549bd)
2013-05-30xfs: fix incorrect remote symlink block countDave Chinner
When CRCs are enabled, the number of blocks needed to hold a remote symlink on a 1k block size filesystem may be 2 instead of 1. The transaction reservation for the allocated blocks was not taking this into account and only allocating one block. Hence when trying to read or invalidate such symlinks, we are mapping a hole where there should be a block and things go bad at that point. Fix the reservation to use the correct block count, clean up the block count calculation similar to the remote attribute calculation, and add a debug guard to detect when we don't write the entire symlink to disk. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 321a95839e65db3759a07a3655184b0283af90fe)