fathi.boudra/ubuntu/linux-linaro-precise.git - Packaged Linux-linaro kernels for Precise

Age	Commit message	Author
2011-11-16	UBUNTU: SAUCE: seccomp_filter: new mode with configurable syscall filters	Will Drewry
	This change adds a new seccomp mode which specifies the allowed system calls dynamically. When in the new mode (13), all system calls are checked against process-defined filters - first by system call number, then by a filter string. If an entry exists for a given system call and all filter predicates evaluate to true, then the task may proceed. Otherwise, the task is killed. Filter string parsing and evaluation are handled by the ftrace filter engine. Related patches tweak the perf filter trace and free calls, allowing them to be shared. Filters inherit their understanding of types and arguments for each system call from the CONFIG_FTRACE_SYSCALLS subsystem, which already populates this information in the enter_event (and exit_event) structures associated with syscall_metadata. If CONFIG_FTRACE_SYSCALLS is not compiled in, only filter strings of "1" will be allowed. The net result is that a process may have its system calls filtered using the ftrace filter engine's inherent understanding of system calls. The set of filters is specified through the PR_SET_SECCOMP_FILTER argument in prctl(). 
For example, a filterset for a process, like pdftotext, that should only process read-only input could (roughly) look like: sprintf(rdonly, "flags == %u", O_RDONLY|O_LARGEFILE); type = PR_SECCOMP_FILTER_SYSCALL; prctl(PR_SET_SECCOMP_FILTER, type, __NR_open, rdonly); prctl(PR_SET_SECCOMP_FILTER, type, __NR__llseek, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_brk, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_close, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_exit_group, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_fstat64, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_mmap2, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_munmap, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_read, "1"); prctl(PR_SET_SECCOMP_FILTER, type, __NR_write, "fd == 1 || fd == 2"); prctl(PR_SET_SECCOMP, 13); Subsequent calls to PR_SET_SECCOMP_FILTER for the same system call will be &&'d together to ensure that attack surface may only be reduced: prctl(PR_SET_SECCOMP_FILTER, type, __NR_write, "fd != 2"); With the earlier example, the active filter becomes: "(fd == 1 || fd == 2) && (fd != 2)" The patch also adds PR_CLEAR_SECCOMP_FILTER and PR_GET_SECCOMP_FILTER. The latter returns the current filter for a system call to userspace: prctl(PR_GET_SECCOMP_FILTER, type, __NR_write, buf, bufsize); while the former clears any filters for a given system call, changing it back to a default deny: prctl(PR_CLEAR_SECCOMP_FILTER, type, __NR_write); Note, type may be either PR_SECCOMP_FILTER_EVENT or PR_SECCOMP_FILTER_SYSCALL. This allows for ftrace event ids to be used in lieu of system call numbers. At present, only syscalls:sys_enter_* event ids are supported, but this allows for potential future extension of the backend. v11: - Use mode "13" to avoid future overlap; with comment update - Use kref; extra memset; other clean up from msb@chromium.org - Cleaned up Makefile object merging since locally shared symbols are gone v10: - Note that PERF_EVENTS are also needed for ftrace filter engine support. 
- Removed dependency on ftrace code changes for event_filters (wrapping with perf_events and violating opaqueness for the filter str) - pulled in all the hacks to get access to syscall_metadata and build call objects for filter evaluation. v9: - rebase on to de505e709ffb09a7382ca8e0d8c7dbb171ba5 - disallow PR_SECCOMP_FILTER_EVENT when a compat task is calling as ftrace has no compat_syscalls awareness yet. - return -ENOSYS when filter engine strings are used on a compat call as there are no compat_syscalls events to reference yet. v8: - expand parenthetical use during SET_SECCOMP_FILTER to avoid operator precedence undermining attack surface reduction (caught by segoon@openwall.com). Opted to waste bytes on () rather than reparse to avoid OP_OR precedence overriding extend_filter's intentions. - remove more lingering references to @state - fix incorrect compat mismatch check (anyone up for a Tested-By?) v7: - disallow seccomp_filter inheritance across fork except when seccomp is active. This avoids filters leaking across processes when they are not actively in use but ensures an allowed fork/clone doesn't drop filters. - remove the Mode: print from show as it reflected current and not the filters holder. v6: - clean up minor unnecessary changes (empty lines, ordering, etc) - fix one overly long line - add refcount overflow BUG_ON v5: - drop mutex usage when the task_struct is safe to access directly v4: - move off of RCU to a read/write guarding mutex after paulmck@linux.vnet.ibm.com's feedback (mem leak, rcu fail) - stopped inc/dec refcounts in mutex guard sections - added required changes to init the mutex in INIT_TASK and safely lock around fork inheritance. - added id_type support to the prctl interface to support using ftrace event ids as an alternative to syscall numbers. 
Behavior is identical otherwise (as per discussion with mingo@elte.hu) v3: - always block execve calls (as per torvalds@linux-foundation.org) - add __NR_seccomp_execve(_32) to seccomp-supporting arches - ensure compat tasks can't reach ftrace:syscalls - dropped new defines for seccomp modes. - two level array instead of hlists (sugg. by olofj@chromium.org) - added generic Kconfig entry that is not connected. - dropped internal seccomp.h - move prctl helpers to seccomp_filter - killed seccomp_t typedef (as per checkpatch) v2: - changed to use the existing syscall number ABI. - prctl changes to minimize parsing in the kernel: prctl(PR_SET_SECCOMP, {0 | 1 | 2 }, { 0 | ON_EXEC }); prctl(PR_SET_SECCOMP_FILTER, __NR_read, "fd == 5"); prctl(PR_CLEAR_SECCOMP_FILTER, __NR_read); prctl(PR_GET_SECCOMP_FILTER, __NR_read, buf, bufsize); - defined PR_SECCOMP_MODE_STRICT and ..._FILTER - added flags - provide a default fail syscall_nr_to_meta in ftrace - provides fallback for unhooked system calls - use -ENOSYS and ERR_PTR(-ENOSYS) for stubbed functionality - added kernel/seccomp.h to share seccomp.c/seccomp_filter.c - moved to a hlist and 4 bit hash of linked lists - added support to operate without CONFIG_FTRACE_SYSCALLS - moved Kconfig support next to SECCOMP - made Kconfig entries dependent on EXPERIMENTAL - added macros to avoid ifdefs from kernel/fork.c - added compat task/filter matching - drop seccomp.h inclusion in sched.h and drop seccomp_t - added Filtering to "show" output - added on_exec state dup'ing when enabling after a fast-path accept. Signed-off-by: Will Drewry <wad@chromium.org> BUG=chromium-os:14496 TEST=built in x86-alex. Out of tree commandline helper test confirms functionality works. Will check in a test into the minijail repo which can be used from autotest. 
Change-Id: I901595e3399914783739d113a058d83550ddf8e2 Reviewed-on: http://gerrit.chromium.org/gerrit/4814 Reviewed-by: Sonny Rao <sonnyrao@chromium.org> Tested-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
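The rule in the seccomp_filter message that repeated PR_SET_SECCOMP_FILTER calls are &&'d together, with parentheses added so operator precedence cannot widen the filter (the v8 note), can be illustrated in user space. This is only a sketch of the string composition: the helper name `extend_filter_str`, its signature, and its buffer handling are hypothetical and are not taken from the patch.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space illustration: combine an existing filter string
 * with a new one as "(old) && (new)". Returns 0 on success, -1 if the
 * result does not fit in out.
 */
int extend_filter_str(const char *old, const char *add, char *out, size_t outlen)
{
    int n;

    if (old == NULL || old[0] == '\0')
        n = snprintf(out, outlen, "%s", add);   /* first filter: use as-is */
    else
        n = snprintf(out, outlen, "(%s) && (%s)", old, add);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Without the parentheses, appending `&& fd != 2` to `fd == 1 || fd == 2` would bind to the second operand only and leave `fd == 1` unconstrained, which is the precedence problem the changelog describes.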
2011-11-16	fs: limit filesystem stacking depth	Miklos Szeredi
	Add a simple read-only counter to super_block that indicates how deep this is in the stack of filesystems. Previously ecryptfs was the only stackable filesystem and it explicitly disallowed multiple layers of itself. Overlayfs, however, can be stacked recursively and also may be stacked on top of ecryptfs or vice versa. To limit the kernel stack usage we must limit the depth of the filesystem stack. Initially the limit is set to 2. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
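A stacking-depth check of the kind described here might look roughly like the sketch below. The struct, the function name, and the "deepest layer plus one" rule are assumptions made for illustration; only the limit of 2 comes from the commit message.

```c
#include <errno.h>

#define MAX_STACK_DEPTH 2   /* the initial limit named in the commit message */

/* Hypothetical stand-in for the counter added to super_block. */
struct fs_layer {
    int stack_depth;        /* 0 for a filesystem that sits on no other */
};

/*
 * Compute the depth of a filesystem stacked on upper and lower layers
 * (assumed rule: deepest layer plus one). Returns the new depth, or
 * -EINVAL if it would exceed the limit.
 */
int stacked_depth(const struct fs_layer *upper, const struct fs_layer *lower)
{
    int deepest = upper->stack_depth > lower->stack_depth
                ? upper->stack_depth : lower->stack_depth;
    int depth = deepest + 1;

    return depth > MAX_STACK_DEPTH ? -EINVAL : depth;
}
```

Under these assumptions, an overlay on a plain filesystem has depth 1, an overlay on ecryptfs has depth 2, and a third layer is refused at mount time rather than risking kernel stack exhaustion later.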
2011-11-16	Revert "UBUNTU: ubuntu: overlayfs -- fs: limit filesystem stacking depth"	Leann Ogasawara
	This reverts commit d72ccf8cb287df877221c874efb38f343f671e86.
2011-11-16	UBUNTU: SAUCE: vt -- allow grub to request automatic vt_handoff	Andy Whitcroft
	Grub may be able to select a graphics mode and paint a splash screen for us. If so it needs to be able to tell us it has done so. Add support for detecting a new graphics mode selected bit in the screen_info passed over at boot. Use this to automatically enable vt_handoff mode. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: SAUCE: vt -- maintain bootloader screen mode and content until vt switch	Andy Whitcroft
	Introduce a new VT mode KD_TRANSPARENT which endeavours to leave the current content of the framebuffer untouched. This allows the bootloader to insert a graphical splash and have the kernel maintain it until the OS splash can take over. When we finally switch away (either through programs like plymouth or manually) the content is lost and the VT reverts to text mode. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: SAUCE: (no-up) vfs: Add a trace point in the mark_inode_dirty function	Arjan van de Ven
	[apw@canonical.com: This has no upstream traction but is used by powertop, so it's worth carrying.] PowerTOP would like to be able to show who is keeping the disk busy by dirtying data. The most logical spot for this is in the vfs in the mark_inode_dirty() function. Doing this on the block level is not possible because by the time the IO hits the block layer the guilty party can no longer be found ("kjournald" and "pdflush" are not useful answers to "who caused this file to be dirty?"). The trace point follows the same logic/style as the block_dump code and pretty much dumps the same data, just not to dmesg (and thus to /var/log/messages) but via the trace events streams. Note: This patch was posted to lkml and might potentially go into 2.6.33 but I have not seen which maintainer will take it. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Amit Kucheria <amit.kucheria@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: SAUCE: (no-up) add tracing for user initiated readahead requests	Andy Whitcroft
	Track pages which undergo readahead and, for each, record whether it was actually consumed, either via read or by being faulted into a map. This allows userspace readahead applications (such as ureadahead) to track which pages in core at the end of a boot are actually required and generate an optimal readahead pack. It also allows pack adjustment and optimisation in parallel with readahead, allowing the pack to evolve to be accurate as userspace paths change. The status of the pages is reported back via the mincore() call using a newly allocated bit. Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: ubuntu: overlayfs -- fs: limit filesystem stacking depth	Miklos Szeredi
	Add a simple read-only counter to super_block that indicates how deep this is in the stack of filesystems. Previously ecryptfs was the only stackable filesystem and it explicitly disallowed multiple layers of itself. Overlayfs, however, can be stacked recursively and also may be stacked on top of ecryptfs or vice versa. To limit the kernel stack usage we must limit the depth of the filesystem stack. Initially the limit is set to 2. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: ubuntu: overlayfs -- vfs: introduce clone_private_mount()	Miklos Szeredi
	Overlayfs needs a private clone of the mount, so create a function for this and export to modules. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: ubuntu: overlayfs -- vfs: add i_op->open()	Miklos Szeredi
	Add a new inode operation i_op->open(). This is for stacked filesystems that want to return a struct file from a different filesystem. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: ubuntu: nx-emu - i386: mmap randomization for executable mappings	Ingo Molnar
	This code is originally from Ingo Molnar, with some later rebasing and fixes to respect all the randomization-disabling knobs. It provides an address randomization algorithm when NX emulation is in use in 32-bit processes. Kees Cook pushed the brk area further away in the case of PIE binaries landing their brk inside the CS limit. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: ubuntu: nx-emu - i386: NX emulation	Ingo Molnar
	This is old code with some cruft, all originally by Ingo Molnar with much later rebasing by Fedora folks and at least one arcane fix by Roland McGrath a few years ago. No longer uses exec-shield sysctl, merged with disable_nx. Kees Cook fixed boottime NX reporting for various corner cases. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: ubuntu: Yama - add ptrace relationship tracking interface	Kees Cook
	Some application suites have external crash handlers that depend on being able to use ptrace to generate crash reports (KDE, Wine, Chromium, Firefox, etc). Since the inferior process has a defined application-specific relationship with the debugger, allow the inferior to express that relationship by declaring who can call PTRACE_ATTACH against it. The inferior can use prctl() with PR_SET_PTRACER to allow a specific PID and its descendants to perform the ptrace instead of only a direct ancestor. Signed-off-by: Kees Cook <kees.cook@canonical.com> --- v2: - kmalloc, spinlock init, and doc typo corrections from Tetsuo Handa. - make sure to replace if possible on add, thanks to Eric Paris. v3: - make sure to use thread group leader when searching for exceptions. v4: - make sure to use thread group leader when creating exceptions. v5: - make sure to use thread group leader when deleting exceptions. Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
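The exception declared through PR_SET_PTRACER ("a specific PID and its descendants") amounts to an ancestry walk from the would-be tracer. The sketch below models that walk over a toy process table; the table, the function names, and the use of 0 for "no declaration" are all hypothetical, and the real check (including the existing direct-ancestor rule) lives in the Yama LSM rather than in code like this.

```c
#include <stdbool.h>

#define NPROC 8

/* Toy process table: parent_of[pid] is the parent pid, 0 means "no parent". */
static const int parent_of[NPROC] = { 0, 0, 1, 1, 2, 4, 5, 1 };

/* True if pid equals ancestor or is one of its descendants. */
bool is_self_or_descendant(int pid, int ancestor)
{
    while (pid > 0 && pid < NPROC) {
        if (pid == ancestor)
            return true;
        pid = parent_of[pid];   /* step up towards the root */
    }
    return false;
}

/*
 * Hypothetical check: may tracer attach to a target that declared
 * allowed_pid as its ptracer? allowed_pid == 0 means no declaration.
 */
bool ptracer_allowed(int tracer, int allowed_pid)
{
    return allowed_pid > 0 && is_self_or_descendant(tracer, allowed_pid);
}
```

In this toy tree, a crash handler running as pid 4, or any helper it spawns (pids 5 and 6), would be accepted, while an unrelated sibling such as pid 3 would not.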
2011-11-16	UBUNTU: ubuntu: Yama - create task_free security callback	Kees Cook
	The current LSM interface to cred_free is not sufficient for allowing an LSM to track the life and death of a task. This patch adds the task_free hook so that an LSM can clean up resources on task death. Signed-off-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
2011-11-16	UBUNTU: ubuntu: AUFS -- aufs2-base.patch aufs2.1-39	Andy Whitcroft
	Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: SAUCE: ensure root is ready before running usermodehelpers in it	Andy Whitcroft
	Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	AppArmor: compatibility patch for v5 network control	John Johansen
	Add compatibility for v5 network rules. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
2011-11-16	UBUNTU: SAUCE: Improve Amazon EBS performance for EC2	John Johansen
	OriginalAuthor: Amazona from Ben Howard <behoward@amazon.com> BugLink: http://bugs.launchpad.net/bugs/634316 The pv-ops kernel suffers from poor performance when using Amazon's Elastic block storage (EBS). This patch from Amazon improves pv-ops kernel performance, and has not exhibited any regressions. Signed-off-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: SAUCE: add option to hand off all kernel parameters to init	Andy Whitcroft
	BugLink: http://bugs.launchpad.net/bugs/586386 Some init packages such as upstart find having all of the kernel parameters passed in useful. Currently they have to open up /proc/cmdline and reparse that to obtain this information. Add a kernel configuration option to enable passing of all options. Note, enabling this option will reduce the chances that a fallback from /sbin/init to /bin/bash or /bin/sh will succeed. Though it should be noted that there are commonly unknown options present which would already break this fallback. init=/bin/foo provides explicit control over options which is unaffected by this change. Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
2011-11-16	UBUNTU: SAUCE: async_populate_rootfs: move rootfs init earlier	Andy Whitcroft
	Check to see if the machine has more than one active CPU, if it does then it is worth starting the decode of the rootfs earlier. Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: SAUCE: Make populate_rootfs asynchronous	Surbhi Palande
	The expansion of the initramfs is completely independent of other boot activities. The original data is already present at boot and the filesystem is not required until we are ready to start init. It is therefore reasonable to populate the rootfs asynchronously. Move this processing to an async call. This reduces kernel initialisation time (the time from bootloader to starting userspace) by several tenths of a second on a selection of test hardware, particularly SMP systems, although UP systems also benefit. Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: SAUCE: (no-up) trace: add trace events for open(), exec() and uselib()	Scott James Remnant
	BugLink: http://bugs.launchpad.net/bugs/462111 This patch uses TRACE_EVENT to add tracepoints for the open(), exec() and uselib() syscalls so that ureadahead can cheaply trace the boot sequence to determine what to read to speed up the next. It's not upstream because it will need to be rebased onto the syscall trace events whenever that gets merged, and is a stop-gap. Signed-off-by: Scott James Remnant <scott@ubuntu.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
2011-11-16	UBUNTU: SAUCE: (no-up) disable adding scsi headers to linux-libc-dev	Andy Whitcroft
	Currently scsi headers are generated by the kernel and by libc6-dev. We need to coordinate any switch over to the kernel. Temporarily disable these headers in the kernel package. Signed-off-by: Andy Whitcroft <apw@canonical.com>
2011-11-16	UBUNTU: SAUCE: (no-up) swap: Add notify_swap_entry_free callback for compcache	Tim Gardner
	Code is required for ubuntu/compcache Signed-off-by: Ben Collins <ben.collins@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
2011-11-14	sched: Ensure cpu_power periodic update	Vincent Guittot
	With a lot of small tasks, the sched softirq is almost never called when no_hz is enabled. load_balance is then mainly called in newly_idle mode, which doesn't update cpu_power. Add a next_update field which ensures a maximum update period when there is only short activity. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
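The idea of a next_update deadline can be sketched as below: the caller may run as often as it likes, but the refresh itself happens only once the deadline has passed, so a maximum period between updates is guaranteed as long as the function is called at all. Apart from the field name next_update, every name here is hypothetical, and the counter stands in for the real cpu_power computation.

```c
#include <stdbool.h>

/* Hypothetical per-group state; names other than next_update are illustrative. */
struct power_state {
    unsigned long next_update;  /* earliest tick at which to refresh */
    unsigned long interval;     /* maximum period between refreshes, in ticks */
    unsigned int  updates;      /* how many refreshes have run */
};

/* Wrap-safe "a is at or after b" comparison for tick counters. */
static bool tick_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* Refresh the power estimate only when the deadline has passed. */
bool maybe_update_power(struct power_state *ps, unsigned long now)
{
    if (!tick_after_eq(now, ps->next_update))
        return false;           /* updated recently enough */

    ps->updates++;              /* stand-in for recomputing cpu_power */
    ps->next_update = now + ps->interval;
    return true;
}
```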
2011-11-11	Merge commit 'v3.1.1' into linaro-3.1	Nicolas Pitre

2011-11-11	ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning	Bart Van Assche
	commit c1056b42a87b59375f8f81a92ef029165f44fcce upstream. Recently the ACPI ops structs were constified but the inline version of register_hotplug_dock_device() was overlooked (see also commit 9c8b04b, June 25 2011). Update the inline function register_hotplug_dock_device() that is enabled with CONFIG_ACPI_DOCK=n too. This patch fixes at least the following compiler warnings: drivers/ata/libata-acpi.c: In function 'ata_acpi_associate': drivers/ata/libata-acpi.c:266:11: warning: passing argument 2 of 'register_hotplug_dock_device' discards qualifiers from pointer target type include/acpi/acpi_drivers.h:146:19: note: expected 'struct acpi_dock_ops *' but argument is of type 'const struct acpi_dock_ops *' drivers/ata/libata-acpi.c:275:11: warning: passing argument 2 of 'register_hotplug_dock_device' discards qualifiers from pointer target type include/acpi/acpi_drivers.h:146:19: note: expected 'struct acpi_dock_ops *' but argument is of type 'const struct acpi_dock_ops *' Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	drm/radeon/kms: properly set panel mode for eDP	Alex Deucher
	commit 00dfb8df5bf8c3afe4c0bb8361133156b06b7a2c upstream. This should make eDP more reliable. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	thp: share get_huge_page_tail()	Andrea Arcangeli
	commit b35a35b556f5e6b7993ad0baf20173e75c09ce8c upstream. This avoids duplicating the function in every arch gup_fast. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	ext2,ext3,ext4: don't inherit APPEND_FL or IMMUTABLE_FL for new inodes	Theodore Ts'o
	commit 1cd9f0976aa4606db8d6e3dc3edd0aca8019372a upstream. This doesn't make much sense, and it exposes a bug in the kernel where attempts to create a new file in an append-only directory using O_CREAT will fail (but still leave a zero-length file). This was discovered when xfstests #79 was generalized so it could run on all file systems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	mm: thp: tail page refcounting fix	Andrea Arcangeli
	commit 70b50f94f1644e2aa7cb374819cfd93f3c28d725 upstream. Michel, while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that use get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. By contrast, other get_user_page users, such as the secondary-MMU page fault that establishes the shadow pagetables, never call any superfluous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. 
The direct-io paths are usually I/O bound and the compound_lock is per THP so very fine-grained, so there's no risk of scalability issues with it. A simple direct-io benchmark with all lockdep prove-locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount, in addition to avoiding new RCU critical sections, will also allow the working set estimation code to work without any further complexity associated with the tail page refcounting with THP. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
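The accounting scheme in this message (keep the tail page's _count at zero, park tail references in _mapcount, move them across on split) can be modelled with plain integers. The sketch below is a deliberately simplified, non-atomic, lock-free model: the struct and function names are hypothetical, and the real __split_huge_page_refcount() does more bookkeeping than shown.

```c
#include <stdbool.h>

/* Hypothetical, non-atomic model of the two counters discussed above. */
struct page_model {
    int count;      /* models _count: stays 0 for tail pages */
    int mapcount;   /* models _mapcount: reused to hold tail-page references */
};

/* Speculative reference: succeeds only if count is already non-zero. */
bool get_page_unless_zero_model(struct page_model *p)
{
    if (p->count == 0)
        return false;
    p->count++;
    return true;
}

/* Pin a tail page: account on the head's count and the tail's mapcount. */
void get_tail_page_model(struct page_model *head, struct page_model *tail)
{
    head->count++;
    tail->mapcount++;
}

/* On split, move the references parked in mapcount over to the tail's count. */
void split_tail_model(struct page_model *head, struct page_model *tail)
{
    head->count -= tail->mapcount;
    tail->count += tail->mapcount;
    tail->mapcount = 0;
}
```

Because the tail's count never leaves zero while the page is part of a compound page, a speculative lookup on it always fails, which is the guarantee the commit is after.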
2011-11-11	readlinkat: ensure we return ENOENT for the empty pathname for normal lookups	Andy Whitcroft
	commit 1fa1e7f615f4d3ae436fa319af6e4eebdd4026a8 upstream. Since the commit below which added O_PATH support to the *at() calls, the error return for readlink/readlinkat for the empty pathname has switched from ENOENT to EINVAL: commit 65cfc6722361570bfe255698d9cd4dccaf47570d Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sun Mar 13 15:56:26 2011 -0400 readlinkat(), fchownat() and fstatat() with empty relative pathnames This is both unexpected for userspace and makes readlink/readlinkat inconsistent with all other interfaces; and inconsistent with our stated return for these pathnames. As the readlinkat call does not have a flags parameter we cannot use the AT_EMPTY_PATH approach used in the other calls. Therefore expose whether the original path is in fact empty via a new user_path_at_empty() path lookup function. Use this to determine whether to default to EINVAL or ENOENT for failures. Addresses http://bugs.launchpad.net/bugs/817187 [akpm@linux-foundation.org: remove unused getname_flags()] Signed-off-by: Andy Whitcroft <apw@canonical.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
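The user-visible behaviour this fix restores can be probed from user space with a small helper like the following. The helper name is hypothetical; on a kernel carrying this fix it is expected to report ENOENT, and on an affected kernel EINVAL.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Call readlink() with an empty pathname and report the errno it sets.
 * Returns 0 if the call unexpectedly succeeds.
 */
int readlink_empty_errno(void)
{
    char buf[16];

    errno = 0;
    if (readlink("", buf, sizeof(buf)) >= 0)
        return 0;
    return errno;
}
```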
2011-11-11	mm: avoid null pointer access in vm_struct via /proc/vmallocinfo	Mitsuo Hayasaka
	commit f5252e009d5b87071a919221e4f6624184005368 upstream. The /proc/vmallocinfo shows information about vmalloc allocations in vmlist, which is a linked list of vm_struct. It may, however, access the pages field of a vm_struct where a page was not allocated. This results in a null pointer access and leads to a kernel panic. Why this happens: In __vmalloc_node_range() called from vmalloc(), newly allocated vm_struct is added to vmlist at __get_vm_area_node() and then, some fields of vm_struct such as nr_pages and pages are set at __vmalloc_area_node(). In other words, it is added to vmlist before it is fully initialized. At the same time, when the /proc/vmallocinfo is read, it accesses the pages field of vm_struct according to the nr_pages field at show_numa_info(). Thus, a null pointer access happens. The patch adds the newly allocated vm_struct to the vmlist after it is fully initialized. So, it can avoid accessing the pages field with unallocated page when show_numa_info() is called. Signed-off-by: Mitsuo Hayasaka <mitsuo.hayasaka.hu@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Namhyung Kim <namhyung@gmail.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	io-mapping: ensure io_mapping_map_atomic _is_ atomic	Daniel Vetter
	commit 24dd85ff723f142093f44244764b9b5c152235b8 upstream. For the !HAVE_ATOMIC_IOMAP case the stub functions did not call pagefault_disable/_enable. The i915 driver relies on the map actually being atomic, otherwise it can deadlock with its own pagefault handler in the gtt pwrite fastpath. This is exercised by gem_mmap_gtt from the intel-gpu-tools gem testsuite. v2: Chris Wilson noted the lack of an include. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38115 Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	genirq: Add IRQF_RESUME_EARLY and resume such IRQs earlier	Ian Campbell
	commit 9bab0b7fbaceec47d32db51cd9e59c82fb071f5a upstream. This adds a mechanism to resume selected IRQs during syscore_resume instead of dpm_resume_noirq. Under Xen we need to resume IRQs associated with IPIs early enough that the resched IPI is unmasked and we can therefore schedule ourselves out of the stop_machine where the suspend/resume takes place. This issue was introduced by 676dc3cf5bc3 "xen: Use IRQF_FORCE_RESUME". Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Jeremy Fitzhardinge <Jeremy.Fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Link: http://lkml.kernel.org/r/1318713254.11016.52.camel@dagon.hellion.org.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	time: Change jiffies_to_clock_t() argument type to unsigned long	hank
	commit cbbc719fccdb8cbd87350a05c0d33167c9b79365 upstream. The parameter's original type is long. On an i386 architecture, it can easily be larger than 0x80000000, causing this function to convert it to a sign-extended u64 type. Change the type to unsigned long so we get the correct result. Signed-off-by: hank <pyu@redhat.com> Cc: John Stultz <john.stultz@linaro.org> [ build fix ] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
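The sign-extension problem is easy to reproduce in isolation. The sketch below uses fixed-width types to stand in for a 32-bit long and unsigned long on i386; the function names are hypothetical and the real jiffies_to_clock_t() arithmetic is not reproduced.

```c
#include <stdint.h>

/* Widen a 32-bit signed value, as happens to a long on i386. */
uint64_t widen_signed(int32_t x)
{
    return (uint64_t)x;     /* sign-extends negative values */
}

/* Widen a 32-bit unsigned value, the behaviour after the type change. */
uint64_t widen_unsigned(uint32_t x)
{
    return (uint64_t)x;     /* zero-extends */
}
```

A bit pattern of 0x80000000 therefore becomes 0xFFFFFFFF80000000 through the signed path but stays 0x80000000 through the unsigned one, while small values are unaffected.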
2011-11-11	net: hold sock reference while processing tx timestamps	Richard Cochran
	commit da92b194cc36b5dc1fbd85206aeeffd80bee0c39 upstream. The pair of functions, * skb_clone_tx_timestamp() * skb_complete_tx_timestamp() were designed to allow timestamping in PHY devices. The first function, called during the MAC driver's hard_xmit method, identifies PTP protocol packets, clones them, and gives them to the PHY device driver. The PHY driver may hold onto the packet and deliver it at a later time using the second function, which adds the packet to the socket's error queue. As pointed out by Johannes, nothing prevents the socket from disappearing while the cloned packet is sitting in the PHY driver awaiting a timestamp. This patch fixes the issue by taking a reference on the socket for each such packet. In addition, the comments regarding the usage of these functions are expanded to highlight the rule that PHY drivers must use skb_complete_tx_timestamp() to release the packet, in order to release the socket reference, too. These functions first appeared in v2.6.36. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	SUNRPC/NFS: make rpc pipe upcall generic	Peng Tao
	commit c1225158a8dad9e9d5eee8a17dbbd9c7cda05ab9 upstream. The same function is used by idmap, gss and blocklayout code. Make it generic. Signed-off-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	USB: fix ehci alignment error	Harro Haan
	commit 276532ba9666b36974cbe16f303fc8be99c9da17 upstream. The Kirkwood gave an unaligned memory access error on line 742 of drivers/usb/host/ehci-hcd.c: "ehci->last_periodic_enable = ktime_get_real();" Signed-off-by: Harro Haan <hrhaan@gmail.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-11-11	TTY: make tty_add_file non-failing	Jiri Slaby
	commit fa90e1c935472281de314e6d7c9a37db9cbc2e4e upstream. If tty_add_file fails at the point where it is called now, we have to revert all the changes we made to the tty. That means either decreasing all refcounts if this was a tty reopen, or deleting the tty if it was newly allocated. An attempt to fix this was made in v3.0-rc2 using tty_release in 0259894c7 (TTY: fix fail path in tty_open), but it introduced a NULL dereference instead. That is because tty_release dereferences filp->private_data, which is itself set in tty_add_file. When tty_add_file fails, it is still NULL/garbage. Hence tty_release cannot be called there. To circumvent the original leak (and the current NULL deref) we split tty_add_file into two functions, making the latter non-failing. We can then do the former early in open, where handling failures is easy. The latter stays as it is now. So there is no change in functionality. The original bug (leak) was introduced by f573bd176 (tty: Remove __GFP_NOFAIL from tty_add_file()). Thanks Dan for reporting this. Later, we may split tty_release into more functions and call only some of them in this fail path instead. (If at all possible.) Introduced-in: v2.6.37-rc2 Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
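The split described in this commit (a step that may fail, done early, followed by a step that cannot fail) can be sketched as follows. All names and structures here are hypothetical illustrations and do not reproduce the kernel code.

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the per-file tty data. */
struct demo_priv { int tty_id; struct demo_priv *next; };
struct demo_file { struct demo_priv *private_data; };

/* Step 1: the only step that can fail. It runs early in open, before
 * any tty state has been touched, so the error path has nothing to undo. */
int demo_alloc_file(struct demo_file *f)
{
    struct demo_priv *p = malloc(sizeof(*p));

    if (!p)
        return -1;
    p->tty_id = -1;
    p->next = NULL;
    f->private_data = p;
    return 0;
}

/* Step 2: cannot fail. It only links the preallocated entry into the
 * list, so it is safe to call after the tty has been set up. */
void demo_add_file(int tty_id, struct demo_file *f, struct demo_priv **list)
{
    f->private_data->tty_id = tty_id;
    f->private_data->next = *list;
    *list = f->private_data;
}
```

Because the allocation is moved ahead of every irreversible change, the late step has no failure path that would require undoing refcounts or freeing a newly created tty.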
2011-10-24	Merge branch 'devicetree/arm-linaro-3.1' of git://git.secretlab.ca/git/linux-2.6 into linaro-3.1	Nicolas Pitre
2011-10-24	Merge commit 'v3.1' into linaro-3.1	Nicolas Pitre

2011-10-24	dt: Add id to AUXDATA structure	John Bonesio
	This patch adds the ability to set the device id in the AUXDATA structure for those few device drivers that just have to have a statically defined device id. Signed-off-by: John Bonesio <bones@secretlab.ca> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-24	Merge commit 'v3.1' into devicetree/next	Grant Likely

2011-10-24	dt: Add empty of_match_node() macro	Nicolas Ferre
	Add an empty macro for of_match_node() that will save some '#ifdef CONFIG_OF' for non-dt builds. I have chosen to use a macro instead of a function to be able to avoid defining the first parameter. In fact, this "struct of_device_id *" first parameter is usually not defined either on non-dt builds. Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com> Acked-by: Grant Likely <grant.likely@secretlab.ca>
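The reason a macro works here where a function would not can be shown with a small sketch. Everything prefixed `demo_` or `DEMO_` is hypothetical: DEMO_CONFIG_OF stands in for CONFIG_OF, and the matching logic is a simplified assumption, not the kernel's of_match_node().

```c
#include <stddef.h>
#include <string.h>

struct demo_id { const char *compatible; };

#ifdef DEMO_CONFIG_OF   /* hypothetical stand-in for CONFIG_OF */
static const struct demo_id demo_ids[] = { { "vendor,uart" }, { NULL } };

static const struct demo_id *demo_match_node(const struct demo_id *m,
                                             const char *compat)
{
    for (; m->compatible; m++)
        if (strcmp(m->compatible, compat) == 0)
            return m;
    return NULL;
}
#else
/* Empty macro: its arguments are discarded by the preprocessor, so the
 * match table passed as the first argument does not need to exist. */
#define demo_match_node(matches, compat) NULL
#endif

/* Caller code is identical in both builds and needs no #ifdef of its own.
 * Without DEMO_CONFIG_OF, demo_ids is never declared, yet this compiles. */
int demo_probe(const char *compat)
{
    (void)compat;
    return demo_match_node(demo_ids, compat) != NULL;
}
```

A stub function would still require `demo_ids` to be declared at the call site, whereas the macro removes the identifier before the compiler ever sees it.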
2011-10-17	Merge branch 'nf' of git://1984.lsi.us.es/net	David S. Miller

2011-10-17	udplite: fast-path computation of checksum coverage	Gerrit Renker
	Commit 903ab86d195cca295379699299c5fc10beba31c7 of 1 March this year ("udp: Add lockless transmit path") introduced a new fast TX path that broke the checksum coverage computation of UDP-lite, which so far depended on up->len (only set if the socket is locked and 0 in the fast path). Fixed by providing both fast- and slow-path computation of checksum coverage. The latter can be removed when UDP(-lite)v6 also uses a lockless transmit path. Reported-by: Thomas Volkert <thomas@homer-conferencing.com> Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-17	Merge remote-tracking branch 'rmk/devel-stable' into linaro-3.1	Nicolas Pitre

2011-10-13	dt: add empty dt helpers for non-dt build	Rajendra Nayak
	Add empty of_device_is_compatible() and of_parse_phandle() for non-dt builds to work. Signed-off-by: Rajendra Nayak <rnayak@ti.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2011-10-12	IPVS netns shutdown/startup dead-lock	Hans Schillstrom
	__ip_vs_mutex is used by both the netns shutdown code and the startup code, and both implicitly use the sk_lock-AF_INET mutex. The cleanup path on CPU-1, ip_vs_dst_event(), takes sk_lock-AF_INET and then __ip_vs_mutex, while the startup path on CPU-2, ip_vs_genl_set_cmd(), takes __ip_vs_mutex and then sk_lock-AF_INET: * DEAD LOCK * A new mutex called sync_mutex is added to the ip_vs netns struct. Comments from Julian and Simon added. This patch has been running for more than 3 months now and it seems to work. Ver. 3: IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl is protected by sync_mutex instead of __ip_vs_mutex, as suggested by Julian. Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
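The lock-order inversion in this commit can be modelled with a tiny order tracker. This is a hypothetical sketch, not kernel code: it only detects direct two-lock inversions, and the post-patch order used in the usage note (sync_mutex taken before sk_lock-AF_INET on the startup path) is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical labels for the locks named in the commit message. */
enum demo_lock { DEMO_SK_LOCK, DEMO_IP_VS_MUTEX, DEMO_SYNC_MUTEX, DEMO_NLOCKS };

/* order[a][b] is true once some path has taken b while holding a. */
struct demo_graph { bool order[DEMO_NLOCKS][DEMO_NLOCKS]; };

/* Record that a path takes `first` and then `second`. Returns false if
 * the reverse order was recorded earlier, i.e. an AB-BA inversion that
 * can deadlock when the two paths run concurrently. */
bool demo_record(struct demo_graph *g, enum demo_lock first,
                 enum demo_lock second)
{
    if (g->order[second][first])
        return false;
    g->order[first][second] = true;
    return true;
}
```

Recording the cleanup order (sk_lock, then __ip_vs_mutex) followed by the old startup order (__ip_vs_mutex, then sk_lock) is reported as an inversion. Recording the cleanup order followed by the assumed new startup order (sync_mutex, then sk_lock) is not, because the two paths no longer share a pair of locks taken in opposite order.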