aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)Author
2015-06-29Merge tag 'v3.18.16' of ↵Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.18-rt Linux 3.18.16 * tag 'v3.18.16' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (394 commits) Linux 3.18.16 arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug md/raid0: fix restore to sector variable in raid0_make_request Linux 3.18.15 ARM: OMAP3: Fix booting with thumb2 kernel xfrm: release dst_orig in case of error in xfrm_lookup() ARC: unbork !LLSC build power/reset: at91: fix return value check in at91_reset_platform_probe() vfs: read file_handle only once in handle_to_path drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling" drm/radeon: don't share plls if monitors differ in audio support drm/radeon: retry dcpd fetch drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling drm/radeon: add new bonaire pci id iwlwifi: pcie: prevent using unmapped memory in fw monitor ACPI / init: Fix the ordering of acpi_reserve_resources() sd: Disable support for 256 byte/sector disks storvsc: Set the SRB flags correctly when no data transfer is needed rtlwifi: rtl8192cu: Fix kernel deadlock md/raid5: don't record new size if resize_stripes fails. ...
2015-06-10xfrm: release dst_orig in case of error in xfrm_lookup()huaibin Wang
[ Upstream commit ac37e2515c1a89c477459a2020b6bfdedabdb91b ] dst_orig should be released on error. Function like __xfrm_route_forward() expects that behavior. Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(), which expects the opposite. Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be done in case of error. Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10drm/radeon: add new bonaire pci idAlex Deucher
[ Upstream commit fcf3b54282e4c5a95a1f45f67558bc105acdbc6a ] Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10sched: Handle priority boosted tasks proper in setscheduler()Thomas Gleixner
[ Upstream commit 0782e63bc6fe7e2d3408d250df11d388b7799c6b ] Ronny reported that the following scenario is not handled correctly: T1 (prio = 10) lock(rtmutex); T2 (prio = 20) lock(rtmutex) boost T1 T1 (prio = 20) sys_set_scheduler(prio = 30) T1 prio = 30 .... sys_set_scheduler(prio = 10) T1 prio = 30 The last step is wrong as T1 should now be back at prio 20. Commit c365c292d059 ("sched: Consider pi boosting in setscheduler()") only handles the case where a boosted tasks tries to lower its priority. Fix it by taking the new effective priority into account for the decision whether a change of the priority is required. Reported-by: Ronny Meeus <ronny.meeus@gmail.com> Tested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Fixes: c365c292d059 ("sched: Consider pi boosting in setscheduler()") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1505051806060.4225@nanos Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10libata: Ignore spurious PHY event on LPM policy changeGabriele Mazzotta
[ Upstream commit 09c5b4803a80a5451d950d6a539d2eb311dc0fb1 ] When the LPM policy is set to ATA_LPM_MAX_POWER, the device might generate a spurious PHY event that cuases errors on the link. Ignore this event if it occured within 10s after the policy change. The timeout was chosen observing that on a Dell XPS13 9333 these spurious events can occur up to roughly 6s after the policy change. Link: http://lkml.kernel.org/g/3352987.ugV1Ipy7Z5@xps13 Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-10libata: Add helper to determine when PHY events should be ignoredGabriele Mazzotta
[ Upstream commit 8393b811f38acdf7fd8da2028708edad3e68ce1f ] This is a preparation commit that will allow to add other criteria according to which PHY events should be dropped. Signed-off-by: Gabriele Mazzotta <gabriele.mzt@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09xen/events: don't bind non-percpu VIRQs with percpu chipDavid Vrabel
[ Upstream commit 77bb3dfdc0d554befad58fdefbc41be5bc3ed38a ] A non-percpu VIRQ (e.g., VIRQ_CONSOLE) may be freed on a different VCPU than it is bound to. This can result in a race between handle_percpu_irq() and removing the action in __free_irq() because handle_percpu_irq() does not take desc->lock. The interrupt handler sees a NULL action and oopses. Only use the percpu chip/handler for per-CPU VIRQs (like VIRQ_TIMER). # cat /proc/interrupts | grep virq 40: 87246 0 xen-percpu-virq timer0 44: 0 0 xen-percpu-virq debug0 47: 0 20995 xen-percpu-virq timer1 51: 0 0 xen-percpu-virq debug1 69: 0 0 xen-dyn-virq xen-pcpu 74: 0 0 xen-dyn-virq mce 75: 29 0 xen-dyn-virq hvc_console Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ktime: Fix ktime_divns to do signed divisionJohn Stultz
[ Upstream commit f7bcb70ebae0dcdb5a2d859b09e4465784d99029 ] It was noted that the 32bit implementation of ktime_divns() was doing unsigned division and didn't properly handle negative values. And when a ktime helper was changed to utilize ktime_divns, it caused a regression on some IR blasters. See the following bugzilla for details: https://bugzilla.redhat.com/show_bug.cgi?id=1200353 This patch fixes the problem in ktime_divns by checking and preserving the sign bit, and then reapplying it if appropriate after the division, it also changes the return type to a s64 to make it more obvious this is expected. Nicolas also pointed out that negative dividers would cause infinite loops on 32bit systems, negative dividers is unlikely for users of this function, but out of caution this patch adds checks for negative dividers for both 32-bit (BUG_ON) and 64-bit(WARN_ON) versions to make sure no such use cases creep in. [ tglx: Hand an u64 to do_div() to avoid the compiler warning ] Fixes: 166afb64511e 'ktime: Sanitize ktime_to_us/ms conversion' Reported-and-tested-by: Trevor Cordes <trevor@tecnopolis.ca> Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1431118043-23452-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ktime: Optimize ktime_divns for constant divisorsNicolas Pitre
[ Upstream commit 8b618628b2bf83512fc8df5e8672619d65adfdfb ] At least on ARM, do_div() is optimized to turn constant divisors into an inline multiplication by the reciprocal value at compile time. However this optimization is missed entirely whenever ktime_divns() is used and the slow out-of-line division code is used all the time. Let ktime_divns() use do_div() inline whenever the divisor is constant and small enough. This will make things like ktime_to_us() and ktime_to_ms() much faster. Cc: Arnd Bergmann <arnd.bergmann@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-06-09ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.Lv Zheng
[ Upstream commit f254e3c57b9d952e987502aefa0804c177dd2503 ] ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3 OSPMs like Linux expect an acpi_physical_address returning value from acpi_find_root_pointer(). This triggers warnings if sizeof (acpi_size) doesn't equal to sizeof (acpi_physical_address): drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default] In file included from include/acpi/acpi.h:64:0, from include/linux/acpi.h:36, from drivers/acpi/osl.c:41: include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *' This patch corrects acpi_find_root_pointer(). Link: https://github.com/acpica/acpica/commit/7d9fd643 Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-23nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()Ryusuke Konishi
[ Upstream commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 ] The range check for b-tree level parameter in nilfs_btree_root_broken() is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even though the level is limited to values in the range of 0 to (NILFS_BTREE_LEVEL_MAX - 1). Since the level parameter is read from storage device and used to index nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it can cause memory overrun during btree operations if the boundary value is set to the level parameter on device. This fixes the broken sanity check and adds a comment to clarify that the upper bound NILFS_BTREE_LEVEL_MAX is exclusive. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17uas: Add US_FL_MAX_SECTORS_240 flagHans de Goede
[ Upstream commit ee136af4a064c2f61e2025873584d2c7ec93f4ae ] The usb-storage driver sets max_sectors = 240 in its scsi-host template, for uas we do not want to do that for all devices, but testing has shown that some devices need it. This commit adds a US_FL_MAX_SECTORS_240 flag for such devices, and implements support for it in uas.c, while at it it also adds support for US_FL_MAX_SECTORS_64 to uas.c. Cc: stable@vger.kernel.org # 3.16 Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLECharles Keepax
[ Upstream commit a2d97723cb3a7741af81868427b36bba274b681b ] Correct small copy and paste error where autodisable was not being enabled for the SOC_DAPM_SINGLE_TLV_AUTODISABLE control. Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com> Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ALSA: emu10k1: Emu10k2 32 bit DMA modePeter Zubaj
[ Upstream commit 7241ea558c6715501e777396b5fc312c372e11d9 ] Looks like audigy emu10k2 (probably emu10k1 - sb live too) support two modes for DMA. Second mode is useful for 64 bit os with more then 2 GB of ram (fixes problems with big soundfont loading) 1) 32MB from 2 GB address space using 8192 pages (used now as default) 2) 16MB from 4 GB address space using 4096 pages Mode is set using HCFG_EXPANDED_MEM flag in HCFG register. Also format of emu10k2 page table is then different. Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17mm/hugetlb: take page table lock in follow_huge_pmd()Naoya Horiguchi
[ Upstream commit e66f17ff71772b209eed39de35aaa99ba819c93d ] We have a race condition between move_pages() and freeing hugepages, where move_pages() calls follow_page(FOLL_GET) for hugepages internally and tries to get its refcount without preventing concurrent freeing. This race crashes the kernel, so this patch fixes it by moving FOLL_GET code for hugepages into follow_huge_pmd() with taking the page table lock. This patch intentionally removes page==NULL check after pte_page. This is justified because pte_page() never returns NULL for any architectures or configurations. This patch changes the behavior of follow_huge_pmd() for tail pages and then tail pages can be pinned/returned. So the caller must be changed to properly handle the returned tail pages. We could have a choice to add the similar locking to follow_huge_(addr|pud) for consistency, but it's not necessary because currently these functions don't support FOLL_GET flag, so let's leave it for future development. Here is the reproducer: $ cat movepages.c #include <stdio.h> #include <stdlib.h> #include <numaif.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 #define PS 0x1000 int main(int argc, char *argv[]) { int i; int nr_hp = strtol(argv[1], NULL, 0); int nr_p = nr_hp * HPS / PS; int ret; void **addrs; int *status; int *nodes; pid_t pid; pid = strtol(argv[2], NULL, 0); addrs = malloc(sizeof(char *) * nr_p + 1); status = malloc(sizeof(char *) * nr_p + 1); nodes = malloc(sizeof(char *) * nr_p + 1); while (1) { for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 1; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); for (i = 0; i < nr_p; i++) { addrs[i] = (void *)ADDR_INPUT + i * PS; nodes[i] = 0; status[i] = 0; } ret = numa_move_pages(pid, nr_p, addrs, nodes, status, MPOL_MF_MOVE_ALL); if (ret == -1) err("move_pages"); } return 0; } $ cat hugepage.c #include <stdio.h> #include <sys/mman.h> #include <string.h> #define ADDR_INPUT 0x700000000000UL #define HPS 0x200000 int main(int argc, char *argv[]) { int nr_hp = strtol(argv[1], NULL, 0); char *p; while (1) { p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); if (p != (void *)ADDR_INPUT) { perror("mmap"); break; } memset(p, 0, nr_hp * HPS); munmap(p, nr_hp * HPS); } } $ sysctl vm.nr_hugepages=40 $ ./hugepage 10 & $ ./movepages 10 $(pgrep -f hugepage) Fixes: e632a938d914 ("mm: migrate: add hugepage migration code to move_pages()") Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ebpf: verifier: check that call reg with ARG_ANYTHING is initializedDaniel Borkmann
[ Upstream commit 80f1d68ccba70b1060c9c7360ca83da430f66bed ] I noticed that a helper function with argument type ARG_ANYTHING does not need to have an initialized value (register). This can worst case lead to unintented stack memory leakage in future helper functions if they are not carefully designed, or unintended application behaviour in case the application developer was not careful enough to match a correct helper function signature in the API. The underlying issue is that ARG_ANYTHING should actually be split into two different semantics: 1) ARG_DONTCARE for function arguments that the helper function does not care about (in other words: the default for unused function arguments), and 2) ARG_ANYTHING that is an argument actually being used by a helper function and *guaranteed* to be an initialized register. The current risk is low: ARG_ANYTHING is only used for the 'flags' argument (r4) in bpf_map_update_elem() that internally does strict checking. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17ACPICA: Utilities: split IO address types from data type models.Lv Zheng
[ Upstream commit 2b8760100e1de69b6ff004c986328a82947db4ad ] ACPICA commit aacf863cfffd46338e268b7415f7435cae93b451 It is reported that on a physically 64-bit addressed machine, 32-bit kernel can trigger crashes in accessing the memory regions that are beyond the 32-bit boundary. The region field's start address should still be 32-bit compliant, but after a calculation (adding some offsets), it may exceed the 32-bit boundary. This case is rare and buggy, but there are real BIOSes leaked with such issues (see References below). This patch fixes this gap by always defining IO addresses as 64-bit, and allows OSPMs to optimize it for a real 32-bit machine to reduce the size of the internal objects. Internal acpi_physical_address usages in the structures that can be fixed by this change include: 1. struct acpi_object_region: acpi_physical_address address; 2. struct acpi_address_range: acpi_physical_address start_address; acpi_physical_address end_address; 3. struct acpi_mem_space_context; acpi_physical_address address; 4. struct acpi_table_desc acpi_physical_address address; See known issues 1 for other usages. Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer from same problem, so this patch changes it accordingly. For iasl, it will enforce acpi_physical_address as 32-bit to generate 32-bit OSPM compatible tables on 32-bit platforms, we need to define ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h. Known issues: 1. Cleanup of mapped virtual address In struct acpi_mem_space_context, acpi_physical_address is used as a virtual address: acpi_physical_address mapped_physical_address; It is better to introduce acpi_virtual_address or use acpi_size instead. This patch doesn't make such a change. Because this should be done along with a change to acpi_os_map_memory()/acpi_os_unmap_memory(). There should be no functional problem to leave this unchanged except that only this structure is enlarged unexpectedly. Link: https://github.com/acpica/acpica/commit/aacf863c Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971 Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501 Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net> Reported-and-tested-by: Sial Nije <sialnije@gmail.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17target: Fix COMPARE_AND_WRITE with SG_TO_MEM_NOALLOC handlingNicholas Bellinger
[ Upstream commit c8e639852ad720499912acedfd6b072325fd2807 ] This patch fixes a bug for COMPARE_AND_WRITE handling with fabrics using SCF_PASSTHROUGH_SG_TO_MEM_NOALLOC. It adds the missing allocation for cmd->t_bidi_data_sg within transport_generic_new_cmd() that is used by COMPARE_AND_WRITE for the initial READ payload, even if the fabric is already providing a pre-allocated buffer for cmd->t_data_sg. Also, fix zero-length COMPARE_AND_WRITE handling within the compare_and_write_callback() and target_complete_ok_work() to queue the response, skipping the initial READ. This fixes COMPARE_AND_WRITE emulation with loopback, vhost, and xen-backend fabric drivers using SG_TO_MEM_NOALLOC. Reported-by: Christoph Hellwig <hch@lst.de> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # v3.12+ Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-17usb: define a generic USB_RESUME_TIMEOUT macroFelipe Balbi
[ Upstream commit 62f0342de1f012f3e90607d39e20fce811391169 ] Every USB Host controller should use this new macro to define for how long resume signalling should be driven on the bus. Currently, almost every single USB controller is using a 20ms timeout for resume signalling. That's problematic for two reasons: a) sometimes that 20ms timer expires a little before 20ms, which makes us fail certification b) some (many) devices actually need more than 20ms resume signalling. Sure, in case of (b) we can state that the device is against the USB spec, but the fact is that we have no control over which device the certification lab will use. We also have no control over which host they will use. Most likely they'll be using a Windows PC which, again, we have no control over how that USB stack is written and how long resume signalling they are using. At the end of the day, we must make sure Linux passes electrical compliance when working as Host or as Device and currently we don't pass compliance as host because we're driving resume signallig for exactly 20ms and that confuses certification test setup resulting in Certification failure. Cc: <stable@vger.kernel.org> # v3.10+ Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Peter Chen <peter.chen@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-15Merge remote-tracking branch 'origin/linux-linaro-lsk-v3.18' into ↵Kevin Hilman
linux-linaro-lsk-v3.18-rt
2015-05-13Merge tag 'v3.18.11-rt7-lno1' of ↵Kevin Hilman
git://git.linaro.org/people/anders.roxell/linux-rt into linux-linaro-lsk-v3.18-rt Linux 3.18.11-rt7 * tag 'v3.18.11-rt7-lno1' of git://git.linaro.org/people/anders.roxell/linux-rt: (2276 commits) localversion.patch workqueue: Prevent deadlock/stall on RT sched: Do not clear PF_NO_SETAFFINITY flag in select_fallback_rq() md: disable bcache rt,ntp: Move call to schedule_delayed_work() to helper thread scheduling while atomic in cgroup code cgroups: use simple wait in css_release() a few open coded completions completion: Use simple wait queues rcu-more-swait-conversions.patch kernel/treercu: use a simple waitqueue work-simple: Simple work queue implemenation simple-wait: rename and export the equivalent of waitqueue_active() wait-simple: Rework for use with completions wait-simple: Simple waitqueue implementation wait.h: include atomic.h drm/i915: drop trace_i915_gem_ring_dispatch on rt gpu/i915: don't open code these things mmc: sdhci: don't provide hard irq handler mmci: Remove bogus local_irq_save() ...
2015-05-12Merge branch 'v3.18/topic/coresight' into linux-linaro-lsk-v3.18Kevin Hilman
* v3.18/topic/coresight: (42 commits) coresight: moving to new "hwtracing" directory coresight-tmc: Adding a status interface to sysfs coresight: remove the unnecessary configuration coresight-default-sink coresight: adding the LINKSINK block as a sink type coresight: Correcting documentation typographical error coresight: Adding coresight support for arm64 architecture coresight: fixing compilation warnings picked up by 64bit compiler coresight: making cpu index lookup arm64 compliant coresight: fix function etm_writel_cp14() parameter order coresight-etm: remove check for unknown Kconfig macro coresight: fixing CPU hwid lookup in device tree coresight: remove the unnecessary function coresight_is_bit_set() coresight: fix the debug AMBA bus name coresight: remove the extra spaces coresight: fix the link between orphan connection and newly added device coresight: remove the unnecessary replicator property coresight: fix the replicator subtype value coresight: fixing validity check on remote device coresight: fix comment in of_coresight.c coresight: Fixing wrong #ifdef/#endif placement ...
2015-05-11net: fix crash in build_skb()Eric Dumazet
[ Upstream commit 2ea2f62c8bda242433809c7f4e9eae1c52c40bbe ] When I added pfmemalloc support in build_skb(), I forgot netlink was using build_skb() with a vmalloc() area. In this patch I introduce __build_skb() for netlink use, and build_skb() is a wrapper handling both skb->head_frag and skb->pfmemalloc This means netlink no longer has to hack skb->head_frag [ 1567.700067] kernel BUG at arch/x86/mm/physaddr.c:26! [ 1567.700067] invalid opcode: 0000 [#1] PREEMPT SMP KASAN [ 1567.700067] Dumping ftrace buffer: [ 1567.700067] (ftrace buffer empty) [ 1567.700067] Modules linked in: [ 1567.700067] CPU: 9 PID: 16186 Comm: trinity-c182 Not tainted 4.0.0-next-20150424-sasha-00037-g4796e21 #2167 [ 1567.700067] task: ffff880127efb000 ti: ffff880246770000 task.ti: ffff880246770000 [ 1567.700067] RIP: __phys_addr (arch/x86/mm/physaddr.c:26 (discriminator 3)) [ 1567.700067] RSP: 0018:ffff8802467779d8 EFLAGS: 00010202 [ 1567.700067] RAX: 000041000ed8e000 RBX: ffffc9008ed8e000 RCX: 000000000000002c [ 1567.700067] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffffb3fd6049 [ 1567.700067] RBP: ffff8802467779f8 R08: 0000000000000019 R09: ffff8801d0168000 [ 1567.700067] R10: ffff8801d01680c7 R11: ffffed003a02d019 R12: ffffc9000ed8e000 [ 1567.700067] R13: 0000000000000f40 R14: 0000000000001180 R15: ffffc9000ed8e000 [ 1567.700067] FS: 00007f2a7da3f700(0000) GS:ffff8801d1000000(0000) knlGS:0000000000000000 [ 1567.700067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1567.700067] CR2: 0000000000738308 CR3: 000000022e329000 CR4: 00000000000007e0 [ 1567.700067] Stack: [ 1567.700067] ffffc9000ed8e000 ffff8801d0168000 ffffc9000ed8e000 ffff8801d0168000 [ 1567.700067] ffff880246777a28 ffffffffad7c0a21 0000000000001080 ffff880246777c08 [ 1567.700067] ffff88060d302e68 ffff880246777b58 ffff880246777b88 ffffffffad9a6821 [ 1567.700067] Call Trace: [ 1567.700067] build_skb (include/linux/mm.h:508 net/core/skbuff.c:316) [ 1567.700067] netlink_sendmsg (net/netlink/af_netlink.c:1633 net/netlink/af_netlink.c:2329) [ 1567.774369] ? sched_clock_cpu (kernel/sched/clock.c:311) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] ? netlink_unicast (net/netlink/af_netlink.c:2273) [ 1567.774369] sock_sendmsg (net/socket.c:614 net/socket.c:623) [ 1567.774369] sock_write_iter (net/socket.c:823) [ 1567.774369] ? sock_sendmsg (net/socket.c:806) [ 1567.774369] __vfs_write (fs/read_write.c:479 fs/read_write.c:491) [ 1567.774369] ? get_lock_stats (kernel/locking/lockdep.c:249) [ 1567.774369] ? default_llseek (fs/read_write.c:487) [ 1567.774369] ? vtime_account_user (kernel/sched/cputime.c:701) [ 1567.774369] ? rw_verify_area (fs/read_write.c:406 (discriminator 4)) [ 1567.774369] vfs_write (fs/read_write.c:539) [ 1567.774369] SyS_write (fs/read_write.c:586 fs/read_write.c:577) [ 1567.774369] ? SyS_read (fs/read_write.c:577) [ 1567.774369] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63) [ 1567.774369] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2594 kernel/locking/lockdep.c:2636) [ 1567.774369] ? trace_hardirqs_on_thunk (arch/x86/lib/thunk_64.S:42) [ 1567.774369] system_call_fastpath (arch/x86/kernel/entry_64.S:261) Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11net: add skb_checksum_complete_unsetTom Herbert
[ Upstream commit 4e18b9adf2f910ec4d30b811a74a5b626e6c6125 ] This function changes ip_summed to CHECKSUM_NONE if CHECKSUM_COMPLETE is set. This is called to discard checksum-complete when packet is being modified and checksum is not pulled for headers in a layer. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Keep elrsr/aisr in sync with software modelChristoffer Dall
commit ae705930fca6322600690df9dc1c7d0516145a93 upstream. There is an interesting bug in the vgic code, which manifests itself when the KVM run loop has a signal pending or needs a vmid generation rollover after having disabled interrupts but before actually switching to the guest. In this case, we flush the vgic as usual, but we sync back the vgic state and exit to userspace before entering the guest. The consequence is that we will be syncing the list registers back to the software model using the GICH_ELRSR and GICH_EISR from the last execution of the guest, potentially overwriting a list register containing an interrupt. This showed up during migration testing where we would capture a state where the VM has masked the arch timer but there were no interrupts, resulting in a hung test. Cc: Marc Zyngier <marc.zyngier@arm.com> Reported-by: Alex Bennee <alex.bennee@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: Require in-kernel vgic for the arch timersChristoffer Dall
commit 05971120fca43e0357789a14b3386bb56eef2201 upstream. It is curently possible to run a VM with architected timers support without creating an in-kernel VGIC, which will result in interrupts from the virtual timer going nowhere. To address this issue, move the architected timers initialization to the time when we run a VCPU for the first time, and then only initialize (and enable) the architected timers if we have a properly created and initialized in-kernel VGIC. When injecting interrupts from the virtual timer to the vgic, the current setup should ensure that this never calls an on-demand init of the VGIC, which is the only call path that could return an error from kvm_vgic_inject_irq(), so capture the return value and raise a warning if there's an error there. We also change the kvm_timer_init() function from returning an int to be a void function, since the function always succeeds. Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11arm/arm64: KVM: vgic: move reset initialization into vgic_init_maps()Peter Maydell
commit 6d3cfbe21bef5b66530b50ad16c88fdc71a04c35 upstream. VGIC initialization currently happens in three phases: (1) kvm_vgic_create() (triggered by userspace GIC creation) (2) vgic_init_maps() (triggered by userspace GIC register read/write requests, or from kvm_vgic_init() if not already run) (3) kvm_vgic_init() (triggered by first VM run) We were doing initialization of some state to correspond with the state of a freshly-reset GIC in kvm_vgic_init(); this is too late, since it will overwrite changes made by userspace using the register access APIs before the VM is run. Move this initialization earlier, into the vgic_init_maps() phase. This fixes a bug where QEMU could successfully restore a saved VM state snapshot into a VM that had already been run, but could not restore it "from cold" using the -loadvm command line option (the symptoms being that the restored VM would run but interrupts were ignored). Finally rename vgic_init_maps to vgic_init and renamed kvm_vgic_init to kvm_vgic_map_resources. [ This patch is originally written by Peter Maydell, but I have modified it somewhat heavily, renaming various bits and moving code around. If something is broken, I am to be blamed. - Christoffer ] Acked-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-11kvm: add a memslot flag for incoherent memory regionsArd Biesheuvel
commit 1050dcda3052912984b26fb6d2695a3f41792000 upstream. Memory regions may be incoherent with the caches, typically when the guest has mapped a host system RAM backed memory region as uncached. Add a flag KVM_MEMSLOT_INCOHERENT so that we can tag these memslots and handle them appropriately when mapping them. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
2015-05-08cgroups: use simple wait in css_release()Sebastian Andrzej Siewior
To avoid: |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 |in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11 |2 locks held by rcuc/11/92: | #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940 | #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0 |Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 |CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1 | ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000 | 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000 | ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80 |Call Trace: | [<ffffffff815b3a12>] dump_stack+0x4f/0x7c | [<ffffffff8109aa16>] __might_sleep+0x116/0x190 | [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60 | [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0 | [<ffffffff8110c881>] css_release+0x81/0x90 | [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0 | [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 | [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940 | [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0 | [<ffffffff810948d8>] kthread+0xe8/0x100 | [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0 Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08completion: Use simple wait queuesThomas Gleixner
Completions have no long lasting callbacks and therefor do not need the complex waitqueue variant. Use simple waitqueues which reduces the contention on the waitqueue lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08work-simple: Simple work queue implemenationDaniel Wagner
Provides a framework for enqueuing callbacks from irq context PREEMPT_RT_FULL safe. The callbacks are executed in kthread context. Bases on wait-simple. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08simple-wait: rename and export the equivalent of waitqueue_active()Paul Gortmaker
The function "swait_head_has_waiters()" was internalized into wait-simple.c but it parallels the waitqueue_active of normal waitqueue support. Given that there are over 150 waitqueue_active users in drivers/ fs/ kernel/ and the like, lets make it globally visible, and rename it to parallel the waitqueue_active accordingly. We'll need to do this if we expect to expand its usage beyond RT. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08wait-simple: Rework for use with completionsThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08wait-simple: Simple waitqueue implementationThomas Gleixner
wait_queue is a swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> smp_mb() added by Steven Rostedt to fix a race condition with swait wakeups vs adding items to the list. Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08wait.h: include atomic.hSebastian Andrzej Siewior
| CC init/main.o |In file included from include/linux/mmzone.h:9:0, | from include/linux/gfp.h:4, | from include/linux/kmod.h:22, | from include/linux/module.h:13, | from init/main.c:15: |include/linux/wait.h: In function ‘wait_on_atomic_t’: |include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration] | if (atomic_read(val) == 0) | ^ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08sched: Add support for lazy preemptionThomas Gleixner
It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which are right away preempting the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR tasks preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the existing preempt_count each tasks sports now a preempt_lazy_count which is manipulated on lock acquiry and release. This is slightly incorrect as for lazyness reasons I coupled this on migrate_disable/enable so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefor allows to exit the waking task the lock held region before the woken task preempts. That also works better for cross CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. Initial test do not expose any observable latency increasement, but history shows that I've been proven wrong before :) The lazy preemption mode is per default on, but with CONFIG_SCHED_DEBUG enabled it can be disabled via: # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features and reenabled via # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08softirq: Split softirq locksThomas Gleixner
The 3.x RT series removed the split softirq implementation in favour of pushing softirq processing into the context of the thread which raised it. Though this prevents us from handling the various softirqs at different priorities. Now instead of reintroducing the split softirq threads we split the locks which serialize the softirq processing. If a softirq is raised in context of a thread, then the softirq is noted on a per thread field, if the thread is in a bh disabled region. If the softirq is raised from hard interrupt context, then the bit is set in the flag field of ksoftirqd and ksoftirqd is invoked. When a thread leaves a bh disabled region, then it tries to execute the softirqs which have been raised in its own context. It acquires the per softirq / per cpu lock for the softirq and then checks, whether the softirq is still pending in the per cpu local_softirq_pending() field. If yes, it runs the softirq. If no, then some other task executed it already. This allows for zero config softirq elevation in the context of user space tasks or interrupt threads. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08softirq: Make serving softirqs a task flagThomas Gleixner
Avoid the percpu softirq_runner pointer magic by using a task flag. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08softirq: Check preemption after reenabling interruptsThomas Gleixner
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the rasie_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08net: netfilter: Serialize xt_write_recseq sections on RTThomas Gleixner
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing wt_write_recseq sections. RT breaks that and needs explicit serialization here. Reported-by: Peter LaDow <petela@gocougs.wsu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08cpu/rt: Rework cpu down for PREEMPT_RTSteven Rostedt
Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down. Several synchronization methods have been tried, but when stressed they failed. This is a new approach. A sync_tsk thread is still created and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait on it to inform that current tasks that are pinned on the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time. Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete as they are waiting for the locks held by the tasks that are blocked. Thus we still let the task pin the CPU until the notifiers are done. After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per CPU mutex in the hotplug_pcp descriptor. To help things along, a new function in the scheduler code is created called migrate_me(). This function will try to migrate the current task off the CPU this is going down if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases that this wont work, but it helps in most cases. After the notifiers are called and if a task can't migrate off but enters the pin CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway. Also, I found that THREAD_BOUND need to also be accounted for in the pinned CPU, and the migrate_disable no longer treats them special. This helps fix issues with ksoftirqd and workqueue that unbind on CPU down. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08seqlock: consolidate spin_lock/unlock waiting with spin_unlock_waitNicholas Mc Guire
since c2f21ce ("locking: Implement new raw_spinlock") include/linux/spinlock.h includes spin_unlock_wait() to wait for a concurren holder of a lock. this patch just moves over to that API. spin_unlock_wait covers both raw_spinlock_t and spinlock_t so it should be safe here as well. the added rt-variant of read_seqbegin in include/linux/seqlock.h that is being modified, was introduced by patch: seqlock-prevent-rt-starvation.patch behavior should be unchanged. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08seqlock: Prevent rt starvationThomas Gleixner
If a low prio writer gets preempted while holding the seqlock write locked, a high prio reader spins forever on RT. To prevent this let the reader grab the spinlock, so it blocks and eventually boosts the writer. This way the writer can proceed and endless spinning is prevented. For seqcount writers we disable preemption over the update code path. Thaanks to Al Viro for distangling some VFS code to make that possible. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08random: Make it work on rtThomas Gleixner
Delegate the random insertion to the forced threaded interrupt handler. Store the return IP of the hard interrupt handler in the irq descriptor and feed it into the random generator as a source of entropy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08acpi/rt: Convert acpi_gbl_hardware lock back to a raw_spinlock_tSteven Rostedt
We hit the following bug with 3.6-rt: [ 5.898990] BUG: scheduling while atomic: swapper/3/0/0x00000002 [ 5.898991] no locks held by swapper/3/0. [ 5.898993] Modules linked in: [ 5.898996] Pid: 0, comm: swapper/3 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.898997] Call Trace: [ 5.899011] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899028] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899032] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899034] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899036] BUG: scheduling while atomic: swapper/7/0/0x00000002 [ 5.899037] no locks held by swapper/7/0. [ 5.899039] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899040] Modules linked in: [ 5.899041] [ 5.899045] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899046] Pid: 0, comm: swapper/7 Not tainted 3.6.11-rt28.19.el6rt.x86_64.debug #1 [ 5.899047] Call Trace: [ 5.899049] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899052] [<ffffffff810804e7>] __schedule_bug+0x67/0x90 [ 5.899054] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899056] [<ffffffff81577923>] __schedule+0x793/0x7a0 [ 5.899059] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899062] [<ffffffff810b4e40>] ? debug_rt_mutex_print_deadlock+0x50/0x200 [ 5.899068] [<ffffffff8130be64>] acpi_write_bit_register+0x33/0xb0 [ 5.899071] [<ffffffff81577b89>] schedule+0x29/0x70 [ 5.899072] [<ffffffff8130be13>] ? acpi_read_bit_register+0x33/0x51 [ 5.899074] [<ffffffff81578525>] rt_spin_lock_slowlock+0xe5/0x2f0 [ 5.899077] [<ffffffff8131d1fc>] acpi_idle_enter_bm+0x8a/0x28e [ 5.899079] [<ffffffff81579a58>] ? _raw_spin_unlock_irqrestore+0x38/0x90 [ 5.899081] [<ffffffff8107e5da>] ? this_cpu_load+0x1a/0x30 [ 5.899083] [<ffffffff81578bc6>] rt_spin_lock+0x16/0x40 [ 5.899087] [<ffffffff8144c759>] cpuidle_enter+0x19/0x20 [ 5.899088] [<ffffffff8157d3f0>] ? notifier_call_chain+0x80/0x80 [ 5.899090] [<ffffffff8144c777>] cpuidle_enter_state+0x17/0x50 [ 5.899092] [<ffffffff812f2034>] acpi_os_acquire_lock+0x1f/0x23 [ 5.899094] [<ffffffff8144d1a1>] cpuidle899101] [<ffffffff8130be13>] ? As the acpi code disables interrupts in acpi_idle_enter_bm, and calls code that grabs the acpi lock, it causes issues as the lock is currently in RT a sleeping lock. The lock was converted from a raw to a sleeping lock due to some previous issues, and tests that showed it didn't seem to matter. Unfortunately, it did matter for one of our boxes. This patch converts the lock back to a raw lock. I've run this code on a few of my own machines, one being my laptop that uses the acpi quite extensively. I've been able to suspend and resume without issues. [ tglx: Made the change exclusive for acpi_gbl_hardware_lock ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: John Kacur <jkacur@gmail.com> Cc: Clark Williams <clark@redhat.com> Link: http://lkml.kernel.org/r/1360765565.23152.5.camel@gandalf.local.home Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08KVM: use simple waitqueue for vcpu->wqMarcelo Tosatti
The problem: On -RT, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08arm-enable-highmem-for-rt.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08mm, rt: kmap_atomic schedulingPeter Zijlstra
In fact, with migrate_disable() existing one could play games with kmap_atomic. You could save/restore the kmap_atomic slots on context switch (if there are any in use of course), this should be esp easy now that we have a kmap_atomic stack. Something like the below.. it wants replacing all the preempt_disable() stuff with pagefault_disable() && migrate_disable() of course, but then you can flip kmaps around like below. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> [dvhart@linux.intel.com: build fix] Link: http://lkml.kernel.org/r/1311842631.5890.208.camel@twins [tglx@linutronix.de: Get rid of the per cpu variable and store the idx and the pte content right away in the task struct. Shortens the context switch code. ] Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08kgdb/serial: Short term workaroundJason Wessel
On 07/27/2011 04:37 PM, Thomas Gleixner wrote: > - KGDB (not yet disabled) is reportedly unusable on -rt right now due > to missing hacks in the console locking which I dropped on purpose. > To work around this in the short term you can use this patch, in addition to the clocksource watchdog patch that Thomas brewed up. Comments are welcome of course. Ultimately the right solution is to change separation between the console and the HW to have a polled mode + work queue so as not to introduce any kind of latency. Thanks, Jason. Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
2015-05-08net: sysrq via icmpCarsten Emde
There are (probably rare) situations when a system crashed and the system console becomes unresponsive but the network icmp layer still is alive. Wouldn't it be wonderful, if we then could submit a sysreq command via ping? This patch provides this facility. Please consult the updated documentation Documentation/sysrq.txt for details. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>