aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-06-29Merge tag 'v3.18.16-rt13-lno1' of ↵Kevin Hilman
git://git.linaro.org/people/anders.roxell/linux-rt into linux-linaro-lsk-v3.18-rt Linux 3.18.16-rt13 Changes since v3.18.13-rt10: - arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug - md/raid0: fix restore to sector variable in raid0_make_request * tag 'v3.18.16-rt13-lno1' of git://git.linaro.org/people/anders.roxell/linux-rt: (339 commits) Linux 3.18.16-rt13 REBASE workqueue: Prevent deadlock/stall on RT sched: Do not clear PF_NO_SETAFFINITY flag in select_fallback_rq() md: disable bcache rt,ntp: Move call to schedule_delayed_work() to helper thread scheduling while atomic in cgroup code cgroups: use simple wait in css_release() a few open coded completions completion: Use simple wait queues rcu-more-swait-conversions.patch kernel/treercu: use a simple waitqueue work-simple: Simple work queue implemenation simple-wait: rename and export the equivalent of waitqueue_active() wait-simple: Rework for use with completions wait-simple: Simple waitqueue implementation wait.h: include atomic.h drm/i915: drop trace_i915_gem_ring_dispatch on rt gpu/i915: don't open code these things cpufreq: drop K8's driver from beeing selected mmc: sdhci: don't provide hard irq handler ...
2015-06-29Merge tag 'v3.18.16' of ↵Kevin Hilman
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable into linux-linaro-lsk-v3.18-rt Linux 3.18.16 * tag 'v3.18.16' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (394 commits) Linux 3.18.16 arch/x86/kvm/mmu.c: work around gcc-4.4.4 bug md/raid0: fix restore to sector variable in raid0_make_request Linux 3.18.15 ARM: OMAP3: Fix booting with thumb2 kernel xfrm: release dst_orig in case of error in xfrm_lookup() ARC: unbork !LLSC build power/reset: at91: fix return value check in at91_reset_platform_probe() vfs: read file_handle only once in handle_to_path drm/radeon: partially revert "fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling" drm/radeon: don't share plls if monitors differ in audio support drm/radeon: retry dcpd fetch drm/radeon: fix VM_CONTEXT*_PAGE_TABLE_END_ADDR handling drm/radeon: add new bonaire pci id iwlwifi: pcie: prevent using unmapped memory in fw monitor ACPI / init: Fix the ordering of acpi_reserve_resources() sd: Disable support for 256 byte/sector disks storvsc: Set the SRB flags correctly when no data transfer is needed rtlwifi: rtl8192cu: Fix kernel deadlock md/raid5: don't record new size if resize_stripes fails. ...
2015-06-24Merge branch 'v3.18.16-rt13' into v3.18-rtAnders Roxell
2015-06-24Merge branch 'v3.18.13-rt10' into v3.18.16-rt13Anders Roxell
Used the "ours" merge strategy to throw away the previous -rt releases
2015-06-24Linux 3.18.16-rt13 REBASESteven Rostedt (Red Hat)
2015-06-24workqueue: Prevent deadlock/stall on RTThomas Gleixner
Austin reported a XFS deadlock/stall on RT where scheduled work gets never exececuted and tasks are waiting for each other for ever. The underlying problem is the modification of the RT code to the handling of workers which are about to go to sleep. In mainline a worker thread which goes to sleep wakes an idle worker if there is more work to do. This happens from the guts of the schedule() function. On RT this must be outside and the accessed data structures are not protected against scheduling due to the spinlock to rtmutex conversion. So the naive solution to this was to move the code outside of the scheduler and protect the data structures by the pool lock. That approach turned out to be a little naive as we cannot call into that code when the thread blocks on a lock, as it is not allowed to block on two locks in parallel. So we dont call into the worker wakeup magic when the worker is blocked on a lock, which causes the deadlock/stall observed by Austin and Mike. Looking deeper into that worker code it turns out that the only relevant data structure which needs to be protected is the list of idle workers which can be woken up. So the solution is to protect the list manipulation operations with preempt_enable/disable pairs on RT and call unconditionally into the worker code even when the worker is blocked on a lock. The preemption protection is safe as there is nothing which can fiddle with the list outside of thread context. Reported-and_tested-by: Austin Schuh <austin@peloton-tech.com> Reported-and_tested-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://vger.kernel.org/r/alpine.DEB.2.10.1406271249510.5170@nanos Cc: Richard Weinberger <richard.weinberger@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24sched: Do not clear PF_NO_SETAFFINITY flag in select_fallback_rq()Steven Rostedt
I talked with Peter Zijlstra about this, and he told me that the clearing of the PF_NO_SETAFFINITY flag was to deal with the optimization of migrate_disable/enable() that ignores tasks that have that flag set. But that optimization was removed when I did a rework of the cpu hotplug code. I found that ignoring tasks that had that flag set would cause those tasks to not sync with the hotplug code and cause the kernel to crash. Thus it needed to not treat them special and those tasks had to go though the same work as tasks without that flag set. Now that those tasks are not treated special, there's no reason to clear the flag. May still need to be tested as the migrate_me() code does not ignore those flags. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140701111444.0cfebaa1@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-24md: disable bcacheSebastian Andrzej Siewior
It uses anon semaphores |drivers/md/bcache/request.c: In function ‘cached_dev_write_complete’: |drivers/md/bcache/request.c:1007:2: error: implicit declaration of function ‘up_read_non_owner’ [-Werror=implicit-function-declaration] | up_read_non_owner(&dc->writeback_lock); | ^ |drivers/md/bcache/request.c: In function ‘request_write’: |drivers/md/bcache/request.c:1033:2: error: implicit declaration of function ‘down_read_non_owner’ [-Werror=implicit-function-declaration] | down_read_non_owner(&dc->writeback_lock); | ^ either we get rid of those or we have to introduce them… Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rt,ntp: Move call to schedule_delayed_work() to helper threadSteven Rostedt
The ntp code for notify_cmos_timer() is called from a hard interrupt context. schedule_delayed_work() under PREEMPT_RT_FULL calls spinlocks that have been converted to mutexes, thus calling schedule_delayed_work() from interrupt is not safe. Add a helper thread that does the call to schedule_delayed_work and wake up that thread instead of calling schedule_delayed_work() directly. This is only for CONFIG_PREEMPT_RT_FULL, otherwise the code still calls schedule_delayed_work() directly in irq context. Note: There's a few places in the kernel that do this. Perhaps the RT code should have a dedicated thread that does the checks. Just register a notifier on boot up for your check and wake up the thread when needed. This will be a todo. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24scheduling while atomic in cgroup codeMike Galbraith
mm, memcg: make refill_stock() use get_cpu_light() Nikita reported the following memcg scheduling while atomic bug: Call Trace: [e22d5a90] [c0007ea8] show_stack+0x4c/0x168 (unreliable) [e22d5ad0] [c0618c04] __schedule_bug+0x94/0xb0 [e22d5ae0] [c060b9ec] __schedule+0x530/0x550 [e22d5bf0] [c060bacc] schedule+0x30/0xbc [e22d5c00] [c060ca24] rt_spin_lock_slowlock+0x180/0x27c [e22d5c70] [c00b39dc] res_counter_uncharge_until+0x40/0xc4 [e22d5ca0] [c013ca88] drain_stock.isra.20+0x54/0x98 [e22d5cc0] [c01402ac] __mem_cgroup_try_charge+0x2e8/0xbac [e22d5d70] [c01410d4] mem_cgroup_charge_common+0x3c/0x70 [e22d5d90] [c0117284] __do_fault+0x38c/0x510 [e22d5df0] [c011a5f4] handle_pte_fault+0x98/0x858 [e22d5e50] [c060ed08] do_page_fault+0x42c/0x6fc [e22d5f40] [c000f5b4] handle_page_fault+0xc/0x80 What happens: refill_stock() get_cpu_var() drain_stock() res_counter_uncharge() res_counter_uncharge_until() spin_lock() <== boom Fix it by replacing get/put_cpu_var() with get/put_cpu_light(). Cc: stable-rt@vger.kernel.org Reported-by: Nikita Yushchenko <nyushchenko@dev.rtsoft.ru> Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24cgroups: use simple wait in css_release()Sebastian Andrzej Siewior
To avoid: |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 |in_atomic(): 1, irqs_disabled(): 0, pid: 92, name: rcuc/11 |2 locks held by rcuc/11/92: | #0: (rcu_callback){......}, at: [<ffffffff810e037e>] rcu_cpu_kthread+0x3de/0x940 | #1: (rcu_read_lock_sched){......}, at: [<ffffffff81328390>] percpu_ref_call_confirm_rcu+0x0/0xd0 |Preemption disabled at:[<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 |CPU: 11 PID: 92 Comm: rcuc/11 Not tainted 3.18.7-rt0+ #1 | ffff8802398cdf80 ffff880235f0bc28 ffffffff815b3a12 0000000000000000 | 0000000000000000 ffff880235f0bc48 ffffffff8109aa16 0000000000000000 | ffff8802398cdf80 ffff880235f0bc78 ffffffff815b8dd4 000000000000df80 |Call Trace: | [<ffffffff815b3a12>] dump_stack+0x4f/0x7c | [<ffffffff8109aa16>] __might_sleep+0x116/0x190 | [<ffffffff815b8dd4>] rt_spin_lock+0x24/0x60 | [<ffffffff8108d2cd>] queue_work_on+0x6d/0x1d0 | [<ffffffff8110c881>] css_release+0x81/0x90 | [<ffffffff8132844e>] percpu_ref_call_confirm_rcu+0xbe/0xd0 | [<ffffffff813284e2>] percpu_ref_switch_to_atomic_rcu+0x82/0xc0 | [<ffffffff810e03e5>] rcu_cpu_kthread+0x445/0x940 | [<ffffffff81098a2d>] smpboot_thread_fn+0x18d/0x2d0 | [<ffffffff810948d8>] kthread+0xe8/0x100 | [<ffffffff815b9c3c>] ret_from_fork+0x7c/0xb0 Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24a few open coded completionsSebastian Andrzej Siewior
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24completion: Use simple wait queuesThomas Gleixner
Completions have no long lasting callbacks and therefor do not need the complex waitqueue variant. Use simple waitqueues which reduces the contention on the waitqueue lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rcu-more-swait-conversions.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Merged Steven's static void rcu_nocb_gp_cleanup(struct rcu_state *rsp, struct rcu_node *rnp) { - swait_wake(&rnp->nocb_gp_wq[rnp->completed & 0x1]); + wake_up_all(&rnp->nocb_gp_wq[rnp->completed & 0x1]); } Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
2015-06-24kernel/treercu: use a simple waitqueueSebastian Andrzej Siewior
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24work-simple: Simple work queue implemenationDaniel Wagner
Provides a framework for enqueuing callbacks from irq context PREEMPT_RT_FULL safe. The callbacks are executed in kthread context. Bases on wait-simple. Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24simple-wait: rename and export the equivalent of waitqueue_active()Paul Gortmaker
The function "swait_head_has_waiters()" was internalized into wait-simple.c but it parallels the waitqueue_active of normal waitqueue support. Given that there are over 150 waitqueue_active users in drivers/ fs/ kernel/ and the like, lets make it globally visible, and rename it to parallel the waitqueue_active accordingly. We'll need to do this if we expect to expand its usage beyond RT. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24wait-simple: Rework for use with completionsThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24wait-simple: Simple waitqueue implementationThomas Gleixner
wait_queue is a swiss army knife and in most of the cases the complexity is not needed. For RT waitqueues are a constant source of trouble as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks. Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> smp_mb() added by Steven Rostedt to fix a race condition with swait wakeups vs adding items to the list.
2015-06-24wait.h: include atomic.hSebastian Andrzej Siewior
| CC init/main.o |In file included from include/linux/mmzone.h:9:0, | from include/linux/gfp.h:4, | from include/linux/kmod.h:22, | from include/linux/module.h:13, | from init/main.c:15: |include/linux/wait.h: In function ‘wait_on_atomic_t’: |include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration] | if (atomic_read(val) == 0) | ^ Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24drm/i915: drop trace_i915_gem_ring_dispatch on rtSebastian Andrzej Siewior
This tracepoint is responsible for: |[<814cc358>] __schedule_bug+0x4d/0x59 |[<814d24cc>] __schedule+0x88c/0x930 |[<814d3b90>] ? _raw_spin_unlock_irqrestore+0x40/0x50 |[<814d3b95>] ? _raw_spin_unlock_irqrestore+0x45/0x50 |[<810b57b5>] ? task_blocks_on_rt_mutex+0x1f5/0x250 |[<814d27d9>] schedule+0x29/0x70 |[<814d3423>] rt_spin_lock_slowlock+0x15b/0x278 |[<814d3786>] rt_spin_lock+0x26/0x30 |[<a00dced9>] gen6_gt_force_wake_get+0x29/0x60 [i915] |[<a00e183f>] gen6_ring_get_irq+0x5f/0x100 [i915] |[<a00b2a33>] ftrace_raw_event_i915_gem_ring_dispatch+0xe3/0x100 [i915] |[<a00ac1b3>] i915_gem_do_execbuffer.isra.13+0xbd3/0x1430 [i915] |[<810f8943>] ? trace_buffer_unlock_commit+0x43/0x60 |[<8113e8d2>] ? ftrace_raw_event_kmem_alloc+0xd2/0x180 |[<8101d063>] ? native_sched_clock+0x13/0x80 |[<a00acf29>] i915_gem_execbuffer2+0x99/0x280 [i915] |[<a00114a3>] drm_ioctl+0x4c3/0x570 [drm] |[<8101d0d9>] ? sched_clock+0x9/0x10 |[<a00ace90>] ? i915_gem_execbuffer+0x480/0x480 [i915] |[<810f1c18>] ? rb_commit+0x68/0xa0 |[<810f1c6c>] ? ring_buffer_unlock_commit+0x1c/0xa0 |[<81197467>] do_vfs_ioctl+0x97/0x540 |[<81021318>] ? ftrace_raw_event_sys_enter+0xd8/0x130 |[<811979a1>] sys_ioctl+0x91/0xb0 |[<814db931>] tracesys+0xe1/0xe6 Chris Wilson does not like to move i915_trace_irq_get() out of the macro |No. This enables the IRQ, as well as making a number of |very expensively serialised read, unconditionally. so it is gone now on RT. Cc: stable-rt@vger.kernel.org Reported-by: Joakim Hernberg <jbh@alchemy.lu> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24gpu/i915: don't open code these thingsSebastian Andrzej Siewior
The opencode part is gone in 1f83fee0 ("drm/i915: clear up wedged transitions") the owner check is still there. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24cpufreq: drop K8's driver from beeing selectedSebastian Andrzej Siewior
Ralf posted a picture of a backtrace from | powernowk8_target_fn() -> transition_frequency_fidvid() and then at the | end: | 932 policy = cpufreq_cpu_get(smp_processor_id()); | 933 cpufreq_cpu_put(policy); crashing the system on -RT. I assumed that policy was a NULL pointer but was rulled out. Since Ralf can't do any more investigations on this and I have no machine with this, I simply switch it off. Reported-by: Ralf Mardorf <ralf.mardorf@alice-dsl.net> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24mmc: sdhci: don't provide hard irq handlerSebastian Andrzej Siewior
the sdhci code provides both irq handlers: the primary and the thread handler. Initially it was meant for the primary handler to be very short. The result is not that on -RT we have the primrary handler grabing locks and this isn't really working. As a hack for now I just push both handler into the threaded mode. Cc: stable-rt@vger.kernel.org Reported-By: Michal Šmucr <msmucr@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24mmci: Remove bogus local_irq_save()Thomas Gleixner
On !RT interrupt runs with interrupts disabled. On RT it's in a thread, so no need to disable interrupts at all. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24i2c/omap: drop the lock hard irq contextSebastian Andrzej Siewior
The lock is taken while reading two registers. On RT the first lock is taken in hard irq where it might sleep and in the threaded irq. The threaded irq runs in oneshot mode so the hard irq does not run until the thread the completes so there is no reason to grab the lock. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24leds: trigger: disable CPU trigger on -RTSebastian Andrzej Siewior
as it triggers: |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141 |[<c0014aa4>] (unwind_backtrace+0x0/0xf8) from [<c0012788>] (show_stack+0x1c/0x20) |[<c0012788>] (show_stack+0x1c/0x20) from [<c043c8dc>] (dump_stack+0x20/0x2c) |[<c043c8dc>] (dump_stack+0x20/0x2c) from [<c004c5e8>] (__might_sleep+0x13c/0x170) |[<c004c5e8>] (__might_sleep+0x13c/0x170) from [<c043f270>] (__rt_spin_lock+0x28/0x38) |[<c043f270>] (__rt_spin_lock+0x28/0x38) from [<c043fa00>] (rt_read_lock+0x68/0x7c) |[<c043fa00>] (rt_read_lock+0x68/0x7c) from [<c036cf74>] (led_trigger_event+0x2c/0x5c) |[<c036cf74>] (led_trigger_event+0x2c/0x5c) from [<c036e0bc>] (ledtrig_cpu+0x54/0x5c) |[<c036e0bc>] (ledtrig_cpu+0x54/0x5c) from [<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) |[<c000ffd8>] (arch_cpu_idle_exit+0x18/0x1c) from [<c00590b8>] (cpu_startup_entry+0xa8/0x234) |[<c00590b8>] (cpu_startup_entry+0xa8/0x234) from [<c043b2cc>] (rest_init+0xb8/0xe0) |[<c043b2cc>] (rest_init+0xb8/0xe0) from [<c061ebe0>] (start_kernel+0x2c4/0x380) Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24arch/arm64: Add lazy preempt supportAnders Roxell
arm64 is missing support for PREEMPT_RT. The main feature which is lacking is support for lazy preemption. The arch-specific entry code, thread information structure definitions, and associated data tables have to be extended to provide this support. Then the Kconfig file has to be extended to indicate the support is available, and also to indicate that support for full RT preemption is now available. Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24powerpc-preempt-lazy-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24arm-preempt-lazy-support.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24x86-preempt-lazy.patchThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24sched: Add support for lazy preemptionThomas Gleixner
It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which are right away preempting the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR tasks preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the existing preempt_count each tasks sports now a preempt_lazy_count which is manipulated on lock acquiry and release. This is slightly incorrect as for lazyness reasons I coupled this on migrate_disable/enable so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefor allows to exit the waking task the lock held region before the woken task preempts. That also works better for cross CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. Initial test do not expose any observable latency increasement, but history shows that I've been proven wrong before :) The lazy preemption mode is per default on, but with CONFIG_SCHED_DEBUG enabled it can be disabled via: # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features and reenabled via # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non RT workload performance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rcu: make RCU_BOOST default on RTSebastian Andrzej Siewior
Since it is no longer invoked from the softirq people run into OOM more often if the priority of the RCU thread is too low. Making boosting default on RT should help in those case and it can be switched off if someone knows better. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rcu: Eliminate softirq processing from rcutreePaul E. McKenney
Running RCU out of softirq is a problem for some workloads that would like to manage RCU core processing independently of other softirq work, for example, setting kthread priority. This commit therefore moves the RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread named rcuc. The SCHED_OTHER approach avoids the scalability problems that appeared with the earlier attempt to move RCU core processing to from softirq to kthreads. That said, kernels built with RCU_BOOST=y will run the rcuc kthreads at the RCU-boosting priority. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rcu: Disable RCU_FAST_NO_HZ on RTThomas Gleixner
This uses a timer_list timer from the irq disabled guts of the idle code. Disable it for now to prevent wreckage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rt, nohz_full: fix nohz_full for PREEMPT_RT_FULLMike Galbraith
A task being ticked and trying to shut the tick down will fail due to having just awakened ksoftirqd, subtract it from nr_running. Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: make migrate disable/enable conditioned on softirq_nestcnt transitionNicholas Mc Guire
This patch removes the recursive calls to migrate_disable/enable in local_bh_disable/enable the softirq-local-lock.patch introduces local_bh_disable/enable wich decrements/increments the current->softirq_nestcnt and disable/enables migration as well. as softirq_nestcnt (include/linux/sched.h conditioned on CONFIG_PREEMPT_RT_BASE) already is tracking the nesting level of the recursive calls to local_bh_disable/enable (all in kernel/softirq.c) - no need to do it twice. migrate_disable/enable thus can be conditionsed on softirq_nestcnt making a transition from 0-1 to disable migration and 1-0 to re-enable it. No change of functional behavior, this does noticably reduce the observed nesting level of migrate_disable/enable Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: Adapt NOHZ softirq pending check to new RT schemeThomas Gleixner
We can't rely on ksoftirqd anymore and we need to check the tasks which run a particular softirq and if such a task is pi blocked ignore the other pending bits of that task as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24API cleanup - use local_lock not __local_lock for softNicholas Mc Guire
trivial API cleanup - kernel/softirq.c was mimiking local_lock. No change of functional behavior Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: Split softirq locksThomas Gleixner
The 3.x RT series removed the split softirq implementation in favour of pushing softirq processing into the context of the thread which raised it. Though this prevents us from handling the various softirqs at different priorities. Now instead of reintroducing the split softirq threads we split the locks which serialize the softirq processing. If a softirq is raised in context of a thread, then the softirq is noted on a per thread field, if the thread is in a bh disabled region. If the softirq is raised from hard interrupt context, then the bit is set in the flag field of ksoftirqd and ksoftirqd is invoked. When a thread leaves a bh disabled region, then it tries to execute the softirqs which have been raised in its own context. It acquires the per softirq / per cpu lock for the softirq and then checks, whether the softirq is still pending in the per cpu local_softirq_pending() field. If yes, it runs the softirq. If no, then some other task executed it already. This allows for zero config softirq elevation in the context of user space tasks or interrupt threads. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: Split handling functionThomas Gleixner
Split out the inner handling function, so RT can reuse it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: Make serving softirqs a task flagThomas Gleixner
Avoid the percpu softirq_runner pointer magic by using a task flag. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24softirq: Init softirq local lock after per cpu section is set upSteven Rostedt
I discovered this bug when booting 3.4-rt on my powerpc box. It crashed with the following report: ------------[ cut here ]------------ kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75! Oops: Exception in kernel mode, sig: 5 [#1] PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient Modules linked in: NIP: c0000000004aa03c LR: c0000000004aa01c CTR: c00000000009b2ac REGS: c00000003e8d7950 TRAP: 0700 Not tainted (3.4.11-test-rt19) MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR: 24000082 XER: 20000000 SOFTE: 0 TASK = c00000003e8fdcd0[11] 'ksoftirqd/1' THREAD: c00000003e8d4000 CPU: 1 GPR00: 0000000000000001 c00000003e8d7bd0 c000000000d6cbb0 0000000000000000 GPR04: c00000003e8fdcd0 0000000000000000 0000000024004082 c000000000011454 GPR08: 0000000000000000 0000000080000001 c00000003e8fdcd1 0000000000000000 GPR12: 0000000024000084 c00000000fff0280 ffffffffffffffff 000000003ffffad8 GPR16: ffffffffffffffff 000000000072c798 0000000000000060 0000000000000000 GPR20: 0000000000642741 000000000072c858 000000003ffffaf0 0000000000000417 GPR24: 000000000072dcd0 c00000003e7ff990 0000000000000000 0000000000000001 GPR28: 0000000000000000 c000000000792340 c000000000ccec78 c000000001182338 NIP [c0000000004aa03c] .wakeup_next_waiter+0x44/0xb8 LR [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 Call Trace: [c00000003e8d7bd0] [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable) [c00000003e8d7c60] [c0000000004a0320] .rt_spin_lock_slowunlock+0x8c/0xe4 [c00000003e8d7ce0] [c0000000004a07cc] .rt_spin_unlock+0x54/0x64 [c00000003e8d7d60] [c0000000000636bc] .__thread_do_softirq+0x130/0x174 [c00000003e8d7df0] [c00000000006379c] .run_ksoftirqd+0x9c/0x1a4 [c00000003e8d7ea0] [c000000000080b68] .kthread+0xa8/0xb4 [c00000003e8d7f90] [c00000000001c2f8] .kernel_thread+0x54/0x70 Instruction dump: 60000000 e86d01c8 38630730 4bff7061 60000000 ebbf0008 7c7c1b78 e81d0040 7fe00278 7c000074 7800d182 68000001 <0b000000> e88d01c8 387d0010 38840738 The rtmutex_common.h:75 is: rt_mutex_top_waiter(struct rt_mutex *lock) { struct rt_mutex_waiter *w; w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter, list_entry); BUG_ON(w->lock != lock); return w; } Where the waiter->lock is corrupted. I saw various other random bugs that all had to with the softirq lock and plist. As plist needs to be initialized before it is used I investigated how this lock is initialized. It's initialized with: void __init softirq_early_init(void) { local_irq_lock_init(local_softirq_lock); } Where: #define local_irq_lock_init(lvar) \ do { \ int __cpu; \ for_each_possible_cpu(__cpu) \ spin_lock_init(&per_cpu(lvar, __cpu).lock); \ } while (0) As the softirq lock is a local_irq_lock, which is a per_cpu lock, the initialization is done to all per_cpu versions of the lock. But lets look at where the softirq_early_init() is called from. In init/main.c: start_kernel() /* * Interrupts are still disabled. Do necessary setups, then * enable them */ softirq_early_init(); tick_init(); boot_cpu_init(); page_address_init(); printk(KERN_NOTICE "%s", linux_banner); setup_arch(&command_line); mm_init_owner(&init_mm, &init_task); mm_init_cpumask(&init_mm); setup_command_line(command_line); setup_nr_cpu_ids(); setup_per_cpu_areas(); smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ One of the first things that is called is the initialization of the softirq lock. But if you look further down, we see the per_cpu areas have not been set up yet. Thus initializing a local_irq_lock() before the per_cpu section is set up, may not work as it is initializing the per cpu locks before the per cpu exists. By moving the softirq_early_init() right after setup_per_cpu_areas(), the kernel boots fine. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <clark@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Carsten Emde <cbe@osadl.org> Cc: vomlehn@texas.net Link: http://lkml.kernel.org/r/1349362924.6755.18.camel@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-24softirq: Check preemption after reenabling interruptsThomas Gleixner
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the rasie_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24perf: Make swevent hrtimer run in irq instead of softirqYong Zhang
Otherwise we get a deadlock like below: [ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003 [ 1044.042752] INFO: lockdep is turned off. [ 1044.042754] Modules linked in: [ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29 [ 1044.042759] Call Trace: [ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80 [ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70 [ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0 [ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0 [ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0 [ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40 [ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40 [ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50 [ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40 [ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10 [ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0 [ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90 [ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200 [ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70 [ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120 [ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40 [ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80 [ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20 [ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70 [ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70 [ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200 [ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20 [ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0 [ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200 [ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210 [ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200 [ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200 [ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0 [ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70 [ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10 [ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110 [ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70 [ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe [ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0 [ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24rt: rwsem/rwlock: lockdep annotationsThomas Gleixner
rwlocks and rwsems on RT do not allow multiple readers. Annotate the lockdep acquire functions accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionalsJosh Cartwright
"lockdep: Selftest: Only do hardirq context test for raw spinlock" disabled the execution of certain tests with PREEMPT_RT_FULL, but did not prevent the tests from still being defined. This leads to warnings like: ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function] ./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function] ... Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT_FULL conditionals. Cc: stable-rt@vger.kernel.org Signed-off-by: Josh Cartwright <josh.cartwright@ni.com> Signed-off-by: Xander Huff <xander.huff@ni.com> Acked-by: Gratian Crisan <gratian.crisan@ni.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24lockdep: Selftest: Only do hardirq context test for raw spinlockYong Zhang
On -rt there is no softirq context any more and rwlock is sleepable, disable softirq context test and rwlock+irq test. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Yong Zhang <yong.zhang@windriver.com> Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24crypto: Convert crypto notifier chain to SRCUPeter Zijlstra
The crypto notifier deadlocks on RT. Though this can be a real deadlock on mainline as well due to fifo fair rwsems. The involved parties here are: [ 82.172678] swapper/0 S 0000000000000001 0 1 0 0x00000000 [ 82.172682] ffff88042f18fcf0 0000000000000046 ffff88042f18fc80 ffffffff81491238 [ 82.172685] 0000000000011cc0 0000000000011cc0 ffff88042f18c040 ffff88042f18ffd8 [ 82.172688] 0000000000011cc0 0000000000011cc0 ffff88042f18ffd8 0000000000011cc0 [ 82.172689] Call Trace: [ 82.172697] [<ffffffff81491238>] ? _raw_spin_unlock_irqrestore+0x6c/0x7a [ 82.172701] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.172704] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0 [ 82.172708] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c [ 82.172713] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141 [ 82.172716] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f [ 82.172719] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182 [ 82.172722] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e [ 82.172726] [<ffffffff811debfd>] crypto_wait_for_test+0x49/0x6b [ 82.172728] [<ffffffff811ded32>] crypto_register_alg+0x53/0x5a [ 82.172730] [<ffffffff811ded6c>] crypto_register_algs+0x33/0x72 [ 82.172734] [<ffffffff81ad7686>] ? aes_init+0x12/0x12 [ 82.172737] [<ffffffff81ad76ea>] aesni_init+0x64/0x66 [ 82.172741] [<ffffffff81000318>] do_one_initcall+0x7f/0x13b [ 82.172744] [<ffffffff81ac4d34>] kernel_init+0x199/0x22c [ 82.172747] [<ffffffff81ac44ef>] ? loglevel+0x31/0x31 [ 82.172752] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.172755] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.172759] [<ffffffff81ac4b9b>] ? start_kernel+0x3ca/0x3ca [ 82.172761] [<ffffffff814987c0>] ? gs_change+0x13/0x13 [ 82.174186] cryptomgr_test S 0000000000000001 0 41 2 0x00000000 [ 82.174189] ffff88042c971980 0000000000000046 ffffffff81d74830 0000000000000292 [ 82.174192] 0000000000011cc0 0000000000011cc0 ffff88042c96eb80 ffff88042c971fd8 [ 82.174195] 0000000000011cc0 0000000000011cc0 ffff88042c971fd8 0000000000011cc0 [ 82.174195] Call Trace: [ 82.174198] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.174201] [<ffffffff8148ec6b>] schedule_timeout+0x27/0xd0 [ 82.174204] [<ffffffff81043c0c>] ? unpin_current_cpu+0x1a/0x6c [ 82.174206] [<ffffffff8106e491>] ? migrate_enable+0x12f/0x141 [ 82.174209] [<ffffffff8148fbbd>] wait_for_common+0xbb/0x11f [ 82.174212] [<ffffffff810709f2>] ? try_to_wake_up+0x182/0x182 [ 82.174215] [<ffffffff8148fc96>] wait_for_completion_interruptible+0x1d/0x2e [ 82.174218] [<ffffffff811e4883>] cryptomgr_notify+0x280/0x385 [ 82.174221] [<ffffffff814943de>] notifier_call_chain+0x6b/0x98 [ 82.174224] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12 [ 82.174227] [<ffffffff810677cd>] __blocking_notifier_call_chain+0x70/0x8d [ 82.174230] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16 [ 82.174234] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50 [ 82.174236] [<ffffffff811dd7a1>] crypto_alg_mod_lookup+0x3e/0x74 [ 82.174238] [<ffffffff811dd949>] crypto_alloc_base+0x36/0x8f [ 82.174241] [<ffffffff811e9408>] cryptd_alloc_ablkcipher+0x6e/0xb5 [ 82.174243] [<ffffffff811dd591>] ? kzalloc.clone.5+0xe/0x10 [ 82.174246] [<ffffffff8103085d>] ablk_init_common+0x1d/0x38 [ 82.174249] [<ffffffff8103852a>] ablk_ecb_init+0x15/0x17 [ 82.174251] [<ffffffff811dd8c6>] __crypto_alloc_tfm+0xc7/0x114 [ 82.174254] [<ffffffff811e0caa>] ? crypto_lookup_skcipher+0x1f/0xe4 [ 82.174256] [<ffffffff811e0dcf>] crypto_alloc_ablkcipher+0x60/0xa5 [ 82.174258] [<ffffffff811e5bde>] alg_test_skcipher+0x24/0x9b [ 82.174261] [<ffffffff8106d96d>] ? finish_task_switch+0x3f/0xfa [ 82.174263] [<ffffffff811e6b8e>] alg_test+0x16f/0x1d7 [ 82.174267] [<ffffffff811e45ac>] ? cryptomgr_probe+0xac/0xac [ 82.174269] [<ffffffff811e45d8>] cryptomgr_test+0x2c/0x47 [ 82.174272] [<ffffffff81061161>] kthread+0x7e/0x86 [ 82.174275] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa [ 82.174278] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.174281] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.174284] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c [ 82.174287] [<ffffffff814987c0>] ? gs_change+0x13/0x13 [ 82.174329] cryptomgr_probe D 0000000000000002 0 47 2 0x00000000 [ 82.174332] ffff88042c991b70 0000000000000046 ffff88042c991bb0 0000000000000006 [ 82.174335] 0000000000011cc0 0000000000011cc0 ffff88042c98ed00 ffff88042c991fd8 [ 82.174338] 0000000000011cc0 0000000000011cc0 ffff88042c991fd8 0000000000011cc0 [ 82.174338] Call Trace: [ 82.174342] [<ffffffff8148fd3f>] schedule+0x64/0x66 [ 82.174344] [<ffffffff814901ad>] __rt_mutex_slowlock+0x85/0xbe [ 82.174347] [<ffffffff814902d2>] rt_mutex_slowlock+0xec/0x159 [ 82.174351] [<ffffffff81089c4d>] rt_mutex_fastlock.clone.8+0x29/0x2f [ 82.174353] [<ffffffff81490372>] rt_mutex_lock+0x33/0x37 [ 82.174356] [<ffffffff8108a0f2>] __rt_down_read+0x50/0x5a [ 82.174358] [<ffffffff8108a11c>] ? rt_down_read+0x10/0x12 [ 82.174360] [<ffffffff8108a11c>] rt_down_read+0x10/0x12 [ 82.174363] [<ffffffff810677b5>] __blocking_notifier_call_chain+0x58/0x8d [ 82.174366] [<ffffffff810677fe>] blocking_notifier_call_chain+0x14/0x16 [ 82.174369] [<ffffffff811dd272>] crypto_probing_notify+0x24/0x50 [ 82.174372] [<ffffffff811debd6>] crypto_wait_for_test+0x22/0x6b [ 82.174374] [<ffffffff811decd3>] crypto_register_instance+0xb4/0xc0 [ 82.174377] [<ffffffff811e9b76>] cryptd_create+0x378/0x3b6 [ 82.174379] [<ffffffff811de512>] ? __crypto_lookup_template+0x5b/0x63 [ 82.174382] [<ffffffff811e4545>] cryptomgr_probe+0x45/0xac [ 82.174385] [<ffffffff811e4500>] ? crypto_alloc_pcomp+0x1b/0x1b [ 82.174388] [<ffffffff81061161>] kthread+0x7e/0x86 [ 82.174391] [<ffffffff8106d9dd>] ? finish_task_switch+0xaf/0xfa [ 82.174394] [<ffffffff814987c4>] kernel_thread_helper+0x4/0x10 [ 82.174398] [<ffffffff81491574>] ? retint_restore_args+0x13/0x13 [ 82.174401] [<ffffffff810610e3>] ? __init_kthread_worker+0x8c/0x8c [ 82.174403] [<ffffffff814987c0>] ? gs_change+0x13/0x13 cryptomgr_test spawns the cryptomgr_probe thread from the notifier call. The probe thread fires the same notifier as the test thread and deadlocks on the rwsem on RT. Now this is a potential deadlock in mainline as well, because we have fifo fair rwsems. If another thread blocks with a down_write() on the notifier chain before the probe thread issues the down_read() it will block the probe thread and the whole party is dead locked. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-06-24net: Add a mutex around devnet_rename_seqSebastian Andrzej Siewior
On RT write_seqcount_begin() disables preemption and device_rename() allocates memory with GFP_KERNEL and grabs later the sysfs_mutex mutex. Serialize with a mutex and add use the non preemption disabling __write_seqcount_begin(). To avoid writer starvation, let the reader grab the mutex and release it when it detects a writer in progress. This keeps the normal case (no reader on the fly) fast. [ tglx: Instead of replacing the seqcount by a mutex, add the mutex ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>