2012-10-09  Linux 3.0.45-rt67 REBASE  [tag: v3.0.45-rt67-rebase]  (Steven Rostedt)

2012-10-09  fix printk flush of messages  (Frank Rowand)

Revert preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch.
The problem addressed by that patch does not exist after applying console-make-rt-friendly-update.patch.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Link: http://lkml.kernel.org/r/4FB44EF1.9050809@am.sony.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fix printk flush of messages  (Frank Rowand)

Updates console-make-rt-friendly.patch.

Under CONFIG_PREEMPT_RT_FULL, printk() output is never flushed by printk() because:

    # some liberties taken in this pseudo-code to make it easier to follow
    printk()
      vprintk()
        raw_spin_lock(&logbuf_lock)
          # increment preempt_count():
          preempt_disable()
        result = console_trylock_for_printk()
          retval = 0
          # lock will always be false, because preempt_count() will be >= 1
          lock = ... && !preempt_count()
          if (lock)
            retval = 1
          return retval
        # result will always be false since lock will always be false
        if (result)
          console_unlock()
            # this is where the printk() output would be flushed

On system boot some printk() output is flushed because register_console() and tty_open() call console_unlock().

This change also fixes the problem that was previously fixed by preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch.

Signed-off-by: Frank Rowand <frank.rowand@am.sony.com>
Cc: Frank <Frank_Rowand@sonyusa.com>
Link: http://lkml.kernel.org/r/4FB44FD0.4090800@am.sony.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  cpu/rt: Fix cpu_hotplug variable initialization  (Steven Rostedt)

The commit "cpu/rt: Rework cpu down for PREEMPT_RT" changed the double meaning of cpu_hotplug.lock, where it was a spinlock for RT and a mutex for non-RT, to just a mutex for both. But the initialization of the variable was not updated to reflect this change.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  cpu/rt: Rework cpu down for PREEMPT_RT  (Steven Rostedt)

Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down.

Several synchronization methods have been tried, but when stressed they failed. This is a new approach.

A sync_tsk thread is still created and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait on it to be informed that the tasks currently pinned on the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time.

Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete, as they are waiting for the locks held by the tasks that are blocked.

Thus we still let the tasks pin the CPU until the notifiers are done. After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per-CPU mutex in the hotplug_pcp descriptor.

To help things along, a new function in the scheduler code is created called migrate_me(). This function will try to migrate the current task off the CPU that is going down if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases where this won't work, but it helps in most cases.

After the notifiers are called, if a task can't migrate off but enters the pinned-CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway.

Also, I found that THREAD_BOUND needs to also be accounted for in the pinned CPU, and migrate_disable no longer treats such tasks as special. This helps fix issues with ksoftirqd and workqueues that unbind on CPU down.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  perf: Make swevent hrtimer run in irq instead of softirq  (Yong Zhang)

Otherwise we get a deadlock like below:

[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761] <IRQ>  [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833] <EOI>  [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs, jbd: pull your plug when waiting for space  (Mike Galbraith)

With an -rt kernel, and a heavy sync IO load, tasks can jam up on journal locks without unplugging, which can lead to terminal IO starvation. Unplug and schedule when waiting for space.

Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Theodore Tso <tytso@mit.edu>
Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  slab: Prevent local lock deadlock  (Thomas Gleixner)

On RT we avoid the cross-CPU function calls and take the per-CPU local locks instead. The code missed, however, that taking the local lock on the CPU which runs the code must use the proper local-lock functions and not a plain spin_lock(); otherwise it deadlocks later when trying to acquire the local lock with the proper function.

Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  Latency histograms: Detect yet another overlooked sharedprio condition  (Carsten Emde)

While waiting for an RT process to be woken up, the previous process may go to wait and switch to another one with the same priority, which then becomes current. This condition was not correctly recognized and led to erroneously high latency recordings during periods of low CPU load.

This patch correctly marks such latencies as sharedprio and prevents them from being recorded as actual system latency.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  Disable RT_GROUP_SCHED in PREEMPT_RT_FULL  (Carsten Emde)

Strange CPU stalls have been observed in RT when RT_GROUP_SCHED was configured. Disable it for now.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  Latency histograms: Adjust timer, if already elapsed when programmed  (Carsten Emde)

Nothing prevents a programmer from calling clock_nanosleep() with an already elapsed wakeup time in absolute time mode, or with a too small delay in relative time mode. Such timers cannot wake up in time and, thus, should be corrected when entered into the missed timers latency histogram (CONFIG_MISSED_TIMERS_HIST).

This patch marks such timers and uses a corrected expiration time.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  Latency histograms: Cope with backwards running local trace clock  (Carsten Emde)

Thanks to the wonders of modern technology, the local trace clock can now run backwards. Since this never happened before, the time difference between now and somewhat earlier was expected to never become negative and, thus, was stored in an unsigned integer variable. Nowadays, we need a signed integer to ensure that the value is stored as an underflow in the related histogram. (In cases where this is not a malfunction, bipolar histograms can be used.)

This patch takes care that all latency variables are represented as signed integers and that negative numbers are considered as histogram underflows.

On one of the misbehaving processors, switching to the global clock solved the problem:

    echo global >/sys/kernel/debug/tracing/trace_clock

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  mips-remove-smp-reserve-lock.patch  (Thomas Gleixner)

Instead of making the lock raw, remove it, as it protects nothing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  net, RT: Remove preemption disabling in netif_rx()  (Priyanka Jain)

1) enqueue_to_backlog() (called from netif_rx()) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But on RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable()/migrate_disable() map to preempt_enable()/preempt_disable(), so there is no change in functionality in the non-RT case.

- Replace preempt_enable()/preempt_disable() with migrate_enable()/migrate_disable() respectively
- Replace get_cpu()/put_cpu() with get_cpu_light()/put_cpu_light() respectively

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com>
Cc: <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

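For illustration, a minimal sketch of the substitution described in the entry above; the function name netif_rx_sketch() is hypothetical and this is not the exact upstream diff:

    /*
     * Sketch: keep enqueue_to_backlog() bound to one CPU by disabling
     * migration only; preemption stays enabled, so kfree_skb() may sleep
     * on RT without triggering "scheduling while atomic".
     */
    static int netif_rx_sketch(struct sk_buff *skb)
    {
        unsigned int qtail;
        int cpu, ret;

        migrate_disable();                  /* was: preempt_disable() */
        cpu = smp_processor_id();
        ret = enqueue_to_backlog(skb, cpu, &qtail);
        migrate_enable();                   /* was: preempt_enable() */

        return ret;
    }
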
2012-10-09  scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll  (John Kacur)

RT triggers the following:

[ 11.307652] [<ffffffff81077b27>] __might_sleep+0xe7/0x110
[ 11.307663] [<ffffffff8150e524>] rt_spin_lock+0x24/0x60
[ 11.307670] [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90
[ 11.307703] [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx]
[ 11.307736] [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx]

Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler, which has a spinlock. Since spinlocks are sleepable on rt, it is not allowed to call them with interrupts disabled. Therefore we use local_irq_save_nort() instead, which saves flags without disabling interrupts.

This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt.

Suggested-by: Thomas Gleixner
Signed-off-by: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: David Sommerseth <davids@redhat.com>
Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com
Cc: stable-rt@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

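A rough sketch of the change described above (illustrative, not the verbatim driver diff):

    /*
     * Sketch: save flags without actually disabling interrupts on RT, so the
     * sleeping spinlock inside the interrupt handler may legally be taken.
     */
    static void qla2x00_poll_sketch(struct rsp_que *rsp)
    {
        unsigned long flags;
        struct qla_hw_data *ha = rsp->hw;

        local_irq_save_nort(flags);             /* was: local_irq_save(flags) */
        ha->isp_ops->intr_handler(0, rsp);      /* e.g. qla24xx_intr_handler() */
        local_irq_restore_nort(flags);          /* was: local_irq_restore(flags) */
    }
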
2012-10-09  rt: Make migrate_disable/enable() and __rt_mutex_init non-GPL only  (Steven Rostedt)

Modules that load on the normal vanilla kernel should load on an -rt kernel as well. This does not mean we condone non-GPL modules; we are only being consistent.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  printk: Disable migration instead of preemption  (Richard Weinberger)

There is no need to disable preemption in vprintk(); migrate_disable() is sufficient. This fixes the following bug in -rt:

[ 14.759233] BUG: sleeping function called from invalid context at /home/rw/linux-rt/kernel/rtmutex.c:645
[ 14.759235] in_atomic(): 1, irqs_disabled(): 0, pid: 547, name: bash
[ 14.759244] Pid: 547, comm: bash Not tainted 3.0.12-rt29+ #3
[ 14.759246] Call Trace:
[ 14.759301] [<ffffffff8106fade>] __might_sleep+0xeb/0xf0
[ 14.759318] [<ffffffff810ad784>] rt_spin_lock_fastlock.constprop.9+0x21/0x43
[ 14.759336] [<ffffffff8161fef0>] rt_spin_lock+0xe/0x10
[ 14.759354] [<ffffffff81347ad1>] serial8250_console_write+0x81/0x121
[ 14.759366] [<ffffffff8107ecd3>] __call_console_drivers+0x7c/0x93
[ 14.759369] [<ffffffff8107ef31>] _call_console_drivers+0x5c/0x60
[ 14.759372] [<ffffffff8107f7e5>] console_unlock+0x147/0x1a2
[ 14.759374] [<ffffffff8107fd33>] vprintk+0x3ea/0x462
[ 14.759383] [<ffffffff816160e0>] printk+0x51/0x53
[ 14.759399] [<ffffffff811974e4>] ? proc_reg_poll+0x9a/0x9a
[ 14.759403] [<ffffffff81335b42>] __handle_sysrq+0x50/0x14d
[ 14.759406] [<ffffffff81335c8a>] write_sysrq_trigger+0x4b/0x53
[ 14.759408] [<ffffffff81335c3f>] ? __handle_sysrq+0x14d/0x14d
[ 14.759410] [<ffffffff81197583>] proc_reg_write+0x9f/0xbe
[ 14.759426] [<ffffffff811497ec>] vfs_write+0xac/0xf3
[ 14.759429] [<ffffffff8114a9b3>] ? fget_light+0x3a/0x9b
[ 14.759431] [<ffffffff811499db>] sys_write+0x4a/0x6e
[ 14.759438] [<ffffffff81625d52>] system_call_fastpath+0x16/0x1b

Signed-off-by: Richard Weinberger <rw@linutronix.de>
Link: http://lkml.kernel.org/r/1323696956-11445-1-git-send-email-rw@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  Revert "kprobes: adjust "fix a memory leak in function pre_handler_kretprobe()""  (Steven Rostedt)

This reverts commit b8a0040ef7112439ad2efac6f1a79aa842b5924f.

As pointed out by John Kacur, the patch breaks 3.0-rt. Because rt pulls in 7b8d0e5, the above fix, which comes from v3.0.24, should not be applied. kretprobe is a raw_spinlock_t for real-time.

Before the revert we get the following compile errors:

    kernel/kprobes.c:1664: warning: passing argument 1 of 'rt_spin_lock' from incompatible pointer type
    kernel/kprobes.c:1666: warning: passing argument 1 of 'rt_spin_unlock' from incompatible pointer type

Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  net: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)

Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs: namespace: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)

Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs: dcache: Use cpu_chill() in trylock loops  (Thomas Gleixner)

Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

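The three cpu_chill() conversions above all follow the same trylock-retry shape; a minimal generic sketch (retry_until_locked() is hypothetical, not the actual dcache code):

    /*
     * Sketch of a trylock retry loop using cpu_chill(): on RT, cpu_chill()
     * sleeps for a tick so a preempted lock holder can run and make progress.
     */
    #include <linux/spinlock.h>
    #include <linux/delay.h>        /* cpu_chill() is declared here in -rt */

    static void retry_until_locked(spinlock_t *lock)
    {
        for (;;) {
            if (spin_trylock(lock))
                return;
            cpu_chill();            /* was: cpu_relax() */
        }
    }
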
2012-10-09  rt: Introduce cpu_chill()  (Thomas Gleixner)

Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non-RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

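A sketch of the cpu_chill() definition described above (close to the -rt change, shown here only for illustration):

    /* include/linux/delay.h (-rt): relax on non-RT, sleep a tick on RT */
    #ifdef CONFIG_PREEMPT_RT_FULL
    # define cpu_chill()    msleep(1)
    #else
    # define cpu_chill()    cpu_relax()
    #endif
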
2012-10-09  softirq: Check preemption after reenabling interrupts  (Thomas Gleixner)

raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily.

In principle we could add that check to local_irq_enable/restore, but that's overkill, as the raise_softirq_irqoff() sections are the only ones which show this behaviour.

Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

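A hedged sketch of the resulting pattern; preempt_check_resched_rt() is the helper name used by the -rt series (treat it as an assumption here), and the call site is only an example:

    /*
     * Sketch: after interrupts are reenabled, explicitly check whether the
     * just-woken softirq thread should preempt us, instead of letting that
     * wakeup be delayed arbitrarily.
     */
    static void kick_softirq_example(void)
    {
        unsigned long flags;

        local_irq_save(flags);
        raise_softirq_irqoff(NET_RX_SOFTIRQ);   /* may wake ksoftirqd */
        local_irq_restore(flags);
        preempt_check_resched_rt();             /* no-op on non-RT kernels */
    }
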
2012-10-09  cpu: Make hotplug.lock a "sleeping" spinlock on RT  (Steven Rostedt)

Tasks can block on hotplug.lock in pin_current_cpu(), but their state might be != RUNNING. So the mutex wakeup will set the state unconditionally to RUNNING. That might cause spurious unexpected wakeups.

We could provide a state-preserving mutex_lock() function, but this is semantically backwards. So instead we convert hotplug.lock to a spinlock for RT, which has the state-preserving semantics already.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  lglock/rt: Use non-rt for_each_cpu() in -rt code  (Steven Rostedt)

Currently the RT version of the lglocks does a for_each_online_cpu() in the name##_global_lock_online() functions. Non-rt uses its own mask for this, and for good reason.

A task may grab a *_global_lock_online(), and in the meantime one of the CPUs goes offline. Now when that task does a *_global_unlock_online(), it releases all the locks *except* the one that went offline. If that CPU were to come back online, its lock is now owned by a task that never released it when it should have. This causes all sorts of fun errors, like owners of a lock no longer existing, or tasks sleeping on IO, waiting to be woken up by a task that happens to be blocked on the lock it never released.

Convert the RT versions to use the lglock-specific cpumasks. Once a CPU comes online, the mask is set and never cleared, even when the CPU goes offline. The locks for that CPU will still be taken and released.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.374756214@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  sched/rt: Fix wait_task_inactive() to test rt_spin_lock state  (Steven Rostedt)

wait_task_inactive() will have a task sleep waiting for another task to have a certain state. But it ignores the rt_spin_lock state and can return with an incorrect result if the task it is waiting for is blocked on an rt_spin_lock() and is waking up.

The rt_spin_locks save the task's state in the saved_state field, and wait_task_inactive() must also test that state.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.979435764@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

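A simplified sketch of the extra test described above (illustrative, not the verbatim scheduler diff):

    /*
     * Sketch: a task blocked on an rt_spin_lock keeps its "real" state in
     * saved_state, so both fields must be compared against the state that
     * wait_task_inactive() is waiting for.
     */
    #include <linux/sched.h>

    static bool task_state_matches(struct task_struct *p, long match_state)
    {
        return p->state == match_state || p->saved_state == match_state;
    }
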
2012-10-09  ring-buffer/rt: Check for irqs disabled before grabbing reader lock  (Steven Rostedt)

In RT the reader lock is a mutex and we can not grab it when preemption is disabled. The in_atomic() check that is there does not check if irqs are disabled. Add that check as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.786365803@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

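The shape of the added guard, as a hedged sketch (the helper name is illustrative, not the actual ring-buffer code):

    /*
     * Sketch: on RT the reader lock is a sleeping lock, so refuse to take it
     * not only in atomic context but also when interrupts are disabled.
     */
    static bool may_take_reader_lock(void)
    {
        if (in_atomic() || irqs_disabled())     /* was: only in_atomic() */
            return false;
        return true;
    }
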
2012-10-09  futex/rt: Fix possible lockup when taking pi_lock in proxy handler  (Steven Rostedt)

When taking the pi_lock, we must disable interrupts because the pi_lock can also be taken in an interrupt handler. Use raw_spin_lock_irq() instead of raw_spin_lock().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190345.165160680@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

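A minimal sketch of the locking change described above (illustrative, simplified from the proxy-lock path):

    /*
     * Sketch: the pi_lock can also be taken from interrupt context, so the
     * proxy-lock path must use the irq-disabling variant.
     */
    static void proxy_pi_lock_sketch(struct task_struct *task)
    {
        raw_spin_lock_irq(&task->pi_lock);      /* was: raw_spin_lock() */
        /* ... examine and update task->pi_blocked_on ... */
        raw_spin_unlock_irq(&task->pi_lock);    /* was: raw_spin_unlock() */
    }
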
2012-10-09  timer: Fix hotplug for -rt  (Steven Rostedt)

Revert the RT patch:

    Author: Ingo Molnar <mingo@elte.hu>
    Date:   Fri Jul 3 08:30:32 2009 -0500

        timers: fix timer hotplug on -rt

        Here we are in the CPU_DEAD notifier, and we must not sleep nor
        enable interrupts.

There's no problem with sleeping in this notifier. But the get_cpu_var() had to be converted to a get_local_var(). Replace the previous fix with the get_local_var() conversion.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Carsten Emde <C.Emde@osadl.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Clark Williams <clark.williams@gmail.com>
Cc: stable-rt@vger.kernel.org
Link: http://lkml.kernel.org/r/20120301190344.948157137@goodmis.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  net: u64_stat: Protect seqcount  (Thomas Gleixner)

On RT we must prevent that the writer gets preempted inside the write section. Otherwise a preempting reader might spin forever.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs: Protect open coded isize seqcount  (Thomas Gleixner)

A writer might be preempted in the write side critical section on RT. Disable preemption to avoid endless spinning of a preempting reader.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

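A sketch of the pattern used by this and the previous entry (illustrative; the i_size seqcount only exists in the 32-bit SMP configuration, and preempt_disable_rt()/preempt_enable_rt() are the -rt helpers, no-ops on mainline):

    /*
     * Sketch: keep the writer from being preempted inside the write section,
     * so a preempting reader cannot spin endlessly on the odd sequence count.
     */
    static void i_size_write_sketch(struct inode *inode, loff_t i_size)
    {
        preempt_disable_rt();
        write_seqcount_begin(&inode->i_size_seqcount);
        inode->i_size = i_size;
        write_seqcount_end(&inode->i_size_seqcount);
        preempt_enable_rt();
    }
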
2012-10-09  seqlock: Prevent rt starvation  (Thomas Gleixner)

If a low prio writer gets preempted while holding the seqlock write locked, a high prio reader spins forever on RT. To prevent this, let the reader grab the spinlock, so it blocks and eventually boosts the writer. This way the writer can proceed and endless spinning is prevented.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

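A hedged sketch of the RT reader path described above (simplified; field layout as after the "seqlock: Use seqcount" change further down this log):

    /*
     * Sketch: if the sequence is odd a writer is active; briefly contend on
     * the lock -- which on RT blocks and priority-boosts the writer --
     * instead of spinning until the writer happens to get scheduled again.
     */
    static inline unsigned read_seqbegin_rt_sketch(seqlock_t *sl)
    {
        unsigned ret;

        for (;;) {
            ret = ACCESS_ONCE(sl->seqcount.sequence);
            if (likely(!(ret & 1)))
                return ret;
            spin_unlock_wait(&sl->lock);    /* blocks/boosts the writer on RT */
        }
    }
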
2012-10-09  timekeeping: Split xtime_lock  (Thomas Gleixner)

xtime_lock is going to be split apart in mainline, so we can shorten the seqcount protected regions and avoid updating the seqcount in some code paths. This is a straightforward split, so we can avoid the whole mess with raw seqlocks for RT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs: dentry use seqlock  (Thomas Gleixner)

Replace the open coded seqlock with a real seqlock, so RT can handle it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  fs: fs_struct use seqlock  (Thomas Gleixner)

Replace the open coded seqlock with a real one, so RT can handle it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  seqlock: Provide seq_spin_* functions  (Thomas Gleixner)

In some cases it's desirable to lock the seqlock w/o changing the seqcount. Provide functions for this, so we can avoid open coded constructs.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

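A sketch of the kind of helpers described above (names follow the commit subject; the exact set is illustrative):

    /* Lock/unlock the spinlock embedded in a seqlock_t without touching the
     * sequence count, so readers are not forced to retry. */
    static inline void seq_spin_lock(seqlock_t *sl)
    {
        spin_lock(&sl->lock);
    }

    static inline void seq_spin_unlock(seqlock_t *sl)
    {
        spin_unlock(&sl->lock);
    }
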
2012-10-09  seqlock: Use seqcount  (Thomas Gleixner)

No point in having different implementations for the same thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  seqlock: Remove unused functions  (Thomas Gleixner)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  ia64: vsyscall: Use seqcount instead of seqlock  (Thomas Gleixner)

The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  x86: vdso: Use seqcount instead of seqlock  (Thomas Gleixner)

The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

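An illustrative sketch of the vdso pattern described above (not the exact arch code; vsyscall_seq is an assumed name):

    /*
     * Sketch: the writer already runs under xtime_lock, so a bare seqcount is
     * enough for the vdso readers to detect a concurrent update and retry.
     */
    static seqcount_t vsyscall_seq;

    static void update_vsyscall_sketch(const u64 *src, u64 *vdso_data)
    {
        write_seqcount_begin(&vsyscall_seq);    /* was: write_seqlock_irqsave() */
        *vdso_data = *src;                      /* copy timekeeping data */
        write_seqcount_end(&vsyscall_seq);      /* was: write_sequnlock_irqrestore() */
    }

    static u64 vdso_read_sketch(const u64 *vdso_data)
    {
        unsigned seq;
        u64 val;

        do {
            seq = read_seqcount_begin(&vsyscall_seq);
            val = *vdso_data;
        } while (read_seqcount_retry(&vsyscall_seq, seq));

        return val;
    }
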
2012-10-09  x86: vdso: Remove bogus locking in update_vsyscall_tz()  (Thomas Gleixner)

Changing the sequence count in update_vsyscall_tz() is completely pointless. The vdso code copies the data unprotected. There is no point to change this, as sys_tz is nowhere protected at all. See sys_gettimeofday().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  x86-64-emulate-legacy-vsyscalls  (Andy Lutomirski)

    commit 5cec93c216db77c45f7ce970d46283bcb1933884
    Author: Andy Lutomirski <luto@MIT.EDU>
    Date:   Sun Jun 5 13:50:24 2011 -0400

        x86-64: Emulate legacy vsyscalls

        There's a fair amount of code in the vsyscall page. It contains a
        syscall instruction (in the gettimeofday fallback) and who knows
        what will happen if an exploit jumps into the middle of some other
        code. Reduce the risk by replacing the vsyscalls with short magic
        incantations that cause the kernel to emulate the real vsyscalls.
        These incantations are useless if entered in the middle.

        This causes vsyscalls to be a little more expensive than real
        syscalls. Fortunately sensible programs don't use them. The only
        exception is time() which is still called by glibc through the
        vsyscall - but calling time() millions of times per second is not
        sensible. glibc has this fixed in the development tree.

        This patch is not perfect: the vread_tsc and vread_hpet functions
        are still at a fixed address. Fixing that might involve making
        alternative patching work in the vDSO.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu

backport by:
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  x86-64-remove-vsyscall-number-3  (Andy Lutomirski)

    commit bb5fe2f78eadf5a52d8dcbf9a57728fd107af97b
    Author: Andy Lutomirski <luto@mit.edu>
    Date:   Sun Jun 5 13:50:22 2011 -0400

        x86-64: Remove vsyscall number 3 (venosys)

        It just segfaults since April 2008 (a4928cff), so I'm pretty sure
        that nothing uses it. And having an empty section makes the linker
        script a bit fragile.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Link: http://lkml.kernel.org/r/4a4abcf47ecadc269f2391a313576fe6d06acef7.1307292171.git.luto@mit.edu

backport by:
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  futex: Fix bug when a requeued RT task times out  (Steven Rostedt)

Requeue with timeout causes a bug with PREEMPT_RT_FULL. The bug comes from a timed out condition:

    TASK 1                                  TASK 2
    ------                                  ------
    futex_wait_requeue_pi()
        futex_wait_queue_me()
        <timed out>

                                            double_lock_hb();

    raw_spin_lock(pi_lock);
    if (current->pi_blocked_on)
    { }
    else {
        current->pi_blocked_on = PI_WAKE_INPROGRESS;
        raw_spin_unlock(pi_lock);
        spin_lock(hb->lock);  <-- blocked!

                                            plist_for_each_entry_safe(this) {
                                                rt_mutex_start_proxy_lock();
                                                    task_blocks_on_rt_mutex();
                                                    BUG_ON(task->pi_blocked_on)!!!!

The BUG_ON() actually has a check for PI_WAKE_INPROGRESS, but the problem is that, after TASK 1 sets PI_WAKE_INPROGRESS, it then tries to grab the hb->lock, which it fails to do. As the hb->lock is a mutex, it will block and set "pi_blocked_on" to the hb->lock. When TASK 2 goes to requeue it, the check for PI_WAKE_INPROGRESS fails because TASK 1's pi_blocked_on is no longer set to that, but instead set to the hb->lock.

The fix: when calling rt_mutex_start_proxy_lock(), a check is made to see if the proxy task's pi_blocked_on is set. If so, exit out early. Otherwise set it to a new flag PI_REQUEUE_INPROGRESS, which notifies the proxy task that it is being requeued, and it will handle things appropriately.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  genirq: Allow disabling of softirq processing in irq thread context  (Thomas Gleixner)

The processing of softirqs in irq thread context is a performance gain for the non-rt workloads of a system, but it's counterproductive for interrupts which are explicitly related to the realtime workload. Allow such interrupts to prevent softirq processing in their thread context.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

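A hedged sketch of how a driver would request such an interrupt; IRQF_NO_SOFTIRQ_CALL is the flag name used by the -rt patch (treat it as an assumption here), and the handler/device names are made up:

    /* The realtime-critical work runs in the irq thread; with the flag below,
     * no softirqs are processed in that thread's context. */
    static irqreturn_t my_rt_thread_fn(int irq, void *dev_id)
    {
        return IRQ_HANDLED;
    }

    static int my_probe_sketch(unsigned int irq, void *dev_id)
    {
        return request_threaded_irq(irq, NULL, my_rt_thread_fn,
                                    IRQF_ONESHOT | IRQF_NO_SOFTIRQ_CALL,
                                    "my-rt-device", dev_id);
    }
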
2012-10-09  timer-fd: Prevent live lock  (Thomas Gleixner)

If hrtimer_try_to_cancel() requires a retry, then depending on the priority setting the retry loop might prevent timer callback completion on RT. Prevent that by waiting for completion on RT; no change for a non-RT kernel.

Reported-by: Sankara Muthukrishnan <sankara.m@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  x86: Do not disable preemption in int3 on 32bit  (Steven Rostedt)

Preemption must be disabled before enabling interrupts in do_trap on x86_64 because the stack in use for int3 and debug is a per-CPU stack set by the IST. But 32bit does not have an IST, the stack still belongs to the current task, and there is no problem in scheduling out the task.

Keep preemption enabled on X86_32 when enabling interrupts for do_trap().

The name of the function is changed from preempt_conditional_sti/cli() to conditional_sti/cli_ist(), to annotate that this function is used when the stack is on the IST.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

2012-10-09  signal/x86: Delay calling signals in atomic  (Oleg Nesterov)

On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per-CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack, corrupt the stack, and crash the kernel on return.

When CONFIG_PREEMPT_RT_FULL is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack are possible.

Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored in the task's task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT_FULL to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]

Cc: stable-rt@vger.kernel.org
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

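A hedged sketch of the deferral described above (the forced_info field and the helper names are treated as assumptions here, not the verbatim arch code):

    /* In the trap path: if we are atomic, park the signal on the task and
     * let the exit-to-user path deliver it once preemption is enabled. */
    static void send_sig_or_defer(int sig, struct siginfo *info)
    {
        if (in_atomic()) {
            current->forced_info = *info;       /* field assumed from the -rt patch */
            set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
            return;
        }
        force_sig_info(sig, info, current);
    }

    /* Called from the notify-resume path with preemption enabled. */
    static void deliver_deferred_signal(void)
    {
        if (unlikely(current->forced_info.si_signo)) {
            force_sig_info(current->forced_info.si_signo,
                           &current->forced_info, current);
            current->forced_info.si_signo = 0;
        }
    }
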
2012-10-09  acpi-gpe-use-wait-simple.patch  (Thomas Gleixner)

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

2012-10-09  wait-simple: Simple waitqueue implementation  (Thomas Gleixner)

wait_queue is a swiss army knife and in most cases the complexity is not needed. For RT, waitqueues are a constant source of trouble, as we can't convert the head lock to a raw spinlock due to fancy and long lasting callbacks.

Provide a slim version, which allows RT to replace wait queues. This should go mainline as well, as it lowers memory consumption and runtime overhead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

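A hedged sketch of a "simple" waitqueue in the spirit described above (the structure and function names are illustrative, not necessarily the exact -rt API):

    /* A raw spinlock plus a plain list: no exclusive-wakeup bookkeeping and,
     * above all, no caller-supplied callbacks under the head lock, so the
     * lock can stay a raw spinlock on RT. */
    struct swait_head {
        raw_spinlock_t   lock;
        struct list_head list;
    };

    struct swaiter {
        struct task_struct *task;
        struct list_head    node;
    };

    static void swait_wake_all_sketch(struct swait_head *h)
    {
        struct swaiter *w, *tmp;
        unsigned long flags;

        raw_spin_lock_irqsave(&h->lock, flags);
        list_for_each_entry_safe(w, tmp, &h->list, node) {
            list_del_init(&w->node);
            wake_up_process(w->task);   /* just a wakeup, no callbacks */
        }
        raw_spin_unlock_irqrestore(&h->lock, flags);
    }
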