Age  Commit message (Collapse)  Author
2012-11-02  Linux 3.0.48-rt72 REBASE  [tag: v3.0.48-rt72-rebase]  (Steven Rostedt)
2012-11-02  net: netfilter: Serialize xt_write_recseq sections on RT  (Thomas Gleixner)
The netfilter code relies only on the implicit semantics of local_bh_disable() for serializing xt_write_recseq sections. RT breaks that and needs explicit serialization here. Reported-by: Peter LaDow <petela@gocougs.wsu.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
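To illustrate the explicit serialization this entry calls for, here is a minimal sketch using the RT locallock primitives; the lock and helper names are invented for the example and are not taken from the actual patch:

	/* Sketch only: serialize the xt_write_recseq write section explicitly on
	 * RT, where local_bh_disable() no longer guarantees exclusion. */
	DEFINE_LOCAL_IRQ_LOCK(xt_write_lock);		/* hypothetical per-CPU lock */

	static inline void xt_write_recseq_rt_lock(void)
	{
	#ifdef CONFIG_PREEMPT_RT_FULL
		local_lock(xt_write_lock);		/* PI-aware, per-CPU serialization */
	#endif
	}

	static inline void xt_write_recseq_rt_unlock(void)
	{
	#ifdef CONFIG_PREEMPT_RT_FULL
		local_unlock(xt_write_lock);
	#endif
	}

On non-RT the two helpers compile away, so the existing local_bh_disable() semantics are untouched.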
2012-11-02  rcu: Disable RCU_FAST_NO_HZ on RT  (Thomas Gleixner)
RCU_FAST_NO_HZ uses a timer_list timer from the irq-disabled guts of the idle code. Disable it for now to prevent wreckage. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2012-11-02  hrtimer: Raise softirq if hrtimer irq stalled  (Watanabe)
When the hrtimer stall detection hits, the softirq is not raised; raise it explicitly in that case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org
2012-10-26  sched: Better debug output for might sleep  (Thomas Gleixner)
might_sleep() can tell us where interrupts have been disabled, but we have no idea what disabled preemption. Add some debug infrastructure. Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  rt: rwsem/rwlock: lockdep annotations  (Thomas Gleixner)
rwlocks and rwsems on RT do not allow multiple readers. Annotate the lockdep acquire functions accordingly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  mm: page_alloc: Use local_lock_on() instead of plain spinlock  (Thomas Gleixner)
The plain spinlock, while sufficient, does not update the local_lock internals. Use a proper local_lock function instead to ease debugging. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  mm: slab: Fix potential deadlock  (Thomas Gleixner)
=============================================
[ INFO: possible recursive locking detected ]
3.6.0-rt1+ #49 Not tainted
---------------------------------------------
swapper/0/1 is trying to acquire lock:
  lock_slab_on+0x72/0x77

but task is already holding lock:
  __local_lock_irq+0x24/0x77

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&per_cpu(slab_lock, __cpu).lock);
  lock(&per_cpu(slab_lock, __cpu).lock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by swapper/0/1:
  kmem_cache_create+0x33/0x89
  __local_lock_irq+0x24/0x77

stack backtrace:
Pid: 1, comm: swapper/0 Not tainted 3.6.0-rt1+ #49
Call Trace:
 __lock_acquire+0x9a4/0xdc4
 ? __local_lock_irq+0x24/0x77
 ? lock_slab_on+0x72/0x77
 lock_acquire+0xc4/0x108
 ? lock_slab_on+0x72/0x77
 ? unlock_slab_on+0x5b/0x5b
 rt_spin_lock+0x36/0x3d
 ? lock_slab_on+0x72/0x77
 ? migrate_disable+0x85/0x93
 lock_slab_on+0x72/0x77
 do_ccupdate_local+0x19/0x44
 slab_on_each_cpu+0x36/0x5a
 do_tune_cpucache+0xc1/0x305
 enable_cpucache+0x8c/0xb5
 setup_cpu_cache+0x28/0x182
 __kmem_cache_create+0x34b/0x380
 ? shmem_mount+0x1a/0x1a
 kmem_cache_create+0x4a/0x89
 ? shmem_mount+0x1a/0x1a
 shmem_init+0x3e/0xd4
 kernel_init+0x11c/0x214
 kernel_thread_helper+0x4/0x10
 ? retint_restore_args+0x13/0x13
 ? start_kernel+0x3bc/0x3bc
 ? gs_change+0x13/0x13

It's not a missing annotation. It's simply wrong code and needs to be fixed. Instead of nesting the local and the remote cpu lock, simply acquire only the remote cpu lock, which is sufficient protection for this procedure. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  softirq: Init softirq local lock after per cpu section is set up  (Steven Rostedt)
I discovered this bug when booting 3.4-rt on my powerpc box. It crashed with the following report:

------------[ cut here ]------------
kernel BUG at /work/rt/stable-rt.git/kernel/rtmutex_common.h:75!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT SMP NR_CPUS=64 NUMA PA Semi PWRficient
Modules linked in:
NIP: c0000000004aa03c LR: c0000000004aa01c CTR: c00000000009b2ac
REGS: c00000003e8d7950 TRAP: 0700  Not tainted  (3.4.11-test-rt19)
MSR: 9000000000029032 <SF,HV,EE,ME,IR,DR,RI>  CR: 24000082  XER: 20000000
SOFTE: 0
TASK = c00000003e8fdcd0[11] 'ksoftirqd/1' THREAD: c00000003e8d4000 CPU: 1
GPR00: 0000000000000001 c00000003e8d7bd0 c000000000d6cbb0 0000000000000000
GPR04: c00000003e8fdcd0 0000000000000000 0000000024004082 c000000000011454
GPR08: 0000000000000000 0000000080000001 c00000003e8fdcd1 0000000000000000
GPR12: 0000000024000084 c00000000fff0280 ffffffffffffffff 000000003ffffad8
GPR16: ffffffffffffffff 000000000072c798 0000000000000060 0000000000000000
GPR20: 0000000000642741 000000000072c858 000000003ffffaf0 0000000000000417
GPR24: 000000000072dcd0 c00000003e7ff990 0000000000000000 0000000000000001
GPR28: 0000000000000000 c000000000792340 c000000000ccec78 c000000001182338
NIP [c0000000004aa03c] .wakeup_next_waiter+0x44/0xb8
LR [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8
Call Trace:
[c00000003e8d7bd0] [c0000000004aa01c] .wakeup_next_waiter+0x24/0xb8 (unreliable)
[c00000003e8d7c60] [c0000000004a0320] .rt_spin_lock_slowunlock+0x8c/0xe4
[c00000003e8d7ce0] [c0000000004a07cc] .rt_spin_unlock+0x54/0x64
[c00000003e8d7d60] [c0000000000636bc] .__thread_do_softirq+0x130/0x174
[c00000003e8d7df0] [c00000000006379c] .run_ksoftirqd+0x9c/0x1a4
[c00000003e8d7ea0] [c000000000080b68] .kthread+0xa8/0xb4
[c00000003e8d7f90] [c00000000001c2f8] .kernel_thread+0x54/0x70
Instruction dump:
60000000 e86d01c8 38630730 4bff7061 60000000 ebbf0008 7c7c1b78 e81d0040
7fe00278 7c000074 7800d182 68000001 <0b000000> e88d01c8 387d0010 38840738

The rtmutex_common.h:75 is:

	rt_mutex_top_waiter(struct rt_mutex *lock)
	{
		struct rt_mutex_waiter *w;

		w = plist_first_entry(&lock->wait_list, struct rt_mutex_waiter,
				      list_entry);
		BUG_ON(w->lock != lock);

		return w;
	}

Where the waiter->lock is corrupted. I saw various other random bugs that all had to do with the softirq lock and plist. As plist needs to be initialized before it is used, I investigated how this lock is initialized. It's initialized with:

	void __init softirq_early_init(void)
	{
		local_irq_lock_init(local_softirq_lock);
	}

Where local_irq_lock_init() expands to:

	do {							\
		int __cpu;					\
		for_each_possible_cpu(__cpu)			\
			spin_lock_init(&per_cpu(lvar, __cpu).lock);	\
	} while (0)

As the softirq lock is a local_irq_lock, which is a per_cpu lock, the initialization is done for all per_cpu versions of the lock. But let's look at where softirq_early_init() is called from. In init/main.c, start_kernel():

	/*
	 * Interrupts are still disabled. Do necessary setups, then
	 * enable them
	 */
	softirq_early_init();
	tick_init();
	boot_cpu_init();
	page_address_init();
	printk(KERN_NOTICE "%s", linux_banner);
	setup_arch(&command_line);
	mm_init_owner(&init_mm, &init_task);
	mm_init_cpumask(&init_mm);
	setup_command_line(command_line);
	setup_nr_cpu_ids();
	setup_per_cpu_areas();
	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */

One of the first things that is called is the initialization of the softirq lock. But if you look further down, we see the per_cpu areas have not been set up yet.
Thus initializing a local_irq_lock() before the per_cpu section is set up may not work, as it initializes the per-CPU locks before the per-CPU areas exist. By moving softirq_early_init() to right after setup_per_cpu_areas(), the kernel boots fine. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <clark@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Carsten Emde <cbe@osadl.org> Cc: vomlehn@texas.net Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/1349362924.6755.18.camel@gandalf.local.home Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
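For clarity, a sketch of the resulting ordering in init/main.c::start_kernel() after the move described above (surrounding calls elided):

	setup_per_cpu_areas();
	softirq_early_init();	/* moved here: per_cpu(lvar, cpu) now refers to real per-CPU memory */
	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */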
2012-10-26  fix printk flush of messages  (Frank Rowand)
Revert preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch. The problem addressed by that patch no longer exists after applying console-make-rt-friendly-update.patch. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Link: http://lkml.kernel.org/r/4FB44EF1.9050809@am.sony.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fix printk flush of messages  (Frank Rowand)
Updates console-make-rt-friendly.patch.

Under #ifdef CONFIG_PREEMPT_RT_FULL, printk() output is never flushed by printk() because:

	# some liberties taken in this pseudo-code to make it easier to follow
	printk()
	  vprintk()
	    raw_spin_lock(&logbuf_lock)
	      # increment preempt_count():
	      preempt_disable()
	    result = console_trylock_for_printk()
	      retval = 0
	      # lock will always be false, because preempt_count() will be >= 1
	      lock = ... && !preempt_count()
	      if (lock)
	        retval = 1
	      return retval
	    # result will always be false since lock will always be false
	    if (result)
	      console_unlock()
	        # this is where the printk() output would be flushed

On system boot some printk() output is flushed because register_console() and tty_open() call console_unlock().

This change also fixes the problem that was previously fixed by preempt-rt-allow-immediate-magic-sysrq-output-for-preempt_rt_full.patch. Signed-off-by: Frank Rowand <frank.rowand@am.sony.com> Cc: Frank <Frank_Rowand@sonyusa.com> Link: http://lkml.kernel.org/r/4FB44FD0.4090800@am.sony.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  cpu/rt: Fix cpu_hotplug variable initialization  (Steven Rostedt)
The commit "cpu/rt: Rework cpu down for PREEMPT_RT" changed the double meaning of the cpu_hotplug.lock, where it was a spinlock for RT and a mutex for non-RT, to just a mutex for both. But the initialization of the variable was not updated to reflect this change. Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  cpu/rt: Rework cpu down for PREEMPT_RT  (Steven Rostedt)
Bringing a CPU down is a pain with the PREEMPT_RT kernel because tasks can be preempted in many more places than in non-RT. In order to handle per_cpu variables, tasks may be pinned to a CPU for a while, and even sleep. But these tasks need to be off the CPU if that CPU is going down.

Several synchronization methods have been tried, but when stressed they failed. This is a new approach.

A sync_tsk thread is still created, and tasks may still block on a lock when the CPU is going down, but how that works is a bit different. When cpu_down() starts, it will create the sync_tsk and wait for it to report that the tasks currently pinned on the CPU are no longer pinned. But new tasks that are about to be pinned will still be allowed to do so at this time.

Then the notifiers are called. Several notifiers will bring down tasks that will enter these locations. Some of these tasks will take locks of other tasks that are on the CPU. If we don't let those other tasks continue, but make them block until CPU down is done, the tasks that the notifiers are waiting on will never complete, as they are waiting for the locks held by the tasks that are blocked.

Thus we still let the task pin the CPU until the notifiers are done. After the notifiers run, we then make new tasks entering the pinned CPU sections grab a mutex and wait. This mutex is now a per-CPU mutex in the hotplug_pcp descriptor.

To help things along, a new function is created in the scheduler code called migrate_me(). This function will try to migrate the current task off the CPU that is going down, if possible. When the sync_tsk is created, all tasks will then try to migrate off the CPU going down. There are several cases where this won't work, but it helps in most cases.

After the notifiers are called, if a task can't migrate off but enters the pinned CPU sections, it will be forced to wait on the hotplug_pcp mutex until the CPU down is complete. Then the scheduler will force the migration anyway.

Also, I found that THREAD_BOUND needs to also be accounted for in the pinned CPU, and migrate_disable() no longer treats such tasks specially. This helps fix issues with ksoftirqd and workqueue that unbind on CPU down.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  perf: Make swevent hrtimer run in irq instead of softirq  (Yong Zhang)
Otherwise we get a deadlock like below:

[ 1044.042749] BUG: scheduling while atomic: ksoftirqd/21/141/0x00010003
[ 1044.042752] INFO: lockdep is turned off.
[ 1044.042754] Modules linked in:
[ 1044.042757] Pid: 141, comm: ksoftirqd/21 Tainted: G W 3.4.0-rc2-rt3-23676-ga723175-dirty #29
[ 1044.042759] Call Trace:
[ 1044.042761] <IRQ> [<ffffffff8107d8e5>] __schedule_bug+0x65/0x80
[ 1044.042770] [<ffffffff8168978c>] __schedule+0x83c/0xa70
[ 1044.042775] [<ffffffff8106bdd2>] ? prepare_to_wait+0x32/0xb0
[ 1044.042779] [<ffffffff81689a5e>] schedule+0x2e/0xa0
[ 1044.042782] [<ffffffff81071ebd>] hrtimer_wait_for_timer+0x6d/0xb0
[ 1044.042786] [<ffffffff8106bb30>] ? wake_up_bit+0x40/0x40
[ 1044.042790] [<ffffffff81071f20>] hrtimer_cancel+0x20/0x40
[ 1044.042794] [<ffffffff8111da0c>] perf_swevent_cancel_hrtimer+0x3c/0x50
[ 1044.042798] [<ffffffff8111da31>] task_clock_event_stop+0x11/0x40
[ 1044.042802] [<ffffffff8111da6e>] task_clock_event_del+0xe/0x10
[ 1044.042805] [<ffffffff8111c568>] event_sched_out+0x118/0x1d0
[ 1044.042809] [<ffffffff8111c649>] group_sched_out+0x29/0x90
[ 1044.042813] [<ffffffff8111ed7e>] __perf_event_disable+0x18e/0x200
[ 1044.042817] [<ffffffff8111c343>] remote_function+0x63/0x70
[ 1044.042821] [<ffffffff810b0aae>] generic_smp_call_function_single_interrupt+0xce/0x120
[ 1044.042826] [<ffffffff81022bc7>] smp_call_function_single_interrupt+0x27/0x40
[ 1044.042831] [<ffffffff8168d50c>] call_function_single_interrupt+0x6c/0x80
[ 1044.042833] <EOI> [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042840] [<ffffffff8168b970>] ? _raw_spin_unlock_irq+0x30/0x70
[ 1044.042844] [<ffffffff8168b976>] ? _raw_spin_unlock_irq+0x36/0x70
[ 1044.042848] [<ffffffff810702e2>] run_hrtimer_softirq+0xc2/0x200
[ 1044.042853] [<ffffffff811275b0>] ? perf_event_overflow+0x20/0x20
[ 1044.042857] [<ffffffff81045265>] __do_softirq_common+0xf5/0x3a0
[ 1044.042862] [<ffffffff81045c3d>] __thread_do_softirq+0x15d/0x200
[ 1044.042865] [<ffffffff81045dda>] run_ksoftirqd+0xfa/0x210
[ 1044.042869] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042873] [<ffffffff81045ce0>] ? __thread_do_softirq+0x200/0x200
[ 1044.042877] [<ffffffff8106b596>] kthread+0xb6/0xc0
[ 1044.042881] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042886] [<ffffffff8168d994>] kernel_thread_helper+0x4/0x10
[ 1044.042889] [<ffffffff8107d98c>] ? finish_task_switch+0x8c/0x110
[ 1044.042894] [<ffffffff8168b97b>] ? _raw_spin_unlock_irq+0x3b/0x70
[ 1044.042897] [<ffffffff8168bd5d>] ? retint_restore_args+0xe/0xe
[ 1044.042900] [<ffffffff8106b4e0>] ? kthreadd+0x1e0/0x1e0
[ 1044.042902] [<ffffffff8168d990>] ? gs_change+0xb/0xb

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1341476476-5666-1-git-send-email-yong.zhang0@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fs, jbd: pull your plug when waiting for space  (Mike Galbraith)
With an -rt kernel, and a heavy sync IO load, tasks can jam up on journal locks without unplugging, which can lead to terminal IO starvation. Unplug and schedule when waiting for space. Signed-off-by: Mike Galbraith <mgalbraith@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Theodore Tso <tytso@mit.edu> Link: http://lkml.kernel.org/r/1341812414.7370.73.camel@marge.simpson.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  slab: Prevent local lock deadlock  (Thomas Gleixner)
On RT we avoid the cross-CPU function calls and take the per-CPU local locks instead. The code missed that taking the local lock on the CPU which runs the code must use the proper local_lock functions and not a simple spin_lock(). Otherwise it deadlocks later when trying to acquire the local lock with the proper function. Reported-and-tested-by: Chris Pringle <chris.pringle@miranda.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
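A minimal sketch of the pattern this fix describes, using the names visible in the lockdep report of the "mm: slab: Fix potential deadlock" entry above; this is an illustration, not the literal patch:

	static void lock_slab_on(unsigned int cpu)
	{
		if (cpu == smp_processor_id())
			local_lock_irq(slab_lock);			/* keeps the local_lock internals in sync */
		else
			spin_lock_irq(&per_cpu(slab_lock, cpu).lock);	/* remote CPU only */
	}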
2012-10-26  Latency histograms: Detect another yet overlooked sharedprio condition  (Carsten Emde)
While waiting for an RT process to be woken up, the previous process may go to wait and switch to another one with the same priority, which then becomes current. This condition was not correctly recognized and led to erroneously high latency recordings during periods of low CPU load. This patch correctly marks such latencies as sharedprio and prevents them from being recorded as actual system latency. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  Disable RT_GROUP_SCHED in PREEMPT_RT_FULL  (Carsten Emde)
Strange CPU stalls have been observed in RT when RT_GROUP_SCHED was configured. Disable it for now. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  Latency histograms: Adjust timer, if already elapsed when programmed  (Carsten Emde)
Nothing prevents a programmer from calling clock_nanosleep() with an already elapsed wakeup time in absolute time mode or with a too small delay in relative time mode. Such timers cannot wake up in time and, thus, should be corrected when entered into the missed timers latency histogram (CONFIG_MISSED_TIMERS_HIST). This patch marks such timers and uses a corrected expiration time. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  Latency histograms: Cope with backwards running local trace clock  (Carsten Emde)
Thanks to the wonders of modern technology, the local trace clock can now run backwards. Since this never happened before, the time difference between now and somewhat earlier was expected never to become negative and, thus, was stored in an unsigned integer variable. Nowadays, we need a signed integer to ensure that a negative value is stored as an underflow in the related histogram. (In cases where this is not a malfunction, bipolar histograms can be used.)

This patch takes care that all latency variables are represented as signed integers, and negative numbers are treated as histogram underflows.

On one of the misbehaving processors, switching to the global clock solved the problem:

	echo global >/sys/kernel/debug/tracing/trace_clock

Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  mips-remove-smp-reserve-lock.patch  (Thomas Gleixner)
Instead of making the lock raw, remove it as it protects nothing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  net, RT: Remove preemption disabling in netif_rx()  (Priyanka Jain)
1) enqueue_to_backlog() (called from netif_rx()) should be bound to a particular CPU. This can be achieved by disabling migration; there is no need to disable preemption.

2) Fixes the crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context, and if the backlog exceeds its count, kfree_skb() is called. But on RT, kfree_skb() might get scheduled out, so it expects a non-atomic context.

3) When CONFIG_PREEMPT_RT_FULL is not defined, migrate_enable() and migrate_disable() map to preempt_enable() and preempt_disable(), so there is no change in functionality for non-RT.

- Replace preempt_enable() and preempt_disable() with migrate_enable() and migrate_disable() respectively
- Replace get_cpu() and put_cpu() with get_cpu_light() and put_cpu_light() respectively (as sketched below)

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com> Acked-by: Rajan Srivastava <Rajan.Srivastava@freescale.com> Cc: <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
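A sketch of the enqueue path after the substitutions listed above (heavily simplified; not the full netif_rx()):

	unsigned int qtail;
	int ret;

	ret = enqueue_to_backlog(skb, get_cpu_light(), &qtail);	/* was: get_cpu() */
	put_cpu_light();						/* was: put_cpu() */

On non-RT, get_cpu_light()/put_cpu_light() fall back to get_cpu()/put_cpu(), so behaviour there is unchanged (per item 3 above, the same holds for migrate_disable()/migrate_enable()).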
2012-10-26  scsi: qla2xxx: Use local_irq_save_nort() in qla2x00_poll  (John Kacur)
RT triggers the following:

[ 11.307652] [<ffffffff81077b27>] __might_sleep+0xe7/0x110
[ 11.307663] [<ffffffff8150e524>] rt_spin_lock+0x24/0x60
[ 11.307670] [<ffffffff8150da78>] ? rt_spin_lock_slowunlock+0x78/0x90
[ 11.307703] [<ffffffffa0272d83>] qla24xx_intr_handler+0x63/0x2d0 [qla2xxx]
[ 11.307736] [<ffffffffa0262307>] qla2x00_poll+0x67/0x90 [qla2xxx]

Function qla2x00_poll does local_irq_save() before calling qla24xx_intr_handler, which takes a spinlock. Since spinlocks are sleepable on RT, it is not allowed to take them with interrupts disabled. Therefore we use local_irq_save_nort() instead, which saves flags without disabling interrupts.

This fix needs to be applied to v3.0-rt, v3.2-rt and v3.4-rt. Suggested-by: Thomas Gleixner Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: David Sommerseth <davids@redhat.com> Link: http://lkml.kernel.org/r/1335523726-10024-1-git-send-email-jkacur@redhat.com Cc: stable-rt@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
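A sketch of the change in qla2x00_poll() as described (call site simplified for illustration):

	unsigned long flags;

	local_irq_save_nort(flags);	/* was: local_irq_save(flags) */
	qla24xx_intr_handler(0, rsp);	/* takes a spinlock, which is a sleeping lock on RT */
	local_irq_restore_nort(flags);	/* was: local_irq_restore(flags) */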
2012-10-26  rt: Make migrate_disable/enable() and __rt_mutex_init non-GPL only  (Steven Rostedt)
Modules that load on the normal vanilla kernel should also load on an -rt kernel as well. This does not mean we condone non-GPL modules; we are only being consistent. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  printk: Disable migration instead of preemption  (Richard Weinberger)
There is no need to disable preemption in vprintk(); migrate_disable() is sufficient. This fixes the following bug in -rt:

[ 14.759233] BUG: sleeping function called from invalid context at /home/rw/linux-rt/kernel/rtmutex.c:645
[ 14.759235] in_atomic(): 1, irqs_disabled(): 0, pid: 547, name: bash
[ 14.759244] Pid: 547, comm: bash Not tainted 3.0.12-rt29+ #3
[ 14.759246] Call Trace:
[ 14.759301] [<ffffffff8106fade>] __might_sleep+0xeb/0xf0
[ 14.759318] [<ffffffff810ad784>] rt_spin_lock_fastlock.constprop.9+0x21/0x43
[ 14.759336] [<ffffffff8161fef0>] rt_spin_lock+0xe/0x10
[ 14.759354] [<ffffffff81347ad1>] serial8250_console_write+0x81/0x121
[ 14.759366] [<ffffffff8107ecd3>] __call_console_drivers+0x7c/0x93
[ 14.759369] [<ffffffff8107ef31>] _call_console_drivers+0x5c/0x60
[ 14.759372] [<ffffffff8107f7e5>] console_unlock+0x147/0x1a2
[ 14.759374] [<ffffffff8107fd33>] vprintk+0x3ea/0x462
[ 14.759383] [<ffffffff816160e0>] printk+0x51/0x53
[ 14.759399] [<ffffffff811974e4>] ? proc_reg_poll+0x9a/0x9a
[ 14.759403] [<ffffffff81335b42>] __handle_sysrq+0x50/0x14d
[ 14.759406] [<ffffffff81335c8a>] write_sysrq_trigger+0x4b/0x53
[ 14.759408] [<ffffffff81335c3f>] ? __handle_sysrq+0x14d/0x14d
[ 14.759410] [<ffffffff81197583>] proc_reg_write+0x9f/0xbe
[ 14.759426] [<ffffffff811497ec>] vfs_write+0xac/0xf3
[ 14.759429] [<ffffffff8114a9b3>] ? fget_light+0x3a/0x9b
[ 14.759431] [<ffffffff811499db>] sys_write+0x4a/0x6e
[ 14.759438] [<ffffffff81625d52>] system_call_fastpath+0x16/0x1b

Signed-off-by: Richard Weinberger <rw@linutronix.de> Link: http://lkml.kernel.org/r/1323696956-11445-1-git-send-email-rw@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
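A sketch of the substitution in vprintk() as described (everything around it omitted):

	migrate_disable();		/* was: preempt_disable(); stay on this CPU, preemption allowed */
	this_cpu = smp_processor_id();
	/* ... log buffer handling, console_trylock_for_printk() ... */
	migrate_enable();		/* was: preempt_enable() */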
2012-10-26  Revert "kprobes: adjust "fix a memory leak in function pre_handler_kretprobe()""  (Steven Rostedt)
This reverts commit b8a0040ef7112439ad2efac6f1a79aa842b5924f.

As pointed out by John Kacur, the patch breaks 3.0-rt. Because rt pulls in 7b8d0e5, the above fix, which comes from v3.0.24, should not be applied: the kretprobe lock is a raw_spinlock_t for real-time. Before the revert we get the following compile warnings:

	kernel/kprobes.c:1664: warning: passing argument 1 of 'rt_spin_lock' from incompatible pointer type
	kernel/kprobes.c:1666: warning: passing argument 1 of 'rt_spin_unlock' from incompatible pointer type

Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  net: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fs: namespace: Use cpu_chill() instead of cpu_relax()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fs: dcache: Use cpu_chill() in trylock loops  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
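A sketch of the converted retry pattern, with the dcache trylock loop as the example (simplified); cpu_chill() itself is introduced by the "rt: Introduce cpu_chill()" entry below:

	while (!spin_trylock(&dentry->d_lock))
		cpu_chill();	/* was: cpu_relax(); on RT this sleeps a tick so the lock holder can run */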
2012-10-26  rt: Introduce cpu_chill()  (Thomas Gleixner)
Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
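A minimal sketch of cpu_chill() as described in this entry; the msleep(1) used for the RT case is an assumption for illustration:

	#ifdef CONFIG_PREEMPT_RT_FULL
	# define cpu_chill()	msleep(1)	/* sleep a tick; lets a preempted updater run */
	#else
	# define cpu_chill()	cpu_relax()
	#endif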
2012-10-26  softirq: Check preemption after reenabling interrupts  (Thomas Gleixner)
raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the raise_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde <cbe@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
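A sketch of the resulting pattern; the helper name below is an assumption modeled on the RT series, shown only to make the idea concrete:

	raise_softirq_irqoff(nr);
	local_irq_restore(flags);
	preempt_check_resched_rt();	/* hypothetical helper: no-op on non-RT, resched check on RT */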
2012-10-26  cpu: Make hotplug.lock a "sleeping" spinlock on RT  (Steven Rostedt)
Tasks can block on hotplug.lock in pin_current_cpu(), but their state might be != RUNNING. So the mutex wakeup will set the state unconditionally to RUNNING. That might cause spurious unexpected wakeups. We could provide a state preserving mutex_lock() function, but this is semantically backwards. So instead we convert the hotplug.lock() to a spinlock for RT, which has the state preserving semantics already. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/1330702617.25686.265.camel@gandalf.stny.rr.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  lglock/rt: Use non-rt for_each_cpu() in -rt code  (Steven Rostedt)
Currently the RT version of the lglocks() does a for_each_online_cpu() in the name##_global_lock_online() functions. Non-rt uses its own mask for this, and for good reason.

A task may grab a *_global_lock_online(), and in the meantime one of the CPUs goes offline. Now when that task does a *_global_unlock_online() it releases all the locks *except* the one that went offline. Now if that CPU were to come back online, its lock is owned by a task that never released it when it should have. This causes all sorts of fun errors. Like owners of a lock no longer existing, or sleeping on IO, waiting to be woken up by a task that happens to be blocked on the lock it never released.

Convert the RT versions to use the lglock-specific cpumasks. Once a CPU comes online, the mask is set, and never cleared even when the CPU goes offline. The locks for that CPU will still be taken and released.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/20120301190345.374756214@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  sched/rt: Fix wait_task_inactive() to test rt_spin_lock state  (Steven Rostedt)
wait_task_inactive() has a task sleep waiting for another task to reach a certain state. But it ignores the rt_spin_lock state and can return with an incorrect result if the task it is waiting for is blocked on an rt_spin_lock() and is waking up. The rt_spin_locks save the task's state in the saved_state field, and wait_task_inactive() must also test that state. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/20120301190345.979435764@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
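A sketch of the additional test described above, inside wait_task_inactive() (surrounding code omitted; saved_state is the RT patch's field for the state preserved across rt_spin_lock sleeps):

	if (match_state && unlikely(p->state != match_state) &&
	    unlikely(p->saved_state != match_state))
		return 0;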
2012-10-26  ring-buffer/rt: Check for irqs disabled before grabbing reader lock  (Steven Rostedt)
In RT the reader lock is a mutex and we can not grab it when preemption is disabled. The in_atomic() check that is there does not check if irqs are disabled. Add that check as well. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/20120301190345.786365803@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  futex/rt: Fix possible lockup when taking pi_lock in proxy handler  (Steven Rostedt)
When taking the pi_lock, we must disable interrupts because the pi_lock can also be taken in an interrupt handler. Use raw_spin_lock_irq() instead of raw_spin_lock(). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/20120301190345.165160680@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
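The change amounts to the following substitution in the proxy-lock path (sketch only; surrounding code omitted):

	raw_spin_lock_irq(&task->pi_lock);	/* was: raw_spin_lock(&task->pi_lock) */
	/* ... queue the waiter ... */
	raw_spin_unlock_irq(&task->pi_lock);	/* was: raw_spin_unlock(&task->pi_lock) */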
2012-10-26  timer: Fix hotplug for -rt  (Steven Rostedt)
Revert the RT patch:

	Author: Ingo Molnar <mingo@elte.hu>
	Date:   Fri Jul 3 08:30:32 2009 -0500

	    timers: fix timer hotplug on -rt

	    Here we are in the CPU_DEAD notifier, and we must not sleep nor
	    enable interrupts.

There's no problem with sleeping in this notifier. But the get_cpu_var() had to be converted to a get_local_var(). Replace the previous fix with the get_local_var() conversion. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Carsten Emde <C.Emde@osadl.org> Cc: John Kacur <jkacur@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Clark Williams <clark.williams@gmail.com> Cc: stable-rt@vger.kernel.org Link: http://lkml.kernel.org/r/20120301190344.948157137@goodmis.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2012-10-26  net: u64_stat: Protect seqcount  (Thomas Gleixner)
On RT we must prevent that the writer gets preempted inside the write section. Otherwise a preempting reader might spin forever. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
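A sketch of the idea for the 32-bit SMP case; preempt_disable_rt()/preempt_enable_rt() are assumed RT helpers that are no-ops on non-RT:

	static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
	{
	#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
		preempt_disable_rt();			/* writer must not be preempted inside the section */
		write_seqcount_begin(&syncp->seq);
	#endif
	}

	static inline void u64_stats_update_end(struct u64_stats_sync *syncp)
	{
	#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
		write_seqcount_end(&syncp->seq);
		preempt_enable_rt();
	#endif
	}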
2012-10-26  fs: Protect open coded isize seqcount  (Thomas Gleixner)
A writer might be preempted in the write side critical section on RT. Disable preemption to avoid endless spinning of a preempting reader. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  seqlock: Prevent rt starvation  (Thomas Gleixner)
If a low prio writer gets preempted while holding the seqlock write locked, a high prio reader spins forever on RT. To prevent this let the reader grab the spinlock, so it blocks and eventually boosts the writer. This way the writer can proceed and endless spinning is prevented. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
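A sketch of the reader-side idea (not the literal patch; field names assume the "seqlock: Use seqcount" change further down this log): if the sequence is odd, a writer is inside the critical section, so block on its lock, which PI-boosts a preempted writer on RT, then retry instead of spinning:

	static inline unsigned rt_read_seqbegin(seqlock_t *sl)
	{
		unsigned seq;

		for (;;) {
			seq = ACCESS_ONCE(sl->seqcount.sequence);
			if (likely(!(seq & 1)))
				break;
			spin_lock(&sl->lock);	/* blocks and boosts the writer */
			spin_unlock(&sl->lock);
		}
		smp_rmb();
		return seq;
	}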
2012-10-26  timekeeping: Split xtime_lock  (Thomas Gleixner)
xtime_lock is going to be split apart in mainline, so we can shorten the seqcount protected regions and avoid updating the seqcount in some code paths. This is a straightforward split, so we can avoid the whole mess with raw seqlocks for RT. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fs: dentry use seqlock  (Thomas Gleixner)
Replace the open coded seqlock with a real seqlock, so RT can handle it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  fs: fs_struct use seqlock  (Thomas Gleixner)
Replace the open coded seqlock with a real one, so RT can handle it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  seqlock: Provide seq_spin_* functions  (Thomas Gleixner)
In some cases it's desirable to lock the seqlock w/o changing the seqcount. Provide functions for this, so we can avoid open coded constructs. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  seqlock: Use seqcount  (Thomas Gleixner)
No point in having different implementations for the same thing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  seqlock: Remove unused functions  (Thomas Gleixner)
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  ia64: vsyscall: Use seqcount instead of seqlock  (Thomas Gleixner)
The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  x86: vdso: Use seqcount instead of seqlock  (Thomas Gleixner)
The update of the vdso data happens under xtime_lock, so adding a nested lock is pointless. Just use a seqcount to sync the readers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
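A sketch of the conversion on the x86 side (struct and field names simplified): the writer is already serialized by xtime_lock, so a bare seqcount is enough, and vdso readers simply retry on a concurrent update:

	/* update_vsyscall(), called under xtime_lock: */
	write_seqcount_begin(&gtod->seq);
	/* ... copy timekeeper data into the vsyscall page ... */
	write_seqcount_end(&gtod->seq);

	/* vdso reader: */
	do {
		seq = read_seqcount_begin(&gtod->seq);
		/* ... snapshot the data ... */
	} while (unlikely(read_seqcount_retry(&gtod->seq, seq)));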
2012-10-26  x86: vdso: Remove bogus locking in update_vsyscall_tz()  (Thomas Gleixner)
Changing the sequence count in update_vsyscall_tz() is completely pointless. The vdso code copies the data unprotected. There is no point to change this as sys_tz is nowhere protected at all. See sys_gettimeofday(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable-rt@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2012-10-26  x86-64-emulate-legacy-vsyscalls  (Andy Lutomirski)
commit 5cec93c216db77c45f7ce970d46283bcb1933884
Author: Andy Lutomirski <luto@MIT.EDU>
Date:   Sun Jun 5 13:50:24 2011 -0400

    x86-64: Emulate legacy vsyscalls

    There's a fair amount of code in the vsyscall page. It contains a syscall instruction (in the gettimeofday fallback) and who knows what will happen if an exploit jumps into the middle of some other code. Reduce the risk by replacing the vsyscalls with short magic incantations that cause the kernel to emulate the real vsyscalls. These incantations are useless if entered in the middle.

    This causes vsyscalls to be a little more expensive than real syscalls. Fortunately sensible programs don't use them. The only exception is time() which is still called by glibc through the vsyscall - but calling time() millions of times per second is not sensible. glibc has this fixed in the development tree.

    This patch is not perfect: the vread_tsc and vread_hpet functions are still at a fixed address. Fixing that might involve making alternative patching work in the vDSO.

    Signed-off-by: Andy Lutomirski <luto@mit.edu>
    Link: http://lkml.kernel.org/r/e64e1b3c64858820d12c48fa739efbd1485e79d5.1307292171.git.luto@mit.edu

backport by: Signed-off-by: Steven Rostedt <rostedt@goodmis.org>