Age  Commit message  Author
2013-05-02  Linux 3.6.11.2-rt34 (tag: v3.6.11.2-rt34)  Steven Rostedt (Red Hat)
2013-05-02  swap: Use unique local lock name for swap_lock  Steven Rostedt
From lib/Kconfig.debug on CONFIG_FORCE_WEAK_PER_CPU:
----
s390 and alpha require percpu variables in modules to be defined weak to work around addressing range issue which puts the following two restrictions on percpu variable definitions.
    1. percpu symbols must be unique whether static or not
    2. percpu variables can't be defined inside a function
To ensure that generic code follows the above rules, this option forces all percpu variables to be defined as weak.
----
The addition of the local IRQ swap_lock in mm/swap.c broke this config, because the name "swap_lock" is already used throughout the kernel (just do a "git grep swap_lock" to see), and the new swap_lock is a local lock, which defines a per-cpu swap_lock symbol. The fix is to rename the new lock to swapvec_lock, which keeps it unique.
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
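A minimal sketch of the rename described above, assuming the -rt locallock API (DEFINE_LOCAL_IRQ_LOCK from include/linux/locallock.h); the surrounding mm/swap.c code is omitted:

    #include <linux/locallock.h>

    /* Before (clashed with the global swap_lock defined elsewhere in the kernel):
     *     static DEFINE_LOCAL_IRQ_LOCK(swap_lock);
     * After: a unique per-cpu symbol, which keeps CONFIG_FORCE_WEAK_PER_CPU happy.
     */
    static DEFINE_LOCAL_IRQ_LOCK(swapvec_lock);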
2013-05-02  powerpc/64bit,PREEMPT_RT: Check preempt_count before preempting  Priyanka Jain
In ret_from_except_lite() with CONFIG_PREEMPT enabled, add the missing check that compares the value of preempt_count with zero before continuing with the preemption of the current task. If preempt_count is non-zero, restore the registers and return; otherwise continue with the preemption.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-05-02  x86/mce: Defer mce wakeups to threads for PREEMPT_RT  Steven Rostedt
We had a customer report a lockup on a 3.0-rt kernel that had the following backtrace:
    [ffff88107fca3e80] rt_spin_lock_slowlock at ffffffff81499113
    [ffff88107fca3f40] rt_spin_lock at ffffffff81499a56
    [ffff88107fca3f50] __wake_up at ffffffff81043379
    [ffff88107fca3f80] mce_notify_irq at ffffffff81017328
    [ffff88107fca3f90] intel_threshold_interrupt at ffffffff81019508
    [ffff88107fca3fa0] smp_threshold_interrupt at ffffffff81019fc1
    [ffff88107fca3fb0] threshold_interrupt at ffffffff814a1853
It actually bugged because the lock was taken by the same owner that already held that lock. What happened was that the thread which was setting itself on a wait queue had the lock when an MCE triggered. The MCE interrupt does a wake up on its wait list and grabs the same lock.
NOTE: THIS IS NOT A BUG ON MAINLINE
Sorry for yelling, but as I Cc'd mainline maintainers I want them to know that this is a PREEMPT_RT bug only. I only Cc'd them for advice.
On PREEMPT_RT the wait queue locks are converted from normal "spin_locks" into an rt_mutex (see the rt_spin_lock_slowlock above). These are not to be taken by hard interrupt context. This usually isn't a problem, as almost all interrupts in PREEMPT_RT are converted into schedulable threads. Unfortunately that's not the case with the MCE irq.
As wait queue locks are notorious for long hold times, we cannot convert them to raw_spin_locks without causing issues with -rt. Thomas has created a "simple-wait" structure that uses raw spin locks, which may have been a good fit. Unfortunately, wait queues are not the only issue: mce_notify_irq also does a schedule_work(), which grabs the workqueue spin locks that have the exact same issue.
Thus, the patch I'm proposing moves the actual work of the MCE interrupt into a helper thread that gets woken up by the MCE interrupt and does the work in a schedulable context.
NOTE: THIS PATCH ONLY CHANGES THE BEHAVIOR WHEN PREEMPT_RT IS SET
Oops, sorry for yelling again, but I want to stress that I keep the same behavior as mainline when PREEMPT_RT is not set. Thus, this only changes the MCE behavior when PREEMPT_RT is configured.
Cc: stable-rt@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bigeasy@linutronix: make mce_notify_work() a proper prototype, use kthread_run()]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
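A hedged sketch of the deferral pattern described above; the helper-thread name and wiring are illustrative, not the exact -rt patch (which starts its helper with kthread_run() as noted in the sign-off):

    #include <linux/kthread.h>
    #include <linux/sched.h>

    static struct task_struct *mce_notify_helper;   /* illustrative name */

    static void mce_notify_work(void)
    {
        /* In the real patch this runs the sleeping parts of the MCE
         * notification: the __wake_up() on the mce wait queue and the
         * schedule_work(), both of which may sleep on -rt. */
    }

    static int mce_notify_helper_fn(void *unused)
    {
        while (!kthread_should_stop()) {
            set_current_state(TASK_INTERRUPTIBLE);
            schedule();
            if (kthread_should_stop())
                break;
            mce_notify_work();
        }
        return 0;
    }

    /* Hard-irq context on PREEMPT_RT: take no sleeping locks, just kick the thread. */
    static void mce_notify_from_irq(void)
    {
        if (mce_notify_helper)
            wake_up_process(mce_notify_helper);
    }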
2013-05-02  tcp: force a dst refcount when prequeue packet  Eric Dumazet
Before escaping the RCU protected section and adding the packet to the prequeue, make sure the dst is refcounted.
Cc: stable-rt@vger.kernel.org
Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
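A hedged sketch of the rule above, not the exact tcp_prequeue() diff; the helper name is illustrative:

    #include <linux/skbuff.h>
    #include <net/dst.h>

    /* Take a real reference on the skb's dst before the skb outlives the
     * RCU read-side section that made the noref dst safe to use. */
    static void prequeue_skb(struct sk_buff *skb)
    {
        skb_dst_force(skb);     /* converts a noref dst entry into a refcounted one */
        /* ...the real code then queues the skb on tp->ucopy.prequeue... */
    }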
2013-04-22  Linux 3.6.11.2-rt33 (tag: v3.6.11.2-rt33)  Steven Rostedt (Red Hat)
2013-04-22  Merge tag 'v3.6.11.2' into v3.6-rt  Steven Rostedt (Red Hat)
Linux 3.6.11.2
2013-04-12  Linux 3.6.11.2 (tag: v3.6.11.2)  Steven Rostedt (Red Hat)
2013-04-12  sfc: Reset driver's MAC stats after MC reboot seen  Ben Hutchings
[ Upstream commit 876be083b669c43203c0ee8709d749896e1d8d60 ] If the MC reboots then the stats it reports to us will have been reset. We need to reset ours to get efx_update_diff_stat() working properly. (Ideally we would maintain stats across the reboot, but as this should only happen immediately after a firmware upgrade it's not really worth the trouble.) Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Fix timekeeping in efx_mcdi_poll()  Ben Hutchings
[ Upstream commit ebf98e797b4e26ad52ace1511a0b503ee60a6cd4 ]
efx_mcdi_poll() uses get_seconds() to read the current time and to implement a polling timeout. The use of this function was chosen partly because it could easily be replaced in a co-sim environment with a macro that read the simulated time.
Unfortunately the real get_seconds() returns the system time (real time), which is subject to adjustment by e.g. ntpd. If the system time is adjusted forward during a polled MCDI operation, the effective timeout can be shorter than the intended 10 seconds, resulting in a spurious failure. It is also possible for a backward adjustment to delay detection of a real failure.
Use jiffies instead, and change MCDI_RPC_TIMEOUT to be denominated in jiffies. Also correct rounding of the timeout: check time > finish (or rather time_after(time, finish)) and not time >= finish.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
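A generic sketch of the jiffies-based timeout described above; the poll helper and its body are placeholders, not the sfc code:

    #include <linux/jiffies.h>
    #include <linux/delay.h>
    #include <linux/errno.h>

    #define MCDI_RPC_TIMEOUT (10 * HZ)      /* now denominated in jiffies */

    static int poll_until_done(bool (*done)(void))
    {
        unsigned long finish = jiffies + MCDI_RPC_TIMEOUT;

        while (!done()) {
            /* time_after(), not ">=": fail only once the deadline has passed,
             * and jiffies is immune to wall-clock adjustments by ntpd. */
            if (time_after(jiffies, finish))
                return -ETIMEDOUT;
            udelay(10);
        }
        return 0;
    }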
2013-04-11  sfc: Correctly initialise reset_method in siena_test_chip()  Ben Hutchings
[ Upstream commit ef492f11efed9a6a1686bf914fb74468df59385c ] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Work-around flush timeout when flushes have completed  Daniel Pieczko
[ Upstream commit 525d9e824018cd7cc8d8d44832ddcd363abfe6e1 ] We sometimes hit a "failed to flush" timeout on some TX queues, but the flushes have completed and the flush completion events seem to go missing. In this case, we can check the TX_DESC_PTR_TBL register and drain the queues if the flushes had finished. [bwh: Minor fixes to coding style] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Avoid generating over-length MC_CMD_FLUSH_RX_QUEUES request  Ben Hutchings
[ Upstream commit 450783747f42dfa3883920acfad4acdd93ce69af ] MCDI supports requests up to 252 bytes long, which is only enough to pass 63 RX queue IDs to MC_CMD_FLUSH_RX_QUEUES. However a VF may have up to 64 RX queues, and if we try to flush them all we will generate an over-length request and BUG() in efx_mcdi_copyin(). Currently all VF drivers limit themselves to 32 RX queues, so reducing the limit to 63 does no harm. Also add a BUILD_BUG_ON in efx_mcdi_flush_rxqs() so we remember to deal with the same problem there if EFX_MAX_CHANNELS is increased. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Fix MCDI structure field lookup  Ben Hutchings
[ Upstream commit 0a6e5008a9df678b48f8d4e57601aa4270df6c14 ] The least significant bit number (LBN) of a field within an MCDI structure is counted from the start of the structure, not the containing dword. In MCDI_ARRAY_FIELD() we need to mask it rather than using the usual EFX_DWORD_FIELD() macro. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Add parentheses around use of bitfield macro arguments  Ben Hutchings
[ Upstream commit 9724a8504c875145f5a513bb8eca50671cee23b4 ] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
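A generic illustration of why macro arguments need parentheses (these are not the sfc macros themselves): without them, operator precedence in the caller's expression leaks into the expansion.

    #define FIELD_SHIFT_BAD(x, n)   (x << n)
    #define FIELD_SHIFT_GOOD(x, n)  ((x) << (n))

    /* FIELD_SHIFT_BAD(a | b, 4)  expands to  (a | b << 4)   -- wrong      */
    /* FIELD_SHIFT_GOOD(a | b, 4) expands to  ((a | b) << 4) -- intended   */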
2013-04-11  sfc: Convert firmware subtypes to native byte order in efx_mcdi_get_board_cfg()  Ben Hutchings
[ Upstream commit bfeed902946a31692e7a24ed355b6d13ac37d014 ] On big-endian systems the MTD partition names currently have mangled subtype numbers and are not recognised by the firmware update tool (sfupdate). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Really disable flow control while flushing  Ben Hutchings
[ Upstream commit d5e8cc6c946e0857826dcfbb3585068858445bfe ] Receiving pause frames can block TX queue flushes. Earlier changes work around this by reconfiguring the MAC during flushes for VFs, but during flushes for the PF we would only change the fc_disable counter. Unless the MAC is reconfigured for some other reason during the flush (which I would not expect to happen) this had no effect at all. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sfc: Disable soft interrupt handling during efx_device_detach_sync()  Ben Hutchings
[ Upstream commit 35205b211c8d17a8a0b5e8926cb7c73e9a7ef1ad ] efx_device_detach_sync() locks all TX queues before marking the device detached and thus disabling further TX scheduling. But it can still be interrupted by TX completions which then result in TX scheduling in soft interrupt context. This will deadlock when it tries to acquire a TX queue lock that efx_device_detach_sync() already acquired. To avoid deadlock, we must use netif_tx_{,un}lock_bh(). Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
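A hedged sketch of the locking change described above, with the efx-specific details omitted; the function name is illustrative:

    #include <linux/netdevice.h>

    /* Taking the TX locks with BH disabled keeps a TX-completion softirq on
     * this CPU from running and trying to re-acquire a queue lock we hold. */
    static void device_detach_sync(struct net_device *dev)
    {
        netif_tx_lock_bh(dev);      /* was netif_tx_lock(): softirqs stayed enabled */
        netif_device_detach(dev);   /* marks the device detached; stops TX scheduling */
        netif_tx_unlock_bh(dev);
    }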
2013-04-11  bonding: get netdev_rx_handler_unregister out of locks  Veaceslav Falico
[ Upstream commit fcd99434fb5c137274d2e15dd2a6a7455f0f29ff ]
Now that netdev_rx_handler_unregister contains synchronize_net(), we need to call it outside of bond->lock, because it might sleep. Also, remove the now unneeded synchronize_net().
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  smsc75xx: fix jumbo frame support  Steve Glendinning
[ Upstream commit 4c51e53689569398d656e631c17308d9b8e84650 ]
This patch enables RX of jumbo frames for LAN7500. Previously the driver would transmit jumbo frames successfully but would drop received jumbo frames (incrementing the interface errors count). With this patch applied the device can successfully receive jumbo frames up to MTU 9000 (9014 bytes on the wire including the ethernet header).
Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  pch_gbe: fix ip_summed checksum reporting on rx  Veaceslav Falico
[ Upstream commit 76a0e68129d7d24eb995a6871ab47081bbfa0acc ] skb->ip_summed should be CHECKSUM_UNNECESSARY when the driver reports that checksums were correct and CHECKSUM_NONE in any other case. They're currently placed vice versa, which breaks the forwarding scenario. Fix it by placing them as described above. Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
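A hedged sketch of the rule the patch restores; the helper and its flag are illustrative, not the pch_gbe fields:

    #include <linux/skbuff.h>

    /* Only claim CHECKSUM_UNNECESSARY when the hardware actually verified
     * the checksum; otherwise the stack must verify it itself. */
    static void set_rx_csum(struct sk_buff *skb, bool hw_csum_ok)
    {
        if (hw_csum_ok)
            skb->ip_summed = CHECKSUM_UNNECESSARY;
        else
            skb->ip_summed = CHECKSUM_NONE;
    }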
2013-04-11  net: fq_codel: Fix off-by-one error  Vijay Subramanian
[ Upstream commit cd68ddd4c29ab523440299f24ff2417fe7a0dca6 ] Currently, we hold a max of sch->limit -1 number of packets instead of sch->limit packets. Fix this off-by-one error. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
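A generic illustration of the off-by-one, not the exact fq_codel diff: with "<" the packet that would make the queue exactly `limit` deep is rejected, so at most limit - 1 packets are ever held.

    static bool queue_accepts(unsigned int *qlen, unsigned int limit)
    {
        return ++(*qlen) <= limit;      /* the buggy version used "<" here */
    }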
2013-04-11  net: add a synchronize_net() in netdev_rx_handler_unregister()  Eric Dumazet
[ Upstream commit 00cfec37484761a44a3b6f4675a54caa618210ae ]
commit 35d48903e97819 (bonding: fix rx_handler locking) added a race in the bonding driver, reported by Steven Rostedt who did a very good diagnosis:
<quoting Steven>
I'm currently debugging a crash in an old 3.0-rt kernel that one of our customers is seeing. The bug happens with a stress test that loads and unloads the bonding module in a loop (I don't know all the details as I'm not the one that is directly interacting with the customer). But the bug looks to be something that may still be present and possibly present in mainline too. It will just be much harder to trigger it in mainline.
In -rt, interrupts are threads, and can schedule in and out just like any other thread. Note, mainline now supports interrupt threads so this may be easily reproducible in mainline as well. I don't have the ability to tell the customer to try mainline or other kernels, so my hands are somewhat tied to what I can do.
But according to a core dump, I tracked down that the eth irq thread crashed in bond_handle_frame() here:
    slave = bond_slave_get_rcu(skb->dev);
    bond = slave->bond; <--- BUG
The slave returned was NULL and accessing slave->bond caused a NULL pointer dereference.
Looking at the code that unregisters the handler:
    void netdev_rx_handler_unregister(struct net_device *dev)
    {
            ASSERT_RTNL();
            RCU_INIT_POINTER(dev->rx_handler, NULL);
            RCU_INIT_POINTER(dev->rx_handler_data, NULL);
    }
Which is basically:
    dev->rx_handler = NULL;
    dev->rx_handler_data = NULL;
And looking at __netif_receive_skb() we have:
    rx_handler = rcu_dereference(skb->dev->rx_handler);
    if (rx_handler) {
            if (pt_prev) {
                    ret = deliver_skb(skb, pt_prev, orig_dev);
                    pt_prev = NULL;
            }
            switch (rx_handler(&skb)) {
My question to all of you is, what stops this interrupt from happening while the bonding module is unloading? What happens if the interrupt triggers and we have this:
    CPU0                                    CPU1
    ----                                    ----
    rx_handler = skb->dev->rx_handler
                                            netdev_rx_handler_unregister() {
                                                dev->rx_handler = NULL;
                                                dev->rx_handler_data = NULL;
    rx_handler()
      bond_handle_frame() {
        slave = skb->dev->rx_handler;
        bond = slave->bond; <-- NULL pointer dereference!!!
What protection am I missing in the bond release handler that would prevent the above from happening?
</quoting Steven>
We can fix this bug in two ways. The first is adding a test in bond_handle_frame() and others to check if rx_handler_data is NULL. A second way is adding a synchronize_net() in netdev_rx_handler_unregister() to make sure that an rcu protected reader has the guarantee to see a non NULL rx_handler_data.
The second way is better as it avoids an extra test in the fast path.
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jiri Pirko <jpirko@redhat.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
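A sketch of the shape of the fix, following the commit's own reasoning (the exact upstream diff may differ in detail): a synchronize_net() after clearing rx_handler guarantees that any reader which saw a non-NULL rx_handler also sees a valid rx_handler_data.

    #include <linux/netdevice.h>
    #include <linux/rtnetlink.h>
    #include <linux/rcupdate.h>

    void netdev_rx_handler_unregister(struct net_device *dev)
    {
        ASSERT_RTNL();
        RCU_INIT_POINTER(dev->rx_handler, NULL);
        /* Wait for outstanding RCU readers (e.g. bond_handle_frame()) to
         * finish before rx_handler_data is cleared and may be freed. */
        synchronize_net();
        RCU_INIT_POINTER(dev->rx_handler_data, NULL);
    }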
2013-04-11  ks8851: Fix interpretation of rxlen field.  Max.Nekludov@us.elster.com
[ Upstream commit 14bc435ea54cb888409efb54fc6b76c13ef530e9 ]
According to the Datasheet (page 52):
    15-12  Reserved
    11-0   RXBC Receive Byte Count
           This field indicates the present received frame byte size.
The code has a bug:
    rxh = ks8851_rdreg32(ks, KS_RXFHSR);
    rxstat = rxh & 0xffff;
    rxlen = rxh >> 16; // BUG!!! 0xFFF mask should be applied
Signed-off-by: Max Nekludov <Max.Nekludov@us.elster.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
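A hedged sketch of the corrected interpretation; the helper name is illustrative, not the driver's code:

    #include <linux/types.h>

    /* Only the low 12 RXBC bits of the upper word carry the byte count;
     * bits 15-12 are reserved and must be masked off. */
    static u32 ks8851_rx_len(u32 rxfhsr)
    {
        return (rxfhsr >> 16) & 0xfff;  /* was: rxfhsr >> 16, keeping reserved bits */
    }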
2013-04-11  ipv6: don't accept node local multicast traffic from the wire  Hannes Frederic Sowa
[ Upstream commit 1c4a154e5253687c51123956dfcee9e9dfa8542d ] Erik Hugne's errata proposal (Errata ID: 3480) to RFC4291 has been verified: http://www.rfc-editor.org/errata_search.php?eid=3480 We have to check for pkt_type and loopback flag because either the packets are allowed to travel over the loopback interface (in which case pkt_type is PACKET_HOST and IFF_LOOPBACK flag is set) or they travel over a non-loopback interface back to us (in which case PACKET_TYPE is PACKET_LOOPBACK and IFF_LOOPBACK flag is not set). Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
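A hedged sketch of the two "locally generated" cases described above; the helper name is illustrative, not the ipv6 input-path code:

    #include <linux/skbuff.h>
    #include <linux/netdevice.h>
    #include <linux/if_packet.h>

    /* Either the packet travelled over the loopback device itself
     * (PACKET_HOST with IFF_LOOPBACK set), or it was looped back to us
     * over a non-loopback device (PACKET_LOOPBACK without IFF_LOOPBACK). */
    static bool came_from_local_host(const struct sk_buff *skb,
                                     const struct net_device *dev)
    {
        return (dev->flags & IFF_LOOPBACK) ||
               skb->pkt_type == PACKET_LOOPBACK;
    }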
2013-04-11  ipv6: don't accept multicast traffic with scope 0  Hannes Frederic Sowa
[ Upstream commit 20314092c1b41894d8c181bf9aa6f022be2416aa ] v2: a) moved before multicast source address check b) changed comment to netdev style Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  ipv6: fix bad free of addrconf_init_net  Hong Zhiguo
[ Upstream commit a79ca223e029aa4f09abb337accf1812c900a800 ] Signed-off-by: Hong Zhiguo <honkiko@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  DM9000B: driver initialization upgrade  Joseph CHANG
[ Upstream commit 6741f40d198c6a5feb23653a1efd4ca47f93d83d ]
Fix a bug for DM9000 revision B, which contains a DSP PHY. DM9000B uses a DSP PHY instead of the analog PHY of previous DM9000 revisions, so extra changes are needed in initialization: an explicit PHY reset and PHY init parameters, and the first DM9000_NCR reset done by dm9000_probe() needs the NCR_MAC_LBK bit. The following DM9000_NCR reset caused by dm9000_open() clears the NCR_MAC_LBK bit.
Without this fix, power-up FIFO pointer errors happen at around a 2% rate among Davicom's customers' boards. With this fix, all of the above cases are solved.
Signed-off-by: Joseph CHANG <josright123@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  atl1e: drop pci-msi support because of packet corruption  Hannes Frederic Sowa
[ Upstream commit 188ab1b105c96656f6bcfb49d0d8bb1b1936b632 ] Usage of pci-msi results in corrupted dma packet transfers to the host. Reported-by: rebelyouth <rebelyouth.hacklab@gmail.com> Cc: Huang, Xiong <xiong@qca.qualcomm.com> Tested-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  aoe: reserve enough headroom on skbs  Eric Dumazet
[ Upstream commit 91c5746425aed8f7188a351f1224a26aa232e4b3 ]
Some network drivers use a non-default hard_header_len. Transmitted skbs should take dev->hard_header_len into account, or risk crashes or expensive reallocations. In the case of aoe, let's reserve MAX_HEADER bytes.
David reported a crash in the defxx driver, solved by this patch.
Reported-by: David Oostdyk <daveo@ll.mit.edu>
Tested-by: David Oostdyk <daveo@ll.mit.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
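A hedged sketch of the headroom rule described above; aoe's real allocation path differs, and the helper name is illustrative:

    #include <linux/skbuff.h>
    #include <linux/netdevice.h>

    /* Reserve MAX_HEADER so any device's hard_header_len fits without a
     * reallocation when the link-layer header is pushed later. */
    static struct sk_buff *new_aoe_skb(unsigned int len)
    {
        struct sk_buff *skb = alloc_skb(len + MAX_HEADER, GFP_ATOMIC);

        if (skb)
            skb_reserve(skb, MAX_HEADER);
        return skb;
    }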
2013-04-11  drivers: net: ethernet: cpsw: use netif_wake_queue() while restarting tx queue  Mugunthan V N
[ Upstream commit b56d6b3fca6d1214dbc9c5655f26e5d4ec04afc8 ]
To restart the tx queue, use netif_wake_queue() instead of netif_start_queue() so that the net scheduler will restart transmission immediately, which increases network performance during huge data transfers.
Reported-by: Dan Franke <dan.franke@schneider-electric.com>
Suggested-by: Sriramakrishnan A G <srk@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  drivers: net: ethernet: davinci_emac: use netif_wake_queue() while restarting tx queue  Mugunthan V N
[ Upstream commit 7e51cde276ca820d526c6c21cf8147df595a36bf ]
To restart the tx queue, use netif_wake_queue() instead of netif_start_queue() so that the net scheduler will restart transmission immediately, which increases network performance during huge data transfers.
Reported-by: Dan Franke <dan.franke@schneider-electric.com>
Suggested-by: Sriramakrishnan A G <srk@ti.com>
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
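A generic sketch of the one-line change in both the cpsw and davinci_emac TX-completion paths; the helper name is illustrative:

    #include <linux/netdevice.h>

    /* netif_wake_queue() also reschedules the qdisc, so transmission resumes
     * immediately instead of waiting for the next external kick. */
    static void tx_completion_restart(struct net_device *dev)
    {
        if (netif_queue_stopped(dev))
            netif_wake_queue(dev);  /* was netif_start_queue(): only cleared the flag */
    }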
2013-04-11  bonding: fix disabling of arp_interval and miimon  nikolay@redhat.com
[ Upstream commit 1bc7db16782c2a581fb4d53ca853631050f31611 ] Currently if either arp_interval or miimon is disabled, they both get disabled, and upon disabling they get executed once more which is not the proper behaviour. Also when doing a no-op and disabling an already disabled one, the other again gets disabled. Also fix the error messages with the proper valid ranges, and a small typo fix in the up delay error message (outputting "down delay", instead of "up delay"). Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  bonding: fix miimon and arp_interval delayed work race conditions  nikolay@redhat.com
[ Upstream commit fbb0c41b814d497c656fc7be9e35456f139cb2fb ]
First I would give three observations which will be used later.
Observation 1:
    if (delayed_work_pending(wq))
            cancel_delayed_work(wq)
This usage is wrong because the pending bit is cleared just before the work's fn is executed and if the function re-arms itself we might end up with the work still running. It's safe to call cancel_delayed_work_sync() even if the work is not queued at all.
Observation 2: Use of INIT_DELAYED_WORK()
Work needs to be initialized only once prior to (de/en)queueing.
Observation 3: IFF_UP is set only after ndo_open is called
Related race conditions:
1. Race between bonding_store_miimon() and bonding_store_arp_interval(). Because of Obs.1 we can end up having both works enqueued.
2. Multiple races with INIT_DELAYED_WORK(). Since the works are not protected by anything between INIT_DELAYED_WORK() and calls to (en/de)queue, it is possible for races between the following functions (races are also possible between the calls to INIT_DELAYED_WORK() and workqueue code):
    bonding_store_miimon() - bonding_store_arp_interval(), bond_close(), bond_open(), enqueued functions
    bonding_store_arp_interval() - bonding_store_miimon(), bond_close(), bond_open(), enqueued functions
3. By Obs.1 we need to change bond_cancel_all().
Bugs 1 and 2 are fixed by moving all work initializations into bond_open, which, by Obs. 2 and Obs. 3 and the fact that we make sure that all works are cancelled in bond_close(), is guaranteed not to have any work enqueued. Also the RTNL lock is now acquired in bonding_store_miimon/arp_interval so they can't race with bond_close and bond_open. The opposing work is cancelled only if the IFF_UP flag is set and it is cancelled unconditionally. The opposing work is already cancelled if the interface is down so no need to cancel it again. This way we don't need new synchronizations for the bonding workqueue. These bugs (and fixes) are tied together and belong in the same patch. (See the generic sketch of Observation 1 after this entry.)
Note: I have left 1 line intentionally over 80 characters (84) because I didn't like how it looks broken down. If you'd prefer it otherwise, then simply break it.
v2: Make description text < 75 columns
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
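A generic sketch of Observation 1 above, with illustrative names: cancel_delayed_work_sync() may be called unconditionally and waits for a self-rearming work item, so the pending-check-then-cancel idiom is both unnecessary and racy.

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    static struct delayed_work mon_work;

    static void mon_fn(struct work_struct *work)
    {
        /* ... do the monitoring, then re-arm ... */
        schedule_delayed_work(&mon_work, HZ);
    }

    static void mon_start(void)
    {
        INIT_DELAYED_WORK(&mon_work, mon_fn);   /* initialize once, before any queueing */
        schedule_delayed_work(&mon_work, HZ);
    }

    static void mon_stop(void)
    {
        /* No delayed_work_pending() check, no plain cancel_delayed_work(): */
        cancel_delayed_work_sync(&mon_work);
    }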
2013-04-11  bonding: remove already created master sysfs link on failure  Veaceslav Falico
[ Upstream commit 9fe16b78ee17579cb4f333534cf7043e94c67024 ]
If the slave sysfs symlink fails to be created, we end up without removing the master sysfs symlink. Remove it in case of failure.
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  unix: fix a race condition in unix_release()  Paul Moore
[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ] As reported by Jan, and others over the past few years, there is a race condition caused by unix_release setting the sock->sk pointer to NULL before properly marking the socket as dead/orphaned. This can cause a problem with the LSM hook security_unix_may_send() if there is another socket attempting to write to this partially released socket in between when sock->sk is set to NULL and it is marked as dead/orphaned. This patch fixes this by only setting sock->sk to NULL after the socket has been marked as dead; I also take the opportunity to make unix_release_sock() a void function as it only ever returned 0/success. Dave, I think this one should go on the -stable pile. Special thanks to Jan for coming up with a reproducer for this problem. Reported-by: Jan Stancek <jan.stancek@gmail.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  genetlink: trigger BUG_ON if a group name is too long  Masatake YAMATO
[ Upstream commit f1e79e208076ffe7bad97158275f1c572c04f5c7 ] Trigger BUG_ON if a group name is longer than GENL_NAMSIZ. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  thermal: shorten too long mcast group name  Masatake YAMATO
[ Upstream commit 73214f5d9f33b79918b1f7babddd5c8af28dd23d ] The original name is too long. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  8021q: fix a potential use-after-free  Cong Wang
[ Upstream commit 4a7df340ed1bac190c124c1601bfc10cde9fb4fb ]
vlan_vid_del() could possibly free ->vlan_info after an RCU grace period; however, we may still refer to the freed memory area through the 'grp' pointer. Found by code inspection.
This patch moves the vlan_vid_del() calls as late as possible.
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  tcp: undo spurious timeout after SACK reneging  Yuchung Cheng
[ Upstream commit 7ebe183c6d444ef5587d803b64a1f4734b18c564 ] On SACK reneging the sender immediately retransmits and forces a timeout but disables Eifel (undo). If the (buggy) receiver does not drop any packet this can trigger a false slow-start retransmit storm driven by the ACKs of the original packets. This can be detected with undo and TCP timestamps. Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  tcp: preserve ACK clocking in TSO  Eric Dumazet
[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ]
A long standing problem with TSO is the fact that tcp_tso_should_defer() rearms the deferred timer, while it should not. Current code leads to the following bad bursty behavior:
    20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
    20:11:24.484337 IP B > A: . ack 263721 win 1117
    20:11:24.485086 IP B > A: . ack 265241 win 1117
    20:11:24.485925 IP B > A: . ack 266761 win 1117
    20:11:24.486759 IP B > A: . ack 268281 win 1117
    20:11:24.487594 IP B > A: . ack 269801 win 1117
    20:11:24.488430 IP B > A: . ack 271321 win 1117
    20:11:24.489267 IP B > A: . ack 272841 win 1117
    20:11:24.490104 IP B > A: . ack 274361 win 1117
    20:11:24.490939 IP B > A: . ack 275881 win 1117
    20:11:24.491775 IP B > A: . ack 277401 win 1117
    20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
    20:11:24.492620 IP B > A: . ack 278921 win 1117
    20:11:24.493448 IP B > A: . ack 280441 win 1117
    20:11:24.494286 IP B > A: . ack 281961 win 1117
    20:11:24.495122 IP B > A: . ack 283481 win 1117
    20:11:24.495958 IP B > A: . ack 285001 win 1117
    20:11:24.496791 IP B > A: . ack 286521 win 1117
    20:11:24.497628 IP B > A: . ack 288041 win 1117
    20:11:24.498459 IP B > A: . ack 289561 win 1117
    20:11:24.499296 IP B > A: . ack 291081 win 1117
    20:11:24.500133 IP B > A: . ack 292601 win 1117
    20:11:24.500970 IP B > A: . ack 294121 win 1117
    20:11:24.501388 IP B > A: . ack 295641 win 1117
    20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119
While the expected behavior is more like:
    20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
    20:19:49.260446 IP B > A: . ack 154281 win 1212
    20:19:49.261282 IP B > A: . ack 155801 win 1212
    20:19:49.262125 IP B > A: . ack 157321 win 1212
    20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
    20:19:49.262958 IP B > A: . ack 158841 win 1212
    20:19:49.263795 IP B > A: . ack 160361 win 1212
    20:19:49.264628 IP B > A: . ack 161881 win 1212
    20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
    20:19:49.265465 IP B > A: . ack 163401 win 1212
    20:19:49.265886 IP B > A: . ack 164921 win 1212
    20:19:49.266722 IP B > A: . ack 166441 win 1212
    20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
    20:19:49.267559 IP B > A: . ack 167961 win 1212
    20:19:49.268394 IP B > A: . ack 169481 win 1212
    20:19:49.269232 IP B > A: . ack 171001 win 1212
    20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Van Jacobson <vanj@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sky2: Threshold for Pause Packet is set wrong  Mirko Lindner
[ Upstream commit 74f9f42c1c1650e74fb464f76644c9041f996851 ] The sky2 driver sets the Rx Upper Threshold for Pause Packet generation to a wrong value which leads to only 2kB of RAM remaining space. This can lead to Rx overflow errors even with activated flow-control. Fix: We should increase the value to 8192/8 Signed-off-by: Mirko Lindner <mlindner@marvell.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  sky2: Receive Overflows not counted  Mirko Lindner
[ Upstream commit 9cfe8b156c21cf340b3a10ecb3022fbbc1c39185 ]
The sky2 driver doesn't count the Receive Overflows because the MAC interrupt for this event is not set in the MAC's interrupt mask. The MAC's interrupt mask is set only for Transmit FIFO Underruns.
Fix: The correct setting should be (GM_IS_TX_FF_UR | GM_IS_RX_FF_OR). Otherwise the Receive Overflow event will not generate any interrupt. The Receive Overflow interrupt itself is handled correctly.
Signed-off-by: Mirko Lindner <mlindner@marvell.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  net: remove a WARN_ON() in net_enable_timestamp()  Eric Dumazet
[ Upstream commit 9979a55a833883242e3a29f3596676edd7199c46 ]
The WARN_ON(in_interrupt()) in net_enable_timestamp() can get a false positive in the socket clone path, run from softirq context:
    [ 3641.624425] WARNING: at net/core/dev.c:1532 net_enable_timestamp+0x7b/0x80()
    [ 3641.668811] Call Trace:
    [ 3641.671254] <IRQ> [<ffffffff80286817>] warn_slowpath_common+0x87/0xc0
    [ 3641.677871] [<ffffffff8028686a>] warn_slowpath_null+0x1a/0x20
    [ 3641.683683] [<ffffffff80742f8b>] net_enable_timestamp+0x7b/0x80
    [ 3641.689668] [<ffffffff80732ce5>] sk_clone_lock+0x425/0x450
    [ 3641.695222] [<ffffffff8078db36>] inet_csk_clone_lock+0x16/0x170
    [ 3641.701213] [<ffffffff807ae449>] tcp_create_openreq_child+0x29/0x820
    [ 3641.707663] [<ffffffff807d62e2>] ? ipt_do_table+0x222/0x670
    [ 3641.713354] [<ffffffff807aaf5b>] tcp_v4_syn_recv_sock+0xab/0x3d0
    [ 3641.719425] [<ffffffff807af63a>] tcp_check_req+0x3da/0x530
    [ 3641.724979] [<ffffffff8078b400>] ? inet_hashinfo_init+0x60/0x80
    [ 3641.730964] [<ffffffff807ade6f>] ? tcp_v4_rcv+0x79f/0xbe0
    [ 3641.736430] [<ffffffff807ab9bd>] tcp_v4_do_rcv+0x38d/0x4f0
    [ 3641.741985] [<ffffffff807ae14a>] tcp_v4_rcv+0xa7a/0xbe0
It's safe at this point because the parent socket owns a reference on the netstamp_needed, so we can't have a 0 -> 1 transition, which requires locking a mutex.
Instead of refining the check, let's remove it, as all known callers are safe. If it ever changes in the future, static_key_slow_inc() will complain anyway.
Reported-by: Laurent Chavey <chavey@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  tracing: Prevent buffer overwrite disabled for latency tracers  Steven Rostedt (Red Hat)
[ Upstream commit 613f04a0f51e6e68ac6fe571ab79da3c0a5eb4da ]
The latency tracers require the buffers to be in overwrite mode, otherwise they get screwed up. Force the buffers to stay in overwrite mode when latency tracers are enabled.
Added a flag_changed() method to the tracer structure to allow the tracers to see what flags are being changed, and also be able to prevent the change from happening.
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  Btrfs: fix space leak when we fail to reserve metadata space  Josef Bacik
[ Upstream commit f4881bc7a83eff263789dd524b7c269d138d4af5 ] Dave reported a warning when running xfstest 275. We have been leaking delalloc metadata space when our reservations fail. This is because we were improperly calculating how much space to free for our checksum reservations. The problem is we would sometimes free up space that had already been freed in another thread and we would end up with negative usage for the delalloc space. This patch fixes the problem by calculating how much space the other threads would have already freed, and then calculate how much space we need to free had we not done the reservation at all, and then freeing any excess space. This makes xfstests 275 no longer have leaked space. Thanks Cc: stable@vger.kernel.org Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fusionio.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  iwlwifi: dvm: don't send HCMD in restart flow  Emmanuel Grumbach
[ Upstream commit 2d5d50ee596361566f7f84300117cba7d7672bc5 ]
There is a race between the restart flow and the workers. The workers are cancelled after the fw is already killed and might send an HCMD when there is no fw to handle it. Simply check that there is a fw to which the HCMD can be sent before actually sending it.
Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  drm/i915: Don't clobber crtc->fb when queue_flip fails  Ville Syrjälä
[ Upstream commit 4a35f83b2b7c6aae3fc0d1c4554fdc99dc33ad07 ] Restore crtc->fb to the old framebuffer if queue_flip fails. While at it, kill the pointless intel_fb temp variable. v2: Update crtc->fb before queue_flip and restore it back after a failure. Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-and-Tested-by: Mika Kuoppala <mika.kuoppala@intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()  Takashi Iwai
[ Upstream commit 9d1a455b0ca1c2c956b4d9ab212864a8695270f1 ] The eDP output on HP Z1 is still broken when X is started even after fixing the infinite link-train loop. The regression was introduced in 3.6 kernel for cleaning up the mode clock handling code in intel_dp.c by the commit [71244653: drm/i915: adjusted_mode->clock in the dp mode_fix]. In the past, the clock of the reference mode was modified in intel_dp_mode_fixup() in the case of eDP fixed clock, and this clock was used for calculating in intel_dp_set_m_n(). This override was removed, thus the wrong mode clock is used for the calculation, resulting in a psychedelic smoking output in the end. This patch corrects the clock to be used in the place. v1->v2: Use intel_edp_target_clock() for checking eDP fixed clock instead of open code as in ironlake_set_m_n(). Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2013-04-11  nfsd4: reject "negative" acl lengths  J. Bruce Fields
[ Upstream commit 64a817cfbded8674f345d1117b117f942a351a69 ]
Since we only enforce an upper bound, not a lower bound, a "negative" length can get through here. The symptom seen was a warning when we attempt a kmalloc with an excessive size.
Reported-by: Toralf Förster <toralf.foerster@gmx.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>