Age  Commit message  Author
2018-06-18  Add debugfs for cpuidle  [lkft] [android-hikey-linaro-4.14-idle-loop]  Daniel Lezcano
This provides debugfs information for cpuidle. Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-06-18  cpuidle: menu: Avoid selecting shallow states with stopped tick  Rafael J. Wysocki
If the scheduler tick has been stopped already and the governor selects a shallow idle state, the CPU can spend a long time in that state if the selection is based on an inaccurate prediction of idle time. That effect turns out to be relevant, so it needs to be mitigated. To that end, modify the menu governor to discard the result of the idle time prediction if the tick is stopped and the predicted idle time is less than the tick period length, unless the tick timer is going to expire soon. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
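The rule described above can be sketched in plain userspace C. This is not the governor's code: the function name `refine_prediction_us` and its parameters are hypothetical, and capping the fallback at one tick period is an assumption about how "unless the tick timer is going to expire soon" might be handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide which idle duration estimate to trust.
 *
 * If the tick is already stopped and the prediction is shorter than one
 * tick period, the prediction is discarded. The fallback is the time to
 * the next timer event, capped at one tick period (an assumption), so a
 * timer that expires soon still yields a short estimate.
 */
static uint64_t refine_prediction_us(bool tick_stopped,
                                     uint64_t predicted_us,
                                     uint64_t next_timer_us,
                                     uint64_t tick_period_us)
{
    assert(tick_period_us > 0);

    if (tick_stopped && predicted_us < tick_period_us)
        return next_timer_us < tick_period_us ? next_timer_us
                                              : tick_period_us;

    /* Tick still running, or the prediction is long enough: keep it. */
    return predicted_us;
}
```

With a 4000 us tick period, a 50 us prediction made while the tick is stopped would be replaced by 4000 us if the next timer is far away, but by 300 us if a timer is due in 300 us.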
2018-06-18  cpuidle: menu: Refine idle state selection for running tick  Rafael J. Wysocki
If the tick isn't stopped, the target residency of the state selected by the menu governor may be greater than the actual time to the next tick and that means lost energy. To avoid that, make tick_nohz_get_sleep_length() return the current time to the next event (before stopping the tick) in addition to the estimated one via an extra pointer argument and make menu_select() use that value to refine the state selection when necessary. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
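The refinement step can be illustrated with a small userspace sketch. It assumes idle states are ordered from shallowest to deepest and that state 0 is always acceptable; the function `refine_state` and the example residency table are hypothetical, not taken from the menu governor.

```c
#include <assert.h>

/* Hypothetical target residencies in microseconds, shallowest state first. */
static const unsigned int example_residency_us[] = { 0, 100, 1000, 5000 };

/*
 * Hypothetical helper: starting from the state the governor picked (idx),
 * walk towards shallower states until the target residency fits within
 * the time left before the next event (which includes the running tick).
 */
static int refine_state(const unsigned int *residency_us, int idx,
                        unsigned int delta_next_us)
{
    assert(idx >= 0);

    while (idx > 0 && residency_us[idx] > delta_next_us)
        idx--;

    return idx;
}
```

For instance, if state 3 (5000 us residency) was selected but the tick fires in 1500 us, the sketch falls back to state 2 (1000 us residency).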
2018-06-18  sched: idle: Select idle state before stopping the tick  Rafael J. Wysocki
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, reorder the code in cpuidle_idle_call() so that the governor idle state selection runs before tick_nohz_idle_stop_tick() and use the "nohz" hint returned by cpuidle_select() to decide whether or not to stop the tick. This isn't straightforward, because menu_select() invokes tick_nohz_get_sleep_length() to get the time to the next timer event and the number returned by the latter comes from __tick_nohz_idle_stop_tick(). Fortunately, however, it is possible to compute that number without actually stopping the tick and with the help of the existing code. Namely, tick_nohz_get_sleep_length() can be made to call tick_nohz_next_event(), introduced earlier, to get the time to the next non-highres timer event. If that happens, tick_nohz_next_event() need not be called by __tick_nohz_idle_stop_tick() again. If it turns out that the scheduler tick cannot be stopped going forward or the next timer event is too close for the tick to be stopped, tick_nohz_get_sleep_length() can simply return the time to the next event currently programmed into the corresponding clock event device. In addition to knowing the return value of tick_nohz_next_event(), however, tick_nohz_get_sleep_length() needs to know the time to the next highres timer event, but with the scheduler tick timer excluded, which can be computed with the help of hrtimer_get_next_event(). The minimum of that number and the tick_nohz_next_event() return value is the total time to the next timer event with the assumption that the tick will be stopped. It can be returned to the idle governor which can use it for predicting idle duration (under the assumption that the tick will be stopped) and deciding whether or not it makes sense to stop the tick before putting the CPU into the selected idle state. With the above, the sleep_length field in struct tick_sched is not necessary any more, so drop it.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
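The sleep length computation described in this commit can be sketched as follows. This is a simplified userspace model, not the tick-sched code: the function name and all four parameters are hypothetical stand-ins for values the kernel obtains from the nohz and hrtimer subsystems.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the value handed to the idle governor.
 *
 * can_stop_tick:           whether the tick may be stopped at all
 * programmed_event_ns:     time to the event currently programmed into
 *                          the clock event device
 * next_timer_ns:           time to the next non-highres timer event
 * next_hrtimer_no_tick_ns: time to the next highres timer event with
 *                          the scheduler tick timer excluded
 */
static uint64_t sleep_length_ns(bool can_stop_tick,
                                uint64_t programmed_event_ns,
                                uint64_t next_timer_ns,
                                uint64_t next_hrtimer_no_tick_ns)
{
    /* Tick stays on: the programmed event bounds the idle period. */
    if (!can_stop_tick)
        return programmed_event_ns;

    /* Tick assumed stopped: the earlier of the two timer sources wins. */
    return next_timer_ns < next_hrtimer_no_tick_ns
               ? next_timer_ns
               : next_hrtimer_no_tick_ns;
}
```

The point of the split is that the governor sees the idle length it would get if the tick were stopped, and can then decide whether stopping it is worthwhile.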
2018-06-18  time: hrtimer: Introduce hrtimer_next_event_without()  Rafael J. Wysocki
The next set of changes will need to compute the time to the next hrtimer event over all hrtimers except for the scheduler tick one. To that end introduce a new helper function, hrtimer_next_event_without(), for computing the time until the next hrtimer event over all timers except for one and modify the underlying code in __hrtimer_next_event_base() to prepare it for being called by that new function. No intentional code behavior changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
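The idea behind the new helper, finding the earliest expiry over all timers except one, can be modelled with a flat array. This is only an illustration: the kernel walks per-base timer queues rather than an array, and the function `next_event_without`, its parameters, and the example data are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical expiry times in nanoseconds; index 1 plays the tick timer. */
static const uint64_t example_expires_ns[] = { 4000000, 900000, 2500000 };

/*
 * Hypothetical helper: return the earliest expiry among n timers while
 * skipping the one at index `exclude`. UINT64_MAX means "no other timer".
 */
static uint64_t next_event_without(const uint64_t *expires_ns, size_t n,
                                   size_t exclude)
{
    uint64_t earliest = UINT64_MAX;

    assert(expires_ns != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (i == exclude)
            continue;   /* skip the excluded timer, e.g. the tick */
        if (expires_ns[i] < earliest)
            earliest = expires_ns[i];
    }

    return earliest;
}
```

Excluding index 1 from the example data yields 2500000 ns, even though the excluded timer would have expired first.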
2018-06-18  time: tick-sched: Split tick_nohz_stop_sched_tick()  Rafael J. Wysocki
In order to address the issue with short idle duration predictions by the idle governor after the scheduler tick has been stopped, split tick_nohz_stop_sched_tick() into two separate routines, one computing the time to the next timer event and the other simply stopping the tick when the time to the next timer event is known. Prepare these two routines to be called separately, as one of them will be called by the idle governor in the cpuidle_select() code path after subsequent changes. Update the former callers of tick_nohz_stop_sched_tick() to use the new routines, tick_nohz_next_event() and tick_nohz_stop_tick(), instead of it and move the updates of the sleep_length field in struct tick_sched into __tick_nohz_idle_stop_tick() as it doesn't need to be updated anywhere else. There should be no intentional visible changes in functionality resulting from this change. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-06-18  cpuidle: Return nohz hint from cpuidle_select()  Rafael J. Wysocki
Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return "false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_update() behavior when a tick wakeup occurs. Namely, if the CPU is woken up by the tick and the return value of tick_nohz_get_sleep_length() is not within the tick boundary, the predicted idle duration is likely too short, so make menu_update() try to compensate for that by updating the governor statistics as though the CPU was idle for a long time. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
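The three conditions under which the menu governor returns "false" can be written as one predicate. The sketch below is a userspace illustration with hypothetical names; reading "within the tick period range" as "shorter than one tick period" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: should the tick be stopped before entering the
 * selected idle state?
 */
static bool should_stop_tick(uint64_t latency_limit_us,
                             bool state_is_polling,
                             uint64_t expected_idle_us,
                             uint64_t tick_period_us)
{
    assert(tick_period_us > 0);

    if (latency_limit_us == 0)      /* (1) exit latency constrained at 0 */
        return false;
    if (state_is_polling)           /* (2) selected state is a polling one */
        return false;
    if (expected_idle_us < tick_period_us)  /* (3) idle shorter than a tick */
        return false;

    return true;
}
```

A caller following the new interface would pass a pointer to a boolean and let the governor store this result through it.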
2018-06-18  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC  Rafael J. Wysocki
Since the subsequent changes will need a TICK_USEC definition analogous to TICK_NSEC, rename the existing TICK_USEC as USER_TICK_USEC, update its users and redefine TICK_USEC accordingly. Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
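The difference between the two constants comes down to which tick rate is used in the same rounding formula: TICK_USEC is derived from the kernel's HZ, USER_TICK_USEC from the userspace-visible USER_HZ. The helper below is a hypothetical userspace illustration of that rounded division, not a copy of the kernel header.

```c
#include <assert.h>

/*
 * Hypothetical helper: microseconds per tick for a given tick rate,
 * rounded to the nearest integer by adding half the divisor first.
 */
static unsigned long usec_per_tick(unsigned long hz)
{
    assert(hz > 0);
    return (1000000UL + hz / 2) / hz;
}
```

With a tick rate of 250 this gives 4000 us, while a USER_HZ-style rate of 100 gives 10000 us, which is why the two definitions cannot share one name.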
2018-06-18  sched: idle: Do not stop the tick before cpuidle_idle_call()  Rafael J. Wysocki
Make cpuidle_idle_call() decide whether or not to stop the tick. First, the cpuidle_enter_s2idle() path deals with the tick (and with the entire timekeeping for that matter) by itself and it doesn't need the tick to be stopped beforehand. Second, to address the issue with short idle duration predictions by the idle governor after the tick has been stopped, it will be necessary to change the ordering of cpuidle_select() with respect to tick_nohz_idle_stop_tick(). To prepare for that, put a tick_nohz_idle_stop_tick() call in the same branch in which cpuidle_select() is called. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-06-18  sched: idle: Do not stop the tick upfront in the idle loop  Rafael J. Wysocki
Push the decision whether or not to stop the tick somewhat deeper into the idle loop. Stopping the tick upfront leads to unpleasant outcomes in case the idle governor doesn't agree with the nohz code on the duration of the upcoming idle period. Specifically, if the tick has been stopped and the idle governor predicts short idle, the situation is bad regardless of whether or not the prediction is accurate. If it is accurate, the tick has been stopped unnecessarily which means excessive overhead. If it is not accurate, the CPU is likely to spend too much time in the (shallow, because short idle has been predicted) idle state selected by the governor [1]. As the first step towards addressing this problem, change the code to make the tick stopping decision inside of the loop in do_idle(). In particular, do not stop the tick in the cpu_idle_poll() code path. Also don't do that in tick_nohz_irq_exit() which doesn't really have enough information on whether or not to stop the tick. Link: https://marc.info/?l=linux-pm&m=150116085925208&w=2 # [1] Link: https://tu-dresden.de/zih/forschung/ressourcen/dateien/projekte/haec/powernightmares.pdf Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-06-18  time: tick-sched: Reorganize idle tick management code  Rafael J. Wysocki
Prepare the scheduler tick code for reworking the idle loop to avoid stopping the tick in some cases. The idea is to split the nohz idle entry call to decouple the idle time stats accounting and preparatory work from the actual tick stop code, in order to later be able to delay the tick stop once we reach more power-knowledgeable callers. Move away the tick_nohz_start_idle() invocation from __tick_nohz_idle_enter(), rename the latter to __tick_nohz_idle_stop_tick() and define tick_nohz_idle_stop_tick() as a wrapper around it for calling it from the outside. Make tick_nohz_idle_enter() only call tick_nohz_start_idle() instead of calling the entire __tick_nohz_idle_enter(), add another wrapper disabling and enabling interrupts around tick_nohz_idle_stop_tick() and make the current callers of tick_nohz_idle_enter() call it too to retain their current functionality. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2018-06-18  arm64: hikey960: update idle-states  Vincent Guittot
Update the entry/exit latency of hikey960 to more realistic values based on test figures. Update the minimum residency time based on measurements done on the board. Change-Id: I61bf9b17078e043a780f60b81bcb9e87efebee39 Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
2018-06-18  arm64: hikey960: disable LED activity, which just wastes power  Vincent Guittot
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
2018-06-14  Merge remote-tracking branch 'aosp/android-4.14' into android-hikey-linaro-4.14  Alistair Strachan
Change-Id: I04c407fce9808cf09b7636c13c671373c340ec7a Signed-off-by: Alistair Strachan <astrachan@google.com>
2018-06-12  Merge 4.14.49 into android-4.14  Greg Kroah-Hartman
Changes in 4.14.49
    scsi: sd_zbc: Fix potential memory leak
    scsi: sd_zbc: Avoid that resetting a zone fails sporadically
    mmap: introduce sane default mmap limits
    mmap: relax file size limit for regular files
    btrfs: define SUPER_FLAG_METADUMP_V2
    kconfig: Avoid format overflow warning from GCC 8.1
    be2net: Fix error detection logic for BE3
    bnx2x: use the right constant
    dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()
    enic: set DMA mask to 47 bit
    ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds
    ip6_tunnel: remove magic mtu value 0xFFF8
    ipmr: properly check rhltable_init() return value
    ipv4: remove warning in ip_recv_error
    ipv6: omit traffic class when calculating flow hash
    isdn: eicon: fix a missing-check bug
    kcm: Fix use-after-free caused by clonned sockets
    netdev-FAQ: clarify DaveM's position for stable backports
    net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy
    net: metrics: add proper netlink validation
    net/packet: refine check for priv area size
    net: phy: broadcom: Fix bcm_write_exp()
    net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
    packet: fix reserve calculation
    qed: Fix mask for physical address in ILT entry
    sctp: not allow transport timeout value less than HZ/5 for hb_timer
    team: use netdev_features_t instead of u32
    vhost: synchronize IOTLB message with dev cleanup
    vrf: check the original netdevice for generating redirect
    ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline
    net: phy: broadcom: Fix auxiliary control register reads
    net-sysfs: Fix memory leak in XPS configuration
    virtio-net: correctly transmit XDP buff after linearizing
    net/mlx4: Fix irq-unsafe spinlock usage
    tun: Fix NULL pointer dereference in XDP redirect
    virtio-net: correctly check num_buf during err path
    net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
    virtio-net: fix leaking page for gso packet during mergeable XDP
    rtnetlink: validate attributes in do_setlink()
    cls_flower: Fix incorrect idr release when failing to modify rule
    PCI: hv: Do not wait forever on a device that has disappeared
    drm: set FMODE_UNSIGNED_OFFSET for drm files
    Linux 4.14.49
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-06-11  Linux 4.14.49  Greg Kroah-Hartman
2018-06-11  drm: set FMODE_UNSIGNED_OFFSET for drm files  Dave Airlie
commit 76ef6b28ea4f81c3d511866a9b31392caa833126 upstream. Since we have the ttm and gem vma managers using a subset of the file address space for objects, and these start at 0x100000000 they will overflow the new mmap checks. I've checked all the mmap routines I could see for any bad behaviour but overall most people use GEM/TTM VMA managers even the legacy drivers have a hashtable. Reported-and-Tested-by: Arthur Marsh (amarsh04 on #radeon) Fixes: be83bbf8068 (mmap: introduce sane default mmap limits) Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  PCI: hv: Do not wait forever on a device that has disappeared  Dexuan Cui
commit c3635da2a336441253c33298b87b3042db100725 upstream. Before the guest finishes the device initialization, the device can be removed anytime by the host, and after that the host won't respond to the guest's request, so the guest should be prepared to handle this case. Add a polling mechanism to detect device presence. Signed-off-by: Dexuan Cui <decui@microsoft.com> [lorenzo.pieralisi@arm.com: edited commit log] Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  cls_flower: Fix incorrect idr release when failing to modify rule  Paul Blakey
[ Upstream commit 8258d2da9f9f521dce7019e018360c28d116354e ] When we fail to modify a rule, we incorrectly release the idr handle of the unmodified old rule. Fix that by checking if we need to release it. Fixes: fe2502e49b58 ("net_sched: remove cls_flower idr on failure") Reported-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  rtnetlink: validate attributes in do_setlink()  Eric Dumazet
[ Upstream commit 644c7eebbfd59e72982d11ec6cc7d39af12450ae ] It seems that rtnl_group_changelink() can call do_setlink while a prior call to validate_linkmsg(dev = NULL, ...) could not validate IFLA_ADDRESS / IFLA_BROADCAST Make sure do_setlink() calls validate_linkmsg() instead of letting its callers having this responsibility. With help from Dmitry Vyukov, thanks a lot ! BUG: KMSAN: uninit-value in is_valid_ether_addr include/linux/etherdevice.h:199 [inline] BUG: KMSAN: uninit-value in eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] BUG: KMSAN: uninit-value in eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 CPU: 1 PID: 8695 Comm: syz-executor3 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 is_valid_ether_addr include/linux/etherdevice.h:199 [inline] eth_prepare_mac_addr_change net/ethernet/eth.c:275 [inline] eth_mac_addr+0x203/0x2b0 net/ethernet/eth.c:308 dev_set_mac_address+0x261/0x530 net/core/dev.c:7157 do_setlink+0xbc3/0x5fc0 net/core/rtnetlink.c:2317 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] 
__x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007fc07480ec68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fc07480f6d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 00000000200003c0 RDI: 0000000000000014 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 kmsan_memcpy_origins+0x11d/0x170 mm/kmsan/kmsan.c:527 __msan_memcpy+0x109/0x160 mm/kmsan/kmsan_instr.c:478 do_setlink+0xb84/0x5fc0 net/core/rtnetlink.c:2315 rtnl_group_changelink net/core/rtnetlink.c:2824 [inline] rtnl_newlink+0x1fe9/0x37a0 net/core/rtnetlink.c:2976 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 
mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: e7ed828f10bd ("netlink: support setting devgroup parameters") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  virtio-net: fix leaking page for gso packet during mergeable XDP  Jason Wang
[ Upstream commit 3d62b2a0db505bbf9ed0755f254e45d775f9807f ] We need to drop the refcnt to xdp_page if we see a gso packet; otherwise it will be leaked. Fix this by moving the check for a gso packet above the linearizing logic. While at it, remove a useless comment as well. Cc: John Fastabend <john.fastabend@gmail.com> Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net/mlx5e: When RXFCS is set, add FCS data into checksum calculation  Eran Ben Elisha
[ Upstream commit 902a545904c71d719ed144234d67df75f31db63b ] When the RXFCS feature is enabled, the HW does not strip the FCS data; however, that data is not included in the checksum calculated by the HW. Fix that by manually calculating the FCS checksum and adding it to the SKB checksum field. Add a helper function to find the FCS data for all SKB forms (linear, one fragment or more). Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag") Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  virtio-net: correctly check num_buf during err path  Jason Wang
[ Upstream commit 850e088d5bbb333342fd4def08d0a4035f2b7126 ] If we successfully linearize the packet, num_buf will be set to zero. This may confuse the error handling path, which assumes num_buf is at least 1, and can lead the code to try to pop the descriptor of the next buffer. Fix this by checking num_buf against 1 before decreasing. Fixes: 4941d472bf95 ("virtio-net: do not reset during XDP set") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
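The "check against 1 before decreasing" pattern can be shown with a counter-only sketch. It does not touch any virtio structures: the function `buffers_to_drop` is hypothetical and merely counts how many extra buffers a cleanup loop of this shape would pop for a given num_buf.

```c
#include <assert.h>

/*
 * Hypothetical model of the cleanup loop: the first buffer is handled
 * separately, so only num_buf - 1 further buffers should be popped.
 * Comparing against 1 before the decrement means that num_buf == 0
 * (the linearized case) pops nothing, instead of wrapping around.
 */
static unsigned int buffers_to_drop(unsigned int num_buf)
{
    unsigned int dropped = 0;

    while (num_buf-- > 1)
        dropped++;  /* stands in for popping and freeing one buffer */

    return dropped;
}
```

A loop that decrements first and then tests for non-zero would, for an unsigned num_buf of 0, wrap to a very large value and keep popping buffers that belong to later packets.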
2018-06-11  tun: Fix NULL pointer dereference in XDP redirect  Toshiaki Makita
[ Upstream commit 6547e387d7f52f2ba681a229de3c13e5b9e01ee1 ] Calling XDP redirection requires bh disabled. Softirq can call another XDP function and redirection functions, then the percpu static variable ri->map can be overwritten to NULL. This is a generic XDP case called from tun. [ 3535.736058] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 3535.743974] PGD 0 P4D 0 [ 3535.746530] Oops: 0000 [#1] SMP PTI [ 3535.750049] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ipmi_ssif irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel crypto_simd cryptd enclosure hpwdt hpilo glue_helper ipmi_si pcspkr wmi mei_me ioatdma mei ipmi_devintf shpchp dca ipmi_msghandler lpc_ich acpi_power_meter sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm smartpqi i40e crc32c_intel scsi_transport_sas tg3 i2c_core ptp pps_core [ 3535.813456] CPU: 5 PID: 1630 Comm: vhost-1614 Not tainted 4.17.0-rc4 #2 [ 3535.820127] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 3535.828732] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 3535.833740] RSP: 0018:ffffb4bc47bf7c58 EFLAGS: 00010246 [ 3535.839009] RAX: ffff9fdfcfea1c40 RBX: 0000000000000000 RCX: ffff9fdf27fe3100 [ 3535.846205] RDX: ffff9fdfca769200 RSI: 0000000000000000 RDI: 0000000000000000 [ 3535.853402] RBP: ffffb4bc491d9000 R08: 00000000000045ad R09: 0000000000000ec0 [ 3535.860597] R10: 0000000000000001 R11: ffff9fdf26c3ce4e R12: ffff9fdf9e72c000 [ 3535.867794] R13: 0000000000000000 R14: fffffffffffffff2 R15: ffff9fdfc82cdd00 [ 3535.874990] FS: 0000000000000000(0000) GS:ffff9fdfcfe80000(0000) knlGS:0000000000000000 [ 3535.883152] CS: 0010 DS: 0000 ES: 0000 CR0: 
0000000080050033 [ 3535.888948] CR2: 0000000000000018 CR3: 0000000bde724004 CR4: 00000000007626e0 [ 3535.896145] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3535.903342] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3535.910538] PKRU: 55555554 [ 3535.913267] Call Trace: [ 3535.915736] xdp_do_generic_redirect+0x7a/0x310 [ 3535.920310] do_xdp_generic.part.117+0x285/0x370 [ 3535.924970] tun_get_user+0x5b9/0x1260 [tun] [ 3535.929279] tun_sendmsg+0x52/0x70 [tun] [ 3535.933237] handle_tx+0x2ad/0x5f0 [vhost_net] [ 3535.937721] vhost_worker+0xa5/0x100 [vhost] [ 3535.942030] kthread+0xf5/0x130 [ 3535.945198] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 3535.950031] ? kthread_bind+0x10/0x10 [ 3535.953727] ret_from_fork+0x35/0x40 [ 3535.957334] Code: 0e 74 15 83 f8 10 75 05 e9 49 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 29 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 49 a9 b3 ff 31 c0 c3 [ 3535.976387] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffffb4bc47bf7c58 [ 3535.982883] CR2: 0000000000000018 [ 3535.987096] ---[ end trace 383b299dd1430240 ]--- [ 3536.131325] Kernel panic - not syncing: Fatal exception [ 3536.137484] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 3536.281406] ---[ end Kernel panic - not syncing: Fatal exception ]--- And a kernel with generic case fixed still panics in tun driver XDP redirect, because it disabled only preemption, but not bh. 
[ 2055.128746] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 2055.136662] PGD 0 P4D 0 [ 2055.139219] Oops: 0000 [#1] SMP PTI [ 2055.142736] Modules linked in: vhost_net vhost tap tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc vfat fat ext4 mbcache jbd2 intel_rapl skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ses aesni_intel ipmi_ssif crypto_simd enclosure cryptd hpwdt glue_helper ioatdma hpilo wmi dca pcspkr ipmi_si acpi_power_meter ipmi_devintf shpchp mei_me ipmi_msghandler mei lpc_ich sch_fq_codel ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm i40e smartpqi tg3 scsi_transport_sas crc32c_intel i2c_core ptp pps_core [ 2055.206142] CPU: 6 PID: 1693 Comm: vhost-1683 Tainted: G W 4.17.0-rc5-fix-tun+ #1 [ 2055.215011] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 11/14/2017 [ 2055.223617] RIP: 0010:__xdp_map_lookup_elem+0x5/0x30 [ 2055.228624] RSP: 0018:ffff998b07607cc0 EFLAGS: 00010246 [ 2055.233892] RAX: ffff8dbd8e235700 RBX: ffff8dbd8ff21c40 RCX: 0000000000000004 [ 2055.241089] RDX: ffff998b097a9000 RSI: 0000000000000000 RDI: 0000000000000000 [ 2055.248286] RBP: 0000000000000000 R08: 00000000000065a8 R09: 0000000000005d80 [ 2055.255483] R10: 0000000000000040 R11: ffff8dbcf0100000 R12: ffff998b097a9000 [ 2055.262681] R13: ffff8dbd8c98c000 R14: 0000000000000000 R15: ffff998b07607d78 [ 2055.269879] FS: 0000000000000000(0000) GS:ffff8dbd8ff00000(0000) knlGS:0000000000000000 [ 2055.278039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2055.283834] CR2: 0000000000000018 CR3: 0000000c0c8cc005 CR4: 00000000007626e0 [ 2055.291030] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2055.298227] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2055.305424] PKRU: 
55555554 [ 2055.308153] Call Trace: [ 2055.310624] xdp_do_redirect+0x7b/0x380 [ 2055.314499] tun_get_user+0x10fe/0x12a0 [tun] [ 2055.318895] tun_sendmsg+0x52/0x70 [tun] [ 2055.322852] handle_tx+0x2ad/0x5f0 [vhost_net] [ 2055.327337] vhost_worker+0xa5/0x100 [vhost] [ 2055.331646] kthread+0xf5/0x130 [ 2055.334813] ? vhost_dev_ioctl+0x3b0/0x3b0 [vhost] [ 2055.339646] ? kthread_bind+0x10/0x10 [ 2055.343343] ret_from_fork+0x35/0x40 [ 2055.346950] Code: 0e 74 15 83 f8 10 75 05 e9 e9 aa b3 ff f3 c3 0f 1f 80 00 00 00 00 f3 c3 e9 c9 9d b3 ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 <8b> 47 18 83 f8 0e 74 0d 83 f8 10 75 05 e9 e9 a9 b3 ff 31 c0 c3 [ 2055.366004] RIP: __xdp_map_lookup_elem+0x5/0x30 RSP: ffff998b07607cc0 [ 2055.372500] CR2: 0000000000000018 [ 2055.375856] ---[ end trace 2a2dcc5e9e174268 ]--- [ 2055.523626] Kernel panic - not syncing: Fatal exception [ 2055.529796] Kernel Offset: 0x2e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 2055.677539] ---[ end Kernel panic - not syncing: Fatal exception ]--- v2: - Removed preempt_disable/enable since local_bh_disable will prevent preemption as well, feedback from Jason Wang. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net/mlx4: Fix irq-unsafe spinlock usage  Jack Morgenstein
[ Upstream commit d546b67cda015fb92bfee93d5dc0ceadb91deaee ]

spin_lock/unlock was used instead of spin_un/lock_irq in a procedure used in process space, on a spinlock which can be grabbed in an interrupt.

This caused the stack trace below to be displayed (on kernel 4.17.0-rc1 compiled with Lock Debugging enabled):

[ 154.661474] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[ 154.668909] 4.17.0-rc1-rdma_rc_mlx+ #3 Tainted: G I
[ 154.675856] -----------------------------------------------------
[ 154.682706] modprobe/10159 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 154.690254] 00000000f3b0e495 (&(&qp_table->lock)->rlock){+.+.}, at: mlx4_qp_remove+0x20/0x50 [mlx4_core]
[ 154.700927] and this task is already holding:
[ 154.707461] 0000000094373b5d (&(&cq->lock)->rlock/1){....}, at: destroy_qp_common+0x111/0x560 [mlx4_ib]
[ 154.718028] which would create a new lock dependency:
[ 154.723705] (&(&cq->lock)->rlock/1){....} -> (&(&qp_table->lock)->rlock){+.+.}
[ 154.731922] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 154.740798] (&(&cq->lock)->rlock){..-.}
[ 154.740800] ... which became SOFTIRQ-irq-safe at:
[ 154.752163] _raw_spin_lock_irqsave+0x3e/0x50
[ 154.757163] mlx4_ib_poll_cq+0x36/0x900 [mlx4_ib]
[ 154.762554] ipoib_tx_poll+0x4a/0xf0 [ib_ipoib]
...
to a SOFTIRQ-irq-unsafe lock:
[ 154.815603] (&(&qp_table->lock)->rlock){+.+.}
[ 154.815604] ... which became SOFTIRQ-irq-unsafe at:
[ 154.827718] ...
[ 154.827720] _raw_spin_lock+0x35/0x50
[ 154.833912] mlx4_qp_lookup+0x1e/0x50 [mlx4_core]
[ 154.839302] mlx4_flow_attach+0x3f/0x3d0 [mlx4_core]

Since mlx4_qp_lookup() is called only in process space, we can simply replace the spin_un/lock calls with spin_un/lock_irq calls.

Fixes: 6dc06c08bef1 ("net/mlx4: Fix the check in attaching steering rules")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  virtio-net: correctly transmit XDP buff after linearizing  Jason Wang
[ Upstream commit 5d458a13dd59d04b4d6658a6d5b94d42732b15ae ] We should not take the error path after successfully transmitting an XDP buffer that had to be linearized, since the error path may try to pop and drop the next packet and increase the drop counters. Fix this by simply dropping the refcnt of the original page and going to the xmit path. Fixes: 72979a6c3590 ("virtio_net: xdp, add slowpath case for non contiguous buffers") Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net-sysfs: Fix memory leak in XPS configuration  Alexander Duyck
[ Upstream commit 664088f8d68178809b848ca450f2797efb34e8e7 ] This patch reorders the error cases in showing the XPS configuration so that we hold off on memory allocation until after we have verified that we can support XPS on a given ring. Fixes: 184c449f91fe ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net: phy: broadcom: Fix auxiliary control register reads  Florian Fainelli
[ Upstream commit 733a969a7ed14fc5786bcc59c1bdda83c7ddb46e ] We are currently doing auxiliary control register reads with the shadow register value 0b111 (0x7) which incidentally is also the selector value that should be present in bits [2:0]. Fix this by using the appropriate selector mask which is defined (MII_BCM54XX_AUXCTL_SHDWSEL_MASK). This does not have a functional impact yet because we always access the MII_BCM54XX_AUXCTL_SHDWSEL_MISC (0x7) register in the current code. This might change at some point though. Fixes: 5b4e29005123 ("net: phy: broadcom: add bcm54xx_auxctl_read") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
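The point about the selector value can be sketched in a few lines of C. This is only an illustration: the constant names, the shift of 12 and the helper functions are hypothetical stand-ins, not the driver's definitions. It shows why the old and new forms produce the same value today, and why only the mask form stays correct if another shadow register is ever read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver constants; the values are assumptions. */
#define AUXCTL_SHDWSEL_MISC 0x7u /* a shadow register number */
#define AUXCTL_SHDWSEL_MASK 0x7u /* selector field, bits [2:0] */
#define AUXCTL_READ_SHIFT   12   /* assumed position of the register number */

/* Before: the MISC register number doubles as the selector value. */
static inline uint16_t auxctl_read_cmd_old(uint16_t regnum)
{
    return (uint16_t)(AUXCTL_SHDWSEL_MISC | (regnum << AUXCTL_READ_SHIFT));
}

/* After: the selector comes from the mask, whichever register is read. */
static inline uint16_t auxctl_read_cmd_new(uint16_t regnum)
{
    assert(regnum <= 0x7u); /* the toy register number is three bits wide */
    return (uint16_t)(AUXCTL_SHDWSEL_MASK | (regnum << AUXCTL_READ_SHIFT));
}
```

Both helpers return the same word for register 0x7, which matches the "no functional impact yet" remark in the message.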
2018-06-11  ipv6: sr: fix memory OOB access in seg6_do_srh_encap/inline  Mathieu Xhonneux
[ Upstream commit bbb40a0b75209734ff9286f3326171638c9f6569 ] seg6_do_srh_encap and seg6_do_srh_inline can possibly do an out-of-bounds access when adding the SRH to the packet. This no longer happen when expanding the skb not only by the size of the SRH (+ outer IPv6 header), but also by skb->mac_len. [ 53.793056] BUG: KASAN: use-after-free in seg6_do_srh_encap+0x284/0x620 [ 53.794564] Write of size 14 at addr ffff88011975ecfa by task ping/674 [ 53.796665] CPU: 0 PID: 674 Comm: ping Not tainted 4.17.0-rc3-ARCH+ #90 [ 53.796670] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 [ 53.796673] Call Trace: [ 53.796679] <IRQ> [ 53.796689] dump_stack+0x71/0xab [ 53.796700] print_address_description+0x6a/0x270 [ 53.796707] kasan_report+0x258/0x380 [ 53.796715] ? seg6_do_srh_encap+0x284/0x620 [ 53.796722] memmove+0x34/0x50 [ 53.796730] seg6_do_srh_encap+0x284/0x620 [ 53.796741] ? seg6_do_srh+0x29b/0x360 [ 53.796747] seg6_do_srh+0x29b/0x360 [ 53.796756] seg6_input+0x2e/0x2e0 [ 53.796765] lwtunnel_input+0x93/0xd0 [ 53.796774] ipv6_rcv+0x690/0x920 [ 53.796783] ? ip6_input+0x170/0x170 [ 53.796791] ? eth_gro_receive+0x2d0/0x2d0 [ 53.796800] ? ip6_input+0x170/0x170 [ 53.796809] __netif_receive_skb_core+0xcc0/0x13f0 [ 53.796820] ? netdev_info+0x110/0x110 [ 53.796827] ? napi_complete_done+0xb6/0x170 [ 53.796834] ? e1000_clean+0x6da/0xf70 [ 53.796845] ? process_backlog+0x129/0x2a0 [ 53.796853] process_backlog+0x129/0x2a0 [ 53.796862] net_rx_action+0x211/0x5c0 [ 53.796870] ? napi_complete_done+0x170/0x170 [ 53.796887] ? run_rebalance_domains+0x11f/0x150 [ 53.796891] __do_softirq+0x10e/0x39e [ 53.796894] do_softirq_own_stack+0x2a/0x40 [ 53.796895] </IRQ> [ 53.796898] do_softirq.part.16+0x54/0x60 [ 53.796900] __local_bh_enable_ip+0x5b/0x60 [ 53.796903] ip6_finish_output2+0x416/0x9f0 [ 53.796906] ? ip6_dst_lookup_flow+0x110/0x110 [ 53.796909] ? ip6_sk_dst_lookup_flow+0x390/0x390 [ 53.796911] ? __rcu_read_unlock+0x66/0x80 [ 53.796913] ? 
ip6_mtu+0x44/0xf0 [ 53.796916] ? ip6_output+0xfc/0x220 [ 53.796918] ip6_output+0xfc/0x220 [ 53.796921] ? ip6_finish_output+0x2b0/0x2b0 [ 53.796923] ? memcpy+0x34/0x50 [ 53.796926] ip6_send_skb+0x43/0xc0 [ 53.796929] rawv6_sendmsg+0x1216/0x1530 [ 53.796932] ? __orc_find+0x6b/0xc0 [ 53.796934] ? rawv6_rcv_skb+0x160/0x160 [ 53.796937] ? __rcu_read_unlock+0x66/0x80 [ 53.796939] ? __rcu_read_unlock+0x66/0x80 [ 53.796942] ? is_bpf_text_address+0x1e/0x30 [ 53.796944] ? kernel_text_address+0xec/0x100 [ 53.796946] ? __kernel_text_address+0xe/0x30 [ 53.796948] ? unwind_get_return_address+0x2f/0x50 [ 53.796950] ? __save_stack_trace+0x92/0x100 [ 53.796954] ? save_stack+0x89/0xb0 [ 53.796956] ? kasan_kmalloc+0xa0/0xd0 [ 53.796958] ? kmem_cache_alloc+0xd2/0x1f0 [ 53.796961] ? prepare_creds+0x23/0x160 [ 53.796963] ? __x64_sys_capset+0x252/0x3e0 [ 53.796966] ? do_syscall_64+0x69/0x160 [ 53.796968] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.796971] ? __alloc_pages_nodemask+0x170/0x380 [ 53.796973] ? __alloc_pages_slowpath+0x12c0/0x12c0 [ 53.796977] ? tty_vhangup+0x20/0x20 [ 53.796979] ? policy_nodemask+0x1a/0x90 [ 53.796982] ? __mod_node_page_state+0x8d/0xa0 [ 53.796986] ? __check_object_size+0xe7/0x240 [ 53.796989] ? __sys_sendto+0x229/0x290 [ 53.796991] ? rawv6_rcv_skb+0x160/0x160 [ 53.796993] __sys_sendto+0x229/0x290 [ 53.796996] ? __ia32_sys_getpeername+0x50/0x50 [ 53.796999] ? commit_creds+0x2de/0x520 [ 53.797002] ? security_capset+0x57/0x70 [ 53.797004] ? __x64_sys_capset+0x29f/0x3e0 [ 53.797007] ? __x64_sys_rt_sigsuspend+0xe0/0xe0 [ 53.797011] ? 
__do_page_fault+0x664/0x770 [ 53.797014] __x64_sys_sendto+0x74/0x90 [ 53.797017] do_syscall_64+0x69/0x160 [ 53.797019] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797022] RIP: 0033:0x7f43b7a6714a [ 53.797023] RSP: 002b:00007ffd891bd368 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 53.797026] RAX: ffffffffffffffda RBX: 00000000006129c0 RCX: 00007f43b7a6714a [ 53.797028] RDX: 0000000000000040 RSI: 00000000006129c0 RDI: 0000000000000004 [ 53.797029] RBP: 00007ffd891be640 R08: 0000000000610940 R09: 000000000000001c [ 53.797030] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040 [ 53.797032] R13: 000000000060e6a0 R14: 0000000000008004 R15: 000000000040b661 [ 53.797171] Allocated by task 642: [ 53.797460] kasan_kmalloc+0xa0/0xd0 [ 53.797463] kmem_cache_alloc+0xd2/0x1f0 [ 53.797465] getname_flags+0x40/0x210 [ 53.797467] user_path_at_empty+0x1d/0x40 [ 53.797469] do_faccessat+0x12a/0x320 [ 53.797471] do_syscall_64+0x69/0x160 [ 53.797473] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.797607] Freed by task 642: [ 53.797869] __kasan_slab_free+0x130/0x180 [ 53.797871] kmem_cache_free+0xa8/0x230 [ 53.797872] filename_lookup+0x15b/0x230 [ 53.797874] do_faccessat+0x12a/0x320 [ 53.797876] do_syscall_64+0x69/0x160 [ 53.797878] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 53.798014] The buggy address belongs to the object at ffff88011975e600 which belongs to the cache names_cache of size 4096 [ 53.799043] The buggy address is located 1786 bytes inside of 4096-byte region [ffff88011975e600, ffff88011975f600) [ 53.800013] The buggy address belongs to the page: [ 53.800414] page:ffffea000465d600 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 53.801259] flags: 0x17fff0000008100(slab|head) [ 53.801640] raw: 017fff0000008100 0000000000000000 0000000000000000 0000000100070007 [ 53.803147] raw: dead000000000100 dead000000000200 ffff88011b185a40 0000000000000000 [ 53.803787] page dumped because: kasan: bad access detected [ 53.804384] Memory state 
around the buggy address: [ 53.804788] ffff88011975eb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805384] ffff88011975ec00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.805979] >ffff88011975ec80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.806577] ^ [ 53.807165] ffff88011975ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.807762] ffff88011975ed80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 53.808356] ================================================================== [ 53.808949] Disabling lock debugging due to kernel taint Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: David Lebrun <dlebrun@google.com> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  vrf: check the original netdevice for generating redirect  Stephen Suryaputra
[ Upstream commit 2f17becfbea5e9a0529b51da7345783e96e69516 ] Use the right device to determine whether a redirect should be sent, especially when using vrf. Do the same when sending the redirect. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  vhost: synchronize IOTLB message with dev cleanup  Jason Wang
[ Upstream commit 1b15ad683ab42a203f98b67045b40720e99d0e9a ]

DaeRyong Jeong reports a race between vhost_dev_cleanup() and vhost_process_iotlb_msg():

Thread interleaving:

CPU0 (vhost_process_iotlb_msg)              CPU1 (vhost_dev_cleanup)
(In the case of both VHOST_IOTLB_UPDATE
and VHOST_IOTLB_INVALIDATE)
=====                                       =====
                                            vhost_umem_clean(dev->iotlb);
if (!dev->iotlb) {
        ret = -EFAULT;
        break;
}
                                            dev->iotlb = NULL;

The reason is that we don't synchronize between them; fix this by protecting vhost_process_iotlb_msg() with the dev mutex.

Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  team: use netdev_features_t instead of u32  Dan Carpenter
[ Upstream commit 25ea66544bfd1d9df1b7e1502f8717e85fa1e6e6 ] This code was introduced in 2011 around the same time that we made netdev_features_t a u64 type. These days a u32 is not big enough to hold all the potential features. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
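Why a u32 is too small can be shown with a short userspace sketch. The typedef mirrors the 64-bit width stated above; the feature bit at position 40 and the helper name are hypothetical, chosen only to land above bit 31.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t netdev_features_t; /* 64-bit, as the commit message states */

/* Hypothetical feature bit above bit 31, for illustration only. */
#define EXAMPLE_HIGH_FEATURE ((netdev_features_t)1 << 40)

/* Returns 1 if the feature set survives a round trip through a u32. */
static inline int survives_u32(netdev_features_t features)
{
    uint32_t narrow = (uint32_t)features; /* what a u32 variable would keep */
    return (netdev_features_t)narrow == features;
}
```

Any feature bit at position 32 or higher is silently dropped by the narrow variable, while low bits pass through unchanged.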
2018-06-11  sctp: not allow transport timeout value less than HZ/5 for hb_timer  Xin Long
[ Upstream commit 1d88ba1ebb2763aa86172cd7ca05dedbeccc0d35 ] syzbot reported an rcu_sched self-detected stall on CPU, caused by a too-small value set for rto_min with the SCTP_RTOINFO sockopt. With this value, hb_timer gets stuck: its timer handler starts the timer again with the same value, then re-enters the timer handler. This problem has been there since the very beginning; thanks to Eric for the reproducer shared from a syzbot mail. This patch fixes it by not allowing sctp_transport_timeout to return a value smaller than HZ/5 for hb_timer, which is based on TCP's min rto. Note that it doesn't fix this issue by limiting rto_min, as some users are still using a small rto and no proper value has been found for it yet. Reported-by: syzbot+3dcd59a1f907245f891f@syzkaller.appspotmail.com Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
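The lower bound amounts to a simple clamp. A minimal sketch, assuming the tick rate is passed in as a parameter so it runs in userspace; the function name is hypothetical and this is not the kernel's sctp_transport_timeout().

```c
#include <assert.h>

/* hz stands in for the kernel's HZ; passed in so the sketch runs in userspace. */
static inline unsigned long hb_timeout_floor(unsigned long timeout, unsigned long hz)
{
    unsigned long floor = hz / 5; /* lower bound named in the commit message */

    assert(hz > 0);
    return timeout < floor ? floor : timeout;
}
```

With hz = 1000, a requested timeout of 1 tick becomes 200 ticks, while a timeout of 500 ticks is returned unchanged.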
2018-06-11  qed: Fix mask for physical address in ILT entry  Shahed Shaikh
[ Upstream commit fdd13dd350dda1826579eb5c333d76b14513b812 ] An ILT entry requires the physical address right-shifted by 12 bits. The existing mask for the physical address in an ILT entry, ILT_ENTRY_PHY_ADDR_MASK, is not sufficient to handle a 64-bit address: the upper 8 bits of the 64-bit address were masked off, which resulted in a completer abort error on the PCIe bus due to an invalid address. Fix the mask to handle 64-bit physical addresses. Fixes: fe56b9e6a8d9 ("qed: Add module with basic common support") Signed-off-by: Shahed Shaikh <shahed.shaikh@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
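The arithmetic can be checked with a small sketch. A 64-bit address shifted right by 12 has 52 significant bits, so a mask that loses the upper 8 bits keeps only 44. Both mask values and the helper name below are assumptions derived from that description, not copied from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed widths: a 44-bit mask drops the top 8 of the 52 shifted bits. */
#define ILT_MASK_NARROW 0x00000FFFFFFFFFFFULL /* hypothetical old value */
#define ILT_MASK_FULL   (~0ULL >> 12)         /* all 52 shifted bits */

/* Physical-address field of a toy ILT entry. */
static inline uint64_t ilt_phys_field(uint64_t phys, uint64_t mask)
{
    return (phys >> 12) & mask;
}
```

For an address with its top byte set, such as 0xFF00000000001000, the narrow mask yields 0x1 and the high bits are lost; the full mask preserves 0x000FF00000000001.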
2018-06-11  packet: fix reserve calculation  Willem de Bruijn
[ Upstream commit 9aad13b087ab0a588cd68259de618f100053360e ] Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") ensures that packet_snd always starts writing the link layer header in reserved headroom allocated for this purpose. This is needed because packets may be shorter than hard_header_len, in which case the space up to hard_header_len may be zeroed. But that necessary padding is not accounted for in skb->len. The fix, however, is buggy. It calls skb_push, which grows skb->len when moving skb->data back. But in this case packet length should not change. Instead, call skb_reserve, which moves both skb->data and skb->tail back, without changing length. Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation") Reported-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
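The difference between the two calls can be modelled with three fields. This is a toy model with hypothetical names, not the real struct sk_buff; offsets replace pointers so it runs in userspace.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the three sk_buff fields the commit message discusses. */
struct toy_skb {
    ptrdiff_t data; /* offset of the first payload byte */
    ptrdiff_t tail; /* offset one past the last payload byte */
    size_t len;     /* payload length */
};

/* Like skb_push(): move data back and count the new bytes in len. */
static inline void toy_push(struct toy_skb *skb, size_t n)
{
    assert(skb->data >= (ptrdiff_t)n); /* needs that much headroom */
    skb->data -= (ptrdiff_t)n;
    skb->len += n;
}

/* Like skb_reserve(): shift data and tail together; len is untouched. */
static inline void toy_reserve(struct toy_skb *skb, ptrdiff_t n)
{
    skb->data += n;
    skb->tail += n;
}
```

Starting from an empty buffer at offset 32, toy_push(&skb, 14) leaves len at 14, whereas toy_reserve(&skb, -14) moves data and tail to 18 and leaves len at 0, which is the behaviour the fix relies on.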
2018-06-11  net: usb: cdc_mbim: add flag FLAG_SEND_ZLP  Daniele Palmas
[ Upstream commit 9f7c728332e8966084242fcd951aa46583bc308c ] Testing the Telit LM940 with ICMP packets > 14552 bytes revealed that the modem needs FLAG_SEND_ZLP to work properly; otherwise the cdc mbim data interface stops responding. Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net: phy: broadcom: Fix bcm_write_exp()  Florian Fainelli
[ Upstream commit 79fb218d97980d4fee9a64f4c8ff05289364ba25 ] On newer PHYs, we need to select the expansion register to write with setting bits [11:8] to 0xf. This was done correctly by bcm7xxx.c prior to being migrated to generic code under bcm-phy-lib.c which unfortunately used the older implementation from the BCM54xx days. Fix this by creating an inline stub: bcm_write_exp_sel() which adds the correct value (MII_BCM54XX_EXP_SEL_ER) and update both the Cygnus PHY and BCM7xxx PHY drivers which require setting these bits. broadcom.c is unchanged because some PHYs even use a different selector method, so let them specify it directly (e.g: SerDes secondary selector). Fixes: a1cba5613edf ("net: phy: Add Broadcom phy library for common interfaces") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
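Setting bits [11:8] to 0xf is a single OR. A minimal sketch, assuming the selector constant is 0x0f00 as the bit range implies; the names EXP_SEL_ER and exp_sel_reg() are hypothetical and the real stub also performs the register write.

```c
#include <assert.h>
#include <stdint.h>

/* Selector value for bits [11:8]; 0x0f00 follows from the bit range above. */
#define EXP_SEL_ER 0x0f00u

/* Hypothetical helper: compute the register number that is actually written. */
static inline uint16_t exp_sel_reg(uint16_t reg)
{
    return (uint16_t)(reg | EXP_SEL_ER);
}
```

For expansion register 0x21 the selected value is 0x0f21; the low byte is preserved and only the selector field changes.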
2018-06-11  net/packet: refine check for priv area size  Eric Dumazet
[ Upstream commit eb73190f4fbeedf762394e92d6a4ec9ace684c88 ] syzbot was able to trick af_packet again [1] Various commits tried to address the problem in the past, but failed to take into account V3 header size. [1] tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96 BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106 CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436 prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline] prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039 __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline] packet_current_rx_frame net/packet/af_packet.c:1117 [inline] tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282 dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018 xmit_one net/core/dev.c:3049 [inline] dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069 __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584 dev_queue_xmit+0x17/0x20 net/core/dev.c:3617 neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358 neigh_output include/net/neighbour.h:482 [inline] ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120 ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154 NF_HOOK_COND include/linux/netfilter.h:277 [inline] ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171 dst_output include/net/dst.h:444 [inline] NF_HOOK 
include/linux/netfilter.h:288 [inline] ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491 ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633 addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 The buggy address belongs to the page: page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80 flags: 0x2fffc0000000000() raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80 raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff Fixes: 2b6867c2ce76 ("net/packet: fix overflow in check for priv area size") Fixes: dc808110bb62 ("packet: handle too big packets for PACKET_V3") Fixes: f6fb8f100b80 ("af-packet: TPACKET_V3 flexible buffer implementation.") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  net: metrics: add proper netlink validation  Eric Dumazet
[ Upstream commit 5b5e7a0de2bbf2a1afcd9f49e940010e9fb80d53 ] Before using nla_get_u32(), better make sure the attribute is of the proper size. Code recently was changed, but bug has been there from beginning of git. BUG: KMSAN: uninit-value in rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 CPU: 1 PID: 14139 Comm: syz-executor6 Not tainted 4.17.0-rc5+ #103 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:113 kmsan_report+0x149/0x260 mm/kmsan/kmsan.c:1084 __msan_warning_32+0x6e/0xc0 mm/kmsan/kmsan_instr.c:686 rtnetlink_put_metrics+0x553/0x960 net/core/rtnetlink.c:746 fib_dump_info+0xc42/0x2190 net/ipv4/fib_semantics.c:1361 rtmsg_fib+0x65f/0x8c0 net/ipv4/fib_semantics.c:419 fib_table_insert+0x2314/0x2b50 net/ipv4/fib_trie.c:1287 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x455a09 RSP: 002b:00007faae5fd8c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007faae5fd96d4 RCX: 0000000000455a09 RDX: 0000000000000000 RSI: 0000000020000000 RDI: 0000000000000013 RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 
R11: 0000000000000246 R12: 00000000ffffffff R13: 00000000000005d0 R14: 00000000006fdc20 R15: 0000000000000000 Uninit was stored to memory at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_save_stack mm/kmsan/kmsan.c:294 [inline] kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:685 __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:529 fib_convert_metrics net/ipv4/fib_semantics.c:1056 [inline] fib_create_info+0x2d46/0x9dc0 net/ipv4/fib_semantics.c:1150 fib_table_insert+0x3e4/0x2b50 net/ipv4/fib_trie.c:1146 inet_rtm_newroute+0x210/0x340 net/ipv4/fib_frontend.c:779 rtnetlink_rcv_msg+0xa32/0x1560 net/core/rtnetlink.c:4646 netlink_rcv_skb+0x378/0x600 net/netlink/af_netlink.c:2448 rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4664 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline] netlink_unicast+0x1678/0x1750 net/netlink/af_netlink.c:1336 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901 sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322 slab_post_alloc_hook mm/slab.h:446 [inline] slab_alloc_node mm/slub.c:2753 [inline] __kmalloc_node_track_caller+0xb32/0x11b0 mm/slub.c:4395 __kmalloc_reserve net/core/skbuff.c:138 [inline] __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206 alloc_skb include/linux/skbuff.h:988 [inline] netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline] netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876 
sock_sendmsg_nosec net/socket.c:629 [inline] sock_sendmsg net/socket.c:639 [inline] ___sys_sendmsg+0xec0/0x1310 net/socket.c:2117 __sys_sendmsg net/socket.c:2155 [inline] __do_sys_sendmsg net/socket.c:2164 [inline] __se_sys_sendmsg net/socket.c:2162 [inline] __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162 do_syscall_64+0x152/0x230 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: a919525ad832 ("net: Move fib_convert_metrics to metrics file") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
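The validation idea is to check the payload size before reading a fixed-width value from it. A minimal userspace sketch with a toy attribute type; struct toy_attr and toy_get_u32_checked() are hypothetical and do not reproduce the netlink API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy attribute: a length-prefixed payload, loosely modelled on netlink. */
struct toy_attr {
    uint16_t payload_len;
    unsigned char payload[8];
};

/* Read a u32 only if the payload is exactly u32-sized; returns 0 on success. */
static inline int toy_get_u32_checked(const struct toy_attr *attr, uint32_t *out)
{
    if (attr->payload_len != sizeof(uint32_t))
        return -1; /* too short or too long: refuse instead of reading garbage */
    memcpy(out, attr->payload, sizeof(*out));
    return 0;
}
```

A two-byte attribute is rejected and the output variable is left untouched, so no uninitialized bytes can reach the caller.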
2018-06-11  net: ipv4: add missing RTA_TABLE to rtm_ipv4_policy  Roopa Prabhu
[ Upstream commit 2eabd764cb5512f1338d06ffc054c8bc9fbe9104 ] Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  netdev-FAQ: clarify DaveM's position for stable backports  Cong Wang
[ Upstream commit 75d4e704fa8d2cf33ff295e5b441317603d7f9fd ] Per discussion with David at netconf 2018, let's clarify DaveM's position of handling stable backports in netdev-FAQ. This is important for people relying on upstream -stable releases. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  kcm: Fix use-after-free caused by clonned sockets  Kirill Tkhai
[ Upstream commit eb7f54b90bd8f469834c5e86dcf72ebf9a629811 ] (resend for properly queueing in patchwork) kcm_clone() creates a kernel socket, which does not take a net counter. Thus, the net may die before the socket is completely destroyed, i.e. kcm_exit_net() is executed before kcm_done(). Reported-by: syzbot+5f1a04e374a635efc426@syzkaller.appspotmail.com Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  isdn: eicon: fix a missing-check bug  Wenwen Wang
[ Upstream commit 6009d1fe6ba3bb2dab55921da60465329cc1cd89 ] In divasmain.c, the function divas_write() firstly invokes the function diva_xdi_open_adapter() to open the adapter that matches with the adapter number provided by the user, and then invokes the function diva_xdi_write() to perform the write operation using the matched adapter. The two functions diva_xdi_open_adapter() and diva_xdi_write() are located in diva.c. In diva_xdi_open_adapter(), the user command is copied to the object 'msg' from the userspace pointer 'src' through the function pointer 'cp_fn', which eventually calls copy_from_user() to do the copy. Then, the adapter number 'msg.adapter' is used to find out a matched adapter from the 'adapter_queue'. A matched adapter will be returned if it is found. Otherwise, NULL is returned to indicate the failure of the verification on the adapter number. As mentioned above, if a matched adapter is returned, the function diva_xdi_write() is invoked to perform the write operation. In this function, the user command is copied once again from the userspace pointer 'src', which is the same as the 'src' pointer in diva_xdi_open_adapter() as both of them are from the 'buf' pointer in divas_write(). Similarly, the copy is achieved through the function pointer 'cp_fn', which finally calls copy_from_user(). After the successful copy, the corresponding command processing handler of the matched adapter is invoked to perform the write operation. It is obvious that there are two copies here from userspace, one is in diva_xdi_open_adapter(), and one is in diva_xdi_write(). Plus, both of these two copies share the same source userspace pointer, i.e., the 'buf' pointer in divas_write(). Given that a malicious userspace process can race to change the content pointed by the 'buf' pointer, this can pose potential security issues. For example, in the first copy, the user provides a valid adapter number to pass the verification process and a valid adapter can be found. 
Then the user can modify the adapter number to an invalid number. This way, the user can bypass the verification process of the adapter number and inject inconsistent data. This patch reuses the data copied in diva_xdi_open_adapter() and passes it to diva_xdi_write(). This way, the above issues can be avoided. Signed-off-by: Wenwen Wang <wang6495@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
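The double-fetch pattern and its fix can be sketched in userspace. The types and functions below are hypothetical stand-ins: memcpy() plays the role of copy_from_user(), and the validity rule (adapters 1 to 4) is invented for the example.

```c
#include <assert.h>
#include <string.h>

struct toy_msg {
    int adapter;
    int command;
};

/* Hypothetical validity rule: adapters 1..4 exist. */
static inline int adapter_ok(int adapter)
{
    return adapter >= 1 && adapter <= 4;
}

/* Step 1: fetch the user buffer once and validate the snapshot.
 * Returns 0 on success, -1 if the adapter number is rejected. */
static inline int toy_open_adapter(const struct toy_msg *user, struct toy_msg *snap)
{
    memcpy(snap, user, sizeof(*snap)); /* stand-in for copy_from_user() */
    return adapter_ok(snap->adapter) ? 0 : -1;
}

/* Step 2: act on the snapshot and never read the user buffer again.
 * Returns the adapter number the write is applied to. */
static inline int toy_write(const struct toy_msg *snap)
{
    return snap->adapter;
}
```

If the caller changes the adapter number in its own buffer after step 1, step 2 still uses the validated value, because it only sees the snapshot.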
2018-06-11  ipv6: omit traffic class when calculating flow hash  Michal Kubecek
[ Upstream commit fa1be7e01ea863e911349e30456706749518eeab ] Some of the code paths calculating the flow hash for IPv6 use the flowlabel member of struct flowi6 which, despite its name, encodes both flow label and traffic class. If the traffic class changes within a TCP connection (as e.g. ssh does), the ECMP route can switch between paths. It's also inconsistent with other code paths where ip6_flowlabel() (returning only the flow label) is used to feed the key. Use only the flow label everywhere, including one place where the hash key is set using ip6_flowinfo(). Fixes: 51ebd3181572 ("ipv6: add support of equal cost multipath (ECMP)") Fixes: f70ea018da06 ("net: Add functions to get skb->hash based on flow structures") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
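The effect of including the traffic class in the key can be sketched with two masks. The first 32 bits of an IPv6 header hold a 4-bit version, an 8-bit traffic class and a 20-bit flow label; the sketch below uses host byte order for simplicity, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FLOWINFO_MASK  0x0FFFFFFFu /* traffic class (8 bits) + flow label (20 bits) */
#define FLOWLABEL_MASK 0x000FFFFFu /* flow label only */

/* Key material that ignores the traffic class. */
static inline uint32_t hash_key_label_only(uint32_t first_word)
{
    return first_word & FLOWLABEL_MASK;
}

/* Key material that changes whenever the traffic class changes. */
static inline uint32_t hash_key_with_tclass(uint32_t first_word)
{
    return first_word & FLOWINFO_MASK;
}
```

Two packets of the same connection with words 0x600ABCDE and 0x610ABCDE differ only in traffic class: the label-only key is 0xABCDE for both, while the key that includes the traffic class differs and could select a different path.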
2018-06-11  ipv4: remove warning in ip_recv_error  Willem de Bruijn
[ Upstream commit 730c54d59403658a62af6517338fa8d4922c1b28 ] A precondition check in ip_recv_error triggered on an otherwise benign race. Remove the warning. The warning triggers when passing an ipv6 socket to this ipv4 error handling function. RaceFuzzer was able to trigger it due to a race in setsockopt IPV6_ADDRFORM.
--- CPU0
do_ipv6_setsockopt
sk->sk_socket->ops = &inet_dgram_ops;
--- CPU1
sk->sk_prot->recvmsg
udp_recvmsg
ip_recv_error
WARN_ON_ONCE(sk->sk_family == AF_INET6);
--- CPU0
do_ipv6_setsockopt
sk->sk_family = PF_INET;
This socket option converts a v6 socket that is connected to a v4 peer to a v4 socket. It updates the socket on the fly, changing fields in sk as well as other structs. This is inherently non-atomic. It races with the lockless udp_recvmsg path. No other code makes an assumption that these fields are updated atomically. It is benign here, too, as ip_recv_error cares only about the protocol of the skbs enqueued on the error queue, for which sk_family is not a precise predictor (thanks to another issue with IPV6_ADDRFORM). Link: http://lkml.kernel.org/r/20180518120826.GA19515@dragonet.kaist.ac.kr Fixes: 7ce875e5ecb8 ("ipv4: warn once on passing AF_INET6 socket to ip_recv_error") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  ipmr: properly check rhltable_init() return value  Eric Dumazet
[ Upstream commit 66fb33254f45df4b049f487aff1cbde1ef919390 ]

commit 8fb472c09b9d ("ipmr: improve hash scalability") added a call to rhltable_init() without checking its return value.

This problem was then later copied to IPv6 and factorized in commit 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 31552 Comm: syz-executor7 Not tainted 4.17.0-rc5+ #60
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:rht_key_hashfn include/linux/rhashtable.h:277 [inline]
RIP: 0010:__rhashtable_lookup include/linux/rhashtable.h:630 [inline]
RIP: 0010:rhltable_lookup include/linux/rhashtable.h:716 [inline]
RIP: 0010:mr_mfc_find_parent+0x2ad/0xbb0 net/ipv4/ipmr_base.c:63
RSP: 0018:ffff8801826aef70 EFLAGS: 00010203
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffffc90001ea0000
RDX: 0000000000000079 RSI: ffffffff8661e859 RDI: 000000000000000c
RBP: ffff8801826af1c0 R08: ffff8801b2212000 R09: ffffed003b5e46c2
R10: ffffed003b5e46c2 R11: ffff8801daf23613 R12: dffffc0000000000
R13: ffff8801826af198 R14: ffff8801cf8225c0 R15: ffff8801826af658
FS: 00007ff7fa732700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000003ffffff9c CR3: 00000001b0210000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ip6mr_cache_find_parent net/ipv6/ip6mr.c:981 [inline]
 ip6mr_mfc_delete+0x1fe/0x6b0 net/ipv6/ip6mr.c:1221
 ip6_mroute_setsockopt+0x15c6/0x1d70 net/ipv6/ip6mr.c:1698
 do_ipv6_setsockopt.isra.9+0x422/0x4660 net/ipv6/ipv6_sockglue.c:163
 ipv6_setsockopt+0xbd/0x170 net/ipv6/ipv6_sockglue.c:922
 rawv6_setsockopt+0x59/0x140 net/ipv6/raw.c:1060
 sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:3039
 __sys_setsockopt+0x1bd/0x390 net/socket.c:1903
 __do_sys_setsockopt net/socket.c:1914 [inline]
 __se_sys_setsockopt net/socket.c:1911 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:1911
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: 8fb472c09b9d ("ipmr: improve hash scalability")
Fixes: 0bbbf0e7d0e7 ("ipmr, ip6mr: Unite creation of new mr_table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Cc: Yuval Mintz <yuvalm@mellanox.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
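The class of bug described in this commit (an init routine that can fail, called without looking at its return value) can be illustrated with a small userspace sketch. Everything below is hypothetical: `hash_sketch_init()`, `table_sketch_alloc()` and the struct names are stand-ins invented for illustration, not the kernel's rhltable or mr_table code. The point is only that the allocator checks the init result and never hands out a half-built table.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct hash_sketch {
	int *buckets;
	size_t nbuckets;
};

struct table_sketch {
	struct hash_sketch hash;
};

/* Like many init routines, this can fail and reports it via its return value. */
static int hash_sketch_init(struct hash_sketch *h, size_t nbuckets)
{
	if (nbuckets == 0)
		return -EINVAL;
	h->buckets = calloc(nbuckets, sizeof(*h->buckets));
	if (!h->buckets)
		return -ENOMEM;
	h->nbuckets = nbuckets;
	return 0;
}

/* Returns NULL and stores a negative errno in *err if any step fails. */
static struct table_sketch *table_sketch_alloc(size_t nbuckets, int *err)
{
	struct table_sketch *t;

	assert(err != NULL);
	t = calloc(1, sizeof(*t));
	if (!t) {
		*err = -ENOMEM;
		return NULL;
	}
	/* The check this kind of fix adds: bail out if the hash init failed. */
	*err = hash_sketch_init(&t->hash, nbuckets);
	if (*err) {
		free(t);
		return NULL;
	}
	return t;
}

/* Releases a table obtained from table_sketch_alloc(); NULL is accepted. */
static void table_sketch_free(struct table_sketch *t)
{
	if (!t)
		return;
	free(t->hash.buckets);
	free(t);
}
```

Without the check, a later lookup would operate on an uninitialized hash, which is consistent with the general protection fault in the lookup path shown in the trace above.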
2018-06-11  ip6_tunnel: remove magic mtu value 0xFFF8  Nicolas Dichtel
[ Upstream commit f7ff1fde9441b4fcc8ffb6e66e6e5a00d008937e ]

I don't know where this value comes from (probably a copy and paste and paste and paste ...). Let's use standard values which are a bit greater.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  ip6mr: only set ip6mr_table from setsockopt when ip6mr_new_table succeeds  Sabrina Dubroca
[ Upstream commit 848235edb5c93ed086700584c8ff64f6d7fc778d ]

Currently, raw6_sk(sk)->ip6mr_table is set unconditionally during ip6_mroute_setsockopt(MRT6_TABLE). A subsequent attempt at the same setsockopt will fail with -ENOENT, since we haven't actually created that table.

A similar fix for ipv4 was included in commit 5e1859fbcc3c ("ipv4: ipmr: various fixes and cleanups").

Fixes: d1db275dd3f6 ("ipv6: ip6mr: support multiple tables")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-06-11  enic: set DMA mask to 47 bit  Govindarajulu Varadarajan
[ Upstream commit 322eaa06d55ebc1402a4a8d140945cff536638b4 ]

In commit 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA") DMA mask was changed from 40 bits to 64 bits. Hardware actually supports only 47 bits.

Fixes: 624dbf55a359b ("driver/net: enic: Try DMA 64 first, then failover to DMA")
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
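To make the 47-bit limit concrete, here is a small userspace sketch of what an n-bit address mask means. The helper names `addr_mask_sketch()` and `addr_reachable_sketch()` are invented for illustration; the computation is analogous in spirit to the kernel's DMA_BIT_MASK() helper, but this is not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask with the low `bits` bits set. 64 is special-cased because
 * shifting a 64-bit value by 64 is undefined behaviour in C.
 */
static uint64_t addr_mask_sketch(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return bits == 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* True if a device limited to `bits` address bits can reach `addr`. */
static bool addr_reachable_sketch(uint64_t addr, unsigned int bits)
{
	return (addr & ~addr_mask_sketch(bits)) == 0;
}
```

Under these definitions a 47-bit mask is 0x7FFFFFFFFFFF, so an address with bit 47 set passes a 64-bit mask check but is out of reach for hardware that only implements 47 address bits.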
2018-06-11  dccp: don't free ccid2_hc_tx_sock struct in dccp_disconnect()  Alexey Kodanev
[ Upstream commit 2677d20677314101293e6da0094ede7b5526d2b1 ]

Syzbot reported a use-after-free in timer_is_static_object() [1].

This can happen because the structure for the rto timer (ccid2_hc_tx_sock) is removed in dccp_disconnect(), and ccid2_hc_tx_rto_expire() can be called after that.

The report [1] is similar to the one in commit 120e9dabaf55 ("dccp: defer ccid_hc_tx_delete() at dismantle time"). The fix is the same: delay freeing the ccid2_hc_tx_sock structure so that it is freed in dccp_sk_destruct().

[1]

==================================================================
BUG: KASAN: use-after-free in timer_is_static_object+0x80/0x90 kernel/time/timer.c:607
Read of size 8 at addr ffff8801bebb5118 by task syz-executor2/25299

CPU: 1 PID: 25299 Comm: syz-executor2 Not tainted 4.17.0-rc5+ #54
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 timer_is_static_object+0x80/0x90 kernel/time/timer.c:607
 debug_object_activate+0x2d9/0x670 lib/debugobjects.c:508
 debug_timer_activate kernel/time/timer.c:709 [inline]
 debug_activate kernel/time/timer.c:764 [inline]
 __mod_timer kernel/time/timer.c:1041 [inline]
 mod_timer+0x4d3/0x13b0 kernel/time/timer.c:1102
 sk_reset_timer+0x22/0x60 net/core/sock.c:2742
 ccid2_hc_tx_rto_expire+0x587/0x680 net/dccp/ccids/ccid2.c:147
 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
 invoke_softirq kernel/softirq.c:365 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:405
 exiting_irq arch/x86/include/asm/apic.h:525 [inline]
 smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 </IRQ>
...
Allocated by task 25374:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 ccid_new+0x25b/0x3e0 net/dccp/ccid.c:151
 dccp_hdlr_ccid+0x27/0x150 net/dccp/feat.c:44
 __dccp_feat_activate+0x184/0x270 net/dccp/feat.c:344
 dccp_feat_activate_values+0x3a7/0x819 net/dccp/feat.c:1538
 dccp_create_openreq_child+0x472/0x610 net/dccp/minisocks.c:128
 dccp_v4_request_recv_sock+0x12c/0xca0 net/dccp/ipv4.c:408
 dccp_v6_request_recv_sock+0x125d/0x1f10 net/dccp/ipv6.c:415
 dccp_check_req+0x455/0x6a0 net/dccp/minisocks.c:197
 dccp_v4_rcv+0x7b8/0x1f3f net/dccp/ipv4.c:841
 ip_local_deliver_finish+0x2e3/0xd80 net/ipv4/ip_input.c:215
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip_local_deliver+0x1e1/0x720 net/ipv4/ip_input.c:256
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x81b/0x2200 net/ipv4/ip_input.c:396
 NF_HOOK include/linux/netfilter.h:288 [inline]
 ip_rcv+0xb70/0x143d net/ipv4/ip_input.c:492
 __netif_receive_skb_core+0x26f5/0x3630 net/core/dev.c:4592
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4657
 process_backlog+0x219/0x760 net/core/dev.c:5337
 napi_poll net/core/dev.c:5735 [inline]
 net_rx_action+0x7b7/0x1930 net/core/dev.c:5801
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285

Freed by task 25374:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 ccid_hc_tx_delete+0xc3/0x100 net/dccp/ccid.c:190
 dccp_disconnect+0x130/0xc66 net/dccp/proto.c:286
 dccp_close+0x3bc/0xe60 net/dccp/proto.c:1045
 inet_release+0x104/0x1f0 net/ipv4/af_inet.c:427
 inet6_release+0x50/0x70 net/ipv6/af_inet6.c:460
 sock_release+0x96/0x1b0 net/socket.c:594
 sock_close+0x16/0x20 net/socket.c:1149
 __fput+0x34d/0x890 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 tracehook_notify_resume include/linux/tracehook.h:191 [inline]
 exit_to_usermode_loop+0x2bd/0x310 arch/x86/entry/common.c:166
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801bebb4cc0
 which belongs to the cache ccid2_hc_tx_sock of size 1240
The buggy address is located 1112 bytes inside of
 1240-byte region [ffff8801bebb4cc0, ffff8801bebb5198)
The buggy address belongs to the page:
page:ffffea0006faed00 count:1 mapcount:0 mapping:ffff8801bebb41c0
index:0xffff8801bebb5240 compound_mapcount: 0
flags: 0x2fffc0000008100(slab|head)
raw: 02fffc0000008100 ffff8801bebb41c0 ffff8801bebb5240 0000000100000003
raw: ffff8801cdba3138 ffffea0007634120 ffff8801cdbaab40 0000000000000000
page dumped because: kasan: bad access detected
...
==================================================================

Reported-by: syzbot+5d47e9ec91a6f15dbd6f@syzkaller.appspotmail.com
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
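The lifetime rule behind this fix (state used by a timer callback must outlive disconnect and be released only at final teardown) can be modelled with a minimal userspace sketch. The names `dsock_sketch`, `tx_state_sketch` and the four functions are hypothetical, and the sketch assumes that no timer can fire once the destruct step runs; it does not reproduce the DCCP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-connection transmit state touched by a timer callback. */
struct tx_state_sketch {
	int rto_fired;
};

struct dsock_sketch {
	struct tx_state_sketch *tx;
	bool connected;
};

/* Allocates the transmit state; returns 0 on success, -1 on failure. */
static int dsock_init(struct dsock_sketch *sk)
{
	sk->tx = calloc(1, sizeof(*sk->tx));
	if (!sk->tx)
		return -1;
	sk->connected = true;
	return 0;
}

/* Disconnect resets connection state but keeps tx allocated,
 * because a timer armed earlier may still fire afterwards. */
static void dsock_disconnect(struct dsock_sketch *sk)
{
	sk->connected = false;
}

/* Timer callback: safe after disconnect since tx is still alive. */
static void dsock_rto_expire(struct dsock_sketch *sk)
{
	assert(sk->tx != NULL);
	sk->tx->rto_fired++;
}

/* Final teardown: the only place where tx is released.
 * Assumption: timers have been stopped before this is called. */
static void dsock_destruct(struct dsock_sketch *sk)
{
	free(sk->tx);
	sk->tx = NULL;
}
```

Had `dsock_disconnect()` freed `tx`, the later `dsock_rto_expire()` call would touch freed memory, which is the pattern the KASAN report above flags.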