summaryrefslogtreecommitdiff
path: root/drivers/net
AgeCommit message (Collapse)Author
2023-11-23net: ipa: fix one GSI register field widthAlex Elder
The width of the R_LENGTH field of the EV_CH_E_CNTXT_1 GSI register is 24 bits (not 20 bits) starting with IPA v5.0. Fix this. Fixes: faf0678ec8a0 ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alex Elder <elder@linaro.org> Link: https://lore.kernel.org/r/20231122231708.896632-1-elder@linaro.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-23net: axienet: Fix check for partial TX checksumSamuel Holland
Due to a typo, the code checked the RX checksum feature in the TX path. Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Samuel Holland <samuel.holland@sifive.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Link: https://lore.kernel.org/r/20231122004219.3504219-1-samuel.holland@sifive.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-23i40e: Fix adding unsupported cloud filtersIvan Vecera
If a VF tries to add unsupported cloud filter through virtchnl then i40e_add_del_cloud_filter(_big_buf) returns -ENOTSUPP but this error code is stored in 'ret' instead of 'aq_ret' that is used as error code sent back to VF. In this scenario where one of the mentioned functions fails the value of 'aq_ret' is zero so the VF will incorrectly receive a 'success'. Use 'aq_ret' to store return value and remove 'ret' local variable. Additionally fix the issue when filter allocation fails, in this case no notification is sent back to the VF. Fixes: e284fc280473 ("i40e: Add and delete cloud filter") Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Rafal Romanowski <rafal.romanowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20231121211338.3348677-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-23ice: restore timestamp configuration after device resetJacob Keller
The driver calls ice_ptp_cfg_timestamp() during ice_ptp_prepare_for_reset() to disable timestamping while the device is resetting. This operation destroys the user requested configuration. While the driver does call ice_ptp_cfg_timestamp in ice_rebuild() to restore some hardware settings after a reset, it unconditionally passes true or false, resulting in failure to restore previous user space configuration. This results in a device reset forcibly disabling timestamp configuration regardless of current user settings. This was not detected previously due to a quirk of the LinuxPTP ptp4l application. If ptp4l detects a missing timestamp, it enters a fault state and performs recovery logic which includes executing SIOCSHWTSTAMP again, restoring the now accidentally cleared configuration. Not every application does this, and for these applications, timestamps will mysteriously stop after a PF reset, without being restored until an application restart. Fix this by replacing ice_ptp_cfg_timestamp() with two new functions: 1) ice_ptp_disable_timestamp_mode() which unconditionally disables the timestamping logic in ice_ptp_prepare_for_reset() and ice_ptp_release() 2) ice_ptp_restore_timestamp_mode() which calls ice_ptp_restore_tx_interrupt() to restore Tx timestamping configuration, calls ice_set_rx_tstamp() to restore Rx timestamping configuration, and issues an immediate TSYN_TX interrupt to ensure that timestamps which may have occurred during the device reset get processed. Modify the ice_ptp_set_timestamp_mode to directly save the user configuration and then call ice_ptp_restore_timestamp_mode. This way, reset no longer destroys the saved user configuration. This obsoletes the ice_set_tx_tstamp() function which can now be safely removed. With this change, all devices should now restore Tx and Rx timestamping functionality correctly after a PF reset without application intervention. Fixes: 77a781155a65 ("ice: enable receive hardware timestamping") Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23ice: unify logic for programming PFINT_TSYN_MSKJacob Keller
Commit d938a8cca88a ("ice: Auxbus devices & driver for E822 TS") modified how Tx timestamps are handled for E822 devices. On these devices, only the clock owner handles reading the Tx timestamp data from firmware. To do this, the PFINT_TSYN_MSK register is modified from the default value to one which enables reacting to a Tx timestamp on all PHY ports. The driver currently programs PFINT_TSYN_MSK in different places depending on whether the port is the clock owner or not. For the clock owner, the PFINT_TSYN_MSK value is programmed during ice_ptp_init_owner just before calling ice_ptp_tx_ena_intr to program the PHY ports. For the non-clock owner ports, the PFINT_TSYN_MSK is programmed during ice_ptp_init_port. If a large enough device reset occurs, the PFINT_TSYN_MSK register will be reset to the default value in which only the PHY associated directly with the PF will cause the Tx timestamp interrupt to trigger. The driver lacks logic to reprogram the PFINT_TSYN_MSK register after a device reset. For the E822 device, this results in the PF no longer responding to interrupts for other ports. This results in failure to deliver Tx timestamps to user space applications. Rename ice_ptp_configure_tx_tstamp to ice_ptp_cfg_tx_interrupt, and unify the logic for programming PFINT_TSYN_MSK and PFINT_OICR_ENA into one place. This function will program both registers according to the combination of user configuration and device requirements. This ensures that PFINT_TSYN_MSK is always restored when we configure the Tx timestamp interrupt. Fixes: d938a8cca88a ("ice: Auxbus devices & driver for E822 TS") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23ice: remove ptp_tx ring parameter flagJacob Keller
Before performing a Tx timestamp in ice_stamp(), the driver checks a ptp_tx ring variable to see if timestamping is enabled on that ring. This value is set for all rings whenever userspace configures Tx timestamping. Ostensibly this was done to avoid wasting cycles checking other fields when timestamping has not been enabled. However, for Tx timestamps we already get an individual per-SKB flag indicating whether userspace wants to request a timestamp on that packet. We do not gain much by also having a separate flag to check for whether timestamping was enabled. In fact, the driver currently fails to restore the field after a PF reset. Because of this, if a PF reset occurs, timestamps will be disabled. Since this flag doesn't add value in the hotpath, remove it and always provide a timestamp if the SKB flag has been set. A following change will fix the reset path to properly restore user timestamping configuration completely. This went unnoticed for some time because one of the most common applications using Tx timestamps, ptp4l, will reconfigure the socket as part of its fault recovery logic. Fixes: ea9b847cda64 ("ice: enable transmit timestamps for E810 devices") Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23amd-xgbe: propagate the correct speed and duplex statusRaju Rangoju
xgbe_get_link_ksettings() does not propagate correct speed and duplex information to ethtool during cable unplug. Due to which ethtool reports incorrect values for speed and duplex. Address this by propagating correct information. Fixes: 7c12aa08779c ("amd-xgbe: Move the PHY support into amd-xgbe") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23amd-xgbe: handle the corner-case during tx completionRaju Rangoju
The existing implementation uses software logic to accumulate tx completions until the specified time (1ms) is met and then poll them. However, there exists a tiny gap which leads to a race between resetting and checking the tx_activate flag. Due to this the tx completions are not reported to upper layer and tx queue timeout kicks-in restarting the device. To address this, introduce a tx cleanup mechanism as part of the periodic maintenance process. Fixes: c5aa9e3b8156 ("amd-xgbe: Initial AMD 10GbE platform driver") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23amd-xgbe: handle corner-case during sfp hotplugRaju Rangoju
Force the mode change for SFI in Fixed PHY configurations. Fixed PHY configurations needs PLL to be enabled while doing mode set. When the SFP module isn't connected during boot, driver assumes AN is ON and attempts auto-negotiation. However, if the connected SFP comes up in Fixed PHY configuration the link will not come up as PLL isn't enabled while the initial mode set command is issued. So, force the mode change for SFI in Fixed PHY configuration to fix link issues. Fixes: e57f7a3feaef ("amd-xgbe: Prepare for working with more than one type of phy") Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23net: veth: fix ethtool stats reportingLorenzo Bianconi
Fix a possible misalignment between page_pool stats and tx xdp_stats reported in veth_get_ethtool_stats routine. The issue can be reproduced configuring the veth pair with the following tx/rx queues: $ip link add v0 numtxqueues 2 numrxqueues 4 type veth peer name v1 \ numtxqueues 1 numrxqueues 1 and loading a simple XDP program on v0 that just returns XDP_PASS. In this case on v0 the page_pool stats overwrites tx xdp_stats for queue 1. Fix the issue incrementing pp_idx of dev->real_num_tx_queues * VETH_TQ_STATS_LEN since we always report xdp_stats for all tx queues in ethtool. Fixes: 4fc418053ec7 ("net: veth: add page_pool stats") Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/c5b5d0485016836448453f12846c7c4ab75b094a.1700593593.git.lorenzo@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23octeontx2-pf: Fix ntuple rule creation to direct packet to VF with higher Rx ↵Suman Ghosh
queue than its PF It is possible to add a ntuple rule which would like to direct packet to a VF whose number of queues are greater/less than its PF's queue numbers. For example a PF can have 2 Rx queues but a VF created on that PF can have 8 Rx queues. As of today, ntuple rule will reject rule because it is checking the requested queue number against PF's number of Rx queues. As a part of this fix if the action of a ntuple rule is to move a packet to a VF's queue then the check is removed. Also, a debug information is printed to aware user that it is user's responsibility to cross check if the requested queue number on that VF is a valid one. Fixes: f0a1913f8a6f ("octeontx2-pf: Add support for ethtool ntuple filters") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20231121165624.3664182-1-sumang@marvell.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-23net: usb: qmi_wwan: claim interface 4 for ZTE MF290Lech Perczak
Interface 4 is used by for QMI interface in stock firmware of MF28D, the router which uses MF290 modem. Rebind it to qmi_wwan after freeing it up from option driver. The proper configuration is: Interface mapping is: 0: QCDM, 1: (unknown), 2: AT (PCUI), 2: AT (Modem), 4: QMI T: Bus=01 Lev=02 Prnt=02 Port=00 Cnt=01 Dev#= 4 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=19d2 ProdID=0189 Rev= 0.00 S: Manufacturer=ZTE, Incorporated S: Product=ZTE LTE Technologies MSM C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms I:* If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=4ms Cc: Bjørn Mork <bjorn@mork.no> Signed-off-by: Lech Perczak <lech.perczak@gmail.com> Link: https://lore.kernel.org/r/20231117231918.100278-3-lech.perczak@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-22Revert "net: r8169: Disable multicast filter for RTL8168H and RTL8107E"Heiner Kallweit
This reverts commit efa5f1311c4998e9e6317c52bc5ee93b3a0f36df. I couldn't reproduce the reported issue. What I did, based on a pcap packet log provided by the reporter: - Used same chip version (RTL8168h) - Set MAC address to the one used on the reporters system - Replayed the EAPOL unicast packet that, according to the reporter, was filtered out by the mc filter. The packet was properly received. Therefore the root cause of the reported issue seems to be somewhere else. Disabling mc filtering completely for the most common chip version is a quite big hammer. Therefore revert the change and wait for further analysis results from the reporter. Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-21Merge tag 'for-netdev' of ↵Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2023-11-21 We've added 19 non-merge commits during the last 4 day(s) which contain a total of 18 files changed, 1043 insertions(+), 416 deletions(-). The main changes are: 1) Fix BPF verifier to validate callbacks as if they are called an unknown number of times in order to fix not detecting some unsafe programs, from Eduard Zingerman. 2) Fix bpf_redirect_peer() handling which missed proper stats accounting for veth and netkit and also generally fix missing stats for the latter, from Peilin Ye, Daniel Borkmann et al. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: check if max number of bpf_loop iterations is tracked bpf: keep track of max number of bpf_loop callback iterations selftests/bpf: test widening for iterating callbacks bpf: widening for callback iterators selftests/bpf: tests for iterating callbacks bpf: verify callbacks as if they are called unknown number of times bpf: extract setup_func_entry() utility function bpf: extract __check_reg_arg() utility function selftests/bpf: fix bpf_loop_bench for new callback verification scheme selftests/bpf: track string payload offset as scalar in strobemeta selftests/bpf: track tcp payload offset as scalar in xdp_synproxy selftests/bpf: Add netkit to tc_redirect selftest selftests/bpf: De-veth-ize the tc_redirect test case bpf, netkit: Add indirect call wrapper for fetching peer dev bpf: Fix dev's rx stats for bpf_redirect_peer traffic veth: Use tstats per-CPU traffic counters netkit: Add tstats per-CPU traffic counters net: Move {l,t,d}stats allocation to core and convert veth & vrf net, vrf: Move dstats structure to core ==================== Link: https://lore.kernel.org/r/20231121193113.11796-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21net: usb: ax88179_178a: fix failed operations during ax88179_resetJose Ignacio Tornos Martinez
Using generic ASIX Electronics Corp. AX88179 Gigabit Ethernet device, the following test cycle has been implemented: - power on - check logs - shutdown - after detecting the system shutdown, disconnect power - after approximately 60 seconds of sleep, power is restored Running some cycles, sometimes error logs like this appear: kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to write reg index 0x0001: -19 kernel: ax88179_178a 2-9:1.0 (unnamed net_device) (uninitialized): Failed to read reg index 0x0001: -19 ... These failed operation are happening during ax88179_reset execution, so the initialization could not be correct. In order to avoid this, we need to increase the delay after reset and clock initial operations. By using these larger values, many cycles have been run and no failed operations appear. It would be better to check some status register to verify when the operation has finished, but I do not have found any available information (neither in the public datasheets nor in the manufacturer's driver). The only available information for the necessary delays is the maufacturer's driver (original values) but the proposed values are not enough for the tested devices. Fixes: e2ca90c276e1f ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver") Reported-by: Herb Wei <weihao.bj@ieisystem.com> Tested-by: Herb Wei <weihao.bj@ieisystem.com> Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com> Link: https://lore.kernel.org/r/20231120120642.54334-1-jtornosm@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-21hv_netvsc: Mark VF as slave before exposing it to user-modeLong Li
When a VF is being exposed form the kernel, it should be marked as "slave" before exposing to the user-mode. The VF is not usable without netvsc running as master. The user-mode should never see a VF without the "slave" flag. This commit moves the code of setting the slave flag to the time before VF is exposed to user-mode. Cc: stable@vger.kernel.org Fixes: 0c195567a8f6 ("netvsc: transparent VF management") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21hv_netvsc: Fix race of register_netdevice_notifier and VF registerHaiyang Zhang
If VF NIC is registered earlier, NETDEV_REGISTER event is replayed, but NETDEV_POST_INIT is not. Move register_netdevice_notifier() earlier, so the call back function is set before probing. Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-21hv_netvsc: fix race of netvsc and VF register_netdeviceHaiyang Zhang
The rtnl lock also needs to be held before rndis_filter_device_add() which advertises nvsp_2_vsc_capability / sriov bit, and triggers VF NIC offering and registering. If VF NIC finished register_netdev() earlier it may cause name based config failure. To fix this issue, move the call to rtnl_lock() before rndis_filter_device_add(), so VF will be registered later than netvsc / synthetic NIC, and gets a name numbered (ethX) after netvsc. Cc: stable@vger.kernel.org Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()") Reported-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-20bpf, netkit: Add indirect call wrapper for fetching peer devDaniel Borkmann
ndo_get_peer_dev is used in tcx BPF fast path, therefore make use of indirect call wrapper and therefore optimize the bpf_redirect_peer() internal handling a bit. Add a small skb_get_peer_dev() wrapper which utilizes the INDIRECT_CALL_1() macro instead of open coding. Future work could potentially add a peer pointer directly into struct net_device in future and convert veth and netkit over to use it so that eventually ndo_get_peer_dev can be removed. Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20231114004220.6495-7-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-20veth: Use tstats per-CPU traffic countersPeilin Ye
Currently veth devices use the lstats per-CPU traffic counters, which only cover TX traffic. veth_get_stats64() actually populates RX stats of a veth device from its peer's TX counters, based on the assumption that a veth device can _only_ receive packets from its peer, which is no longer true: For example, recent CNIs (like Cilium) can use the bpf_redirect_peer() BPF helper to redirect traffic from NIC's tc ingress to veth's tc ingress (in a different netns), skipping veth's peer device. Unfortunately, this kind of traffic isn't currently accounted for in veth's RX stats. In preparation for the fix, use tstats (instead of lstats) to maintain both RX and TX counters for each veth device. We'll use RX counters for bpf_redirect_peer() traffic, and keep using TX counters for the usual "peer-to-peer" traffic. In veth_get_stats64(), calculate RX stats by _adding_ RX count to peer's TX count, in order to cover both kinds of traffic. veth_stats_rx() might need a name change (perhaps to "veth_stats_xdp()") for less confusion, but let's leave it to another patch to keep the fix minimal. Signed-off-by: Peilin Ye <peilin.ye@bytedance.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20231114004220.6495-5-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-20netkit: Add tstats per-CPU traffic countersDaniel Borkmann
Add dev->tstats traffic accounting to netkit. The latter contains per-CPU RX and TX counters. The dev's TX counters are bumped upon pass/unspec as well as redirect verdicts, in other words, on everything except for drops. The dev's RX counters are bumped upon successful __netif_rx(), as well as from skb_do_redirect() (not part of this commit here). Using dev->lstats with having just a single packets/bytes counter and inferring one another's RX counters from the peer dev's lstats is not possible given skb_do_redirect() can also bump the device's stats. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://lore.kernel.org/r/20231114004220.6495-4-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-20net: Move {l,t,d}stats allocation to core and convert veth & vrfDaniel Borkmann
Move {l,t,d}stats allocation to the core and let netdevs pick the stats type they need. That way the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc) - all happening in the core. Co-developed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231114004220.6495-3-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-20net, vrf: Move dstats structure to coreDaniel Borkmann
Just move struct pcpu_dstats out of the vrf into the core, and streamline the field names slightly, so they better align with the {t,l}stats ones. No functional change otherwise. A conversion of the u64s to u64_stats_t could be done at a separate point in future. This move is needed as we are moving the {t,l,d}stats allocation/freeing to the core. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Cc: Jakub Kicinski <kuba@kernel.org> Cc: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20231114004220.6495-2-daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-11-19octeontx2-pf: Fix memory leak during interface downSuman Ghosh
During 'ifconfig <netdev> down' one RSS memory was not getting freed. This patch fixes the same. Fixes: 81a4362016e7 ("octeontx2-pf: Add RSS multi group support") Signed-off-by: Suman Ghosh <sumang@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19wireguard: use DEV_STATS_INC()Eric Dumazet
wg_xmit() can be called concurrently, KCSAN reported [1] some device stats updates can be lost. Use DEV_STATS_INC() for this unlikely case. [1] BUG: KCSAN: data-race in wg_xmit / wg_xmit read-write to 0xffff888104239160 of 8 bytes by task 1375 on cpu 0: wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3543 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559 ... read-write to 0xffff888104239160 of 8 bytes by task 1378 on cpu 1: wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231 __netdev_start_xmit include/linux/netdevice.h:4918 [inline] netdev_start_xmit include/linux/netdevice.h:4932 [inline] xmit_one net/core/dev.c:3543 [inline] dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559 ... v2: also change wg_packet_consume_data_done() (Hangbin Liu) and wg_packet_purge_staged_packets() Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-19net: wangxun: fix kernel panic due to null pointerJiawen Wu
When the device uses a custom subsystem vendor ID, the function wx_sw_init() returns before the memory of 'wx->mac_table' is allocated. The null pointer will causes the kernel panic. Fixes: 79625f45ca73 ("net: wangxun: Move MAC address handling to libwx") Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16stmmac: dwmac-loongson: Add architecture dependencyJean Delvare
Only present the DWMAC_LOONGSON option on architectures where it can actually be used. This follows the same logic as the DWMAC_INTEL option. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Keguang Zhang <keguang.zhang@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16usb: aqc111: check packet for fixup for true limitOliver Neukum
If a device sends a packet that is inbetween 0 and sizeof(u64) the value passed to skb_trim() as length will wrap around ending up as some very large value. The driver will then proceed to parse the header located at that position, which will either oops or process some random value. The fix is to check against sizeof(u64) rather than 0, which the driver currently does. The issue exists since the introduction of the driver. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-11-16Merge tag 'net-6.7-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from BPF and netfilter. Current release - regressions: - core: fix undefined behavior in netdev name allocation - bpf: do not allocate percpu memory at init stage - netfilter: nf_tables: split async and sync catchall in two functions - mptcp: fix possible NULL pointer dereference on close Current release - new code bugs: - eth: ice: dpll: fix initial lock status of dpll Previous releases - regressions: - bpf: fix precision backtracking instruction iteration - af_unix: fix use-after-free in unix_stream_read_actor() - tipc: fix kernel-infoleak due to uninitialized TLV value - eth: bonding: stop the device in bond_setup_by_slave() - eth: mlx5: - fix double free of encap_header - avoid referencing skb after free-ing in drop path - eth: hns3: fix VF reset - eth: mvneta: fix calls to page_pool_get_stats Previous releases - always broken: - core: set SOCK_RCU_FREE before inserting socket into hashtable - bpf: fix control-flow graph checking in privileged mode - eth: ppp: limit MRU to 64K - eth: stmmac: avoid rx queue overrun - eth: icssg-prueth: fix error cleanup on failing initialization - eth: hns3: fix out-of-bounds access may occur when coalesce info is read via debugfs - eth: cortina: handle large frames Misc: - selftests: gso: support CONFIG_MAX_SKB_FRAGS up to 45" * tag 'net-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (78 commits) macvlan: Don't propagate promisc change to lower dev in passthru net: sched: do not offload flows with a helper in act_ct net/mlx5e: Check return value of snprintf writing to fw_version buffer for representors net/mlx5e: Check return value of snprintf writing to fw_version buffer net/mlx5e: Reduce the size of icosq_str net/mlx5: Increase size of irq name buffer net/mlx5e: Update doorbell for port timestamping CQ before the software counter net/mlx5e: Track xmit submission to PTP WQ after populating metadata map net/mlx5e: Avoid referencing skb after free-ing in drop path of mlx5e_sq_xmit_wqe net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offload net/mlx5e: Fix pedit endianness net/mlx5e: fix double free of encap_header in update funcs net/mlx5e: fix double free of encap_header net/mlx5: Decouple PHC .adjtime and .adjphase implementations net/mlx5: DR, Allow old devices to use multi destination FTE net/mlx5: Free used cpus mask when an IRQ is released Revert "net/mlx5: DR, Supporting inline WQE when possible" bpf: Do not allocate percpu memory at init stage net: Fix undefined behavior in netdev name allocation dt-bindings: net: ethernet-controller: Fix formatting error ...
2023-11-16macvlan: Don't propagate promisc change to lower dev in passthruVlad Buslov
Macvlan device in passthru mode sets its lower device promiscuous mode according to its MACVLAN_FLAG_NOPROMISC flag instead of synchronizing it to its own promiscuity setting. However, macvlan_change_rx_flags() function doesn't check the mode before propagating such changes to the lower device which can cause net_device->promiscuity counter overflow as illustrated by reproduction example [0] and resulting dmesg log [1]. Fix the issue by first verifying the mode in macvlan_change_rx_flags() function before propagating promiscuous mode change to the lower device. [0]: ip link add macvlan1 link enp8s0f0 type macvlan mode passthru ip link set macvlan1 promisc on ip l set dev macvlan1 up ip link set macvlan1 promisc off ip l set dev macvlan1 down ip l set dev macvlan1 up [1]: [ 5156.281724] macvlan1: entered promiscuous mode [ 5156.285467] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode [ 5156.287639] macvlan1: left promiscuous mode [ 5156.288339] mlx5_core 0000:08:00.0 enp8s0f0: left promiscuous mode [ 5156.290907] mlx5_core 0000:08:00.0 enp8s0f0: entered promiscuous mode [ 5156.317197] mlx5_core 0000:08:00.0 enp8s0f0: promiscuity touches roof, set promiscuity failed. promiscuity feature of device might be broken. Fixes: efdbd2b30caa ("macvlan: Propagate promiscuity setting to lower devices.") Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20231114175915.1649154-1-vladbu@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-11-15net/mlx5e: Check return value of snprintf writing to fw_version buffer for ↵Rahul Rameshbabu
representors Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. This logic for representors is identical to normal flow with ethtool. Without check, will trigger -Wformat-truncation with W=1. drivers/net/ethernet/mellanox/mlx5/core/en_rep.c: In function 'mlx5e_rep_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:78:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 78 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_rep.c:77:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 77 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 78 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 79 | fw_rev_maj(mdev), fw_rev_min(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 80 | fw_rev_sub(mdev), mdev->board_id); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: cf83c8fdcd47 ("net/mlx5e: Add missing ethtool driver info for representors") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-16-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Check return value of snprintf writing to fw_version bufferRahul Rameshbabu
Treat the operation as an error case when the return value is equivalent to the size of the name buffer. Failed to write null terminator to the name buffer, making the string malformed and should not be used. Provide a string with only the firmware version when forming the string with the board id fails. Without check, will trigger -Wformat-truncation with W=1. drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c: In function 'mlx5e_ethtool_get_drvinfo': drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:49:31: warning: '%.16s' directive output may be truncated writing up to 16 bytes into a region of size between 13 and 22 [-Wformat-truncation=] 49 | "%d.%d.%04d (%.16s)", | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c:48:9: note: 'snprintf' output between 12 and 37 bytes into a destination of size 32 48 | snprintf(drvinfo->fw_version, sizeof(drvinfo->fw_version), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 49 | "%d.%d.%04d (%.16s)", | ~~~~~~~~~~~~~~~~~~~~~ 50 | fw_rev_maj(mdev), fw_rev_min(mdev), fw_rev_sub(mdev), | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 51 | mdev->board_id); | ~~~~~~~~~~~~~~~ Fixes: 84e11edb71de ("net/mlx5e: Show board id in ethtool driver information") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Reduce the size of icosq_strSaeed Mahameed
icosq_str size is unnecessarily too long, and it causes a build warning -Wformat-truncation with W=1. Looking closely, It doesn't need to be 255B, hence this patch reduces the size to 32B which should be more than enough to host the string: "ICOSQ: 0x%x, ". While here, add a missing space in the formatted string. This fixes the following build warning: $ KCFLAGS='-Wall -Werror' $ make O=/tmp/kbuild/linux W=1 -s -j12 drivers/net/ethernet/mellanox/mlx5/core/ drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c: In function 'mlx5e_reporter_rx_timeout': drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:718:56: error: ', CQ: 0x' directive output may be truncated writing 8 bytes into a region of size between 0 and 255 [-Werror=format-truncation=] 718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x", | ^~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en/reporter_rx.c:717:9: note: 'snprintf' output between 43 and 322 bytes into a destination of size 288 717 | snprintf(err_str, sizeof(err_str), | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 718 | "RX timeout on channel: %d, %sRQ: 0x%x, CQ: 0x%x", | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 719 | rq->ix, icosq_str, rq->rqn, rq->cq.mcq.cqn); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: 521f31af004a ("net/mlx5e: Allow RQ outside of channel context") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-14-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Increase size of irq name bufferRahul Rameshbabu
Without increased buffer size, will trigger -Wformat-truncation with W=1 for the snprintf operation writing to the buffer. drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c: In function 'mlx5_irq_alloc': drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:296:7: error: '@pci:' directive output may be truncated writing 5 bytes into a region of size between 1 and 32 [-Werror=format-truncation=] 296 | "%s@pci:%s", name, pci_name(dev->pdev)); | ^~~~~ drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c:295:2: note: 'snprintf' output 6 or more bytes (assuming 37) into a destination of size 32 295 | snprintf(irq->name, MLX5_MAX_IRQ_NAME, | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 296 | "%s@pci:%s", name, pci_name(dev->pdev)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fixes: ada9f5d00797 ("IB/mlx5: Fix eq names to display nicely in /proc/interrupts") Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6d4ab2e97dcfbcd748ae71761a9d8e5e41cc732c Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-13-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Update doorbell for port timestamping CQ before the software counterRahul Rameshbabu
Previously, mlx5e_ptp_poll_ts_cq would update the device doorbell with the incremented consumer index after the relevant software counters in the kernel were updated. In the mlx5e_sq_xmit_wqe context, this would lead to either overrunning the device CQ or exceeding the expected software buffer size in the device CQ if the device CQ size was greater than the software buffer size. Update the relevant software counter only after updating the device CQ consumer index in the port timestamping napi_poll context. Log: mlx5_core 0000:08:00.0: cq_err_event_notifier:517:(pid 0): CQ error on CQN 0x487, syndrome 0x1 mlx5_core 0000:08:00.0 eth2: mlx5e_cq_error_event: cqn=0x000487 event=0x04 Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-12-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Track xmit submission to PTP WQ after populating metadata mapRahul Rameshbabu
Ensure the skb is available in metadata mapping to skbs before tracking the metadata index for detecting undelivered CQEs. If the metadata index is put in the tracking list before putting the skb in the map, the metadata index might be used for detecting undelivered CQEs before the relevant skb is available in the map, which can lead to a null-ptr-deref. Log: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] CPU: 0 PID: 1243 Comm: kworker/0:2 Not tainted 6.6.0-rc4+ #108 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: events mlx5e_rx_dim_work [mlx5_core] RIP: 0010:mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core] Code: 8c 24 38 cc ff ff 4c 8d 3c c1 4c 89 f9 48 c1 e9 03 42 80 3c 31 00 0f 85 97 0f 00 00 4d 8b 3f 49 8d 7f 28 48 89 f9 48 c1 e9 03 <42> 80 3c 31 00 0f 85 8b 0f 00 00 49 8b 47 28 48 85 c0 0f 84 05 07 RSP: 0018:ffff8884d3c09c88 EFLAGS: 00010206 RAX: 0000000000000069 RBX: ffff8881160349d8 RCX: 0000000000000005 RDX: ffffed10218f48cf RSI: 0000000000000004 RDI: 0000000000000028 RBP: ffff888122707700 R08: 0000000000000001 R09: ffffed109a781383 R10: 0000000000000003 R11: 0000000000000003 R12: ffff88810c7a7a40 R13: ffff888122707700 R14: dffffc0000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8884d3c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f4f878dd6e0 CR3: 000000014d108002 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? die_addr+0x3c/0xa0 ? exc_general_protection+0x144/0x210 ? asm_exc_general_protection+0x22/0x30 ? mlx5e_ptp_napi_poll+0x9a4/0x2290 [mlx5_core] ? mlx5e_ptp_napi_poll+0x8f6/0x2290 [mlx5_core] __napi_poll.constprop.0+0xa4/0x580 net_rx_action+0x460/0xb80 ? _raw_spin_unlock_irqrestore+0x32/0x60 ? __napi_poll.constprop.0+0x580/0x580 ? tasklet_action_common.isra.0+0x2ef/0x760 __do_softirq+0x26c/0x827 irq_exit_rcu+0xc2/0x100 common_interrupt+0x7f/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:__kmem_cache_alloc_node+0xb/0x330 Code: 41 5d 41 5e 41 5f c3 8b 44 24 14 8b 4c 24 10 09 c8 eb d5 e8 b7 43 ca 01 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 <41> 56 41 89 d6 41 55 41 89 f5 41 54 49 89 fc 53 48 83 e4 f0 48 83 RSP: 0018:ffff88812c4079c0 EFLAGS: 00000246 RAX: 1ffffffff083c7fe RBX: ffff888100042dc0 RCX: 0000000000000218 RDX: 00000000ffffffff RSI: 0000000000000dc0 RDI: ffff888100042dc0 RBP: ffff88812c4079c8 R08: ffffffffa0289f96 R09: ffffed1025880ea9 R10: ffff888138839f80 R11: 0000000000000002 R12: 0000000000000dc0 R13: 0000000000000100 R14: 000000000000008c R15: ffff8881271fc450 ? cmd_exec+0x796/0x2200 [mlx5_core] kmalloc_trace+0x26/0xc0 cmd_exec+0x796/0x2200 [mlx5_core] mlx5_cmd_do+0x22/0xc0 [mlx5_core] mlx5_cmd_exec+0x17/0x30 [mlx5_core] mlx5_core_modify_cq_moderation+0x139/0x1b0 [mlx5_core] ? mlx5_add_cq_to_tasklet+0x280/0x280 [mlx5_core] ? lockdep_set_lock_cmp_fn+0x190/0x190 ? process_one_work+0x659/0x1220 mlx5e_rx_dim_work+0x9d/0x100 [mlx5_core] process_one_work+0x730/0x1220 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? max_active_store+0xf0/0xf0 ? assign_work+0x168/0x240 worker_thread+0x70f/0x12d0 ? __kthread_parkme+0xd1/0x1d0 ? process_one_work+0x1220/0x1220 kthread+0x2d9/0x3b0 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x2d/0x70 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Modules linked in: xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay mlx5_ib ib_uverbs ib_core zram zsmalloc mlx5_core fuse ---[ end trace 0000000000000000 ]--- Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-11-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Avoid referencing skb after free-ing in drop path of ↵Rahul Rameshbabu
mlx5e_sq_xmit_wqe When SQ is a port timestamping SQ for PTP, do not access tx flags of skb after free-ing the skb. Free the skb only after all references that depend on it have been handled in the dropped WQE path. Fixes: 3178308ad4ca ("net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-10-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Don't modify the peer sent-to-vport rules for IPSec offloadJianbo Liu
As IPSec packet offload in switchdev mode is not supported with LAG, it's unnecessary to modify those sent-to-vport rules to the peer eswitch. Fixes: c6c2bf5db4ea ("net/mlx5e: Support IPsec packet offload for TX in switchdev mode") Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-9-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: Fix pedit endiannessVlad Buslov
Referenced commit addressed endianness issue in mlx5 pedit implementation in ad hoc manner instead of systematically treating integer values according to their types which left pedit fields of sizes not equal to 4 and where the bytes being modified are not least significant ones broken on big endian machines since wrong bits will be consumed during parsing which leads to following example error when applying pedit to source and destination MAC addresses: [Wed Oct 18 12:52:42 2023] mlx5_core 0001:00:00.1 p1v3_r: attempt to offload an unsupported field (cmd 0) [Wed Oct 18 12:52:42 2023] mask: 00000000330c5b68: 00 00 00 00 ff ff 00 00 00 00 ff ff 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 0000000017d22fd9: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 000000008186d717: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 0000000029eb6149: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 000000007ed103e4: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 00000000db8101a6: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ [Wed Oct 18 12:52:42 2023] mask: 00000000ec3c08a9: 00 00 00 00 00 00 00 00 00 00 00 00 ............ Treat masks and values of pedit and filter match as network byte order, refactor pointers to them to void pointers instead of confusing u32 pointers and only cast to pointer-to-integer when reading a value from them. Treat pedit mlx5_fields->field_mask as host byte order according to its type u32, change the constants in fields array accordingly. Fixes: 82198d8bcdef ("net/mlx5e: Fix endianness when calculating pedit mask first bit") Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-8-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: fix double free of encap_header in update funcsGavin Li
Follow up to the previous patch to fix the same issue for mlx5e_tc_tun_update_header_ipv4{6} when mlx5_packet_reformat_alloc() fails. When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_update_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue. mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header) This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success. Fixes: a54e20b4fcae ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Gavin Li <gavinl@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-7-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5e: fix double free of encap_headerDust Li
When mlx5_packet_reformat_alloc() fails, the encap_header allocated in mlx5e_tc_tun_create_header_ipv4{6} will be released within it. However, e->encap_header is already set to the previously freed encap_header before mlx5_packet_reformat_alloc(). As a result, the later mlx5e_encap_put() will free e->encap_header again, causing a double free issue. mlx5e_encap_put() --> mlx5e_encap_dealloc() --> kfree(e->encap_header) This happens when cmd: MLX5_CMD_OP_ALLOC_PACKET_REFORMAT_CONTEXT fail. This patch fix it by not setting e->encap_header until mlx5_packet_reformat_alloc() success. Fixes: d589e785baf5e ("net/mlx5e: Allow concurrent creation of encap entries") Reported-by: Cruz Zhao <cruzzhao@linux.alibaba.com> Reported-by: Tianchen Ding <dtcccc@linux.alibaba.com> Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Decouple PHC .adjtime and .adjphase implementationsRahul Rameshbabu
When running a phase adjustment operation, the free running clock should not be modified at all. The phase control keyword is intended to trigger an internal servo on the device that will converge to the provided delta. A free running counter cannot implement phase adjustment. Fixes: 8e11a68e2e8a ("net/mlx5: Add adjphase function to support hardware-only offset control") Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-5-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: DR, Allow old devices to use multi destination FTEErez Shitrit
The current check isn't aware of old devices that don't have the relevant FW capability. This patch allows multi destination FTE in old cards, as it was before this check. Fixes: f6f46e7173cb ("net/mlx5: DR, Add check for multi destination FTE") Signed-off-by: Erez Shitrit <erezsh@nvidia.com> Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-4-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15net/mlx5: Free used cpus mask when an IRQ is releasedMaher Sanalla
Each EQ table maintains a cpumask of the already used CPUs that are mapped to IRQs to ensure that each IRQ gets mapped to a unique CPU. However, on IRQ release, the said cpumask is not updated by clearing the CPU from the mask to allow future IRQ request, causing the following error when a SF is reloaded after it has utilized all CPUs for its IRQs: mlx5_irq_affinity_request:135:(pid 306010): Didn't find a matching IRQ. err = -28 Thus, when releasing an IRQ, clear its mapped CPU from the used CPUs mask, to prevent the case described above. While at it, move the used cpumask update to the EQ layer as it is more fitting and preserves symmetricity of the IRQ request/release API. Fixes: a1772de78d73 ("net/mlx5: Refactor completion IRQ request/release API") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-3-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-15Revert "net/mlx5: DR, Supporting inline WQE when possible"Itamar Gozlan
This reverts commit 95c337cce0e11d06a715da73e6796ade9216637f. The revert is required due to the suspicion it cause some tests fail and will be moved to further investigation. Fixes: 95c337cce0e1 ("net/mlx5: DR, Supporting inline WQE when possible") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20231114215846.5902-2-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-14gve: Fixes for napi_poll when budget is 0Ziwei Xiao
Netpoll will explicilty pass the polling call with a budget of 0 to indicate it's clearing the Tx path only. For the gve_rx_poll and gve_xdp_poll, they were mistakenly taking the 0 budget as the indication to do all the work. Add check to avoid the rx path and xdp path being called when budget is 0. And also avoid napi_complete_done being called when budget is 0 for netpoll. Fixes: f5cedc84a30d ("gve: Add transmit and receive support") Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Link: https://lore.kernel.org/r/20231114004144.2022268-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-14Merge branch '100GbE' of ↵Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2023-11-13 (ice) This series contains updates to ice driver only. Arkadiusz ensures the device is initialized with valid lock status value. He also removes range checking of dpll priority to allow firmware to process the request; supported values are firmware dependent. Finally, he removes setting of can change capability for pins that cannot be changed. Dan restores ability to load a package which doesn't contain a signature segment. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: fix DDP package download for packages without signature segment ice: dpll: fix output pin capabilities ice: dpll: fix check for dpll input priority range ice: dpll: fix initial lock status of dpll ==================== Link: https://lore.kernel.org/r/20231113230551.548489-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-14pds_core: fix up some format-truncation complaintsShannon Nelson
Our friendly kernel test robot pointed out a couple of potential string truncation issues. None of which were we worried about, but can be relatively easily fixed to quiet the complaints. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310211736.66syyDpp-lkp@intel.com/ Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20231113183257.71110-3-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-14pds_core: use correct index to mask irqShannon Nelson
Use the qcq's interrupt index, not the irq number, to mask the interrupt. Since the irq number can be out of range from the number of possible interrupts, we can end up accessing and potentially scribbling on out-of-range and/or unmapped memory, making the kernel angry. [ 3116.039364] BUG: unable to handle page fault for address: ffffbeea1c3edf84 [ 3116.047059] #PF: supervisor write access in kernel mode [ 3116.052895] #PF: error_code(0x0002) - not-present page [ 3116.058636] PGD 100000067 P4D 100000067 PUD 1001f2067 PMD 10f82e067 PTE 0 [ 3116.066221] Oops: 0002 [#1] SMP NOPTI [ 3116.092948] RIP: 0010:iowrite32+0x9/0x76 [ 3116.190452] Call Trace: [ 3116.193185] <IRQ> [ 3116.195430] ? show_trace_log_lvl+0x1d6/0x2f9 [ 3116.200298] ? show_trace_log_lvl+0x1d6/0x2f9 [ 3116.205166] ? pdsc_adminq_isr+0x43/0x55 [pds_core] [ 3116.210618] ? __die_body.cold+0x8/0xa [ 3116.214806] ? page_fault_oops+0x16d/0x1ac [ 3116.219382] ? exc_page_fault+0xbe/0x13b [ 3116.223764] ? asm_exc_page_fault+0x22/0x27 [ 3116.228440] ? iowrite32+0x9/0x76 [ 3116.232143] pdsc_adminq_isr+0x43/0x55 [pds_core] [ 3116.237627] __handle_irq_event_percpu+0x3a/0x184 [ 3116.243088] handle_irq_event+0x57/0xb0 [ 3116.247575] handle_edge_irq+0x87/0x225 [ 3116.252062] __common_interrupt+0x3e/0xbc [ 3116.256740] common_interrupt+0x7b/0x98 [ 3116.261216] </IRQ> [ 3116.263745] <TASK> [ 3116.266268] asm_common_interrupt+0x22/0x27 Reported-by: Joao Martins <joao.m.martins@oracle.com> Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20231113183257.71110-2-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-11-14tg3: Increment tx_dropped in tg3_tso_bug()Alex Pakhunov
tg3_tso_bug() drops a packet if it cannot be segmented for any reason. The number of discarded frames should be incremented accordingly. Signed-off-by: Alex Pakhunov <alexey.pakhunov@spacex.com> Signed-off-by: Vincent Wong <vincent.wong2@spacex.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Link: https://lore.kernel.org/r/20231113182350.37472-2-alexey.pakhunov@spacex.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>