aboutsummaryrefslogtreecommitdiff
path: root/net/dsa/dsa_priv.h
AgeCommit message (Collapse)Author
2022-09-30net: dsa: don't leave dangling pointers in dp->pl when failingVladimir Oltean
There is a desire to simplify the dsa_port registration path with devlink, and this involves reworking a bit how user ports which fail to connect to their PHY (because it's missing) get reinitialized as UNUSED devlink ports. The desire is for the change to look something like this; basically dsa_port_setup() has failed, we just change dp->type and call dsa_port_setup() again. -/* Destroy the current devlink port, and create a new one which has the UNUSED - * flavour. - */ -static int dsa_port_reinit_as_unused(struct dsa_port *dp) +static int dsa_port_setup_as_unused(struct dsa_port *dp) { - dsa_port_devlink_teardown(dp); dp->type = DSA_PORT_TYPE_UNUSED; - return dsa_port_devlink_setup(dp); + return dsa_port_setup(dp); } For an UNUSED port, dsa_port_setup() mostly only calls dsa_port_devlink_setup() anyway, so we could get away with calling just that. But if we call the full blown dsa_port_setup(dp) (which will be needed to properly set dp->setup = true), the callee will have the tendency to go through this code block too, and call dsa_port_disable(dp): switch (dp->type) { case DSA_PORT_TYPE_UNUSED: dsa_port_disable(dp); break; That is not very good, because dsa_port_disable() has this hidden inside of it: if (dp->pl) phylink_stop(dp->pl); Fact is, we are not prepared to handle a call to dsa_port_disable() with a struct dsa_port that came from a previous (and failed) call to dsa_port_setup(). We do not clean up dp->pl, and this will make the second call to dsa_port_setup() call phylink_stop() on a dangling dp->pl pointer. Solve this by creating an API for phylink destruction which is symmetric to the phylink creation, and never leave dp->pl set to anything except NULL or a valid phylink structure. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-20net: dsa: allow masters to join a LAGVladimir Oltean
There are 2 ways in which a DSA user port may become handled by 2 CPU ports in a LAG: (1) its current DSA master joins a LAG ip link del bond0 && ip link add bond0 type bond mode 802.3ad ip link set eno2 master bond0 When this happens, all user ports with "eno2" as DSA master get automatically migrated to "bond0" as DSA master. (2) it is explicitly configured as such by the user # Before, the DSA master was eno3 ip link set swp0 type dsa master bond0 The design of this configuration is that the LAG device dynamically becomes a DSA master through dsa_master_setup() when the first physical DSA master becomes a LAG slave, and stops being so through dsa_master_teardown() when the last physical DSA master leaves. A LAG interface is considered as a valid DSA master only if it contains existing DSA masters, and no other lower interfaces. Therefore, we mainly rely on method (1) to enter this configuration. Each physical DSA master (LAG slave) retains its dev->dsa_ptr for when it becomes a standalone DSA master again. But the LAG master also has a dev->dsa_ptr, and this is actually duplicated from one of the physical LAG slaves, and therefore needs to be balanced when LAG slaves come and go. To the switch driver, putting DSA masters in a LAG is seen as putting their associated CPU ports in a LAG. We need to prepare cross-chip host FDB notifiers for CPU ports in a LAG, by calling the driver's ->lag_fdb_add method rather than ->port_fdb_add. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: propagate extack to port_lag_joinVladimir Oltean
Drivers could refuse to offload a LAG configuration for a variety of reasons, mainly having to do with its TX type. Additionally, since DSA masters may now also be LAG interfaces, and this will translate into a call to port_lag_join on the CPU ports, there may be extra restrictions there. Propagate the netlink extack to this DSA method in order for drivers to give a meaningful error message back to the user. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: allow the DSA master to be seen and changed through rtnetlinkVladimir Oltean
Some DSA switches have multiple CPU ports, which can be used to improve CPU termination throughput, but DSA, through dsa_tree_setup_cpu_ports(), sets up only the first one, leading to suboptimal use of hardware. The desire is to not change the default configuration but to permit the user to create a dynamic mapping between individual user ports and the CPU port that they are served by, configurable through rtnetlink. It is also intended to permit load balancing between CPU ports, and in that case, the foreseen model is for the DSA master to be a bonding interface whose lowers are the physical DSA masters. To that end, we create a struct rtnl_link_ops for DSA user ports with the "dsa" kind. We expose the IFLA_DSA_MASTER link attribute that contains the ifindex of the newly desired DSA master. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-09-20net: dsa: introduce dsa_port_get_master()Vladimir Oltean
There is a desire to support for DSA masters in a LAG. That configuration is intended to work by simply enslaving the master to a bonding/team device. But the physical DSA master (the LAG slave) still has a dev->dsa_ptr, and that cpu_dp still corresponds to the physical CPU port. However, we would like to be able to retrieve the LAG that's the upper of the physical DSA master. In preparation for that, introduce a helper called dsa_port_get_master() that replaces all occurrences of the dp->cpu_dp->master pattern. The distinction between LAG and non-LAG will be made later within the helper itself. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-23net: dsa: all DSA masters must be down when changing the tagging protocolVladimir Oltean
The fact that the tagging protocol is set and queried from the /sys/class/net/<dsa-master>/dsa/tagging file is a bit of a quirk from the single CPU port days which isn't aging very well now that DSA can have more than a single CPU port. This is because the tagging protocol is a switch property, yet in the presence of multiple CPU ports it can be queried and set from multiple sysfs files, all of which are handled by the same implementation. The current logic ensures that the net device whose sysfs file we're changing the tagging protocol through must be down. That net device is the DSA master, and this is fine for single DSA master / CPU port setups. But exactly because the tagging protocol is per switch [ tree, in fact ] and not per DSA master, this isn't fine any longer with multiple CPU ports, and we must iterate through the tree and find all DSA masters, and make sure that all of them are down. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-08-22net: dsa: rename dsa_port_link_{,un}register_ofVladimir Oltean
There is a subset of functions that applies only to shared (DSA and CPU) ports, yet this is difficult to comprehend by looking at their code alone. These are dsa_port_link_register_of(), dsa_port_link_unregister_of(), and the functions that only these 2 call. Rename this class of functions to dsa_shared_port_* to make this fact more evident, even if this goes against the apparent convention that function names in port.c must start with dsa_port_. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-12net: dsa: felix: manage host flooding using a specific driver callbackVladimir Oltean
At the time - commit 7569459a52c9 ("net: dsa: manage flooding on the CPU ports") - not introducing a dedicated switch callback for host flooding made sense, because for the only user, the felix driver, there was nothing different to do for the CPU port than set the flood flags on the CPU port just like on any other bridge port. There are 2 reasons why this approach is not good enough, however. (1) Other drivers, like sja1105, support configuring flooding as a function of {ingress port, egress port}, whereas the DSA ->port_bridge_flags() function only operates on an egress port. So with that driver we'd have useless host flooding from user ports which don't need it. (2) Even with the felix driver, support for multiple CPU ports makes it difficult to piggyback on ->port_bridge_flags(). The way in which the felix driver is going to support host-filtered addresses with multiple CPU ports is that it will direct these addresses towards both CPU ports (in a sort of multicast fashion), then restrict the forwarding to only one of the two using the forwarding masks. Consequently, flooding will also be enabled towards both CPU ports. However, ->port_bridge_flags() gets passed the index of a single CPU port, and that leaves the flood settings out of sync between the 2 CPU ports. This is to say, it's better to have a specific driver method for host flooding, which takes the user port as argument. This solves problem (1) by allowing the driver to do different things for different user ports, and problem (2) by abstracting the operation and letting the driver do whatever, rather than explicitly making the DSA core point to the CPU port it thinks needs to be touched. This new method also creates a problem, which is that cross-chip setups are not handled. However I don't have hardware right now where I can test what is the proper thing to do, and there isn't hardware compatible with multi-switch trees that supports host flooding. So it remains a problem to be tackled in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-20net: dsa: don't emit targeted cross-chip notifiers for MTU changeVladimir Oltean
A cross-chip notifier with "targeted_match=true" is one that matches only the local port of the switch that emitted it. In other words, passing through the cross-chip notifier layer serves no purpose. Eliminate this concept by calling directly ds->ops->port_change_mtu instead of emitting a targeted cross-chip notifier. This leaves the DSA_NOTIFIER_MTU event being emitted only for MTU updates on the CPU port, which need to be reflected also across all DSA links. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-20net: dsa: make cross-chip notifiers more efficient for host eventsVladimir Oltean
To determine whether a given port should react to the port targeted by the notifier, dsa_port_host_vlan_match() and dsa_port_host_address_match() look at the positioning of the switch port currently executing the notifier relative to the switch port for which the notifier was emitted. To maintain stylistic compatibility with the other match functions from switch.c, the host address and host VLAN match functions take the notifier information about targeted port, switch and tree indices as argument. However, these functions only use that information to retrieve the struct dsa_port *targeted_dp, which is an invariant for the outer loop that calls them. So it makes more sense to calculate the targeted dp only once, and pass it to them as argument. But furthermore, the targeted dp is actually known at the time the call to dsa_port_notify() is made. It is just that we decide to only save the indices of the port, switch and tree in the notifier structure, just to retrace our steps and find the dp again using dsa_switch_find() and dsa_to_port(). But both the above functions are relatively expensive, since they need to iterate through lists. It appears more straightforward to make all notifiers just pass the targeted dp inside their info structure, and have the code that needs the indices to look at info->dp->index instead of info->port, or info->dp->ds->index instead of info->sw_index, or info->dp->ds->dst->index instead of info->tree_index. For the sake of consistency, all cross-chip notifiers are converted to pass the "dp" directly. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-17net: dsa: Handle MST state changesTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes. When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question). Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Pass VLAN MSTI migration notifications to driverTobias Waldekranz
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-17net: dsa: Validate hardware support for MSTTobias Waldekranz
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging. When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported. At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events. Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-03-09net: dsa: felix: avoid early deletion of host FDB entriesVladimir Oltean
The Felix driver declares FDB isolation but puts all standalone ports in VID 0. This is mostly problem-free as discussed with Alvin here: https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870 however there is one catch. DSA still thinks that FDB entries are installed on the CPU port as many times as there are user ports, and this is problematic when multiple user ports share the same MAC address. Consider the default case where all user ports inherit their MAC address from the DSA master, and then the user runs: ip link set swp0 address 00:01:02:03:04:05 The above will make dsa_slave_set_mac_address() call dsa_port_standalone_host_fdb_add() for 00:01:02:03:04:05 in port 0's standalone database, and dsa_port_standalone_host_fdb_del() for the old address of swp0, again in swp0's standalone database. Both the ->port_fdb_add() and ->port_fdb_del() will be propagated down to the felix driver, which will end up deleting the old MAC address from the CPU port. But this is still in use by other user ports, so we end up breaking unicast termination for them. There isn't a problem in the fact that DSA keeps track of host standalone addresses in the individual database of each user port: some drivers like sja1105 need this. There also isn't a problem in the fact that some drivers choose the same VID/FID for all standalone ports. It is just that the deletion of these host addresses must be delayed until they are known to not be in use any longer, and only the driver has this knowledge. Since DSA keeps these addresses in &cpu_dp->fdbs and &cpu_db->mdbs, it is just a matter of walking over those lists and see whether the same MAC address is present on the CPU port in the port db of another user port. I have considered reusing the generic dsa_port_walk_fdbs() and dsa_port_walk_mdbs() schemes for this, but locking makes it difficult. In the ->port_fdb_add() method and co, &dp->addr_lists_lock is held, but dsa_port_walk_fdbs() also acquires that lock. Also, even assuming that we introduce an unlocked variant of the address iterator, we'd still need some relatively complex data structures, and a void *ctx in the dsa_fdb_walk_cb_t which we don't currently pass, such that drivers are able to figure out, after iterating, whether the same MAC address is or isn't present in the port db of another port. All the above, plus the fact that I expect other drivers to follow the same model as felix where all standalone ports use the same FID, made me conclude that a generic method provided by DSA is necessary: dsa_fdb_present_in_other_db() and the mdb equivalent. Felix calls this from the ->port_fdb_del() handler for the CPU port, when the database was classified to either a port db, or a LAG db. For symmetry, we also call this from ->port_fdb_add(), because if the address was installed once, then installing it a second time serves no purpose: it's already in hardware in VID 0 and it affects all standalone ports. This change moves dsa_db_equal() from switch.c to dsa.c, since it now has one more caller. Fixes: 54c319846086 ("net: mscc: ocelot: enforce FDB isolation when VLAN-unaware") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: install secondary unicast and multicast addresses as host FDB/MDBVladimir Oltean
In preparation of disabling flooding towards the CPU in standalone ports mode, identify the addresses requested by upper interfaces and use the new API for DSA FDB isolation to request the hardware driver to offload these as FDB or MDB objects. The objects belong to the user port's database, and are installed pointing towards the CPU port. Because dev_uc_add()/dev_mc_add() is VLAN-unaware, we offload to the port standalone database addresses with VID 0 (also VLAN-unaware). So this excludes switches with global VLAN filtering from supporting unicast filtering, because there, it is possible for a port of a switch to join a VLAN-aware bridge, and this changes the VLAN awareness of standalone ports, requiring VLAN-aware standalone host FDB entries. For the same reason, hellcreek, which requires VLAN awareness in standalone mode, is also exempted from unicast filtering. We create "standalone" variants of dsa_port_host_fdb_add() and dsa_port_host_mdb_add() (and the _del coresponding functions). We also create a separate work item type for handling deferred standalone host FDB/MDB entries compared to the switchdev one. This is done for the purpose of clarity - the procedure for offloading a bridge FDB entry is different than offloading a standalone one, and the switchdev event work handles only FDBs anyway, not MDBs. Deferral is needed for standalone entries because ndo_set_rx_mode runs in atomic context. We could probably optimize things a little by first queuing up all entries that need to be offloaded, and scheduling the work item just once, but the data structures that we can pass through __dev_uc_sync() and __dev_mc_sync() are limiting (there is nothing like a void *priv), so we'd have to keep the list of queued events somewhere in struct dsa_switch, and possibly a lock for it. Too complicated for now. Adding the address to the master is handled by dev_uc_sync(), adding it to the hardware is handled by __dev_uc_sync(). So this is the reason why dsa_port_standalone_host_fdb_add() does not call dev_uc_add(). Not that it had the rtnl_mutex anyway - ndo_set_rx_mode has it, but is atomic. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-03-03net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespaceVladimir Oltean
We are preparing to add API in port.c that adds FDB and MDB entries that correspond to the port's standalone database. Rename the existing methods to make it clear that the FDB and MDB entries offloaded come from the bridge database. Since the function names lengthen in dsa_slave_switchdev_event_work(), we place "addr" and "vid" in temporary variables, to shorten those. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: pass extack to .port_bridge_join driver methodsVladimir Oltean
As FDB isolation cannot be enforced between VLAN-aware bridges in lack of hardware assistance like extra FID bits, it seems plausible that many DSA switches cannot do it. Therefore, they need to reject configurations with multiple VLAN-aware bridges from the two code paths that can transition towards that state: - joining a VLAN-aware bridge - toggling VLAN awareness on an existing bridge The .port_vlan_filtering method already propagates the netlink extack to the driver, let's propagate it from .port_bridge_join too, to make sure that the driver can use the same function for both. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: request drivers to perform FDB isolationVladimir Oltean
For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges. The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del} aka the pre-crosschip-notifier functions. Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method. DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges. The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone. It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions. Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient. DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge. In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-27net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridgingVladimir Oltean
For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too tied with the sja1105 switch: each port uses the same pvid which is also used for standalone operation (a unique one from which the source port and device ID can be retrieved when packets from that port are forwarded to the CPU). Since each port has a unique pvid when performing autonomous forwarding, the switch must be configured for Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when performing FDB lookups. Without SVL, packets would always be flooded, since FDB lookup in the source port's VLAN would never find any entry. First of all, to make tag_8021q more palatable to switches which might not support Shared VLAN Learning, let's just use a common VLAN for all ports that are under the same bridge. Secondly, using Shared VLAN Learning means that FDB isolation can never be enforced. But if all ports under the same VLAN-unaware bridge share the same VLAN ID, it can. The disadvantage is that the CPU port can no longer perform precise source port identification for these packets. But at least we have a mechanism which has proven to be adequate for that situation: imprecise RX (dsa_find_designated_bridge_port_by_vid), which is what we use for termination on VLAN-aware bridges. The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the same one as we were previously using for imprecise TX (bridge TX forwarding offload). It is already allocated, it is just a matter of using it. Note that because now all ports under the same bridge share the same VLAN, the complexity of performing a tag_8021q bridge join decreases dramatically. We no longer have to install the RX VLAN of a newly joining port into the port membership of the existing bridge ports. The newly joining port just becomes a member of the VLAN corresponding to that bridge, and the other ports are already members of it from when they joined the bridge themselves. So forwarding works properly. This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put these calls directly into the sja1105 driver. With this new mode of operation, a port controlled by tag_8021q can have two pvids whereas before it could only have one. The pvid for standalone operation is different from the pvid used for VLAN-unaware bridging. This is done, again, so that FDB isolation can be enforced. Let tag_8021q manage this by deleting the standalone pvid when a port joins a bridge, and restoring it when it leaves it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-24net: dsa: support FDB events on offloaded LAG interfacesVladimir Oltean
This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as support for filtering towards the CPU local FDB entries emitted for LAG interfaces that are bridge ports. Conceptually, host addresses on LAG ports are identical to what we do for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply be replicated towards all member ports like we do for multicast, or VLAN. Instead we need new driver API. Hardware usually considers a LAG to be a "logical port", and sets the entire LAG as the forwarding destination. The physical egress port selection within the LAG is made by hashing policy, as usual. To represent the logical port corresponding to the LAG, we pass by value a copy of the dsa_lag structure to all switches in the tree that have at least one port in that LAG. To illustrate why a refcounted list of FDB entries is needed in struct dsa_lag, it is enough to say that: - a LAG may be a bridge port and may therefore receive FDB events even while it isn't yet offloaded by any DSA interface - DSA interfaces may be removed from a LAG while that is a bridge port; we don't want FDB entries lingering around, but we don't want to remove entries that are still in use, either For all the cases below to work, the idea is to always keep an FDB entry on a LAG with a reference count equal to the DSA member ports. So: - if a port joins a LAG, it requests the bridge to replay the FDB, and the FDB entries get created, or their refcount gets bumped by one - if a port leaves a LAG, the FDB replay deletes or decrements refcount by one - if an FDB is installed towards a LAG with ports already present, that entry is created (if it doesn't exist) and its refcount is bumped by the amount of ports already present in the LAG echo "Adding FDB entry to bond with existing ports" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link del bond0 echo "Adding FDB entry to empty bond, then removing ports one by one" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link set swp1 nomaster ip link set swp2 nomaster ip link del br0 ip link del bond0 Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: call SWITCHDEV_FDB_OFFLOADED for the orig_devVladimir Oltean
When switchdev_handle_fdb_event_to_device() replicates a FDB event emitted for the bridge or for a LAG port and DSA offloads that, we should notify back to switchdev that the FDB entry on the original device is what was offloaded, not on the DSA slave devices that the event is replicated on. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: remove "ds" and "port" from struct dsa_switchdev_event_workVladimir Oltean
By construction, the struct net_device *dev passed to dsa_slave_switchdev_event_work() via struct dsa_switchdev_event_work is always a DSA slave device. Therefore, it is redundant to pass struct dsa_switch and int port information in the deferred work structure. This can be retrieved at all times from the provided struct net_device via dsa_slave_to_port(). For the same reason, we can drop the dsa_is_user_port() check in dsa_fdb_offload_notify(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: create a dsa_lag structureVladimir Oltean
The main purpose of this change is to create a data structure for a LAG as seen by DSA. This is similar to what we have for bridging - we pass a copy of this structure by value to ->port_lag_join and ->port_lag_leave. For now we keep the lag_dev, id and a reference count in it. Future patches will add a list of FDB entries for the LAG (these also need to be refcounted to work properly). The LAG structure is created using dsa_port_lag_create() and destroyed using dsa_port_lag_destroy(), just like we have for bridging. Because now, the dsa_lag itself is refcounted, we can simplify dsa_lag_map() and dsa_lag_unmap(). These functions need to keep a LAG in the dst->lags array only as long as at least one port uses it. The refcounting logic inside those functions can be removed now - they are called only when we should perform the operation. dsa_lag_dev() is renamed to dsa_lag_by_id() and now returns the dsa_lag structure instead of the lag_dev net_device. dsa_lag_foreach_port() now takes the dsa_lag structure as argument. dst->lags holds an array of dsa_lag structures. dsa_lag_map() now also saves the dsa_lag->id value, so that linear walking of dst->lags in drivers using dsa_lag_id() is no longer necessary. They can just look at lag.id. dsa_port_lag_id_get() is a helper, similar to dsa_port_bridge_num_get(), which can be used by drivers to get the LAG ID assigned by DSA to a given port. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-24net: dsa: rename references to "lag" as "lag_dev"Vladimir Oltean
In preparation of converting struct net_device *dp->lag_dev into a struct dsa_lag *dp->lag, we need to rename, for consistency purposes, all occurrences of the "lag" variable in the DSA core to "lag_dev". Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-16net: dsa: add explicit support for host bridge VLANsVladimir Oltean
Currently, DSA programs VLANs on shared (DSA and CPU) ports each time it does so on user ports. This is good for basic functionality but has several limitations: - the VLAN group which must reach the CPU may be radically different from the VLAN group that must be autonomously forwarded by the switch. In other words, the admin may want to isolate noisy stations and avoid traffic from them going to the control processor of the switch, where it would just waste useless cycles. The bridge already supports independent control of VLAN groups on bridge ports and on the bridge itself, and when VLAN-aware, it will drop packets in software anyway if their VID isn't added as a 'self' entry towards the bridge device. - Replaying host FDB entries may depend, for some drivers like mv88e6xxx, on replaying the host VLANs as well. The 2 VLAN groups are approximately the same in most regular cases, but there are corner cases when timing matters, and DSA's approximation of replicating VLANs on shared ports simply does not work. - If a user makes the bridge (implicitly the CPU port) join a VLAN by accident, there is no way for the CPU port to isolate itself from that noisy VLAN except by rebooting the system. This is because for each VLAN added on a user port, DSA will add it on shared ports too, but for each VLAN deletion on a user port, it will remain installed on shared ports, since DSA has no good indication of whether the VLAN is still in use or not. Now that the bridge driver emits well-balanced SWITCHDEV_OBJ_ID_PORT_VLAN addition and removal events, DSA has a simple and straightforward task of separating the bridge port VLANs (these have an orig_dev which is a DSA slave interface, or a LAG interface) from the host VLANs (these have an orig_dev which is a bridge interface), and to keep a simple reference count of each VID on each shared port. Forwarding VLANs must be installed on the bridge ports and on all DSA ports interconnecting them. We don't have a good view of the exact topology, so we simply install forwarding VLANs on all DSA ports, which is what has been done until now. Host VLANs must be installed primarily on the dedicated CPU port of each bridge port. More subtly, they must also be installed on upstream-facing and downstream-facing DSA ports that are connecting the bridge ports and the CPU. This ensures that the mv88e6xxx's problem (VID of host FDB entry may be absent from VTU) is still addressed even if that switch is in a cross-chip setup, and it has no local CPU port. Therefore: - user ports contain only bridge port (forwarding) VLANs, and no refcounting is necessary - DSA ports contain both forwarding and host VLANs. Refcounting is necessary among these 2 types. - CPU ports contain only host VLANs. Refcounting is also necessary. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-14net: dsa: mv88e6xxx: flush switchdev FDB workqueue before removing VLANVladimir Oltean
mv88e6xxx is special among DSA drivers in that it requires the VTU to contain the VID of the FDB entry it modifies in mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP. Sometimes due to races this is not always satisfied even if external code does everything right (first deletes the FDB entries, then the VLAN), because DSA commits to hardware FDB entries asynchronously since commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification"). Therefore, the mv88e6xxx driver must close this race condition by itself, by asking DSA to flush the switchdev workqueue of any FDB deletions in progress, prior to exiting a VLAN. Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") Reported-by: Rafael Richter <rafael.richter@gin.de> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-02net: dsa: provide switch operations for tracking the master stateVladimir Oltean
Certain drivers may need to send management traffic to the switch for things like register access, FDB dump, etc, to accelerate what their slow bus (SPI, I2C, MDIO) can already do. Ethernet is faster (especially in bulk transactions) but is also more unreliable, since the user may decide to bring the DSA master down (or not bring it up), therefore severing the link between the host and the attached switch. Drivers needing Ethernet-based register access already should have fallback logic to the slow bus if the Ethernet method fails, but that fallback may be based on a timeout, and the I/O to the switch may slow down to a halt if the master is down, because every Ethernet packet will have to time out. The driver also doesn't have the option to turn off Ethernet-based I/O momentarily, because it wouldn't know when to turn it back on. Which is where this change comes in. By tracking NETDEV_CHANGE, NETDEV_UP and NETDEV_GOING_DOWN events on the DSA master, we should know the exact interval of time during which this interface is reliably available for traffic. Provide this information to switches so they can use it as they wish. An helper is added dsa_port_master_is_operational() to check if a master port is operational. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: remove cross-chip support for HSRVladimir Oltean
The cross-chip notifiers for HSR are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: George McCollister <george.mccollister@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: George McCollister <george.mccollister@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-01-05net: dsa: remove cross-chip support for MRPVladimir Oltean
The cross-chip notifiers for MRP are bypass operations, meaning that even though all switches in a tree are notified, only the switch specified in the info structure is targeted. We can eliminate the unnecessary complexity by deleting the cross-chip notifier logic and calling the ds->ops straight from port.c. Cc: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-12-30 The following pull-request contains BPF updates for your *net-next* tree. We've added 72 non-merge commits during the last 20 day(s) which contain a total of 223 files changed, 3510 insertions(+), 1591 deletions(-). The main changes are: 1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii. 2) Beautify and de-verbose verifier logs, from Christy. 3) Composable verifier types, from Hao. 4) bpf_strncmp helper, from Hou. 5) bpf.h header dependency cleanup, from Jakub. 6) get_func_[arg|ret|arg_cnt] helpers, from Jiri. 7) Sleepable local storage, from KP. 8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-29net: Don't include filter.h from net/sock.hJakub Kicinski
sock.h is pretty heavily used (5k objects rebuilt on x86 after it's touched). We can drop the include of filter.h from it and add a forward declaration of struct sk_filter instead. This decreases the number of rebuilt objects when bpf.h is touched from ~5k to ~1k. There's a lot of missing includes this was masking. Primarily in networking tho, this time. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@nvidia.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/bpf/20211229004913.513372-1-kuba@kernel.org
2021-12-14net: dsa: make tagging protocols connect to individual switches from a treeVladimir Oltean
On the NXP Bluebox 3 board which uses a multi-switch setup with sja1105, the mechanism through which the tagger connects to the switch tree is broken, due to improper DSA code design. At the time when tag_ops->connect() is called in dsa_port_parse_cpu(), DSA hasn't finished "touching" all the ports, so it doesn't know how large the tree is and how many ports it has. It has just seen the first CPU port by this time. As a result, this function will call the tagger's ->connect method too early, and the tagger will connect only to the first switch from the tree. This could be perhaps addressed a bit more simply by just moving the tag_ops->connect(dst) call a bit later (for example in dsa_tree_setup), but there is already a design inconsistency at present: on the switch side, the notification is on a per-switch basis, but on the tagger side, it is on a per-tree basis. Furthermore, the persistent storage itself is per switch (ds->tagger_data). And the tagger connect and disconnect procedures (at least the ones that exist currently) could see a fair bit of simplification if they didn't have to iterate through the switches of a tree. To fix the issue, this change transforms tag_ops->connect(dst) into tag_ops->connect(ds) and moves it somewhere where we already iterate over all switches of a tree. That is in dsa_switch_setup_tag_protocol(), which is a good placement because we already have there the connection call to the switch side of things. As for the dsa_tree_bind_tag_proto() method (called from the code path that changes the tag protocol), things are a bit more complicated because we receive the tree as argument, yet when we unwind on errors, it would be nice to not call tag_ops->disconnect(ds) where we didn't previously call tag_ops->connect(ds). We didn't have this problem before because the tag_ops connection operations passed the entire dst before, and this is more fine grained now. To solve the error rewind case using the new API, we have to create yet one more cross-chip notifier for disconnection, and stay connected with the old tag protocol to all the switches in the tree until we've succeeded to connect with the new one as well. So if something fails half way, the whole tree is still connected to the old tagger. But there may still be leaks if the tagger fails to connect to the 2nd out of 3 switches in a tree: somebody needs to tell the tagger to disconnect from the first switch. Nothing comes for free, and this was previously handled privately by the tagging protocol driver before, but now we need to emit a disconnect cross-chip notifier for that, because DSA has to take care of the unwind path. We assume that the tagging protocol has connected to a switch if it has set ds->tagger_data to something, otherwise we avoid calling its disconnection method in the error rewind path. The rest of the changes are in the tagging protocol drivers, and have to do with the replacement of dst with ds. The iteration is removed and the error unwind path is simplified, as mentioned above. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-12net: dsa: introduce tagger-owned storage for private and shared dataVladimir Oltean
Ansuel is working on register access over Ethernet for the qca8k switch family. This requires the qca8k tagging protocol driver to receive frames which aren't intended for the network stack, but instead for the qca8k switch driver itself. The dp->priv is currently the prevailing method for passing data back and forth between the tagging protocol driver and the switch driver. However, this method is riddled with caveats. The DSA design allows in principle for any switch driver to return any protocol it desires in ->get_tag_protocol(). The dsa_loop driver can be modified to do just that. But in the current design, the memory behind dp->priv has to be allocated by the switch driver, so if the tagging protocol is paired to an unexpected switch driver, we may end up in NULL pointer dereferences inside the kernel, or worse (a switch driver may allocate dp->priv according to the expectations of a different tagger). The latter possibility is even more plausible considering that DSA switches can dynamically change tagging protocols in certain cases (dsa <-> edsa, ocelot <-> ocelot-8021q), and the current design lends itself to mistakes that are all too easy to make. This patch proposes that the tagging protocol driver should manage its own memory, instead of relying on the switch driver to do so. After analyzing the different in-tree needs, it can be observed that the required tagger storage is per switch, therefore a ds->tagger_data pointer is introduced. In principle, per-port storage could also be introduced, although there is no need for it at the moment. Future changes will replace the current usage of dp->priv with ds->tagger_data. We define a "binding" event between the DSA switch tree and the tagging protocol. During this binding event, the tagging protocol's ->connect() method is called first, and this may allocate some memory for each switch of the tree. Then a cross-chip notifier is emitted for the switches within that tree, and they are given the opportunity to fix up the tagger's memory (for example, they might set up some function pointers that represent virtual methods for consuming packets). Because the memory is owned by the tagger, there exists a ->disconnect() method for the tagger (which is the place to free the resources), but there doesn't exist a ->disconnect() method for the switch driver. This is part of the design. The switch driver should make minimal use of the public part of the tagger data, and only after type-checking it using the supplied "proto" argument. In the code there are in fact two binding events, one is the initial event in dsa_switch_setup_tag_protocol(). At this stage, the cross chip notifier chains aren't initialized, so we call each switch's connect() method by hand. Then there is dsa_tree_bind_tag_proto() during dsa_tree_change_tag_proto(), and here we have an old protocol and a new one. We first connect to the new one before disconnecting from the old one, to simplify error handling a bit and to ensure we remain in a valid state at all times. Co-developed-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Ansuel Smith <ansuelsmth@gmail.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-08net: dsa: add a "tx_fwd_offload" argument to ->port_bridge_joinVladimir Oltean
This is a preparation patch for the removal of the DSA switch methods ->port_bridge_tx_fwd_offload() and ->port_bridge_tx_fwd_unoffload(). The plan is for the switch to report whether it offloads TX forwarding directly as a response to the ->port_bridge_join() method. This change deals with the noisy portion of converting all existing function prototypes to take this new boolean pointer argument. The bool is placed in the cross-chip notifier structure for bridge join, and a reference to it is provided to drivers. In the next change, DSA will then actually look at this value instead of calling ->port_bridge_tx_fwd_offload(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: keep the bridge_dev and bridge_num as part of the same structureVladimir Oltean
The main desire behind this is to provide coherent bridge information to the fast path without locking. For example, right now we set dp->bridge_dev and dp->bridge_num from separate code paths, it is theoretically possible for a packet transmission to read these two port properties consecutively and find a bridge number which does not correspond with the bridge device. Another desire is to start passing more complex bridge information to dsa_switch_ops functions. For example, with FDB isolation, it is expected that drivers will need to be passed the bridge which requested an FDB/MDB entry to be offloaded, and along with that bridge_dev, the associated bridge_num should be passed too, in case the driver might want to implement an isolation scheme based on that number. We already pass the {bridge_dev, bridge_num} pair to the TX forwarding offload switch API, however we'd like to remove that and squash it into the basic bridge join/leave API. So that means we need to pass this pair to the bridge join/leave API. During dsa_port_bridge_leave, first we unset dp->bridge_dev, then we call the driver's .port_bridge_leave with what used to be our dp->bridge_dev, but provided as an argument. When bridge_dev and bridge_num get folded into a single structure, we need to preserve this behavior in dsa_port_bridge_leave: we need a copy of what used to be in dp->bridge. Switch drivers check bridge membership by comparing dp->bridge_dev with the provided bridge_dev, but now, if we provide the struct dsa_bridge as a pointer, they cannot keep comparing dp->bridge to the provided pointer, since this only points to an on-stack copy. To make this obvious and prevent driver writers from forgetting and doing stupid things, in this new API, the struct dsa_bridge is provided as a full structure (not very large, contains an int and a pointer) instead of a pointer. An explicit comparison function needs to be used to determine bridge membership: dsa_port_offloads_bridge(). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: export bridging offload helpers to driversVladimir Oltean
Move the static inline helpers from net/dsa/dsa_priv.h to include/net/dsa.h, so that drivers can call functions such as dsa_port_offloads_bridge_dev(), which will be necessary after the transition to a more complex bridge structure. More functions than are needed right now are being moved, but this is done for uniformity. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: rename dsa_port_offloads_bridge to dsa_port_offloads_bridge_devVladimir Oltean
Currently the majority of dsa_port_bridge_dev_get() calls in drivers is just to check whether a port is under the bridge device provided as argument by the DSA API. We'd like to change that DSA API so that a more complex structure is provided as argument. To keep things more generic, and considering that the new complex structure will be provided by value and not by reference, direct comparisons between dp->bridge and the provided bridge will be broken. The generic way to do the checking would simply be to do something like dsa_port_offloads_bridge(dp, &bridge). But there's a problem, we already have a function named that way, which actually takes a bridge_dev net_device as argument. Rename it so that we can use dsa_port_offloads_bridge for something else. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: hide dp->bridge_dev and dp->bridge_num in the core behind helpersVladimir Oltean
The location of the bridge device pointer and number is going to change. It is not going to be kept individually per port, but in a common structure allocated dynamically and which will have lockdep validation. Create helpers to access these elements so that we have a migration path to the new organization. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-08net: dsa: make dp->bridge_num one-basedVladimir Oltean
I have seen too many bugs already due to the fact that we must encode an invalid dp->bridge_num as a negative value, because the natural tendency is to check that invalid value using (!dp->bridge_num). Latest example can be seen in commit 1bec0f05062c ("net: dsa: fix bridge_num not getting cleared after ports leaving the bridge"). Convert the existing users to assume that dp->bridge_num == 0 is the encoding for invalid, and valid bridge numbers start from 1. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-01net: dsa: consolidate phylink creationRussell King (Oracle)
The code in port.c and slave.c creating the phylink instance is very similar - let's consolidate this into a single function. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Marek Behún <kabel@kernel.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-15net: dsa: flush switchdev workqueue before tearing down CPU/DSA portsVladimir Oltean
Sometimes when unbinding the mv88e6xxx driver on Turris MOX, these error messages appear: mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete be:79:b4:9e:9e:96 vid 0 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 100 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 1 from fdb: -2 mv88e6085 d0032004.mdio-mii:12: port 1 failed to delete d8:58:d7:00:ca:6d vid 0 from fdb: -2 (and similarly for other ports) What happens is that DSA has a policy "even if there are bugs, let's at least not leak memory" and dsa_port_teardown() clears the dp->fdbs and dp->mdbs lists, which are supposed to be empty. But deleting that cleanup code, the warnings go away. => the FDB and MDB lists (used for refcounting on shared ports, aka CPU and DSA ports) will eventually be empty, but are not empty by the time we tear down those ports. Aka we are deleting them too soon. The addresses that DSA complains about are host-trapped addresses: the local addresses of the ports, and the MAC address of the bridge device. The problem is that offloading those entries happens from a deferred work item scheduled by the SWITCHDEV_FDB_DEL_TO_DEVICE handler, and this races with the teardown of the CPU and DSA ports where the refcounting is kept. In fact, not only it races, but fundamentally speaking, if we iterate through the port list linearly, we might end up tearing down the shared ports even before we delete a DSA user port which has a bridge upper. So as it turns out, we need to first tear down the user ports (and the unused ones, for no better place of doing that), then the shared ports (the CPU and DSA ports). In between, we need to ensure that all work items scheduled by our switchdev handlers (which only run for user ports, hence the reason why we tear them down first) have finished. Fixes: 161ca59d39e9 ("net: dsa: reference count the MDB entries at the cross-chip notifier level") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20210914134726.2305133-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-08-24net: dsa: don't advertise 'rx-vlan-filter' when not neededVladimir Oltean
There have been multiple independent reports about dsa_slave_vlan_rx_add_vid being called (and consequently calling the drivers' .port_vlan_add) when it isn't needed, and sometimes (not always) causing problems in the process. Case 1: mv88e6xxx_port_vlan_prepare is stubborn and only accepts VLANs on bridged ports. That is understandably so, because standalone mv88e6xxx ports are VLAN-unaware, and VTU entries are said to be a scarce resource. Otherwise said, the following fails lamentably on mv88e6xxx: ip link add br0 type bridge vlan_filtering 1 ip link set lan3 master br0 ip link add link lan10 name lan10.1 type vlan id 1 [485256.724147] mv88e6085 d0032004.mdio-mii:12: p10: hw VLAN 1 already used by port 3 in br0 RTNETLINK answers: Operation not supported This has become a worse issue since commit 9b236d2a69da ("net: dsa: Advertise the VLAN offload netdev ability only if switch supports it"). Up to that point, the driver was returning -EOPNOTSUPP and DSA was reconverting that error to 0, making the 8021q upper think all is ok (but obviously the error message was there even prior to this change). After that change the -EOPNOTSUPP is propagated to vlan_vid_add, and it is a hard error. Case 2: Ports that don't offload the Linux bridge (have a dp->bridge_dev = NULL because they don't implement .port_bridge_{join,leave}). Understandably, a standalone port should not offload VLANs either, it should remain VLAN unaware and any VLAN should be a software VLAN (as long as the hardware is not quirky, that is). In fact, dsa_slave_port_obj_add does do the right thing and rejects switchdev VLAN objects coming from the bridge when that bridge is not offloaded: case SWITCHDEV_OBJ_ID_PORT_VLAN: if (!dsa_port_offloads_bridge_port(dp, obj->orig_dev)) return -EOPNOTSUPP; err = dsa_slave_vlan_add(dev, obj, extack); But it seems that the bridge is able to trick us. The __vlan_vid_add from br_vlan.c has: /* Try switchdev op first. In case it is not supported, fallback to * 8021q add. */ err = br_switchdev_port_vlan_add(dev, v->vid, flags, extack); if (err == -EOPNOTSUPP) return vlan_vid_add(dev, br->vlan_proto, v->vid); So it says "no, no, you need this VLAN in your life!". And we, naive as we are, say "oh, this comes from the vlan_vid_add code path, it must be an 8021q upper, sure, I'll take that". And we end up with that bridge VLAN installed on our port anyway. But this time, it has the wrong flags: if the bridge was trying to install VLAN 1 as a pvid/untagged VLAN, failed via switchdev, retried via vlan_vid_add, we have this comment: /* This API only allows programming tagged, non-PVID VIDs */ So what we do makes absolutely no sense. Backtracing a bit, we see the common pattern. We allow the network stack to think that our standalone ports are VLAN-aware, but they aren't, for the vast majority of switches. The quirky ones should not dictate the norm. The dsa_slave_vlan_rx_add_vid and dsa_slave_vlan_rx_kill_vid methods exist for drivers that need the 'rx-vlan-filter: on' feature in ethtool -k, which can be due to any of the following reasons: 1. vlan_filtering_is_global = true, and some ports are under a VLAN-aware bridge while others are standalone, and the standalone ports would otherwise drop VLAN-tagged traffic. This is described in commit 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation"). 2. the ports that are under a VLAN-aware bridge should also set this feature, for 8021q uppers having a VID not claimed by the bridge. In this case, the driver will essentially not even know that the VID is coming from the 8021q layer and not the bridge. 3. Hellcreek. This driver needs it because in standalone mode, it uses unique VLANs per port to ensure separation. For separation of untagged traffic, it uses different PVIDs for each port, and for separation of VLAN-tagged traffic, it never accepts 8021q uppers with the same vid on two ports. If a driver does not fall under any of the above 3 categories, there is no reason why it should advertise the 'rx-vlan-filter' feature, therefore no reason why it should offload the VLANs added through vlan_vid_add. This commit fixes the problem by removing the 'rx-vlan-filter' feature from the slave devices when they operate in standalone mode, and when they offload a VLAN-unaware bridge. The way it works is that vlan_vid_add will now stop its processing here: vlan_add_rx_filter_info: if (!vlan_hw_filter_capable(dev, proto)) return 0; So the VLAN will still be saved in the interface's VLAN RX filtering list, but because it does not declare VLAN filtering in its features, the 8021q module will return zero without committing that VLAN to hardware. This gives the drivers what they want, since it keeps the 8021q VLANs away from the VLAN table until VLAN awareness is enabled (point at which the ports are no longer standalone, hence in the mv88e6xxx case, the check in mv88e6xxx_port_vlan_prepare passes). Since the issue predates the existence of the hellcreek driver, case 3 will be dealt with in a separate patch. The main change that this patch makes is to no longer set NETIF_F_HW_VLAN_CTAG_FILTER unconditionally, but toggle it dynamically (for most switches, never). The second part of the patch addresses an issue that the first part introduces: because the 'rx-vlan-filter' feature is now dynamically toggled, and our .ndo_vlan_rx_add_vid does not get called when 'rx-vlan-filter' is off, we need to avoid bugs such as the following by replaying the VLANs from 8021q uppers every time we enable VLAN filtering: ip link add link lan0 name lan0.100 type vlan id 100 ip addr add 192.168.100.1/24 dev lan0.100 ping 192.168.100.2 # should work ip link add br0 type bridge vlan_filtering 0 ip link set lan0 master br0 ping 192.168.100.2 # should still work ip link set br0 type bridge vlan_filtering 1 ping 192.168.100.2 # should still work but doesn't As reported by Florian, some drivers look at ds->vlan_filtering in their .port_vlan_add() implementation. So this patch also makes sure that ds->vlan_filtering is committed before calling the driver. This is the reason why it is first committed, then restored on the failure path. Reported-by: Tobias Waldekranz <tobias@waldekranz.com> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-23net: dsa: track unique bridge numbers across all DSA switch treesVladimir Oltean
Right now, cross-tree bridging setups work somewhat by mistake. In the case of cross-tree bridging with sja1105, all switch instances need to agree upon a common VLAN ID for forwarding a packet that belongs to a certain bridging domain. With TX forwarding offload, the VLAN ID is the bridge VLAN for VLAN-aware bridging, and the tag_8021q TX forwarding offload VID (a VLAN which has non-zero VBID bits) for VLAN-unaware bridging. The VBID for VLAN-unaware bridging is derived from the dp->bridge_num value calculated by DSA independently for each switch tree. If ports from one tree join one bridge, and ports from another tree join another bridge, DSA will assign them the same bridge_num, even though the bridges are different. If cross-tree bridging is supported, this is an issue. Modify DSA to calculate the bridge_num globally across all switch trees. This has the implication for a driver that the dp->bridge_num value that DSA will assign to its ports might not be contiguous, if there are boards with multiple DSA drivers instantiated. Additionally, all bridge_num values eat up towards each switch's ds->num_fwd_offloading_bridges maximum, which is potentially unfortunate, and can be seen as a limitation introduced by this patch. However, that is the lesser evil for now. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-12net: dsa: tag_8021q: don't broadcast during setup/teardownVladimir Oltean
Currently, on my board with multiple sja1105 switches in disjoint trees described in commit f66a6a69f97a ("net: dsa: permit cross-chip bridging between all trees in the system"), rebooting the board triggers the following benign warnings: [ 12.345566] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 1088 deletion: -ENOENT [ 12.353804] sja1105 spi2.0: port 0 failed to notify tag_8021q VLAN 2112 deletion: -ENOENT [ 12.362019] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 1089 deletion: -ENOENT [ 12.370246] sja1105 spi2.0: port 1 failed to notify tag_8021q VLAN 2113 deletion: -ENOENT [ 12.378466] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 1090 deletion: -ENOENT [ 12.386683] sja1105 spi2.0: port 2 failed to notify tag_8021q VLAN 2114 deletion: -ENOENT Basically switch 1 calls dsa_tag_8021q_unregister, and switch 1's TX and RX VLANs cannot be found on switch 2's CPU port. But why would switch 2 even attempt to delete switch 1's TX and RX tag_8021q VLANs from its CPU port? Well, because we use dsa_broadcast, and it is supposed that it had added those VLANs in the first place (because in dsa_port_tag_8021q_vlan_match, all CPU ports match regardless of their tree index or switch index). The two trees probe asynchronously, and when switch 1 probed, it called dsa_broadcast which did not notify the tree of switch 2, because that didn't probe yet. But during unbind, switch 2's tree _is_ probed, so it _is_ notified of the deletion. Before jumping to introduce a synchronization mechanism between the probing across disjoint switch trees, let's take a step back and see whether we _need_ to do that in the first place. The RX and TX VLANs of switch 1 would be needed on switch 2's CPU port only if switch 1 and 2 were part of a cross-chip bridge. And dsa_tag_8021q_bridge_join takes care precisely of that (but if probing was synchronous, the bridge_join would just end up bumping the VLANs' refcount, because they are already installed by the setup path). Since by the time the ports are bridged, all DSA trees are already set up, and we don't need the tag_8021q VLANs of one switch installed on the other switches during probe time, the answer is that we don't need to fix the synchronization issue. So make the setup and teardown code paths call dsa_port_notify, which notifies only the local tree, and the bridge code paths call dsa_broadcast, which let the other trees know as well. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: dsa: create a helper for locating EtherType DSA headers on TXVladimir Oltean
Create a similar helper for locating the offset to the DSA header relative to skb->data, and make the existing EtherType header taggers to use it. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: dsa: create a helper for locating EtherType DSA headers on RXVladimir Oltean
It seems that protocol tagging driver writers are always surprised about the formula they use to reach their EtherType header on RX, which becomes apparent from the fact that there are comments in multiple drivers that mention the same information. Create a helper that returns a void pointer to skb->data - 2, as well as centralize the explanation why that is the case. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: dsa: create a helper which allocates space for EtherType DSA headersVladimir Oltean
Hide away the memmove used by DSA EtherType header taggers to shift the MAC SA and DA to the left to make room for the header, after they've called skb_push(). The call to skb_push() is still left explicit in drivers, to be symmetric with dsa_strip_etype_header, and because not all callers can be refactored to do it (for example, brcm_tag_xmit_ll has common code for a pre-Ethernet DSA tag and an EtherType DSA tag). Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11net: dsa: create a helper that strips EtherType DSA headers on RXVladimir Oltean
All header taggers open-code a memmove that is fairly not all that obvious, and we can hide the details behind a helper function, since the only thing specific to the driver is the length of the header tag. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-08net: dsa: centralize fast ageing when address learning is turned offVladimir Oltean
Currently DSA leaves it down to device drivers to fast age the FDB on a port when address learning is disabled on it. There are 2 reasons for doing that in the first place: - when address learning is disabled by user space, through IFLA_BRPORT_LEARNING or the brport_attr_learning sysfs, what user space typically wants to achieve is to operate in a mode with no dynamic FDB entry on that port. But if the port is already up, some addresses might have been already learned on it, and it seems silly to wait for 5 minutes for them to expire until something useful can be done. - when a port leaves a bridge and becomes standalone, DSA turns off address learning on it. This also has the nice side effect of flushing the dynamically learned bridge FDB entries on it, which is a good idea because standalone ports should not have bridge FDB entries on them. We let drivers manage fast ageing under this condition because if DSA were to do it, it would need to track each port's learning state, and act upon the transition, which it currently doesn't. But there are 2 reasons why doing it is better after all: - drivers might get it wrong and not do it (see b53_port_set_learning) - we would like to flush the dynamic entries from the software bridge too, and letting drivers do that would be another pain point So track the port learning state and trigger a fast age process automatically within DSA. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>