path: root/datapath
Age  Commit message  Author
2010-09-08  datapath: Check for backported __wsum and __sum16.  Jesse Gross
Reported-by: Alexey I. Froloff <raorn@altlinux.org> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Ben Pfaff <blp@nicira.com>
2010-08-30  datapath: Include net/udp.h in vport-capwap.c  Simon Horman
net/udp.h is currently included indirectly via linux/ipv6.h which is in turn included indirectly via linux/ip.h. However, this breaks down if CONFIG_IPV6 is not set, leading to a number of build errors. Signed-off-by: Simon Horman <horms@verge.net.au> [Jesse: shortened commit message] Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30  datapath: Include linux/version.h in action.h for LINUX_VERSION  Simon Horman
Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30  treewide: Use pr_fmt and pr_<level>  Joe Perches
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Simon Horman <horms@verge.net.au> [Jesse: Added missing pr_fmt in vport-gre.c and dp_sysfs_dp.c] Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-30  datapath: Add compat functions for pr_*.  Jesse Gross
In the earliest kernels that we support this family of macros wasn't defined at all. Later they were defined but did not include the module name. Finally, pr_warn was made a synonym for pr_warning. This harmonizes the behavior across all kernels. Signed-off-by: Jesse Gross <jesse@nicira.com>
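The compat idea described above can be sketched in userspace C. This is only an illustration, not the code from the commit: `demo_printk`, `demo_last`, `demo_last_is`, and the `demo_mod` prefix are hypothetical stand-ins (the kernel macros call printk()), and `##__VA_ARGS__` is the GNU extension that the kernel itself relies on.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for printk(): formats into a buffer. */
static char demo_last[128];

static int demo_printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(demo_last, sizeof demo_last, fmt, ap);
    va_end(ap);
    return n;
}

/* Helper for checking what the most recent message looked like. */
static int demo_last_is(const char *expected)
{
    assert(expected != NULL);
    return strcmp(demo_last, expected) == 0;
}

/* Prefix every message with a (made-up) module name. */
#ifndef pr_fmt
#define pr_fmt(fmt) "demo_mod: " fmt
#endif

/* The pr_<level> family is missing here, so define it. */
#ifndef pr_warning
#define pr_warning(fmt, ...) demo_printk(pr_fmt(fmt), ##__VA_ARGS__)
#endif

/* Make pr_warn available as a synonym for pr_warning. */
#ifndef pr_warn
#define pr_warn pr_warning
#endif
```

With these definitions, `pr_warn("port %d busy\n", 3)` and `pr_warning(...)` go through the same path and both carry the module prefix.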
2010-08-30  treewide: Remove trailing whitespace  Joe Perches
Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-27  datapath: Avoid accesses past the end of skbuff data in actions.  Ben Pfaff
Some of the flow actions that modify skbuff data did not check that the skbuff was long enough before doing so. This commit fixes that problem.

Previously, the strategy for avoiding this was to only indicate the layer-3 nw_proto field in the flow if the corresponding layer-4 header was fully present, so that if, for example, nw_proto was IPPROTO_TCP, this meant that a TCP header was present.

The original motivation for this patch was to add corresponding code to only indicate a layer-2 dl_type if the corresponding layer-3 header was fully present. But I'm now convinced that this approach is conceptually wrong, because the meaning of a layer-N header should not be affected by the meaning of a layer-(N+1) header.

This commit switches to a new approach. Now, when a header is missing, its fields in the flow are simply zeroed and have no effect on the "type" field for the outer header. Responsibility for ensuring that a header is fully present is now shifted to the actions that wish to modify that header.

Signed-off-by: Ben Pfaff <blp@nicira.com>
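The "action checks before it modifies" rule can be illustrated with a small userspace sketch. Everything here is an assumption made for the example: `struct demo_pkt`, `demo_has_bytes`, and `demo_set_tcp_sport` are hypothetical names, the packet is a flat buffer rather than an skbuff, and the checksum update a real action needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified packet: a flat buffer plus the L4 offset. */
struct demo_pkt {
    uint8_t *data;
    size_t len;        /* bytes of valid data */
    size_t l4_offset;  /* where the transport header starts */
};

#define DEMO_TCP_HLEN 20  /* minimum TCP header length */

/* True if the range [offset, offset + need) lies inside the packet. */
static bool demo_has_bytes(const struct demo_pkt *p, size_t offset, size_t need)
{
    return offset <= p->len && need <= p->len - offset;
}

/*
 * Rewrite the TCP source port, but only if a full TCP header is present.
 * A truncated packet is left untouched and the action reports failure.
 * (Checksum adjustment is omitted from this sketch.)
 */
static bool demo_set_tcp_sport(struct demo_pkt *p, uint16_t port)
{
    assert(p != NULL && p->data != NULL);
    if (!demo_has_bytes(p, p->l4_offset, DEMO_TCP_HLEN))
        return false;
    p->data[p->l4_offset] = (uint8_t)(port >> 8);
    p->data[p->l4_offset + 1] = (uint8_t)(port & 0xff);
    return true;
}
```

The point of the design is that the length check lives next to the write, so the flow key no longer has to encode whether the inner header was complete.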
2010-08-27  datapath: Fix default value of skb transport_header.  Ben Pfaff
This commit started out as simply better documenting flow_extract(), but then I realized that nothing cares about transport_header in the non-IP case, so don't bother with it at all. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27  datapath: Avoid pskb_may_pull() checks where not needed.  Ben Pfaff
These calls to pskb_may_pull() can be reduced to checks on skb->len because in these contexts those headers will already have been pulled into the skb linear area if they are there at all. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-27  datapath: Report memory allocation errors in flow_extract().  Ben Pfaff
Until now flow_extract() has simply returned a bogus flow when memory allocation errors occurred. This fixes the problem by propagating the error to the caller. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26  Add Nicira extension to OpenFlow for dropping spoofed ARP packets.  Ben Pfaff
"ARP spoofing" is when a host claims an incorrect association between an IP address and a MAC address for deceptive purposes. OpenFlow by itself can prevent a host from sending out ARP replies from an incorrect MAC address in the Ethernet L2 header, but it cannot control the MAC addresses inside the ARP L3 packet. This commit adds a new action that can be used to drop these spoofed packets. CC: Paul Ingram <paul@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26  datapath: Free up flow_extract() return value for reporting errors.  Ben Pfaff
flow_extract() can fail due to memory allocation errors in pskb_may_pull(). Currently it doesn't return those properly, instead just reporting a bogus flow to the caller. But its return value is currently in use for reporting whether the packet was an IPv4 fragment. This commit switches to reporting that in the skb itself so that the return value can be reused to report errors. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26  datapath: Remove skb->len >= ETH_HLEN check from flow_extract().  Ben Pfaff
The callers ensure that this is already the case. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26  datapath: Use 'bool' instead of 'int' where appropriate.  Ben Pfaff
'bool' is better modern kernel style. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-26  datapath: Use min() instead of open-coding it.  Ben Pfaff
Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-08-24  datapath: Unconditionally call kfree_skb()  Simon Horman
kfree_skb() will ignore a NULL pointer. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-24  datapath: Add support for CAPWAP UDP transport.  Jesse Gross
Add support for the transport portion of the CAPWAP protocol as an alternative to GRE for L2 over L3 tunneling. This is not full support for the CAPWAP protocol. CAPWAP covers management of wireless access points and describes a control protocol for setting those devices up. It also describes a data plane protocol that allows packets to be tunneled to a controller for inspection. This data plane protocol is the only component covered by this commit. Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-24  datapath: Add support for tunnel fragmentation.  Jesse Gross
Up until now it was assumed that encapsulated packets larger than the MTU would be fragmented by the IP stack. However, some tunneling protocols provide their own fragmentation mechanism. This adds the necessary support to the generic tunnel code to support fragmentation. Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-24  datapath: Abstract tunneling implementation from GRE.  Jesse Gross
Much of the code in the GRE implementation is not specific to the GRE protocol but is actually common to all types of tunnels. In order to support future types of tunnels, move this code into a common library. Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: struct brport_attribute no longer has an owner element  Simon Horman
Between 2.6.35 and 2.6.36-rc1 the owner element of struct brport_attribute was removed. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: Use rtnl_link_stats64  Simon Horman
This adds compatibility with a series of kernel changesets that introduce 64-bit statistics, the final changeset (to date) being "net: Document that dev_get_stats() returns the given pointer". The relevant changesets were added between 2.6.35 and 2.6.36-rc1. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: use rx_handler_data pointer  Simon Horman
This adds compatibility with kernel changeset "bridge: use rx_handler_data pointer to store net_bridge_port pointer", which was added between 2.6.35 and 2.6.36-rc1. With this change it is now safe to (attempt to) insert both bridge and datapath with newer (>=2.6.36) kernels, although whichever is inserted second will fail to initialise on the call to netdev_rx_handler_register(). Signed-off-by: Simon Horman <horms@verge.net.au> [Jesse: fixed merge conflicts in vport-netdev.c and netdevice.h] Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: Take a rcu_dereference() in netdev_get_vport()  Simon Horman
Although not strictly necessary, this will make this function more consistent when compatibility for 2.6.36 is added. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: rtable may not have a u. member  Simon Horman
This brings the code into sync with the kernel as of changeset "net-next: remove useless union keyword", which was added between 2.6.35 and 2.6.36-rc1. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: Handle duplicate netdev in netdev_rx_handler_register()  Simon Horman
For kernels that have netdev_rx_handler_register() (>=2.6.35), duplicate netdevs are detected by netdev_rx_handler_register(). So by adding duplicate detection to the netdev_rx_handler_register() compatibility code the explicit check in netdev_create() can be removed. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-23  datapath: don't use non-existent receive hooks  Simon Horman
This adds compatibility with kernel changeset "net: add rx_handler data pointer" and thus "net: replace hooks in __netif_receive_skb V5", which were added between 2.6.35 and 2.6.36-rc1. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-20  datapath: Avoid reporting half updated statistics.  Jesse Gross
We enforce mutual exclusion when updating statistics by disabling bottom halves and only writing to per-CPU state. However, reading requires looking at the statistics for foreign CPUs, which could be in the process of updating them since there isn't a lock. This means we could get garbage values for 64-bit values on 32-bit machines or byte counts that don't correspond to packet counts, etc.

This commit introduces a sequence lock for statistics values to avoid this problem. Getting a write lock is very cheap - it only requires incrementing a counter plus a memory barrier (which is compiled away on x86) to acquire or release the lock and will never block. On read we spin until the sequence number hasn't changed in the middle of the operation, indicating that we have a consistent set of values.

Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-20  gre: Don't require incoming checksum.  Jesse Gross
The current meaning of the GRE checksum option is to include checksums on transmit and require packets to have them on receive. In addition, incoming packets with checksums are always validated regardless of this option. Requiring checksums on receive creates surprising behavior and interoperability issues. This disables the requirement on receive. The new behavior is that the sender decides whether to checksum packets and the receiver will validate packets with checksums (similar to UDP). Signed-off-by: Jesse Gross <jesse@nicira.com>
2010-08-10  datapath: Fix handling of 802.1Q and SNAP headers.  Ben Pfaff
The kernel and user datapaths have code that assumes that 802.1Q headers are used only inside Ethernet II frames, not inside SNAP-encapsulated frames. But the kernel and user flow_extract() implementations would interpret 802.1Q headers inside SNAP headers as being valid VLANs. This would cause packet corruption if any VLAN-related actions were to be taken, so change the two flow_extract() implementations only to accept 802.1Q as an Ethernet II frame type, not as a SNAP-encoded frame type.

802.1Q-2005 says that this is correct anyhow:

    Where the ISS instance used to transmit and receive tagged frames is provided by a media access control method that can support Ethernet Type encoding directly (e.g., is an IEEE 802.3 or IEEE 802.11 MAC) or is media access method independent (e.g., 6.6), the TPID is Ethernet Type encoded, i.e., is two octets in length and comprises solely the assigned Ethernet Type value.

    Where the ISS instance is provided by a media access method that cannot directly support Ethernet Type encoding (e.g., is an IEEE 802.5 or FDDI MAC), the TPID is encoded according to the rule for a Subnetwork Access Protocol (Clause 10 of IEEE Std 802) that encapsulates Ethernet frames over LLC, and comprises the SNAP header (AA-AA-03) followed by the SNAP PID (00-00-00) followed by the two octets of the assigned Ethernet Type value.

All of the media that OVS handles support Ethernet Type fields, so to me that means that we don't have to handle 802.1Q-inside-SNAP.

On the other hand, we *do* have to handle SNAP-inside-802.1Q, because this is actually allowed by the standards. So this commit also adds that support.

I verified that, with this change, both SNAP and Ethernet packets are properly recognized both with and without 802.1Q encapsulation.

I was a bit surprised to find out that Linux does not accept SNAP-encapsulated IP frames on Ethernet.

Here's a summary of how frames are handled before and after this commit:

Common cases
------------

            Ethernet
        +------------+
     1. |dst|src|TYPE|
        +------------+

            Ethernet       LLC        SNAP
        +------------+ +--------+ +-----------+
     2. |dst|src| len| |aa|aa|03| |000000|TYPE|
        +------------+ +--------+ +-----------+

            Ethernet     802.1Q
        +------------+ +---------+
     3. |dst|src|8100| |VLAN|TYPE|
        +------------+ +---------+

            Ethernet     802.1Q      LLC        SNAP
        +------------+ +---------+ +--------+ +-----------+
     4. |dst|src|8100| |VLAN| LEN| |aa|aa|03| |000000|TYPE|
        +------------+ +---------+ +--------+ +-----------+

Unusual cases
-------------

            Ethernet       LLC        SNAP       802.1Q
        +------------+ +--------+ +-----------+ +---------+
     5. |dst|src| len| |aa|aa|03| |000000|8100| |VLAN|TYPE|
        +------------+ +--------+ +-----------+ +---------+

            Ethernet       LLC
        +------------+ +--------+
     6. |dst|src| len| |xx|xx|xx|
        +------------+ +--------+

            Ethernet       LLC        SNAP
        +------------+ +--------+ +-----------+
     7. |dst|src| len| |aa|aa|03| |xxxxxx|xxxx|
        +------------+ +--------+ +-----------+

            Ethernet     802.1Q      LLC
        +------------+ +---------+ +--------+
     8. |dst|src|8100| |VLAN| LEN| |xx|xx|xx|
        +------------+ +---------+ +--------+

            Ethernet     802.1Q      LLC        SNAP
        +------------+ +---------+ +--------+ +-----------+
     9. |dst|src|8100| |VLAN| LEN| |aa|aa|03| |xxxxxx|xxxx|
        +------------+ +---------+ +--------+ +-----------+

Behavior
--------

        ---------------  ---------------  -------------------------------------
            Before            After
          this commit      this commit
        dl_type dl_vlan  dl_type dl_vlan  Notes
        ------- -------  ------- -------  -------------------------------------
     1.  TYPE    ffff     TYPE    ffff    no change
     2.  TYPE    ffff     TYPE    ffff    no change
     3.  TYPE    VLAN     TYPE    VLAN    no change
     4.  LEN     VLAN     TYPE    VLAN    proposal fixes behavior
     5.  TYPE    VLAN     8100    ffff    802.1Q says this is invalid framing
     6.  05ff    ffff     05ff    ffff    no change
     7.  05ff    ffff     05ff    ffff    no change
     8.  LEN     VLAN     05ff    VLAN    proposal fixes behavior
     9.  LEN     VLAN     05ff    VLAN    proposal fixes behavior

Signed-off-by: Ben Pfaff <blp@nicira.com>
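The "after this commit" column of the table can be reproduced with a short userspace parser. This is a sketch written from the table alone, not the flow_extract() code itself: the `demo_*` names are hypothetical, the frame is a flat byte buffer, and dl_vlan is taken to be the low 12 bits of the tag in host byte order, which is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_TYPE_8021Q   0x8100
#define DEMO_TYPE_CUTOFF  0x0600  /* smaller values are 802.3 lengths */
#define DEMO_TYPE_NOT_ETH 0x05ff  /* the "05ff" marker from the table */
#define DEMO_VLAN_NONE    0xffff

struct demo_key {
    uint16_t dl_type;
    uint16_t dl_vlan;
};

static uint16_t demo_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read the type/length field at *off, looking through LLC+SNAP if present. */
static uint16_t demo_parse_ethertype(const uint8_t *f, size_t len, size_t *off)
{
    static const uint8_t snap[6] = { 0xaa, 0xaa, 0x03, 0x00, 0x00, 0x00 };
    uint16_t proto;

    if (len - *off < 2)
        return DEMO_TYPE_NOT_ETH;
    proto = demo_be16(f + *off);
    *off += 2;
    if (proto >= DEMO_TYPE_CUTOFF)
        return proto;                       /* Ethernet II type */
    if (len - *off < 8 || memcmp(f + *off, snap, sizeof snap) != 0)
        return DEMO_TYPE_NOT_ETH;           /* plain LLC or foreign OUI */
    proto = demo_be16(f + *off + 6);        /* SNAP-encoded type */
    *off += 8;
    return proto;
}

/* 802.1Q is honored only directly after the MAC addresses (cases 3, 4, 8, 9). */
static int demo_extract(const uint8_t *f, size_t len, struct demo_key *key)
{
    size_t off = 12;                        /* skip dst and src MAC */

    assert(f != NULL && key != NULL);
    key->dl_type = DEMO_TYPE_NOT_ETH;
    key->dl_vlan = DEMO_VLAN_NONE;
    if (len < 14)
        return -1;
    if (demo_be16(f + off) == DEMO_TYPE_8021Q && len - off >= 4) {
        key->dl_vlan = demo_be16(f + off + 2) & 0x0fff;
        off += 4;
    }
    key->dl_type = demo_parse_ethertype(f, len, &off);
    return 0;
}
```

Because the VLAN check runs once, before the LLC/SNAP handling, a 0x8100 type found inside a SNAP header (case 5) is reported as an ordinary dl_type and no VLAN is extracted.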
2010-08-03  datapath: Detect and suppress flows that are implicated in loops.  Ben Pfaff
In-kernel loops need to be suppressed; otherwise, they cause high CPU consumption, even to the point that the machine becomes unusable. Ideally these flows should never be added to the Open vSwitch flow table, but it is fairly easy for a buggy controller to create them given the menagerie of tunnels, patches, etc. that OVS makes available.

Commit ecbb6953b "datapath: Add loop checking" did the initial work toward suppressing loops, by dropping packets that recursed more than 5 times. This at least prevented the kernel stack from overflowing and thereby OOPSing the machine.

But even with this commit, it is still possible to waste a lot of CPU time due to loops. The problem is not limited to 5 recursive calls per packet: any packet can be sent to multiple destinations, which in turn can themselves be sent to multiple destinations, and so on. We have actually seen in practice a case where each packet was, apparently, sent to at least 2 destinations per hop, so that each packet actually consumed CPU time for 2**5 == 32 packets, possibly more.

This commit takes loop suppression a step further, by clearing the actions of flows that are implicated in loops. Thus, after the first packet in such a flow, later packets for either the "root" flow or for flows that it ends up looping through are simply discarded, saving a huge amount of CPU time.

This version of the commit just clears the actions from the flows that are part of the loop. Probably, there should be some additional action to tell ovs-vswitchd that a loop has been detected, so that it can in turn inform the controller one way or another.
My test case was this:

    ovs-controller -H --max-idle=permanent punix:/tmp/controller

    ovs-vsctl -- \
        set-controller br0 unix:/tmp/controller -- \
        add-port br0 patch00 -- \
        add-port br0 patch01 -- \
        add-port br0 patch10 -- \
        add-port br0 patch11 -- \
        add-port br0 patch20 -- \
        add-port br0 patch21 -- \
        add-port br0 patch30 -- \
        add-port br0 patch31 -- \
        set Interface patch00 type=patch options:peer=patch01 -- \
        set Interface patch01 type=patch options:peer=patch00 -- \
        set Interface patch10 type=patch options:peer=patch11 -- \
        set Interface patch11 type=patch options:peer=patch10 -- \
        set Interface patch20 type=patch options:peer=patch21 -- \
        set Interface patch21 type=patch options:peer=patch20 -- \
        set Interface patch30 type=patch options:peer=patch31 -- \
        set Interface patch31 type=patch options:peer=patch30

followed by sending a single "ping" packet from an attached Ethernet port into the bridge. After this, without this commit the vswitch userspace and kernel consume 50-75% of the machine's CPU (in my KVM test setup on a single physical host); with this commit, some CPU is consumed initially but it converges on 0% quickly.

A more challenging test sends a series of packets in multiple flows; I used "hping3" with its default options. Without this commit, the vswitch consumes 100% of the machine's CPU, most of which is in the kernel. With this commit, the vswitch consumes "only" 33-50% CPU, most of which is in userspace, so the machine is more responsive.

A refinement on this commit would be to pass the loop counter down to userspace as part of the odp_msg struct and then back up as part of the ODP_EXECUTE command arguments. This would, presumably, reduce the CPU requirements, since it would allow loop detection to happen earlier, during initial setup of flows, instead of just on the second and subsequent packets of flows.
2010-08-02  datapath: Inline flow_cast().  Ben Pfaff
This function is both trivial and on the packet processing fast path, so expand it inline.
2010-08-02  datapath: Don't track IP TOS value two different ways.  Ben Pfaff
Originally, the datapath didn't care about IP TOS at all. Then, to support NetFlow, we made it keep track of the last-seen IP TOS value on a per-flow basis. Then, to support OpenFlow 1.0, we added a nw_tos field to odp_flow_key. We don't need both methods, so this commit drops the NetFlow-specific tracking. This introduces a small kernel ABI break: upgrading the kernel module without upgrading the OVS userspace will mean that NetFlow records will all show an IP TOS value of 0. I don't consider that to be a serious problem.
2010-08-02  datapath: Remove netdev_alloc_skb_ip_align() compat code.  Jesse Gross
We don't actually use this function anymore so there isn't a point in having a configure test for it.
2010-08-02  datapath: Fix build with backported netdev_alloc_skb_ip_align()  Alexey I. Froloff
Signed-off-by: Alexey I. Froloff <raorn@altlinux.org>
2010-07-31  datapath: Clean-up previous undefined symbol commit  Justin Pettit
The previous commit still had some issues with the "set_normalized_timespec" symbol being undefined. Here we just replace it. We can search for a more elegant solution later if necessary.
2010-07-31  datapath: Fix undefined symbol "set_normalized_timespec"  Justin Pettit
The commit "datapath: Don't query time for every packet." (6bfafa55) introduced the use of "set_normalized_timespec". Unfortunately, older kernels don't export the symbol. This implements the function on those older kernels.
2010-07-30  vport-internal: Set vport to NULL when detaching.  Jesse Gross
'struct net_device' is refcounted and can stick around for quite a while if someone is still holding a reference to it. However, we free the vport that it is attached to in the next RCU grace period after detach. This assigns the vport to NULL on detach and adds appropriate checks.
2010-07-30  vport: Make dp_port->vport always valid.  Jesse Gross
When we detached a vport we would assign NULL to dp_port->vport before calling synchronize_rcu(). However, since vports have a longer lifetime than dp_ports there were no checks before dereferencing dp_port->vport. This changes the behavior to match the assumption by not assigning NULL during detach. This avoids a potential NULL pointer dereference in do_output() among other places.
2010-07-30  datapath: Remove dead code.  Jesse Gross
Several blocks of code were either no longer being called or had been "#if 0"'d out for a long time. This removes them.
2010-07-30  datapath: Remove redundant checks on SKBs.  Jesse Gross
On vport ingress we already check for shared SKBs but then later warn in several other places. In a similar vein, we check every packet to see if it is LRO but only certain vports can produce these packets. Remove and consolidate checks to the places where they are needed.
2010-07-29  datapath: Catch missed formatting changes.  Jesse Gross
A few functions were missed in the change to move the return type onto the same line as the arguments.
2010-07-26  datapath: Don't query time for every packet.  Jesse Gross
Rather than actually query the time every time a packet comes through, just store the current jiffies and convert it to actual time when requested. GRE is the primary beneficiary of this because the traffic travels through the datapath twice. This change reduces CPU utilization 3-4% with GRE.
2010-07-15  datapath: Don't update flow key when applying actions.  Jesse Gross
Currently the flow key is updated to match an action that is applied to a packet, but these fields are never looked at again. Not only is this a waste of time, it also makes optimizations involving caching the flow key more difficult.
2010-07-15  datapath: Don't set tunnel_id in a function.  Jesse Gross
We don't need a function to set a variable. In practice it will almost certainly get inlined but this makes it easier to read.
2010-07-15  gre: Use struct for parsing GRE header.  Jesse Gross
GRE is a somewhat annoying protocol because the header is variable length. However, it does have a few fields that are always present so we can make the parsing seem less magical by using a struct for those fields instead of building it up field by field.
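As a rough illustration of "fixed fields in a struct, optional fields by flag", the sketch below decodes the 4-byte base header that every GRE packet carries and then sizes the optional checksum, key, and sequence fields from the flag bits (RFC 2784 and RFC 2890). The `demo_*` names are hypothetical. Unlike kernel code, which would overlay a struct on the packet data, this version decodes byte by byte to stay independent of alignment and host byte order, and it simply rejects any flag or version bits it does not model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GRE_CSUM 0x8000  /* checksum + reserved field present */
#define DEMO_GRE_KEY  0x2000  /* key field present */
#define DEMO_GRE_SEQ  0x1000  /* sequence number present */

/* The fields that are always present, decoded to host byte order. */
struct demo_gre_base {
    uint16_t flags;
    uint16_t protocol;
};

/* Returns the total GRE header length in bytes, or -1 if unusable. */
static int demo_gre_parse(const uint8_t *p, size_t len,
                          struct demo_gre_base *base, uint32_t *key)
{
    size_t hlen = 4;
    size_t off = 4;

    assert(p != NULL && base != NULL && key != NULL);
    if (len < hlen)
        return -1;
    base->flags = (uint16_t)((p[0] << 8) | p[1]);
    base->protocol = (uint16_t)((p[2] << 8) | p[3]);

    /* Routing, version, and reserved bits are not handled in this sketch. */
    if (base->flags & ~(DEMO_GRE_CSUM | DEMO_GRE_KEY | DEMO_GRE_SEQ))
        return -1;

    if (base->flags & DEMO_GRE_CSUM)
        hlen += 4;
    if (base->flags & DEMO_GRE_KEY)
        hlen += 4;
    if (base->flags & DEMO_GRE_SEQ)
        hlen += 4;
    if (len < hlen)
        return -1;

    *key = 0;
    if (base->flags & DEMO_GRE_CSUM)
        off += 4;  /* the key follows the checksum word when both exist */
    if (base->flags & DEMO_GRE_KEY)
        *key = ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
               ((uint32_t)p[off + 2] << 8) | (uint32_t)p[off + 3];
    return (int)hlen;
}
```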
2010-07-15  gre: Wait for an RCU grace period before freeing port.  Jesse Gross
We currently remove ports from the GRE hash table and then immediately free the ports. Since received packets could be using that port this can lead to a crash (the port has already been detached from the datapath so this can't happen for transmitted packets). As a result we need to wait for an RCU grace period to elapse before actually freeing the port. In an ideal world we would actually remove the port from the hash table in a hypothetical gre_detach() function since this is one of the purposes of detaching. However, we also use the hash table to look for collisions in the lookup criteria and don't want to allow two identical ports to exist. It doesn't matter though because we aren't blocking on the freeing of resources.
2010-07-15  vport: Use DEFINE_PER_CPU instead of dynamically allocating loop counter.  Jesse Gross
DEFINE_PER_CPU is simpler and faster than alloc_percpu() so use it for the loop counter, which is already statically defined.
2010-07-15  datapath: Put return type on same line as arguments for functions.  Jesse Gross
In some places we would put the return type on the same line as the rest of the function definition and other places we wouldn't. Reformat everything to match kernel style.
2010-07-15  vport: Use EBUSY to represent already attached device.  Jesse Gross
We currently use EEXIST to represent both a device that is already attached and a GRE device that is the same as another one. Instead use EBUSY for already attached devices to disambiguate the two situations.
2010-07-15  datapath: Make checksum offsets unsigned.  Jesse Gross
Checksum offsets should never be negative, so make that explicit by using unsigned ints. This helps bug checks that test whether the offsets are greater than their upper limits.
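A small sketch of why the unsigned type helps an upper-limit check. The two functions are hypothetical and only illustrate the arithmetic: with a signed offset, a negative value slips under the limit, while the same bit pattern viewed as unsigned is a very large number that the single comparison rejects.

```c
#include <assert.h>
#include <stdbool.h>

/* Signed version: an upper-bound test alone lets negative offsets through. */
static bool demo_signed_check(int offset, int limit)
{
    assert(limit >= 0);
    return offset <= limit;
}

/* Unsigned version: one comparison covers both "too large" and "negative". */
static bool demo_unsigned_check(unsigned int offset, unsigned int limit)
{
    return offset <= limit;
}
```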