Age | Commit message (Collapse) | Author |
|
Following patch memset ovs_key_ipv4_tunnel padding area so that
packets from a flow would be mapped to same flow in kernel datapath
flow table.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #14843
|
|
Following patch fixes flow buffer size calculation to allocate
sufficient memory for all nested attributes in new tunnel
attribute.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
Bug #14767
|
|
Following patch breaks down single ipv4_tunnel netlink attribute into
individual member attributes. It will help when we extend tunneling
parameters in future.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #14611
|
|
This patch adds support for skb mark matching and set action.
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ansis Atteka <aatteka@nicira.com>
Conflicts:
datapath/flow.c
lib/dpif-netdev.c
lib/flow.h
lib/odp-util.c
ofproto/ofproto-dpif.c
|
|
Header caching previously required the ability to maintain the lifetime
of flows across RCU boundaries. However, now that header caching is
gone we can simplfy the code and make it match the upstream version.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
|
|
Following patch adds start offset for sw_flow-key, so that we can
skip tunneling information in key for non-tunnel flows.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
This is a first pass at providing a tun_key which can be
used as the basis for flow-based tunnelling. The
tun_key includes and replaces the tun_id in both struct
ovs_skb_cb and struct sw_tun_key.
This patch allows all existing tun_id behaviour to still work. Existing
users of tun_id are redirected to tun_key->tun_id to retain compatibility.
However, when the userspace code is updated to make use of the new
tun_key, the old behaviour will be deprecated and removed.
NOTE: With these changes, the tunneling code no longer assumes input and
output keys are symmetric. If they are not, PMTUD needs to be disabled
for tunneling to work.
Signed-off-by: Kyle Mestery <kmestery@cisco.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reviewed-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
This is analogous to the change made in userspace with
2508ac16defd417b94fb69689b6b1da4fbc76282 (odp-util: Update
ODPUTIL_FLOW_KEY_BYTES for current kernel flow format.). The extra
space for vlan encapsulation was not included in the allocation for
maximum length flows.
Found by code inspection and to my knowledge has never been hit, likely
because skb allocations are padded out to a cacheline, making userspace
more susceptible to this problem than the kernel. In theory, however,
the right combination of flow and packet size could result in a kernel
panic.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Kyle Mestery <kmestery@cisco.com>
|
|
Replaced all instances of Nicira Networks(, Inc) to Nicira, Inc.
Feature #10593
Signed-off-by: Raju Subramanian <rsubramanian@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Use hash table to store ports of datapath. Allow 64K ports per switch.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #2462
|
|
Following patch introduces a timer based event to rehash flow-hash
table. It makes finding collisions difficult to for an attacker.
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
OVS has quite a few global symbols that should be scoped with a
prefix to prevent collisions with other modules in the kernel.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Signed-off-by: Jesse Gross <jesse@nicira.com>
|
|
Many of our kernel copyright messages make reference to code being
copied from the Linux kernel, which is a bit odd for code in the
kernel. This changes them to use the standard GNU GPL boilerplate
instead. It does not change the actual license, which continues to
be GPLv2.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Without this, every VLAN packet goes to userspace because VLAN flows
cannot be set up.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
A few of the recently added fields in struct sw_flow_key had
comments that weren't properly aligned.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Add support matching the IPv4 TTL and IPv6 hop limit fields. This
commit also adds support for modifying the IPv4 TTL. Modifying the IPv6
hop limit isn't currently supported, since we don't support modifying
IPv6 headers.
We will likely want to change the user-space interface, since basic
matching and setting the TTL are not generally useful. We will probably
want the ability to match on extraordinary events (such as TTL of 0 or 1)
and a decrement action.
Feature #8024
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
This will be useful later when we add support for matching the ECN bits
within the TOS field.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Most of issues are reported by checkpatch.pl
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7771
|
|
Following patch adds skb-priority to flow key. So userspace will know
what was priority when packet arrived and we can remove the pop/reset
priority action. It's no longer necessary to have a special action for
pop that is based on the kernel remembering original skb->priority.
Userspace can just emit a set priority action with the original value.
Since the priority field is a match field with just a normal set action,
we can convert it into the new model for actions that are based on
matches.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7715
|
|
Until now, OVS has handled IP fragments more awkwardly than necessary. It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present. This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.
This commit improves the situation significantly. It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset. This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers. In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.
This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments". It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
|
|
Almost all current actions can be expressed in the form of
push/pop/set <field>, where field is one of the match fields. We can
create three base actions and take a field. This has both a nice
symmetry and avoids inconsistencies where we can match on the vlan
TPID but not set it.
Following patch converts all actions to this new format.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7115
|
|
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
|
|
Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that
switched from multicast to unicast Netlink for sending upcalls added a
Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions
within the flow as target.
This commit drops this per-flow PID in favor of a per-action PID, because
that is more flexible. It does not yet make use of this additional
flexibility, so behavior should not change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
|
|
Currently we publish several multicast groups for upcalls and let
userspace sockets subscribe to them. The benefit of this is mostly
that userspace is the one doing the subscription - the actual
multicast capability is not currently used and probably wouldn't be
even if we moved to a multiprocess model. Despite the convenience,
multicast sockets have a number of disadvantages, primarily that
we only have a limited number of them so there could be collisions.
In addition, unicast sockets give additional flexibility to userspace
by allowing every object to potentially have a different socket
chosen by userspace for upcalls. Finally, any future optimizations
for upcalls to reduce copying will likely not be compatible with
multicast anyways so disallowing it potentially simplifies things.
We also never unregistered the multicast groups registered for upcalls
and leaked them on module unload. As a side effect, this solves that
problem.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Currently OVS uses its own hashing implmentation for hash tables
which has some problems, e.g. error case on deletion code.
Following patch replaces that with hlist based hash table which is
consistent with other kernel hash tables. As Jesse suggested, flex-array
is used for allocating hash buckets, So that we can have large
hash-table without large contiguous kernel memory.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
When ovs-vswitchd executes actions on a synthesized packet, that is, on a
packet that is not being forwarded from any particular port but is being
generated by ovs-vswitchd itself or by an OpenFlow controller (using a
OFPT_PACKET_OUT message with an in_port of OFPP_NONE), there is no good
choice for the in_port to pass to the kernel in the flow in the
OVS_PACKET_CMD_EXECUTE message. This commit allows ovs-vswitchd to omit
the in_port entirely in this case.
This fixes a bug in OFPT_PACKET_OUT: using an in_port of OFPP_NONE would
cause the packet to be dropped by the kernel, since that's an invalid
input port.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Reported-by: Aaron Rosen <arosen@clemson.edu>
|
|
The prefix "ODP_*" is not overly descriptive in the context of the
larger Linux tree. This commit changes the prefix to "OVS_*" for the
userpace to kernel interactions. The userspace libraries still use
"ODP_" in many of their interfaces since it is more descriptive in the
OVS oeuvre.
Feature #6904
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
The fields of the kernel flow key are now grouped by protocol rather
than using generic names. The containing structures describe the
category, so it is no longer necessary to use prefixes. Most of
these prefixes have been removed but nw_proto and nw_tos have
retained them. This renames the fields for consistency and brevity.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Until now, the tun_id and in_port have been lost when a packet is sent from
the kernel to userspace and then back to the kernel. I didn't think that
this was a problem, but recent behavior made me look closer and see that
it makes a difference if sFlow is turned on or if an
ODP_ATTR_ACTION_CONTROLLER action is present. We could possibly kluge
around those, but for future-proofing it seems better to pass the packet
metadata from userspace to the kernel. That is what this commit does.
This commit introduces a user-kernel protocol break. We could avoid that,
if it is desirable, by making ODP_PACKET_ATTR_KEY optional for
ODP_PACKET_CMD_EXECUTE commands.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Currently the whole flow key struct is hashed on every packet received from the
network or userspace. The whole struct is also compared byte-for-byte when
doing flow table lookups. This consumes a fair percentage of CPU time, and most
of the time part of the structure is unused (e.g. the IPv6 fields when handling
IPv4 traffic; the IPv4 fields when handling Ethernet frames).
This commit reorders the fields in the flow key struct to put the least
commonly used elements at the end and changes the hash and comparison functions
to look only at the part that contains data.
Signed-off-by: Andrew Evans <aevans@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Fixes a number of minor elements in the IPv6 extraction and
parsing code to better conform to kernel style. Examples include
using kernel types/functions, adding line breaks, and using
unlikely() macros. There is no functional change.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Both userspace and the kernel allocate space based on the max size of a
nlattr-formatted flow. It was easy to change the max size of a flow
definition and cause crashes by forgetting to update one or both of
those definitions. This commit attempts to make that harder by
providing a better description of how the max size is calculated and a
build check to look for a common indication that it may have changed.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
The addition of IPv6 matching increased the maximum size of a
nlattr-formatted flow. This was not properly reflected in the userspace
and kernel #defines that reserve space for the flows and could lead to
crashes. This commit increases the size uniformly to 132 bytes.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
IPv6 uses Neighbor Discovery messages in a similar manner to how IPv4
uses ARP. This commit adds support for matching deeper into the
payloads of Neighbor Solicitation (NS) and Neighbor Advertisement (NA)
messages. Currently, the matching fields include:
- NS and NA Target (nd_target)
- NS Source Link Layer Address (nd_sll)
- NA Target Link Layer Address (nd_tll)
When defining IPv6 Neighbor Discovery rules, the Nicira Extensible Match
(NXM) extension to OVS must be used.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Provides ability to match over IPv6 traffic in the same manner as IPv4.
Currently, the matching fields include:
- IPv6 source and destination addresses (ipv6_src and ipv6_dst)
- Traffic Class (nw_tos)
- Next Header (nw_proto)
- ICMPv6 Type and Code (icmp_type and icmp_code)
- TCP and UDP Ports over IPv6 (tp_src and tp_dst)
When defining IPv6 rules, the Nicira Extensible Match (NXM) extension to
OVS must be used.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
OpenFlow 1.0 doesn't allow matching on the ARP source and target
hardware address. This has caused us to introduce hacks such as the
Drop Spoofed ARP action. Now that we have extensible match, we can
match on more fields within ARP:
- Source Hardware Address (arp_sha)
- Target Hardware Address (arp_tha)
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
The current reporting of flow last used time has two issues that
cause it to incorrectly report the system monotonic time when the
flow was last used.
The first is that it simply converts the stored jiffies value to
milliseconds by scaling with a constant. This does not work because
jiffies is not zero based and can wrap around on 32-bit platforms.
The second is there is no guarantee that jiffies advances at the
same rate as the RTC based monotonic time that userspace uses.
A variety of factors can cause differences, including system suspend
and clock drift. These are not too important for relatively short
time periods such as the duration of the flow (nor is the flow timing
precision of extreme importance). However, when the time being
measured is the duration since system boot (assuming that the above
issues had been addressed) the difference can become significant.
This addresses both issues by restoring behavior similar to the
previous method of computing the flow used time, though in a
slightly different form to reflect the needs of the Netlink code.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
This completes the transition to the Generic Netlink interface, and
so this commit restores support for Linux 2.6.18 and later.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
This commit takes one step in that direction by making the kernel report
its idea of the flow that a packet belongs to whenever it passes a packet
up to userspace. This means that userspace can intelligently figure out
what to do:
- If userspace's notion of the flow for the packet matches the kernel's,
then nothing special is necessary.
- If the kernel has a more specific notion for the flow than userspace,
for example if the kernel decoded IPv6 headers but userspace stopped
at the Ethernet type (because it does not understand IPv6), then again
nothing special is necessary: userspace can still set up the flow in
the usual way.
- If userspace has a more specific notion for the flow than the kernel,
for example if userspace decoded an IPv6 header but the kernel
stopped at the Ethernet type, then userspace can forward the packet
manually, without setting up a flow in the kernel. (This case is
bad from a performance point of view, but at least it is correct.)
This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, although userspace does now
have enough information to do that intelligently. This will have to wait
for later commits.
This commit is bigger than it would otherwise be because it is rolled
together with changing "struct odp_msg" to a sequence of Netlink
attributes. The alternative, to do each of those changes in a separate
patch, seemed like overkill because it meant that either we would have to
introduce and then kill off Netlink attributes for in_port and tun_id, if
Netlink conversion went first, or shove yet another variable-length header
into the stuff already after odp_msg, if adding the flow key to odp_msg
went first.
This commit will slow down performance of checksumming packets sent up to
userspace. I'm not entirely pleased with how I did it. I considered a
couple of alternatives, but none of them seemed that much better.
Suggestions welcome. Not changing anything wasn't an option,
unfortunately. At any rate some slowdown will become unavoidable when OVS
actually starts using Netlink instead of just Netlink framing.
(Actually, I thought of one option where we could avoid that: make
userspace do the checksum instead, by passing csum_start and csum_offset as
part of what goes to userspace. But that's not perfect either.)
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
One of the goals for Open vSwitch is to decouple kernel and userspace
software, so that either one can be upgraded or rolled back independent of
the other. To do this in full generality, it must be possible to change
the kernel's idea of the flow key separately from the userspace version.
In turn, that means that flow keys must become variable-length. This
commit makes that change using Netlink attribute sequences.
This commit does not actually make userspace flexible enough to handle
changes in the kernel flow key structure, because userspace doesn't yet
have enough information to do that intelligently. Upcoming commits will
fix that.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Sparse can warn about incorrect usage of RCU via direct access to
points when used in conjuction with __rcu and CONFIG_SPARSE_RCU.
This adds the necessary annotations.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
The __packed macro is preferred instead of an explicit GCC attribute,
so use it instead to deal with structure packing.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
While doing test builds on numerous kernel versions I found that one build
failed because "struct nlattr" wasn't visible from flow.h. I guess that
we accidentally depend on <linux/netlink.h> being included indirectly, but
this didn't always happen.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
In the medium term, we plan to migrate the datapath to use Netlink as its
communication channel. In the short term, we need to be able to have
actions with 64-bit arguments but "struct odp_action" only has room for
48 bits. So this patch shifts to variable-length arguments using Netlink
attributes, which starts in on the Netlink transition and makes 64-bit
arguments possible at the same time.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
These functions run 99% of the time in atomic context and the benefit of
passing along the 'gfp' argument for the other 1% doesn't seem to outweigh
the cost.
Suggested-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
is_frag is only used for communication between two functions, which
means that it doesn't really need to be in the SKB CB. This wouldn't
necessarily be a problem except that there are also a number of other
paths that lead to this being uninitialized. This isn't a problem
now but uninitialized memory seems dangerous and there isn't much
upside.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Reviewed-by: Ben Pfaff <blp@nicira.com>
|