Age | Commit message (Collapse) | Author |
|
Commit e72e793 (Add ability to restrict flow mods and flow stats
requests to cookies.) modified cookie handling. Some of its behavior
was unintuitive and there was at least one bug (described below).
Commit f66b87d (DESIGN: Document uses for flow cookies.) attempted to
document a clean design for cookie handling. This commit updates the
DESIGN document and brings the implementation in line with it.
In commit e72e793, the code that handled processing OpenFlow flow
modification requests set the cookie mask to exact-match. This seems
reasonable for adding flows, but is not correct for matching, since
OpenFlow 1.0 doesn't support matching based on the cookie. This commit
changes to cookie mask to fully wildcarded, which is the correct
behavior for modifications and deletions. It doesn't cause any problems
for flow additions, since the mask is ignored for that operation.
Bug #9742
Reported-by: Luca Giraudo <lgiraudo@nicira.com>
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
|
|
Open vSwitch userspace can set up flows at a high rate, but it is somewhat
"bursty" in opportunities to set up flows, by which I mean that OVS sets up
a batch of flows, then goes off and does some other work for a while, then
sets up another batch of flows, and so on. The result is that, if a large
number of packets that need flow setups come in all at once, then some of
them can overflow the relatively small kernel-to-user buffers.
This commit increases the kernel-to-user buffers from the default of
approximately 120 kB each to 1 MB each. In one somewhat synthetic test
case that I ran based on an "hping3" that generated a load of about 20,000
new flows per second (including both requests and replies), this reduced
the packets dropped at the kernel-to-user interface from about 30% to none.
I expect that it will similarly improve packet loss in workloads where
flow arrival is not easily predictable.
(This has little effect on workloads generated by "ovs-benchmark rate"
because that benchmark is effectively "self-clocking", that is, a new flow
is triggered only by a reply to a request made earlier, which means that
the number of buffered packets at any given has a known, constant upper
limit.)
Bug #10210.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
This makes it possible to add entries for decoding OpenFlow messages with
newer versions, e.g. OpenFlow 1.1 or 1.2. However, no actual messages for
newer versions are actually implemented yet; that will come later.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Most structures in this file have an "nx_" prefix, so this makes naming
more consistent.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
The new PACKET_IN format implemented in this patch includes flow
metadata such as the cookie, table_id, and registers.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
|
|
With this commit, it is possible to limit flow deletions and
modifications to specific cookies. It also provides the ability to
dump flows based on their cookies.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
|
|
|
|
Many of our kernel copyright messages make reference to code being
copied from the Linux kernel, which is a bit odd for code in the
kernel. This changes them to use the standard GNU GPL boilerplate
instead. It does not change the actual license, which continues to
be GPLv2.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
In the future it is likely that our vlan support will expand to
include multiply tagged packets. When this happens, we would
ideally like for it to be consistent with our current tagging.
Currently, if we receive a packet with a partial VLAN tag we will
automatically drop it in the kernel, which is unique among the
protocols we support. The only other reason to drop a packet is
a memory allocation error. For a doubly tagged packet, we will
parse the first tag and indicate that another tag was present but
do not drop if the second tag is incorrect as we do not parse it.
This changes the behavior of the vlan parser to match other protocols
and also deeper tags by indicating the presence of a broken tag with
the 802.1Q EtherType but no vlan information. This shifts the policy
decision to userspace on whether to drop broken tags and allows us to
uniformly add new levels of tag parsing.
Although additional levels of control are provided to userspace, this
maintains the current behavior of dropping packets with a broken
tag when using the NORMAL action because that is the correct behavior
for an 802.1Q-aware switch. The userspace flow parser actually
already had the new behavior so this corrects an inconsistency.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
When the datapath was converted to use Netlink attributes for describing
flow keys, I had a vague idea of how it could be smoothly extensible, but
I didn't actually implement extensibility or carefully think it through.
This commit adds a document that describes how flow keys can be extended
in a compatible fashion and adapts the existing interface to match what
it says.
This commit doesn't actually implement extensibility. I already have a
separate patch series out for that. This patch series borrows from that
one heavily, but the extensibility series will need to be reworked
somewhat once this one is in.
This commit is only lightly tested because I don't have a good test setup
for VLANs.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
It's useful to be able to track sets of attributes by using their values as
bit indexes. That's easier if the values are all in the range of a basic
C integer type.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
IPv6 uses the term "traffic class" for what IPv4 calls
"type-of-service". This commit renames the the "ipv6_tos" field to
"ipv6_tclass" in the "ovs-key_ipv6" struct to be more consistent with
the IPv6 terminology.
Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Add support matching the IPv4 TTL and IPv6 hop limit fields. This
commit also adds support for modifying the IPv4 TTL. Modifying the IPv6
hop limit isn't currently supported, since we don't support modifying
IPv6 headers.
We will likely want to change the user-space interface, since basic
matching and setting the TTL are not generally useful. We will probably
want the ability to match on extraordinary events (such as TTL of 0 or 1)
and a decrement action.
Feature #8024
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Signed-off-by: Justin Pettit <jpettit@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
The interfaces related to tunneling aren't finalized enough to be
sent upstream but we also still want to retain them in the OVS
repository. Since userspace should be compatible with both versions
of the kernel, this renumbers the tunnel interfaces to high numbers
so that we can continue to add new interfaces without conflict.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Most of issues are reported by checkpatch.pl
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7771
|
|
Some invalid ports (those above the maximum port number supported by the
datapath, including OpenFlow reserved ports that are not translated by OVS
into some other number) will be rejected by the datapath. It's better to
catch these early and send back an appropriate OpenFlow error code, rather
than to just get EINVAL from the kernel and have to guess at the problem.
Reported-by: Aaron Rosen <arosen@clemson.edu>
|
|
|
|
|
|
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Following patch adds skb-priority to flow key. So userspace will know
what was priority when packet arrived and we can remove the pop/reset
priority action. It's no longer necessary to have a special action for
pop that is based on the kernel remembering original skb->priority.
Userspace can just emit a set priority action with the original value.
Since the priority field is a match field with just a normal set action,
we can convert it into the new model for actions that are based on
matches.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7715
|
|
The exit action causes the switch to immediately halt processing of
further actions. It's intended to be used in conjunction with
multi table support. It allows a table to force tables which call
it to discontinue processing a flow.
|
|
|
|
This patch special cases OFPP_NONE to be always up in bundle
actions. Presumably, if a controller put OFPP_NONE in their bundle
action, they want it to be an available choice.
This patch also adds documentation to the bundle action about slave
liveness.
|
|
Generally we've used the comments to the right of attribute enums to
explain the types of the arguments and the ones above them to explain their
meaning. This is a reasonable separation since it ensures that the type
of the argument is obvious, which in my opinion is important.
This updates a few comments to match this pattern.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
The userspace/kernel interface file had acquired a mixture of userspace
and kernel style, so this makes it use kernel style consistently.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Currently we hard code the versions of our GENL families to 1 but it's
nicer to have symbolic constants.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Until now, OVS has handled IP fragments more awkwardly than necessary. It
has not been possible to match on L4 headers, even in fragments with offset
0 where they are actually present. This means that there was no way to
implement ACLs that treat, say, different TCP ports differently, on
fragmented traffic; instead, all decisions for fragment forwarding had to
be made on the basis of L2 and L3 headers alone.
This commit improves the situation significantly. It is still not possible
to match on L4 headers in fragments with nonzero offset, because that
information is simply not present in such fragments, but this commit adds
the ability to match on L4 headers for fragments with zero offset. This
means that it becomes possible to implement ACLs that drop such "first
fragments" on the basis of L4 headers. In practice, that effectively
blocks even fragmented traffic on an L4 basis, because the receiving IP
stack cannot reassemble a full packet when the first fragment is missing.
This commit works by adding a new "fragment type" to the kernel flow match
and making it available through OpenFlow as a new NXM field named
NXM_NX_IP_FRAG. Because OpenFlow 1.0 explicitly says that the L4 fields
are always 0 for IP fragments, it adds a new OpenFlow fragment handling
mode that fills in the L4 fields for "first fragments". It also enhances
ovs-ofctl to allow users to configure this new fragment handling mode and
to parse the new field.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #7557.
|
|
Almost all current actions can be expressed in the form of
push/pop/set <field>, where field is one of the match fields. We can
create three base actions and take a field. This has both a nice
symmetry and avoids inconsistencies where we can match on the vlan
TPID but not set it.
Following patch converts all actions to this new format.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7115
|
|
|
|
The Linux headers only check endianness if __CHECK_ENDIAN__ is declared.
We want that, so turn it on.
|
|
Avoids errors like the following:
In file included from ./include/openvswitch/types.h:21,
from ./lib/vconn.h:21,
from tests/test-vconn.c:18:
/usr/include/sys/types.h:52: error: conflicting types for 'ino_t'
/usr/include/linux/types.h:14: error: previous declaration of 'ino_t' was here
/usr/include/sys/types.h:62: error: conflicting types for 'dev_t'
/usr/include/linux/types.h:13: error: previous declaration of 'dev_t' was here
/usr/include/sys/types.h:67: error: conflicting types for 'gid_t'
/usr/include/linux/types.h:27: error: previous declaration of 'gid_t' was here
/usr/include/sys/types.h:72: error: conflicting types for 'mode_t'
/usr/include/linux/types.h:15: error: previous declaration of 'mode_t' was here
/usr/include/sys/types.h:77: error: conflicting types for 'nlink_t'
/usr/include/linux/types.h:16: error: previous declaration of 'nlink_t' was here
/usr/include/sys/types.h:82: error: conflicting types for 'uid_t'
/usr/include/linux/types.h:26: error: previous declaration of 'uid_t' was here
/usr/include/sys/types.h:90: error: conflicting types for 'off_t'
/usr/include/linux/types.h:17: error: previous declaration of 'off_t' was here
|
|
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
|
|
We want datapath-protocol.h to be acceptable as a Linux kernel header, so
it must use Linux kernel types and must not have references to Open vSwitch
symbols or header files. This commit primarily makes that change to
datapath-protocol.h.
At the same time, at least for now we also want datapath-protocol.h to be
usable on non-Linux platforms, so we need some kind of compatiblity. Thus,
this commit also introduces a <linux/types.h> header file that defines the
necessary Linux kernel types on non-Linux platforms.
In turn, this requires openvswitch/types.h to use the Linux types directly
for ovs_be<N>; otherwise, sparse complains because now __be<N> and
ovs_be<N> are incompatible from its perspective, so this commit makes that
change too.
I don't have a non-Linux kernel platform readily available, so I only
tested the non-Linux part of the linux/types.h substitute by forcing that
case to be triggered with #if 0. It worked, except for errors in actual
Linux kernel headers included explicitly from OVS source files, so I think
it's likely to work in practice.
Bug #7559.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Most of the enum tags in this file are lowercased versions of the uppercase
enum prefixes (or slightly less abbreviated versions, e.g. "dp" becomes
"datapath"). This commit fixes up the others for consistency.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
It's not needed.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
|
|
Bug #7559.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Commit b063d9f06 "datapath: Use unicast Netlink sockets for upcalls" that
switched from multicast to unicast Netlink for sending upcalls added a
Netlink PID to each kernel flow, used by OVS_ACTION_ATTR_USERSPACE actions
within the flow as target.
This commit drops this per-flow PID in favor of a per-action PID, because
that is more flexible. It does not yet make use of this additional
flexibility, so behavior should not change.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7559.
|
|
These macros caused trouble if datapath-protocol.h was included before
openflow.h. Later references to the icmp_type and icmp_code members of
struct ovs_key_icmp caused compiler errors, because the macros caused them
to try to refer to nonexistent tp_src and tp_dst members in those
structures.
|
|
Following patch removes ifIndex attribute of vport which is not
used in userspace.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Bug #7114
|
|
Feature #7527
|
|
Following patch adds sampling action which takes probability and set
of actions as arguments. When probability is hit, actions are executed for
given packet.
USERSPACE action's userdata (u64) is used to store struct
user_action_cookie as cookie. CONTROLLER action is fixed accordingly.
Now we can remove sFlow code from kernel and implement sFlow generically
as SAMPLE action. sFlow is defined as SAMPLE Action with probability (sFlow
sampling rate) and USERSPACE action as argument. USERSPACE action's data
is used as cookie. sFlow uses this cookie to store output-port, number of
output ports and vlan-id. sample-pool is calculated by using vport
stats.
Signed-off-by: Pravin Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
Currently we publish several multicast groups for upcalls and let
userspace sockets subscribe to them. The benefit of this is mostly
that userspace is the one doing the subscription - the actual
multicast capability is not currently used and probably wouldn't be
even if we moved to a multiprocess model. Despite the convenience,
multicast sockets have a number of disadvantages, primarily that
we only have a limited number of them so there could be collisions.
In addition, unicast sockets give additional flexibility to userspace
by allowing every object to potentially have a different socket
chosen by userspace for upcalls. Finally, any future optimizations
for upcalls to reduce copying will likely not be compatible with
multicast anyways so disallowing it potentially simplifies things.
We also never unregistered the multicast groups registered for upcalls
and leaked them on module unload. As a side effect, this solves that
problem.
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Ben Pfaff <blp@nicira.com>
|
|
The 'u' in uint64_t apparently got clipped off of the tx_dropped
member of struct vport_stats in between review and push, incorrectly
making this a signed type.
CC: Pravin Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Acked-by: Pravin Shelar <pshelar@nicira.com>
|
|
I noticed a couple of typos and inaccuracies here while reviewing Jean's
changes to it for OXM at https://www.opennetworking.org/bugs/browse/EXT-1
|
|
Older kernels do not advertise the multicast groups of families
when requested by userspace. As a workaround, this patch hardcodes
the multicast group ID of the ovs_vport family on these kernels.
Userspace will be able to fall back to this hardcoded value if the
standard mechanism is unavailable.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
Currently ovs is using device stats for Linux devices and count them
itself in other situations. This leads to overlap with hardware stats,
inconsistencies, etc. It's much better to just always count the packets
flowing through the switch and let userspace do any merging that it wants.
Following patch removes vport->get_stats() interface. vport-stat is changed
to use new `struct ovs_vport_stat` rather than rtnl_link_stats64.
Definitions of rtnl_link_stats64 is removed from OVS. dipf_port->stat is also
removed as aggregate stats are only available at netdev layer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
|
|
There are a few loose ends here. First, learning actions cause too much
flow revalidation. Upcoming commits will fix that problem. The following
additional issues have not yet been addressed:
* Resource limits: nothing yet limits the maximum number of flows that
can be learned. It is possible to exhaust all system memory.
* Age reporting: there is no way to find out how soon a learned table
entry is due to be evicted.
To try this action out, here's a recipe for a very simple-minded MAC
learning switch. It uses a 10-second MAC expiration time to make it easier
to see what's going on:
ovs-vsctl del-controller br0
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, hard_timeout=10, \
NXM_OF_VLAN_TCI[0..11], NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \
output:NXM_OF_IN_PORT[]), resubmit(,1)"
ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood"
You can then dump the MAC learning table with:
ovs-ofctl dump-flows br0 table=1
|