Age | Commit message (Collapse) | Author |
|
Some OpenFlow 1.0 controllers incorrectly use OPFP_CONTROLLER as the
in_port in packet-out messages, when OFPP_NONE is their intent. Until now,
Open vSwitch has rejected such requests with an error message. This commit
makes Open vSwitch instead treat OFPP_CONTROLLER the same as OFPP_NONE for
compatibility with those controllers.
(Also, as of this writing, OpenFlow 1.0.1 appears to be changing the port
to use from OFPP_NONE to OFPP_CONTROLLER.)
Suggested-by: Rob Sherwood <rob.sherwood@bigswitch.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
This code in xlate_table_action() is supposed to tag flows in tables that
have special forms so that changes do not require revalidating every flow.
When rule->tag is nonzero, its value can be used, because we know in this
case that rule->cr.wc is the same as table->other_table->wc and that thus
rule->tag caches the return value of the rule_calculate_tag() expression.
When rule->tag is zero (a "catchall" rule) we need to calculate the tag
manually because we have no way to cache it in that case.
I discovered this bug by running an "hping3" between a couple of VMs plus
the following commands on OVS in the middle:
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, \
idle_timeout=600, NXM_OF_VLAN_TCI[0..11], \
NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \
output:NXM_OF_IN_PORT[], fin_idle_timeout=10), resubmit(,1)"
ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood"
Without this patch, flows don't get properly invalidated upon initial MAC
learning, so one sees warnings like the following:
in_port(2),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),
eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,
ttl=64,frag=no),tcp(src=13966,dst=0): inconsistency in subfacet
(actions were: 3,0,1) (correct actions: 1)
This patch fixes the problem and thus avoids these warnings.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
"ovs-ofctl del-flows <bridge>" can result in the following call path:
delete_flows_loose() in ofproto.c
-> collect_rules_loose() -- uses 'ofproto_node' inside 'struct rule'
-> rule_destruct() in ofproto-dpif.c
-> facet_revalidate()
-> facet_remove()
-> facet_flush_stats()
-> facet_account()
-> xlate_actions()
-> xlate_learn_action()
-> ofproto_flow_mod() back in ofproto.c
-> modify_flow_strict()
-> collect_rules_strict() -- also uses 'ofproto_node'
which goes "boom" when we fall back up the call chain because the nested
use of ofproto_node steps on the outer use of ofproto_node.
This commit fixes the problem by refusing to translate "learn" actions
within facet_flush_stats(), breaking the doubled use.
Another possible approach would be to switch to another way to keep track
of rules in the flow_mod implementations, so that there'd be no fighting
over 'ofproto_node'. But then "ovs-ofctl del-flows" might still leave some
flows around (ones created by "learn" actions as flows are accounted as
facets get deleted), which would be surprising behavior. And it seems in
general a bad idea to allow recursive flow_mods; the consequences have not
been carefully thought through.
Before this commit, one can reproduce the problem by running an "hping3"
between a couple of VMs plus the following commands on OVS in the middle.
Sometimes you have to run them a few times:
ovs-ofctl del-flows br0
ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, \
idle_timeout=600, NXM_OF_VLAN_TCI[0..11], \
NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \
output:NXM_OF_IN_PORT[], fin_idle_timeout=10), resubmit(,1)"
ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood"
This commit has a side effect that leftover unaccounted packets no longer
update the timeouts in MAC learning actions in some cases, when the facets
that cause updates are deleted. At most one second of updates should be
lost.
Bug #10184.
Reported-by: Michael Mao <mmao@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
tag_type is currently uint32_t but using uint32_t directly is conceptually
wrong.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
When STP is enabled on a bridge, the STP module puts its ports in
an STP_LISTENING state until STP converges. This causes all
traffic destined for these ports to be dropped. If STP is disabled
on the bridge, but not explicitly disabled on its ports, the bridge
fails to remove the STP state from these ports. Therefore, if a
port is in an STP_LISTENING state, it will remain in that state and
continue to drop all traffic indefinitely. This patch fixes the
issue.
Signed-off-by: Ethan Jackson <ethan@nicira.com>
Bug #9157.
|
|
Signed-off-by: Aaron Rosen <arosen@clemson.edu>
[Ben Pfaff added the test.]
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
NICS-11.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Signed-off-by: Ethan Jackson <ethan@nicira.com>
|
|
Otherwise bad translations can stick around.
Bug #9253.
Reported-by: Paul Ingram <paul@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
If a subfacet expired when its facet still had statistics that had not
yet been pushed into the rule, and the facet either used the "normal"
action or the bridge contained a bond port, then facet_account() would
be called after the last subfacet was removed from its facet's list of
subfacets, triggering an assertion failure in list_front().
This fixes the problem by always running facet_flush_stats() (which calls
facet_account()) before deleting the last subfacet from a facet.
This problem took a while to surface because subfacets usually expire only
long after their statistics have been pushed into the rule.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Reported-by: Mike Kruze <mkruze@nicira.com>
Bug #9074.
|
|
Both mirroring and "normal" processing make use of the input bundle to
perform various sanity checks. Controller-generated traffic typically
uses an ingress port of OFPP_NONE, which doesn't have a corresponding
input bundle. This commit fakes one up well enough that mirroring and
"normal" processing succeed.
We looked at creating an actual bundle based on the "real" OFPP_NONE.
This was even uglier, since there were even more special-cases that
needed to be handled, including having to hide it from port queries.
Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Justin Pettit <jpettit@nicira.com>
|
|
Previously, the function returned an "ofport_dpif", but it makes more
sense to return an "ofbundle". This will also be helpful in a
forthcoming commit.
Signed-off-by: Justin Pettit <jpettit@nicira.com>
|
|
Bug #8671.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Consider this scenario: two hypervisors HV-1 and HV-2, connected to a
common physical network over SLB bonds. Two virtual machines VM-1 and VM-2
are both running on HV-1. Patch ports are in use, so that each VM is not
connected to a bridge with a physical Ethernet port but is actually one
virtual "hop" away across a patch port to a second OVS bridge. VM-2 is
running a "ping" process directed at VM-1.
Now migrate VM-1 to HV-2. Suppose that VM-1 fails to send out a gratuitous
ARP following migration, or that the gratuitous ARPs are lost, e.g. because
they are sent before the OpenFlow controller manages to populate the flow
table with rules to allow the VM's traffic
Now we are in a situation where HV-1 has learned that VM-1 is local and
HV-2 has learned that VM-1 is on its bond; both are wrong. One would
expect the problem to resolve itself as soon the VM-1 sends out its first
packet. However, SLB bonds (for important reasons documented in
vswitchd/INTERNALS) are very reluctant to learn that a currently local MAC
is actually on the bond: the only ways to learn that the MAC is on the bond
are to receive a gratuitous ARP (which we won't, since they were dropped)
or for the MAC learning entry to expire after 60 seconds. This means that
VM-1 can send out as much ordinary traffic as it wants (even ARP requests
and other broadcasts) but HV-1 will drop all of it at the physical Ethernet
since it believes that VM-1 is local.
(In an ordinary setup with a single bridge, HV-1 would have unlearned the
address for VM-1 when VM-1's port was deleted, but that didn't happen
because HV-1 only learned that VM-1 was on the patch port that leads to the
integration bridge. The patch port didn't get deleted.)
HV-2 does quickly learn that VM-1 is now local. SLB bonds are only
reluctant to learn that something they think is local is actually on the
bond, not the reverse.
This commit attempts to work around the problem by flushing the MAC
associated with a port from *every* bridge when a port is deleted.
This commit demonstrates yet another good reason not to use SLB bonds.
Build and unit tested only.
Bug #7978.
Bug #7687.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Commit 501f8d1fd75 (ofproto-dpif: Batch interacting with the dpif on flow
miss operations.) caused packets handled manually in userspace not to be
counted in flow statistics. This patch fixes the problem.
Signed-off-by: Ben Pfaff <blp@nicira.com>
Bug #8494.
|
|
When a flow consists solely of an output to OFPP_CONTROLLER, we avoid a
round trip to the kernel and back by calling execute_controller_action()
from handle_flow_miss(). However, execute_controller_action() frees the
packet passed in. This is dangerous, because the packet and the upcall
key are in the same block of malloc()'d memory, as the comment on struct
dpif_upcall says:
/* A packet passed up from the datapath to userspace.
*
* If 'key' or 'actions' is nonnull, then it points into data owned by
* 'packet', so their memory cannot be freed separately. (This is hardly a
* great way to do things but it works out OK for the dpif providers and
* clients that exist so far.)
*/
Thus, we get a use-after-free later on in handle_flow_miss() and eventually
a double free.
This fixes the problem by making execute_controller_action() clone the
packet in this case.
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
Not every port has an associated bundle, so we must not unconditionally
dereference ofport->bundle without first checking that it is nonnull.
(One example of a port without a bundle is a VLAN splinter port.)
Bug #8671.
Reported-by: Michael Mao <mmao@nicira.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
|
|
|
|
Following patch fixes memory leak in case there is ODP_FIT_ERROR
on flow key.
|
|
In an effort to simplify ofproto-dpif, this commit moves the
definition of commit_odp_actions() to odp-util.
|
|
|
|
The key to getting good performance on the netperf CRR test seems to be to
handle the first packet of each new flow as quickly as possible. Until
now, we've only had one opportunity to do that on each trip through the
main poll loop. One way to improve would be to make that poll loop
circulate more quickly. My experiments show, however, that even just
commenting out the slower parts of the poll loop yield minimal improvement.
This commit takes another approach. Instead of making the poll loop
overall faster, it invokes the performance-critical parts of it more than
once during each poll loop.
My measurements show that this commit improves netperf CRR performance by
24% versus the previous commit, for an overall improvement of 87% versus
the baseline just before the commit that removed the poll_fd_woke(). With
this commit, ovs-benchmark performance has also improved by 13% overall
since that baseline.
|
|
This yields a 27% improvement in netperf CRR results in my tests
versus the previous commit, which is a 52% improvement versus
the baseline from just before the poll_fd_woke() optimization was
removed.
|
|
Previously, mirrors only worked when using the "normal" action. This
commit performs mirroring even when mirroring is not used. It also adds
some unit tests.
|
|
This commit adds support for tracking the number of packets and bytes
sent through a mirror. The numbers are kept in the new "statistics"
column on the mirror table in the "tx_packets" and "tx_bytes" keys.
|
|
The "VLAN splinters" feature works around buggy device drivers in old Linux
versions.
This feature is deprecated. When broken device drivers are no longer in
widespread use, we will delete this feature.
I tested earlier versions of this commit, but I have not tested this
version.
See ovs-vswitchd.conf.db(5) for more information.
|
|
In an upcoming commit, VLAN splinters can cause the VLAN TCI of a packet
received on an interface to differ from the logical VLAN TCI. That is,
a packet that is received on a Linux VLAN network device has no VLAN (so
its initial VLAN TCI is 0) but we logically treat it as if it has the VLAN
associated with the VLAN device.
This is only desirable for use with VLAN splinters and should be reverted
when this feature is no longer needed. I'm breaking it out here only to
make the series easier to review.
|
|
This is a prerequisite for the upcoming VLAN splinter patch, because
splinters and non-splintered subfacets might need slightly different
actions due to the VLAN tag being initially different (present vs. absent).
This is only desirable for use with VLAN splinters and should be reverted
when this feature is no longer needed. I'm breaking it out here only to
make the series easier to review.
|
|
All the callers already have the ofport handy, so they might as well just
pass it in directly.
|
|
Feature #4886.
|
|
NetFlow active timeouts were only mixed in with flow expiration for
convenience: both processes need to iterate all the facets. But
an upcoming commit will change flow expiration to work in terms of
a new "subfacet" entity, so they will no longer fit together well.
This change could be seen as an optimization, since NetFlow active
timeouts don't ordinarily have to run as often as flow expiration,
especially when the flow expiration rate is stepped up due to a
large volume of flows.
|
|
|
|
The dscp column of the queue table instructs Open vSwitch to mark
all traffic egressing the queue with the given DSCP bits in its tos
field.
Bug #7046.
|
|
Before executing an output action, ofproto-dpif must commit the
changes it's made to the flow so they are reflected in the
packet. This code has been unnecessarily complex. This patch
attempts to simplify the code in the following ways.
- Commit in fewer places.
In an attempt to provide some optimization, the ofproto-dpif code
separated the commit and output composition steps so things like
flood actions could avoid redundant commits. This is a case of
premature optimization that makes the code significantly more
difficult to reason about. With this patch, commits happen only
when really necessary.
- Only perform full commits.
In an attempt to provide some optimization, the ofproto-dpif code
would allow callers to only commit the part of the flow that they
had modified by directly calling the relevant subroutine. This
practice made the code difficult to reason about and is thus
discontinued.
- Perform all output logic in one function.
All of the logic surrounding the datapath output action has been
placed in the compose_output_action__() function. Most callers
will use the compose_output_action() function which simply passes
reasonable defaults through to compose_output_action__().
|
|
ofproto-dpif failed to update the base flow's tos and ttl fields
when preparing for an output action. This could cause redundant
updates of those fields in the datapath. A future patch adds a
test which could have caught the issue for the tos bits.
|
|
Before this patch, the logic for outputting to a port was scattered
all around ofproto-dpif. This patch simplifies the code by forcing
it to use one code path to check if a port is forwarding, and
output if appropriate. Future patches will rely on this
simplification to implement new features.
|
|
According to the OpenFlow specification, the OFPP_ALL output to
every port except the input port regardless of whether or not they
are "blocked or link down". This patch implements this logic, and
marginally simplifies the interface of flood_packets().
|
|
The add_output_action() function takes an OpenFlow port number, but
the enqueue action passes it a datapath port number.
|
|
Some parts of the code set the NetFlow Output Interface to the
OpenFlow port. Other set it to the datapath port. This patch
consistently sets it to the OpenFlow port.
|
|
|
|
Linux hosts (and probably others) tend to ignore priority-tagged frames, so
this new setting allows Open vSwitch to suppress sending them.
Reported-by: Michael Mao <mmao@nicira.com>
Bug #8320.
|
|
Fixes the following gcc warning:
"error: variable ‘flow_vid’ set but not used
[-Werror=unused-but-set-variable]"
|
|
Fixes crash introduced in 9ba15e (ofproto-dpif: Improve RSPAN
translation performance from O(n**2) to O(n).) The code always
dereferenced the members of the "mirrors" array in ofproto even if they
were null.
|
|
struct dst is an intermediate form for OFPP_NORMAL translation that
has existed since the beginning of Open vSwitch development. It
has always been a bit ugly, since ideally we wouldn't need any intermediate
form at all. This commit eliminates it, which speeds up OFPP_NORMAL
translation.
struct dst was used earlier to eliminate duplicate packets in OFPP_NORMAL
output. Now, we have rules that eliminate duplicate packets in a cheaper
way.
OFPP_NORMAL outputs packets for two different reasons, forwarding and
mirroring, so those are the two possible sources of packet duplication.
Forwarding by itself never outputs two copies of a packet to a single port
on a given VLAN, and this is also true of mirroring by itself since commit
"ofproto-dpif: Improve RSPAN translation performance from O(n**2) to O(n)".
The only remaining possibility, then, is that forwarding and mirroring
between them duplicate an output packet. However, the algorithms are now
designed to prevent this. Forwarding will never output to a mirroring
destination output port. Forwarding will only output to a mirroring
output VLAN if the packet arrived on a mirroring output VLAN, and in that
case mirroring is disabled.
This commit has the side effect of improving behavior for VLAN output.
Previously, a packet that arrived with an 802.1Q tag with both VID and PCP
of 0 would be output with a similar tag in some cases; now it is
consistently stripped. This change is reflected in the unit test change.
We really need some unit tests for mirroring. I have not tested mirroring
behavior, only theorized about it (see above).
|
|
Really the "trace" command should support this but in fact I need it for
an upcoming update to a test.
|
|
An upcoming patch to odp-util will add a new user, but this seems like a
reasonable change in any case.
|
|
This code previously checked whether each individual mirror output was
already in the set of destinations. This is O(n**2) in the number of
ports in a bridge.
The new code uses a smarter algorithm to eliminate duplicates, one that is
O(n) in the number of ports in a bridge.
|
|
I came close to add a "continue;" inside the main "while" loop in
compose_mirror_dsts(), which would have turned it into an infinite loop.
This commit changes it to a "for" loop that is harder to screw up.
|
|
flow_get_vlan() duplicated the logic in input_vid_to_vlan() in an
unclear way and added some logic of its own to detect invalid
input VLANs. This commit eliminates the duplication and makes the
code easier to understand.
|
|
It's always been my intention that ports used as port mirroring
destinations (SPAN) be reserved solely for that purpose. When SPAN and
RSPAN are both configured, however, RSPAN mirror packets could get directed
to SPAN ports, which was unintentional.
This commit also updates the documentation to make it clear (if that is
even necessary) that this behavior is intentional.
Found by inspection.
|