aboutsummaryrefslogtreecommitdiff
path: root/ofproto/ofproto-dpif.c
AgeCommit message (Collapse)Author
2012-05-14ofproto: Treat a packet-out in_port of OFPP_CONTROLLER as OFPP_NONE.Ben Pfaff
Some OpenFlow 1.0 controllers incorrectly use OPFP_CONTROLLER as the in_port in packet-out messages, when OFPP_NONE is their intent. Until now, Open vSwitch has rejected such requests with an error message. This commit makes Open vSwitch instead treat OFPP_CONTROLLER the same as OFPP_NONE for compatibility with those controllers. (Also, as of this writing, OpenFlow 1.0.1 appears to be changing the port to use from OFPP_NONE to OFPP_CONTROLLER.) Suggested-by: Rob Sherwood <rob.sherwood@bigswitch.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-03-21ofproto-dpif: Fix tag caching for learned flows.Ben Pfaff
This code in xlate_table_action() is supposed to tag flows in tables that have special forms so that changes do not require revalidating every flow. When rule->tag is nonzero, its value can be used, because we know in this case that rule->cr.wc is the same as table->other_table->wc and that thus rule->tag caches the return value of the rule_calculate_tag() expression. When rule->tag is zero (a "catchall" rule) we need to calculate the tag manually because we have no way to cache it in that case. I discovered this bug by running an "hping3" between a couple of VMs plus the following commands on OVS in the middle: ovs-ofctl del-flows br0 ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, \ idle_timeout=600, NXM_OF_VLAN_TCI[0..11], \ NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \ output:NXM_OF_IN_PORT[], fin_idle_timeout=10), resubmit(,1)" ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood" Without this patch, flows don't get properly invalidated upon initial MAC learning, so one sees warnings like the following: in_port(2),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07), eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0, ttl=64,frag=no),tcp(src=13966,dst=0): inconsistency in subfacet (actions were: 3,0,1) (correct actions: 1) This patch fixes the problem and thus avoids these warnings. Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-03-21ofproto-dpif: Avoid segfault deleting facets that execute LEARN actions.Ben Pfaff
"ovs-ofctl del-flows <bridge>" can result in the following call path: delete_flows_loose() in ofproto.c -> collect_rules_loose() -- uses 'ofproto_node' inside 'struct rule' -> rule_destruct() in ofproto-dpif.c -> facet_revalidate() -> facet_remove() -> facet_flush_stats() -> facet_account() -> xlate_actions() -> xlate_learn_action() -> ofproto_flow_mod() back in ofproto.c -> modify_flow_strict() -> collect_rules_strict() -- also uses 'ofproto_node' which goes "boom" when we fall back up the call chain because the nested use of ofproto_node steps on the outer use of ofproto_node. This commit fixes the problem by refusing to translate "learn" actions within facet_flush_stats(), breaking the doubled use. Another possible approach would be to switch to another way to keep track of rules in the flow_mod implementations, so that there'd be no fighting over 'ofproto_node'. But then "ovs-ofctl del-flows" might still leave some flows around (ones created by "learn" actions as flows are accounted as facets get deleted), which would be surprising behavior. And it seems in general a bad idea to allow recursive flow_mods; the consequences have not been carefully thought through. Before this commit, one can reproduce the problem by running an "hping3" between a couple of VMs plus the following commands on OVS in the middle. Sometimes you have to run them a few times: ovs-ofctl del-flows br0 ovs-ofctl add-flow br0 "table=0 actions=learn(table=1, \ idle_timeout=600, NXM_OF_VLAN_TCI[0..11], \ NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], \ output:NXM_OF_IN_PORT[], fin_idle_timeout=10), resubmit(,1)" ovs-ofctl add-flow br0 "table=1 priority=0 actions=flood" This commit has a side effect that leftover unaccounted packets no longer update the timeouts in MAC learning actions in some cases, when the facets that cause updates are deleted. At most one second of updates should be lost. Bug #10184. Reported-by: Michael Mao <mmao@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-03-21ofproto-dpif: Fix return type of rule_calculate_tag().Ben Pfaff
tag_type is currently uint32_t but using uint32_t directly is conceptually wrong. Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-02-19ofproto-dpif: Cleanup STP on ports when disabled on their bridge.Ethan Jackson
When STP is enabled on a bridge, the STP module puts its ports in an STP_LISTENING state until STP converges. This causes all traffic destined for these ports to be dropped. If STP is disabled on the bridge, but not explicitly disabled on its ports, the bridge fails to remove the STP state from these ports. Therefore, if a port is in an STP_LISTENING state, it will remain in that state and continue to drop all traffic indefinitely. This patch fixes the issue. Signed-off-by: Ethan Jackson <ethan@nicira.com> Bug #9157.
2012-02-08ofproto-dpif: Don't output to in_port even if in_port is OFPP_LOCAL.Aaron Rosen
Signed-off-by: Aaron Rosen <arosen@clemson.edu> [Ben Pfaff added the test.] Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-02-01vswitchd: Make the MAC entry aging time configurable.Ben Pfaff
NICS-11. Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-24ofproto: Optionally flush all learning tables with appctl.Ethan Jackson
Signed-off-by: Ethan Jackson <ethan@nicira.com>
2012-01-23ofproto-dpif: Revalidate flows after "fdb/flush".Ben Pfaff
Otherwise bad translations can stick around. Bug #9253. Reported-by: Paul Ingram <paul@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
2012-01-06ofproto-dpif: Keep subfacets longer to avoid assert-fail in facet_account().Ben Pfaff
If a subfacet expired when its facet still had statistics that had not yet been pushed into the rule, and the facet either used the "normal" action or the bridge contained a bond port, then facet_account() would be called after the last subfacet was removed from its facet's list of subfacets, triggering an assertion failure in list_front(). This fixes the problem by always running facet_flush_stats() (which calls facet_account()) before deleting the last subfacet from a facet. This problem took a while to surface because subfacets usually expire only long after their statistics have been pushed into the rule. Signed-off-by: Ben Pfaff <blp@nicira.com> Reported-by: Mike Kruze <mkruze@nicira.com> Bug #9074.
2012-01-04ofproto-dpif: Fake-up OFPP_NONE input bundle for mirroring and normal.Justin Pettit
Both mirroring and "normal" processing make use of the input bundle to perform various sanity checks. Controller-generated traffic typically uses an ingress port of OFPP_NONE, which doesn't have a corresponding input bundle. This commit fakes one up well enough that mirroring and "normal" processing succeed. We looked at creating an actual bundle based on the "real" OFPP_NONE. This was even uglier, since there were even more special-cases that needed to be handled, including having to hide it from port queries. Reported-by: Jesse Gross <jesse@nicira.com> Signed-off-by: Justin Pettit <jpettit@nicira.com>
2012-01-04ofproto-dpif: Have lookup_input_bundle() return a bundle.Justin Pettit
Previously, the function returned an "ofport_dpif", but it makes more sense to return an "ofbundle". This will also be helpful in a forthcoming commit. Signed-off-by: Justin Pettit <jpettit@nicira.com>
2011-12-27ofproto-dpif: Fix bug in VLAN splinters.Ben Pfaff
Bug #8671. Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-12-22ofproto-dpif: Flush MACs for deleted ports from every bridge.Ben Pfaff
Consider this scenario: two hypervisors HV-1 and HV-2, connected to a common physical network over SLB bonds. Two virtual machines VM-1 and VM-2 are both running on HV-1. Patch ports are in use, so that each VM is not connected to a bridge with a physical Ethernet port but is actually one virtual "hop" away across a patch port to a second OVS bridge. VM-2 is running a "ping" process directed at VM-1. Now migrate VM-1 to HV-2. Suppose that VM-1 fails to send out a gratuitous ARP following migration, or that the gratuitous ARPs are lost, e.g. because they are sent before the OpenFlow controller manages to populate the flow table with rules to allow the VM's traffic Now we are in a situation where HV-1 has learned that VM-1 is local and HV-2 has learned that VM-1 is on its bond; both are wrong. One would expect the problem to resolve itself as soon the VM-1 sends out its first packet. However, SLB bonds (for important reasons documented in vswitchd/INTERNALS) are very reluctant to learn that a currently local MAC is actually on the bond: the only ways to learn that the MAC is on the bond are to receive a gratuitous ARP (which we won't, since they were dropped) or for the MAC learning entry to expire after 60 seconds. This means that VM-1 can send out as much ordinary traffic as it wants (even ARP requests and other broadcasts) but HV-1 will drop all of it at the physical Ethernet since it believes that VM-1 is local. (In an ordinary setup with a single bridge, HV-1 would have unlearned the address for VM-1 when VM-1's port was deleted, but that didn't happen because HV-1 only learned that VM-1 was on the patch port that leads to the integration bridge. The patch port didn't get deleted.) HV-2 does quickly learn that VM-1 is now local. SLB bonds are only reluctant to learn that something they think is local is actually on the bond, not the reverse. This commit attempts to work around the problem by flushing the MAC associated with a port from *every* bridge when a port is deleted. This commit demonstrates yet another good reason not to use SLB bonds. Build and unit tested only. Bug #7978. Bug #7687. Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-12-19ofproto-dpif: Include datapath flow misses in flow statistics.Ben Pfaff
Commit 501f8d1fd75 (ofproto-dpif: Batch interacting with the dpif on flow miss operations.) caused packets handled manually in userspace not to be counted in flow statistics. This patch fixes the problem. Signed-off-by: Ben Pfaff <blp@nicira.com> Bug #8494.
2011-12-16ofproto-dpif: Fix use-after-free for OFPP_CONTROLLER flows.Ben Pfaff
When a flow consists solely of an output to OFPP_CONTROLLER, we avoid a round trip to the kernel and back by calling execute_controller_action() from handle_flow_miss(). However, execute_controller_action() frees the packet passed in. This is dangerous, because the packet and the upcall key are in the same block of malloc()'d memory, as the comment on struct dpif_upcall says: /* A packet passed up from the datapath to userspace. * * If 'key' or 'actions' is nonnull, then it points into data owned by * 'packet', so their memory cannot be freed separately. (This is hardly a * great way to do things but it works out OK for the dpif providers and * clients that exist so far.) */ Thus, we get a use-after-free later on in handle_flow_miss() and eventually a double free. This fixes the problem by making execute_controller_action() clone the packet in this case. Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-12-13ofproto-dpif: Avoid segfault for ports with bundles in add_mirror_actions().Ben Pfaff
Not every port has an associated bundle, so we must not unconditionally dereference ofport->bundle without first checking that it is nonnull. (One example of a port without a bundle is a VLAN splinter port.) Bug #8671. Reported-by: Michael Mao <mmao@nicira.com> Signed-off-by: Ben Pfaff <blp@nicira.com>
2011-11-30ofproto-dpif: Delete un-fit flow from datapath.Pravin B Shelar
2011-11-30ofproto-dpif: Fix memory leak.Pravin B Shelar
Following patch fixes memory leak in case there is ODP_FIT_ERROR on flow key.
2011-11-28odp-util: Move commit_odp_actions() from ofproto-dpif.Ethan Jackson
In an effort to simplify ofproto-dpif, this commit moves the definition of commit_odp_actions() to odp-util.
2011-11-28ofproto-dpif: Remove redundant commit_odp_actions() call.Ethan Jackson
2011-11-28ofproto: Add "fast path".Ben Pfaff
The key to getting good performance on the netperf CRR test seems to be to handle the first packet of each new flow as quickly as possible. Until now, we've only had one opportunity to do that on each trip through the main poll loop. One way to improve would be to make that poll loop circulate more quickly. My experiments show, however, that even just commenting out the slower parts of the poll loop yield minimal improvement. This commit takes another approach. Instead of making the poll loop overall faster, it invokes the performance-critical parts of it more than once during each poll loop. My measurements show that this commit improves netperf CRR performance by 24% versus the previous commit, for an overall improvement of 87% versus the baseline just before the commit that removed the poll_fd_woke(). With this commit, ovs-benchmark performance has also improved by 13% overall since that baseline.
2011-11-28ofproto-dpif: Process multiple batches of upcalls in a single poll loop.Ben Pfaff
This yields a 27% improvement in netperf CRR results in my tests versus the previous commit, which is a 52% improvement versus the baseline from just before the poll_fd_woke() optimization was removed.
2011-11-24mirroring: Don't require the "normal" action to perform mirroring.Justin Pettit
Previously, mirrors only worked when using the "normal" action. This commit performs mirroring even when mirroring is not used. It also adds some unit tests.
2011-11-23ovs-vswitchd: Track packet and byte statistics sent on mirrors.Justin Pettit
This commit adds support for tracking the number of packets and bytes sent through a mirror. The numbers are kept in the new "statistics" column on the mirror table in the "tx_packets" and "tx_bytes" keys.
2011-11-23Implement new "VLAN splinters" feature.Ben Pfaff
The "VLAN splinters" feature works around buggy device drivers in old Linux versions. This feature is deprecated. When broken device drivers are no longer in widespread use, we will delete this feature. I tested earlier versions of this commit, but I have not tested this version. See ovs-vswitchd.conf.db(5) for more information.
2011-11-23ofproto-dpif: Separately track the initial VLAN TCI of arriving packets.Ben Pfaff
In an upcoming commit, VLAN splinters can cause the VLAN TCI of a packet received on an interface to differ from the logical VLAN TCI. That is, a packet that is received on a Linux VLAN network device has no VLAN (so its initial VLAN TCI is 0) but we logically treat it as if it has the VLAN associated with the VLAN device. This is only desirable for use with VLAN splinters and should be reverted when this feature is no longer needed. I'm breaking it out here only to make the series easier to review.
2011-11-23ofproto-dpif: Move ODP actions from facets to subfacets.Ben Pfaff
This is a prerequisite for the upcoming VLAN splinter patch, because splinters and non-splintered subfacets might need slightly different actions due to the VLAN tag being initially different (present vs. absent). This is only desirable for use with VLAN splinters and should be reverted when this feature is no longer needed. I'm breaking it out here only to make the series easier to review.
2011-11-23ofproto-dpif: Simplify invocation of send_packet().Ben Pfaff
All the callers already have the ofport handy, so they might as well just pass it in directly.
2011-11-23ofproto-dpif: Support differing user/kernel packet parsing support.Ben Pfaff
Feature #4886.
2011-11-23ofproto-dpif: Factor NetFlow active timeouts out of flow expiration.Ben Pfaff
NetFlow active timeouts were only mixed in with flow expiration for convenience: both processes need to iterate all the facets. But an upcoming commit will change flow expiration to work in terms of a new "subfacet" entity, so they will no longer fit together well. This change could be seen as an optimization, since NetFlow active timeouts don't ordinarily have to run as often as flow expiration, especially when the flow expiration rate is stepped up due to a large volume of flows.
2011-11-23ofproto: Add "const" to ->rule_execute's "flow" parameter.Ben Pfaff
2011-11-23vswitch: Implement dscp column of the Queue table.Ethan Jackson
The dscp column of the queue table instructs Open vSwitch to mark all traffic egressing the queue with the given DSCP bits in its tos field. Bug #7046.
2011-11-22ofproto-dpif: Simplify commit logic.Ethan Jackson
Before executing an output action, ofproto-dpif must commit the changes it's made to the flow so they are reflected in the packet. This code has been unnecessarily complex. This patch attempts to simplify the code in the following ways. - Commit in fewer places. In an attempt to provide some optimization, the ofproto-dpif code separated the commit and output composition steps so things like flood actions could avoid redundant commits. This is a case of premature optimization that makes the code significantly more difficult to reason about. With this patch, commits happen only when really necessary. - Only perform full commits. In an attempt to provide some optimization, the ofproto-dpif code would allow callers to only commit the part of the flow that they had modified by directly calling the relevant subroutine. This practice made the code difficult to reason about and is thus discontinued. - Perform all output logic in one function. All of the logic surrounding the datapath output action has been placed in the compose_output_action__() function. Most callers will use the compose_output_action() function which simply passes reasonable defaults through to compose_output_action__().
2011-11-22ofproto-dpif: Properly update tos and ttl fields.Ethan Jackson
ofproto-dpif failed to update the base flow's tos and ttl fields when preparing for an output action. This could cause redundant updates of those fields in the datapath. A future patch adds a test which could have caught the issue for the tos bits.
2011-11-18ofproto-dpif: Simplify output action composition.Ethan Jackson
Before this patch, the logic for outputting to a port was scattered all around ofproto-dpif. This patch simplifies the code by forcing it to use one code path to check if a port is forwarding, and output if appropriate. Future patches will rely on this simplification to implement new features.
2011-11-18ofproto-dpif: Improperly handled OFPP_ALL action.Ethan Jackson
According to the OpenFlow specification, the OFPP_ALL output to every port except the input port regardless of whether or not they are "blocked or link down". This patch implements this logic, and marginally simplifies the interface of flood_packets().
2011-11-18ofproto-dpif: Enqueue incorrectly calls add_output_action().Ethan Jackson
The add_output_action() function takes an OpenFlow port number, but the enqueue action passes it a datapath port number.
2011-11-18ofproto-dpif: Consistently set NetFlow Output Interface.Ethan Jackson
Some parts of the code set the NetFlow Output Interface to the OpenFlow port. Other set it to the datapath port. This patch consistently sets it to the OpenFlow port.
2011-11-18ofproto-dpif: Remove trailing whitespace.Ethan Jackson
2011-11-17Implement a new port setting "other-config:priority-tags".Ben Pfaff
Linux hosts (and probably others) tend to ignore priority-tagged frames, so this new setting allows Open vSwitch to suppress sending them. Reported-by: Michael Mao <mmao@nicira.com> Bug #8320.
2011-11-17ofproto: Remove dead variable.Ethan Jackson
Fixes the following gcc warning: "error: variable ‘flow_vid’ set but not used [-Werror=unused-but-set-variable]"
2011-11-17ofproto-dpif: Fix segfault in mirror_update_dups().Justin Pettit
Fixes crash introduced in 9ba15e (ofproto-dpif: Improve RSPAN translation performance from O(n**2) to O(n).) The code always dereferenced the members of the "mirrors" array in ofproto even if they were null.
2011-11-17ofproto-dpif: Get rid of "struct dst".Ben Pfaff
struct dst is an intermediate form for OFPP_NORMAL translation that has existed since the beginning of Open vSwitch development. It has always been a bit ugly, since ideally we wouldn't need any intermediate form at all. This commit eliminates it, which speeds up OFPP_NORMAL translation. struct dst was used earlier to eliminate duplicate packets in OFPP_NORMAL output. Now, we have rules that eliminate duplicate packets in a cheaper way. OFPP_NORMAL outputs packets for two different reasons, forwarding and mirroring, so those are the two possible sources of packet duplication. Forwarding by itself never outputs two copies of a packet to a single port on a given VLAN, and this is also true of mirroring by itself since commit "ofproto-dpif: Improve RSPAN translation performance from O(n**2) to O(n)". The only remaining possibility, then, is that forwarding and mirroring between them duplicate an output packet. However, the algorithms are now designed to prevent this. Forwarding will never output to a mirroring destination output port. Forwarding will only output to a mirroring output VLAN if the packet arrived on a mirroring output VLAN, and in that case mirroring is disabled. This commit has the side effect of improving behavior for VLAN output. Previously, a packet that arrived with an 802.1Q tag with both VID and PCP of 0 would be output with a similar tag in some cases; now it is consistently stripped. This change is reflected in the unit test change. We really need some unit tests for mirroring. I have not tested mirroring behavior, only theorized about it (see above).
2011-11-17odp-util: Add support for named ports to odp_flow_key_from_string().Ben Pfaff
Really the "trace" command should support this but in fact I need it for an upcoming update to a test.
2011-11-17odp-util: New function factored out of put_userspace_action().Ben Pfaff
An upcoming patch to odp-util will add a new user, but this seems like a reasonable change in any case.
2011-11-17ofproto-dpif: Improve RSPAN translation performance from O(n**2) to O(n).Ben Pfaff
This code previously checked whether each individual mirror output was already in the set of destinations. This is O(n**2) in the number of ports in a bridge. The new code uses a smarter algorithm to eliminate duplicates, one that is O(n) in the number of ports in a bridge.
2011-11-17ofproto-dpif: Make compose_mirror_dsts() harder to screw up.Ben Pfaff
I came close to add a "continue;" inside the main "while" loop in compose_mirror_dsts(), which would have turned it into an infinite loop. This commit changes it to a "for" loop that is harder to screw up.
2011-11-17ofproto-dpif: Remove duplicate VLAN logic.Ben Pfaff
flow_get_vlan() duplicated the logic in input_vid_to_vlan() in an unclear way and added some logic of its own to detect invalid input VLANs. This commit eliminates the duplication and makes the code easier to understand.
2011-11-17ofproto-dpif: Do not output RSPAN packets to SPAN ports.Ben Pfaff
It's always been my intention that ports used as port mirroring destinations (SPAN) be reserved solely for that purpose. When SPAN and RSPAN are both configured, however, RSPAN mirror packets could get directed to SPAN ports, which was unintentional. This commit also updates the documentation to make it clear (if that is even necessary) that this behavior is intentional. Found by inspection.