path: root/lib/dpif-linux.c
2011-01-31 dpif-linux: Always pass an actions attribute in dpif_flow_put(). (Ben Pfaff)
The kernel expects that ODP_FLOW_NEW always has an ODP_FLOW_ATTR_ACTIONS attribute, even though that attribute may be empty to drop all of the packets in the flow. Similarly, ODP_FLOW_SET as used by dpif_linux_flow_put() should always have such an attribute, since it is used by OVS to update the flow's actions. So make it possible for dpif_linux_flow_to_ofpbuf() to pass an empty actions attribute, and make dpif_linux_flow_put() always force that behavior if the actions_len passed to it is 0. This fixes EINVAL error creating flows to drop packets. Acked-by: Jesse Gross <jesse@nicira.com>
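A minimal sketch of what an "empty" Netlink attribute looks like on the wire: a 4-byte header whose length field covers only the header itself. The struct below is a local stand-in for the kernel's struct nlattr, and the attribute number and the put_attr() helper are hypothetical, not the OVS implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the kernel's struct nlattr (two 16-bit fields). */
struct ex_nlattr {
    uint16_t nla_len;           /* Header plus payload, before padding. */
    uint16_t nla_type;
};

#define EX_NLA_HDRLEN sizeof(struct ex_nlattr)
#define EX_NLA_ALIGN(LEN) (((LEN) + 3u) & ~(size_t) 3u)   /* Pad to 4 bytes. */

/* Hypothetical value; the log does not give ODP_FLOW_ATTR_ACTIONS's number. */
enum { EX_FLOW_ATTR_ACTIONS = 2 };

/* Appends one attribute at 'buf + used' and returns the new used length.
 * With 'len' == 0 it still emits the header, so the attribute is present
 * but empty. */
static size_t
put_attr(uint8_t *buf, size_t used, size_t cap,
         uint16_t type, const void *data, size_t len)
{
    struct ex_nlattr hdr;
    size_t total = EX_NLA_ALIGN(EX_NLA_HDRLEN + len);

    assert(used + total <= cap);
    hdr.nla_len = (uint16_t) (EX_NLA_HDRLEN + len);
    hdr.nla_type = type;
    memset(buf + used, 0, total);
    memcpy(buf + used, &hdr, sizeof hdr);
    if (len) {
        memcpy(buf + used + EX_NLA_HDRLEN, data, len);
    }
    return used + total;
}
```

Calling put_attr(buf, 0, sizeof buf, EX_FLOW_ATTR_ACTIONS, NULL, 0) adds four bytes to the message, which is how a receiver can tell "no actions, drop" apart from "attribute missing".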
2011-01-31 dpif-linux: Read flow used time. (Jesse Gross)
We were never storing the flow used time from the Netlink message into our local struct, which caused flows to time out prematurely. Acked-by: Ben Pfaff <blp@nicira.com>
2011-01-29 dpif-linux: Add missing NLM_F_ECHO flag to flow requests. (Jesse Gross)
Flow transactions expect a response after the operation has completed but the request did not have NLM_F_ECHO set. This caused userspace to receive only the Netlink ACK instead of a real response, making it appear that the operation had failed when it actually succeeded.
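As a rough illustration, the flag in question is one bit in the Netlink message header's flags field. The constants below repeat the values from <linux/netlink.h> so the sketch builds anywhere; the helper name is hypothetical and the actual dpif-linux code may set other flags as well.

```c
#include <stdint.h>

/* Values as defined in <linux/netlink.h>. */
#define EX_NLM_F_REQUEST 0x01
#define EX_NLM_F_ECHO    0x08

/* Returns the header flags for a flow request.  A request that expects the
 * flow to be sent back must ask for it explicitly with the echo bit. */
static uint16_t
ex_flow_request_flags(int wants_reply)
{
    uint16_t flags = EX_NLM_F_REQUEST;

    if (wants_reply) {
        flags |= EX_NLM_F_ECHO;
    }
    return flags;
}
```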
2011-01-29 dpif-linux: Remove extraneous name variable. (Ethan Jackson)
Fixes a "used uninitialized" warning.
2011-01-28 dpif: Remove dpif_get_all_names(). (Ben Pfaff)
None of the remaining dpif implementations have more than one name per dpif, so there's no need for this function anymore. Suggested-by: Jesse Gross <jesse@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 datapath: Change dp_idx to dp_ifindex, the ifindex of the local port. (Ben Pfaff)
I can't see any real value in maintaining a dp_idx separate from the ifindex of the local port. With the current implementation it also artificially limits the number of datapaths. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 dpif-linux: Replace 'minor' by 'dp_idx'. (Ben Pfaff)
The dp_idx used to be the character device minor number, but there's no character device anymore, so rename for clarity. Reviewed by Justin Pettit.
2011-01-28 datapath: Convert ODP_FLOW_* commands to use AF_NETLINK socket layer. (Ben Pfaff)
This completes the transition to the Generic Netlink interface, and so this commit restores support for Linux 2.6.18 and later. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 datapath: Convert ODP_VPORT_* to use AF_NETLINK socket layer. (Ben Pfaff)
This commit calls genl_lock(), which was not exported before Linux 2.6.35, and thus doesn't support earlier kernels. That problem will be fixed once the whole userspace interface transitions to Generic Netlink a few commits from now. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 datapath: Convert ODP_DP_* commands to use AF_NETLINK socket layer. (Ben Pfaff)
This commit calls genl_lock(), which was not exported before Linux 2.6.35, and thus doesn't support earlier kernels. That problem will be fixed once the whole userspace interface transitions to Generic Netlink a few commits from now. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-28 datapath: Convert upcalls and ODP_EXECUTE to use AF_NETLINK socket layer. (Ben Pfaff)
This commit calls genl_lock(), which was not exported before Linux 2.6.35, and thus doesn't support earlier kernels. That problem will be fixed once the whole userspace interface transitions to Generic Netlink a few commits from now. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 Eliminate ODPL_* from userspace-facing interface. (Ben Pfaff)
Reviewed by Justin Pettit.
2011-01-27 datapath: Convert ODP_EXECUTE to use Netlink framing. (Ben Pfaff)
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Convert datapath operations to use Netlink framing. (Ben Pfaff)
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Convert ODP_FLOW_* and ODP_EXECUTE to put dp_idx into message. (Ben Pfaff)
When the datapath moves to the Netlink protocol it won't have a minor number to use, so we have to put the dp_idx in the message. This also changes the kernel implementation of ODP_FLOW_FLUSH to do the datapath locking inside flush_flows() instead of inside openvswitch_ioctl() but doesn't change that command's userspace interface, which still passes a datapath number as the ioctl argument. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Eliminate 'flags' member from odp_flow. (Ben Pfaff)
Nothing was productively using the 'flags' member of odp_flow, so this commit removes it. ODPFF_ZERO_TCP_FLAGS isn't used at all (as of the previous commit). ODPFF_EOF has been replaced by a special case of the 'key_len' member. This will go away, too, once AF_NETLINK starts being used. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 dpif: Eliminate ODPPF_* constants from client-visible interface. (Ben Pfaff)
Following this commit, the ODPPF_* constants are only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.
2011-01-27 dpif: Eliminate "struct odp_flow_stats" from client-visible interface. (Ben Pfaff)
Following this commit, "struct odp_flow_stats" is only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.
2011-01-27 dpif: Eliminate "struct odp_flow" from client-visible interface. (Ben Pfaff)
Following this commit, "struct odp_flow" and related data structures are only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.
2011-01-27 datapath: Change ODP_FLOW_GET to retrieve only a single flow at a time. (Ben Pfaff)
This brings the code closer to what the Netlink interface will need to implement. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Drop port information from odp_stats. (Ben Pfaff)
As with n_flows, n_ports was used regularly by userspace to determine how much memory to allocate when listing ports, but it is no longer needed for that. max_ports, on the other hand, is necessary but it is also a fixed value for the kernel datapath right now and if we expand it we can also come up with a way to report the expanded value. The remaining members of odp_stats are actually real statistics that I intend to keep. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Drop queue information from odp_stats. (Ben Pfaff)
This queue information will be available through the kernel socket layer once we move over to Netlink sockets as transports, so we might as well get rid of the redundancy. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Change userspace vport interface to use Netlink attributes. (Ben Pfaff)
One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to add new features to the kernel vport layer without changing userspace software. The customary way to do this in the Linux networking stack is to use Netlink and in particular Netlink attributes. This commit adopts that model for the vport layer. It does not yet actually start using the Netlink socket layer, which will come later. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Change vport type from string to integer enumeration. (Ben Pfaff)
I plan to make the vport type part of the standard header stuck on each Netlink message related to a vport. As such, it is more convenient to use an integer than a string. In addition, by being fundamentally different from strings, using an integer may reduce the confusion we've had in the past over the differences in userspace and kernel names for network device and vport types. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 dpif: Eliminate "struct odp_port" from client-visible interface. (Ben Pfaff)
Following this commit, "struct odp_port" is only used in Linux-specific parts of OVS userspace code. This allows the actual Linux datapath interface to evolve more freely. Reviewed by Justin Pettit.
2011-01-27 datapath: Drop datapath index and port number from Ethtool output. (Ben Pfaff)
I introduced this a long time ago as an efficient way for userspace to find out whether and where an internal device was attached, but I've always considered it an ugly kluge. Now that ODP_VPORT_QUERY can fetch a vport's info regardless of datapath, it is no longer necessary. This commit stops using Ethtool for this purpose and drops the feature. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Make it possible to query vports by name regardless of datapath. (Ben Pfaff)
Until now it has only been possible to query a vport if you know what datapath it is on. This doesn't really make sense, so this commit removes that restriction. It is a little bigger than one might naturally expect because locking changes are required. This also allows us to get rid of the ETHTOOL_GDRVINFO kluge that has bothered me for a long time. The next commit does that. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-27 datapath: Change listing ports to use an iterator concept. (Ben Pfaff)
One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to add new features to the kernel vport layer without changing userspace software. In turn, that means that the odp_port structure must become variable-length. This does not, however, fit in well with the ODP_PORT_LIST ioctl in its current form, because that would require userspace to know how much space to allocate for each port in advance, or to allocate as much space as could possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_PORT_LIST by a new ioctl ODP_VPORT_DUMP that retrieves information about a single vport from the datapath on each call. It is much cleaner to allocate the maximum amount of space for a single vport than to do so for possibly a large number of vports.
It would be faster to retrieve a number of vports in batch instead of just one at a time, but that will naturally happen later when the kernel datapath interface is changed to use Netlink, so this patch does not bother with it. The Netlink version won't need to take the starting port number from userspace, since Netlink sockets can keep track of that state as part of their "dump" feature.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
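The one-entry-per-call pattern described above can be sketched as follows. Everything here is hypothetical (an in-memory table standing in for the kernel datapath, and the ex_ names); it only illustrates how a caller-held starting port number lets userspace provide space for one vport at a time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_MAX_PORTS 8

struct ex_vport {
    bool in_use;
    char name[16];
};

struct ex_datapath {
    struct ex_vport ports[EX_MAX_PORTS];
};

/* Copies the first in-use port numbered >= '*port_no' into '*out', advances
 * '*port_no' past it, and returns true.  Returns false when no port is left.
 * The caller provides space for exactly one vport per call. */
static bool
ex_vport_dump_next(const struct ex_datapath *dp, uint32_t *port_no,
                   struct ex_vport *out)
{
    uint32_t i;

    for (i = *port_no; i < EX_MAX_PORTS; i++) {
        if (dp->ports[i].in_use) {
            *out = dp->ports[i];
            *port_no = i + 1;
            return true;
        }
    }
    return false;
}
```

A caller starts with a cursor of 0 and loops until the function returns false. Because each call is independent, a port added or removed between calls may be missed or skipped, which is the usual trade-off of this kind of iteration.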
2011-01-27 datapath: Report kernel's flow key when passing packets up to userspace. (Ben Pfaff)
One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to change the kernel's idea of the flow key separately from the userspace version.
This commit takes one step in that direction by making the kernel report its idea of the flow that a packet belongs to whenever it passes a packet up to userspace. This means that userspace can intelligently figure out what to do:
- If userspace's notion of the flow for the packet matches the kernel's, then nothing special is necessary.
- If the kernel has a more specific notion for the flow than userspace, for example if the kernel decoded IPv6 headers but userspace stopped at the Ethernet type (because it does not understand IPv6), then again nothing special is necessary: userspace can still set up the flow in the usual way.
- If userspace has a more specific notion for the flow than the kernel, for example if userspace decoded an IPv6 header but the kernel stopped at the Ethernet type, then userspace can forward the packet manually, without setting up a flow in the kernel. (This case is bad from a performance point of view, but at least it is correct.)
This commit does not actually make userspace flexible enough to handle changes in the kernel flow key structure, although userspace does now have enough information to do that intelligently. This will have to wait for later commits.
This commit is bigger than it would otherwise be because it is rolled together with changing "struct odp_msg" to a sequence of Netlink attributes. The alternative, to do each of those changes in a separate patch, seemed like overkill because it meant that either we would have to introduce and then kill off Netlink attributes for in_port and tun_id, if Netlink conversion went first, or shove yet another variable-length header into the stuff already after odp_msg, if adding the flow key to odp_msg went first.
This commit will slow down performance of checksumming packets sent up to userspace. I'm not entirely pleased with how I did it. I considered a couple of alternatives, but none of them seemed that much better. Suggestions welcome. Not changing anything wasn't an option, unfortunately. At any rate some slowdown will become unavoidable when OVS actually starts using Netlink instead of just Netlink framing. (Actually, I thought of one option where we could avoid that: make userspace do the checksum instead, by passing csum_start and csum_offset as part of what goes to userspace. But that's not perfect either.)
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
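The three cases in this message collapse into a single comparison if each side's flow key is summarized by how deep it parsed the packet. That summary is a strong simplification made only for illustration: the enum values and function below are hypothetical and are not how OVS compares flow keys.

```c
/* What userspace does with a packet the kernel passed up. */
enum ex_upcall_action {
    EX_INSTALL_FLOW,            /* Set up a kernel flow in the usual way. */
    EX_FORWARD_MANUALLY         /* Handle this packet in userspace only. */
};

/* Hypothetical ordinal for the deepest layer a parser understood. */
enum ex_parse_depth {
    EX_DEPTH_ETH_TYPE = 1,
    EX_DEPTH_L3 = 2,
    EX_DEPTH_L4 = 3
};

/* Equal depth, or a deeper kernel parse: install a flow as usual.
 * A deeper userspace parse: the kernel key cannot express the distinction,
 * so forward the packet manually (slow but correct). */
static enum ex_upcall_action
ex_choose_action(enum ex_parse_depth kernel, enum ex_parse_depth userspace)
{
    return userspace > kernel ? EX_FORWARD_MANUALLY : EX_INSTALL_FLOW;
}
```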
2011-01-27 datapath: Change listing flows to use an iterator concept. (Ben Pfaff)
One of the goals for Open vSwitch is to decouple kernel and userspace software, so that either one can be upgraded or rolled back independent of the other. To do this in full generality, it must be possible to change the kernel's idea of the flow key separately from the userspace version. In turn, that means that flow keys must become variable-length. This does not, however, fit in well with the ODP_FLOW_LIST ioctl in its current form, because that would require userspace to know how much space to allocate for each flow's key in advance, or to allocate as much space as could possibly be needed. Neither choice is very attractive.
This commit prepares for a different solution, by replacing ODP_FLOW_LIST by a new ioctl ODP_FLOW_DUMP that retrieves a single flow from the datapath on each call. It is much cleaner to allocate the maximum amount of space for a single flow key than to do so for possibly a very large number of flow keys.
As a side effect, this patch also fixes a race condition that sometimes made "ovs-dpctl dump-flows" print an error: previously, flows were listed and then their actions were retrieved, which left a window in which ovs-vswitchd could delete the flow. Now dumping a flow and its actions is a single step, closing that window.
Dumping all of the flows in a datapath is no longer an atomic step, so now it is possible to miss some flows or see a single flow twice during iteration, if the flow table is modified by another process. It doesn't look like this should be a problem for ovs-vswitchd.
It would be faster to retrieve a number of flows in batch instead of just one at a time, but that will naturally happen later when the kernel datapath interface is changed to use Netlink, so this patch does not bother with it.
Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2011-01-04 rtnetlink: Remove LINK specific messages from rtnetlink (Ethan Jackson)
Abstracted rtnetlink so that it may be used for messages other than RTM LINK messages. Created a new rtnetlink-link module which specifically deals with these kinds of messages and follows the old rtnetlink API.
2010-12-28 vswitch: Use "ipsec_gre" vport instead of "gre" with "other_config" (Justin Pettit)
Previously, a GRE-over-IPsec tunnel was created as an interface with a "type" of "gre" and the "other_config" column with "ipsec_cert" or "ipsec_psk" set. This could lead to a potential security problem if a user intended to create a GRE-over-IPsec tunnel, but misconfigured the "ipsec_*" config and created an unencrypted GRE tunnel. This commit defines an "ipsec_gre" tunnel type, which should prevent users from inadvertently establishing insecure tunnels.
2010-12-13 vswitchd: Consistently use size_t for action lengths. (Jesse Gross)
Currently the type of the datapath action length is a mixture of size_t and unsigned int. However, on 64-bit platforms size_t is typically an unsigned long, so the mismatch causes the build to fail there. This commit consistently uses size_t.
2010-12-10 datapath: Replace "struct odp_action" by Netlink attributes. (Ben Pfaff)
In the medium term, we plan to migrate the datapath to use Netlink as its communication channel. In the short term, we need to be able to have actions with 64-bit arguments but "struct odp_action" only has room for 48 bits. So this patch shifts to variable-length arguments using Netlink attributes, which starts in on the Netlink transition and makes 64-bit arguments possible at the same time. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2010-12-03 datapath: Make adding and attaching a vport a single step. (Ben Pfaff)
For some time now, Open vSwitch datapaths have internally made a distinction between adding a vport and attaching it to a datapath. Adding a vport just means to create it, as an entity detached from any datapath. Attaching it gives it a port number and a datapath. Similarly, a vport could be detached and deleted separately. After some study, I think I understand why this distinction exists. It is because ovs-vswitchd tries to open all the datapath ports before it tries to create them. However, changing it to create them before it tries to open them is not difficult, so this commit does this. The bulk of this commit, however, changes the datapath interface to one that always creates a vport and attaches it to a datapath in a single step, and similarly detaches a vport and deletes it in a single step. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
2010-11-18 dpif: Make dpif_class 'open' function take class instead of type name. (Ben Pfaff)
This makes it easier for dpif_provider implementations to share code but distinguish the class actually in use, because comparing a pointer is easier than comparing a string.
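A small sketch of the idea, with hypothetical names throughout (the struct layout, the class instances, and the shared function are assumptions, not the real dpif-provider interface): two providers share one routine and tell themselves apart by comparing the class pointer rather than a type string.

```c
#include <stdbool.h>

/* Hypothetical, cut-down provider class. */
struct ex_dpif_class {
    const char *type;
};

static const struct ex_dpif_class ex_class_a = { "example-a" };
static const struct ex_dpif_class ex_class_b = { "example-b" };

/* Shared by both providers.  Identity is a pointer comparison, so no
 * strcmp() on the type name is needed. */
static bool
ex_shared_open_is_a(const struct ex_dpif_class *class_)
{
    return class_ == &ex_class_a;
}
```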
2010-10-29 vlog: Make client supply semicolon for VLOG_DEFINE_THIS_MODULE. (Ben Pfaff)
It's kind of odd for VLOG_DEFINE_THIS_MODULE to supply its own semicolon, so this commit switches to the more common form.
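To illustrate the convention, here is a hypothetical macro (not the real vlog definition) whose expansion deliberately omits the trailing semicolon, so that the use site reads like an ordinary declaration.

```c
/* Hypothetical stand-in for VLOG_DEFINE_THIS_MODULE.  The expansion has no
 * trailing semicolon; the caller supplies it. */
#define EX_DEFINE_THIS_MODULE(NAME) \
    static const char *const ex_this_module = #NAME

EX_DEFINE_THIS_MODULE(dpif_linux);      /* Caller writes the semicolon. */

static const char *
ex_module_name(void)
{
    return ex_this_module;
}
```

Had the macro supplied its own semicolon, a use site written this way would expand to an extra empty declaration at file scope, which strict compilers warn about.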
2010-10-11 datapath: Remove implementation of port groups. (Ben Pfaff)
The "port group" concept seems like a good one, but it has not been used very much in userspace so far, so before we commit ourselves to a frozen API that we must maintain forever, remove it. We can always add it back in later as a new kind of vport. Signed-off-by: Ben Pfaff <blp@nicira.com>
2010-09-23 shash: New function shash_steal(). (Ben Pfaff)
2010-09-23 vlog: Add VLOG_WARN_ONCE() and similar macros. (Ben Pfaff)
2010-09-01 ofpbuf: Add ofpbuf_new_with_headroom(), ofpbuf_clone_with_headroom(). (Ben Pfaff)
These new functions simplify an increasingly common usage pattern. Suggested-by: Jesse Gross <jesse@nicira.com>
2010-07-21 vlog: Introduce VLOG_DEFINE_THIS_MODULE for declaring vlog module in use. (Ben Pfaff)
Adding a macro to define the vlog module in use adds a level of indirection, which makes it easier to change how the vlog module must be defined. A followup commit needs to do that, so getting these widespread changes out of the way first should make that commit easier to review.
2010-07-20 netdev-linux: Avoid minor number 0 in traffic control. (Ben Pfaff)
Linux traffic control handles with minor number 0 refer to qdiscs, not to classes. This commit deals with this by using a conversion function: OpenFlow queue 0 maps to minor 1, queue 1 to minor 2, and so on.
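The mapping described here is an offset by one. A minimal sketch, using hypothetical function names rather than the ones in netdev-linux:

```c
#include <assert.h>
#include <stdint.h>

/* OpenFlow queue 0 -> minor 1, queue 1 -> minor 2, and so on, so that no
 * class ever gets minor 0 (which would name a qdisc instead). */
static uint32_t
ex_queue_to_tc_minor(uint32_t queue_id)
{
    return queue_id + 1;
}

/* Inverse mapping.  Minor 0 never corresponds to a queue. */
static uint32_t
ex_tc_minor_to_queue(uint32_t minor)
{
    assert(minor != 0);
    return minor - 1;
}
```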
2010-07-20 dpif-linux: Translate queues to priorities correctly. (Ben Pfaff)
The TC_H_MAKE macro does not shift the major number into position.
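The pitfall is easy to reproduce. The macros below repeat the definitions from <linux/pkt_sched.h> under EX_ names so the sketch builds on any platform; the two helper functions are hypothetical and only contrast the mistaken and corrected usage.

```c
#include <stdint.h>

/* Same definitions as TC_H_MAJ_MASK, TC_H_MIN_MASK and TC_H_MAKE. */
#define EX_TC_H_MAJ_MASK 0xFFFF0000U
#define EX_TC_H_MIN_MASK 0x0000FFFFU
#define EX_TC_H_MAKE(maj, min) \
    (((maj) & EX_TC_H_MAJ_MASK) | ((min) & EX_TC_H_MIN_MASK))

/* Mistake: passes the major number unshifted, so the mask discards it. */
static uint32_t
ex_handle_unshifted(uint32_t major, uint32_t minor)
{
    return EX_TC_H_MAKE(major, minor);
}

/* Correct: the caller shifts the major number into the upper 16 bits. */
static uint32_t
ex_handle_shifted(uint32_t major, uint32_t minor)
{
    return EX_TC_H_MAKE(major << 16, minor);
}
```

With major 1 and minor 2, the unshifted form yields 0x00000002 (the major is lost), while the shifted form yields 0x00010002.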
2010-07-20 dpif: Abstract translation from OpenFlow queue ID into ODP priority value. (Ben Pfaff)
When the QoS code was integrated, I didn't yet know how to abstract the translation from a queue ID in an OpenFlow OFPAT_ENQUEUE action into a priority value for an ODP ODPAT_SET_PRIORITY action. This commit is a first attempt that works OK for Linux, so far. It's possible that in fact this translation needs the 'netdev' as an argument too, but it's not needed yet.
2010-05-26 datapath: Make datapath-protocol.h portable to non-Linux systems. (Ben Pfaff)
datapath-protocol.h is not a very clean interface. I originally intended it to be solely a Linux-kernel specific interface. Over time it became a general-purpose interface to dpifs. This is not a good situation, because clearly the header is still Linux-specific. In the long run, the correct solution is to separate the generic and Linux-specific bits. This is not that patch. Instead, this patch modifies datapath-protocol.h enough that it can be used on non-Linux hosts. In particular I tested that it works OK with FreeBSD 8.0.
2010-05-20 dpif: Include stat.h header (Justin Pettit)
2010-05-05 dpif-linux: Use hash instead of sorted array. (Ben Pfaff)
With 1000 network devices being added or removed, sorting the array was a profiling hot spot. Using a hash makes it drop off the profile.
2010-04-27 ofproto: Avoid buffer copy in OFPT_PACKET_IN path. (Ben Pfaff)
When a dpif passes an odp_msg down to ofproto, and ofproto transforms it into an ofp_packet_in to send to the controller, until now this always involved a full copy of the packet inside ofproto. This commit eliminates this copy by ensuring that there is always enough headroom in the ofpbuf that holds the odp_msg to replace it by an ofp_packet_in in-place. From Jean Tourrilhes <jt@hpl.hp.com>, with some revisions.
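The headroom technique can be sketched with a toy buffer. The ex_buf type and its functions are hypothetical and far simpler than the real ofpbuf; the point is only that, with space reserved in front of the data, a short header can be swapped for a longer one without copying the packet bytes that follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size buffer with reserved space before the data. */
struct ex_buf {
    uint8_t base[128];
    size_t data_ofs;            /* Offset of the first byte in use. */
    size_t size;                /* Number of bytes in use. */
};

/* Starts an empty buffer that reserves 'headroom' bytes before the data. */
static void
ex_buf_init(struct ex_buf *b, size_t headroom)
{
    assert(headroom <= sizeof b->base);
    b->data_ofs = headroom;
    b->size = 0;
}

/* Appends 'n' bytes at the tail. */
static void
ex_buf_put(struct ex_buf *b, const void *p, size_t n)
{
    assert(b->data_ofs + b->size + n <= sizeof b->base);
    memcpy(b->base + b->data_ofs + b->size, p, n);
    b->size += n;
}

/* Swaps the first 'old_len' bytes for a new header of 'new_len' bytes.
 * The payload after the old header stays where it is. */
static void
ex_buf_replace_header(struct ex_buf *b, size_t old_len,
                      const void *hdr, size_t new_len)
{
    assert(old_len <= b->size);
    assert(new_len <= b->data_ofs + old_len);   /* Enough headroom? */
    b->data_ofs = b->data_ofs + old_len - new_len;
    b->size = b->size - old_len + new_len;
    memcpy(b->base + b->data_ofs, hdr, new_len);
}
```

The second assertion is the condition the commit message relies on: the replacement only works in place if the buffer was created with at least the size difference between the two headers as headroom.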
2010-04-19 dpif-linux: Clean up vports that are no longer in config. (Jesse Gross)
If the config changes while ovs-vswitchd is not running it is possible that there could be some vports which are no longer needed but won't be destroyed when closed because they aren't open. This deletes unneeded vports at the same time that we clean up unneeded datapaths.