aboutsummaryrefslogtreecommitdiff
path: root/datapath/README
diff options
context:
space:
mode:
authorBen Pfaff <blp@nicira.com>2011-11-14 15:56:43 -0800
committerBen Pfaff <blp@nicira.com>2011-11-14 16:52:51 -0800
commitfea393b1d6b2729a784b898dbdd48d30d42e3ff7 (patch)
tree71c124a9fdd2a4d7536dc16a3dc357713d7316d8 /datapath/README
parent34118caeded9c72aa259ab76b97148efda7bdc22 (diff)
datapath: Describe policy for extending flow key, implement needed changes.
When the datapath was converted to use Netlink attributes for describing flow keys, I had a vague idea of how it could be smoothly extensible, but I didn't actually implement extensibility or carefully think it through. This commit adds a document that describes how flow keys can be extended in a compatible fashion and adapts the existing interface to match what it says. This commit doesn't actually implement extensibility. I already have a separate patch series out for that. This patch series borrows from that one heavily, but the extensibility series will need to be reworked somewhat once this one is in. This commit is only lightly tested because I don't have a good test setup for VLANs. Signed-off-by: Ben Pfaff <blp@nicira.com> Acked-by: Jesse Gross <jesse@nicira.com>
Diffstat (limited to 'datapath/README')
-rw-r--r--datapath/README160
1 files changed, 160 insertions, 0 deletions
diff --git a/datapath/README b/datapath/README
new file mode 100644
index 00000000..2d8a141b
--- /dev/null
+++ b/datapath/README
@@ -0,0 +1,160 @@
+Open vSwitch datapath developer documentation
+=============================================
+
+The Open vSwitch kernel module allows flexible userspace control over
+flow-level packet processing on selected network devices. It can be
+used to implement a plain Ethernet switch, network device bonding,
+VLAN processing, network access control, flow-based network control,
+and so on.
+
+The kernel module implements multiple "datapaths" (analogous to
+bridges), each of which can have multiple "vports" (analogous to ports
+within a bridge). Each datapath also has associated with it a "flow
+table" that userspace populates with "flows" that map from keys based
+on packet headers and metadata to sets of actions. The most common
+action forwards the packet to another vport; other actions are also
+implemented.
+
+When a packet arrives on a vport, the kernel module processes it by
+extracting its flow key and looking it up in the flow table. If there
+is a matching flow, it executes the associated actions. If there is
+no match, it queues the packet to userspace for processing (as part of
+its processing, userspace will likely set up a flow to handle further
+packets of the same type entirely in-kernel).
+
+
+Flow key compatibility
+----------------------
+
+Network protocols evolve over time. New protocols become important
+and existing protocols lose their prominence. For the Open vSwitch
+kernel module to remain relevant, it must be possible for newer
+versions to parse additional protocols as part of the flow key. It
+might even be desirable, someday, to drop support for parsing
+protocols that have become obsolete. Therefore, the Netlink interface
+to Open vSwitch is designed to allow carefully written userspace
+applications to work with any version of the flow key, past or future.
+
+To support this forward and backward compatibility, whenever the
+kernel module passes a packet to userspace, it also passes along the
+flow key that it parsed from the packet. Userspace then extracts its
+own notion of a flow key from the packet and compares it against the
+kernel-provided version:
+
+ - If userspace's notion of the flow key for the packet matches the
+ kernel's, then nothing special is necessary.
+
+ - If the kernel's flow key includes more fields than the userspace
+ version of the flow key, for example if the kernel decoded IPv6
+ headers but userspace stopped at the Ethernet type (because it
+ does not understand IPv6), then again nothing special is
+ necessary. Userspace can still set up a flow in the usual way,
+ as long as it uses the kernel-provided flow key to do it.
+
+ - If the userspace flow key includes more fields than the
+ kernel's, for example if userspace decoded an IPv6 header but
+ the kernel stopped at the Ethernet type, then userspace can
+ forward the packet manually, without setting up a flow in the
+ kernel. This case is bad for performance because every packet
+ that the kernel considers part of the flow must go to userspace,
+ but the forwarding behavior is correct. (If userspace can
+ determine that the values of the extra fields would not affect
+ forwarding behavior, then it could set up a flow anyway.)
+
+How flow keys evolve over time is important to making this work, so
+the following sections go into detail.
+
+
+Flow key format
+---------------
+
+A flow key is passed over a Netlink socket as a sequence of Netlink
+attributes. Some attributes represent packet metadata, defined as any
+information about a packet that cannot be extracted from the packet
+itself, e.g. the vport on which the packet was received. Most
+attributes, however, are extracted from headers within the packet,
+e.g. source and destination addresses from Ethernet, IP, or TCP
+headers.
+
+The <linux/openvswitch.h> header file defines the exact format of the
+flow key attributes. For informal explanatory purposes here, we write
+them as comma-separated strings, with parentheses indicating arguments
+and nesting. For example, the following could represent a flow key
+corresponding to a TCP packet that arrived on vport 1:
+
+ in_port(1), eth(src=e0:91:f5:21:d0:b2, dst=00:02:e3:0f:80:a4),
+ eth_type(0x0800), ipv4(src=172.16.0.20, dst=172.18.0.52, proto=17, tos=0,
+ frag=no), tcp(src=49163, dst=80)
+
+Often we ellipsize arguments not important to the discussion, e.g.:
+
+ in_port(1), eth(...), eth_type(0x0800), ipv4(...), tcp(...)
+
+
+Basic rule for evolving flow keys
+---------------------------------
+
+Some care is needed to really maintain forward and backward
+compatibility for applications that follow the rules listed under
+"Flow key compatibility" above.
+
+The basic rule is obvious:
+
+ ------------------------------------------------------------------
+ New network protocol support must only supplement existing flow
+ key attributes. It must not change the meaning of already defined
+ flow key attributes.
+ ------------------------------------------------------------------
+
+This rule does have less-obvious consequences so it is worth working
+through a few examples. Suppose, for example, that the kernel module
+did not already implement VLAN parsing. Instead, it just interpreted
+the 802.1Q TPID (0x8100) as the Ethertype then stopped parsing the
+packet. The flow key for any packet with an 802.1Q header would look
+essentially like this, ignoring metadata:
+
+ eth(...), eth_type(0x8100)
+
+Naively, to add VLAN support, it makes sense to add a new "vlan" flow
+key attribute to contain the VLAN tag, then continue to decode the
+encapsulated headers beyond the VLAN tag using the existing field
+definitions. With this change, an TCP packet in VLAN 10 would have a
+flow key much like this:
+
+ eth(...), vlan(vid=10, pcp=0), eth_type(0x0800), ip(proto=6, ...), tcp(...)
+
+But this change would negatively affect a userspace application that
+has not been updated to understand the new "vlan" flow key attribute.
+The application could, following the flow compatibility rules above,
+ignore the "vlan" attribute that it does not understand and therefore
+assume that the flow contained IP packets. This is a bad assumption
+(the flow only contains IP packets if one parses and skips over the
+802.1Q header) and it could cause the application's behavior to change
+across kernel versions even though it follows the compatibility rules.
+
+The solution is to use a set of nested attributes. This is, for
+example, why 802.1Q support uses nested attributes. A TCP packet in
+VLAN 10 is actually expressed as:
+
+ eth(...), eth_type(0x8100), vlan(vid=10, pcp=0), encap(eth_type(0x0800),
+ ip(proto=6, ...), tcp(...)))
+
+Notice how the "eth_type", "ip", and "tcp" flow key attributes are
+nested inside the "encap" attribute. Thus, an application that does
+not understand the "vlan" key will not see either of those attributes
+and therefore will not misinterpret them. (Also, the outer eth_type
+is still 0x8100, not changed to 0x0800.)
+
+Other rules
+-----------
+
+The other rules for flow keys are much less subtle:
+
+ - Duplicate attributes are not allowed at a given nesting level.
+
+ - Ordering of attributes is not significant.
+
+ - When the kernel sends a given flow key to userspace, it always
+ composes it the same way. This allows userspace to hash and
+ compare entire flow keys that it may not be able to fully
+ interpret.