[dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

Boris Pismenny borisp at mellanox.com
Tue Jan 2 11:50:50 CET 2018


Hi Declan,

On 12/22/2017 12:21 AM, Doherty, Declan wrote:
> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that, when used
> in conjunction with rte_flow, enables the configuration of inline data path encapsulation
> and decapsulation of tunnel endpoint network overlays on accelerated IO devices.
> 
> The proposed new API would provide for the creation, destruction, and
> monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs to allow the
> acceleration features to be discovered by applications.
> 
> /** Tunnel Endpoint context, opaque structure */
> struct rte_tep;
> 
> enum rte_tep_type {
>                 RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>                 RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
>                 ...
> };
> 
> /** Tunnel Endpoint Attributes */
> struct rte_tep_attr {
>                 enum rte_tep_type type;
> 
>                 /* other endpoint attributes here */
> };
> 
> /**
> * Create a tunnel end-point context as specified by the flow attribute and pattern
> *
> * @param   port_id     Port identifier of Ethernet device.
> * @param   attr        Flow rule attributes.
> * @param   pattern     Pattern specification by list of rte_flow_items.
> * @return
> *  - On success returns pointer to TEP context
> *  - On failure returns NULL
> */
> struct rte_tep *rte_tep_create(uint16_t port_id,
>                                struct rte_tep_attr *attr, struct rte_flow_item pattern[]);
> 
> /**
> * Destroy an existing tunnel end-point context. The whole end-point context
> * is destroyed, so all active flows using the tep should be freed before
> * destroying the context.
> * @param   port_id    Port identifier of Ethernet device.
> * @param   tep        Tunnel endpoint context
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep);
> 
> /**
> * Get tunnel endpoint statistics
> *
> * @param   port_id    Port identifier of Ethernet device.
> * @param   tep        Tunnel endpoint context
> * @param   stats      Tunnel endpoint statistics
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>                                struct rte_tep_stats *stats);
> 
> /**
> * Get ports tunnel endpoint capabilities
> *
> * @param   port_id    Port identifier of Ethernet device.
> * @param   capabilities        Tunnel endpoint capabilities
> *
> * @return
> *  - On success returns 0
> *  - On failure returns 1
> */
> int
> rte_tep_capabilities_get(uint16_t port_id,
>                                struct rte_tep_capabilities *capabilities);
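To make the destroy contract above concrete, here is a minimal sketch of the bookkeeping a PMD might keep behind the opaque struct rte_tep. Everything in it (struct tep_ctx, tep_create(), tep_flow_attach(), tep_flow_detach(), tep_destroy()) is hypothetical and not part of the RFC. In particular, refusing to destroy a TEP that still has flows is only one possible policy; the RFC itself just says such flows should be freed first.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the opaque struct rte_tep.
 * A real PMD would also hold hw handles and the parsed header specs. */
struct tep_ctx {
	uint16_t port_id;
	int type;               /* e.g. RTE_TEP_TYPE_VXLAN */
	unsigned int nb_flows;  /* flows currently referencing this TEP */
};

/* Returns NULL on failure, mirroring the RFC's rte_tep_create(). */
static struct tep_ctx *tep_create(uint16_t port_id, int type)
{
	struct tep_ctx *tep = calloc(1, sizeof(*tep));

	if (tep == NULL)
		return NULL;
	tep->port_id = port_id;
	tep->type = type;
	return tep;
}

/* Hypothetical hook: called when a flow referencing the TEP is created. */
static void tep_flow_attach(struct tep_ctx *tep)
{
	tep->nb_flows++;
}

/* Hypothetical hook: called when such a flow is destroyed. */
static void tep_flow_detach(struct tep_ctx *tep)
{
	if (tep->nb_flows > 0)
		tep->nb_flows--;
}

/* Returns 0 on success and 1 on failure, as in the RFC's rte_tep_destroy().
 * Policy assumed here: refuse while any flow still uses the TEP. */
static int tep_destroy(uint16_t port_id, struct tep_ctx *tep)
{
	if (tep == NULL || tep->port_id != port_id)
		return 1;
	if (tep->nb_flows != 0)
		return 1;
	free(tep);
	return 0;
}
```

An alternative policy would be to tear down the remaining flows implicitly, but that hides flow lifetime from the application.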
> 
> 
> To direct traffic flows to a hw-terminated tunnel endpoint, the rte_flow API is
> enhanced with a new flow item type. This contains a pointer to the
> TEP context as well as the overlay flow id with which the traffic flow is
> associated.
> 
> struct rte_flow_item_tep {
>                 struct rte_tep *tep;
>                 uint32_t flow_id;
> };
> 
> Two new generic action types are also added: encapsulation and decapsulation.
> 
> RTE_FLOW_ACTION_TYPE_ENCAP
> RTE_FLOW_ACTION_TYPE_DECAP
> 
> struct rte_flow_action_encap {
>                 struct rte_flow_item *item;
> };
> 
> struct rte_flow_action_decap {
>                 struct rte_flow_item *item;
> };
> 
> The following section outlines the intended usage of the new APIs and then how
> they are combined with the existing rte_flow APIs.
> 
> Tunnel endpoints are created with rte_tep_create() on logical ports which support
> the capability, using a combination of TEP attributes and rte_flow_items. In the
> example below a new IPv4 VxLAN endpoint is being defined. The attrs parameter
> sets the TEP type, and could be used for other possible attributes.
> 
> struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };
> 
> The values for the headers which make up the tunnel endpoint are then
> defined using the spec field of the rte_flow items (IPv4, UDP and
> VxLAN in this case):
> 
> struct rte_flow_item_ipv4 ipv4_item = {
>                 .hdr = { .src_addr = saddr, .dst_addr = daddr }
> };
> 
> struct rte_flow_item_udp udp_item = {
>                 .hdr = { .src_port = sport, .dst_port = dport }
> };
> 
> struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };
> 
> struct rte_flow_item pattern[] = {
>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_END }
> };
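A PMD receiving this pattern in rte_tep_create() would need to walk the END-terminated item list and check that it describes a supported tunnel. The sketch below uses simplified, hypothetical stand-ins (enum item_type, struct item) for rte_flow_item_type and rte_flow_item, and assumes that a VXLAN TEP must be given as exactly IPv4 / UDP / VXLAN with a spec for every layer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum rte_flow_item_type and struct rte_flow_item. */
enum item_type { ITEM_END, ITEM_ETH, ITEM_IPV4, ITEM_UDP, ITEM_VXLAN };

struct item {
	enum item_type type;
	const void *spec;
};

/* Returns true if the END-terminated pattern is exactly IPv4 / UDP / VXLAN,
 * in that order, with a non-NULL spec on every layer. Stops at the first
 * mismatch, so a short pattern is never read past its END item. */
static bool tep_pattern_is_vxlan(const struct item *pattern)
{
	static const enum item_type expected[] = {
		ITEM_IPV4, ITEM_UDP, ITEM_VXLAN
	};
	size_t i;

	for (i = 0; i < sizeof(expected) / sizeof(expected[0]); i++) {
		if (pattern[i].type != expected[i] || pattern[i].spec == NULL)
			return false;
	}
	return pattern[i].type == ITEM_END;
}
```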
> 
> The tunnel endpoint can then be created on the port. Whether or not any hw
> configuration is required at this point is hw dependent, but either way the
> TEP context is then available for use in programming flows, so the
> application is not forced to redefine the TEP parameters on each flow
> addition.
> 
> struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
> 
> Once the TEP context is created, flows can be directed to that endpoint for
> processing. The following sections outline how the author envisages flow
> programming will work and how TEP acceleration can be combined with other
> accelerations.
> 
> 
> Ingress TEP decapsulation, mark and forward to queue:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The flow definition for TEP decapsulation actions should, at a minimum, specify
> the full outer packet to be matched. The outer packet definition should
> match the tunnel definition in the TEP context and the TEP flow id. This
> example describes matching on the outer headers, marking the packet with the
> VXLAN VNI and directing it to a specified queue of the port.
> 
> Source Packet
> 
>         Decapsulate Outer Hdr
>       /                       \                                    decap outer crc
>      /                         \                                    /          \
>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>      | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
> 
> /* Flow Attributes/Items Definitions */
> 
> struct rte_flow_attr attr = { .ingress = 1 };
> 
> struct rte_flow_item_eth eth_item = { .src = s_addr, .dst = d_addr, .type = ether_type };
> struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = vni };
> 
> struct rte_flow_item pattern[] = {
>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item  },
>                 { .type = RTE_FLOW_ITEM_TYPE_END }
> };
> 
> /* Flow Actions Definitions */
> 
> struct rte_flow_action_decap decap_eth = {
>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
> 
> struct rte_flow_action_decap decap_tep = {
>                 .type = RTE_FLOW_ITEM_TYPE_TEP,
>                 .item = &tep_item
> };
> 
> struct rte_flow_action_queue queue_action = { .index = qid };
> 
> struct rte_flow_action_mark mark_action = { .id = vni };
> 
> struct rte_flow_action actions[] = {
>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>                 { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark_action },
>                 { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_action },
>                 { .type = RTE_FLOW_ACTION_TYPE_END }
> };

I guess the Ethernet header is kept separate so that it would be 
possible to update it separately?
But I don't know of any way to update a specific rte_flow pattern.
Maybe it would be best to combine it with the rest of the TEP and add an 
update TEP command?

> 
> /** VERY IMPORTANT NOTE **/
> One of the core concepts of this proposal is that actions which modify the
> packet are defined in the order in which they are to be processed. So first decap
> the outer Ethernet header, then the outer TEP headers.
> I think this is not only logical from a usability point of view, it should also
> simplify the logic required in PMDs to parse the desired actions.

This makes a lot of sense when dealing with encap/decap.
Maybe it would be best to add a new bit from the reserved field in 
rte_flow_attr to express this. Something like this:

struct rte_flow_attr {
         uint32_t group; /**< Priority group. */
         uint32_t priority; /**< Priority level within group. */
         uint32_t ingress:1; /**< Rule applies to ingress traffic. */
         uint32_t egress:1; /**< Rule applies to egress traffic. */
         uint32_t inorder:1; /**< Actions are applied in order. */
         uint32_t reserved:29; /**< Reserved, must be zero. */
};
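One property of taking the bit from reserved is backward compatibility: applications that zero-initialize rte_flow_attr get inorder = 0, and the structure size does not change. The sketch below checks both with local copies of the layouts. It assumes the 17.11-era rte_flow_attr ends in reserved:30, and flow_attr_validate() is a hypothetical PMD-side check, not an existing DPDK function.

```c
#include <stdint.h>

/* Assumed current layout of struct rte_flow_attr. */
struct flow_attr_old {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t reserved:30;
};

/* Proposed layout: one bit carved out of reserved. */
struct flow_attr_new {
	uint32_t group;
	uint32_t priority;
	uint32_t ingress:1;
	uint32_t egress:1;
	uint32_t inorder:1;   /* actions are applied in list order */
	uint32_t reserved:29;
};

/* Hypothetical PMD check: reserved bits must still be zero. */
static int flow_attr_validate(const struct flow_attr_new *attr)
{
	return attr->reserved == 0 ? 0 : -1;
}
```

An existing initializer such as `{ .ingress = 1 }` therefore keeps today's unordered semantics unless the application opts in.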

> 
> struct rte_flow *flow =
>                                rte_flow_create(port_id, &attr, pattern, actions, &err);
> 
> The processed packets are delivered to the specified queue with mbuf metadata
> denoting the marked flow id and with the mbuf ol_flags PKT_RX_TEP_OFFLOAD set.
> 
>      +-----+------+-----+---------+-----+
>      | ETH | IPv4 | TCP | PAYLOAD | CRC |
>      +-----+------+-----+---------+-----+
> 
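In software terms, the decap-and-mark rule above amounts to reading the VNI out of the VXLAN header and handing the application the frame that starts right after it. The sketch below does that for an untagged Ethernet/IPv4/UDP/VXLAN frame. vxlan_parse() is hypothetical; it assumes the IANA VXLAN UDP port 4789 and the RFC 7348 header layout, and ignores the outer CRC. With no IPv4 options the inner frame starts at byte 50 (14 + 20 + 8 + 8).

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14
#define UDP_HDR_LEN   8
#define VXLAN_HDR_LEN 8
#define VXLAN_PORT    4789   /* IANA-assigned VXLAN UDP port */

/* Parse an untagged Ethernet/IPv4/UDP/VXLAN frame. On success store the VNI
 * and the offset of the inner Ethernet header and return 0; return -1 if the
 * frame is too short or is not VXLAN over IPv4. */
static int vxlan_parse(const uint8_t *frame, size_t len,
		       uint32_t *vni, size_t *inner_off)
{
	size_t ip_off = ETH_HDR_LEN, ihl, udp_off, vx_off;

	if (len < ETH_HDR_LEN + 20)
		return -1;
	if (frame[12] != 0x08 || frame[13] != 0x00)   /* EtherType IPv4 */
		return -1;
	ihl = (size_t)(frame[ip_off] & 0x0f) * 4;     /* IPv4 header length */
	if (ihl < 20 || frame[ip_off + 9] != 17)      /* protocol UDP */
		return -1;
	udp_off = ip_off + ihl;
	vx_off = udp_off + UDP_HDR_LEN;
	if (len < vx_off + VXLAN_HDR_LEN)
		return -1;
	if (((frame[udp_off + 2] << 8) | frame[udp_off + 3]) != VXLAN_PORT)
		return -1;
	if (!(frame[vx_off] & 0x08))                  /* I flag: VNI valid */
		return -1;
	*vni = ((uint32_t)frame[vx_off + 4] << 16) |
	       ((uint32_t)frame[vx_off + 5] << 8) | frame[vx_off + 6];
	*inner_off = vx_off + VXLAN_HDR_LEN;
	return 0;
}
```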
> 
> Ingress TEP decapsulation switch to port:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> This is intended to represent how a TEP decapsulation could be configured
> in a switching offload case. It assumes that there is a logical port
> representation for all ports on the hw switch in the DPDK application,
> but similar functionality could be achieved by specifying something like a
> VF ID of the device.
> 
> Like the previous scenario, the flow definition for TEP decapsulation actions
> should, at a minimum, specify the full outer packet to be matched, but it should
> also define the elements of the inner packet to match against, including masks
> if required.

Why is the inner specification necessary?

What if I'd like to decapsulate all VXLAN traffic of some specification?
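On the masking question: with rte_flow spec/mask semantics a header field matches when the packet bits selected by the mask equal the same bits of the spec, so an all-zero mask on the inner items matches any inner packet. A byte-level sketch of that rule follows; masked_match() is a hypothetical helper on raw header bytes, not how a PMD actually programs hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A field matches when (pkt & mask) == (spec & mask) for every byte;
 * zero bits in the mask are "don't care". */
static bool masked_match(const uint8_t *pkt, const uint8_t *spec,
			 const uint8_t *mask, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if ((pkt[i] & mask[i]) != (spec[i] & mask[i]))
			return false;
	}
	return true;
}
```

For example, an IPv4 destination of 10.0.0.5 matches a spec of 10.0.0.0 under a /24 mask, fails under a full mask, and matches anything under a zero mask.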

> 
> struct rte_flow_attr attr = { .ingress = 1 };
> 
> struct rte_flow_item pattern[] = {
>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &outer_eth_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &outer_tep_item, .mask = &tep_mask },
>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &inner_eth_item, .mask = &eth_mask },
>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &inner_ipv4_item, .mask = &ipv4_mask },
>                 { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &inner_tcp_item, .mask = &tcp_mask },
>                 { .type = RTE_FLOW_ITEM_TYPE_END }
> };
> 
> /* Flow Actions Definitions */
> 
> struct rte_flow_action_decap decap_eth = {
>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
> 
> struct rte_flow_action_decap decap_tep = {
>                 .type = RTE_FLOW_ITEM_TYPE_TEP,
>                 .item = &outer_tep_item
> };
> 
> struct rte_flow_action_port port_action = { .index = port_id };
> 
> struct rte_flow_action actions[] = {
>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>                 { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>                 { .type = RTE_FLOW_ACTION_TYPE_END }
> };
> struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);
> 
> This action will forward the decapsulated packets to another port of the switch
> fabric, but no information about the tunnel, or the fact that the packet was
> decapsulated, will be passed with it, thereby enabling segregation of the
> infrastructure and
> 
> 
> Egress TEP encapsulation:
> ~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> Encapsulation TEP actions require the flow definition for the source packet
> and then the actions to perform on it. This example shows an IPv4/TCP packet
> action.
> 
> Source Packet
> 
>      +-----+------+-----+---------+-----+
>      | ETH | IPv4 | TCP | PAYLOAD | CRC |
>      +-----+------+-----+---------+-----+
> 
> struct rte_flow_attr attr = { .egress = 1 };
> 
> struct rte_flow_item_eth eth_item = { .src = s_addr, .dst = d_addr, .type = ether_type };
> struct rte_flow_item_ipv4 ipv4_item = { .hdr = { .src_addr = src_addr, .dst_addr = dst_addr } };
> struct rte_flow_item_tcp tcp_item = { .hdr = { .src_port = src_port, .dst_port = dst_port } };
> 
> struct rte_flow_item pattern[] = {
>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &tcp_item },
>                 { .type = RTE_FLOW_ITEM_TYPE_END }
> };
> 
> /* Flow Actions Definitions */
> 
> struct rte_flow_action_encap encap_eth = {
>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type }
> };
> 
> struct rte_flow_action_encap encap_tep = {
>                 .type = RTE_FLOW_ITEM_TYPE_TEP,
>                 .item = { .tep = tep, .flow_id = vni }
> };
> struct rte_flow_action_port port_action = { .index = port_id };

This is the source port_id, where previously it was the destination 
port_id, right?

> 
> struct rte_flow_action actions[] = {
>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>                 { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>                 { .type = RTE_FLOW_ACTION_TYPE_END }
> };
> struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions, &err);
> 
> 
>        encapsulating Outer Hdr
>       /                       \                                      outer crc
>      /                         \                                   /          \
>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>      | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
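The VXLAN part of encap_tep is the 8-byte header defined by RFC 7348: a flags byte with the I bit (0x08) set, 24 reserved bits, the 24-bit VNI, then 8 more reserved bits. A minimal sketch of building it from the TEP flow_id/VNI follows; vxlan_hdr_build() is a hypothetical helper, not a DPDK API.

```c
#include <stdint.h>
#include <string.h>

#define VXLAN_HDR_LEN 8

/* Fill an 8-byte VXLAN header for the given VNI. Returns -1 if the VNI
 * does not fit in 24 bits, 0 otherwise. */
static int vxlan_hdr_build(uint8_t hdr[VXLAN_HDR_LEN], uint32_t vni)
{
	if (vni > 0xffffff)
		return -1;
	memset(hdr, 0, VXLAN_HDR_LEN);  /* all reserved fields zero */
	hdr[0] = 0x08;                  /* I flag: VNI is valid */
	hdr[4] = (uint8_t)(vni >> 16);  /* VNI, network byte order */
	hdr[5] = (uint8_t)(vni >> 8);
	hdr[6] = (uint8_t)vni;
	return 0;
}
```

Together with the outer IPv4 (20 bytes, no options) and UDP (8 bytes) headers, the tunnel puts 36 bytes in front of the inner Ethernet frame, so a 1500-byte inner IP packet (1514 bytes with its Ethernet header) needs an outer IP MTU of at least 1550.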
> 
> 
> 
> Chaining multiple modification actions, e.g. IPsec and TEP
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> For example, the definition for full hw acceleration of an IPsec ESP/Transport
> SA encapsulated in a VxLAN tunnel would look something like:
> 
> struct rte_flow_action actions[] = {
>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
>                 { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &sec_session },
>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>                 { .type = RTE_FLOW_ACTION_TYPE_END }
> };

Assuming the actions are ordered, the order here suggests that the packet
looks like:
[ETH | IP | UDP | VXLAN | ETH | IP | ESP | payload | ESP TRAILER | CRC]

But, the packet below has the ESP header as the outer header.
Also, shouldn't the encap_eth action come before the encap_tep action?
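One way to read the ordering rule is that each ENCAP action prepends its headers to the packet as it stands after the previous action. The sketch below models a packet as an outermost-first list of header names (struct pkt_model and encap() are hypothetical) to show what the RFC's TEP-only order, encap_tep then encap_eth, yields under that reading. The security action is not modeled.

```c
#include <stddef.h>
#include <string.h>

#define MAX_HDRS 16

/* Hypothetical model: a packet as an outermost-first list of header names. */
struct pkt_model {
	const char *hdr[MAX_HDRS];
	size_t n;
};

/* An ENCAP action prepends its headers in front of the current packet.
 * Returns -1 if the model would overflow, 0 otherwise. */
static int encap(struct pkt_model *p, const char *const *hdrs, size_t count)
{
	if (p->n + count > MAX_HDRS)
		return -1;
	/* shift existing headers inward, then write the new outer ones */
	memmove(&p->hdr[count], &p->hdr[0], p->n * sizeof(p->hdr[0]));
	for (size_t i = 0; i < count; i++)
		p->hdr[i] = hdrs[i];
	p->n += count;
	return 0;
}
```

Under this prepend reading, encap_tep followed by encap_eth gives ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD, matching the encapsulation diagram; listing encap_eth first would instead place the new Ethernet header inside the tunnel.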

> 
> 1. Source Packet
>                             +-----+------+-----+---------+-----+
>                             | ETH | IPv4 | TCP | PAYLOAD | CRC |
>                             +-----+------+-----+---------+-----+
> 
> 2. First Action - Tunnel Endpoint Encapsulation
> 
>        +------+-----+-------+-----+------+-----+---------+-----+
>        | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC |
>        +------+-----+-------+-----+------+-----+---------+-----+
> 
> 3. Second Action - IPsec ESP/Transport Security Processing
> 
>        +------+-----+-----+-------+-----+------+-----+---------+-----+-------------+
>        | IPv4 | ESP |              ENCRYPTED PAYLOAD                 | ESP TRAILER |
>        +------+-----+-----+-------+-----+------+-----+---------+-----+-------------+
> 
> 4. Third Action - Outer Ethernet Encapsulation
> 
> +-----+------+-----+-----+-------+-----+------+-----+---------+-----+-------------+-----------+
> | ETH | IPv4 | ESP |              ENCRYPTED PAYLOAD                 | ESP TRAILER | OUTER CRC |
> +-----+------+-----+-----+-------+-----+------+-----+---------+-----+-------------+-----------+
> 
> This example demonstrates the importance of ordering the interoperation of
> actions: in the above example, a security action could be applied to both the
> inner and outer packet simply by placing another security action at the
> beginning of the action list.
> 
> It also demonstrates the rationale for not collapsing the Ethernet header into
> the TEP definition: when there are multiple encapsulating actions, any of them
> could potentially be the place where the Ethernet header needs to be
> defined.
> 
> 

With rte_security full protocol offload as presented here, we still need
some way to provide and update the Ethernet header. Maybe there should be
two encap_eth actions in this case, one for the outer and another for
the inner?


