[dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

Doherty, Declan declan.doherty at intel.com
Tue Jan 23 16:50:56 CET 2018


On 11/01/2018 9:44 PM, John Daley (johndale) wrote:
> Hi,
> One comment on DECAP action and a "feature request".  I'll also reply to the top of thread discussion separately. Thanks for the RFC Declan!
> 
> Feature request associated with ENCAP action:
> 
> VPP (and probably other apps) would like the ability to simply specify an independent tunnel ID as part of egress match criteria in an rte_flow rule. Then egress packets could specify a tunnel ID and valid flag in the mbuf. If it matched the rte_flow tunnel ID item, a simple lookup in the NIC could be done and the associated actions (particularly ENCAP) executed. The application already knows the tunnel that the packet is associated with, so there is no need to have the NIC do matching on a header pattern. Plus it's possible that packet headers alone are not enough to determine the correct encap action (the bridge where the packet came from might be required).
> 
> This would require a new mbuf field to specify the tunnel ID (maybe in tx_offload) and a valid flag. It would also require a new rte_flow item type for matching the tunnel ID (like RTE_FLOW_ITEM_TYPE_META_TUNNEL_ID).
> 
> Is something like this being considered by others? If not, should it be part of this RFC or a new one? I think this would be the first metadata match criterion in rte_flow, but I could see others following.

This sounds similar to what we needed to do in rte_security to support 
metadata for inline crypto on the ixgbe. I wasn't aware of devices which 
supported this type of function for overlays, but it definitely sounds 
like we need to consider it here.
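
To make the idea concrete, here is a rough sketch of what the lookup could 
look like. Everything in it is hypothetical: tx_tunnel_meta stands in for 
the new mbuf field plus valid flag, and encap_entry for whatever the NIC or 
PMD would store per installed rule. None of these names exist in DPDK.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet Tx metadata; not an existing rte_mbuf field. */
struct tx_tunnel_meta {
	uint32_t tunnel_id;
	bool     valid;
};

/* Hypothetical encap descriptor, one per installed tunnel ID rule. */
struct encap_entry {
	uint32_t tunnel_id;
	uint32_t vni;
	bool     in_use;
};

#define ENCAP_TABLE_SIZE 8

/*
 * Return the encap entry for the packet's tunnel ID, or NULL when the
 * metadata is not flagged valid or no rule was installed for that ID.
 * No packet headers are inspected.
 */
static const struct encap_entry *
encap_lookup(const struct encap_entry table[ENCAP_TABLE_SIZE],
	     const struct tx_tunnel_meta *meta)
{
	if (!meta->valid)
		return NULL;
	for (size_t i = 0; i < ENCAP_TABLE_SIZE; i++) {
		if (table[i].in_use && table[i].tunnel_id == meta->tunnel_id)
			return &table[i];
	}
	return NULL;
}
```

A packet without the valid flag simply falls through to whatever 
header-based rules exist, which is why the flag is needed in addition to 
the ID itself.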

> 
> -johnd
> 
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Doherty, Declan
>> Sent: Thursday, December 21, 2017 2:21 PM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement
>>
>> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
>> when used in conjunction with rte_flow enables the configuration of inline
>> data path encapsulation and decapsulation of tunnel endpoint network
>> overlays on accelerated IO devices.
>>
>> The proposed new API would provide for the creation, destruction, and
>> monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
>> to allow the acceleration features to be discovered by applications.
>>
>> /** Tunnel Endpoint context, opaque structure */
>> struct rte_tep;
>>
>> enum rte_tep_type {
>>                 RTE_TEP_TYPE_VXLAN = 1, /**< VXLAN Protocol */
>>                 RTE_TEP_TYPE_NVGRE,     /**< NVGRE Protocol */
>>                 ...
>> };
>>
>> /** Tunnel Endpoint Attributes */
>> struct rte_tep_attr {
>>                 enum rte_tep_type type;
>>
>>                 /* other endpoint attributes here */
>> };
>>
>> /**
>> * Create a tunnel end-point context as specified by the flow attribute and
>> * pattern
>> *
>> * @param   port_id     Port identifier of Ethernet device.
>> * @param   attr        Flow rule attributes.
>> * @param   pattern     Pattern specification by list of rte_flow_items.
>> * @return
>> *  - On success returns pointer to TEP context
>> *  - On failure returns NULL
>> */
>> struct rte_tep *rte_tep_create(uint16_t port_id,
>>                                struct rte_tep_attr *attr, struct rte_flow_item pattern[])
>>
>> /**
>> * Destroy an existing tunnel end-point context. All the end-points context
>> * will be destroyed, so all active flows using tep should be freed before
>> * destroying context.
>> * @param   port_id    Port identifier of Ethernet device.
>> * @param   tep        Tunnel endpoint context
>> * @return
>> *  - On success returns 0
>> *  - On failure returns 1
>> */
>> int rte_tep_destroy(uint16_t port_id, struct rte_tep *tep)
>>
>> /**
>> * Get tunnel endpoint statistics
>> *
>> * @param   port_id    Port identifier of Ethernet device.
>> * @param   tep        Tunnel endpoint context
>> * @param   stats      Tunnel endpoint statistics
>> *
>> * @return
>> *  - On success returns 0
>> *  - On failure returns 1
>> */
>> int
>> rte_tep_stats_get(uint16_t port_id, struct rte_tep *tep,
>>                                struct rte_tep_stats *stats)
>>
>> /**
>> * Get ports tunnel endpoint capabilities
>> *
>> * @param   port_id    Port identifier of Ethernet device.
>> * @param   capabilities        Tunnel endpoint capabilities
>> *
>> * @return
>> *  - On success returns 0
>> *  - On failure returns 1
>> */
>> int
>> rte_tep_capabilities_get(uint16_t port_id,
>>                                struct rte_tep_capabilities *capabilities)
>>
>>
>> To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
>> enhanced to add a new flow item type. This contains a pointer to the TEP
>> context as well as the overlay flow id to which the traffic flow is associated.
>>
>> struct rte_flow_item_tep {
>>                 struct rte_tep *tep;
>>                 uint32_t flow_id;
>> }
>>
>> Also, two new generic action types are added: encapsulation and decapsulation.
>>
>> RTE_FLOW_ACTION_TYPE_ENCAP
>> RTE_FLOW_ACTION_TYPE_DECAP
>>
>> struct rte_flow_action_encap {
>>                 struct rte_flow_item *item;
>> };
>>
>> struct rte_flow_action_decap {
>>                 struct rte_flow_item *item;
>> };
>>
>> The following section outlines the intended usage of the new APIs and then
>> how they are combined with the existing rte_flow APIs.
>>
>> Tunnel endpoints are created on logical ports which support the capability
>> using rte_tep_create() using a combination of TEP attributes and
>> rte_flow_items. In the example below a new IPv4 VxLAN endpoint is being
>> defined.
>> The attrs parameter sets the TEP type, and could be used for other possible
>> attributes.
>>
>> struct rte_tep_attr attrs = { .type = RTE_TEP_TYPE_VXLAN };
>>
>> The values for the headers which make up the tunnel endpoint are then
>> defined using the spec parameter in the rte_flow items (IPv4, UDP and VxLAN
>> in this case).
>>
>> struct rte_flow_item_ipv4 ipv4_item = {
>>                 .hdr = { .src_addr = saddr, .dst_addr = daddr } };
>>
>> struct rte_flow_item_udp udp_item = {
>>                 .hdr = { .src_port = sport, .dst_port = dport } };
>>
>> struct rte_flow_item_vxlan vxlan_item = { .flags = vxlan_flags };
>>
>> struct rte_flow_item pattern[] = {
>>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_UDP, .spec = &udp_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_VXLAN, .spec = &vxlan_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_END } };
>>
>> The tunnel endpoint can then be created on the port. Whether or not any hw
>> configuration is required at this point would be hw dependent, but if not the
>> context for the TEP is available for use in programming flow, so the
>> application is not forced to redefine the TEP parameters on each flow
>> addition.
>>
>> struct rte_tep *tep = rte_tep_create(port_id, &attrs, pattern);
>>
>> Once the tep context is created flows can then be directed to that endpoint
>> for processing. The following sections will outline how the author envisages
>> flow programming will work and also how TEP acceleration can be combined
>> with other accelerations.
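
The rte_tep_destroy() comment above asks that all flows using a TEP be 
freed before the context is destroyed. One way a PMD might enforce that is 
a per-context reference count. The sketch below uses a hypothetical 
stand-in struct (tep_ctx) rather than the opaque rte_tep, and returning 
-EBUSY is an assumption of the sketch, not something the RFC specifies.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical private view of a TEP context inside a PMD. */
struct tep_ctx {
	uint32_t flow_refs; /* flows currently referencing this TEP */
	int      alive;     /* 1 until the context is destroyed */
};

/* Called when a flow rule that references the TEP is created. */
static void
tep_flow_attach(struct tep_ctx *t)
{
	t->flow_refs++;
}

/* Called when such a flow rule is destroyed. */
static void
tep_flow_detach(struct tep_ctx *t)
{
	if (t->flow_refs > 0)
		t->flow_refs--;
}

/* Refuse to destroy the context while any flow still references it. */
static int
tep_destroy_checked(struct tep_ctx *t)
{
	if (t->flow_refs != 0)
		return -EBUSY;
	t->alive = 0;
	return 0;
}
```

With this in place the context can be shared by many flows, and the 
application gets an explicit error instead of dangling flow rules if it 
tears things down in the wrong order.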
>>
>>
>> Ingress TEP decapsulation, mark and forward to queue:
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> The flows definition for TEP decapsulation actions should specify the full
>> outer packet to be matched at a minimum. The outer packet definition
>> should match the tunnel definition in the tep context and the tep flow id.
>> This example describes matching on the outer packet, marking the packet
>> with the VXLAN VNI and directing to a specified queue of the port.
>>
>> Source Packet
>>
>>         Decapsulate Outer Hdr
>>       /                       \                                    decap outer crc
>>      /                         \                                    /          \
>>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>>      | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
>>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>>
>> /* Flow Attributes/Items Definitions */
>>
>> struct rte_flow_attr attr = { .ingress = 1 };
>>
>> struct rte_flow_item_eth eth_item = {
>>                 .src = s_addr, .dst = d_addr, .type = ether_type };
>> struct rte_flow_item_tep tep_item = { .tep = tep, .flow_id = vni };
>>
>> struct rte_flow_item pattern[] = {
>>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item  },
>>                 { .type = RTE_FLOW_ITEM_TYPE_END } };
>>
>> /* Flow Actions Definitions */
>>
>> struct rte_flow_action_decap decap_eth = {
>>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type } };
>>
>> struct rte_flow_action_decap decap_tep = {
>>                 .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &tep_item };
>>
>> struct rte_flow_action_queue queue_action = { .index = qid };
>>
>> struct rte_flow_action_mark mark_action = { .id = vni };
>>
>> struct rte_flow_action actions[] = {
>>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>>                 { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark_action },
>>                 { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue_action },
>>                 { .type = RTE_FLOW_ACTION_TYPE_END } };
>>
> Does the conf for  RTE_FLOW_ACTION_TYPE_DECAP action specify the first pattern to decap up to? In the above, is the 1st decap action needed? Wouldn't the 2nd action decap up to the matching vni?

I hadn't looked at it like that, only as an explicit ordered list of 
headers to decap, but viewing it as a pattern to decap up to also makes 
sense.
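
To spell out the difference between the two readings, here is a small 
sketch. The hdr_kind enum and both function names are made up for 
illustration and only stand in for rte_flow item types; the packet is 
modelled as a plain array of header kinds.

```c
#include <stddef.h>

/* Hypothetical header kinds, standing in for rte_flow item types. */
enum hdr_kind { HDR_ETH, HDR_IPV4, HDR_UDP, HDR_VXLAN, HDR_TCP };

/*
 * "Explicit ordered list" reading: every header to strip is named, and
 * the list must match the front of the packet exactly.
 * Returns the number of headers removed, or -1 on mismatch.
 */
static int
decap_explicit(const enum hdr_kind *pkt, size_t pkt_len,
	       const enum hdr_kind *list, size_t list_len)
{
	if (list_len > pkt_len)
		return -1;
	for (size_t i = 0; i < list_len; i++) {
		if (pkt[i] != list[i])
			return -1;
	}
	return (int)list_len;
}

/*
 * "Decap up to" reading: strip everything up to and including the first
 * occurrence of `last`. Returns headers removed, or -1 if not present.
 */
static int
decap_up_to(const enum hdr_kind *pkt, size_t pkt_len, enum hdr_kind last)
{
	for (size_t i = 0; i < pkt_len; i++) {
		if (pkt[i] == last)
			return (int)(i + 1);
	}
	return -1;
}
```

For ETH/IPv4/UDP/VxLAN/ETH/IPv4/TCP both readings strip the same four 
headers; the second just needs a single action naming the VxLAN header.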
> On our nic, we would have to translate the decap actions into a (level, offset) pair which requires a lot of effort. Since the packet is already matched perhaps 'struct rte_flow_item' is not the right thing to pass to the decap action and a simple (layer, offset) could be used instead? E.g to decap up to the inner Ethernet header of a VxLAN packet:
> struct rte_flow_action_decap {
>                 uint32_t level;
>                 uint8_t offset;
> };
> struct rte_flow_action_decap decap_tep = {
>                 .level = RTE_PTYPE_L4_UDP,
>                 .offset = sizeof(struct vxlan_hdr)
> };
> 
> Using RTE_PTYPE... is just for illustration; we might need to define our own layers in rte_flow.h. You could specify inner packet layers, and the offset need not be restricted to the size of the header, so that decap to an absolute offset could be allowed, e.g.:
> struct rte_flow_action_decap decap_42 = {
>                 .level = RTE_PTYPE_L2_ETHER,
>                 .offset = 42
> };
>

This sounds like an interesting approach; I hadn't considered this sort 
of decap action.
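
As a rough sketch of how a (level, offset) pair could resolve to a strip 
length: the enum and structs below are hypothetical, and I am assuming the 
offset is counted from the end of the header at the named level, which the 
thread does not pin down. Counting from the start of the layer would be an 
equally valid choice and gives different numbers.

```c
#include <stdint.h>

/* Hypothetical layer identifiers; real ones would need defining. */
enum decap_level { LEVEL_L2, LEVEL_L3, LEVEL_L4, LEVEL_COUNT };

/* Byte offset at which each parsed outer header ends. */
struct parsed_offsets {
	uint16_t end[LEVEL_COUNT];
};

struct decap_level_offset {
	enum decap_level level;
	uint8_t offset;
};

/*
 * Bytes to strip from the front of the packet: end of the header at
 * `level` plus `offset` (assumed semantics). Returns -1 if that would
 * run past the end of the packet.
 */
static int
decap_strip_len(const struct parsed_offsets *po,
		const struct decap_level_offset *d, uint16_t pkt_len)
{
	uint32_t n = (uint32_t)po->end[d->level] + d->offset;

	return n <= pkt_len ? (int)n : -1;
}
```

With ETH (14) / IPv4 (20) / UDP (8) outer headers, level L4 plus an 8 byte 
VxLAN header gives 42 + 8 = 50 bytes, i.e. the start of the inner Ethernet 
header.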

>> /** VERY IMPORTANT NOTE **/
>> One of the core concepts of this proposal is that actions which modify the
>> packet are defined in the order which they are to be processed. So first
>> decap outer ethernet header, then the outer TEP headers.
>> I think this is not only logical from a usability point of view, it should also
>> simplify the logic required in PMDs to parse the desired actions.
>>
>> struct rte_flow *flow =
>>                                rte_flow_create(port_id, &attr, pattern, actions, &err);
>>
>> The processed packets are delivered to the specified queue with mbuf metadata
>> denoting the marked flow id and with mbuf ol_flags PKT_RX_TEP_OFFLOAD set.
>>
>>      +-----+------+-----+---------+-----+
>>      | ETH | IPv4 | TCP | PAYLOAD | CRC |
>>      +-----+------+-----+---------+-----+
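
On the receive side, an application would presumably check the proposed 
flag before trusting the mark as a VNI. A minimal sketch follows, using a 
stand-in mbuf struct so it is self-contained; SKETCH_RX_TEP_OFFLOAD mirrors 
the proposed PKT_RX_TEP_OFFLOAD flag, but its bit position and the mark 
field name are invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the proposed PKT_RX_TEP_OFFLOAD flag; bit is arbitrary. */
#define SKETCH_RX_TEP_OFFLOAD (1ULL << 0)

/* Stand-in for the parts of an mbuf this sketch cares about. */
struct sketch_mbuf {
	uint64_t ol_flags;
	uint32_t mark; /* value written by the MARK action */
};

/*
 * If hardware decapsulated the packet, copy the mark (the VNI in the
 * example above) to *vni and return true; otherwise leave *vni alone.
 */
static bool
rx_tep_vni(const struct sketch_mbuf *m, uint32_t *vni)
{
	if (!(m->ol_flags & SKETCH_RX_TEP_OFFLOAD))
		return false;
	*vni = m->mark;
	return true;
}
```

Packets without the flag would go through the normal software decap path.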
>>
>>
>> Ingress TEP decapsulation switch to port:
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> This is intended to represent how a TEP decapsulation could be configured in
>> a switching offload case, it makes an assumption that there is a logical port
>> representation for all ports on the hw switch in the DPDK application, but
>> similar functionality could be achieved by specifying something like a VF ID of
>> the device.
>>
>> Like the previous scenario the flows definition for TEP decapsulation actions
>> should specify the full outer packet to be matched at a minimum but also
>> define the elements of the inner match to match against including masks if
>> required.
>>
>> struct rte_flow_attr attr = { .ingress = 1 };
>>
>> struct rte_flow_item pattern[] = {
>>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &outer_eth_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_TEP, .spec = &outer_tep_item,
>>                   .mask = &tep_mask },
>>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &inner_eth_item,
>>                   .mask = &eth_mask },
>>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &inner_ipv4_item,
>>                   .mask = &ipv4_mask },
>>                 { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &inner_tcp_item,
>>                   .mask = &tcp_mask },
>>                 { .type = RTE_FLOW_ITEM_TYPE_END } };
>>
>> /* Flow Actions Definitions */
>>
>> struct rte_flow_action_decap decap_eth = {
>>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type } };
>>
>> struct rte_flow_action_decap decap_tep = {
>>                 .type = RTE_FLOW_ITEM_TYPE_TEP,
>>                 .item = &outer_tep_item
>> };
>>
>> struct rte_flow_action_port port_action = { .index = port_id };
>>
>> struct rte_flow_action actions[] = {
>>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_eth },
>>                 { .type = RTE_FLOW_ACTION_TYPE_DECAP, .conf = &decap_tep },
>>                 { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>>                 { .type = RTE_FLOW_ACTION_TYPE_END } };
>>
>> struct rte_flow *flow = rte_flow_create(port_id, &attr, pattern, actions,
>> &err);
>>
>> This action will forward the decapsulated packets to another port of the
>> switch fabric, but no information on the tunnel, or the fact that the packet
>> was decapsulated, will be passed with it, thereby enabling segregation of the
>> infrastructure and
>>
>>
>> Egress TEP encapsulation:
>> ~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Encapsulation TEP actions require the flow definitions for the source packet
>> and then the actions to perform on it. This example shows an IPv4/TCP packet
>> action.
>>
>> Source Packet
>>
>>      +-----+------+-----+---------+-----+
>>      | ETH | IPv4 | TCP | PAYLOAD | CRC |
>>      +-----+------+-----+---------+-----+
>>
>> struct rte_flow_attr attr = { .egress = 1 };
>>
>> struct rte_flow_item_eth eth_item = {
>>                 .src = s_addr, .dst = d_addr, .type = ether_type };
>> struct rte_flow_item_ipv4 ipv4_item = {
>>                 .hdr = { .src_addr = src_addr, .dst_addr = dst_addr } };
>> struct rte_flow_item_tcp tcp_item = {
>>                 .hdr = { .src_port = src_port, .dst_port = dst_port } };
>>
>> struct rte_flow_item pattern[] = {
>>                 { .type = RTE_FLOW_ITEM_TYPE_ETH, .spec = &eth_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_IPV4, .spec = &ipv4_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &tcp_item },
>>                 { .type = RTE_FLOW_ITEM_TYPE_END } };
>>
>> /* Flow Actions Definitions */
>>
>> struct rte_flow_action_encap encap_eth = {
>>                 .type = RTE_FLOW_ITEM_TYPE_ETH,
>>                 .item = { .src = s_addr, .dst = d_addr, .type = ether_type } };
>>
>> struct rte_flow_action_encap encap_tep = {
>>                 .type = RTE_FLOW_ITEM_TYPE_TEP,
>>                 .item = { .tep = tep, .flow_id = vni } };
>>
>> struct rte_flow_action_port port_action = { .index = port_id };
>>
>> struct rte_flow_action actions[] = {
>>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
>>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>>                 { .type = RTE_FLOW_ACTION_TYPE_PORT, .conf = &port_action },
>>                 { .type = RTE_FLOW_ACTION_TYPE_END } };
>>
>> struct rte_flow *flow =
>>                 rte_flow_create(port_id, &attr, pattern, actions, &err);
>>
>>
>>        encapsulating Outer Hdr
>>       /                       \                                      outer crc
>>      /                         \                                   /          \
>>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>>      | ETH | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC | OUTER CRC |
>>      +-----+------+-----+-------+-----+------+-----+---------+-----+-----------+
>>
>>
>>
>> Chaining multiple modification actions, e.g. IPsec and TEP
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> For example the definition for full hw acceleration for an IPsec ESP/Transport
>> SA encapsulated in a vxlan tunnel would look something like:
>>
>> struct rte_flow_action actions[] = {
>>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_tep },
>>                 { .type = RTE_FLOW_ACTION_TYPE_SECURITY, .conf = &sec_session },
>>                 { .type = RTE_FLOW_ACTION_TYPE_ENCAP, .conf = &encap_eth },
>>                 { .type = RTE_FLOW_ACTION_TYPE_END } };
>>
>> 1. Source Packet
>>                             +-----+------+-----+---------+-----+
>>                             | ETH | IPv4 | TCP | PAYLOAD | CRC |
>>                             +-----+------+-----+---------+-----+
>>
>> 2. First Action - Tunnel Endpoint Encapsulation
>>
>>        +------+-----+-------+-----+------+-----+---------+-----+
>>        | IPv4 | UDP | VxLAN | ETH | IPv4 | TCP | PAYLOAD | CRC |
>>        +------+-----+-------+-----+------+-----+---------+-----+
>>
>> 3. Second Action - IPsec ESP/Transport Security Processing
>>
>>        +------+-----+-----+-------+-----+------+-----+---------+-----+-------------+
>>        | IPv4 | ESP |              ENCRYPTED PAYLOAD                 | ESP TRAILER |
>>        +------+-----+-----+-------+-----+------+-----+---------+-----+-------------+
>>
>> 4. Third Action - Outer Ethernet Encapsulation
>>
>> +-----+------+-----+-----+-------+-----+------+-----+---------+-----+-------------+-----------+
>> | ETH | IPv4 | ESP |              ENCRYPTED PAYLOAD                 | ESP TRAILER | OUTER CRC |
>> +-----+------+-----+-----+-------+-----+------+-----+---------+-----+-------------+-----------+
>>
>> This example demonstrates why it is important that actions are applied in
>> the order in which they are listed: in the above example, a security action
>> can be defined on both the inner and outer packet by simply placing another
>> security action at the beginning of the action list.
>>
>> It also demonstrates the rationale for not collapsing the Ethernet header into
>> the TEP definition: when you have multiple encapsulating actions, any of them
>> could potentially be the place where the Ethernet header needs to be defined.
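
The ordering rule can be sketched as a toy model that walks the action 
list in sequence over a textual packet layout. The action enum, the 
layout strings and the function names are all invented for this sketch, 
and the ESP transport step is reduced to "keep the outermost IPv4 header, 
encrypt the rest"; a real PMD would of course operate on headers, not 
strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical action kinds for the toy model. */
enum action { ACT_ENCAP_TEP, ACT_SECURITY, ACT_ENCAP_ETH };

/* Prepend `hdr` to the layout string in `pkt`; -1 if it will not fit. */
static int
prepend(char *pkt, size_t cap, const char *hdr)
{
	size_t hl = strlen(hdr), pl = strlen(pkt);

	if (hl + pl + 1 > cap)
		return -1;
	memmove(pkt + hl, pkt, pl + 1); /* shift existing layout right */
	memcpy(pkt, hdr, hl);
	return 0;
}

/* Apply actions strictly in list order; -1 on the first failure. */
static int
apply_actions(char *pkt, size_t cap, const enum action *acts, size_t n)
{
	static const char esp[] = "IPv4|ESP|ENCRYPTED|ESP TRAILER";

	for (size_t i = 0; i < n; i++) {
		switch (acts[i]) {
		case ACT_ENCAP_TEP:
			if (prepend(pkt, cap, "IPv4|UDP|VxLAN|") != 0)
				return -1;
			break;
		case ACT_SECURITY:
			/* Transport mode needs an IPv4 header outermost. */
			if (strncmp(pkt, "IPv4|", 5) != 0 || sizeof(esp) > cap)
				return -1;
			memcpy(pkt, esp, sizeof(esp));
			break;
		case ACT_ENCAP_ETH:
			if (prepend(pkt, cap, "ETH|") != 0)
				return -1;
			break;
		}
	}
	return 0;
}
```

Starting from "ETH|IPv4|TCP|PAYLOAD", the list { ENCAP_TEP, SECURITY, 
ENCAP_ETH } reproduces steps 2 to 4 above. Moving the Ethernet encap ahead 
of the security action makes the model fail, which is the same reason the 
Ethernet header is kept as a separate action.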
>>
> 


