[dpdk-dev] [RFC] tunnel endpoint hw acceleration enablement

Doherty, Declan declan.doherty at intel.com
Tue Jan 9 18:30:48 CET 2018


On 24/12/2017 5:30 PM, Shahaf Shuler wrote:
> Hi Declan,
> 

Hey Shahaf, apologies for the delay in responding, I have been out of 
office for the last 2 weeks.

> Friday, December 22, 2017 12:21 AM, Doherty, Declan:
>> This RFC contains a proposal to add a new tunnel endpoint API to DPDK that
>> when used in conjunction with rte_flow enables the configuration of inline
>> data path encapsulation and decapsulation of tunnel endpoint network
>> overlays on accelerated IO devices.
>>
>> The proposed new API would provide for the creation, destruction, and
>> monitoring of a tunnel endpoint in supporting hw, as well as capabilities APIs
>> to allow the acceleration features to be discovered by applications.
>>
....
> 
> 
> Am not sure I understand why there is a need for the above control methods.
> Are you introducing a new "tep device"? As the tunnel endpoint is sending
> and receiving Ethernet packets from the network I think it should still be
> counted as an Ethernet device but with more capabilities (for example it
> supports encap/decap etc.), therefore it should use the ethdev layer API
> to query statistics (for example).

No, the new APIs are only intended as a method of creating, monitoring 
and deleting tunnel endpoints on an existing ethdev. The rationale for 
APIs separate from rte_flow is the same as for rte_security: there is 
not a 1:1 mapping of TEPs to flows. Many flows (VNIs in VxLAN, for 
example) can originate/terminate on the same TEP, so managing the TEP 
independently of the flows being transmitted on it is important, for 
example to allow visibility of that endpoint's statistics. I can't see 
how the existing ethdev API could be used for statistics, as a single 
ethdev could be supporting many concurrent TEPs. We would either need 
to use the extended stats with many entries, one for each TEP, or, if 
we treat a TEP as an attribute of a port in a similar manner to the way 
rte_security manages an IPsec SA, the state of each TEP can be 
monitored and managed independently of both the overall port and the 
flows being transported on that endpoint.
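To make the above concrete, the shape I have in mind is something along 
these lines (a sketch only; all names, fields and behaviour here are 
illustrative stubs, not final definitions):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the proposed per-TEP control API. The point is
 * that an endpoint lives on an existing ethdev but carries its own state
 * and counters, independent of the port-level ethdev stats. */

enum rte_tep_type {
	RTE_TEP_TYPE_VXLAN,
	RTE_TEP_TYPE_GENEVE,
	RTE_TEP_TYPE_NVGRE,
};

struct rte_tep {
	uint16_t port_id;	/* ethdev the endpoint is created on */
	enum rte_tep_type type;
	uint64_t rx_pkts;	/* per-endpoint counters, tracked     */
	uint64_t tx_pkts;	/* separately from the port's stats   */
};

struct rte_tep_stats {
	uint64_t rx_pkts;
	uint64_t tx_pkts;
};

/* Create a tunnel endpoint on an existing ethdev port. */
static struct rte_tep *
rte_tep_create(uint16_t port_id, enum rte_tep_type type)
{
	struct rte_tep *tep = calloc(1, sizeof(*tep));

	if (tep != NULL) {
		tep->port_id = port_id;
		tep->type = type;
	}
	return tep;
}

/* Query this endpoint's counters without touching the port stats. */
static int
rte_tep_stats_get(const struct rte_tep *tep, struct rte_tep_stats *stats)
{
	if (tep == NULL || stats == NULL)
		return -1;
	stats->rx_pkts = tep->rx_pkts;
	stats->tx_pkts = tep->tx_pkts;
	return 0;
}

/* Tear the endpoint down independently of the port. */
static void
rte_tep_destroy(struct rte_tep *tep)
{
	free(tep);
}
```

Many flows could then be attached to and detached from one such endpoint 
over its lifetime without ever redefining the outer tunnel state.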

> As for the capabilities - what specifically you had in mind? The current usage you show with tep is with rte_flow rules. There are no capabilities currently for rte_flow supported actions/pattern. To check such capabilities application uses rte_flow_validate.

I envisaged that the application should be able to see whether an 
ethdev can support TEP in the rx/tx offloads, and then 
rte_tep_capabilities would allow applications to query which tunnel 
endpoint protocols are supported, etc. I would like a simple mechanism 
that lets users see whether a particular tunnel endpoint type is 
supported without having to build actual flows to validate it.

> Regarding the creation/destroy of tep. Why not simply use rte_flow API and avoid this extra control?
> For example - with 17.11 APIs, application can put the port in isolate mode, and insert a flow_rule to catch only IPv4 VXLAN traffic and direct to some queue/do RSS. Such operation, per my understanding, will create a tunnel endpoint. What are the down sides of doing it with the current APIs?

That doesn't enable encapsulation and decapsulation of the outer tunnel 
headers in the hw as far as I know, quite apart from the inability to 
monitor the endpoint statistics I mentioned above. It would also 
require you to redefine the endpoint's parameters every time you wish 
to add a new flow to it. I think having the rte_tep object semantics 
should also simplify enabling a full vswitch offload of TEP, where the 
hw handles both encap/decap and switching to a particular port.

> 
>>
>>
>> To direct traffic flows to hw terminated tunnel endpoint the rte_flow API is
>> enhanced to add a new flow item type. This contains a pointer to the TEP
>> context as well as the overlay flow id to which the traffic flow is associated.
>>
>> struct rte_flow_item_tep {
>>                 struct rte_tep *tep;
>>                 uint32_t flow_id;
>> };
> 
> Can you provide more detailed definition about the flow id ? to which field from the packet headers it refers to?
> On your below examples it looks like it is to match the VXLAN vni in case of VXLAN, what about the other protocols? And also, why not using the already exists VXLAN item?

I have only been looking initially at a couple of the tunnel endpoint 
protocols, namely Geneve, NvGRE, and VxLAN, but the idea here is to 
allow the user to define the VNI in the case of Geneve and VxLAN, and 
the VSID in the case of NvGRE, on a per-flow basis. As per my 
understanding these are used to identify the source/destination hosts 
on the overlay network independently of the endpoint they are 
transported across.

The VxLAN item is used in the creation of the TEP object; using the TEP 
object just removes the need for the user to constantly redefine all 
the tunnel parameters. Also, depending on the hw implementation, it may 
simplify the driver's work if it knows the exact endpoint an action is 
for, instead of having to look it up on each flow addition.
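So per flow the only new state is the endpoint reference plus the 
overlay id, something like this (a sketch; rte_tep and the item layout 
are hypothetical, per the RFC):

```c
#include <stdint.h>

/* Illustrative stand-in for the proposed TEP object. */
struct rte_tep {
	uint16_t port_id;
};

/* The proposed flow item: the TEP carries all the outer tunnel
 * parameters (defined once at TEP creation), while flow_id carries the
 * per-flow overlay id -- the VNI for VxLAN/Geneve, the VSID for NvGRE. */
struct rte_flow_item_tep {
	struct rte_tep *tep;	/* endpoint the flow terminates on */
	uint32_t flow_id;	/* VNI or VSID within that endpoint */
};

/* Build the item for one overlay flow on an existing endpoint. */
static struct rte_flow_item_tep
tep_item(struct rte_tep *tep, uint32_t flow_id)
{
	struct rte_flow_item_tep item = {
		.tep = tep,
		.flow_id = flow_id,
	};
	return item;
}
```

Adding a second VNI on the same endpoint is then just another item with 
a different flow_id; none of the outer header state is repeated.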

> 
> Generally I like the idea of separating the encap/decap context from the action. However looks like the rte_flow_item has double meaning on this RFC, once for the classification and once for the action.
>  From the top of my head I would think of an API which separate those, and re-use the existing flow items. Something like:
> 
>   struct rte_flow_item pattern[] = {
>                  { set of already exists pattern  },
>                  { ... },
>                  { .type = RTE_FLOW_ITEM_TYPE_END } };
> 
> encap_ctx = create_enacap_context(pattern)
> 
> rte_flow_action actions[] = {
> 	{ .type RTE_FLOW_ITEM_ENCAP, .conf = encap_ctx}
> }

I'm not sure I fully understand what you're asking here, but in general 
for encap you would only define the inner part of the packet in the 
match pattern criteria, and the actual outer tunnel headers would be 
defined in the action.

I guess there is some replication on the decap side as proposed, as the 
TEP object is used in both the pattern and the action. Possibly you 
could get away with having no TEP object defined in the action data, 
but I prefer keeping the API symmetrical for encap/decap actions at the 
cost of some extra verbosity.
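To illustrate that encap split, the rule would look roughly like this 
(all type and enum names below are illustrative stand-ins, not actual 
DPDK definitions):

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: for encap, the pattern matches only the inner (pre-encap)
 * packet, and the outer tunnel headers come from the TEP object that
 * the action references. */

struct rte_tep {
	uint16_t port_id;
};

enum item_type { ITEM_ETH, ITEM_IPV4, ITEM_END };
enum action_type { ACTION_TEP_ENCAP, ACTION_END };

struct flow_item {
	enum item_type type;	/* inner-packet match criteria only */
};

struct tep_encap_conf {
	struct rte_tep *tep;	/* holds the full outer-header state */
	uint32_t flow_id;	/* VNI/VSID to encapsulate with */
};

struct flow_action {
	enum action_type type;
	const void *conf;
};

/* Count pattern items up to the END terminator. */
static size_t
pattern_len(const struct flow_item *pattern)
{
	size_t n = 0;

	while (pattern[n].type != ITEM_END)
		n++;
	return n;
}
```

Note that nothing in the pattern describes the tunnel itself; that is 
exactly the asymmetry between encap (TEP only in the action) and decap 
(TEP in both pattern and action) discussed above.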

>   
...
> 


