[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Adrien Mazarguil adrien.mazarguil at 6wind.com
Thu Jul 21 19:07:38 CEST 2016


Hi Rahul,

Please see below.

On Thu, Jul 21, 2016 at 01:43:37PM +0530, Rahul Lakkireddy wrote:
> Hi Adrien,
> 
> The proposal looks very good.  It covers most of the features
> supported by Chelsio NICs.  We are looking for suggestions on
> exposing additional features supported by Chelsio NICs via this API.
> 
> Chelsio NICs have two regions in which filters can be placed -
> maskfull and maskless regions.  As their names imply, the maskfull
> region accepts masks to match a range of values, whereas the maskless
> region doesn't accept any masks and hence performs stricter exact
> matches.  Filters without masks can also be placed in the maskfull
> region.  By default, the maskless region has higher priority than the
> maskfull region.  However, the priority between the two regions is
> configurable.

I understand this configuration affects the entire device. Just to be clear,
assuming some filters are already configured, are they affected by a change
of region priority later?

> Please suggest on how we can let the apps configure in which region
> filters must be placed and set the corresponding priority accordingly
> via this API.

Okay. Applications, like customers, are always right.

With this in mind, PMDs are not allowed to break existing flow rules, and
face two options when applications provide a flow specification that would
break an existing rule:

- Refuse to create it (easiest).

- Find a workaround to apply it anyway (possibly quite complicated).

The reason you have these two regions is performance, right? Otherwise I'd
just say put everything in the maskfull region.

PMDs are allowed to rearrange existing rules or change device parameters as
long as existing constraints are satisfied. In my opinion it does not matter
which region has the higher default priority. Applications always want the
best performance, so the first created rule should go into the most
appropriate region.

If a subsequent rule requires an existing rule to be in the other region but
the application specified the wrong priority for this to work, then the PMD
can either choose to swap region priorities on the device (assuming it does
not affect other rules), or destroy and recreate the original rule in a way
that satisfies all constraints (i.e. moving conflicting rules from the
maskless region to the maskfull one).

Going further, when subsequent rules get destroyed, the PMD should ideally
move maskfull rules back into the maskless region for better performance.
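
Purely as an illustration of that intent, here is a toy model of the
placement policy (none of these names exist anywhere, they are made up
for this example):

/* Illustration only: not part of the API or of any PMD. */
enum region { REGION_MASKLESS, REGION_MASKFULL };

struct toy_rule {
	int has_masks;      /* nonzero if the rule needs wildcards */
	enum region region; /* where the PMD currently placed it */
};

/* Prefer the faster maskless region whenever the rule allows it and
 * there is room, otherwise fall back to the maskfull region. On rule
 * destruction, the PMD could re-run this on spilled exact-match rules
 * and migrate them back. */
static enum region
toy_place(const struct toy_rule *rule, int maskless_free_entries)
{
	if (!rule->has_masks && maskless_free_entries > 0)
		return REGION_MASKLESS;
	return REGION_MASKFULL;
}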

This is only a suggestion. PMDs have the right to say "no" at some point.

More important in my opinion is to make sure applications can create a given
set of flow rules in any order. If rules a/b/c can be created, then it won't
make sense from an application point of view if the same rules submitted as
c/a/b for some reason cannot be, and the PMD maintainers will rightfully get
a bug report.

> More comments below.
> 
> On Tue, Jul 05, 2016 at 20:16:46 +0200, Adrien Mazarguil wrote:
> > Hi All,
> > 
> [...]
> 
> > 
> > ``ETH``
> > ^^^^^^^
> > 
> > Matches an Ethernet header.
> > 
> > - ``dst``: destination MAC.
> > - ``src``: source MAC.
> > - ``type``: EtherType.
> > - ``tags``: number of 802.1Q/ad tags defined.
> > - ``tag[]``: 802.1Q/ad tag definitions, innermost first. For each one:
> > 
> >  - ``tpid``: Tag protocol identifier.
> >  - ``tci``: Tag control information.
> > 
> > ``IPV4``
> > ^^^^^^^^
> > 
> > Matches an IPv4 header.
> > 
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tos``: ToS/DSCP field.
> > - ``ttl``: TTL field.
> > - ``proto``: protocol number for the next layer.
> > 
> > ``IPV6``
> > ^^^^^^^^
> > 
> > Matches an IPv6 header.
> > 
> > - ``src``: source IP address.
> > - ``dst``: destination IP address.
> > - ``tc``: traffic class field.
> > - ``nh``: Next header field (protocol).
> > - ``hop_limit``: hop limit field (TTL).
> > 
> > ``ICMP``
> > ^^^^^^^^
> > 
> > Matches an ICMP header.
> > 
> > - TBD.
> > 
> > ``UDP``
> > ^^^^^^^
> > 
> > Matches a UDP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - ``length``: UDP length.
> > - ``checksum``: UDP checksum.
> > 
> > .. raw:: pdf
> > 
> >    PageBreak
> > 
> > ``TCP``
> > ^^^^^^^
> > 
> > Matches a TCP header.
> > 
> > - ``sport``: source port.
> > - ``dport``: destination port.
> > - All other TCP fields and bits.
> > 
> > ``VXLAN``
> > ^^^^^^^^^
> > 
> > Matches a VXLAN header.
> > 
> > - TBD.
> > 
> 
> In addition to above matches, Chelsio NICs have some additional
> features:
> 
> - Match based on unicast DST-MAC, multicast DST-MAC, broadcast DST-MAC.
>   Also, there is a match criterion available called 'promisc' - which
>   matches packets that are not destined for the interface, but had
>   been received by the hardware due to the interface being in
>   promiscuous mode.

There is no unicast/multicast/broadcast distinction in the ETH pattern item;
those are covered by the specified MAC address.

Now about this "promisc" match criterion: it can be added as a new meta
pattern item (4.1.3 Meta item types). Do you want it to be defined from the
start, or add it later with the related code in your PMD?
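
Strictly as a sketch (the item name is an assumption, not something
already in the RFC), it could boil down to:

/* Hypothetical meta item: matches packets not destined to the
 * interface, received only because it is in promiscuous mode. Meta
 * items need no spec/mask data, so a pattern could simply start with
 * it, e.g.:
 *
 *   PROMISC / ETH / IPV4 / UDP
 */
enum rte_flow_item_type_proposed_additions {
	RTE_FLOW_ITEM_TYPE_PROMISC,
};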

> - Match FCoE packets.

The "oE" part is covered by the ETH item (using the proper Ethertype), but a
new pattern item should probably be added for the FC protocol itself. As
above, I suggest adding it later.

> - Match IP Fragmented packets.

It seems that I missed quite a few fields in the IPv4 item definition
(struct rte_flow_item_ipv4); fragment matching should be in there.
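
Strictly as a sketch of what the completed item could look like (field
names are assumptions following the RFC's style):

#include <stdint.h>

/* Hypothetical, fuller IPv4 item for discussion. */
struct rte_flow_item_ipv4 {
	uint32_t src;             /* source address */
	uint32_t dst;             /* destination address */
	uint8_t tos;              /* ToS/DSCP field */
	uint8_t ttl;              /* TTL field */
	uint8_t proto;            /* protocol number for the next layer */
	uint16_t fragment_offset; /* flags and fragment offset */
};

Note that "is fragmented" means (fragment_offset & 0x3fff) != 0, which a
single spec/mask pair cannot express directly, so this may deserve either
range matching or a dedicated meta bit.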

> - Match range of physical ports on the NIC in a single rule via masks.
>   For example: match all UDP packets coming in on ports 3 and 4 out of
>   the 4 ports available on the NIC.

Applications create flow rules per port; I'd thus suggest that the PMD
should detect identical rules created on different ports and aggregate them
as a single HW rule automatically.

If you think this approach is not right, the alternative is a meta pattern
item that provides a list of ports. I'm not sure this is the right approach
considering it would most likely not be supported by most NICs, and
applications may not request it explicitly.
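
If we do go that way, it could be as simple as the following (name and
layout are assumptions for discussion only):

#include <stdint.h>

/* Hypothetical meta item configuration: restrict a rule to a set of
 * physical ports on the same adapter. */
struct rte_flow_item_port_list {
	uint16_t num;     /* number of entries in ports[] */
	uint16_t ports[]; /* DPDK port IDs the rule applies to */
};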

> - Match range of Physical Functions (PFs) on the NIC in a single rule
>   via masks. For example: match all traffic coming in on several PFs.

The PF and VF pattern items assume there is a single PF associated with a
DPDK port. VFs are identified with an ID. I basically took the same
definitions as the existing filter types; perhaps this is not enough for
Chelsio adapters.

Do you expose more than one PF for a DPDK port?

Anyway, I'd suggest the same approach as above: automatic aggregation of
rules for performance reasons. Otherwise, new or updated PF/VF pattern
items, in which case it would be great if you could provide ideal structure
definitions for this use case.
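
To make the discussion concrete, here is the kind of definition I have in
mind (a sketch only, the final layout may well differ):

#include <stdint.h>

/* VF item configuration: the ID is matched under the usual per-item
 * mask, so a partial mask can cover a range of functions. */
struct rte_flow_item_vf {
	uint32_t id; /* VF ID this rule applies to */
};

For example, spec.id = 0 with mask.id = ~3u would match VFs 0-3, which
covers the range matching you describe. A PF item could gain a similar ID
field if Chelsio adapters do expose several PFs per DPDK port.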

> Please suggest on how we can expose the above features to DPDK apps via
> this API.
> 
> [...]
> 
> > 
> > Actions
> > ~~~~~~~
> > 
> > Each possible action is represented by a type. Some have associated
> > configuration structures. Several actions combined in a list can be
> > assigned to a flow rule. That list is not ordered.
> > 
> > At least one action must be defined in a filter rule in order to do
> > something with matched packets.
> > 
> > - Actions are defined with ``struct rte_flow_action``.
> > - A list of actions is defined with ``struct rte_flow_actions``.
> > 
> > They fall in three categories:
> > 
> > - Terminating actions (such as QUEUE, DROP, RSS, PF, VF) that prevent
> >   processing matched packets by subsequent flow rules, unless overridden
> >   with PASSTHRU.
> > 
> > - Non terminating actions (PASSTHRU, DUP) that leave matched packets up for
> >   additional processing by subsequent flow rules.
> > 
> > - Other non terminating meta actions that do not affect the fate of packets
> >   (END, VOID, ID, COUNT).
> > 
> > When several actions are combined in a flow rule, they should all have
> > different types (e.g. dropping a packet twice is not possible). The
> > VOID type is an exception to this rule; for other repeated types, the
> > defined behavior is for PMDs to only take into account the last action
> > of a given type found in the list. PMDs still perform error checking on
> > the entire list.
> > 
> > *Note that PASSTHRU is the only action able to override a terminating rule.*
> > 
> 
> Chelsio NICs can support an action 'switch' which can re-direct
> matched packets from one port to another port in hardware.  In addition,
> it can also optionally:
> 
> 1. Perform header rewrites (src-mac/dst-mac rewrite, src-mac/dst-mac
>    swap, vlan add/remove/rewrite).
> 
> 2. Perform NAT'ing in hardware (4-tuple rewrite).
> 
> before sending them out on the wire [1].
> 
> To meet the above requirements, we'd need a way to pass sub-actions
> to action 'switch' and a way to pass extra info (such as new
> src-mac/dst-mac, new vlan, new 4-tuple for NAT) to rewrite
> corresponding fields.
> 
> We're looking for suggestions on how we can achieve action 'switch'
> in this new API.
> 
> From our understanding of this API, we could just expand
> rte_flow_action_type with an additional action type
> RTE_FLOW_ACTION_TYPE_SWITCH and define several sub-actions such as:
> 
> enum rte_flow_action_switch_type {
> 	RTE_FLOW_ACTION_SWITCH_TYPE_NONE,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_MAC_REWRITE,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_MAC_SWAP,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_INSERT,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_DELETE,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_VLAN_REWRITE,
> 	RTE_FLOW_ACTION_SWITCH_TYPE_NAT_REWRITE,
> };
> 
> and then define an rte_flow_action_switch as follows:
> 
> struct rte_flow_action_switch {
> 	enum rte_flow_action_switch_type sub_type; /* sub-actions to perform */
> 	uint16_t port; /* destination physical port to switch packet to */
> 	enum rte_flow_item_type item_type; /* fields to rewrite */
> 	const void *switch_spec; /* holds info to rewrite matched flows */
> };
> 
> Does the above approach sound right with respect to this new API?

It does. Not sure I'd go with a sublevel of actions for switching types
though. Think more generic: MAC swapping, MAC rewrite and so on could be
defined as separate actions usable on their own by all PMDs, and you'd then
combine a SWITCH action with these.
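
Something along these lines (all names are hypothetical, for discussion
only):

#include <stdint.h>

/* Sketch: each rewrite becomes a standalone action type that any PMD
 * may implement, instead of a sub-action of SWITCH. */
enum rte_flow_action_type_proposed_additions {
	RTE_FLOW_ACTION_TYPE_SWITCH,      /* forward to a physical port */
	RTE_FLOW_ACTION_TYPE_MAC_SWAP,    /* swap src/dst MAC addresses */
	RTE_FLOW_ACTION_TYPE_MAC_REWRITE, /* set new src/dst MAC */
	RTE_FLOW_ACTION_TYPE_VLAN_INSERT,
	RTE_FLOW_ACTION_TYPE_VLAN_DELETE,
	RTE_FLOW_ACTION_TYPE_VLAN_REWRITE,
	RTE_FLOW_ACTION_TYPE_NAT_REWRITE, /* 4-tuple rewrite */
};

/* SWITCH then only needs the destination. */
struct rte_flow_action_switch {
	uint16_t port; /* DPDK port ID on the same adapter */
};

A NAT'd and switched flow would then simply carry two actions in its
list, e.g. SWITCH { .port = n } together with NAT_REWRITE and its new
4-tuple.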

Also be careful with the "port" field definition. I'm sure the Chelsio
switch cannot make a packet leave through a port of a Mellanox device. DPDK
applications are aware of a single "port namespace", so it has to be
translated by the PMD to a physical port on the same Chelsio adapter,
otherwise rule creation should fail.

> [...]
> 
> > 
> > ``COUNT``
> > ^^^^^^^^^
> > 
> > Enables hits counter for this rule.
> > 
> > This counter can be retrieved and reset through ``rte_flow_query()``, see
> > ``struct rte_flow_query_count``.
> > 
> > - Counters can be retrieved with ``rte_flow_query()``.
> > - No configurable property.
> > 
> > +---------------+
> > | COUNT         |
> > +===============+
> > | no properties |
> > +---------------+
> > 
> > Query structure to retrieve and reset the flow rule hits counter:
> > 
> > +------------------------------------------------+
> > | COUNT query                                    |
> > +===========+=====+==============================+
> > | ``reset`` | in  | reset counter after query    |
> > +-----------+-----+------------------------------+
> > | ``hits``  | out | number of hits for this flow |
> > +-----------+-----+------------------------------+
> > 
> 
> Chelsio NICs can also count the number of bytes that hit the rule.
> So, we need a "bytes" counter.

As well as the number of packets, right? Anyway, it makes sense; I'll add
a "bytes" field.

> [1]  http://www.dpdk.org/ml/archives/dev/2016-February/032605.html

Wow. I had not paid much attention to that thread at the time. Tweaking FDIR
this much was quite an achievement. Now I feel lazy with my proposal for a
brand new API instead, thanks.

-- 
Adrien Mazarguil
6WIND

