[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Lu, Wenzhuo wenzhuo.lu at intel.com
Wed Jul 20 04:16:51 CEST 2016


Hi Adrien,


> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Tuesday, July 19, 2016 9:12 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Thomas Monjalon; Zhang, Helin; Wu, Jingjing; Rasesh Mody;
> Ajit Khaparde; Rahul Lakkireddy; Jan Medala; John Daley; Chen, Jing D; Ananyev,
> Konstantin; Matej Vido; Alejandro Lucero; Sony Chacko; Jerin Jacob; De Lara
> Guarch, Pablo; Olga Shern
> Subject: Re: [RFC] Generic flow director/filtering/classification API
> 
> On Tue, Jul 19, 2016 at 08:11:48AM +0000, Lu, Wenzhuo wrote:
> > Hi Adrien,
> > Thanks for your clarification.  Most of my questions are clear, but still
> something may need to be discussed, comment below.
> 
> Hi Wenzhuo,
> 
> Please see below.
> 
> [...]
> > > > > Requirements for a new API:
> > > > >
> > > > > - Flexible and extensible without causing API/ABI problems for existing
> > > > >   applications.
> > > > > - Should be unambiguous and easy to use.
> > > > > - Support existing filtering features and actions listed in `Filter types`_.
> > > > > - Support packet alteration.
> > > > > - In case of overlapping filters, their priority should be well documented.
> > > > Does that mean we don't guarantee the consistency of priorities? The
> > > > priority can be different on different NICs, so the behavior of the
> > > > actions can be different. Right?
> > >
> > > No, the intent is precisely to define what happens in order to get a
> > > consistent result across different devices, and document cases with
> undefined behavior.
> > > There must be no room left for interpretation.
> > >
> > > For example, the API must describe what happens when two overlapping
> > > filters (e.g. one matching an Ethernet header, another one matching
> > > an IP header) match a given packet at a given priority level.
> > >
> > > It is documented in section 4.1.1 (priorities) as "undefined behavior".
> > > Applications remain free to do it and deal with consequences, at
> > > least they know they cannot expect a consistent outcome, unless they
> > > use different priority levels for both rules, see also 4.4.5 (flow rules priority).
> > >
> > > > It seems users still need to be aware of some details of the HW? Do
> > > > we need to add negotiation for the priority?
> > >
> > > Priorities as defined in this document may not be directly mappable
> > > to HW capabilities (e.g. HW does not support enough priorities, or
> > > that some corner case make them not work as described), in which
> > > case the PMD may choose to simulate priorities (again 4.4.5), as
> > > long as the end result follows the specification.
> > >
> > > So users do not have to be aware of HW details; the PMD does, and must
> > > perform the needed workarounds to suit their expectations. Users may
> > > only be impacted by errors while attempting to create rules that are
> > > either unsupported or would cause them (or existing rules) to diverge
> > > from the spec.
> > The problem is that sometimes the priority of the filters is fixed by the
> > HW's implementation. For example, on ixgbe, n-tuple has a higher
> > priority than flow director.
> 
> As a side note I did not know that N-tuple had a higher priority than flow
> director on ixgbe, priorities among filter types do not seem to be documented at
> all in DPDK. This is one of the reasons I think we need a generic API to handle
> flow configuration.
Totally agree with you. We haven't documented this info well enough. And even if we do, users would have to study the details of every NIC, which still makes the filters very hard to use. I believe a generic API is very helpful here :)
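
For example, from the APP's point of view it could look like this (a rough
sketch only; the attribute, item and action names below are my assumptions
of what the generic API could end up looking like, not final definitions):

  /* Two overlapping rules made unambiguous through explicit priority
   * levels (lower value = higher priority, per section 4.4.5) instead
   * of relying on per-NIC filter type ordering. All names assumed. */
  static int
  add_example_rules(uint8_t port_id)
  {
      struct rte_flow_attr attr = { .ingress = 1 };
      struct rte_flow_item eth_pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_END },
      };
      struct rte_flow_item ip_pattern[] = {
          { .type = RTE_FLOW_ITEM_TYPE_ETH },
          { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
          { .type = RTE_FLOW_ITEM_TYPE_END },
      };
      struct rte_flow_action_queue queue = { .index = 1 };
      struct rte_flow_action actions[] = {
          { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
          { .type = RTE_FLOW_ACTION_TYPE_END },
      };

      attr.priority = 0; /* IP rule is matched first... */
      if (!rte_flow_create(port_id, &attr, ip_pattern, actions, NULL))
          return -1;
      attr.priority = 1; /* ...ETH rule catches the rest. */
      if (!rte_flow_create(port_id, &attr, eth_pattern, actions, NULL))
          return -1;
      return 0;
  }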

> 
> 
> So, today an application cannot combine N-tuple and FDIR flow rules and get a
> reliable outcome, unless it is designed for specific devices with a known
> behavior.
> 
> > What's the right behavior of the PMD if the APP wants to create a flow
> > director rule with a higher or even equal priority than an existing
> > n-tuple rule? Should the PMD return failure?
> 
> First remember applications only deal with the generic API, PMDs are
> responsible for choosing the most appropriate HW implementation to use
> according to the requested flow rules (FDIR, N-tuple or anything else).
> 
> For the specific case of FDIR vs N-tuple, if the underlying HW supports both I do
> not see why the PMD would create a N-tuple rule. Doesn't FDIR support
> everything N-tuple can do and much more?
Talking about the filters, FDIR can cover n-tuple. I think that's why i40e only supports FDIR and not n-tuple. But n-tuple has its own highlight: as we know, at least on Intel NICs, FDIR only supports a per-device mask, while n-tuple supports a per-rule mask.
As every pattern item has both a spec and a mask, we cannot guarantee the masks are the same. I think ixgbe will try to use n-tuple first if it can, because even if the masks differ, it can support them all.
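
For example (sketch only; I'm assuming an IPv4 item carrying both a spec and
a mask, the exact structure names are illustrative):

  /* Two rules whose masks differ: the first matches a /24 destination,
   * the second a /16. A per-device mask (FDIR on our NICs) cannot hold
   * both at once, per-rule masks (n-tuple) can. */
  struct rte_flow_item_ipv4 spec_24 = {
      .hdr.dst_addr = rte_cpu_to_be_32(0x0a010100), /* 10.1.1.0 */
  };
  struct rte_flow_item_ipv4 mask_24 = {
      .hdr.dst_addr = rte_cpu_to_be_32(0xffffff00), /* /24 */
  };
  struct rte_flow_item_ipv4 spec_16 = {
      .hdr.dst_addr = rte_cpu_to_be_32(0x0a020000), /* 10.2.0.0 */
  };
  struct rte_flow_item_ipv4 mask_16 = {
      .hdr.dst_addr = rte_cpu_to_be_32(0xffff0000), /* /16 */
  };
  struct rte_flow_item pattern_24[] = {
      { .type = RTE_FLOW_ITEM_TYPE_IPV4,
        .spec = &spec_24, .mask = &mask_24 },
      { .type = RTE_FLOW_ITEM_TYPE_END },
  };
  struct rte_flow_item pattern_16[] = {
      { .type = RTE_FLOW_ITEM_TYPE_IPV4,
        .spec = &spec_16, .mask = &mask_16 },
      { .type = RTE_FLOW_ITEM_TYPE_END },
  };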

> 
> Assuming such a thing happened anyway, that the PMD had to create a rule
> using a high priority filter type and that the application requests the creation of a
> rule that can only be done using a lower priority filter type, but also requested a
> higher priority for that rule, then yes, it should obviously fail.
> 
> That is, unless the PMD can perform some kind of workaround to have both.
> 
> > If so, do we need more failure reasons? According to this RFC, I think we
> > need to return "EEXIST: collision with an existing rule", but that's not
> > very clear; the APP doesn't know the problem is the priority, so a more
> > detailed reason would be helpful.
> 
> Possibly, I've defined a basic set of errors, there are quite a number of errno
> values to choose from. However I think we should not define too many values.
> In my opinion the basic set covers every possible failure:
> 
> - EINVAL: invalid format, rule is broken or cannot be understood by the PMD
>   anyhow.
> 
> - ENOTSUP: pattern/actions look fine but something in the requested rule is
>   not supported and thus cannot be applied.
> 
> - EEXIST: pattern/actions are fine and could have been applied if only some
>   other rule did not prevent the PMD from doing it (I see it as the closest
>   thing to "ETOOBAD" which unfortunately does not exist).
> 
> - ENOMEM: like EEXIST, except it is due to the lack of resources not because
>   of another rule. I wasn't sure which of ENOMEM or ENOSPC was better but
>   settled on ENOMEM as it is well known. Still open to debate.
> 
> Errno values are only useful to get a rough idea of the reason, and another
> mechanism is needed to pinpoint the exact problem for debugging/reporting
> purposes, something like:
> 
>  enum rte_flow_error_type {
>      RTE_FLOW_ERROR_TYPE_NONE,
>      RTE_FLOW_ERROR_TYPE_UNKNOWN,
>      RTE_FLOW_ERROR_TYPE_PRIORITY,
>      RTE_FLOW_ERROR_TYPE_PATTERN,
>      RTE_FLOW_ERROR_TYPE_ACTION,
>  };
> 
>  struct rte_flow_error {
>      enum rte_flow_error_type type;
>      void *offset; /* Points to the exact pattern item or action. */
>      const char *message;
>  };
When we use a CLI and it fails, normally it lets us know which parameter is not appropriate. So I think it's a good idea to have this error structure :)

> 
> Then either provide an optional struct rte_flow_error pointer to
> rte_flow_validate(), or a separate function (rte_flow_analyze()?), since
> processing this may be quite expensive and applications may not care about the
> exact reason.
Agree, the processing may be too expensive. Maybe we can say it's optional to return error details. And it's a good question what the APP should do if creating the rule fails. I believe normally it will choose to handle the rule by itself, but I think it's not bad to feed back more. Even if the APP wants to adjust the rules, that's not an option if it lacks the info.
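
Something like this on the APP side perhaps? (sketch; I'm assuming
rte_flow_validate() takes the optional error pointer and returns a negative
errno value, which may not match the final signature):

  struct rte_flow_error error = { .type = RTE_FLOW_ERROR_TYPE_NONE };
  int ret = rte_flow_validate(port_id, pattern, actions, &error);

  if (ret < 0) {
      switch (error.type) {
      case RTE_FLOW_ERROR_TYPE_PRIORITY:
          printf("bad priority: %s\n", error.message);
          break;
      case RTE_FLOW_ERROR_TYPE_PATTERN:
      case RTE_FLOW_ERROR_TYPE_ACTION:
          /* error.offset points at the offending item/action. */
          printf("bad element %p: %s\n", error.offset, error.message);
          break;
      default:
          /* APP does not care about details, e.g. falls back to
           * handling these packets in SW. */
          break;
      }
  }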

> 
> What do you suggest?
> 
> > > > > Behavior
> > > > > --------
> > > > >
> > > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be
> > > > >   returned).
> > > > >
> > > > > - There is no provision for reentrancy/multi-thread safety, although
> nothing
> > > > >   should prevent different devices from being configured at the same
> > > > >   time. PMDs may protect their control path functions accordingly.
> > > > >
> > > > > - Stopping the data path (TX/RX) should not be necessary when
> > > > > managing
> > > flow
> > > > >   rules. If this cannot be achieved naturally or with workarounds (such as
> > > > >   temporarily replacing the burst function pointers), an appropriate error
> > > > >   code must be returned (``EBUSY``).
> > > > The PMD cannot stop the data path without adding a lock. So I think if
> > > > some rules cannot be applied without stopping RX/TX, the PMD has to
> > > > return failure, or let the APP stop the data path.
> > >
> > > Agreed, that is the intent. If the PMD cannot touch flow rules for
> > > some reason even after trying really hard, then it just returns EBUSY.
> > >
> > > Perhaps we should write down that applications may get a different
> > > outcome after stopping the data path if they get EBUSY?
> > Agree, it's better to describe what the APP can expect. BTW, I checked the
> > behavior of ixgbe/igb and I think we can add/delete filters at runtime.
> > Hopefully we won't hit too many EBUSY problems on other NICs :)
> 
> OK, I will add it.
> 
> > > > > - PMDs, not applications, are responsible for maintaining flow rules
> > > > >   configuration when stopping and restarting a port or performing other
> > > > >   actions which may affect them. They can only be destroyed explicitly.
> > > > I don't understand "They can only be destroyed explicitly."
> > >
> > > This part says that as long as an application has not called
> > > rte_flow_destroy() on a flow rule, it never disappears, whatever
> > > happens to the port (stopped, restarted). The application is not
> > > responsible for re-creating rules after that.
> > >
> > > Note that according to the specification, this may translate to not
> > > being able to stop a port as long as a flow rule is present,
> > > depending on how nice the PMD intends to be with applications.
> > > Implementation can be done in small steps with minimal amount of code on
> the PMD side.
> > Does it mean the PMD should store and maintain all the rules? Why not let
> > RTE do that? If every PMD maintains all the rules, every kind of NIC needs
> > its own copy of the code for the rules. But if RTE does it, only one copy
> > of the code needs to be maintained, right?
> 
> I've considered having rules stored in a common format understood at the RTE
> level and not specific to each PMD and decided that the opaque rte_flow pointer
> was a better choice for the following reasons:
> 
> - Even though flow rules management is done in the control path, processing
>   must be as fast as possible. Letting PMDs store flow rules using their own
>   internal representation gives them the chance to achieve better
>   performance.
I don't quite understand. I think we're talking about maintaining the rules in SW; I don't think anything needs to be optimized for a specific NIC there. If we need to optimize the code, I think we need to consider the CPU, the OS... and other common means. Am I wrong?

> 
> - An opaque context managed by PMDs would probably have to be stored
>   somewhere as well anyway.
> 
> - PMDs may not need to allocate/store anything at all if they exclusively
>   rely on HW state for everything. In my opinion, the generic API has enough
>   constraints for this to work and maintain consistency between flow
>   rules. Note this is currently how most PMDs implement FDIR and other
>   filter types.
Yes, the rules are stored in HW. But when stopping/starting the device, the rules in HW are lost, so we have to store the rules in SW and re-program them when restarting the device.
And in the existing code we already store the filters in SW, at least on Intel NICs. But I think we cannot reuse that, because considering the priority and which category of filter should be chosen, I think we need a whole new table for the generic API. I think that's what's designed now, right?
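
I.e. something like this inside a PMD? (purely illustrative; the ixgbe_*
names below are hypothetical, only the sys/queue.h macros are real):

  /* The PMD keeps a SW copy of each rule in its own internal
   * representation and replays it on dev_start(). */
  struct ixgbe_flow {
      TAILQ_ENTRY(ixgbe_flow) next;
      struct ixgbe_hw_rule hw_rule; /* PMD/HW-friendly form (assumed). */
  };
  TAILQ_HEAD(ixgbe_flow_list, ixgbe_flow);

  static int
  ixgbe_flow_replay(struct ixgbe_flow_list *list)
  {
      struct ixgbe_flow *flow;
      int ret;

      TAILQ_FOREACH(flow, list, next) {
          ret = ixgbe_flow_program(&flow->hw_rule); /* hypothetical */
          if (ret)
              return ret;
      }
      return 0;
  }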

> 
> - RTE can (and will) provide helpers to avoid most of the code redundancy,
>   PMDs are free to use them or manage everything by themselves.
> 
> - Given that the opaque rte_flow pointer associated with a flow rule is to
>   be stored by the application, PMDs do not even have to keep references to
>   them.
I don't understand. Could you give more details?

> 
> - The flow rules format described in this specification (pattern / actions)
>   will be used by applications directly, and will be free to arrange them in
>   lists, trees or in any other way if they need to keep flow specifications
>   around for further processing.
Who will create the lists, trees or whatever else? According to the previous discussion, I think the APP will program the rules one by one, so if the APP organizes the rules into lists, trees..., the PMD doesn't know about it.
And you said "Given that the opaque rte_flow pointer associated with a flow rule is to be stored by the application". I'm lost here.
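
Is the idea something like this on the APP side? (my own sketch, using
sys/queue.h; the rte_flow_destroy() signature is assumed):

  /* The APP keeps whatever bookkeeping it wants (list, tree, ...) and
   * stores the opaque handle returned at creation time; the PMD does
   * not need to remember the rte_flow pointers it handed out. */
  struct app_rule {
      LIST_ENTRY(app_rule) next;
      struct rte_flow *handle; /* opaque, owned by the PMD */
      /* ...the APP's own copy of pattern/actions if it needs it... */
  };
  LIST_HEAD(, app_rule) rules = LIST_HEAD_INITIALIZER(rules);

  /* Teardown: the APP walks its own list and destroys explicitly. */
  while (!LIST_EMPTY(&rules)) {
      struct app_rule *rule = LIST_FIRST(&rules);

      LIST_REMOVE(rule, next);
      rte_flow_destroy(port_id, rule->handle);
      free(rule);
  }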

> 
> > When the port is stopped and restarted, RTE can reconfigure the rules. Is
> > the concern that the PMD may adjust the sequence of the rules according to
> > the priority, so every NIC has a different list of rules? But the PMD can
> > adjust them again when RTE reconfigures the rules.
> 
> What about PMDs able to stop and restart ports without destroying their own
> flow rules? If we assume flow rules must be destroyed when stopping a port,
> these PMDs are needlessly penalized with slower stop/start cycles. Think about
> it assuming thousands of flow rules.
I believe the rules maintained in SW should not be destroyed, because they're needed to re-program the HW when the device starts again.

> 
> Thus from an application point of view, whatever happens when stopping and
> restarting a port should not matter. If a flow rule was present before, it must
> still be present afterwards. If the PMD had to destroy flow rules and re-create
> them, it does not actually matter if they differ slightly at the HW level, as long as:
> 
> - Existing opaque flow rule pointers (rte_flow) are still valid to the PMD
>   and refer to the same rules.
> 
> - The overall behavior of all rules is the same.
> 
> The list of rules you think of (patterns / actions) is maintained by applications
> (not RTE), and only if they need them. RTE would needlessly duplicate this.
As I said before, I need more details to understand this. Maybe an example would be better :)

> 
> --
> Adrien Mazarguil
> 6WIND

