[dpdk-dev] [RFC] Generic flow director/filtering/classification API

Chandran, Sugesh sugesh.chandran at intel.com
Mon Jul 18 15:26:09 CEST 2016


Hi Adrien,
Thank you for getting back on this.
Please find my comments below.

Regards
_Sugesh


> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil at 6wind.com]
> Sent: Friday, July 15, 2016 4:04 PM
> To: Chandran, Sugesh <sugesh.chandran at intel.com>
> Cc: dev at dpdk.org; Thomas Monjalon <thomas.monjalon at 6wind.com>;
> Zhang, Helin <helin.zhang at intel.com>; Wu, Jingjing
> <jingjing.wu at intel.com>; Rasesh Mody <rasesh.mody at qlogic.com>; Ajit
> Khaparde <ajit.khaparde at broadcom.com>; Rahul Lakkireddy
> <rahul.lakkireddy at chelsio.com>; Lu, Wenzhuo <wenzhuo.lu at intel.com>;
> Jan Medala <jan at semihalf.com>; John Daley <johndale at cisco.com>; Chen,
> Jing D <jing.d.chen at intel.com>; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; Matej Vido <matejvido at gmail.com>;
> Alejandro Lucero <alejandro.lucero at netronome.com>; Sony Chacko
> <sony.chacko at qlogic.com>; Jerin Jacob
> <jerin.jacob at caviumnetworks.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>; Olga Shern <olgas at mellanox.com>;
> Chilikin, Andrey <andrey.chilikin at intel.com>
> Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification
> API
> 
> On Fri, Jul 15, 2016 at 09:23:26AM +0000, Chandran, Sugesh wrote:
> > Thank you Adrien,
> > Please find below for some more comments/inputs
> >
> > Let me know your thoughts on this.
> 
> Thanks, again stripping the non-relevant parts.
> 
> [...]
> > > > > > [Sugesh] Is it a limitation to use only a 32-bit ID? Is it
> > > > > > possible to have a 64-bit ID, so that the application can use
> > > > > > the control plane flow pointer itself as an ID? Does it make
> > > > > > sense?
> > > > >
> > > > > I've specified a 32 bit ID for now because this is what FDIR
> > > > > supports and also what existing devices can report today AFAIK
> > > > > (i40e and
> > > mlx5).
> > > > >
> > > > > We could use 64 bit for future-proofness in a separate action like
> "ID64"
> > > > > when at least one device supports it.
> > > > >
> > > > > To PMD maintainers: please comment if you know devices that
> > > > > support tagging matching packets with more than 32 bits of
> > > > > user-provided data!
> > > > [Sugesh] I guess the flow director ID is 64 bit; the XL710 datasheet
> > > > says so. And in the 'rte_mbuf' structure the 64-bit FDIR ID is shared
> > > > with the RSS hash. This can be a software driver limitation that
> > > > exposes only 32 bits, possibly because of cache alignment issues?
> > > > Since the hardware can support 64 bits, I feel it makes sense to
> > > > support 64 bits as well.
> > >
> > > I agree we need 64 bit support, but then we also need a solution for
> > > devices that support only 32 bit. Possible methods I can think of:
> > >
> > > - A separate "ID64" action (or a "ID32" one, perhaps with a better name).
> > >
> > > - A single ID action with an unlimited number of bytes to return with
> > >   packets (would actually be a string). PMDs can then refuse to create
> flow
> > >   rules requesting an unsupported number of bytes. Devices
> > > supporting fewer
> > >   than 32 bits are also included this way without the need for yet another
> > >   action.
> > >
> > > Thoughts?
> > [Sugesh] I feel the single ID approach is much better. But I would say
> > a fixed-size ID is easier to handle at upper layers. Say the PMD returns
> > a 64-bit ID in which the MSBs are masked out, based on how many bits the
> > hardware can support. The PMD can refuse an unsupported number of bytes
> > when requested. So the size of the ID is going to be a parameter when
> > programming the flow.
> > What do you think?
> 
> What you suggest if I am not mistaken is:
> 
>  struct rte_flow_action_id {
>      uint64_t id;
>      uint64_t mask; /* either a bit-mask or a prefix/suffix length? */
>  };
> 
> I think in this case a mask is more versatile than a prefix/suffix length as the
> value itself comes in an unknown endian (from PMD's POV). It also allows
> specific bits to be taken into account, like when HW only supports 32 bit, with
> some black magic the full original 64 bit value can be restored as long as the
> application only cares about at most 32 bits anywhere.
> 
> However I do not think many applications "won't care" about specific bits in a
> given value and having to provide a properly crafted mask will be a hassle,
> they will just fill it with ones and hope for the best. As a result they won't
> take advantage of this feature or will stick to 32 bits all the time, or whatever
> happens to be the least common denominator.
> 
> My previous suggestion was:
> 
>  struct rte_flow_action_id {
>      uint8_t size; /* number of bytes in id[] */
>      uint8_t id[];
>  };
> 
> It does not solve the issue if an application requests more bytes than
> supported, however as a string, there is no endianness ambiguity and these
> bytes are copied as-is to the related mbuf field as if done through memcpy()
> possibly with some padding to fill the entire 64 bit field (copied bytes thus
> starting from MSB for big-endian machines, LSB for little-endian ones). The
> value itself remains opaque to the PMD.
> 
> One issue is the flexible array approach makes static initialization more
> complicated. Maybe it is not worth the trouble since according to Andrey,
> even X710 reports at most 32 bits of user data.
> 
> So what should we do? Fixed 32 bits ID for now to keep things simple, then
> another action for 64 bits later when necessary?
[Sugesh] I agree with you. We could keep things simple by having a 32-bit ID for now.
I mixed up the size of the ID with the flexible payload size. Sorry about that.
In the future, we could add an action for 64 bits if necessary.
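
For illustration only, here is a minimal sketch of how an application might read
such a 32-bit ID back on the RX side, assuming the PMD reports it through the
existing FDIR fields of rte_mbuf (as i40e/mlx5 can do today); the helper name
and the "no ID" sentinel are made up:

  #include <stdint.h>
  #include <rte_mbuf.h>

  /* Hypothetical helper: return the 32-bit user ID attached by a HW rule,
   * or UINT32_MAX when the packet does not carry one. */
  static inline uint32_t
  flow_rule_id(const struct rte_mbuf *m)
  {
          if (m->ol_flags & PKT_RX_FDIR_ID)
                  return m->hash.fdir.hi;
          return UINT32_MAX;
  }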

> 
> > > [...]
> > > > > > [Sugesh] Another concern is the cost and time of installing
> > > > > > these rules in the hardware. Can we make these APIs time
> > > > > > bound (or at least provide an option to set a time limit for
> > > > > > executing these APIs), so that the application doesn't have to
> > > > > > wait so long when installing and deleting flows with slow
> > > > > > hardware/NICs? What do you think? Most of the datapath flow
> > > > > > installations are dynamic and triggered only when there is
> > > > > > ingress traffic. Delays in flow insertion/deletion have
> > > > > > unpredictable consequences.
> > > > >
> > > > > This API is (currently) aimed at the control path only, and must
> > > > > indeed be assumed to be slow. Creating millions of rules may take
> > > > > quite long as it may involve syscalls and other time-consuming
> > > > > synchronization things on the PMD side.
> > > > >
> > > > > So currently there is no plan to have rules added from the data
> > > > > path with time constraints. I think it would be implemented
> > > > > through a different set of functions anyway.
> > > > >
> > > > > I do not think adding time limits is practical, even specifying
> > > > > in the API that creating a single flow rule must take less than
> > > > > a maximum number of seconds in order to be effective is too much
> > > > > of a constraint (applications that create all flows during init
> > > > > may not care after
> > > all).
> > > > >
> > > > > You should consider in any case that modifying flow rules will
> > > > > always be slower than receiving packets, there is no way around
> > > > > that. Applications have to live with it and provide a software
> > > > > fallback for incoming packets while managing flow rules.
> > > > >
> > > > > Moreover, think about what happens when you hit the maximum
> > > number
> > > > > of flow rules and cannot create any more. Applications need to
> > > > > implement some kind of fallback in their data path.
> > > > >
> > > > > Offloading flows in HW is also only useful if they live much
> > > > > longer than the time taken to create and delete them. Perhaps
> > > > > applications may choose to do so after detecting long lived
> > > > > flows such as TCP sessions.
> > > > >
> > > > > You may have one separate control thread dedicated to manage
> > > > > flows and keep your normal control thread unaffected by delays.
> > > > > Several threads can even be dedicated, one per device.
> > > > [Sugesh] I agree that the flow insertion cannot be as fast as the
> > > > packet receiving rate. From the application's point of view the
> > > > problem arises when hardware flow insertion takes longer than
> > > > software flow insertion. At least the application has to know the
> > > > cost of inserting/deleting a rule in hardware beforehand; otherwise
> > > > how can the application choose the right flow candidate for
> > > > hardware? My point here is that the application expects
> > > > deterministic behavior from a classifier while inserting and
> > > > deleting rules.
> > >
> > > Understood, however it will be difficult to estimate, particularly
> > > if a PMD must rearrange flow rules to make room for a new one due to
> > > priority levels collision or some other HW-related reason. I mean,
> > > spent time cannot be assumed to be constant, even PMDs cannot know
> > > in advance because it also depends on the performance of the host CPU.
> > >
> > > Such applications may find it easier to measure elapsed time for the
> > > rules they create, make statistics and extrapolate from this
> > > information for future rules. I do not think the PMD can help much here.
> > [Sugesh] From an application point of view this can be an issue.
> > There is even a security concern when we program a short-lived flow.
> > Let's consider this case:
> >
> > 1) The control plane programs the hardware with a queue-termination flow.
> > 2) The software dataplane is programmed to treat packets from that
> >    specific queue accordingly.
> > 3) The flow is removed from the hardware (let's consider this a long
> >    wait process), or the hardware may even take more time to report the
> >    status than to remove it physically. Now the packets in the queue are
> >    no longer considered matched/flow hit, because the software dataplane
> >    update has yet to happen.
> >
> > We need a way to sync between the software datapath and the classifier
> > APIs even though they are programmed from different control threads.
> >
> > Are we saying these APIs are only meant for user-defined static flows?
> 
> No, that is definitely not the intent. These are good points.
> 
> With the specified API, applications may have to adapt their logic and take
> extra precautions in order to remain on the safe side at all times.
> 
> For your above example, the application cannot assume a rule is
> added/deleted as long as the PMD has not completed the related operation,
> which means keeping the SW rule/fallback in place in the meantime. This
> should handle security concerns as long as, after removing a rule, packets
> end up in a default queue entirely processed by SW. Obviously this may
> worsen response time.
> 
> The ID action can help with this. By knowing which rule a received packet is
> associated with, processing can be temporarily offloaded by another thread
> without much complexity.
[Sugesh] Setting an ID for every flow may not be viable, especially when the ID
is small (say only 8 bits). I am not sure whether this is a valid case though.

How about a hardware flow flag in the packet descriptor that is set when a
packet hits any hardware rule? This way the software is not worried about or
blocked by a hardware rule. Even though there is an additional overhead in
validating this flag, the software datapath can identify hardware-processed
packets easily. Packets keep traversing the software fallback path until the
rule configuration is complete. This flag also avoids setting an ID action for
every hardware flow being configured.
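
Just to illustrate the idea (not a proposal for actual names), a rough datapath
sketch reusing the existing PKT_RX_FDIR mbuf flag as a stand-in for such a
"HW rule hit" indication; handle_hw_matched() and sw_fallback() are hypothetical
application helpers:

  #include <stdint.h>
  #include <rte_mbuf.h>

  /* Hypothetical application helpers. */
  static void handle_hw_matched(struct rte_mbuf *m);
  static void sw_fallback(struct rte_mbuf *m);

  /* Sketch only: dispatch RX packets based on whether a HW rule matched. */
  static void
  dispatch_rx(struct rte_mbuf **pkts, uint16_t n)
  {
          uint16_t i;

          for (i = 0; i < n; i++) {
                  if (pkts[i]->ol_flags & PKT_RX_FDIR)
                          handle_hw_matched(pkts[i]); /* HW already classified it */
                  else
                          sw_fallback(pkts[i]); /* rule not (yet) active in HW */
          }
  }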

> 
> I think applications have to implement SW fallbacks all the time, as even
> some sort of guarantee on the flow rule processing time may not be enough
> to avoid misdirected packets and related security issues.
[Sugesh] The software fallback will always be there. However, I am a little bit
confused about how the software is going to identify packets that have already
been hardware processed. I feel we need some notification in the packet itself
when a hardware rule hits. ID/flag/any other options?
> 
> Let's wait for applications to start using this API and then consider an extra
> set of asynchronous / real-time functions when the need arises. It should not
> impact the way rules are specified.
[Sugesh] Sure. I think the rule definition will not be impacted by this.
> 
> > > > > > [Sugesh] Another query is on the synchronization part. What if
> > > > > > the same rules are handled from different threads? Is the
> > > > > > application responsible for handling the concurrent hardware
> > > > > > programming?
> > > > >
> > > > > Like most (if not all) DPDK APIs, applications are responsible
> > > > > for managing locking issues as described in 4.3 (Behavior). Since
> > > > > this is a control path API and applications usually have a
> > > > > single control thread, locking should not be necessary in most cases.
> > > > >
> > > > > Regarding my above comment about using several control threads
> > > > > to manage different devices, section 4.3 says:
> > > > >
> > > > >  "There is no provision for reentrancy/multi-thread safety,
> > > > > although nothing  should prevent different devices from being
> > > > > configured at the same  time. PMDs may protect their control
> > > > > path functions
> > > accordingly."
> > > > >
> > > > > I'd like to emphasize it is not "per port" but "per device",
> > > > > since in a few cases a configurable resource is shared by several ports.
> > > > > It may be difficult for applications to determine which ports
> > > > > are shared by a given device but this falls outside the scope of this
> API.
> > > > >
> > > > > Do you think adding the guarantee that it is always safe to
> > > > > configure two different ports simultaneously without locking
> > > > > from the application side is necessary? In which case the PMD
> > > > > would be responsible for locking shared resources.
> > > > [Sugesh] This would be a little bit complicated when some of the
> > > > ports are not under DPDK itself (what if one port is managed by the
> > > > kernel?), or when the ports are tied to different applications.
> > > > Locking in the PMD helps when the ports are accessed by multiple
> > > > DPDK applications. However, what if the port itself is not under
> > > > DPDK?
> > >
> > > Well, either we do not care about what happens outside of the DPDK
> > > context, or PMDs must find a way to satisfy everyone. I'm not a fan
> > > of locking either but it would be nice if flow rules configuration
> > > could be attempted on different ports simultaneously without the
> > > risk of wrecking anything, so that applications do not need to care.
> > >
> > > Possible cases for a dual port device with global flow rule settings
> > > affecting both ports:
> > >
> > > 1) ports 1 & 2 are managed by DPDK: this is the easy case, a rule that
> needs
> > >    to alter a global setting necessary for an existing rule on any port is
> > >    not allowed (EEXIST). PMD must maintain a device context common to
> both
> > >    ports in order for this to work. This context is either under lock, or
> > >    the first port on which a flow rule is created owns all future flow
> > >    rules.
> > >
> > > 2) port 1 is managed by DPDK, port 2 by something else, the PMD is
> aware of
> > >    it and knows that port 2 may modify the global context: no flow rules
> can
> > >    be created from the DPDK application due to safety issues (EBUSY?).
> > >
> > > 3) port 1 is managed by DPDK, port 2 by something else, the PMD is
> aware of
> > >    it and knows that port 2 will not modify flow rules: PMD should not care,
> > >    no lock necessary.
> > >
> > > 4) port 1 is managed by DPDK, port 2 by something else and the PMD is
> not
> > >    aware of it: either flow rules cannot be created ever at all, or we say
> > >    it is the user's responsibility to make sure this does not happen.
> > >
> > > Considering that most control operations performed by DPDK affect
> > > the device regardless of other applications, I think 1) is the only
> > > case that should be defined, otherwise 4), defined as user's
> responsibility.
> 
> No more comments on this part? What do you suggest?
[Sugesh] I agree with your suggestions. I feel this is the best we can offer.
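
Since section 4.3 gives no reentrancy guarantee anyway, here is a trivial,
purely illustrative sketch of an application serializing flow configuration
calls issued from several of its own control threads; a real application would
likely keep one lock per device rather than a single global one, and the
callback-based wrapper is just a made-up example:

  #include <stdint.h>
  #include <pthread.h>

  /* Illustrative only: serialize flow rule configuration performed from
   * several control threads of the same application. */
  static pthread_mutex_t flow_cfg_lock = PTHREAD_MUTEX_INITIALIZER;

  static int
  configure_rule_locked(uint8_t port_id, int (*cfg)(uint8_t port_id))
  {
          int ret;

          pthread_mutex_lock(&flow_cfg_lock);
          ret = cfg(port_id);
          pthread_mutex_unlock(&flow_cfg_lock);
          return ret;
  }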

> 
> --
> Adrien Mazarguil
> 6WIND

