[dpdk-dev] [RFC v2] Flow classification library

Adrien Mazarguil adrien.mazarguil at 6wind.com
Tue May 23 15:30:57 CEST 2017


On Tue, May 23, 2017 at 01:58:44PM +0100, Ferruh Yigit wrote:
> On 5/23/2017 1:26 PM, Adrien Mazarguil wrote:
> > On Mon, May 22, 2017 at 02:53:28PM +0100, Ferruh Yigit wrote:
> >> On 5/19/2017 5:30 PM, Iremonger, Bernard wrote:
> >>> Hi Ferruh,
> >>>
> >>>> -----Original Message-----
> >>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ferruh Yigit
> >>>> Sent: Thursday, May 18, 2017 7:12 PM
> >>>> To: dev at dpdk.org
> >>>> Cc: Yigit, Ferruh <ferruh.yigit at intel.com>; Mcnamara, John
> >>>> <john.mcnamara at intel.com>; Tahhan, Maryam
> >>>> <maryam.tahhan at intel.com>
> >>>> Subject: [dpdk-dev] [RFC v2] Flow classification library
> >>>>
> >>>> DPDK works with packets, but some network administration tools works
> >>>> based on flow information.
> >>>>
> >>>> This library is suggested to provide helper API to convert packet based
> >>>> information to the flow records.
> >>>>
> >>>> Basically the library consist of a single API that gets packets, flow definition
> >>>> and action as parameter and provides flow stats based on action. Application
> >>>> should call the API for all received packets.
> >>>>
> >>>> Library header file has more comments on how library works and provided
> >>>> APIs.
> >>>>
> >>>> Packets to flow conversion will cause performance drop, that is why
> >>>> conversion done on demand by an API call provided by this library.
> >>>>
> >>>> Initial implementation in mind is to provide support for IPFIX metering
> >>>> process but library planned to be as generic as possible. And flow information
> >>>> provided by this library is missing to implement full IPFIX features, but this is
> >>>> planned to be initial step.
> >>>>
> >>>> Flows are defined using rte_flow, also measurements (actions) are provided
> >>>> by rte_flow. To support more IPFIX measurements, the implementation may
> >>>> require extending rte_flow addition to implementing this library.
> >>>
> >>> Do you know what extensions are needed to the rte_flow code?
> >>
> >> The extension may be required on two fields:
> >> 1- Defining the flow
> >> 2- Available actions
> >>
> >> For defining the flow, an update may not be required, specially at first
> >> version of the library.
> >>
> >> But for action, there may be some updates.
> >>
> >> IPFIX RFC defines Metering process as [1], (in [2]). This library should
> >> provide helper APIs to metering process.
> >>
> >> Currently only action can be used in rte_flow is COUNT, more actions can
> >> be added to help "packet header capturing, timestamping, sampling,
> >> classifying" tasks of the metering process.
> >>
> >> The exact list depends on the what will be implemented in this release.
> >>
> >>
> >> [1]
> >>    Metering Process
> >>
> >>       The Metering Process generates Flow Records.  Inputs to the
> >>       process are packet headers, characteristics, and Packet Treatment
> >>       observed at one or more Observation Points.
> >>
> >>       The Metering Process consists of a set of functions that includes
> >>       packet header capturing, timestamping, sampling, classifying, and
> >>       maintaining Flow Records.
> >>
> >>       The maintenance of Flow Records may include creating new records,
> >>       updating existing ones, computing Flow statistics, deriving
> >>       further Flow properties, detecting Flow expiration, passing Flow
> >>       Records to the Exporting Process, and deleting Flow Records.
> >>
> >> [2]
> >> https://tools.ietf.org/html/rfc7011
> > 
> > Since I did not take this into account in my previous answer [3], I now
> > understand several of these requirements cannot be met by hardware (at least
> > in the near future). Therefore I think it makes sense to leave IPFIX and
> > more generally the maintenance of software data associated with flows to
> > separate libraries, instead of adding many new rte_flow actions that can
> > only be implemented in software.
> > 
> > A hybrid solution as described in [3] is still needed regardless to offload
> > flow recognition, so that only flows of interest are reprocessed in software
> > to compute IPFIX and other data.
> > 
> > You suggested at one point to take flow rules in addition to mbufs as input
> > to handle that. Well, that's actually a nice approach.
> > 
> > For this to work, rte_flow_classify would have to use opaque handles like
> > rte_flow, provided back by the application when attempting to classify
> > traffic. If the handle is not known (e.g. MARK is unsupported), a separate
> > API function could take a mbuf as input and spit the related
> > rte_flow_classify object if any.
> > 
> > To be clear:
> > 
> > 1. Create classifier object:
> > 
> >    classify = rte_flow_classify_create([some rte_flow pattern],
> >       [classify-specific actions list, associated resources]);
> > 
> > 2. Create some flow rule with a MARK action to identify it uniquely. This
> >    step might fail and flow can be NULL, that's not an issue:
> > 
> >    flow = rte_flow_create([the same pattern], MARK with id 42)
> > 
> > 3. For each received packet:
> > 
> >    /*
> >     * Attempt HW and fall back on SW for flow identification in order to
> >     * update classifier flow-related data.
> >     */
> >    if (flow) {
> >       if (mbuf->ol_flags & PKT_RX_FDIR_ID && mbuf->hash.fdir.hi == 42)
> >          tmp_classify = classify;
> >    } else {
> >       tmp_classify = rte_flow_classify_lookup([classifier candidates], mbuf);
> >    }
> >    if (tmp_classify)
> >          rte_flow_classify_update(tmp_classify, mbuf);
> > 
> > 4. At some point, retrieve computed data from the classifier object itself:
> > 
> >    rte_flow_classify_stats_get(classify, [output buffer(s)])
> > 
> > On the RX path, the MARK action can be enough to implement the above. When
> > not supported, it could also be emulated through the "sw_fallback" bit
> > described in [3] however if the above approach is fine, no need for that.
> > 
> > It's a bit more complicated to benefit from rte_flow on the TX path since no
> > MARK data can be returned. There is currently no other solution than doing
> > it all in software anyway.
> > 
> > [3] http://dpdk.org/ml/archives/dev/2017-May/066177.html
> > 
> 
> Using MARK action is good idea indeed, but I believe the software
> version needs to be implemented anyway, so the MARK action can be next
> step optimization.

Does it mean you are you fine with the separation of RFCv2's
rte_flow_classify_stats_get() into a sort of rte_flow_classify_lookup() and
rte_flow_classify_update() as described?

There is otherwise no change needed to benefit from MARK optimization
beside providing the ability to not perform the lookup step when not
necessary.

> And another thing is, having 3. and 4. as separate steps means
> rte_flow_classify to maintain the flow_records in the library for an
> unknown time.

I think it should be an application's problem. The library does not
necessarily allocate anything since the related temporary storage can be
provided as arguments to rte_flow_classify_create().

Expiration could be handled by making rte_flow_classify_lookup() return some
error code when the flow has expired. Application would then be free to call
rte_flow_classify_destroy() directly or retrieve stats one last time through
rte_flow_classify_stats_get() before that. 

> Application needs to explicitly do the call for 3., why not return the
> output buffers immediately so the application do the required
> association of data to the ports.

3. is typically done in the data path and must be as fast as possible,
right? The way I see it, in many cases if the flow is already identified
(classifier object provided) the mbuf does not even need to be parsed for
associated counters to be incremented.

Also because rte_flow_classify_stats_get() could perform extra computation
to convert internal data to an application-friendly format.

> Mainly, I would like to keep same API logic in RFC v2, but agree to add
> two new APIs:
> rte_flow_classify_create()
> rte_flow_classify_destroy()
> 
> Create will return and opaque rte_flow object (as done in rte_flow),
> meanwhile it can create some masks in rte_flow object to help parsing mbuf.

OK, I suggest naming that object "struct rte_flow_classify" to avoid
confusion though.

> And rte_flow_classify_stats_get() will get rte_flow object and stats as
> parameter. Also instead of getting an array of filters, to simplify the
> logic, it will get a rte_flow object at a time.

Perfect.

> I am planning to send an updated RFC (v3) as above, if there is no major
> objection, we can continue to discuss on that RFC.

Except for my above comments, looks like we're converging.

-- 
Adrien Mazarguil
6WIND


More information about the dev mailing list