[dpdk-dev,RFC,17.08] flow_classify: add librte_flow_classify library

Message ID 20170420185448.19162-2-ferruh.yigit@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context               Check     Description
ci/checkpatch         warning   coding style issues
ci/Intel-compilation  success   Compilation OK

Commit Message

Ferruh Yigit April 20, 2017, 6:54 p.m. UTC
  Signed-off-by: Ferruh Yigit <ferruh.yigit@intel.com>
---
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_05.rst             |   1 +
 lib/Makefile                                       |   2 +
 lib/librte_flow_classify/Makefile                  |  50 +++++
 lib/librte_flow_classify/rte_flow_classify.c       |  34 ++++
 lib/librte_flow_classify/rte_flow_classify.h       | 202 +++++++++++++++++++++
 .../rte_flow_classify_version.map                  |  10 +
 9 files changed, 306 insertions(+)
 create mode 100644 lib/librte_flow_classify/Makefile
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.c
 create mode 100644 lib/librte_flow_classify/rte_flow_classify.h
 create mode 100644 lib/librte_flow_classify/rte_flow_classify_version.map
  

Comments

John McNamara May 4, 2017, 11:35 a.m. UTC | #1
> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Thursday, April 20, 2017 7:55 PM
> To: dev@dpdk.org
> Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>
> Subject: [RFC 17.08] flow_classify: add librte_flow_classify library


CCing techboard@dpdk.org since we would like this RFC added to the agenda
for discussion at the next Tech Board meeting.

John
  
Thomas Monjalon May 16, 2017, 10:19 p.m. UTC | #2
04/05/2017 13:35, Mcnamara, John:
> 
> > -----Original Message-----
> > From: Yigit, Ferruh
> > Sent: Thursday, April 20, 2017 7:55 PM
> > To: dev@dpdk.org
> > Cc: Yigit, Ferruh <ferruh.yigit@intel.com>; Mcnamara, John
> > <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>
> > Subject: [RFC 17.08] flow_classify: add librte_flow_classify library
> 
> 
> CCing techboard@dpdk.org since we would like this RFC added to the agenda
> for discussion at the next Tech Board meeting.

Please Jan, could you add it to the agenda?
  
Ananyev, Konstantin May 17, 2017, 2:54 p.m. UTC | #3
Hi Ferruh,
Please see my comments/questions below.
Thanks
Konstantin

> +
> +/**
> + * @file
> + *
> + * RTE Flow Classify Library
> + *
> + * This library provides flow record information with some measured properties.
> + *
> + * Application can select variety of flow types based on various flow keys.
> + *
> + * Library only maintains flow records between rte_flow_classify_stats_get()
> + * calls and with a maximum limit.
> + *
> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
> + * structure.
> + *
> + * Library is responsible from allocating and freeing memory for flow record
> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
> + * all tables are freed with rte_flow_classify_type_reset() or
> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
> + * creating records.
> + *
> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
> + * callbacks and start filling flow record table.
> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
> + * library continues collecting records.
> + *
> + *  Usage:
> + *  - application calls rte_flow_classify_type_set() for a device
> + *  - library creates Rx/Tx callbacks for packets and start filling flow table

Is it necessary to use an RX callback here?
Could the library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
Then the user would have a choice to either set up a callback or call collect() directly.
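
For illustration, a direct-call entry point along these lines might look roughly
as below (rte_flow_classify_collect() is only a hypothetical name for this
suggestion, not an existing API):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Hypothetical prototype: the application hands over packets it already
 * received instead of the library hooking an RX callback. */
int rte_flow_classify_collect(uint8_t port_id, struct rte_mbuf *pkts[],
			      uint16_t nb_pkts);

/* Possible use inside the application RX loop (sketch only). */
static void
rx_iteration(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, 32);

	if (nb_rx > 0)
		rte_flow_classify_collect(port_id, pkts, nb_rx);
	/* ... regular packet processing and freeing continues here ... */
}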

> + *    for that type of flow (currently only one flow type supported)
> + *  - application calls rte_flow_classify_stats_get() to get pointer to linked
> + *    listed flow table. Library assigns this pointer to another value and keeps
> + *    collecting flow data. In next rte_flow_classify_stats_get(), library first
> + *    free the previous table, and pass current table to the application, keep
> + *    collecting data.

Ok, but that means that you can't use stats_get() for the same type
from 2 different threads without explicit synchronization?

> + *  - application calls rte_flow_classify_type_reset(), library unregisters the
> + *    callbacks and free all flow table data.
> + *
> + */
> +
> +enum rte_flow_classify_type {
> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
> +};
> +
> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
> +
> +/**
> + * Global configuration struct
> + */
> +struct rte_flow_classify_config {
> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
> +	void *flow_table_prev;
> +	uint32_t flow_table_prev_item_count;
> +	void *flow_table_current;
> +	uint32_t flow_table_current_item_count;
> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
> +
> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
> +
> +/**
> + * Classification stats data struct
> + */
> +struct rte_flow_classify_stat_generic {
> +	struct rte_flow_classify_stat_generic *next;
> +	uint32_t id;
> +	uint64_t timestamp;
> +
> +	struct ether_addr src_mac;
> +	struct ether_addr dst_mac;
> +	uint32_t src_ipv4;
> +	uint32_t dst_ipv4;
> +	uint8_t l3_protocol_id;
> +	uint16_t src_port;
> +	uint16_t dst_port;
> +
> +	uint64_t packet_count;
> +	uint64_t packet_size; /* bytes */
> +};

Ok, so if I understood things right, for the generic type it will always classify all incoming packets by:
<src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
all by absolute values, and represent the results as a linked list.
Is that correct, or did I misunderstand your intentions here?
If so, then I see several disadvantages here:
1) It is really hard to predict what kind of stats is required for a particular case.
Let's say some people would like to collect stats by <dst_mac, vlan>,
others by <dst_ipv4, subnet_mask>, a third group by <l4 dst_port> and so on.
Having just one hardcoded filter doesn't seem very flexible/usable.
I think you need to find a way to allow the user to define what type of filter they want to apply.
I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
2) Even one 10G port can produce ~14M rte_flow_classify_stat_generic entries in one second
(if all packets have different ipv4 addresses/ports or so).
Accessing/retrieving items over a linked list with 14M entries doesn't sound like a good idea.
I'd say we need a better way to retrieve/present the collected data.
  
Ferruh Yigit May 17, 2017, 3:37 p.m. UTC | #4
On 5/17/2017 3:54 PM, Ananyev, Konstantin wrote:
> Hi Ferruh,
> Please see my comments/questions below.
> Thanks
> Konstantin
> 
>> +
>> +/**
>> + * @file
>> + *
>> + * RTE Flow Classify Library
>> + *
>> + * This library provides flow record information with some measured properties.
>> + *
>> + * Application can select variety of flow types based on various flow keys.
>> + *
>> + * Library only maintains flow records between rte_flow_classify_stats_get()
>> + * calls and with a maximum limit.
>> + *
>> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
>> + * structure.
>> + *
>> + * Library is responsible from allocating and freeing memory for flow record
>> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
>> + * all tables are freed with rte_flow_classify_type_reset() or
>> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
>> + * creating records.
>> + *
>> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
>> + * callbacks and start filling flow record table.
>> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
>> + * library continues collecting records.
>> + *
>> + *  Usage:
>> + *  - application calls rte_flow_classify_type_set() for a device
>> + *  - library creates Rx/Tx callbacks for packets and start filling flow table
> 
> Does it necessary to use an  RX callback here?
> Can library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
> So the user would have a choice either setup a callback or call collect() directly. 

This was also a comment from Morten; I will update the RFC to use a direct API call.

> 
>> + *    for that type of flow (currently only one flow type supported)
>> + *  - application calls rte_flow_classify_stats_get() to get pointer to linked
>> + *    listed flow table. Library assigns this pointer to another value and keeps
>> + *    collecting flow data. In next rte_flow_classify_stats_get(), library first
>> + *    free the previous table, and pass current table to the application, keep
>> + *    collecting data.
> 
> Ok, but that means that you can't use stats_get() for the same type
> from 2 different threads without explicit synchronization?

Correct.
And multiple threads shouldn't be calling this API. It doesn't store
previous flow data, so multiple threads calling it would each get only a
piece of the information. Do you see any use case where multiple threads
would call this API?

> 
>> + *  - application calls rte_flow_classify_type_reset(), library unregisters the
>> + *    callbacks and free all flow table data.
>> + *
>> + */
>> +
>> +enum rte_flow_classify_type {
>> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
>> +};
>> +
>> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
>> +
>> +/**
>> + * Global configuration struct
>> + */
>> +struct rte_flow_classify_config {
>> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
>> +	void *flow_table_prev;
>> +	uint32_t flow_table_prev_item_count;
>> +	void *flow_table_current;
>> +	uint32_t flow_table_current_item_count;
>> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
>> +
>> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
>> +
>> +/**
>> + * Classification stats data struct
>> + */
>> +struct rte_flow_classify_stat_generic {
>> +	struct rte_flow_classify_stat_generic *next;
>> +	uint32_t id;
>> +	uint64_t timestamp;
>> +
>> +	struct ether_addr src_mac;
>> +	struct ether_addr dst_mac;
>> +	uint32_t src_ipv4;
>> +	uint32_t dst_ipv4;
>> +	uint8_t l3_protocol_id;
>> +	uint16_t src_port;
>> +	uint16_t dst_port;
>> +
>> +	uint64_t packet_count;
>> +	uint64_t packet_size; /* bytes */
>> +};
> 
> Ok, so if I understood things right, for generic type it will always classify all incoming packets by:
> <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port> 
> all by absolute values, and represent results as a linked list.
> Is that correct, or I misunderstood your intentions here?

Correct.

> If so, then I see several disadvantages here:
> 1) It is really hard to predict what kind of stats is required for that particular cases.
>  Let say some people would like to collect stat by <dst_mac,, vlan> ,
> another by <dst_ipv4,subnet_mask>, third ones by <l4 dst_port> and so on.
> Having just one hardcoded filter doesn't seem very felxable/usable.
> I think you need to find a way to allow user to define what type of filter they want to apply.

The flow type should be provided by applications according to their needs,
and needs to be implemented in this library. The generic one will be the
only one implemented in the first version:
enum rte_flow_classify_type {
	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
	RTE_FLOW_CLASSIFY_TYPE_MAX,
};


App should set the type first via the API:
rte_flow_classify_type_set(uint8_t port_id, uint32_t type);


And the stats for this type will be returned; because the returned data can
be a different struct per type, it is returned as a void pointer:
rte_flow_classify_stats_get(uint8_t port_id, void *stats);
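
A minimal sketch of how an application might drive this proposed API (the
names follow the RFC; how the void pointer is filled and cast is an
assumption):

#include <stdio.h>
#include <inttypes.h>
#include <rte_flow_classify.h>

static void
report_generic_flows(uint8_t port_id)
{
	struct rte_flow_classify_stat_generic *flow;
	void *stats = NULL;

	/* Enable the generic flow type once. */
	rte_flow_classify_type_set(port_id, RTE_FLOW_CLASSIFY_TYPE_GENERIC);

	/* Later, once per reporting interval: take over the current table. */
	rte_flow_classify_stats_get(port_id, &stats);
	for (flow = stats; flow != NULL; flow = flow->next)
		printf("flow %u: %" PRIu64 " pkts, %" PRIu64 " bytes\n",
		       flow->id, flow->packet_count, flow->packet_size);

	/* Unregister callbacks and free all flow tables when done. */
	rte_flow_classify_type_reset(port_id);
}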

> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?


> 2) Even  one 10G port can produce you ~14M rte_flow_classify_stat_generic entries in one second
> (all packets have different ipv4/ports or so).
> Accessing/retrieving items over linked list with 14M entries - doesn't sound like a good idea.
> I'd say we need some better way to retrieve/present collected data.

This table keeps flows, not packets, so I expect the numbers to be much
lower than the packet counts.
It is possible to use fixed-size arrays for this. But I think it is easy to
make that switch later, and I would like to see the performance effect
before doing so. Do you think it is OK to start like this and make that
decision during implementation?
  
Ferruh Yigit May 17, 2017, 4:02 p.m. UTC | #5
On 5/17/2017 3:54 PM, Ananyev, Konstantin wrote:
> Hi Ferruh,
> Please see my comments/questions below.

Thanks for review.

> Thanks
> Konstantin

<...>

> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?

Missed this one:

Gaëtan also had the same comment; copy-paste from another mail related to my
concerns with using rte_flow:

"
rte_flow is to create flow rules in PMD level, but what this library
aims to collect flow information, independent from if underlying PMD
implemented rte_flow or not.

So issues with using rte_flow for this use case:
1- It may not be implemented for all PMDs (including virtual ones).
2- It may conflict with other rte_flow rules created by user.
3- It may not gather all information required. (I mean some actions
here, count like ones are easy but rte_flow may not be so flexible to
extract different metrics from flows)
"
  
Ananyev, Konstantin May 17, 2017, 4:10 p.m. UTC | #6
> > Hi Ferruh,
> > Please see my comments/questions below.
> > Thanks
> > Konstantin
> >
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE Flow Classify Library
> >> + *
> >> + * This library provides flow record information with some measured properties.
> >> + *
> >> + * Application can select variety of flow types based on various flow keys.
> >> + *
> >> + * Library only maintains flow records between rte_flow_classify_stats_get()
> >> + * calls and with a maximum limit.
> >> + *
> >> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
> >> + * structure.
> >> + *
> >> + * Library is responsible from allocating and freeing memory for flow record
> >> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
> >> + * all tables are freed with rte_flow_classify_type_reset() or
> >> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
> >> + * creating records.
> >> + *
> >> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
> >> + * callbacks and start filling flow record table.
> >> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
> >> + * library continues collecting records.
> >> + *
> >> + *  Usage:
> >> + *  - application calls rte_flow_classify_type_set() for a device
> >> + *  - library creates Rx/Tx callbacks for packets and start filling flow table
> >
> > Does it necessary to use an  RX callback here?
> > Can library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
> > So the user would have a choice either setup a callback or call collect() directly.
>
> This was also comment from Morten, I will update RFC to use direct API call.
>
> >
> >> + *    for that type of flow (currently only one flow type supported)
> >> + *  - application calls rte_flow_classify_stats_get() to get pointer to linked
> >> + *    listed flow table. Library assigns this pointer to another value and keeps
> >> + *    collecting flow data. In next rte_flow_classify_stats_get(), library first
> >> + *    free the previous table, and pass current table to the application, keep
> >> + *    collecting data.
> >
> > Ok, but that means that you can't use stats_get() for the same type
> > from 2 different threads without explicit synchronization?
>
> Correct.
> And multiple threads shouldn't be calling this API. It doesn't store
> previous flow data, so multiple threads calling this only can have piece
> of information. Do you see any use case that multiple threads can call
> this API?


One example would be when you have multiple queues per port,
managed/monitored by different cores.
BTW, how are you going to collect the stats in that way?

>
> >
> >> + *  - application calls rte_flow_classify_type_reset(), library unregisters the
> >> + *    callbacks and free all flow table data.
> >> + *
> >> + */
> >> +
> >> +enum rte_flow_classify_type {
> >> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> >> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
> >> +};
> >> +
> >> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
> >> +
> >> +/**
> >> + * Global configuration struct
> >> + */
> >> +struct rte_flow_classify_config {
> >> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
> >> +	void *flow_table_prev;
> >> +	uint32_t flow_table_prev_item_count;
> >> +	void *flow_table_current;
> >> +	uint32_t flow_table_current_item_count;
> >> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
> >> +
> >> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
> >> +
> >> +/**
> >> + * Classification stats data struct
> >> + */
> >> +struct rte_flow_classify_stat_generic {
> >> +	struct rte_flow_classify_stat_generic *next;
> >> +	uint32_t id;
> >> +	uint64_t timestamp;
> >> +
> >> +	struct ether_addr src_mac;
> >> +	struct ether_addr dst_mac;
> >> +	uint32_t src_ipv4;
> >> +	uint32_t dst_ipv4;
> >> +	uint8_t l3_protocol_id;
> >> +	uint16_t src_port;
> >> +	uint16_t dst_port;
> >> +
> >> +	uint64_t packet_count;
> >> +	uint64_t packet_size; /* bytes */
> >> +};
> >
> > Ok, so if I understood things right, for generic type it will always classify all incoming packets by:
> > <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
> > all by absolute values, and represent results as a linked list.
> > Is that correct, or I misunderstood your intentions here?
>
> Correct.
>
> > If so, then I see several disadvantages here:
> > 1) It is really hard to predict what kind of stats is required for that particular cases.
> >  Let say some people would like to collect stat by <dst_mac,, vlan> ,
> > another by <dst_ipv4,subnet_mask>, third ones by <l4 dst_port> and so on.
> > Having just one hardcoded filter doesn't seem very felxable/usable.
> > I think you need to find a way to allow user to define what type of filter they want to apply.
>
> The flow type should be provided by applications, according their needs,
> and needs to be implemented in this library. The generic one will be the
> only one implemented in first version:
> enum rte_flow_classify_type {
> 	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> 	RTE_FLOW_CLASSIFY_TYPE_MAX,
> };
>
>
> App should set the type first via the API:
> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
>
>
> And the stats for this type will be returned, because returned type can
> be different type of struct, returned as void:
> rte_flow_classify_stats_get(uint8_t port_id, void *stats);


I understand that, but it means that for every different filter the user wants to use,
someone has to update the library: define a new type and write a new piece of code to handle it.
That seems inflexible and not extendable at all from the user's perspective.
Even HW allows some flexibility with RX filters.
Why not allow the user to specify the classification filter he/she wants for that particular case,
in the way both rte_flow and rte_acl work?

>
> > I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>
>
> > 2) Even  one 10G port can produce you ~14M rte_flow_classify_stat_generic entries in one second
> > (all packets have different ipv4/ports or so).
> > Accessing/retrieving items over linked list with 14M entries - doesn't sound like a good idea.
> > I'd say we need some better way to retrieve/present collected data.
>
> This is to keep flows, so I expect the numbers will be less comparing to
> the packet numbers.


That was an extreme example to show how badly the selected approach could behave.
What I am trying to say: we need a way to collect and retrieve stats in a quick and easy way.
Let's say right now the user invoked stats_get(port=0, type=generic).
Now, he is interested in getting stats for a particular dst_ip only.
The only way to get it: walk over the whole list stats_get() returned and examine each entry one by one.

I think it would be much better to have something like:

struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};

<fill rte_flow_item (or something else) to define desired filter>

filter_id = rte_flow_stats_register(.., &rte_flow_item);
....
struct rte_flow_stats stats;
rte_flow_stats_get(..., filter_id, &stats);

That allows user to define flows to collect stats for.
Again in that case you don't need to worry about when/where to destroy the previous
version of your stats.
Of course the open question is how to treat packets that would match more than one flow
(priority/insertion order/something else?), but I suppose we'll need to deal with that question anyway.
 
Konstantin

> It is possible to use fixed size arrays for this. But I think it is easy
> to make this switch later, I would like to see the performance effect
> before doing this switch. Do you think is it OK to start like this and
> give that decision during implementation?
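
For reference, the register/get sequence proposed above could look roughly
like this from the application side (rte_flow_stats_register() and
rte_flow_stats_get() are only the names suggested in this mail, not an
existing DPDK API):

#include <stdint.h>
#include <rte_flow.h>

/* Proposed stats container and entry points, as sketched above. */
struct rte_flow_stats {
	uint64_t timestamp;
	uint64_t packet_count;
	uint64_t packet_bytes;
};

int rte_flow_stats_register(uint8_t port_id,
			    const struct rte_flow_item pattern[]);
int rte_flow_stats_get(uint8_t port_id, int filter_id,
		       struct rte_flow_stats *stats);

static void
example(uint8_t port_id, const struct rte_flow_item pattern[])
{
	struct rte_flow_stats stats;
	/* Register the flow of interest once... */
	int filter_id = rte_flow_stats_register(port_id, pattern);

	/* ...then read its counters whenever needed; no table to free. */
	rte_flow_stats_get(port_id, filter_id, &stats);
}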
  
Ananyev, Konstantin May 17, 2017, 4:18 p.m. UTC | #7
>
> On 5/17/2017 3:54 PM, Ananyev, Konstantin wrote:
> > Hi Ferruh,
> > Please see my comments/questions below.
>
> Thanks for review.
>
> > Thanks
> > Konstantin
>
> <...>
>
> > I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>
> Missed this one:
>
> Gaëtan also had same comment, copy-paste from other mail related to my
> concerns using rte_flow:
>
> "
> rte_flow is to create flow rules in PMD level, but what this library
> aims to collect flow information, independent from if underlying PMD
> implemented rte_flow or not.
>
> So issues with using rte_flow for this use case:
> 1- It may not be implemented for all PMDs (including virtual ones).
> 2- It may conflict with other rte_flow rules created by user.
> 3- It may not gather all information required. (I mean some actions
> here, count like ones are easy but rte_flow may not be so flexible to
> extract different metrics from flows)
> "


I am not talking about actions - I am talking about using rte_flow_item
(or a similar approach) to allow the user to define what flow he would like to have.
Then the flow_classify library would use that information to generate the
internal structures(/code) it will use to classify the incoming packets.
I understand that we might not support all defined rte_flow_items straight away;
we could start with some limited set and add new ones on an iterative basis.
Basically, what I am talking about is a SW implementation of rte_flow.
Konstantin
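
To make the rte_flow_item idea concrete, a flow key could be expressed the
same way rte_flow rules express patterns today, e.g. "IPv4 traffic to
10.0.0.1". Only the pattern below uses existing rte_flow types; the classify
library that would consume it is still hypothetical:

#include <stdint.h>
#include <rte_flow.h>
#include <rte_byteorder.h>

/* Build an rte_flow_item pattern a SW classifier could parse to generate
 * its internal match structures. */
static void
build_pattern(void)
{
	struct rte_flow_item_ipv4 ip_spec = {
		.hdr = { .dst_addr = rte_cpu_to_be_32(0x0a000001) }, /* 10.0.0.1 */
	};
	struct rte_flow_item_ipv4 ip_mask = {
		.hdr = { .dst_addr = rte_cpu_to_be_32(UINT32_MAX) },
	};
	struct rte_flow_item pattern[] = {
		{ .type = RTE_FLOW_ITEM_TYPE_ETH },
		{ .type = RTE_FLOW_ITEM_TYPE_IPV4,
		  .spec = &ip_spec, .mask = &ip_mask },
		{ .type = RTE_FLOW_ITEM_TYPE_END },
	};

	(void)pattern; /* would be handed to the classify library */
}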
  
Gaëtan Rivet May 17, 2017, 4:38 p.m. UTC | #8
Hi Ferruh,

On Wed, May 17, 2017 at 05:02:50PM +0100, Ferruh Yigit wrote:
>On 5/17/2017 3:54 PM, Ananyev, Konstantin wrote:
>> Hi Ferruh,
>> Please see my comments/questions below.
>
>Thanks for review.
>
>> Thanks
>> Konstantin
>
><...>
>
>> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>
>Missed this one:
>
>Gaëtan also had same comment, copy-paste from other mail related to my
>concerns using rte_flow:
>
>"
>rte_flow is to create flow rules in PMD level, but what this library
>aims to collect flow information, independent from if underlying PMD
>implemented rte_flow or not.
>
>So issues with using rte_flow for this use case:
>1- It may not be implemented for all PMDs (including virtual ones).
>2- It may conflict with other rte_flow rules created by user.
>3- It may not gather all information required. (I mean some actions
>here, count like ones are easy but rte_flow may not be so flexible to
>extract different metrics from flows)
>"

There are two separate elements to using rte_flow in this context I think.

One is the use of the existing actions, and as you say, this makes the
support of this library dependent on the rte_flow support in PMDs.

The other is the expression of flows through a shared syntax. Using
flags to propose presets can be simpler, but will probably not be flexible
enough. rte_flow_items are a first-class citizen in DPDK and are
already a data type that can express flows with flexibility. As
mentioned, they are however missing a few elements to fully cover IPFIX
meters, but nothing that cannot be added I think.

So I was probably not clear enough, but I was thinking about
supporting rte_flow_items in rte_flow_classify as the possible key
applications would use to configure their measurements. This should not
require rte_flow supports from the PMDs they would be using, only
rte_flow_item parsing from the rte_flow_classify library.

Otherwise, DPDK will probably end up with two competing flow
representations. Additionally, it may be interesting for applications
to bind these data directly to rte_flow actions once the
classification has been analyzed.
  
Ferruh Yigit May 18, 2017, 11:33 a.m. UTC | #9
On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> Hi Ferruh,

Hi Gaetan,

> 
> On Wed, May 17, 2017 at 05:02:50PM +0100, Ferruh Yigit wrote:
>> On 5/17/2017 3:54 PM, Ananyev, Konstantin wrote:
>>> Hi Ferruh,
>>> Please see my comments/questions below.
>>
>> Thanks for review.
>>
>>> Thanks
>>> Konstantin
>>
>> <...>
>>
>>> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>>
>> Missed this one:
>>
>> Gaëtan also had same comment, copy-paste from other mail related to my
>> concerns using rte_flow:
>>
>> "
>> rte_flow is to create flow rules in PMD level, but what this library
>> aims to collect flow information, independent from if underlying PMD
>> implemented rte_flow or not.
>>
>> So issues with using rte_flow for this use case:
>> 1- It may not be implemented for all PMDs (including virtual ones).
>> 2- It may conflict with other rte_flow rules created by user.
>> 3- It may not gather all information required. (I mean some actions
>> here, count like ones are easy but rte_flow may not be so flexible to
>> extract different metrics from flows)
>> "
> 
> There are two separate elements to using rte_flow in this context I think.
> 
> One is the use of the existing actions, and as you say, this makes the
> support of this library dependent on the rte_flow support in PMDs.
> 
> The other is the expression of flows through a shared syntax. Using
> flags to propose presets can be simpler, but will probably not be flexible
> enough. rte_flow_items are a first-class citizen in DPDK and are
> already a data type that can express flows with flexibility. As
> mentioned, they are however missing a few elements to fully cover IPFIX
> meters, but nothing that cannot be added I think.
> 
> So I was probably not clear enough, but I was thinking about
> supporting rte_flow_items in rte_flow_classify as the possible key
> applications would use to configure their measurements. This should not
> require rte_flow supports from the PMDs they would be using, only
> rte_flow_item parsing from the rte_flow_classify library.
> 
> Otherwise, DPDK will probably end up with two competing flow
> representations. Additionally, it may be interesting for applications
> to bind these data directly to rte_flow actions once the
> classification has been analyzed.

Thanks for the clarification, I see now what you and Konstantin are proposing.

And yes, it makes sense to use rte_flow to define flows in the library; I
will update the RFC.

Thanks,
ferruh
  
Ferruh Yigit May 18, 2017, 12:12 p.m. UTC | #10
On 5/17/2017 5:10 PM, Ananyev, Konstantin wrote:
>>> Hi Ferruh,
>>> Please see my comments/questions below.
>>> Thanks
>>> Konstantin
>>>
>>>> +
>>>> +/**
>>>> + * @file
>>>> + *
>>>> + * RTE Flow Classify Library
>>>> + *
>>>> + * This library provides flow record information with some measured properties.
>>>> + *
>>>> + * Application can select variety of flow types based on various flow keys.
>>>> + *
>>>> + * Library only maintains flow records between rte_flow_classify_stats_get()
>>>> + * calls and with a maximum limit.
>>>> + *
>>>> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
>>>> + * structure.
>>>> + *
>>>> + * Library is responsible from allocating and freeing memory for flow record
>>>> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
>>>> + * all tables are freed with rte_flow_classify_type_reset() or
>>>> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
>>>> + * creating records.
>>>> + *
>>>> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
>>>> + * callbacks and start filling flow record table.
>>>> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
>>>> + * library continues collecting records.
>>>> + *
>>>> + *  Usage:
>>>> + *  - application calls rte_flow_classify_type_set() for a device
>>>> + *  - library creates Rx/Tx callbacks for packets and start filling flow table
>>>
>>> Does it necessary to use an  RX callback here?
>>> Can library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
>>> So the user would have a choice either setup a callback or call collect() directly.
>>
>> This was also comment from Morten, I will update RFC to use direct API call.
>>
>>>
>>>> + *    for that type of flow (currently only one flow type supported)
>>>> + *  - application calls rte_flow_classify_stats_get() to get pointer to linked
>>>> + *    listed flow table. Library assigns this pointer to another value and keeps
>>>> + *    collecting flow data. In next rte_flow_classify_stats_get(), library first
>>>> + *    free the previous table, and pass current table to the application, keep
>>>> + *    collecting data.
>>>
>>> Ok, but that means that you can't use stats_get() for the same type
>>> from 2 different threads without explicit synchronization?
>>
>> Correct.
>> And multiple threads shouldn't be calling this API. It doesn't store
>> previous flow data, so multiple threads calling this only can have piece
>> of information. Do you see any use case that multiple threads can call
>> this API?
> 
> One example would be when you have multiple queues per port,
> managed/monitored by different cores.
> BTW, how are you going to collect the stats in that way?
> 
>>
>>>
>>>> + *  - application calls rte_flow_classify_type_reset(), library unregisters the
>>>> + *    callbacks and free all flow table data.
>>>> + *
>>>> + */
>>>> +
>>>> +enum rte_flow_classify_type {
>>>> +	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>>>> +	RTE_FLOW_CLASSIFY_TYPE_MAX,
>>>> +};
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
>>>> +
>>>> +/**
>>>> + * Global configuration struct
>>>> + */
>>>> +struct rte_flow_classify_config {
>>>> +	uint32_t type; /* bitwise enum rte_flow_classify_type values */
>>>> +	void *flow_table_prev;
>>>> +	uint32_t flow_table_prev_item_count;
>>>> +	void *flow_table_current;
>>>> +	uint32_t flow_table_current_item_count;
>>>> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
>>>> +
>>>> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
>>>> +
>>>> +/**
>>>> + * Classification stats data struct
>>>> + */
>>>> +struct rte_flow_classify_stat_generic {
>>>> +	struct rte_flow_classify_stat_generic *next;
>>>> +	uint32_t id;
>>>> +	uint64_t timestamp;
>>>> +
>>>> +	struct ether_addr src_mac;
>>>> +	struct ether_addr dst_mac;
>>>> +	uint32_t src_ipv4;
>>>> +	uint32_t dst_ipv4;
>>>> +	uint8_t l3_protocol_id;
>>>> +	uint16_t src_port;
>>>> +	uint16_t dst_port;
>>>> +
>>>> +	uint64_t packet_count;
>>>> +	uint64_t packet_size; /* bytes */
>>>> +};
>>>
>>> Ok, so if I understood things right, for generic type it will always classify all incoming packets by:
>>> <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
>>> all by absolute values, and represent results as a linked list.
>>> Is that correct, or I misunderstood your intentions here?
>>
>> Correct.
>>
>>> If so, then I see several disadvantages here:
>>> 1) It is really hard to predict what kind of stats is required for that particular cases.
>>>  Let say some people would like to collect stat by <dst_mac,, vlan> ,
>>> another by <dst_ipv4,subnet_mask>, third ones by <l4 dst_port> and so on.
>>> Having just one hardcoded filter doesn't seem very felxable/usable.
>>> I think you need to find a way to allow user to define what type of filter they want to apply.
>>
>> The flow type should be provided by applications, according their needs,
>> and needs to be implemented in this library. The generic one will be the
>> only one implemented in first version:
>> enum rte_flow_classify_type {
>> 	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
>> 	RTE_FLOW_CLASSIFY_TYPE_MAX,
>> };
>>
>>
>> App should set the type first via the API:
>> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
>>
>>
>> And the stats for this type will be returned, because returned type can
>> be different type of struct, returned as void:
>> rte_flow_classify_stats_get(uint8_t port_id, void *stats);
> 
> I understand that, but it means that for every different filter user wants to use,
> someone has to update the library: define a new type and write a new piece of code to handle it.
> That seems not flexible and totally un-extendable from user perspective.
> Even  HW allows some flexibility with RX filters.
> Why not allow user to specify a classification filter  he/she wants for that particular case?
> In a way both rte_flow and rte_acl work?
> 
>>
>>> I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>>
>>
>>> 2) Even  one 10G port can produce you ~14M rte_flow_classify_stat_generic entries in one second
>>> (all packets have different ipv4/ports or so).
>>> Accessing/retrieving items over linked list with 14M entries - doesn't sound like a good idea.
>>> I'd say we need some better way to retrieve/present collected data.
>>
>> This is to keep flows, so I expect the numbers will be less comparing to
>> the packet numbers.
> 
> That was an  extreme example to show how bad the selected approach should behave.
> What I am trying to say: we need a way to collect and retrieve stats in a quick and easy way.
> Let say right now user invoked stats_get(port=0, type=generic).
> Now, he is interested to get stats for particular dst_ip only.
> The only way to get it: walk over whole list stats_get() returned and examine each entry one by one.
> 
> I think would be much better to have something like:
> 
> struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};
> 
> <fill rte_flow_item (or something else) to define desired filter>
> 
> filter_id = rte_flow_stats_register(.., &rte_flow_item);
> ....
> struct rte_flow_stats stats;
> rte_flow_stats_get(..., filter_id, &stats);
> 
> That allows user to define flows to collect stats for.
> Again in that case you don't need to worry about when/where to destroy the previous
> version of your stats.

Apart from using rte_flow, the above suggests, instead of:
- set key/filter
- poll collect()
- whenever the app wants, call stats_get()

using:
- poll stats_get(key/filter);

Especially after switching from callbacks to polling this makes sense,
because the application will already have to make continuous calls into this
library. Merging set filter/collect/stats_get into the same function saves
the library from storing/deleting stats until the app asks for them, as you
mentioned above.

So, I will update the RFC accordingly.
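
The merged call could then take roughly this shape (purely illustrative; the
struct and signature below are assumptions, not the final API):

#include <stdio.h>
#include <inttypes.h>
#include <rte_flow.h>

/* Hypothetical merged entry point: poll with the filter of interest and get
 * the current counters back in one step; nothing is retained in the library. */
struct rte_flow_classify_stats {
	uint64_t used_packets;
	uint64_t used_bytes;
};

int rte_flow_classify_stats_get(uint8_t port_id,
				const struct rte_flow_item pattern[],
				struct rte_flow_classify_stats *stats);

static void
poll_stats(uint8_t port_id, const struct rte_flow_item pattern[])
{
	struct rte_flow_classify_stats stats;

	if (rte_flow_classify_stats_get(port_id, pattern, &stats) == 0)
		printf("%" PRIu64 " pkts / %" PRIu64 " bytes\n",
		       stats.used_packets, stats.used_bytes);
}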

> Of course the open question is how to treat packets that would match more than one flow
> (priority/insertion order/something else?), but I suppose we'll need to deal with that question anyway.
>  
> Konstantin
> 
>> It is possible to use fixed size arrays for this. But I think it is easy
>> to make this switch later, I would like to see the performance effect
>> before doing this switch. Do you think is it OK to start like this and
>> give that decision during implementation?
  
Thomas Monjalon May 18, 2017, 8:31 p.m. UTC | #11
18/05/2017 13:33, Ferruh Yigit:
> On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> > The other is the expression of flows through a shared syntax. Using
> > flags to propose presets can be simpler, but will probably not be flexible
> > enough. rte_flow_items are a first-class citizen in DPDK and are
> > already a data type that can express flows with flexibility. As
> > mentioned, they are however missing a few elements to fully cover IPFIX
> > meters, but nothing that cannot be added I think.
> > 
> > So I was probably not clear enough, but I was thinking about
> > supporting rte_flow_items in rte_flow_classify as the possible key
> > applications would use to configure their measurements. This should not
> > require rte_flow supports from the PMDs they would be using, only
> > rte_flow_item parsing from the rte_flow_classify library.
> > 
> > Otherwise, DPDK will probably end up with two competing flow
> > representations. Additionally, it may be interesting for applications
> > to bind these data directly to rte_flow actions once the
> > classification has been analyzed.
> 
> Thanks for clarification, I see now what you and Konstantin is proposing.
> 
> And yes it makes sense to use rte_flow to define flows in the library, I
> will update the RFC.

Does it mean that rte_flow.h must be moved from ethdev to this
new flow library? Or will it depend on ethdev?
  
Ananyev, Konstantin May 19, 2017, 8:57 a.m. UTC | #12
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Thursday, May 18, 2017 9:32 PM
> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> Cc: dev@dpdk.org; Gaëtan Rivet <gaetan.rivet@6wind.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Mcnamara, John
> <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>; adrien.mazarguil@6wind.com
> Subject: Re: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
> 
> 18/05/2017 13:33, Ferruh Yigit:
> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> > > The other is the expression of flows through a shared syntax. Using
> > > flags to propose presets can be simpler, but will probably not be flexible
> > > enough. rte_flow_items are a first-class citizen in DPDK and are
> > > already a data type that can express flows with flexibility. As
> > > mentioned, they are however missing a few elements to fully cover IPFIX
> > > meters, but nothing that cannot be added I think.
> > >
> > > So I was probably not clear enough, but I was thinking about
> > > supporting rte_flow_items in rte_flow_classify as the possible key
> > > applications would use to configure their measurements. This should not
> > > require rte_flow supports from the PMDs they would be using, only
> > > rte_flow_item parsing from the rte_flow_classify library.
> > >
> > > Otherwise, DPDK will probably end up with two competing flow
> > > representations. Additionally, it may be interesting for applications
> > > to bind these data directly to rte_flow actions once the
> > > classification has been analyzed.
> >
> > Thanks for clarification, I see now what you and Konstantin is proposing.
> >
> > And yes it makes sense to use rte_flow to define flows in the library, I
> > will update the RFC.
> 
> Does it mean that rte_flow.h must be moved from ethdev to this
> new flow library? Or will it depend of ethdev?

Just a thought: probably move rte_flow.h to  lib/librte_net?
Konstantin
  
Gaëtan Rivet May 19, 2017, 9:11 a.m. UTC | #13
On Fri, May 19, 2017 at 08:57:01AM +0000, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas@monjalon.net]
>> Sent: Thursday, May 18, 2017 9:32 PM
>> To: Yigit, Ferruh <ferruh.yigit@intel.com>
>> Cc: dev@dpdk.org; Gaëtan Rivet <gaetan.rivet@6wind.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Mcnamara, John
>> <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>; adrien.mazarguil@6wind.com
>> Subject: Re: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
>>
>> 18/05/2017 13:33, Ferruh Yigit:
>> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
>> > > The other is the expression of flows through a shared syntax. Using
>> > > flags to propose presets can be simpler, but will probably not be flexible
>> > > enough. rte_flow_items are a first-class citizen in DPDK and are
>> > > already a data type that can express flows with flexibility. As
>> > > mentioned, they are however missing a few elements to fully cover IPFIX
>> > > meters, but nothing that cannot be added I think.
>> > >
>> > > So I was probably not clear enough, but I was thinking about
>> > > supporting rte_flow_items in rte_flow_classify as the possible key
>> > > applications would use to configure their measurements. This should not
>> > > require rte_flow supports from the PMDs they would be using, only
>> > > rte_flow_item parsing from the rte_flow_classify library.
>> > >
>> > > Otherwise, DPDK will probably end up with two competing flow
>> > > representations. Additionally, it may be interesting for applications
>> > > to bind these data directly to rte_flow actions once the
>> > > classification has been analyzed.
>> >
>> > Thanks for clarification, I see now what you and Konstantin is proposing.
>> >
>> > And yes it makes sense to use rte_flow to define flows in the library, I
>> > will update the RFC.
>>
>> Does it mean that rte_flow.h must be moved from ethdev to this
>> new flow library? Or will it depend of ethdev?

Even outside of lib/librte_ether, wouldn't rte_flow stay dependent on
rte_ether?

>
>Just a thought: probably move rte_flow.h to  lib/librte_net?
>Konstantin

If we are to move rte_flow, why not lib/librte_flow?
  
Ananyev, Konstantin May 19, 2017, 9:40 a.m. UTC | #14
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Friday, May 19, 2017 10:11 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Thomas Monjalon <thomas@monjalon.net>; Yigit, Ferruh <ferruh.yigit@intel.com>; dev@dpdk.org; Mcnamara, John
> <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>; adrien.mazarguil@6wind.com
> Subject: Re: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
> 
> On Fri, May 19, 2017 at 08:57:01AM +0000, Ananyev, Konstantin wrote:
> >
> >
> >> -----Original Message-----
> >> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >> Sent: Thursday, May 18, 2017 9:32 PM
> >> To: Yigit, Ferruh <ferruh.yigit@intel.com>
> >> Cc: dev@dpdk.org; Gaëtan Rivet <gaetan.rivet@6wind.com>; Ananyev, Konstantin <konstantin.ananyev@intel.com>; Mcnamara, John
> >> <john.mcnamara@intel.com>; Tahhan, Maryam <maryam.tahhan@intel.com>; adrien.mazarguil@6wind.com
> >> Subject: Re: [dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
> >>
> >> 18/05/2017 13:33, Ferruh Yigit:
> >> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> >> > > The other is the expression of flows through a shared syntax. Using
> >> > > flags to propose presets can be simpler, but will probably not be flexible
> >> > > enough. rte_flow_items are a first-class citizen in DPDK and are
> >> > > already a data type that can express flows with flexibility. As
> >> > > mentioned, they are however missing a few elements to fully cover IPFIX
> >> > > meters, but nothing that cannot be added I think.
> >> > >
> >> > > So I was probably not clear enough, but I was thinking about
> >> > > supporting rte_flow_items in rte_flow_classify as the possible key
> >> > > applications would use to configure their measurements. This should not
> >> > > require rte_flow supports from the PMDs they would be using, only
> >> > > rte_flow_item parsing from the rte_flow_classify library.
> >> > >
> >> > > Otherwise, DPDK will probably end up with two competing flow
> >> > > representations. Additionally, it may be interesting for applications
> >> > > to bind these data directly to rte_flow actions once the
> >> > > classification has been analyzed.
> >> >
> >> > Thanks for clarification, I see now what you and Konstantin is proposing.
> >> >
> >> > And yes it makes sense to use rte_flow to define flows in the library, I
> >> > will update the RFC.
> >>
> >> Does it mean that rte_flow.h must be moved from ethdev to this
> >> new flow library? Or will it depend of ethdev?
> 
> Even outside of lib/librte_ether, wouldn't rte_flow stay dependent on
> rte_ether?
> 
> >
> >Just a thought: probably move rte_flow.h to  lib/librte_net?
> >Konstantin
> 
> If we are to move rte_flow, why not lib/librte_flow?

To avoid a new dependency for lib/librte_ethdev?

> 
> --
> Gaëtan Rivet
> 6WIND
  
Thomas Monjalon May 19, 2017, 10:11 a.m. UTC | #15
19/05/2017 11:11, Gaëtan Rivet:
> On Fri, May 19, 2017 at 08:57:01AM +0000, Ananyev, Konstantin wrote:
> > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> >> 18/05/2017 13:33, Ferruh Yigit:
> >> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> >> > > The other is the expression of flows through a shared syntax. Using
> >> > > flags to propose presets can be simpler, but will probably not be flexible
> >> > > enough. rte_flow_items are a first-class citizen in DPDK and are
> >> > > already a data type that can express flows with flexibility. As
> >> > > mentioned, they are however missing a few elements to fully cover IPFIX
> >> > > meters, but nothing that cannot be added I think.
> >> > >
> >> > > So I was probably not clear enough, but I was thinking about
> >> > > supporting rte_flow_items in rte_flow_classify as the possible key
> >> > > applications would use to configure their measurements. This should not
> >> > > require rte_flow supports from the PMDs they would be using, only
> >> > > rte_flow_item parsing from the rte_flow_classify library.
> >> > >
> >> > > Otherwise, DPDK will probably end up with two competing flow
> >> > > representations. Additionally, it may be interesting for applications
> >> > > to bind these data directly to rte_flow actions once the
> >> > > classification has been analyzed.
> >> >
> >> > Thanks for clarification, I see now what you and Konstantin is proposing.
> >> >
> >> > And yes it makes sense to use rte_flow to define flows in the library, I
> >> > will update the RFC.
> >>
> >> Does it mean that rte_flow.h must be moved from ethdev to this
> >> new flow library? Or will it depend of ethdev?
> 
> Even outside of lib/librte_ether, wouldn't rte_flow stay dependent on
> rte_ether?
> 
> >
> >Just a thought: probably move rte_flow.h to  lib/librte_net?
> >Konstantin
> 
> If we are to move rte_flow, why not lib/librte_flow?

There are 3 different things:
1/ rte_flow.h for flow description
2/ rte_flow API in ethdev for HW offloading
3/ SW flow table (this new lib)

2 and 3 will depend on 1.
I think moving rte_flow.h into librte_net is a good idea.
  
Adrien Mazarguil May 22, 2017, 9:13 a.m. UTC | #16
On Fri, May 19, 2017 at 12:11:53PM +0200, Thomas Monjalon wrote:
> 19/05/2017 11:11, Gaëtan Rivet:
> > On Fri, May 19, 2017 at 08:57:01AM +0000, Ananyev, Konstantin wrote:
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > >> 18/05/2017 13:33, Ferruh Yigit:
> > >> > On 5/17/2017 5:38 PM, Gaëtan Rivet wrote:
> > >> > > The other is the expression of flows through a shared syntax. Using
> > >> > > flags to propose presets can be simpler, but will probably not be flexible
> > >> > > enough. rte_flow_items are a first-class citizen in DPDK and are
> > >> > > already a data type that can express flows with flexibility. As
> > >> > > mentioned, they are however missing a few elements to fully cover IPFIX
> > >> > > meters, but nothing that cannot be added I think.
> > >> > >
> > >> > > So I was probably not clear enough, but I was thinking about
> > >> > > supporting rte_flow_items in rte_flow_classify as the possible key
> > >> > > applications would use to configure their measurements. This should not
> > >> > > require rte_flow supports from the PMDs they would be using, only
> > >> > > rte_flow_item parsing from the rte_flow_classify library.
> > >> > >
> > >> > > Otherwise, DPDK will probably end up with two competing flow
> > >> > > representations. Additionally, it may be interesting for applications
> > >> > > to bind these data directly to rte_flow actions once the
> > >> > > classification has been analyzed.
> > >> >
> > >> > Thanks for clarification, I see now what you and Konstantin is proposing.
> > >> >
> > >> > And yes it makes sense to use rte_flow to define flows in the library, I
> > >> > will update the RFC.
> > >>
> > >> Does it mean that rte_flow.h must be moved from ethdev to this
> > >> new flow library? Or will it depend of ethdev?
> > 
> > Even outside of lib/librte_ether, wouldn't rte_flow stay dependent on
> > rte_ether?
> > 
> > >
> > >Just a thought: probably move rte_flow.h to  lib/librte_net?
> > >Konstantin
> > 
> > If we are to move rte_flow, why not lib/librte_flow?
> 
> There are 3 different things:
> 1/ rte_flow.h for flow description
> 2/ rte_flow API in ethdev for HW offloading
> 3/ SW flow table (this new lib)
> 
> 2 and 3 will depends on 1.
> I think moving rte_flow.h in librte_net is a good idea.

If I had to choose, it would be librte_flow over librte_net because rte_flow
is not necessarily about matching protocol headers (e.g. you can match
meta properties like physical ports or the fact that traffic comes from a
specific VF).

However, I am not sure a separate library is actually necessary, I think the
requirements can be addressed by rte_flow (in its current directory)
directly.

One assumption is that the COUNT action as currently described by rte_flow
satisfies the counters requirements from this proposal, new actions could be
added later to return other flow-related properties. In short there is no
need to return info from individual packets, only from the flows themselves.

If the above is true, then as pointed earlier by Gaetan this proposal can be
summarized as a software implementation for rte_flow_query() and related
actions.

To determine if a packet is part of a given flow in software and update the
associated flow counters, it must be parsed and compared against patterns of
all existing rte_flow rules until one of them matches. For accurate results,
this must be done on all TX/RX traffic.

RFCv1 does so by automatically hooking burst functions while RFCv2 does so
by expecting the application to call rte_flow_classify_stats_get().

One issue I would like to raise before going on is the CPU cost of doing all
this. Parsing all incoming/outgoing traffic without missing any, looking up
related flows and updating counters in software seems like a performance
killer. Applications will likely request assistance from HW to minimize this
cost as much as possible (e.g. using the rte_flow MARK action if COUNT is
not supported directly).

Assuming a flow is identified by HW, parsing it once again in software with
the proposed API to update the related stats seems counterproductive; a
hybrid HW/SW solution with the SW part automatically used as a fallback when
hardware is not capable enough would be better and easier to use.

The topic of software fallbacks for rte_flow was brought up some time ago
(can't find the exact thread). The intent was to expose a common set of
features between PMDs so that applications do not have to implement their
own fallbacks. They would request it on a rule basis by setting a kind of
"sw_fallback" bit in flow rule attributes (struct rte_flow_attr). This bit
would be checked by PMDs and/or the rte_flow_* wrapper functions after the
underlying PMD refuses to validate/create a rule.
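
As a sketch, the proposed attribute could be used like this from the
application side ("sw_fallback" is only the bit suggested above; it does not
exist in struct rte_flow_attr today, so it is left commented out):

#include <rte_flow.h>

static struct rte_flow *
create_with_fallback(uint8_t port_id,
		     const struct rte_flow_item pattern[],
		     const struct rte_flow_action actions[])
{
	struct rte_flow_attr attr = {
		.ingress = 1,
		/* .sw_fallback = 1,  hypothetical: let a SW implementation
		 *                    take over when the PMD rejects the rule */
	};
	struct rte_flow_error error;

	return rte_flow_create(port_id, &attr, pattern, actions, &error);
}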

Basically I think rte_flow_classify could be entirely implemented in
rte_flow through this "sw_fallback" approach in order for applications to
automatically benefit from HW acceleration when PMDs can handle it. It then
makes sense for the underlying implementation to use RX/TX hooks if
necessary (as in RFCv1). These hooks would be removed by destroying the
related flow rule(s).

This would also open the door to a full SW implementation for rte_flow given
that once the packet parser is there, most actions can be implemented rather
easily (well, that's still a lot of work.)

Bottom line is I'm not against a separate SW implementation not tied to a
port_id for rte_flow_classify, but I would like to hear the community's
thoughts about the above first.
  

Patch

diff --git a/config/common_base b/config/common_base
index 412ec3f..c05a411 100644
--- a/config/common_base
+++ b/config/common_base
@@ -634,6 +634,11 @@  CONFIG_RTE_LIBRTE_IP_FRAG_TBL_STAT=n
 CONFIG_RTE_LIBRTE_METER=y
 
 #
+# Compile librte_classify
+#
+CONFIG_RTE_LIBRTE_FLOW_CLASSIFY=y
+
+#
 # Compile librte_sched
 #
 CONFIG_RTE_LIBRTE_SCHED=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index a26846a..7f0be03 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -97,6 +97,7 @@  There are many libraries, so their headers may be grouped by topics:
   [LPM IPv4 route]     (@ref rte_lpm.h),
   [LPM IPv6 route]     (@ref rte_lpm6.h),
   [ACL]                (@ref rte_acl.h),
+  [flow_classify]      (@ref rte_flow_classify.h),
   [EFD]                (@ref rte_efd.h)
 
 - **QoS**:
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 97fb551..9eec10c 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -45,6 +45,7 @@  INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_efd \
                           lib/librte_ether \
                           lib/librte_eventdev \
+                          lib/librte_flow_classify \
                           lib/librte_hash \
                           lib/librte_ip_frag \
                           lib/librte_jobstats \
diff --git a/doc/guides/rel_notes/release_17_05.rst b/doc/guides/rel_notes/release_17_05.rst
index 25e7144..89520e4 100644
--- a/doc/guides/rel_notes/release_17_05.rst
+++ b/doc/guides/rel_notes/release_17_05.rst
@@ -507,6 +507,7 @@  The libraries prepended with a plus sign were incremented in this version.
      librte_distributor.so.1
    + librte_eal.so.4
      librte_ethdev.so.6
+     librte_flow_classify.so.1
      librte_hash.so.2
      librte_ip_frag.so.1
      librte_jobstats.so.1
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..e63cd61 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -80,6 +80,8 @@  DIRS-$(CONFIG_RTE_LIBRTE_POWER) += librte_power
 DEPDIRS-librte_power := librte_eal
 DIRS-$(CONFIG_RTE_LIBRTE_METER) += librte_meter
 DEPDIRS-librte_meter := librte_eal
+DIRS-$(CONFIG_RTE_LIBRTE_FLOW_CLASSIFY) += librte_flow_classify
+DEPDIRS-librte_flow_classify := librte_eal librte_ether librte_net
 DIRS-$(CONFIG_RTE_LIBRTE_SCHED) += librte_sched
 DEPDIRS-librte_sched := librte_eal librte_mempool librte_mbuf librte_net
 DEPDIRS-librte_sched += librte_timer
diff --git a/lib/librte_flow_classify/Makefile b/lib/librte_flow_classify/Makefile
new file mode 100644
index 0000000..c57e9a3
--- /dev/null
+++ b/lib/librte_flow_classify/Makefile
@@ -0,0 +1,50 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2017 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_flow_classify.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_flow_classify_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_FLOW_CLASSIFY) := rte_flow_classify.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_FLOW_CLASSIFY)-include := rte_flow_classify.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_flow_classify/rte_flow_classify.c b/lib/librte_flow_classify/rte_flow_classify.c
new file mode 100644
index 0000000..e6f724e
--- /dev/null
+++ b/lib/librte_flow_classify/rte_flow_classify.c
@@ -0,0 +1,34 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include "rte_flow_classify.h"
diff --git a/lib/librte_flow_classify/rte_flow_classify.h b/lib/librte_flow_classify/rte_flow_classify.h
new file mode 100644
index 0000000..a52394f
--- /dev/null
+++ b/lib/librte_flow_classify/rte_flow_classify.h
@@ -0,0 +1,202 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_FLOW_CLASSIFY_H_
+#define _RTE_FLOW_CLASSIFY_H_
+
+/**
+ * @file
+ *
+ * RTE Flow Classify Library
+ *
+ * This library provides flow record information with some measured properties.
+ *
+ * The application can select from a variety of flow types based on various
+ * flow keys.
+ *
+ * The library maintains flow records only between
+ * rte_flow_classify_stats_get() calls, up to a maximum limit.
+ *
+ * The provided flow records form a linked list of rte_flow_classify_stat_xxx
+ * structures.
+ *
+ * The library is responsible for allocating and freeing memory for the flow
+ * record table. The previous table is freed by the next
+ * rte_flow_classify_stats_get() call, and all tables are freed by
+ * rte_flow_classify_type_reset() or rte_flow_classify_type_set(x, 0). Memory
+ * for the table is allocated on the fly while creating records.
+ *
+ * A call to rte_flow_classify_type_set() with a valid type registers Rx/Tx
+ * callbacks and starts filling the flow record table.
+ * With rte_flow_classify_stats_get(), a pointer is handed to the caller while
+ * the library continues collecting records.
+ *
+ *  Usage:
+ *  - The application calls rte_flow_classify_type_set() for a device.
+ *  - The library creates Rx/Tx callbacks for packets and starts filling the
+ *    flow table for that type of flow (currently only one flow type is
+ *    supported).
+ *  - The application calls rte_flow_classify_stats_get() to get a pointer to
+ *    the linked-list flow table. The library hands the current table to the
+ *    caller and keeps collecting flow data into a new table. On the next
+ *    rte_flow_classify_stats_get() call, the library first frees the
+ *    previously returned table, then passes the current table to the
+ *    application and keeps collecting data.
+ *  - The application calls rte_flow_classify_type_reset(); the library
+ *    unregisters the callbacks and frees all flow table data.
+ *
+ */
+
+#include <rte_ethdev.h>
+#include <rte_ether.h>
+#include <rte_ip.h>
+#include <rte_tcp.h>
+#include <rte_udp.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * Types of classification supported.
+ */
+enum rte_flow_classify_type {
+	RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
+	RTE_FLOW_CLASSIFY_TYPE_MAX,
+};
+
+#define RTE_FLOW_CLASSIFY_TYPE_MASK (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
+
+/**
+ * Global configuration struct
+ */
+struct rte_flow_classify_config {
+	uint32_t type; /* bitmap of enum rte_flow_classify_type values */
+	void *flow_table_prev;
+	uint32_t flow_table_prev_item_count;
+	void *flow_table_current;
+	uint32_t flow_table_current_item_count;
+} rte_flow_classify_config[RTE_MAX_ETHPORTS];
+
+#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
+
+/**
+ * Classification stats data struct
+ */
+struct rte_flow_classify_stat_generic {
+	struct rte_flow_classify_stat_generic *next;
+	uint32_t id;
+	uint64_t timestamp;
+
+	struct ether_addr src_mac;
+	struct ether_addr dst_mac;
+	uint32_t src_ipv4;
+	uint32_t dst_ipv4;
+	uint8_t l3_protocol_id;
+	uint16_t src_port;
+	uint16_t dst_port;
+
+	uint64_t packet_count;
+	uint64_t packet_size; /* bytes */
+};
+
+/**
+ * Get the flow types enabled for flow_classify.
+ *
+ * @param port_id
+ *   Ethernet device port id to get the classification of.
+ * @param type
+ *   Bitmap of enum rte_flow_classify_type values enabled for classification.
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) on failure.
+ */
+int
+rte_flow_classify_type_get(uint8_t port_id, uint32_t *type);
+
+/**
+ * Set the flow types for flow_classify.
+ *
+ * If the type bitmap is zero, no classification is done.
+ *
+ * @param port_id
+ *   Ethernet device port_id to set the classification of.
+ * @param type
+ *   Bitmap of enum rte_flow_classify_type values to enable for classification.
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) on failure.
+ */
+int
+rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
+
+/**
+ * Disable flow classification for a device.
+ *
+ * @param port_id
+ *   Ethernet device port id to reset the classification of.
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) on failure.
+ */
+int
+rte_flow_classify_type_reset(uint8_t port_id);
+
+/**
+ * Get the classification results.
+ *
+ * @param port_id
+ *   Ethernet device port id to get the flow stats of.
+ * @param stats
+ *   void * that receives the linked-list flow data.
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) on failure.
+ */
+int
+rte_flow_classify_stats_get(uint8_t port_id, void *stats);
+
+/**
+ * Reset the classification results.
+ *
+ * @param port_id
+ *   Ethernet device port id to reset the flow stats of.
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) on failure.
+ */
+int
+rte_flow_classify_stats_reset(uint8_t port_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_FLOW_CLASSIFY_H_ */
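
The usage sequence described in the rte_flow_classify.h comment above can be
illustrated with a minimal application-side sketch (not part of this patch).
It assumes that rte_flow_classify_stats_get() fills the caller-provided
pointer with the head of a rte_flow_classify_stat_generic linked list; the
exact contract of the void *stats argument is still open in this RFC.

#include <inttypes.h>
#include <stdio.h>

#include <rte_flow_classify.h>

static void
dump_generic_flows(uint8_t port_id)
{
	struct rte_flow_classify_stat_generic *stat = NULL;

	/* Enable generic classification; the library registers its Rx/Tx
	 * callbacks and starts filling the flow record table. */
	if (rte_flow_classify_type_set(port_id,
			RTE_FLOW_CLASSIFY_TYPE_GENERIC) != 0)
		return;

	/* ... forward traffic for a while ... */

	/* Take ownership of the current table; the library keeps
	 * collecting records into a fresh one. */
	if (rte_flow_classify_stats_get(port_id, &stat) == 0) {
		for (; stat != NULL; stat = stat->next)
			printf("flow %" PRIu32 ": %" PRIu64 " packets, "
				"%" PRIu64 " bytes\n", stat->id,
				stat->packet_count, stat->packet_size);
	}

	/* Unregister the callbacks and free all flow table data. */
	rte_flow_classify_type_reset(port_id);
}
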
diff --git a/lib/librte_flow_classify/rte_flow_classify_version.map b/lib/librte_flow_classify/rte_flow_classify_version.map
new file mode 100644
index 0000000..0f396ae
--- /dev/null
+++ b/lib/librte_flow_classify/rte_flow_classify_version.map
@@ -0,0 +1,10 @@ 
+DPDK_17.08 {
+	global:
+
+	rte_flow_classify_stats_get;
+	rte_flow_classify_type_get;
+	rte_flow_classify_type_reset;
+	rte_flow_classify_type_set;
+
+	local: *;
+};
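
The header states that rte_flow_classify_type_set() registers Rx/Tx callbacks
to fill the flow table, while rte_flow_classify.c is still an empty stub in
this RFC. One possible shape of that registration is sketched below for
discussion only: classify_rx_cb() and update_flow_table() are hypothetical
helpers, and only rte_eth_add_rx_callback() is existing ethdev API (17.05
prototype with uint8_t port_id).

#include <errno.h>

#include <rte_common.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#include "rte_flow_classify.h"

/* Hypothetical helper: a real implementation would parse the packet headers
 * and update (or create) the matching flow record; here it only bumps the
 * item count so the sketch stays self-contained. */
static void
update_flow_table(struct rte_flow_classify_config *cfg,
		struct rte_mbuf *pkt __rte_unused)
{
	cfg->flow_table_current_item_count++;
}

static uint16_t
classify_rx_cb(uint8_t port_id __rte_unused, uint16_t queue_id __rte_unused,
		struct rte_mbuf *pkts[], uint16_t nb_pkts,
		uint16_t max_pkts __rte_unused, void *user_param)
{
	struct rte_flow_classify_config *cfg = user_param;
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		update_flow_table(cfg, pkts[i]);

	/* The callback only observes packets, it never drops them. */
	return nb_pkts;
}

static int
register_rx_callbacks(uint8_t port_id, uint16_t nb_rx_queues,
		struct rte_flow_classify_config *cfg)
{
	uint16_t q;

	for (q = 0; q < nb_rx_queues; q++)
		if (rte_eth_add_rx_callback(port_id, q,
				classify_rx_cb, cfg) == NULL)
			return -EINVAL;

	return 0;
}
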