[dpdk-dev] [RFC 17.08] flow_classify: add librte_flow_classify library
Ananyev, Konstantin
konstantin.ananyev at intel.com
Wed May 17 18:10:17 CEST 2017
> > Hi Ferruh,
> > Please see my comments/questions below.
> > Thanks
> > Konstantin
> >
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * RTE Flow Classify Library
> >> + *
> >> + * This library provides flow record information with some measured properties.
> >> + *
> >> + * Application can select variety of flow types based on various flow keys.
> >> + *
> >> + * Library only maintains flow records between rte_flow_classify_stats_get()
> >> + * calls and with a maximum limit.
> >> + *
> >> + * Provided flow record will be linked list rte_flow_classify_stat_xxx
> >> + * structure.
> >> + *
> >> + * Library is responsible from allocating and freeing memory for flow record
> >> + * table. Previous table freed with next rte_flow_classify_stats_get() call and
> >> + * all tables are freed with rte_flow_classify_type_reset() or
> >> + * rte_flow_classify_type_set(x, 0). Memory for table allocated on the fly while
> >> + * creating records.
> >> + *
> >> + * A rte_flow_classify_type_set() with a valid type will register Rx/Tx
> >> + * callbacks and start filling flow record table.
> >> + * With rte_flow_classify_stats_get(), pointer sent to caller and meanwhile
> >> + * library continues collecting records.
> >> + *
> >> + * Usage:
> >> + * - application calls rte_flow_classify_type_set() for a device
> >> + * - library creates Rx/Tx callbacks for packets and start filling flow table
> >
> > Does it necessary to use an RX callback here?
> > Can library provide an API like collect(port_id, input_mbuf[], pkt_num) instead?
> > So the user would have a choice either setup a callback or call collect() directly.
>
> This was also comment from Morten, I will update RFC to use direct API call.
>
> >
> >> + * for that type of flow (currently only one flow type supported)
> >> + * - application calls rte_flow_classify_stats_get() to get pointer to linked
> >> + * listed flow table. Library assigns this pointer to another value and keeps
> >> + * collecting flow data. In next rte_flow_classify_stats_get(), library first
> >> + * free the previous table, and pass current table to the application, keep
> >> + * collecting data.
> >
> > Ok, but that means that you can't use stats_get() for the same type
> > from 2 different threads without explicit synchronization?
>
> Correct.
> And multiple threads shouldn't be calling this API. It doesn't store
> previous flow data, so multiple threads calling this only can have piece
> of information. Do you see any use case that multiple threads can call
> this API?
One example would be when you have multiple queues per port,
managed/monitored by different cores.
BTW, how are you going to collect the stats in that way?
>
> >
> >> + * - application calls rte_flow_classify_type_reset(), library unregisters the
> >> + * callbacks and free all flow table data.
> >> + *
> >> + */
> >> +
> >> +enum rte_flow_classify_type {
> >> + RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> >> + RTE_FLOW_CLASSIFY_TYPE_MAX,
> >> +};
> >> +
> >> +#define RTE_FLOW_CLASSIFY_TYPE_MASK = (((RTE_FLOW_CLASSIFY_TYPE_MAX - 1) << 1) - 1)
> >> +
> >> +/**
> >> + * Global configuration struct
> >> + */
> >> +struct rte_flow_classify_config {
> >> + uint32_t type; /* bitwise enum rte_flow_classify_type values */
> >> + void *flow_table_prev;
> >> + uint32_t flow_table_prev_item_count;
> >> + void *flow_table_current;
> >> + uint32_t flow_table_current_item_count;
> >> +} rte_flow_classify_config[RTE_MAX_ETHPORTS];
> >> +
> >> +#define RTE_FLOW_CLASSIFY_STAT_MAX UINT16_MAX
> >> +
> >> +/**
> >> + * Classification stats data struct
> >> + */
> >> +struct rte_flow_classify_stat_generic {
> >> + struct rte_flow_classify_stat_generic *next;
> >> + uint32_t id;
> >> + uint64_t timestamp;
> >> +
> >> + struct ether_addr src_mac;
> >> + struct ether_addr dst_mac;
> >> + uint32_t src_ipv4;
> >> + uint32_t dst_ipv4;
> >> + uint8_t l3_protocol_id;
> >> + uint16_t src_port;
> >> + uint16_t dst_port;
> >> +
> >> + uint64_t packet_count;
> >> + uint64_t packet_size; /* bytes */
> >> +};
> >
> > Ok, so if I understood things right, for generic type it will always classify all incoming packets by:
> > <src_mac, dst_mac, src_ipv4, dst_ipv4, l3_protocol_id, src_port, dst_port>
> > all by absolute values, and represent results as a linked list.
> > Is that correct, or I misunderstood your intentions here?
>
> Correct.
>
> > If so, then I see several disadvantages here:
> > 1) It is really hard to predict what kind of stats is required for that particular cases.
> > Let say some people would like to collect stat by <dst_mac,, vlan> ,
> > another by <dst_ipv4,subnet_mask>, third ones by <l4 dst_port> and so on.
> > Having just one hardcoded filter doesn't seem very felxable/usable.
> > I think you need to find a way to allow user to define what type of filter they want to apply.
>
> The flow type should be provided by applications, according their needs,
> and needs to be implemented in this library. The generic one will be the
> only one implemented in first version:
> enum rte_flow_classify_type {
> RTE_FLOW_CLASSIFY_TYPE_GENERIC = (1 << 0),
> RTE_FLOW_CLASSIFY_TYPE_MAX,
> };
>
>
> App should set the type first via the API:
> rte_flow_classify_type_set(uint8_t port_id, uint32_t type);
>
>
> And the stats for this type will be returned, because returned type can
> be different type of struct, returned as void:
> rte_flow_classify_stats_get(uint8_t port_id, void *stats);
I understand that, but it means that for every different filter user wants to use,
someone has to update the library: define a new type and write a new piece of code to handle it.
That seems not flexible and totally un-extendable from user perspective.
Even HW allows some flexibility with RX filters.
Why not allow user to specify a classification filter he/she wants for that particular case?
In a way both rte_flow and rte_acl work?
>
> > I think it was discussed already, but I still wonder why rte_flow_item can't be used for that approach?
>
>
> > 2) Even one 10G port can produce you ~14M rte_flow_classify_stat_generic entries in one second
> > (all packets have different ipv4/ports or so).
> > Accessing/retrieving items over linked list with 14M entries - doesn't sound like a good idea.
> > I'd say we need some better way to retrieve/present collected data.
>
> This is to keep flows, so I expect the numbers will be less comparing to
> the packet numbers.
That was an extreme example to show how bad the selected approach should behave.
What I am trying to say: we need a way to collect and retrieve stats in a quick and easy way.
Let say right now user invoked stats_get(port=0, type=generic).
Now, he is interested to get stats for particular dst_ip only.
The only way to get it: walk over whole list stats_get() returned and examine each entry one by one.
I think would be much better to have something like:
struct rte_flow_stats {timestamp; packet_count; packet_bytes; ..};
<fill rte_flow_item (or something else) to define desired filter>
filter_id = rte_flow_stats_register(.., &rte_flow_item);
....
struct rte_flow_stats stats;
rte_flow_stats_get(..., filter_id, &stats);
That allows user to define flows to collect stats for.
Again in that case you don't need to worry about when/where to destroy the previous
version of your stats.
Of course the open question is how to treat packets that would match more than one flow
(priority/insertion order/something else?), but I suppose we'll need to deal with that question anyway.
Konstantin
> It is possible to use fixed size arrays for this. But I think it is easy
> to make this switch later, I would like to see the performance effect
> before doing this switch. Do you think is it OK to start like this and
> give that decision during implementation?
More information about the dev
mailing list