[dpdk-dev,v3,1/3] lib: add Generic Receive Offload API framework

Message ID 1493021398-115955-2-git-send-email-jiayu.hu@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon

Checks

Context               Check     Description
ci/checkpatch         warning   coding style issues
ci/Intel-compilation  fail      Compilation issues

Commit Message

Hu, Jiayu April 24, 2017, 8:09 a.m. UTC
  Generic Receive Offload (GRO) is a widely used SW-based offloading
technique that reduces per-packet processing overhead. It gains performance
by reassembling small packets into large ones. This patchset adds GRO
support to DPDK; this patch implements the GRO API framework.

DPDK GRO is designed as a device capability, and the reassembly
procedure is transparent to applications. GRO is enabled and disabled
per port. To use DPDK GRO, applications just need to explicitly invoke
the GRO enable/disable functions for specific ports. When GRO is enabled
for a port, all of its queues perform GRO on received packets; when it is
disabled, none of them do.

DPDK GRO is implemented as a new library, which includes two parts. One
is the set of external functions that applications use to control GRO;
the other is the reassembly functions that merge packets of various
protocols.

For applications, DPDK GRO provides three external functions to
enable/disable GRO:
- rte_gro_init: initialize the GRO environment;
- rte_gro_enable: enable GRO for a given port;
- rte_gro_disable: disable GRO for a given port.
Before using GRO, applications should explicitly call rte_gro_init to
initialize the GRO environment. After that, applications can call
rte_gro_enable to enable GRO and rte_gro_disable to disable GRO for
specific ports.
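
A minimal usage sketch (not part of the patch; it assumes rte_gro_enable
and rte_gro_disable take only a port number, which may differ from the
actual prototypes in rte_gro.h):

#include <rte_ethdev.h>
#include <rte_gro.h>

static void
app_enable_gro(uint8_t port_id)
{
	/* one-time initialization of the GRO environment */
	rte_gro_init();

	/* after this call, all RX queues of port_id run GRO on
	 * received packets via the registered RX callbacks */
	rte_gro_enable(port_id);
}

static void
app_disable_gro(uint8_t port_id)
{
	/* removes the GRO RX callbacks from all queues of port_id */
	rte_gro_disable(port_id);
}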

DPDK GRO has a generic reassembly function, which processes all input
packets in burst mode. When a port enables GRO, this generic reassembly
function is registered as an RX callback for all queues of that port;
when the port disables GRO, all the callbacks of its queues are removed.
Therefore, the GRO procedure is performed in the ethdev layer.

In DPDK GRO, each specific protocol type has a corresponding reassembly
function, which tries to merge packets of that type. For example, the
TCP/IPv4 reassembly function is in charge of processing TCP/IPv4 packets.
The generic reassembly function calls these specific reassembly functions
according to packet type, and packets with unsupported protocol types are
not processed.
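
The dispatch could look roughly like the sketch below (an illustration
only, not the code in this patch; tcp4_reassemble() and its return
convention are made-up names):

static uint16_t
generic_reassemble(struct rte_mbuf **pkts, uint16_t nb_pkts, void *tbl)
{
	uint16_t i, nb_out = 0;

	for (i = 0; i < nb_pkts; i++) {
		uint32_t ptype = pkts[i]->packet_type;

		/* TCP/IPv4: try to merge into a previously seen packet;
		 * assume 0 means "merged, drop from the output array" */
		if (RTE_ETH_IS_IPV4_HDR(ptype) &&
				(ptype & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP &&
				tcp4_reassemble(tbl, pkts[i]) == 0)
			continue;

		/* unsupported type, or not merged: keep the packet */
		pkts[nb_out++] = pkts[i];
	}
	return nb_out;
}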

Signed-off-by: Jiayu Hu <jiayu.hu@intel.com>
---
 config/common_base              |   5 +
 lib/Makefile                    |   1 +
 lib/librte_gro/Makefile         |  50 +++++++++
 lib/librte_gro/rte_gro.c        | 219 ++++++++++++++++++++++++++++++++++++++++
 lib/librte_gro/rte_gro.h        |  29 ++++++
 lib/librte_gro/rte_gro_common.h |  75 ++++++++++++++
 mk/rte.app.mk                   |   1 +
 7 files changed, 380 insertions(+)
 create mode 100644 lib/librte_gro/Makefile
 create mode 100644 lib/librte_gro/rte_gro.c
 create mode 100644 lib/librte_gro/rte_gro.h
 create mode 100644 lib/librte_gro/rte_gro_common.h
  

Comments

Ananyev, Konstantin May 22, 2017, 9:19 a.m. UTC | #1
Hi Jiayu,
My comments/questions below.
Konstantin

> 
> For applications, DPDK GRO provides three external functions to
> enable/disable GRO:
> - rte_gro_init: initialize GRO environment;
> - rte_gro_enable: enable GRO for a given port;
> - rte_gro_disable: disable GRO for a given port.
> Before using GRO, applications should explicitly call rte_gro_init to
> initizalize GRO environment. After that, applications can call
> rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> specific ports.

I think this is too restrictive and wouldn't meet various users' needs.
User might want to:
- enable/disable GRO for particular RX queue
- or even setup different GRO types for different RX queues,
   i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
- For various reasons, user might prefer not to use RX callbacks,
  but invoke gro() manually at some point in his code.
- Many users would like to control size (number of flows/items per flow),
  max allowed packet size, max timeout, etc., for different GRO tables.
- User would need a way to flush all, or only timed-out, packets from particular GRO tables.

So I think that the API needs to be extended to become much more fine-grained.
Something like that:

struct rte_gro_tbl_param {
   int32_t socket_id;
   size_t max_flows;
   size_t max_items_per_flow;
   size_t max_pkt_size;
   uint64_t packet_timeout_cycles;
   <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
  <probably type specific params>
  ...
};

struct rte_gro_tbl;
struct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);

void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);

/*
 * process packets, might store some packets inside the GRO table,
 * returns number of filled entries in pkt[]
 */
uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);

/*
  * retrieves up to num timed-out packets from the table.
  */
uint32_t rte_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);

And if you like to keep them as helper functions:

int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
void rte_gro_port_disable(uint8_t port);

Though from my perspective, it is out of scope of that library,
and I'd leave it to the upper layer to decide which callbacks and
in what order have to be installed on particular port. 
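
To make the intent concrete, a rough usage sketch (everything below is
hypothetical - these functions and fields exist only in the proposal
above, and the sizes are arbitrary):

static void
queue_gro_example(uint8_t port, uint16_t queue)
{
	struct rte_gro_tbl_param param = {
		.socket_id = SOCKET_ID_ANY,
		.max_flows = 1024,
		.max_items_per_flow = 32,
		.max_pkt_size = 65535,
		.packet_timeout_cycles = rte_get_tsc_hz() / 100, /* ~10 ms */
	};
	struct rte_gro_tbl *tbl = rte_gro_tbl_create(&param);
	struct rte_mbuf *pkts[32];
	uint16_t nb_rx;
	uint32_t nb_out;

	nb_rx = rte_eth_rx_burst(port, queue, pkts, 32);
	/* may keep some packets inside tbl, waiting for more of the flow */
	nb_out = rte_gro_tbl_process(tbl, pkts, nb_rx);
	/* ... hand the nb_out packets in pkts[] to the upper layer ... */

	rte_gro_tbl_destroy(tbl);
}

(In a real application the table would of course be created once per
queue and reused across bursts, not created and destroyed per burst.)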

....

> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> new file mode 100644
> index 0000000..996b382
> --- /dev/null
> +++ b/lib/librte_gro/rte_gro.c
> @@ -0,0 +1,219 @@
> +#include <rte_ethdev.h>
> +#include <rte_mbuf.h>
> +#include <rte_hash.h>
> +#include <stdint.h>
> +#include <rte_malloc.h>
> +
> +#include "rte_gro.h"
> +#include "rte_gro_common.h"
> +
> +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};

Why not initialize these arrays properly at build time (here)?

> +
> +struct rte_gro_status *gro_status;
> +
> +/**
> + * Internal function. It creates one hashing table for all
> + * DPDK-supported GRO types, and all of them are stored in an object
> + * of struct rte_gro_tbl.
> + *
> + * @param name
> + *  Name for GRO lookup table
> + * @param nb_entries
> + *  Element number of each hashing table
> + * @param socket_id
> + *  socket id
> + * @param gro_tbl
> + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> + *  inside rte_gro_tbl_setup.
> + * @return
> + *  If create successfully, return a positive value; if not, return
> + *  a negative value.
> + */
> +static int
> +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> +{
> +	gro_tbl_create_fn create_tbl_fn;
> +	const uint32_t len = strlen(name) + 10;
> +	char tbl_name[len];
> +	int i;
> +
> +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> +		sprintf(tbl_name, "%s_%u", name, i);
> +		create_tbl_fn = tbl_create_functions[i];
> +		if (create_tbl_fn && (create_tbl_fn(name,
> +						nb_entries,
> +						socket_id,
> +						&(gro_tbl->
> +							lkp_tbls[i].hash_tbl))
> +					< 0)) {

Shouldn't you destroy already created tables here?

> +			return -1;
> +		}
> +		gro_tbl->lkp_tbls[i].gro_type = i;
> +	}
> +	return 1;
> +}
> +
> +/**
> + * Internal function. It frees all the hashing tables stored in
> + * the given struct rte_gro_tbl object.
> + */
> +static void
> +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> +{
> +	int i;
> +
> +	if (gro_tbl == NULL)
> +		return;
> +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> +	}
> +}
> +
> +/**
> + * Internal function. It performs all supported GRO types on inputted
> + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> + * packets. Packets of unsupported GRO types won't be processed. For
> + * ethernet devices, which want to support GRO, this function is used to
> + * registered as RX callback for all queues.
> + *
> + * @param pkts
> + *  Packets to reassemble.
> + * @param nb_pkts
> + *  The number of packets to reassemble.
> + * @param gro_tbl
> + *  pointer points to an object of struct rte_gro_tbl, which has been
> + *  initialized by rte_gro_tbl_setup.
> + * @return
> + *  Packet number after GRO. If reassemble successfully, the value is
> + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> + *  parameters are invalid, return 0.
> + */
> +static uint16_t
> +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> +		uint16_t queue __rte_unused,
> +		struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		uint16_t max_pkts __rte_unused,
> +		void *gro_tbl)
> +{
> +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> +		printf("invalid parameters for GRO.\n");
> +		return 0;
> +	}
> +	uint16_t nb_after_gro = nb_pkts;

Here and everywhere - please move variable definitions to the top of the block.

> +
> +	return nb_after_gro;
> +}
> +
> +void
> +rte_gro_init(void)
> +{
> +	uint8_t nb_port, i;
> +	uint16_t nb_queue;
> +	struct rte_eth_dev_info dev_info;
> +
> +	/* if init already, return immediately */
> +	if (gro_status) {
> +		printf("repeatly init GRO environment\n");
> +		return;
> +	}
> +
> +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> +			NULL,
> +			sizeof(struct rte_gro_status),
> +			0);
> +
> +	nb_port = rte_eth_dev_count();

Number of ports and number of configured queues per port can change dynamically.
In fact, I don't think we need that function and global gro_status at all.
To add/delete an RX callback for a particular port/queue, you can just scan over the existing callbacks,
searching for particular cb_func and cb_arg values. 

> +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> +			NULL,
> +			nb_port * sizeof(struct gro_port_status),
> +			0);
> +	gro_status->nb_port = nb_port;
> +
> +	for (i = 0; i < nb_port; i++) {
> +		rte_eth_dev_info_get(i, &dev_info);
> +		nb_queue = dev_info.nb_rx_queues;
> +		gro_status->ports[i].gro_tbls =
> +			(struct rte_gro_tbl **)rte_zmalloc(
> +					NULL,
> +					nb_queue * sizeof(struct rte_gro_tbl *),
> +					0);
> +		gro_status->ports[i].gro_cbs =
> +			(struct rte_eth_rxtx_callback **)
> +			rte_zmalloc(
> +					NULL,
> +					nb_queue *
> +					sizeof(struct rte_eth_rxtx_callback *),
> +					0);
> +	}
> +}
> +
> +}
....
> diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
....

> +
> +typedef int (*gro_tbl_create_fn)(
> +		char *name,
> +		uint32_t nb_entries,
> +		uint16_t socket_id,
> +		struct rte_hash **hash_tbl);
> +

If you have create_fn, then you'll probably need a destroy_fn too.
Second thing - why always assume that a particular GRO implementation will
always use rte_hash to store its data?
Probably better to hide that inside something neutral, like 'struct gro_data' or so. 

> +typedef int32_t (*gro_reassemble_fn)(
> +		struct rte_hash *hash_tbl,
> +		struct gro_item_list *item_list);
> +#endif
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index b5215c0..8956821 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> 
>  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> --
> 2.7.4
  
Hu, Jiayu May 23, 2017, 10:31 a.m. UTC | #2
Hi Konstantin,

Thanks for your comments. My replies/questions are below.

BRs,
Jiayu

On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> My comments/questions below.
> Konstantin
> 
> > 
> > For applications, DPDK GRO provides three external functions to
> > enable/disable GRO:
> > - rte_gro_init: initialize GRO environment;
> > - rte_gro_enable: enable GRO for a given port;
> > - rte_gro_disable: disable GRO for a given port.
> > Before using GRO, applications should explicitly call rte_gro_init to
> > initizalize GRO environment. After that, applications can call
> > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > specific ports.
> 
> I think this is too restrictive and wouldn't meet various user's needs.
> User might want to:
> - enable/disable GRO for particular RX queue
> - or even setup different GRO types for different RX queues,
>    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.

The reason for enabling/disabling GRO per port instead of per queue is that Linux
controls GRO per port. Controlling GRO per queue can indeed provide more flexibility
to applications. But are there any scenarios in which different queues of a port
require different GRO control (i.e. different GRO types, or GRO enabled/disabled per queue)?

> - For various reasons, user might prefer not to use RX callbacks for various reasons,
>   But invoke gro() manually at somepoint in his code.

A GRO library that applications call directly gives them more flexibility. Besides,
when GRO is performed in the ethdev layer or inside PMD drivers, it is unclear whether
rte_eth_rx_burst returns the number of actually received packets or the number of GROed
packets. The same issue happens with GSO, and even more seriously, because applications
like VPP always rely on the return value of rte_eth_tx_burst to decide further
operations. If applications can directly call the GRO/GSO libraries, this issue won't exist.
And DPDK is a library, not a holistic system like Linux, so we don't need to do
the same as Linux. Therefore, maybe it's a better idea to directly provide SW
segmentation/reassembly libraries to applications.

> - Many users would like to control size (number of flows/items per flow),
>   max allowed packet size, max timeout, etc., for different GRO tables.
> - User would need a way to flush all or only timeout packets from particular GRO tables.
> 
> So I think that API needs to extended to become be much more fine-grained.
> Something like that:
> 
> struct rte_gro_tbl_param {
>    int32_t socket_id;
>    size_t max_flows;
>    size_t max_items_per_flow;
>    size_t max_pkt_size;
>    uint64_t packet_timeout_cycles;
>    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
>   <probably type specific params>
>   ...
> };
> 
> struct rte_gro_tbl;
> strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> 
> void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);

Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?

> 
> /*
>  * process packets, might store some packets inside the GRO table,
>  * returns number of filled entries in pkt[]
>  */
> uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> 
> /*
>   * retirieves up to num timeouted packets from the table.
>   */
> uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);

Currently, we implement GRO as an RX callback, whose processing logic is simple:
receive a burst of packets -> perform GRO -> return to the application. GRO stops after
finishing processing the received packets. If we provide rte_gro_tbl_timeout, when
and by whom will it be called?

> 
> And if you like to keep them as helper functions:
> 
> int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> void rte_gro_port_disable(uint8_t port);
> 
> Though from my perspective, it is out of scope of that library,
> and I'd leave it to the upper layer to decide which callbacks and
> in what order have to be installed on particular port. 

In my opinion, the GRO library includes two parts. One is in charge of reassembling
packets; the other is in charge of implementing GRO as an RX callback. And
rte_gro_port_enable/disable belongs to the second part. You mean we should let
applications register the RX callback themselves, and the GRO library should just provide
the reassembly-related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?

> 
> ....
> 
> > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > new file mode 100644
> > index 0000000..996b382
> > --- /dev/null
> > +++ b/lib/librte_gro/rte_gro.c
> > @@ -0,0 +1,219 @@
> > +#include <rte_ethdev.h>
> > +#include <rte_mbuf.h>
> > +#include <rte_hash.h>
> > +#include <stdint.h>
> > +#include <rte_malloc.h>
> > +
> > +#include "rte_gro.h"
> > +#include "rte_gro_common.h"
> > +
> > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> 
> Why not to initialize these arrays properly at building time (here)?

Yes, I should declare them as static variables.

> 
> > +
> > +struct rte_gro_status *gro_status;
> > +
> > +/**
> > + * Internal function. It creates one hashing table for all
> > + * DPDK-supported GRO types, and all of them are stored in an object
> > + * of struct rte_gro_tbl.
> > + *
> > + * @param name
> > + *  Name for GRO lookup table
> > + * @param nb_entries
> > + *  Element number of each hashing table
> > + * @param socket_id
> > + *  socket id
> > + * @param gro_tbl
> > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > + *  inside rte_gro_tbl_setup.
> > + * @return
> > + *  If create successfully, return a positive value; if not, return
> > + *  a negative value.
> > + */
> > +static int
> > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > +{
> > +	gro_tbl_create_fn create_tbl_fn;
> > +	const uint32_t len = strlen(name) + 10;
> > +	char tbl_name[len];
> > +	int i;
> > +
> > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > +		sprintf(tbl_name, "%s_%u", name, i);
> > +		create_tbl_fn = tbl_create_functions[i];
> > +		if (create_tbl_fn && (create_tbl_fn(name,
> > +						nb_entries,
> > +						socket_id,
> > +						&(gro_tbl->
> > +							lkp_tbls[i].hash_tbl))
> > +					< 0)) {
> 
> Shouldn't you destroy already created tables here?

Yes, I should add code to destroy the created tables before creating new ones.

> 
> > +			return -1;
> > +		}
> > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > +	}
> > +	return 1;
> > +}
> > +
> > +/**
> > + * Internal function. It frees all the hashing tables stored in
> > + * the given struct rte_gro_tbl object.
> > + */
> > +static void
> > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > +{
> > +	int i;
> > +
> > +	if (gro_tbl == NULL)
> > +		return;
> > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > +	}
> > +}
> > +
> > +/**
> > + * Internal function. It performs all supported GRO types on inputted
> > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > + * packets. Packets of unsupported GRO types won't be processed. For
> > + * ethernet devices, which want to support GRO, this function is used to
> > + * registered as RX callback for all queues.
> > + *
> > + * @param pkts
> > + *  Packets to reassemble.
> > + * @param nb_pkts
> > + *  The number of packets to reassemble.
> > + * @param gro_tbl
> > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > + *  initialized by rte_gro_tbl_setup.
> > + * @return
> > + *  Packet number after GRO. If reassemble successfully, the value is
> > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > + *  parameters are invalid, return 0.
> > + */
> > +static uint16_t
> > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > +		uint16_t queue __rte_unused,
> > +		struct rte_mbuf **pkts,
> > +		uint16_t nb_pkts,
> > +		uint16_t max_pkts __rte_unused,
> > +		void *gro_tbl)
> > +{
> > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > +		printf("invalid parameters for GRO.\n");
> > +		return 0;
> > +	}
> > +	uint16_t nb_after_gro = nb_pkts;
> 
> Here and everywhere - please move variable definitions to the top of the block.

Thanks. I will modify them in next version.
 
> 
> > +
> > +	return nb_after_gro;
> > +}
> > +
> > +void
> > +rte_gro_init(void)
> > +{
> > +	uint8_t nb_port, i;
> > +	uint16_t nb_queue;
> > +	struct rte_eth_dev_info dev_info;
> > +
> > +	/* if init already, return immediately */
> > +	if (gro_status) {
> > +		printf("repeatly init GRO environment\n");
> > +		return;
> > +	}
> > +
> > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > +			NULL,
> > +			sizeof(struct rte_gro_status),
> > +			0);
> > +
> > +	nb_port = rte_eth_dev_count();
> 
> Number of ports and number of configured queues per port can change dynamically.
> In fact, I don't think we need that function and global gro_status at all.
> To add/delete RX callback for particular port/queue - you can just scan over exisiting callbacks
> Searching for particular cb_func and cb_arg values. 

Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
we can't remove an RX callback by searching for a particular cb_func inside the GRO library,
since these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
is not available to the GRO library. To my knowledge, the only way is to provide a new API in
lib/librte_ether/rte_ethdev.c that supports removing an RX callback via its cb_func or cb_arg values.
You mean we need to add this API?

> 
> > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > +			NULL,
> > +			nb_port * sizeof(struct gro_port_status),
> > +			0);
> > +	gro_status->nb_port = nb_port;
> > +
> > +	for (i = 0; i < nb_port; i++) {
> > +		rte_eth_dev_info_get(i, &dev_info);
> > +		nb_queue = dev_info.nb_rx_queues;
> > +		gro_status->ports[i].gro_tbls =
> > +			(struct rte_gro_tbl **)rte_zmalloc(
> > +					NULL,
> > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > +					0);
> > +		gro_status->ports[i].gro_cbs =
> > +			(struct rte_eth_rxtx_callback **)
> > +			rte_zmalloc(
> > +					NULL,
> > +					nb_queue *
> > +					sizeof(struct rte_eth_rxtx_callback *),
> > +					0);
> > +	}
> > +}
> > +
> > +}
> ....
> > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> ....
> 
> > +
> > +typedef int (*gro_tbl_create_fn)(
> > +		char *name,
> > +		uint32_t nb_entries,
> > +		uint16_t socket_id,
> > +		struct rte_hash **hash_tbl);
> > +
> 
> If you have create_fn, then you'll probably need a destroy_fn too.
> Second thing why always assume that particular GRO implementation would
> always use rte_hash to store it's data?
> Probably better hide that inside something neutral, like 'struct gro_data;' or so. 

Agree. I will modify it.

> 
> > +typedef int32_t (*gro_reassemble_fn)(
> > +		struct rte_hash *hash_tbl,
> > +		struct gro_item_list *item_list);
> > +#endif
> > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > index b5215c0..8956821 100644
> > --- a/mk/rte.app.mk
> > +++ b/mk/rte.app.mk
> > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > 
> >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > --
> > 2.7.4
  
Ananyev, Konstantin May 24, 2017, 12:38 p.m. UTC | #3
Hi Jiayu,

> 
> Hi Konstantin,
> 
> Thanks for your comments. My replies/questions are below.
> 
> BRs,
> Jiayu
> 
> On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > Hi Jiayu,
> > My comments/questions below.
> > Konstantin
> >
> > >
> > > For applications, DPDK GRO provides three external functions to
> > > enable/disable GRO:
> > > - rte_gro_init: initialize GRO environment;
> > > - rte_gro_enable: enable GRO for a given port;
> > > - rte_gro_disable: disable GRO for a given port.
> > > Before using GRO, applications should explicitly call rte_gro_init to
> > > initizalize GRO environment. After that, applications can call
> > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > specific ports.
> >
> > I think this is too restrictive and wouldn't meet various user's needs.
> > User might want to:
> > - enable/disable GRO for particular RX queue
> > - or even setup different GRO types for different RX queues,
> >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> 
> The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> to applications. But are there any scenarios that different queues of a port may
> require different GRO control (i.e. GRO types and enable/disable GRO)?

I think yes.

> 
> > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> >   But invoke gro() manually at somepoint in his code.
> 
> An application-used GRO library can enable more flexibility to applications. Besides,
> when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> the same issue happens in GSO, and even more seriously. This is because applications
> , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> segmentation/reassembling libraries to applications.
> 
> > - Many users would like to control size (number of flows/items per flow),
> >   max allowed packet size, max timeout, etc., for different GRO tables.
> > - User would need a way to flush all or only timeout packets from particular GRO tables.
> >
> > So I think that API needs to extended to become be much more fine-grained.
> > Something like that:
> >
> > struct rte_gro_tbl_param {
> >    int32_t socket_id;
> >    size_t max_flows;
> >    size_t max_items_per_flow;
> >    size_t max_pkt_size;
> >    uint64_t packet_timeout_cycles;
> >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> >   <probably type specific params>
> >   ...
> > };
> >
> > struct rte_gro_tbl;
> > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> >
> > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> 
> Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?

For any packet that sits in the gro_table for too long.

> 
> >
> > /*
> >  * process packets, might store some packets inside the GRO table,
> >  * returns number of filled entries in pkt[]
> >  */
> > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> >
> > /*
> >   * retirieves up to num timeouted packets from the table.
> >   */
> > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> 
> Currently, we implement GRO as RX callback, whose processing logic is simple:
> receive burst packets -> perform GRO -> return to application. GRO stops after
> finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> and who will call it?

I mean the following scenario:
We receive a packet, find it is eligible for GRO and put it into the gro_table
in the expectation that there will be more packets for the same flow.
But it could happen that we never (or only after some long time) receive
any new packets for that flow.
So the first packet would never be delivered to the upper layer,
or would be delivered too late.
So we need a mechanism to extract such unmerged packets
and push them to the upper layer.
My thought is that it would be the application's responsibility to call it from time to time.
That's actually another reason why I think we shouldn't force the application
to always use RX callbacks for GRO.
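
To illustrate, the application could call something like the sketch below
from its main loop every so often (rte_gro_tbl_timeout() is only the API
proposed earlier in this thread, not something the patch provides):

static void
flush_stale_packets(struct rte_gro_tbl *tbl)
{
	struct rte_mbuf *stale[32];
	uint32_t i, n;

	/* extract packets that have sat in the table past their timeout */
	n = rte_gro_tbl_timeout(tbl, rte_rdtsc(), stale, 32);
	for (i = 0; i < n; i++) {
		/* ... deliver stale[i] to the upper layer ... */
	}
}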

> 
> >
> > And if you like to keep them as helper functions:
> >
> > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> > void rte_gro_port_disable(uint8_t port);
> >
>
 > Though from my perspective, it is out of scope of that library,
> > and I'd leave it to the upper layer to decide which callbacks and
> > in what order have to be installed on particular port.
> 
> In my opinion, GRO library includes two parts. One is in charge of reassembling
> packets, the other is in charge of implementing GRO as RX callback. And
> rte_gro_port_enable/disable belongs to the second part. You mean we should let
> applications to register RX callback, and GRO library just provides reassembling
> related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?

Yes, that would be my preference.
Let the application developer decide how/when to
call the GRO functions (callback, direct call, some combination).

> 
> >
> > ....
> >
> > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > new file mode 100644
> > > index 0000000..996b382
> > > --- /dev/null
> > > +++ b/lib/librte_gro/rte_gro.c
> > > @@ -0,0 +1,219 @@
> > > +#include <rte_ethdev.h>
> > > +#include <rte_mbuf.h>
> > > +#include <rte_hash.h>
> > > +#include <stdint.h>
> > > +#include <rte_malloc.h>
> > > +
> > > +#include "rte_gro.h"
> > > +#include "rte_gro_common.h"
> > > +
> > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> >
> > Why not to initialize these arrays properly at building time (here)?
> 
> Yes, I should declare them as static variables.
> 
> >
> > > +
> > > +struct rte_gro_status *gro_status;
> > > +
> > > +/**
> > > + * Internal function. It creates one hashing table for all
> > > + * DPDK-supported GRO types, and all of them are stored in an object
> > > + * of struct rte_gro_tbl.
> > > + *
> > > + * @param name
> > > + *  Name for GRO lookup table
> > > + * @param nb_entries
> > > + *  Element number of each hashing table
> > > + * @param socket_id
> > > + *  socket id
> > > + * @param gro_tbl
> > > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > > + *  inside rte_gro_tbl_setup.
> > > + * @return
> > > + *  If create successfully, return a positive value; if not, return
> > > + *  a negative value.
> > > + */
> > > +static int
> > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > > +{
> > > +	gro_tbl_create_fn create_tbl_fn;
> > > +	const uint32_t len = strlen(name) + 10;
> > > +	char tbl_name[len];
> > > +	int i;
> > > +
> > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > +		sprintf(tbl_name, "%s_%u", name, i);
> > > +		create_tbl_fn = tbl_create_functions[i];
> > > +		if (create_tbl_fn && (create_tbl_fn(name,
> > > +						nb_entries,
> > > +						socket_id,
> > > +						&(gro_tbl->
> > > +							lkp_tbls[i].hash_tbl))
> > > +					< 0)) {
> >
> > Shouldn't you destroy already created tables here?
> 
> Yes, I should add codes to destory created tables before creating new ones.
> 
> >
> > > +			return -1;
> > > +		}
> > > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > > +	}
> > > +	return 1;
> > > +}
> > > +
> > > +/**
> > > + * Internal function. It frees all the hashing tables stored in
> > > + * the given struct rte_gro_tbl object.
> > > + */
> > > +static void
> > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > +{
> > > +	int i;
> > > +
> > > +	if (gro_tbl == NULL)
> > > +		return;
> > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > > +	}
> > > +}
> > > +
> > > +/**
> > > + * Internal function. It performs all supported GRO types on inputted
> > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > > + * packets. Packets of unsupported GRO types won't be processed. For
> > > + * ethernet devices, which want to support GRO, this function is used to
> > > + * registered as RX callback for all queues.
> > > + *
> > > + * @param pkts
> > > + *  Packets to reassemble.
> > > + * @param nb_pkts
> > > + *  The number of packets to reassemble.
> > > + * @param gro_tbl
> > > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > > + *  initialized by rte_gro_tbl_setup.
> > > + * @return
> > > + *  Packet number after GRO. If reassemble successfully, the value is
> > > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > > + *  parameters are invalid, return 0.
> > > + */
> > > +static uint16_t
> > > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > > +		uint16_t queue __rte_unused,
> > > +		struct rte_mbuf **pkts,
> > > +		uint16_t nb_pkts,
> > > +		uint16_t max_pkts __rte_unused,
> > > +		void *gro_tbl)
> > > +{
> > > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > > +		printf("invalid parameters for GRO.\n");
> > > +		return 0;
> > > +	}
> > > +	uint16_t nb_after_gro = nb_pkts;
> >
> > Here and everywhere - please move variable definitions to the top of the block.
> 
> Thanks. I will modify them in next version.
> 
> >
> > > +
> > > +	return nb_after_gro;
> > > +}
> > > +
> > > +void
> > > +rte_gro_init(void)
> > > +{
> > > +	uint8_t nb_port, i;
> > > +	uint16_t nb_queue;
> > > +	struct rte_eth_dev_info dev_info;
> > > +
> > > +	/* if init already, return immediately */
> > > +	if (gro_status) {
> > > +		printf("repeatly init GRO environment\n");
> > > +		return;
> > > +	}
> > > +
> > > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > > +			NULL,
> > > +			sizeof(struct rte_gro_status),
> > > +			0);
> > > +
> > > +	nb_port = rte_eth_dev_count();
> >
> > Number of ports and number of configured queues per port can change dynamically.
> > In fact, I don't think we need that function and global gro_status at all.
> > To add/delete RX callback for particular port/queue - you can just scan over exisiting callbacks
> > Searching for particular cb_func and cb_arg values.
> 
> Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
> we can't remove RX callback by searching for particular cb_func inside GRO library, since
> these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
> is unavailable to GRO library. To my knowledge, the only way is to provide a new API in
> lib/librte_ether/rte_ethdev.c to support removing RX callback via cb_func or cb_arg values.
> You mean we need to add this API?

Yes, my thought was to add something like:
rte_eth_find_rx_callback()              /* find a specific callback */
or rte_eth_remove_get_rx_callbacks()    /* get a copy of the list of all currently installed callbacks */

But if we move installing GRO callbacks out of the scope of the library, we probably won't need it.

> 
> >
> > > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > > +			NULL,
> > > +			nb_port * sizeof(struct gro_port_status),
> > > +			0);
> > > +	gro_status->nb_port = nb_port;
> > > +
> > > +	for (i = 0; i < nb_port; i++) {
> > > +		rte_eth_dev_info_get(i, &dev_info);
> > > +		nb_queue = dev_info.nb_rx_queues;
> > > +		gro_status->ports[i].gro_tbls =
> > > +			(struct rte_gro_tbl **)rte_zmalloc(
> > > +					NULL,
> > > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > > +					0);
> > > +		gro_status->ports[i].gro_cbs =
> > > +			(struct rte_eth_rxtx_callback **)
> > > +			rte_zmalloc(
> > > +					NULL,
> > > +					nb_queue *
> > > +					sizeof(struct rte_eth_rxtx_callback *),
> > > +					0);
> > > +	}
> > > +}
> > > +
> > > +}
> > ....
> > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> > ....
> >
> > > +
> > > +typedef int (*gro_tbl_create_fn)(
> > > +		char *name,
> > > +		uint32_t nb_entries,
> > > +		uint16_t socket_id,
> > > +		struct rte_hash **hash_tbl);
> > > +
> >
> > If you have create_fn, then you'll probably need a destroy_fn too.
> > Second thing why always assume that particular GRO implementation would
> > always use rte_hash to store it's data?
> > Probably better hide that inside something neutral, like 'struct gro_data;' or so.
> 
> Agree. I will modify it.
> 
> >
> > > +typedef int32_t (*gro_reassemble_fn)(
> > > +		struct rte_hash *hash_tbl,
> > > +		struct gro_item_list *item_list);
> > > +#endif
> > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > > index b5215c0..8956821 100644
> > > --- a/mk/rte.app.mk
> > > +++ b/mk/rte.app.mk
> > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > >
> > >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > > --
> > > 2.7.4
  
Hu, Jiayu May 26, 2017, 7:26 a.m. UTC | #4
Hi Konstantin,

On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> 
> Hi Jiayu,
> 
> > 
> > Hi Konstantin,
> > 
> > Thanks for your comments. My replies/questions are below.
> > 
> > BRs,
> > Jiayu
> > 
> > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > Hi Jiayu,
> > > My comments/questions below.
> > > Konstantin
> > >
> > > >
> > > > For applications, DPDK GRO provides three external functions to
> > > > enable/disable GRO:
> > > > - rte_gro_init: initialize GRO environment;
> > > > - rte_gro_enable: enable GRO for a given port;
> > > > - rte_gro_disable: disable GRO for a given port.
> > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > initizalize GRO environment. After that, applications can call
> > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > specific ports.
> > >
> > > I think this is too restrictive and wouldn't meet various user's needs.
> > > User might want to:
> > > - enable/disable GRO for particular RX queue
> > > - or even setup different GRO types for different RX queues,
> > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > 
> > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > to applications. But are there any scenarios that different queues of a port may
> > require different GRO control (i.e. GRO types and enable/disable GRO)?
> 
> I think yes.
> 
> > 
> > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > >   But invoke gro() manually at somepoint in his code.
> > 
> > An application-used GRO library can enable more flexibility to applications. Besides,
> > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > the same issue happens in GSO, and even more seriously. This is because applications
> > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > segmentation/reassembling libraries to applications.
> > 
> > > - Many users would like to control size (number of flows/items per flow),
> > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > >
> > > So I think that API needs to extended to become be much more fine-grained.
> > > Something like that:
> > >
> > > struct rte_gro_tbl_param {
> > >    int32_t socket_id;
> > >    size_t max_flows;
> > >    size_t max_items_per_flow;
> > >    size_t max_pkt_size;
> > >    uint64_t packet_timeout_cycles;
> > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > >   <probably type specific params>
> > >   ...
> > > };
> > >
> > > struct rte_gro_tbl;
> > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > >
> > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > 
> > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> 
> For any packets that sits in the gro_table for too long.
> 
> > 
> > >
> > > /*
> > >  * process packets, might store some packets inside the GRO table,
> > >  * returns number of filled entries in pkt[]
> > >  */
> > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > >
> > > /*
> > >   * retirieves up to num timeouted packets from the table.
> > >   */
> > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > 
> > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > receive burst packets -> perform GRO -> return to application. GRO stops after
> > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > and who will call it?
> 
> I mean the following scenario:
> We receive a packet, find it is eligible for GRO and put it into gro_table
> in expectation - there would be more packets for the same flow.
> But it could happen that we would never (or for some long time) receive
> any new packets for that flow.
> So the first packet would never be delivered to the upper layer,
> or delivered too late.
> So we need a mechanism to extract such not merged packets
> and push them to the upper layer.
> My thought it would be application responsibility to call it from time to time.
> That's actually another reason why I think we shouldn't use application
> to always use RX callbacks for GRO.

Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
which merges N input packets at a time. After finishing processing these
packets, it returns all of them and resets the hashing tables. Therefore, no
packets remain in the hashing tables after rte_gro_reassemble_burst returns.

If we provide rte_gro_tbl_timeout, we also need to provide another reassembly
function, like rte_gro_reassemble, which processes one given packet at a
time and doesn't reset the hashing tables. Applications then decide when to flush
packets from the hashing tables, and rte_gro_tbl_timeout is one of the ways to
flush them. Is that what you mean?
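
If so, the stateful variant would be used roughly as below
(rte_gro_reassemble() and rte_gro_tbl_timeout() are hypothetical names
from this discussion, not functions in the patch):

static uint32_t
stateful_gro_rx(struct rte_gro_tbl *tbl, struct rte_mbuf **pkts,
		uint16_t nb_rx, struct rte_mbuf **out, uint32_t out_sz)
{
	uint16_t i;

	/* each packet is either merged immediately or kept in the table,
	 * waiting for more packets of the same flow */
	for (i = 0; i < nb_rx; i++)
		rte_gro_reassemble(tbl, pkts[i]);

	/* the application decides when to flush; here only packets that
	 * have already timed out are pushed to the upper layer */
	return rte_gro_tbl_timeout(tbl, rte_rdtsc(), out, out_sz);
}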
> 
> > 
> > >
> > > And if you like to keep them as helper functions:
> > >
> > > int rte_gro_port_enable(uint8_t port, struct rte_gro_tbl_param *param);
> > > void rte_gro_port_disable(uint8_t port);
> > >
> >
>  > Though from my perspective, it is out of scope of that library,
> > > and I'd leave it to the upper layer to decide which callbacks and
> > > in what order have to be installed on particular port.
> > 
> > In my opinion, GRO library includes two parts. One is in charge of reassembling
> > packets, the other is in charge of implementing GRO as RX callback. And
> > rte_gro_port_enable/disable belongs to the second part. You mean we should let
> > applications to register RX callback, and GRO library just provides reassembling
> > related functions (e.g. rte_gro_tbl_process and rte_gro_tbl_create)?
> 
> Yes, that would be my preference.
> Let the application developer to decide how/when to
> call GRO functions (callback, direct call, some combination).
> 
> > 
> > >
> > > ....
> > >
> > > > diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> > > > new file mode 100644
> > > > index 0000000..996b382
> > > > --- /dev/null
> > > > +++ b/lib/librte_gro/rte_gro.c
> > > > @@ -0,0 +1,219 @@
> > > > +#include <rte_ethdev.h>
> > > > +#include <rte_mbuf.h>
> > > > +#include <rte_hash.h>
> > > > +#include <stdint.h>
> > > > +#include <rte_malloc.h>
> > > > +
> > > > +#include "rte_gro.h"
> > > > +#include "rte_gro_common.h"
> > > > +
> > > > +gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
> > > > +gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
> > >
> > > Why not to initialize these arrays properly at building time (here)?
> > 
> > Yes, I should declare them as static variables.
> > 
> > >
> > > > +
> > > > +struct rte_gro_status *gro_status;
> > > > +
> > > > +/**
> > > > + * Internal function. It creates one hashing table for all
> > > > + * DPDK-supported GRO types, and all of them are stored in an object
> > > > + * of struct rte_gro_tbl.
> > > > + *
> > > > + * @param name
> > > > + *  Name for GRO lookup table
> > > > + * @param nb_entries
> > > > + *  Element number of each hashing table
> > > > + * @param socket_id
> > > > + *  socket id
> > > > + * @param gro_tbl
> > > > + *  gro_tbl points to a rte_gro_tbl object, which will be initalized
> > > > + *  inside rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  If create successfully, return a positive value; if not, return
> > > > + *  a negative value.
> > > > + */
> > > > +static int
> > > > +rte_gro_tbl_setup(char *name, uint32_t nb_entries,
> > > > +		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +	gro_tbl_create_fn create_tbl_fn;
> > > > +	const uint32_t len = strlen(name) + 10;
> > > > +	char tbl_name[len];
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +		sprintf(tbl_name, "%s_%u", name, i);
> > > > +		create_tbl_fn = tbl_create_functions[i];
> > > > +		if (create_tbl_fn && (create_tbl_fn(name,
> > > > +						nb_entries,
> > > > +						socket_id,
> > > > +						&(gro_tbl->
> > > > +							lkp_tbls[i].hash_tbl))
> > > > +					< 0)) {
> > >
> > > Shouldn't you destroy already created tables here?
> > 
> > Yes, I should add codes to destory created tables before creating new ones.
> > 
> > >
> > > > +			return -1;
> > > > +		}
> > > > +		gro_tbl->lkp_tbls[i].gro_type = i;
> > > > +	}
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It frees all the hashing tables stored in
> > > > + * the given struct rte_gro_tbl object.
> > > > + */
> > > > +static void
> > > > +rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
> > > > +{
> > > > +	int i;
> > > > +
> > > > +	if (gro_tbl == NULL)
> > > > +		return;
> > > > +	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
> > > > +		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
> > > > +		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
> > > > +		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
> > > > +	}
> > > > +}
> > > > +
> > > > +/**
> > > > + * Internal function. It performs all supported GRO types on inputted
> > > > + * packets. For example, if current DPDK GRO supports TCP/IPv4 and
> > > > + * TCP/IPv6 GRO, this functions just reassembles TCP/IPv4 and TCP/IPv6
> > > > + * packets. Packets of unsupported GRO types won't be processed. For
> > > > + * ethernet devices, which want to support GRO, this function is used to
> > > > + * registered as RX callback for all queues.
> > > > + *
> > > > + * @param pkts
> > > > + *  Packets to reassemble.
> > > > + * @param nb_pkts
> > > > + *  The number of packets to reassemble.
> > > > + * @param gro_tbl
> > > > + *  pointer points to an object of struct rte_gro_tbl, which has been
> > > > + *  initialized by rte_gro_tbl_setup.
> > > > + * @return
> > > > + *  Packet number after GRO. If reassemble successfully, the value is
> > > > + *  less than nb_pkts; if not, the value is equal to nb_pkts. If the
> > > > + *  parameters are invalid, return 0.
> > > > + */
> > > > +static uint16_t
> > > > +rte_gro_reassemble_burst(uint8_t port __rte_unused,
> > > > +		uint16_t queue __rte_unused,
> > > > +		struct rte_mbuf **pkts,
> > > > +		uint16_t nb_pkts,
> > > > +		uint16_t max_pkts __rte_unused,
> > > > +		void *gro_tbl)
> > > > +{
> > > > +	if ((gro_tbl == NULL) || (pkts == NULL)) {
> > > > +		printf("invalid parameters for GRO.\n");
> > > > +		return 0;
> > > > +	}
> > > > +	uint16_t nb_after_gro = nb_pkts;
> > >
> > > Here and everywhere - please move variable definitions to the top of the block.
> > 
> > Thanks. I will modify them in next version.
> > 
> > >
> > > > +
> > > > +	return nb_after_gro;
> > > > +}
> > > > +
> > > > +void
> > > > +rte_gro_init(void)
> > > > +{
> > > > +	uint8_t nb_port, i;
> > > > +	uint16_t nb_queue;
> > > > +	struct rte_eth_dev_info dev_info;
> > > > +
> > > > +	/* if init already, return immediately */
> > > > +	if (gro_status) {
> > > > +		printf("repeatly init GRO environment\n");
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	gro_status = (struct rte_gro_status *)rte_zmalloc(
> > > > +			NULL,
> > > > +			sizeof(struct rte_gro_status),
> > > > +			0);
> > > > +
> > > > +	nb_port = rte_eth_dev_count();
> > >
> > > Number of ports and number of configured queues per port can change dynamically.
> > > In fact, I don't think we need that function and global gro_status at all.
> > > To add/delete RX callback for particular port/queue - you can just scan over exisiting callbacks
> > > Searching for particular cb_func and cb_arg values.
> > 
> > Yes, it's better to store these parameters (e.g. hashing table pointers) as cb_arg. But
> > we can't remove RX callback by searching for particular cb_func inside GRO library, since
> > these operations need locking protection and the lock variable (i.e. rte_eth_rx_cb_lock)
> > is unavailable to GRO library. To my knowledge, the only way is to provide a new API in
> > lib/librte_ether/rte_ethdev.c to support removing RX callback via cb_func or cb_arg values.
> > You mean we need to add this API?
> 
> Yes my though was to add something like:
> rte_eth_find_rx_callback()                        /*find specific callback *./
> or rte_eth_remove_get_rx_callbacks()  /*get a copy  of list of all currently installed callback */ 
> 
> But if we move installing GRO callbacks out of scope of the library, we probably wouldn't need it.
> 
> > 
> > >
> > > > +	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
> > > > +			NULL,
> > > > +			nb_port * sizeof(struct gro_port_status),
> > > > +			0);
> > > > +	gro_status->nb_port = nb_port;
> > > > +
> > > > +	for (i = 0; i < nb_port; i++) {
> > > > +		rte_eth_dev_info_get(i, &dev_info);
> > > > +		nb_queue = dev_info.nb_rx_queues;
> > > > +		gro_status->ports[i].gro_tbls =
> > > > +			(struct rte_gro_tbl **)rte_zmalloc(
> > > > +					NULL,
> > > > +					nb_queue * sizeof(struct rte_gro_tbl *),
> > > > +					0);
> > > > +		gro_status->ports[i].gro_cbs =
> > > > +			(struct rte_eth_rxtx_callback **)
> > > > +			rte_zmalloc(
> > > > +					NULL,
> > > > +					nb_queue *
> > > > +					sizeof(struct rte_eth_rxtx_callback *),
> > > > +					0);
> > > > +	}
> > > > +}
> > > > +
> > > > +}
> > > ....
> > > > diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
> > > ....
> > >
> > > > +
> > > > +typedef int (*gro_tbl_create_fn)(
> > > > +		char *name,
> > > > +		uint32_t nb_entries,
> > > > +		uint16_t socket_id,
> > > > +		struct rte_hash **hash_tbl);
> > > > +
> > >
> > > If you have create_fn, then you'll probably need a destroy_fn too.
> > > Second thing why always assume that particular GRO implementation would
> > > always use rte_hash to store it's data?
> > > Probably better hide that inside something neutral, like 'struct gro_data;' or so.
> > 
> > Agree. I will modify it.
> > 
> > >
> > > > +typedef int32_t (*gro_reassemble_fn)(
> > > > +		struct rte_hash *hash_tbl,
> > > > +		struct gro_item_list *item_list);
> > > > +#endif
> > > > diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> > > > index b5215c0..8956821 100644
> > > > --- a/mk/rte.app.mk
> > > > +++ b/mk/rte.app.mk
> > > > @@ -98,6 +98,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
> > > > +_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
> > > >
> > > >  ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
> > > >  _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni
> > > > --
> > > > 2.7.4
  
Ananyev, Konstantin May 26, 2017, 11:10 p.m. UTC | #5
Hi Jiayu,

> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, May 26, 2017 8:26 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> Hi Konstantin,
> 
> On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> >
> > Hi Jiayu,
> >
> > >
> > > Hi Konstantin,
> > >
> > > Thanks for your comments. My replies/questions are below.
> > >
> > > BRs,
> > > Jiayu
> > >
> > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > Hi Jiayu,
> > > > My comments/questions below.
> > > > Konstantin
> > > >
> > > > >
> > > > > For applications, DPDK GRO provides three external functions to
> > > > > enable/disable GRO:
> > > > > - rte_gro_init: initialize GRO environment;
> > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > initizalize GRO environment. After that, applications can call
> > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > specific ports.
> > > >
> > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > User might want to:
> > > > - enable/disable GRO for particular RX queue
> > > > - or even setup different GRO types for different RX queues,
> > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > >
> > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > to applications. But are there any scenarios that different queues of a port may
> > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> >
> > I think yes.
> >
> > >
> > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > >   But invoke gro() manually at somepoint in his code.
> > >
> > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > the same issue happens in GSO, and even more seriously. This is because applications
> > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > segmentation/reassembling libraries to applications.
> > >
> > > > - Many users would like to control size (number of flows/items per flow),
> > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > >
> > > > So I think that API needs to extended to become be much more fine-grained.
> > > > Something like that:
> > > >
> > > > struct rte_gro_tbl_param {
> > > >    int32_t socket_id;
> > > >    size_t max_flows;
> > > >    size_t max_items_per_flow;
> > > >    size_t max_pkt_size;
> > > >    uint64_t packet_timeout_cycles;
> > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > >   <probably type specific params>
> > > >   ...
> > > > };
> > > >
> > > > struct rte_gro_tbl;
> > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > >
> > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > >
> > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> >
> > For any packets that sits in the gro_table for too long.
> >
> > >
> > > >
> > > > /*
> > > >  * process packets, might store some packets inside the GRO table,
> > > >  * returns number of filled entries in pkt[]
> > > >  */
> > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > >
> > > > /*
> > > >   * retirieves up to num timeouted packets from the table.
> > > >   */
> > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > >
> > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > and who will call it?
> >
> > I mean the following scenario:
> > We receive a packet, find it is eligible for GRO and put it into gro_table
> > in expectation - there would be more packets for the same flow.
> > But it could happen that we would never (or for some long time) receive
> > any new packets for that flow.
> > So the first packet would never be delivered to the upper layer,
> > or delivered too late.
> > So we need a mechanism to extract such not merged packets
> > and push them to the upper layer.
> > My thought it would be application responsibility to call it from time to time.
> > That's actually another reason why I think we shouldn't use application
> > to always use RX callbacks for GRO.
> 
> Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> which merges N inputted packets at a time. After finishing processing these
> packets, it returns all of them and reset hashing tables. Therefore, there
> are no packets in hashing tables after rte_gro_reassemble_burst returns.

Ok, sorry I missed that part with rte_hash_reset().
So gro_reassemble_burst() performs only inline processing on the current input packets
and doesn't try to save packets for future merging, correct?
Such an approach indeed is much more lightweight and doesn't require any extra timeouts and flush().
So in my opinion let's keep it like that - nice and simple.
BTW, I think in that case we don't need any hash tables (or any other persistent structures) at all.
What we need is just a set of GROs (tcp4, tcp6, etc.) we want to perform on a given array of packets.
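Just to illustrate what I have in mind - a very rough sketch only, all the
names below are made up for illustration and are not the actual patch API:

#include <stdint.h>
#include <rte_mbuf.h>

/* desired GRO types, selected per call instead of per port */
#define GRO_TCP_IPV4    (1ULL << 0)
#define GRO_TCP_IPV6    (1ULL << 1)

struct gro_param {
        uint64_t gro_types;     /* ORed GRO_* flags to apply to this burst */
        uint32_t max_pkt_size;  /* stop merging beyond this packet size */
};

/*
 * Purely inline processing: merge what can be merged inside pkts[] and
 * return the number of packets left; no state is kept between calls.
 */
uint16_t gro_reassemble_burst(struct rte_mbuf **pkts, uint16_t nb_pkts,
                const struct gro_param *param);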

> 
> If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> function, like rte_gro_reassemble, which processes one given packet at a
> time and won't reset hashing tables. Applications decide when to flush packets
> in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> to flush the packets. Do you mean that?

Yes, that's what I meant, but as I said above - I think your approach is probably
preferable - it is much simpler and more lightweight.
Konstantin
  
Hu, Jiayu May 27, 2017, 3:41 a.m. UTC | #6
On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> Hi Jiayu,
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Friday, May 26, 2017 8:26 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > Hi Konstantin,
> > 
> > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > >
> > > Hi Jiayu,
> > >
> > > >
> > > > Hi Konstantin,
> > > >
> > > > Thanks for your comments. My replies/questions are below.
> > > >
> > > > BRs,
> > > > Jiayu
> > > >
> > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > Hi Jiayu,
> > > > > My comments/questions below.
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > enable/disable GRO:
> > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > initizalize GRO environment. After that, applications can call
> > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > specific ports.
> > > > >
> > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > User might want to:
> > > > > - enable/disable GRO for particular RX queue
> > > > > - or even setup different GRO types for different RX queues,
> > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > >
> > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > to applications. But are there any scenarios that different queues of a port may
> > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > >
> > > I think yes.
> > >
> > > >
> > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > >   But invoke gro() manually at somepoint in his code.
> > > >
> > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > segmentation/reassembling libraries to applications.
> > > >
> > > > > - Many users would like to control size (number of flows/items per flow),
> > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > >
> > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > Something like that:
> > > > >
> > > > > struct rte_gro_tbl_param {
> > > > >    int32_t socket_id;
> > > > >    size_t max_flows;
> > > > >    size_t max_items_per_flow;
> > > > >    size_t max_pkt_size;
> > > > >    uint64_t packet_timeout_cycles;
> > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > >   <probably type specific params>
> > > > >   ...
> > > > > };
> > > > >
> > > > > struct rte_gro_tbl;
> > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > >
> > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > >
> > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > >
> > > For any packets that sits in the gro_table for too long.
> > >
> > > >
> > > > >
> > > > > /*
> > > > >  * process packets, might store some packets inside the GRO table,
> > > > >  * returns number of filled entries in pkt[]
> > > > >  */
> > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > >
> > > > > /*
> > > > >   * retirieves up to num timeouted packets from the table.
> > > > >   */
> > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > >
> > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > and who will call it?
> > >
> > > I mean the following scenario:
> > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > in expectation - there would be more packets for the same flow.
> > > But it could happen that we would never (or for some long time) receive
> > > any new packets for that flow.
> > > So the first packet would never be delivered to the upper layer,
> > > or delivered too late.
> > > So we need a mechanism to extract such not merged packets
> > > and push them to the upper layer.
> > > My thought it would be application responsibility to call it from time to time.
> > > That's actually another reason why I think we shouldn't use application
> > > to always use RX callbacks for GRO.
> > 
> > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > which merges N inputted packets at a time. After finishing processing these
> > packets, it returns all of them and reset hashing tables. Therefore, there
> > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> 
> Ok, sorry I missed that part with rte_hash_reset().
> So gro_ressemble_burst() performs only inline processing on current input packets
> and doesn't try to save packets for future merging, correct?

Yes.

> Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> So my opinion let's keep it like that - nice and simple.
> BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets. 

Besides the GRO types that are desired to perform, maybe it also needs max_pkt_size and
some GRO type specific information?

> 
> > 
> > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > function, like rte_gro_reassemble, which processes one given packet at a
> > time and won't reset hashing tables. Applications decide when to flush packets
> > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > to flush the packets. Do you mean that?
> 
> Yes, that's what I meant, but as I said above - I think your approach is probably
> more preferable - it is much simpler and lightweight.
> Konstantin
>
  
Ananyev, Konstantin May 27, 2017, 11:12 a.m. UTC | #7
> -----Original Message-----
> From: Hu, Jiayu
> Sent: Saturday, May 27, 2017 4:42 AM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > Hi Jiayu,
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Friday, May 26, 2017 8:26 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > Hi Konstantin,
> > >
> > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > > Hi Jiayu,
> > > >
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > Thanks for your comments. My replies/questions are below.
> > > > >
> > > > > BRs,
> > > > > Jiayu
> > > > >
> > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > > My comments/questions below.
> > > > > > Konstantin
> > > > > >
> > > > > > >
> > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > enable/disable GRO:
> > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > specific ports.
> > > > > >
> > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > User might want to:
> > > > > > - enable/disable GRO for particular RX queue
> > > > > > - or even setup different GRO types for different RX queues,
> > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > >
> > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > >
> > > > I think yes.
> > > >
> > > > >
> > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > >   But invoke gro() manually at somepoint in his code.
> > > > >
> > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > segmentation/reassembling libraries to applications.
> > > > >
> > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > >
> > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > Something like that:
> > > > > >
> > > > > > struct rte_gro_tbl_param {
> > > > > >    int32_t socket_id;
> > > > > >    size_t max_flows;
> > > > > >    size_t max_items_per_flow;
> > > > > >    size_t max_pkt_size;
> > > > > >    uint64_t packet_timeout_cycles;
> > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > >   <probably type specific params>
> > > > > >   ...
> > > > > > };
> > > > > >
> > > > > > struct rte_gro_tbl;
> > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > >
> > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > >
> > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > >
> > > > For any packets that sits in the gro_table for too long.
> > > >
> > > > >
> > > > > >
> > > > > > /*
> > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > >  * returns number of filled entries in pkt[]
> > > > > >  */
> > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > >
> > > > > > /*
> > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > >   */
> > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > >
> > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > and who will call it?
> > > >
> > > > I mean the following scenario:
> > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > in expectation - there would be more packets for the same flow.
> > > > But it could happen that we would never (or for some long time) receive
> > > > any new packets for that flow.
> > > > So the first packet would never be delivered to the upper layer,
> > > > or delivered too late.
> > > > So we need a mechanism to extract such not merged packets
> > > > and push them to the upper layer.
> > > > My thought it would be application responsibility to call it from time to time.
> > > > That's actually another reason why I think we shouldn't use application
> > > > to always use RX callbacks for GRO.
> > >
> > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > which merges N inputted packets at a time. After finishing processing these
> > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> >
> > Ok, sorry I missed that part with rte_hash_reset().
> > So gro_ressemble_burst() performs only inline processing on current input packets
> > and doesn't try to save packets for future merging, correct?
> 
> Yes.
> 
> > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > So my opinion let's keep it like that - nice and simple.
> > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> 
> Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> some GRO type specific information?

Yes, but we don't need the actual hash-tables, etc. inside.
Passing something like struct gro_param seems enough.

> 
> >
> > >
> > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > function, like rte_gro_reassemble, which processes one given packet at a
> > > time and won't reset hashing tables. Applications decide when to flush packets
> > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > to flush the packets. Do you mean that?
> >
> > Yes, that's what I meant, but as I said above - I think your approach is probably
> > more preferable - it is much simpler and lightweight.
> > Konstantin
> >
  
Hu, Jiayu May 27, 2017, 2:09 p.m. UTC | #8
Hi Konstantin,

On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Hu, Jiayu
> > Sent: Saturday, May 27, 2017 4:42 AM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > Hi Jiayu,
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > > Hi Jiayu,
> > > > >
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > Thanks for your comments. My replies/questions are below.
> > > > > >
> > > > > > BRs,
> > > > > > Jiayu
> > > > > >
> > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > Hi Jiayu,
> > > > > > > My comments/questions below.
> > > > > > > Konstantin
> > > > > > >
> > > > > > > >
> > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > enable/disable GRO:
> > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > > specific ports.
> > > > > > >
> > > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > > User might want to:
> > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > > >
> > > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > > >
> > > > > I think yes.
> > > > >
> > > > > >
> > > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > >
> > > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > > segmentation/reassembling libraries to applications.
> > > > > >
> > > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > > >
> > > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > > Something like that:
> > > > > > >
> > > > > > > struct rte_gro_tbl_param {
> > > > > > >    int32_t socket_id;
> > > > > > >    size_t max_flows;
> > > > > > >    size_t max_items_per_flow;
> > > > > > >    size_t max_pkt_size;
> > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > >   <probably type specific params>
> > > > > > >   ...
> > > > > > > };
> > > > > > >
> > > > > > > struct rte_gro_tbl;
> > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > >
> > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > >
> > > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > > >
> > > > > For any packets that sits in the gro_table for too long.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > /*
> > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > >  * returns number of filled entries in pkt[]
> > > > > > >  */
> > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > >
> > > > > > > /*
> > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > >   */
> > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > >
> > > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > > and who will call it?
> > > > >
> > > > > I mean the following scenario:
> > > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > > in expectation - there would be more packets for the same flow.
> > > > > But it could happen that we would never (or for some long time) receive
> > > > > any new packets for that flow.
> > > > > So the first packet would never be delivered to the upper layer,
> > > > > or delivered too late.
> > > > > So we need a mechanism to extract such not merged packets
> > > > > and push them to the upper layer.
> > > > > My thought it would be application responsibility to call it from time to time.
> > > > > That's actually another reason why I think we shouldn't use application
> > > > > to always use RX callbacks for GRO.
> > > >
> > > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > > which merges N inputted packets at a time. After finishing processing these
> > > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> > >
> > > Ok, sorry I missed that part with rte_hash_reset().
> > > So gro_ressemble_burst() performs only inline processing on current input packets
> > > and doesn't try to save packets for future merging, correct?
> > 
> > Yes.
> > 
> > > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > > So my opinion let's keep it like that - nice and simple.
> > > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> > 
> > Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> > some GRO type specific information?
> 
> Yes, but we don't need the actual hash-tables, etc. inside.
> Passing something like struct gro_param seems enough.

Yes, we can just pass gro_param and allocate hashing tables
inside rte_gro_reassemble_burst. If so, hashing tables of the
desired GRO types are created and freed in each invocation
of rte_gro_reassemble_burst. In the GRO library, hashing tables
are created by GRO type specific gro_tbl_create_fn functions,
which may allocate hashing table space via malloc (or
rte_malloc). Therefore, we may frequently call malloc/free
when using rte_gro_reassemble_burst. In my opinion, this will
degrade GRO performance greatly.

But if we ask applications to pass in hashing tables, all we
need to do is reset them once rte_gro_reassemble_burst has
finished using them, rather than malloc and free them each
time. Therefore, I think this way is more efficient. What do
you think?
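To make the difference concrete, below is a rough sketch of what I mean
by reusing an application-provided table. gro_tbl_create,
gro_reassemble_burst and the structures here are illustrative names
only, not the current patch code:

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* illustrative declarations -- not the current patch API */
struct gro_param;
struct gro_tbl;
struct gro_tbl *gro_tbl_create(const struct gro_param *param);
uint16_t gro_reassemble_burst(struct rte_mbuf **pkts, uint16_t nb_pkts,
                struct gro_tbl *tbl);

#define BURST 32

void
rx_loop(uint8_t port, uint16_t queue, const struct gro_param *param)
{
        struct rte_mbuf *pkts[BURST];
        /* the table is created once at init time; gro_reassemble_burst()
         * only resets it, so the RX hot path never calls malloc()/free() */
        struct gro_tbl *tbl = gro_tbl_create(param);

        for (;;) {
                uint16_t nb = rte_eth_rx_burst(port, queue, pkts, BURST);

                /* merge what can be merged within this burst, reset tbl */
                nb = gro_reassemble_burst(pkts, nb, tbl);

                /* ... hand the nb (possibly merged) packets to the app ... */
        }
}

If the table were instead allocated inside gro_reassemble_burst, the
create/destroy pair would run on every single burst.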

> 
> > 
> > >
> > > >
> > > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > > function, like rte_gro_reassemble, which processes one given packet at a
> > > > time and won't reset hashing tables. Applications decide when to flush packets
> > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > > to flush the packets. Do you mean that?
> > >
> > > Yes, that's what I meant, but as I said above - I think your approach is probably
> > > more preferable - it is much simpler and lightweight.
> > > Konstantin
> > >
  
Ananyev, Konstantin May 27, 2017, 4:51 p.m. UTC | #9
Hi Jiayu,

> 
> Hi Konstantin,
> 
> On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Hu, Jiayu
> > > Sent: Saturday, May 27, 2017 4:42 AM
> > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > >
> > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > Hi Jiayu,
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > > > >
> > > > > Hi Konstantin,
> > > > >
> > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin wrote:
> > > > > >
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > >
> > > > > > > BRs,
> > > > > > > Jiayu
> > > > > > >
> > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev, Konstantin wrote:
> > > > > > > > Hi Jiayu,
> > > > > > > > My comments/questions below.
> > > > > > > > Konstantin
> > > > > > > >
> > > > > > > > >
> > > > > > > > > For applications, DPDK GRO provides three external functions to
> > > > > > > > > enable/disable GRO:
> > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > Before using GRO, applications should explicitly call rte_gro_init to
> > > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to disable GRO for
> > > > > > > > > specific ports.
> > > > > > > >
> > > > > > > > I think this is too restrictive and wouldn't meet various user's needs.
> > > > > > > > User might want to:
> > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over IPV6/TCP for queue 1, etc.
> > > > > > >
> > > > > > > The reason for enabling/disabling GRO per-port instead of per-queue is that LINUX
> > > > > > > controls GRO per-port. To control GRO per-queue indeed can provide more flexibility
> > > > > > > to applications. But are there any scenarios that different queues of a port may
> > > > > > > require different GRO control (i.e. GRO types and enable/disable GRO)?
> > > > > >
> > > > > > I think yes.
> > > > > >
> > > > > > >
> > > > > > > > - For various reasons, user might prefer not to use RX callbacks for various reasons,
> > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > >
> > > > > > > An application-used GRO library can enable more flexibility to applications. Besides,
> > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an issue that
> > > > > > > rte_eth_rx_burst returns actually received packet number or GROed packet number. And
> > > > > > > the same issue happens in GSO, and even more seriously. This is because applications
> > > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to decide further
> > > > > > > operations. If applications can direcly call GRO/GSO libraries, this issue won't exist.
> > > > > > > And DPDK is a library, which is not a holistic system like LINUX. We don't need to do
> > > > > > > the same as LINUX. Therefore, maybe it's a better idea to directly provide SW
> > > > > > > segmentation/reassembling libraries to applications.
> > > > > > >
> > > > > > > > - Many users would like to control size (number of flows/items per flow),
> > > > > > > >   max allowed packet size, max timeout, etc., for different GRO tables.
> > > > > > > > - User would need a way to flush all or only timeout packets from particular GRO tables.
> > > > > > > >
> > > > > > > > So I think that API needs to extended to become be much more fine-grained.
> > > > > > > > Something like that:
> > > > > > > >
> > > > > > > > struct rte_gro_tbl_param {
> > > > > > > >    int32_t socket_id;
> > > > > > > >    size_t max_flows;
> > > > > > > >    size_t max_items_per_flow;
> > > > > > > >    size_t max_pkt_size;
> > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > >   <probably type specific params>
> > > > > > > >   ...
> > > > > > > > };
> > > > > > > >
> > > > > > > > struct rte_gro_tbl;
> > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct rte_gro_tbl_param *param);
> > > > > > > >
> > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > >
> > > > > > > Yes, I agree with you. It's necessary to provide more fine-grained control APIs to
> > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it for TCP packets?
> > > > > >
> > > > > > For any packets that sits in the gro_table for too long.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > /*
> > > > > > > >  * process packets, might store some packets inside the GRO table,
> > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > >  */
> > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > >
> > > > > > > > /*
> > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > >   */
> > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > >
> > > > > > > Currently, we implement GRO as RX callback, whose processing logic is simple:
> > > > > > > receive burst packets -> perform GRO -> return to application. GRO stops after
> > > > > > > finishing processing received packets. If we provide rte_gro_tbl_timeout, when
> > > > > > > and who will call it?
> > > > > >
> > > > > > I mean the following scenario:
> > > > > > We receive a packet, find it is eligible for GRO and put it into gro_table
> > > > > > in expectation - there would be more packets for the same flow.
> > > > > > But it could happen that we would never (or for some long time) receive
> > > > > > any new packets for that flow.
> > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > or delivered too late.
> > > > > > So we need a mechanism to extract such not merged packets
> > > > > > and push them to the upper layer.
> > > > > > My thought it would be application responsibility to call it from time to time.
> > > > > > That's actually another reason why I think we shouldn't use application
> > > > > > to always use RX callbacks for GRO.
> > > > >
> > > > > Currently, we only provide one reassembly function, rte_gro_reassemble_burst,
> > > > > which merges N inputted packets at a time. After finishing processing these
> > > > > packets, it returns all of them and reset hashing tables. Therefore, there
> > > > > are no packets in hashing tables after rte_gro_reassemble_burst returns.
> > > >
> > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > So gro_ressemble_burst() performs only inline processing on current input packets
> > > > and doesn't try to save packets for future merging, correct?
> > >
> > > Yes.
> > >
> > > > Such approach indeed is much lightweight and doesn't require any extra timeouts and flush().
> > > > So my opinion let's keep it like that - nice and simple.
> > > > BTW, I think in that case we don't need any hashtables (or any other persistent strucures)at all.
> > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to perform on given array of packets.
> > >
> > > Beside GRO types that are desired to perform, maybe it also needs max_pkt_size and
> > > some GRO type specific information?
> >
> > Yes, but we don't need the actual hash-tables, etc. inside.
> > Passing something like struct gro_param seems enough.
> 
> Yes, we can just pass gro_param and allocate hashing tables
> inside rte_gro_reassemble_burst. If so, hashing tables of
> desired GRO types are created and freed in each invocation
> of rte_gro_reassemble_burst. In GRO library, hashing tables
> are created by GRO type specific gro_tbl_create_fn. These
> gro_tbl_create_fn may allocate hashing table space via malloc
> (or rte_malloc). Therefore, we may frequently call malloc/free
> when using rte_gro_reassemble_burst. In my opinion, it will
> degrade GRO performance greatly.

I don't understand why we need to put/extract each packet into/from a hash table at all.
We have N input packets that need to be grouped/sorted by some criteria.
Surely that can be done without any hash table involved.
What is the need for a hash table and all the overhead it brings here?
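For a burst of 32 or 64 packets even a trivial linear scan over the flows
already seen in that burst should be cheap enough. Something like the
sketch below - extract_key() and try_merge() just stand for the protocol
specific code and are not real functions, and the key is simplified:

#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>

#define MAX_BURST 64

/* minimal merge criteria for the sketch (a real key needs more fields) */
struct flow_key {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
};

/* placeholders for the protocol specific code: header parsing and the
 * actual seq/len checks plus mbuf chaining */
int extract_key(const struct rte_mbuf *m, struct flow_key *k);
int try_merge(struct rte_mbuf *head, struct rte_mbuf *m);

/* group/merge one burst (nb <= MAX_BURST) without any persistent state */
uint16_t
merge_burst(struct rte_mbuf **pkts, uint16_t nb)
{
        struct flow_key keys[MAX_BURST];
        struct rte_mbuf *heads[MAX_BURST];
        uint8_t has_key[MAX_BURST];
        uint16_t i, j, nb_out = 0;

        for (i = 0; i < nb; i++) {
                struct flow_key k;
                int eligible = (extract_key(pkts[i], &k) == 0);

                j = nb_out;
                if (eligible) {
                        /* linear scan over the flows already seen */
                        for (j = 0; j < nb_out; j++)
                                if (has_key[j] &&
                                    memcmp(&keys[j], &k, sizeof(k)) == 0 &&
                                    try_merge(heads[j], pkts[i]) == 0)
                                        break;
                }
                if (j == nb_out) {      /* nothing merged: keep as new head */
                        heads[nb_out] = pkts[i];
                        has_key[nb_out] = eligible;
                        if (eligible)
                                keys[nb_out] = k;
                        nb_out++;
                }
        }
        memcpy(pkts, heads, nb_out * sizeof(heads[0]));
        return nb_out;
}

With nb in the 32-64 range the inner loop touches at most a few dozen keys
per packet, so I would expect it to be cheaper than hashing every packet.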
Konstantin

> 
> But if we ask applications to input hashing tables, what we
> need to do is to reset them after finishing using in
> rte_gro_reassemble_burst, rather than to malloc and free each
> time. Therefore, I think this way is more efficient. How do you
> think?
> 
> >
> > >
> > > >
> > > > >
> > > > > If we provide rte_gro_tbl_timeout, we also need to provide another reassmebly
> > > > > function, like rte_gro_reassemble, which processes one given packet at a
> > > > > time and won't reset hashing tables. Applications decide when to flush packets
> > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that can be used
> > > > > to flush the packets. Do you mean that?
> > > >
> > > > Yes, that's what I meant, but as I said above - I think your approach is probably
> > > > more preferable - it is much simpler and lightweight.
> > > > Konstantin
> > > >
  
Hu, Jiayu May 29, 2017, 10:22 a.m. UTC | #10
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Sunday, May 28, 2017 12:51 AM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> 
> 
> Hi Jiayu,
> 
> >
> > Hi Konstantin,
> >
> > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Hu, Jiayu
> > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > > >
> > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > Hi Jiayu,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > > > > >
> > > > > > Hi Konstantin,
> > > > > >
> > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> wrote:
> > > > > > >
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > >
> > > > > > > > BRs,
> > > > > > > > Jiayu
> > > > > > > >
> > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> Konstantin wrote:
> > > > > > > > > Hi Jiayu,
> > > > > > > > > My comments/questions below.
> > > > > > > > > Konstantin
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > For applications, DPDK GRO provides three external functions
> to
> > > > > > > > > > enable/disable GRO:
> > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > Before using GRO, applications should explicitly call
> rte_gro_init to
> > > > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> disable GRO for
> > > > > > > > > > specific ports.
> > > > > > > > >
> > > > > > > > > I think this is too restrictive and wouldn't meet various user's
> needs.
> > > > > > > > > User might want to:
> > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> IPV6/TCP for queue 1, etc.
> > > > > > > >
> > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> queue is that LINUX
> > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> provide more flexibility
> > > > > > > > to applications. But are there any scenarios that different
> queues of a port may
> > > > > > > > require different GRO control (i.e. GRO types and enable/disable
> GRO)?
> > > > > > >
> > > > > > > I think yes.
> > > > > > >
> > > > > > > >
> > > > > > > > > - For various reasons, user might prefer not to use RX callbacks
> for various reasons,
> > > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > > >
> > > > > > > > An application-used GRO library can enable more flexibility to
> applications. Besides,
> > > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is an
> issue that
> > > > > > > > rte_eth_rx_burst returns actually received packet number or
> GROed packet number. And
> > > > > > > > the same issue happens in GSO, and even more seriously. This is
> because applications
> > > > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst to
> decide further
> > > > > > > > operations. If applications can direcly call GRO/GSO libraries,
> this issue won't exist.
> > > > > > > > And DPDK is a library, which is not a holistic system like LINUX.
> We don't need to do
> > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> directly provide SW
> > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > >
> > > > > > > > > - Many users would like to control size (number of flows/items
> per flow),
> > > > > > > > >   max allowed packet size, max timeout, etc., for different GRO
> tables.
> > > > > > > > > - User would need a way to flush all or only timeout packets
> from particular GRO tables.
> > > > > > > > >
> > > > > > > > > So I think that API needs to extended to become be much more
> fine-grained.
> > > > > > > > > Something like that:
> > > > > > > > >
> > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > >    int32_t socket_id;
> > > > > > > > >    size_t max_flows;
> > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > >    size_t max_pkt_size;
> > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > >   <probably type specific params>
> > > > > > > > >   ...
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct
> rte_gro_tbl_param *param);
> > > > > > > > >
> > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > >
> > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> grained control APIs to
> > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is it
> for TCP packets?
> > > > > > >
> > > > > > > For any packets that sits in the gro_table for too long.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > >  * process packets, might store some packets inside the GRO
> table,
> > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > >  */
> > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> rte_mbuf *pkt[], uint32_t num);
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > > >   */
> > > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl, uint64_t
> tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > >
> > > > > > > > Currently, we implement GRO as RX callback, whose processing
> logic is simple:
> > > > > > > > receive burst packets -> perform GRO -> return to application.
> GRO stops after
> > > > > > > > finishing processing received packets. If we provide
> rte_gro_tbl_timeout, when
> > > > > > > > and who will call it?
> > > > > > >
> > > > > > > I mean the following scenario:
> > > > > > > We receive a packet, find it is eligible for GRO and put it into
> gro_table
> > > > > > > in expectation - there would be more packets for the same flow.
> > > > > > > But it could happen that we would never (or for some long time)
> receive
> > > > > > > any new packets for that flow.
> > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > or delivered too late.
> > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > and push them to the upper layer.
> > > > > > > My thought it would be application responsibility to call it from
> time to time.
> > > > > > > That's actually another reason why I think we shouldn't use
> application
> > > > > > > to always use RX callbacks for GRO.
> > > > > >
> > > > > > Currently, we only provide one reassembly function,
> rte_gro_reassemble_burst,
> > > > > > which merges N inputted packets at a time. After finishing
> processing these
> > > > > > packets, it returns all of them and reset hashing tables. Therefore,
> there
> > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> returns.
> > > > >
> > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > So gro_ressemble_burst() performs only inline processing on current
> input packets
> > > > > and doesn't try to save packets for future merging, correct?
> > > >
> > > > Yes.
> > > >
> > > > > Such approach indeed is much lightweight and doesn't require any
> extra timeouts and flush().
> > > > > So my opinion let's keep it like that - nice and simple.
> > > > > BTW, I think in that case we don't need any hashtables (or any other
> persistent strucures)at all.
> > > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to
> perform on given array of packets.
> > > >
> > > > Beside GRO types that are desired to perform, maybe it also needs
> max_pkt_size and
> > > > some GRO type specific information?
> > >
> > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > Passing something like struct gro_param seems enough.
> >
> > Yes, we can just pass gro_param and allocate hashing tables
> > inside rte_gro_reassemble_burst. If so, hashing tables of
> > desired GRO types are created and freed in each invocation
> > of rte_gro_reassemble_burst. In GRO library, hashing tables
> > are created by GRO type specific gro_tbl_create_fn. These
> > gro_tbl_create_fn may allocate hashing table space via malloc
> > (or rte_malloc). Therefore, we may frequently call malloc/free
> > when using rte_gro_reassemble_burst. In my opinion, it will
> > degrade GRO performance greatly.
> 
> I don't' understand why do we need to put/extract each packet into/from
> hash table at all.
> We have N input packets that need to be grouped/sorted  by some criteria.
> Surely that can be done without any hash-table involved.
> What is the need for hash table and all the overhead it brings here?

In the current design, I assume all GRO types use hash tables to merge
packets. The key of the hash table is the criteria to merge packets.
So the main difference between different GRO types' hash tables is the
key definition.

And the reason for using hash tables is to speed up reassembly. Given
N input TCP packets, the simplest way to process packets[i] is to
traverse the already processed packets[0]~packets[i-1] and try to find
one to merge with. In the worst case, we need to check all of
packets[0~i-1]. In this case, the time complexity of processing N
packets is O(N^2). If we use a hash table, whose key is the criteria
to merge two packets, the time to find a packet that may be merged
with packets[i] is O(1).
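As a rough sketch of the per-packet logic I have in mind - the rte_hash
calls are the existing librte_hash API, but extract_key(), try_merge()
and struct flow_key are only placeholders for the TCP/IPv4 specific code:

#include <rte_hash.h>
#include <rte_mbuf.h>

/* placeholders for the TCP/IPv4 specific code */
struct flow_key {
        uint32_t src_ip, dst_ip;
        uint16_t src_port, dst_port;
};
int extract_key(const struct rte_mbuf *m, struct flow_key *k);
int try_merge(struct rte_mbuf *head, struct rte_mbuf *m);

/*
 * Per-packet cost with a hash table: one lookup keyed by the merge
 * criteria, instead of scanning all previously processed packets.
 * 'h' is an rte_hash created with key_len == sizeof(struct flow_key).
 */
void
process_one(const struct rte_hash *h, struct rte_mbuf *pkt)
{
        struct flow_key k;
        void *head;

        if (extract_key(pkt, &k) < 0)
                return;                 /* not eligible for GRO */

        if (rte_hash_lookup_data(h, &k, &head) >= 0) {
                /* O(1): a packet of the same flow was already inserted */
                try_merge(head, pkt);
                return;
        }
        /* first packet of this flow: store it as the flow's head */
        rte_hash_add_key_data(h, &k, pkt);
}

At the end of rte_gro_reassemble_burst the table is cleared with
rte_hash_reset(), so nothing stays in it between bursts.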

Do you think it's too complicated?

Jiayu

> Konstantin
> 
> >
> > But if we ask applications to input hashing tables, what we
> > need to do is to reset them after finishing using in
> > rte_gro_reassemble_burst, rather than to malloc and free each
> > time. Therefore, I think this way is more efficient. How do you
> > think?
> >
> > >
> > > >
> > > > >
> > > > > >
> > > > > > If we provide rte_gro_tbl_timeout, we also need to provide another
> reassmebly
> > > > > > function, like rte_gro_reassemble, which processes one given
> packet at a
> > > > > > time and won't reset hashing tables. Applications decide when to
> flush packets
> > > > > > in hashing tables. And rte_gro_tbl_timeout is one of the ways that
> can be used
> > > > > > to flush the packets. Do you mean that?
> > > > >
> > > > > Yes, that's what I meant, but as I said above - I think your approach is
> probably
> > > > > more preferable - it is much simpler and lightweight.
> > > > > Konstantin
> > > > >
  
Bruce Richardson May 29, 2017, 12:18 p.m. UTC | #11
On Mon, May 29, 2017 at 10:22:57AM +0000, Hu, Jiayu wrote:
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Sunday, May 28, 2017 12:51 AM
> > To: Hu, Jiayu <jiayu.hu@intel.com>
> > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API framework
> > 
> > 
> > Hi Jiayu,
> > 
> > >
> > > Hi Konstantin,
> > >
> > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Hu, Jiayu
> > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > >
> > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin wrote:
> > > > > > Hi Jiayu,
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Hu, Jiayu
> > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > yuanhan.liu@linux.intel.com
> > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > framework
> > > > > > >
> > > > > > > Hi Konstantin,
> > > > > > >
> > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev, Konstantin
> > wrote:
> > > > > > > >
> > > > > > > > Hi Jiayu,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin,
> > > > > > > > >
> > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > >
> > > > > > > > > BRs,
> > > > > > > > > Jiayu
> > > > > > > > >
> > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > Konstantin wrote:
> > > > > > > > > > Hi Jiayu,
> > > > > > > > > > My comments/questions below.
> > > > > > > > > > Konstantin
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > For applications, DPDK GRO provides three external functions
> > to
> > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > Before using GRO, applications should explicitly call
> > rte_gro_init to
> > > > > > > > > > > initizalize GRO environment. After that, applications can call
> > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable to
> > disable GRO for
> > > > > > > > > > > specific ports.
> > > > > > > > > >
> > > > > > > > > > I think this is too restrictive and wouldn't meet various user's
> > needs.
> > > > > > > > > > User might want to:
> > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > IPV6/TCP for queue 1, etc.
> > > > > > > > >
> > > > > > > > > The reason for enabling/disabling GRO per-port instead of per-
> > queue is that LINUX
> > > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> > provide more flexibility
> > > > > > > > > to applications. But are there any scenarios that different
> > queues of a port may
> > > > > > > > > require different GRO control (i.e. GRO types and enable/disable
> > GRO)?
> > 
> > I don't understand why we need to put/extract each packet into/from a
> > hash table at all.
> > We have N input packets that need to be grouped/sorted by some criteria.
> > Surely that can be done without any hash table involved.
> > What is the need for a hash table and all the overhead it brings here?
> 
> In the current design, I assume all GRO types use hash tables to merge
> packets. The key of the hash table is the criteria to merge packets.
> So the main difference between different GRO types' hash tables is the
> key definition.
> 
> And the reason for using hash tables is to speed up reassembly. Given
> there are N TCP packets input, the simplest way to process packets[i]
> is to traverse the already processed packets[0]~packets[i-1] and try to
> find one to merge. In the worst case, we need to check all of
> packets[0~i-1], so the time complexity of processing N packets is O(N^2).
> If we use a hash table, whose key is the criteria to merge two packets,
> the time to find a packet that may be merged with packets[i] is O(1).
> 
> Do you think it's too complicated?
> 
> Jiayu
> 
How big is your expected burst size? If you are expecting 32 or 64
packets per call, then N is small and the overhead of the hash table
seems a bit much. Perhaps you need different code paths for bigger and
smaller bursts?

/Bruce
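
For readers following this exchange, below is a minimal sketch of the
hash-keyed, burst-mode design Jiayu describes above, using the existing
librte_hash API. The flow-key struct and the tcp4_extract_key() /
tcp4_try_merge() helpers are hypothetical placeholders for the real
header-parsing and merging logic, not existing DPDK functions.

#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_hash.h>

/* Hypothetical flow key: the TCP/IPv4 merge criteria. */
struct tcp4_flow_key {
        uint32_t src_ip;
        uint32_t dst_ip;
        uint16_t src_port;
        uint16_t dst_port;
};

/* Hypothetical helpers (not existing DPDK APIs):
 * - tcp4_extract_key(): fill the key from the packet headers, return < 0
 *   if the packet is not a merge candidate (not TCP/IPv4, etc.);
 * - tcp4_try_merge(): chain pkt onto head if the sequence numbers are
 *   contiguous and the size limit allows it, return 0 on success.
 */
int tcp4_extract_key(const struct rte_mbuf *pkt, struct tcp4_flow_key *key);
int tcp4_try_merge(struct rte_mbuf *head, struct rte_mbuf *pkt);

/*
 * Burst-mode TCP/IPv4 reassembly over a hash table whose key is the merge
 * criteria, so looking up a merge candidate is O(1) per packet. 'tbl' is
 * an rte_hash created with key_len = sizeof(struct tcp4_flow_key).
 * Returns the number of packets left in pkts[].
 */
static uint16_t
gro_tcp4_burst_hash(struct rte_hash *tbl, struct rte_mbuf *pkts[],
                uint16_t nb_pkts)
{
        uint16_t i, kept = 0;

        for (i = 0; i < nb_pkts; i++) {
                struct tcp4_flow_key key;
                void *head;

                if (tcp4_extract_key(pkts[i], &key) < 0) {
                        pkts[kept++] = pkts[i]; /* leave it untouched */
                        continue;
                }
                if (rte_hash_lookup_data(tbl, &key, &head) >= 0 &&
                                tcp4_try_merge((struct rte_mbuf *)head,
                                        pkts[i]) == 0)
                        continue;       /* merged into an earlier packet */
                /* first packet of this flow (or merge refused): keep it */
                rte_hash_add_key_data(tbl, &key, pkts[i]);
                pkts[kept++] = pkts[i];
        }
        rte_hash_reset(tbl);    /* burst mode: no state kept across calls */
        return kept;
}

The per-burst add/lookup plus the rte_hash_reset() at the end is exactly
the cost that the following replies question for small bursts.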
  
Ananyev, Konstantin May 29, 2017, 12:51 p.m. UTC | #12
Hi Jiayu,

> In the current design, I assume all GRO types use hash tables to merge
> packets. The key of the hash table is the criteria to merge packets.
> So the main difference between different GRO types' hash tables is the
> key definition.
> 
> And the reason for using hash tables is to speed up reassembly. Given
> there are N TCP packets input, the simplest way to process packets[i]
> is to traverse the already processed packets[0]~packets[i-1] and try to
> find one to merge. In the worst case, we need to check all of
> packets[0~i-1], so the time complexity of processing N packets is O(N^2).
> If we use a hash table, whose key is the criteria to merge two packets,
> the time to find a packet that may be merged with packets[i] is O(1).

I understand that, but adding/searching inside the hash table,
plus resetting it for every burst of packets, doesn't come for free either.
I think that for the most common burst sizes (< 100 packets) the
hash overhead would significantly outweigh the cost of the
worst-case O(N^2) scan.
Also, if that worst case really worries you, you can pre-sort the
input packets before starting the actual reassembly.
That might help to bring the price down.
Konstantin
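
As a point of comparison, here is a minimal sketch of the inline pairwise
scan argued for above: no hash table, no persistent state, nothing to
allocate, fill or reset per burst. The tcp4_can_merge() / tcp4_merge()
helpers are hypothetical placeholders for the merge-criteria check and
the actual mbuf chaining.

#include <stdint.h>
#include <rte_mbuf.h>

/* Hypothetical helpers (not existing DPDK APIs). */
int tcp4_can_merge(const struct rte_mbuf *head, const struct rte_mbuf *pkt);
void tcp4_merge(struct rte_mbuf *head, struct rte_mbuf *pkt);

/*
 * O(N^2) burst reassembly: compare each packet only against the packets
 * kept so far. For small bursts the quadratic scan stays cheap.
 * Returns the number of packets left in pkts[].
 */
static uint16_t
gro_tcp4_burst_scan(struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
        uint16_t i, j, kept = 0;

        for (i = 0; i < nb_pkts; i++) {
                for (j = 0; j < kept; j++) {
                        if (tcp4_can_merge(pkts[j], pkts[i])) {
                                tcp4_merge(pkts[j], pkts[i]);
                                break;
                        }
                }
                if (j == kept)
                        pkts[kept++] = pkts[i]; /* nothing to merge with */
        }
        return kept;
}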

  
Hu, Jiayu May 30, 2017, 5:29 a.m. UTC | #13
Hi Konstantin,

> I understand that, but adding/searching inside the hash table,
> plus resetting it for every burst of packets, doesn't come for free either.
> I think that for the most common burst sizes (< 100 packets) the
> hash overhead would significantly outweigh the cost of the
> worst-case O(N^2) scan.

Yes, the packet burst size is always less than 100, which amplifies
the cost of using hash tables. To avoid that high price, maybe a simpler
structure, like rte_ip_frag_tbl in the IP reassembly library, is better.
Actually, I have never tried other designs. In the next version, I will use
a simpler structure to merge TCP packets and compare the performance
results. Thanks.

> Also, if that worst case really worries you, you can pre-sort the
> input packets before starting the actual reassembly.

If the N input packets consist of N1 TCP packets and N2 UDP
packets and we pre-sort them, what should they look like after sorting?

> That might help to bring the price down.
> Konstantin

Jiayu
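
One possible reading of the "simpler structure" idea above is sketched
below: a small fixed-size flow table in the spirit of rte_ip_frag_tbl,
i.e. a flat array of flow entries that is reset per burst, with a plain
linear flow lookup (cheap because the number of flows in one burst is
small). The flow-key struct and the tcp4_extract_key() / tcp4_try_merge()
helpers are the same hypothetical placeholders used in the earlier
sketches in this thread.

#include <stdint.h>
#include <string.h>
#include <rte_mbuf.h>

#define GRO_TCP4_MAX_FLOWS 64   /* assumption: >= expected burst size */

struct tcp4_flow_key {          /* hypothetical merge criteria */
        uint32_t src_ip;
        uint32_t dst_ip;
        uint16_t src_port;
        uint16_t dst_port;
};

int tcp4_extract_key(const struct rte_mbuf *pkt, struct tcp4_flow_key *key);
int tcp4_try_merge(struct rte_mbuf *head, struct rte_mbuf *pkt);

struct gro_tcp4_tbl {
        uint16_t n_flows;
        struct tcp4_flow_key keys[GRO_TCP4_MAX_FLOWS];
        struct rte_mbuf *heads[GRO_TCP4_MAX_FLOWS];
};

/* Returns the number of packets left in pkts[]. */
static uint16_t
gro_tcp4_burst_tbl(struct gro_tcp4_tbl *tbl, struct rte_mbuf *pkts[],
                uint16_t nb_pkts)
{
        uint16_t i, j, kept = 0;

        tbl->n_flows = 0;       /* stateless across bursts, nothing to free */
        for (i = 0; i < nb_pkts; i++) {
                struct tcp4_flow_key key;

                if (tcp4_extract_key(pkts[i], &key) < 0) {
                        pkts[kept++] = pkts[i];
                        continue;
                }
                for (j = 0; j < tbl->n_flows; j++)
                        if (memcmp(&tbl->keys[j], &key, sizeof(key)) == 0)
                                break;
                if (j < tbl->n_flows &&
                                tcp4_try_merge(tbl->heads[j], pkts[i]) == 0)
                        continue;       /* merged into the flow's head */
                if (j == tbl->n_flows && j < GRO_TCP4_MAX_FLOWS) {
                        tbl->keys[j] = key;     /* new flow: remember it */
                        tbl->heads[j] = pkts[i];
                        tbl->n_flows++;
                }
                pkts[kept++] = pkts[i];
        }
        return kept;
}

Whether such a flat table actually beats a librte_hash based version (or
the plain pairwise scan) for small bursts is exactly what the measurements
mentioned above would have to show.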

  
Ananyev, Konstantin May 30, 2017, 11:56 a.m. UTC | #14
Hi Jiayu,

> > Also, if that worst case really worries you, you can pre-sort the
> > input packets before starting the actual reassembly.
> 
> If the N input packets consist of N1 TCP packets and N2 UDP
> packets and we pre-sort them, what should they look like after sorting?
>

My thought was something like this:

<tcp_flow0 pkts> ... <tcp_flowX pkts> <udp_flow0 pkts> ... <udp_flowY pkts>
|---------------- N1 ----------------|---------------- N2 ----------------|

Konstantin
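
A minimal sketch of what such a pre-sort could look like: order the burst
so that packets of the same protocol and flow sit next to each other,
after which a single linear pass over adjacent packets is enough to
attempt merges. gro_pkt_flow_cmp() is a hypothetical comparator (e.g.
L4 protocol first, then the 5-tuple, then the TCP sequence number).

#include <stdint.h>
#include <stdlib.h>
#include <rte_mbuf.h>

/* Hypothetical comparator: protocol first, then flow key, then seq. */
int gro_pkt_flow_cmp(const struct rte_mbuf *a, const struct rte_mbuf *b);

static int
gro_pkt_cmp(const void *pa, const void *pb)
{
        const struct rte_mbuf *a = *(const struct rte_mbuf * const *)pa;
        const struct rte_mbuf *b = *(const struct rte_mbuf * const *)pb;

        return gro_pkt_flow_cmp(a, b);
}

/*
 * After the sort the layout is the one sketched above:
 * <tcp_flow0 pkts>...<tcp_flowX pkts><udp_flow0 pkts>...<udp_flowY pkts>
 */
static void
gro_presort(struct rte_mbuf *pkts[], uint16_t nb_pkts)
{
        qsort(pkts, nb_pkts, sizeof(pkts[0]), gro_pkt_cmp);
}

Note that such a sort reorders the burst, which the application may or
may not tolerate, and the sort itself is O(N log N) per burst.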

 
  
Hu, Jiayu May 30, 2017, 2:10 p.m. UTC | #15
Hi Bruce,

> -----Original Message-----
> From: Richardson, Bruce
> Sent: Monday, May 29, 2017 8:19 PM
> To: Hu, Jiayu <jiayu.hu@intel.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; dev@dpdk.org;
> Wiles, Keith <keith.wiles@intel.com>; yuanhan.liu@linux.intel.com
> Subject: Re: [dpdk-dev] [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> 
> On Mon, May 29, 2017 at 10:22:57AM +0000, Hu, Jiayu wrote:
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Sunday, May 28, 2017 12:51 AM
> > > To: Hu, Jiayu <jiayu.hu@intel.com>
> > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > Subject: RE: [PATCH v3 1/3] lib: add Generic Receive Offload API
> framework
> > >
> > >
> > > Hi Jiayu,
> > >
> > > >
> > > > Hi Konstantin,
> > > >
> > > > On Sat, May 27, 2017 at 07:12:16PM +0800, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Hu, Jiayu
> > > > > > Sent: Saturday, May 27, 2017 4:42 AM
> > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > >
> > > > > > On Sat, May 27, 2017 at 07:10:21AM +0800, Ananyev, Konstantin
> wrote:
> > > > > > > Hi Jiayu,
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Hu, Jiayu
> > > > > > > > Sent: Friday, May 26, 2017 8:26 AM
> > > > > > > > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > > > > > > > Cc: dev@dpdk.org; Wiles, Keith <keith.wiles@intel.com>;
> > > yuanhan.liu@linux.intel.com
> > > > > > > > Subject: Re: [PATCH v3 1/3] lib: add Generic Receive Offload API
> > > framework
> > > > > > > >
> > > > > > > > Hi Konstantin,
> > > > > > > >
> > > > > > > > On Wed, May 24, 2017 at 08:38:25PM +0800, Ananyev,
> Konstantin
> > > wrote:
> > > > > > > > >
> > > > > > > > > Hi Jiayu,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Konstantin,
> > > > > > > > > >
> > > > > > > > > > Thanks for your comments. My replies/questions are below.
> > > > > > > > > >
> > > > > > > > > > BRs,
> > > > > > > > > > Jiayu
> > > > > > > > > >
> > > > > > > > > > On Mon, May 22, 2017 at 05:19:19PM +0800, Ananyev,
> > > Konstantin wrote:
> > > > > > > > > > > Hi Jiayu,
> > > > > > > > > > > My comments/questions below.
> > > > > > > > > > > Konstantin
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > For applications, DPDK GRO provides three external
> functions
> > > to
> > > > > > > > > > > > enable/disable GRO:
> > > > > > > > > > > > - rte_gro_init: initialize GRO environment;
> > > > > > > > > > > > - rte_gro_enable: enable GRO for a given port;
> > > > > > > > > > > > - rte_gro_disable: disable GRO for a given port.
> > > > > > > > > > > > Before using GRO, applications should explicitly call
> > > rte_gro_init to
> > > > > > > > > > > > initizalize GRO environment. After that, applications can
> call
> > > > > > > > > > > > rte_gro_enable to enable GRO and call rte_gro_disable
> to
> > > disable GRO for
> > > > > > > > > > > > specific ports.
> > > > > > > > > > >
> > > > > > > > > > > I think this is too restrictive and wouldn't meet various
> user's
> > > needs.
> > > > > > > > > > > User might want to:
> > > > > > > > > > > - enable/disable GRO for particular RX queue
> > > > > > > > > > > - or even setup different GRO types for different RX queues,
> > > > > > > > > > >    i.e, - GRO over IPV4/TCP for queue 0, and  GRO over
> > > IPV6/TCP for queue 1, etc.
> > > > > > > > > >
> > > > > > > > > > The reason for enabling/disabling GRO per-port instead of
> per-
> > > queue is that LINUX
> > > > > > > > > > controls GRO per-port. To control GRO per-queue indeed can
> > > provide more flexibility
> > > > > > > > > > to applications. But are there any scenarios that different
> > > queues of a port may
> > > > > > > > > > require different GRO control (i.e. GRO types and
> enable/disable
> > > GRO)?
> > > > > > > > >
> > > > > > > > > I think yes.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > - For various reasons, user might prefer not to use RX
> callbacks
> > > for various reasons,
> > > > > > > > > > >   But invoke gro() manually at somepoint in his code.
> > > > > > > > > >
> > > > > > > > > > An application-used GRO library can enable more flexibility to
> > > applications. Besides,
> > > > > > > > > > when perform GRO in ethdev layer or inside PMD drivers, it is
> an
> > > issue that
> > > > > > > > > > rte_eth_rx_burst returns actually received packet number or
> > > GROed packet number. And
> > > > > > > > > > the same issue happens in GSO, and even more seriously.
> This is
> > > because applications
> > > > > > > > > > , like VPP, always rely on the return value of rte_eth_tx_burst
> to
> > > decide further
> > > > > > > > > > operations. If applications can direcly call GRO/GSO libraries,
> > > this issue won't exist.
> > > > > > > > > > And DPDK is a library, which is not a holistic system like
> LINUX.
> > > We don't need to do
> > > > > > > > > > the same as LINUX. Therefore, maybe it's a better idea to
> > > directly provide SW
> > > > > > > > > > segmentation/reassembling libraries to applications.
> > > > > > > > > >
> > > > > > > > > > > - Many users would like to control size (number of
> flows/items
> > > per flow),
> > > > > > > > > > >   max allowed packet size, max timeout, etc., for different
> GRO
> > > tables.
> > > > > > > > > > > - User would need a way to flush all or only timeout
> packets
> > > from particular GRO tables.
> > > > > > > > > > >
> > > > > > > > > > > So I think that API needs to extended to become be much
> more
> > > fine-grained.
> > > > > > > > > > > Something like that:
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl_param {
> > > > > > > > > > >    int32_t socket_id;
> > > > > > > > > > >    size_t max_flows;
> > > > > > > > > > >    size_t max_items_per_flow;
> > > > > > > > > > >    size_t max_pkt_size;
> > > > > > > > > > >    uint64_t packet_timeout_cycles;
> > > > > > > > > > >    <desired GRO types (IPV4_TCP | IPV6_TCP, ...)>
> > > > > > > > > > >   <probably type specific params>
> > > > > > > > > > >   ...
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > struct rte_gro_tbl;
> > > > > > > > > > > strct rte_gro_tbl *rte_gro_tbl_create(const struct
> > > rte_gro_tbl_param *param);
> > > > > > > > > > >
> > > > > > > > > > > void rte_gro_tbl_destroy(struct rte_gro_tbl *tbl);
> > > > > > > > > >
> > > > > > > > > > Yes, I agree with you. It's necessary to provide more fine-
> > > grained control APIs to
> > > > > > > > > > applications. But what's 'packet_timeout_cycles' used for? Is
> it
> > > for TCP packets?
> > > > > > > > >
> > > > > > > > > For any packets that sits in the gro_table for too long.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >  * process packets, might store some packets inside the
> GRO
> > > table,
> > > > > > > > > > >  * returns number of filled entries in pkt[]
> > > > > > > > > > >  */
> > > > > > > > > > > uint32_t rte_gro_tbl_process(struct rte_gro_tbl *tbl, struct
> > > rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > > >
> > > > > > > > > > > /*
> > > > > > > > > > >   * retirieves up to num timeouted packets from the table.
> > > > > > > > > > >   */
> > > > > > > > > > > uint32_t rtre_gro_tbl_timeout(struct rte_gro_tbl *tbl,
> uint64_t
> > > tmt, struct rte_mbuf *pkt[], uint32_t num);
> > > > > > > > > >
> > > > > > > > > > Currently, we implement GRO as RX callback, whose
> processing
> > > logic is simple:
> > > > > > > > > > receive burst packets -> perform GRO -> return to application.
> > > GRO stops after
> > > > > > > > > > finishing processing received packets. If we provide
> > > rte_gro_tbl_timeout, when
> > > > > > > > > > and who will call it?
> > > > > > > > >
> > > > > > > > > I mean the following scenario:
> > > > > > > > > We receive a packet, find it is eligible for GRO and put it into
> > > gro_table
> > > > > > > > > in expectation - there would be more packets for the same
> flow.
> > > > > > > > > But it could happen that we would never (or for some long
> time)
> > > receive
> > > > > > > > > any new packets for that flow.
> > > > > > > > > So the first packet would never be delivered to the upper layer,
> > > > > > > > > or delivered too late.
> > > > > > > > > So we need a mechanism to extract such not merged packets
> > > > > > > > > and push them to the upper layer.
> > > > > > > > > My thought it would be application responsibility to call it from
> > > time to time.
> > > > > > > > > That's actually another reason why I think we shouldn't use
> > > application
> > > > > > > > > to always use RX callbacks for GRO.
> > > > > > > >
> > > > > > > > Currently, we only provide one reassembly function,
> > > rte_gro_reassemble_burst,
> > > > > > > > which merges N inputted packets at a time. After finishing
> > > processing these
> > > > > > > > packets, it returns all of them and reset hashing tables.
> Therefore,
> > > there
> > > > > > > > are no packets in hashing tables after rte_gro_reassemble_burst
> > > returns.
> > > > > > >
> > > > > > > Ok, sorry I missed that part with rte_hash_reset().
> > > > > > > So gro_ressemble_burst() performs only inline processing on
> current
> > > input packets
> > > > > > > and doesn't try to save packets for future merging, correct?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > Such approach indeed is much lightweight and doesn't require any
> > > extra timeouts and flush().
> > > > > > > So my opinion let's keep it like that - nice and simple.
> > > > > > > BTW, I think in that case we don't need any hashtables (or any
> other
> > > persistent strucures)at all.
> > > > > > > What we need is just a set of GROs (tcp4, tpc6, etc.) we want to
> > > perform on given array of packets.
> > > > > >
> > > > > > Beside GRO types that are desired to perform, maybe it also needs
> > > max_pkt_size and
> > > > > > some GRO type specific information?
> > > > >
> > > > > Yes, but we don't need the actual hash-tables, etc. inside.
> > > > > Passing something like struct gro_param seems enough.
> > > >
> > > > Yes, we can just pass gro_param and allocate hashing tables
> > > > inside rte_gro_reassemble_burst. If so, hashing tables of
> > > > desired GRO types are created and freed in each invocation
> > > > of rte_gro_reassemble_burst. In the GRO library, hashing tables
> > > > are created by the GRO-type-specific gro_tbl_create_fn. These
> > > > gro_tbl_create_fn callbacks may allocate hashing table space via malloc
> > > > (or rte_malloc). Therefore, we may frequently call malloc/free
> > > > when using rte_gro_reassemble_burst. In my opinion, it will
> > > > degrade GRO performance greatly.
> > >
> > > I don't understand why we need to put/extract each packet into/from a
> > > hash table at all.
> > > We have N input packets that need to be grouped/sorted by some criteria.
> > > Surely that can be done without any hash table involved.
> > > What is the need for a hash table and all the overhead it brings here?
> >
> > In the current design, I assume all GRO types use hash tables to merge
> > packets. The key of the hash table is the criteria to merge packets.
> > So the main difference for different GRO types' hash tables is the
> > key definition.
> >
> > And the reason for using hash tables is to speed up reassembly. Given
> > N input TCP packets, the simplest way to process packets[i] is to
> > traverse the already processed packets[0]~packets[i-1] and try to find
> > one to merge with. In the worst case, we need to check all of
> > packets[0~i-1], so the time complexity of processing N packets is O(N^2).
> > If we use a hash table, whose key is the criteria for merging two packets,
> > the time to find a packet that may be merged with packets[i] is O(1).
> >
> > Do you think it's too complicated?
> >
> > Jiayu
> >
> How big is your expected burst size? If you are expecting 32 or 64

In the current design, if the packet burst size is always small (e.g. less
than 32 or 64), applications can use a small hash table to reduce its
overhead. But that's not a good solution, since the cost is still high when
the burst size is small.

> packets per call, then N is small and the overhead of the hash table
> seems a bit much. Perhaps you need different code paths for bigger and
> smaller bursts?

Yes, if hash tables bring much better performance for scenarios where N is
large, maybe we can consider using two code paths.

Jiayu 

> 
> /Bruce
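
To make the keyed-merge idea discussed above concrete, below is a minimal,
illustrative sketch (not part of this patchset) of how a TCP/IPv4 reassembly
path could use an rte_hash keyed by the merge criteria to find a merge
candidate in O(1) per packet, instead of scanning all previously seen packets.
The tcp4_flow_key layout and the try_merge_tcp4() helper are hypothetical, the
table is assumed to have been created with rte_hash_create() using
key_len = sizeof(struct tcp4_flow_key), and untagged Ethernet is assumed.

#include <rte_hash.h>
#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_tcp.h>

/* Hypothetical merge criteria for TCP/IPv4: packets of the same 4-tuple
 * flow are candidates for merging. */
struct tcp4_flow_key {
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Hypothetical helper: a real implementation would verify that pkt's TCP
 * sequence number directly follows the candidate's payload and that the
 * merged length stays under max_pkt_size before chaining the mbufs.
 * Here it simply refuses to merge, so the sketch stays side-effect free. */
static int
try_merge_tcp4(struct rte_mbuf *candidate, struct rte_mbuf *pkt)
{
	(void)candidate;
	(void)pkt;
	return -1;
}

static uint16_t
tcp4_gro_burst(struct rte_hash *tbl, struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t i, nb_after_gro = nb_pkts;

	for (i = 0; i < nb_pkts; i++) {
		struct ipv4_hdr *ip = rte_pktmbuf_mtod_offset(pkts[i],
				struct ipv4_hdr *, sizeof(struct ether_hdr));
		struct tcp_hdr *tcp = (struct tcp_hdr *)((char *)ip +
				((ip->version_ihl & 0x0f) << 2));
		struct tcp4_flow_key key = {
			.src_ip = ip->src_addr, .dst_ip = ip->dst_addr,
			.src_port = tcp->src_port, .dst_port = tcp->dst_port,
		};
		void *stored;

		/* O(1) lookup of a previously seen packet of the same flow. */
		if (rte_hash_lookup_data(tbl, &key, &stored) >= 0 &&
				try_merge_tcp4((struct rte_mbuf *)stored,
					pkts[i]) == 0) {
			pkts[i] = NULL;	/* merged; caller compacts the array */
			nb_after_gro--;
		} else {
			/* No candidate yet: remember this packet so later
			 * packets of the same flow can be merged with it. */
			rte_hash_add_key_data(tbl, &key, pkts[i]);
		}
	}
	return nb_after_gro;
}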
  

Patch

diff --git a/config/common_base b/config/common_base
index 0b4297c..92e97ef 100644
--- a/config/common_base
+++ b/config/common_base
@@ -709,6 +709,11 @@  CONFIG_RTE_LIBRTE_VHOST_DEBUG=n
 CONFIG_RTE_LIBRTE_PMD_VHOST=n
 
 #
+# Compile GRO library
+#
+CONFIG_RTE_LIBRTE_GRO=y
+
+#
 #Compile Xen domain0 support
 #
 CONFIG_RTE_LIBRTE_XEN_DOM0=n
diff --git a/lib/Makefile b/lib/Makefile
index 07e1fd0..e253053 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -106,6 +106,7 @@  DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DEPDIRS-librte_reorder := librte_eal librte_mempool librte_mbuf
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DEPDIRS-librte_pdump := librte_eal librte_mempool librte_mbuf librte_ether
+DIRS-$(CONFIG_RTE_LIBRTE_GRO) += librte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
new file mode 100644
index 0000000..fb3a36c
--- /dev/null
+++ b/lib/librte_gro/Makefile
@@ -0,0 +1,50 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_gro.a
+
+CFLAGS += -O3
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
+
+EXPORT_MAP := rte_gro_version.map
+
+LIBABIVER := 1
+
+#source files
+SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
new file mode 100644
index 0000000..996b382
--- /dev/null
+++ b/lib/librte_gro/rte_gro.c
@@ -0,0 +1,219 @@ 
+#include <rte_ethdev.h>
+#include <rte_mbuf.h>
+#include <rte_hash.h>
+#include <stdint.h>
+#include <rte_malloc.h>
+
+#include "rte_gro.h"
+#include "rte_gro_common.h"
+
+gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB] = {NULL};
+gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB] = {NULL};
+
+struct rte_gro_status *gro_status;
+
+/**
+ * Internal function. It creates one hash table for each DPDK-supported
+ * GRO type, and stores all of them in an object of struct rte_gro_tbl.
+ *
+ * @param name
+ *  Name prefix for the GRO lookup tables
+ * @param nb_entries
+ *  Number of entries in each hash table
+ * @param socket_id
+ *  Socket id
+ * @param gro_tbl
+ *  gro_tbl points to a rte_gro_tbl object, which will be initialized
+ *  inside rte_gro_tbl_setup.
+ * @return
+ *  A positive value on success; a negative value on failure.
+ */
+static int
+rte_gro_tbl_setup(char *name, uint32_t nb_entries,
+		uint16_t socket_id, struct rte_gro_tbl *gro_tbl)
+{
+	gro_tbl_create_fn create_tbl_fn;
+	const uint32_t len = strlen(name) + 10;
+	char tbl_name[len];
+	int i;
+
+	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		sprintf(tbl_name, "%s_%u", name, i);
+		create_tbl_fn = tbl_create_functions[i];
+		if (create_tbl_fn && (create_tbl_fn(tbl_name,
+						nb_entries,
+						socket_id,
+						&(gro_tbl->
+							lkp_tbls[i].hash_tbl))
+					< 0)) {
+			return -1;
+		}
+		gro_tbl->lkp_tbls[i].gro_type = i;
+	}
+	return 1;
+}
+
+/**
+ * Internal function. It frees all the hashing tables stored in
+ * the given struct rte_gro_tbl object.
+ */
+static void
+rte_gro_tbl_destroy(struct rte_gro_tbl *gro_tbl)
+{
+	int i;
+
+	if (gro_tbl == NULL)
+		return;
+	for (i = 0; i < GRO_SUPPORT_TYPE_NB; i++) {
+		rte_hash_free(gro_tbl->lkp_tbls[i].hash_tbl);
+		gro_tbl->lkp_tbls[i].hash_tbl = NULL;
+		gro_tbl->lkp_tbls[i].gro_type = GRO_EMPTY_TYPE;
+	}
+}
+
+/**
+ * Internal function. It performs all supported GRO types on the input
+ * packets. For example, if the current DPDK GRO supports TCP/IPv4 and
+ * TCP/IPv6 GRO, this function only reassembles TCP/IPv4 and TCP/IPv6
+ * packets. Packets of unsupported GRO types are not processed. For
+ * ethernet devices that want to support GRO, this function is registered
+ * as the RX callback for all of their queues.
+ *
+ * @param pkts
+ *  Packets to reassemble.
+ * @param nb_pkts
+ *  The number of packets to reassemble.
+ * @param gro_tbl
+ *  Pointer to an object of struct rte_gro_tbl, which has been
+ *  initialized by rte_gro_tbl_setup.
+ * @return
+ *  Packet number after GRO. If reassembly succeeds, the value is
+ *  less than nb_pkts; if not, the value equals nb_pkts. If the
+ *  parameters are invalid, return 0.
+ */
+static uint16_t
+rte_gro_reassemble_burst(uint8_t port __rte_unused,
+		uint16_t queue __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *gro_tbl)
+{
+	if ((gro_tbl == NULL) || (pkts == NULL)) {
+		printf("invalid parameters for GRO.\n");
+		return 0;
+	}
+	uint16_t nb_after_gro = nb_pkts;
+
+	return nb_after_gro;
+}
+
+void
+rte_gro_init(void)
+{
+	uint8_t nb_port, i;
+	uint16_t nb_queue;
+	struct rte_eth_dev_info dev_info;
+
+	/* if already initialized, return immediately */
+	if (gro_status) {
+		printf("GRO environment is already initialized\n");
+		return;
+	}
+
+	gro_status = (struct rte_gro_status *)rte_zmalloc(
+			NULL,
+			sizeof(struct rte_gro_status),
+			0);
+
+	nb_port = rte_eth_dev_count();
+	gro_status->ports = (struct gro_port_status *)rte_zmalloc(
+			NULL,
+			nb_port * sizeof(struct gro_port_status),
+			0);
+	gro_status->nb_port = nb_port;
+
+	for (i = 0; i < nb_port; i++) {
+		rte_eth_dev_info_get(i, &dev_info);
+		nb_queue = dev_info.nb_rx_queues;
+		gro_status->ports[i].gro_tbls =
+			(struct rte_gro_tbl **)rte_zmalloc(
+					NULL,
+					nb_queue * sizeof(struct rte_gro_tbl *),
+					0);
+		gro_status->ports[i].gro_cbs =
+			(struct rte_eth_rxtx_callback **)
+			rte_zmalloc(
+					NULL,
+					nb_queue *
+					sizeof(struct rte_eth_rxtx_callback *),
+					0);
+	}
+}
+
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id)
+{
+	if (gro_status->ports[port_id].gro_enable) {
+		printf("GRO is already enabled on port %u\n", port_id);
+		return;
+	}
+	uint16_t nb_queue, i;
+	struct rte_eth_dev_info dev_info;
+	char tbl_name[20];
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (i = 0; i < nb_queue; i++) {
+		struct rte_gro_tbl *gro_tbl;
+
+		/* allocate hashing tables for this port */
+		sprintf(tbl_name, "GRO_TBL_%u", port_id);
+		gro_tbl = (struct rte_gro_tbl *)rte_malloc
+			(NULL, sizeof(struct rte_gro_tbl), 0);
+		rte_gro_tbl_setup(tbl_name,
+				GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB,
+				socket_id,
+				gro_tbl);
+		gro_status->ports[port_id].gro_tbls[i] = gro_tbl;
+		/**
+		 * register GRO reassembly function as a rx callback for each
+		 * queue of this port.
+		 */
+		gro_status->ports[port_id].gro_cbs[i] =
+			rte_eth_add_rx_callback
+			(port_id, i,
+			 rte_gro_reassemble_burst,
+			 gro_tbl);
+	}
+	gro_status->ports[port_id].gro_enable = 1;
+}
+
+void
+rte_gro_disable(uint8_t port_id)
+{
+	if (gro_status->ports[port_id].gro_enable == 0) {
+		printf("GRO is already disabled on port %u\n", port_id);
+		return;
+	}
+	uint16_t nb_queue, i;
+	struct rte_eth_dev_info dev_info;
+
+	rte_eth_dev_info_get(port_id, &dev_info);
+	nb_queue = dev_info.nb_rx_queues;
+
+	for (i = 0; i < nb_queue; i++) {
+		/* free all hashing tables */
+		rte_gro_tbl_destroy(gro_status->ports[port_id].gro_tbls[i]);
+		gro_status->ports[port_id].gro_tbls[i] = NULL;
+
+		/* remove GRO rx callback */
+		rte_eth_remove_rx_callback(port_id, i,
+				gro_status->ports[port_id].gro_cbs[i]);
+		gro_status->ports[port_id].gro_cbs[i] = NULL;
+	}
+	gro_status->ports[port_id].gro_enable = 0;
+}
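
rte_gro_reassemble_burst() above is still a stub (it returns nb_pkts
unchanged). As a rough illustration of the dispatch described in the commit
message - not the actual implementation - the body could eventually classify
each packet by its mbuf packet_type and hand it to the matching entry of
reassemble_functions[]. GRO_TCP_IPV4 is a hypothetical type index, assuming
GRO_SUPPORT_TYPE_NB has been raised to cover it:

#include <rte_mbuf.h>
#include <rte_hash.h>
#include "rte_gro_common.h"

/* Hypothetical index of the TCP/IPv4 GRO type. */
#define GRO_TCP_IPV4 0

/* Defined (without static) in rte_gro.c. */
extern gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB];

uint16_t
gro_dispatch_sketch(struct rte_mbuf **pkts, uint16_t nb_pkts,
		struct rte_gro_tbl *tbl)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		uint8_t type;

		/* Classify: only packets of supported GRO types are handled;
		 * all others are left untouched. */
		if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
				(pkts[i]->packet_type & RTE_PTYPE_L4_TCP))
			type = GRO_TCP_IPV4;
		else
			continue;

		if (reassemble_functions[type] != NULL) {
			/* Describe the packet with an item list and pass it,
			 * together with the per-type hash table, to the
			 * type-specific reassembly function. */
			struct gro_item_list il = {
				.items = &pkts[i],
				.nb_item = 1,
			};

			reassemble_functions[type](
					tbl->lkp_tbls[type].hash_tbl, &il);
		}
	}
	return nb_pkts;
}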
diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
new file mode 100644
index 0000000..c84378e
--- /dev/null
+++ b/lib/librte_gro/rte_gro.h
@@ -0,0 +1,29 @@ 
+#ifndef _RTE_GRO_H_
+#define _RTE_GRO_H_
+
+/**
+ * Initialize GRO environment for all ports. It should be called after
+ * configuring all ethernet devices, and should be called just once.
+ */
+void
+rte_gro_init(void);
+
+/**
+ * Enable GRO for a given port.
+ * @param port_id
+ *  The id of the port on which GRO is to be enabled.
+ * @param socket_id
+ *  The NUMA socket id to which the ethernet device is connected.
+ *  By default, its value is SOCKET_ID_ANY.
+ */
+void
+rte_gro_enable(uint8_t port_id, uint16_t socket_id);
+
+/**
+ * Disable GRO for a given port.
+ * @param port_id
+ *  The id of the port on which GRO is to be disabled.
+ */
+void
+rte_gro_disable(uint8_t port_id);
+#endif
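
For illustration, here is a minimal application-side sketch (not part of this
patch) of how the three functions above are intended to be used. The port id,
queue id, burst size and the stop flag are placeholders, and error handling is
omitted:

#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_memory.h>
#include <rte_gro.h>

#define BURST_SIZE 32

static volatile bool stop;

static void
gro_rx_example(uint8_t port_id, uint16_t queue_id)
{
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t nb_rx;

	/* Call once, after all ethernet devices have been configured. */
	rte_gro_init();

	/* Enable GRO on the port: this registers the GRO reassembly function
	 * as an RX callback on every RX queue of the port. */
	rte_gro_enable(port_id, SOCKET_ID_ANY);

	while (!stop) {
		/* GRO runs transparently inside rte_eth_rx_burst() via the
		 * registered RX callback; the application simply receives
		 * (possibly merged) packets. */
		nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);
		/* ... process and free the nb_rx packets ... */
		(void)nb_rx;
	}

	/* Remove the RX callbacks and free the per-queue GRO tables. */
	rte_gro_disable(port_id);
}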
diff --git a/lib/librte_gro/rte_gro_common.h b/lib/librte_gro/rte_gro_common.h
new file mode 100644
index 0000000..611d833
--- /dev/null
+++ b/lib/librte_gro/rte_gro_common.h
@@ -0,0 +1,75 @@ 
+#ifndef _GRO_COMMON_H_
+#define _GRO_COMMON_H_
+
+/**
+ * the maximum number of supported GRO types
+ */
+#define GRO_TYPE_MAX_NB 256
+/**
+ * flag indicating an empty GRO type
+ */
+#define GRO_EMPTY_TYPE 255
+/**
+ * number of currently supported GRO types
+ */
+#define GRO_SUPPORT_TYPE_NB 0
+
+/**
+ * default number of entries in each hash table
+ */
+#define GRO_DEFAULT_LOOKUP_TABLE_ENTRY_NB 64
+
+/**
+ * Structures to store the hash tables. Each rte_gro_lkp_tbl holds the
+ * hash table of one GRO type; rte_gro_tbl stores one entry per
+ * supported GRO type.
+ */
+struct rte_gro_lkp_tbl {
+	struct rte_hash *hash_tbl;
+	uint8_t gro_type;
+};
+struct rte_gro_tbl {
+	struct rte_gro_lkp_tbl lkp_tbls[GRO_SUPPORT_TYPE_NB];
+};
+
+/**
+ * Item-list structure.
+ */
+struct gro_item_list {
+	void *items;	/**< item array */
+	uint16_t nb_item;	/**< item number */
+};
+
+/**
+ * Each packet has an object of gro_info, which records the GRO
+ * information related to this packet.
+ */
+struct gro_info {
+	struct gro_item_list item_list;	/**< pre-allocated item-list */
+	/** number of packets that have been merged with this one */
+	uint16_t nb_merged_packets;
+	uint8_t gro_type;	/**< GRO type performed on this packet */
+};
+
+/**
+ * Record GRO information for each port.
+ */
+struct gro_port_status {
+	struct rte_gro_tbl **gro_tbls;
+	struct rte_eth_rxtx_callback **gro_cbs;
+	uint8_t gro_enable;	/* flag indicating whether GRO is enabled on the port */
+};
+
+struct rte_gro_status {
+	struct gro_port_status *ports;
+	uint8_t nb_port;
+};
+
+typedef int (*gro_tbl_create_fn)(
+		char *name,
+		uint32_t nb_entries,
+		uint16_t socket_id,
+		struct rte_hash **hash_tbl);
+
+typedef int32_t (*gro_reassemble_fn)(
+		struct rte_hash *hash_tbl,
+		struct gro_item_list *item_list);
+#endif
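
To show how a specific GRO type is meant to plug into these hooks, here is a
hypothetical TCP/IPv4 example (not part of this patch; later patches in the
series add the real implementation). It assumes a GRO_TCP_IPV4 index below
GRO_SUPPORT_TYPE_NB and uses the function arrays defined in rte_gro.c:

#include <rte_hash.h>
#include <rte_jhash.h>
#include "rte_gro_common.h"

/* Hypothetical index of the TCP/IPv4 GRO type. */
#define GRO_TCP_IPV4 0

/* gro_tbl_create_fn implementation: create the hash table whose key
 * identifies a TCP/IPv4 flow (the exact key layout is omitted here). */
static int
gro_tcp4_tbl_create(char *name, uint32_t nb_entries, uint16_t socket_id,
		struct rte_hash **hash_tbl)
{
	struct rte_hash_parameters params = {
		.name = name,
		.entries = nb_entries,
		.key_len = 12,	/* e.g. src/dst IP + src/dst port */
		.hash_func = rte_jhash,
		.socket_id = socket_id,
	};

	*hash_tbl = rte_hash_create(&params);
	return (*hash_tbl == NULL) ? -1 : 0;
}

/* gro_reassemble_fn implementation: merge the packets described by
 * item_list with the help of the per-flow hash table. */
static int32_t
gro_tcp4_reassemble(struct rte_hash *hash_tbl, struct gro_item_list *item_list)
{
	/* ... look up/insert flows in hash_tbl and chain mbufs ... */
	(void)hash_tbl;
	(void)item_list;
	return 0;
}

/* The generic layer dispatches through these arrays, defined in rte_gro.c. */
extern gro_tbl_create_fn tbl_create_functions[GRO_TYPE_MAX_NB];
extern gro_reassemble_fn reassemble_functions[GRO_TYPE_MAX_NB];

void
gro_tcp4_register(void)
{
	tbl_create_functions[GRO_TCP_IPV4] = gro_tcp4_tbl_create;
	reassemble_functions[GRO_TCP_IPV4] = gro_tcp4_reassemble;
}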
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index b5215c0..8956821 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -98,6 +98,7 @@  _LDLIBS-$(CONFIG_RTE_LIBRTE_RING)           += -lrte_ring
 _LDLIBS-$(CONFIG_RTE_LIBRTE_EAL)            += -lrte_eal
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_REORDER)        += -lrte_reorder
+_LDLIBS-$(CONFIG_RTE_LIBRTE_GRO)        	+= -lrte_gro
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 _LDLIBS-$(CONFIG_RTE_LIBRTE_KNI)            += -lrte_kni