[dpdk-dev,v2,15/30] net/mlx5: add Hash Rx queue object

Message ID b9f5293b987f153cd4d70d5c890c4871b14d1533.1507207731.git.nelio.laranjeiro@6wind.com (mailing list archive)
State Changes Requested, archived
Delegated to: Ferruh Yigit

Checks

Context              Check    Description
ci/checkpatch        warning  coding style issues
ci/Intel-compilation fail     apply patch file failure

Commit Message

Nélio Laranjeiro Oct. 5, 2017, 12:49 p.m. UTC
  Hash Rx queue is a high-level queue providing the RSS hash algorithm, key
and indirection table to spread the packets.  Those objects can easily be
shared between several Verbs flows.  This commit brings this capability to
the PMD.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.c      |   3 +
 drivers/net/mlx5/mlx5.h      |   3 +-
 drivers/net/mlx5/mlx5_flow.c | 228 ++++++++++++++++++++++++-------------------
 drivers/net/mlx5/mlx5_rxq.c  | 165 +++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxtx.h |  17 ++++
 5 files changed, 312 insertions(+), 104 deletions(-)
  

Comments

Yongseok Koh Oct. 6, 2017, 4:59 a.m. UTC | #1
On Thu, Oct 05, 2017 at 02:49:47PM +0200, Nelio Laranjeiro wrote:
[...]
> +struct mlx5_hrxq*
> +mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
> +		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
> +{
> +	struct mlx5_hrxq *hrxq;
> +
> +	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
> +		struct mlx5_ind_table_ibv *ind_tbl;
> +
> +		if (hrxq->rss_key_len != rss_key_len)
> +			continue;
> +		if (memcmp(hrxq->rss_key, rss_key, rss_key_len))
> +			continue;
> +		if (hrxq->hash_fields != hash_fields)
> +			continue;
> +		ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
> +		if (!ind_tbl)
> +			continue;
> +		if (ind_tbl != hrxq->ind_table) {
> +			mlx5_priv_ind_table_ibv_release(priv, ind_tbl);

As one hrxq can have only one ind_tbl, it looks unnecessary to increment the
refcnt of the ind_tbl. As long as a hrxq exists, its ind_tbl can't be
destroyed, so it is safe. How about moving this _release() up outside of this
if-clause and removing the _release() in _hrxq_release()?

However, it is logically flawless, so
Acked-by: Yongseok Koh <yskoh@mellanox.com>
 
Thanks
  
Nélio Laranjeiro Oct. 6, 2017, 7:03 a.m. UTC | #2
On Thu, Oct 05, 2017 at 09:59:58PM -0700, Yongseok Koh wrote:
> On Thu, Oct 05, 2017 at 02:49:47PM +0200, Nelio Laranjeiro wrote:
> [...]
> > +struct mlx5_hrxq*
> > +mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
> > +		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
> > +{
> > +	struct mlx5_hrxq *hrxq;
> > +
> > +	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
> > +		struct mlx5_ind_table_ibv *ind_tbl;
> > +
> > +		if (hrxq->rss_key_len != rss_key_len)
> > +			continue;
> > +		if (memcmp(hrxq->rss_key, rss_key, rss_key_len))
> > +			continue;
> > +		if (hrxq->hash_fields != hash_fields)
> > +			continue;
> > +		ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
> > +		if (!ind_tbl)
> > +			continue;
> > +		if (ind_tbl != hrxq->ind_table) {
> > +			mlx5_priv_ind_table_ibv_release(priv, ind_tbl);
> 
> As one hrxq can have only one ind_tbl, it looks unnecessary to increment the
> refcnt of the ind_tbl. As long as a hrxq exists, its ind_tbl can't be
> destroyed, so it is safe. How about moving this _release() up outside of this
> if-clause and removing the _release() in _hrxq_release()?

This is right, but on the other hand, an indirection table can be used
by several hash Rx queues, which is the main reason why they have their
own reference counter.


  +-------+  +-------+
  | Hrxq  |  | Hrxq  |
  | r = 1 |  | r = 1 |
  +-------+  +-------+
      |          |
      v          v
 +-------------------+
 | indirection table |
 | r = 2             |
 +-------------------+

It seems logical to make the indirection table counter evolve the same way
as the hash Rx queue's; otherwise a second hash Rx queue using this
indirection table may release it while it is still in use by another hash
Rx queue.

> However, it is logically flawless, so
> Acked-by: Yongseok Koh <yskoh@mellanox.com>

Thanks,
  
Yongseok Koh Oct. 6, 2017, 10:50 p.m. UTC | #3
On Fri, Oct 06, 2017 at 09:03:25AM +0200, Nélio Laranjeiro wrote:
> On Thu, Oct 05, 2017 at 09:59:58PM -0700, Yongseok Koh wrote:
> > On Thu, Oct 05, 2017 at 02:49:47PM +0200, Nelio Laranjeiro wrote:
> > [...]
> > > +struct mlx5_hrxq*
> > > +mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
> > > +		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
> > > +{
> > > +	struct mlx5_hrxq *hrxq;
> > > +
> > > +	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
> > > +		struct mlx5_ind_table_ibv *ind_tbl;
> > > +
> > > +		if (hrxq->rss_key_len != rss_key_len)
> > > +			continue;
> > > +		if (memcmp(hrxq->rss_key, rss_key, rss_key_len))
> > > +			continue;
> > > +		if (hrxq->hash_fields != hash_fields)
> > > +			continue;
> > > +		ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
> > > +		if (!ind_tbl)
> > > +			continue;
> > > +		if (ind_tbl != hrxq->ind_table) {
> > > +			mlx5_priv_ind_table_ibv_release(priv, ind_tbl);
> > 
> > As one hrxq can have only one ind_tbl, it looks unnecessary to increment the
> > refcnt of the ind_tbl. As long as a hrxq exists, its ind_tbl can't be
> > destroyed, so it is safe. How about moving this _release() up outside of this
> > if-clause and removing the _release() in _hrxq_release()?
> 
> This is right, but on the other hand, an indirection table can be used
> by several hash Rx queues, which is the main reason why they have their
> own reference counter.
> 
> 
>   +-------+  +-------+
>   | Hrxq  |  | Hrxq  |
>   | r = 1 |  | r = 1 |
>   +-------+  +-------+
>       |          |
>       v          v
>  +-------------------+
>  | indirection table |
>  | r = 2             |
>  +-------------------+
> 
> It seems logical to make the indirection table counter evolve the same way
> as the hash Rx queue's; otherwise a second hash Rx queue using this
> indirection table may release it while it is still in use by another hash
> Rx queue.

Whenever a hash Rx queue is created, it gets an ind_tbl either from
mlx5_priv_ind_table_ibv_get() or from mlx5_priv_ind_table_ibv_new(), so the
refcnt of the ind_tbl is already increased. Even if another hash RxQ which
holds the ind_tbl releases it, it is safe. That's why I don't think
ind_tbl->refcnt needs to be increased again when calling
mlx5_priv_hrxq_get(). Does that make sense?

Thanks,
Yongseok
  
Nélio Laranjeiro Oct. 9, 2017, 8:05 a.m. UTC | #4
On Fri, Oct 06, 2017 at 03:50:06PM -0700, Yongseok Koh wrote:
> On Fri, Oct 06, 2017 at 09:03:25AM +0200, Nélio Laranjeiro wrote:
> > On Thu, Oct 05, 2017 at 09:59:58PM -0700, Yongseok Koh wrote:
> > > On Thu, Oct 05, 2017 at 02:49:47PM +0200, Nelio Laranjeiro wrote:
> > > [...]
> > > > +struct mlx5_hrxq*
> > > > +mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
> > > > +		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
> > > > +{
> > > > +	struct mlx5_hrxq *hrxq;
> > > > +
> > > > +	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
> > > > +		struct mlx5_ind_table_ibv *ind_tbl;
> > > > +
> > > > +		if (hrxq->rss_key_len != rss_key_len)
> > > > +			continue;
> > > > +		if (memcmp(hrxq->rss_key, rss_key, rss_key_len))
> > > > +			continue;
> > > > +		if (hrxq->hash_fields != hash_fields)
> > > > +			continue;
> > > > +		ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
> > > > +		if (!ind_tbl)
> > > > +			continue;
> > > > +		if (ind_tbl != hrxq->ind_table) {
> > > > +			mlx5_priv_ind_table_ibv_release(priv, ind_tbl);
> > > 
> > > As one hrxq can have only one ind_tbl, it looks unnecessary to increment the
> > > refcnt of the ind_tbl. As long as a hrxq exists, its ind_tbl can't be
> > > destroyed, so it is safe. How about moving this _release() up outside of this
> > > if-clause and removing the _release() in _hrxq_release()?
> > 
> > This is right, but on the other hand, an indirection table can be used
> > by several hash Rx queues, which is the main reason why they have their
> > own reference counter.
> > 
> > 
> >   +-------+  +-------+
> >   | Hrxq  |  | Hrxq  |
> >   | r = 1 |  | r = 1 |
> >   +-------+  +-------+
> >       |          |
> >       v          v
> >  +-------------------+
> >  | indirection table |
> >  | r = 2             |
> >  +-------------------+
> > 
> > It seems logical to make the indirection table counter evolve the same way
> > as the hash Rx queue's; otherwise a second hash Rx queue using this
> > indirection table may release it while it is still in use by another hash
> > Rx queue.
> 
> Whenever a hash Rx queue is created, it gets an ind_tbl either from
> mlx5_priv_ind_table_ibv_get() or from mlx5_priv_ind_table_ibv_new(), so the
> refcnt of the ind_tbl is already increased. Even if another hash RxQ which
> holds the ind_tbl releases it, it is safe. That's why I don't think
> ind_tbl->refcnt needs to be increased again when calling
> mlx5_priv_hrxq_get(). Does that make sense?

It makes sense, but in that case the whole patch series would need to be
modified to follow this design.  The current design is: when an object
needs another object, it takes a reference; when it no longer needs it, it
releases that reference.  This means a get() on a high-level object causes
a get() on the underlying ones, and a release() on a high-level object
causes a release() on the underlying ones.  In this way, a flow holds a
reference on every reference-counted object it uses, even the hidden ones.

Currently it won't hurt, as this is a control-plane path which already
relies on a lot of system calls.

Can we agree on leaving the design as is for this release and maybe
changing it in the next one?

Thanks,
  
Yongseok Koh Oct. 9, 2017, 1:48 p.m. UTC | #5
On Oct 9, 2017, at 1:05 AM, Nélio Laranjeiro <nelio.laranjeiro@6wind.com> wrote:

> Can we agree on leaving the design as is for this release and maybe
> changing it in the next one?

Sure, I totally agree. I didn’t want to stop you. Like I mentioned, as
there’s no logical flaw, I acked the patches. Please go with v3.

Thanks
Yongseok
  

Patch

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 929f0df..2860480 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -235,6 +235,9 @@  mlx5_dev_close(struct rte_eth_dev *dev)
 	if (priv->reta_idx != NULL)
 		rte_free(priv->reta_idx);
 	priv_socket_uninit(priv);
+	ret = mlx5_priv_hrxq_ibv_verify(priv);
+	if (ret)
+		WARN("%p: some Hash Rx queue still remain", (void *)priv);
 	ret = mlx5_priv_ind_table_ibv_verify(priv);
 	if (ret)
 		WARN("%p: some Indirection table still remain", (void *)priv);
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index ab17ce6..77413c9 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -144,11 +144,12 @@  struct priv {
 	struct rte_intr_handle intr_handle; /* Interrupt handler. */
 	unsigned int (*reta_idx)[]; /* RETA index table. */
 	unsigned int reta_idx_n; /* RETA index size. */
-	struct rte_flow_drop *flow_drop_queue; /* Flow drop queue. */
+	struct mlx5_hrxq_drop *flow_drop_queue; /* Flow drop queue. */
 	TAILQ_HEAD(mlx5_flows, rte_flow) flows; /* RTE Flow rules. */
 	LIST_HEAD(mr, mlx5_mr) mr; /* Memory region. */
 	LIST_HEAD(rxq, mlx5_rxq_ctrl) rxqsctrl; /* DPDK Rx queues. */
 	LIST_HEAD(rxqibv, mlx5_rxq_ibv) rxqsibv; /* Verbs Rx queues. */
+	LIST_HEAD(hrxq, mlx5_hrxq) hrxqs; /* Verbs Hash Rx queues. */
 	LIST_HEAD(txq, mlx5_txq_ctrl) txqsctrl; /* DPDK Tx queues. */
 	LIST_HEAD(txqibv, mlx5_txq_ibv) txqsibv; /* Verbs Tx queues. */
 	/* Verbs Indirection tables. */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index dc9adeb..4948882 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -87,17 +87,37 @@  mlx5_flow_create_vxlan(const struct rte_flow_item *item,
 		       const void *default_mask,
 		       void *data);
 
-struct rte_flow {
-	TAILQ_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
-	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
-	struct mlx5_ind_table_ibv *ind_table; /**< Indirection table. */
+/** Structure for Drop queue. */
+struct mlx5_hrxq_drop {
+	struct ibv_rwq_ind_table *ind_table; /**< Indirection table. */
 	struct ibv_qp *qp; /**< Verbs queue pair. */
-	struct ibv_flow *ibv_flow; /**< Verbs flow. */
 	struct ibv_wq *wq; /**< Verbs work queue. */
 	struct ibv_cq *cq; /**< Verbs completion queue. */
+};
+
+/* Flows structures. */
+struct mlx5_flow {
+	uint64_t hash_fields; /**< Fields that participate in the hash. */
+	struct mlx5_hrxq *hrxq; /**< Hash Rx queues. */
+};
+
+/* Drop flows structures. */
+struct mlx5_flow_drop {
+	struct mlx5_hrxq_drop hrxq; /**< Drop hash Rx queue. */
+};
+
+struct rte_flow {
+	TAILQ_ENTRY(rte_flow) next; /**< Pointer to the next flow structure. */
 	uint32_t mark:1; /**< Set if the flow is marked. */
 	uint32_t drop:1; /**< Drop queue. */
-	uint64_t hash_fields; /**< Fields that participate in the hash. */
+	struct ibv_flow_attr *ibv_attr; /**< Pointer to Verbs attributes. */
+	struct ibv_flow *ibv_flow; /**< Verbs flow. */
+	uint16_t queues_n; /**< Number of entries in queue[]. */
+	uint16_t (*queues)[]; /**< Queues indexes to use. */
+	union {
+		struct mlx5_flow frxq; /**< Flow with Rx queue. */
+		struct mlx5_flow_drop drxq; /**< Flow with drop Rx queue. */
+	};
 };
 
 /** Static initializer for items. */
@@ -288,14 +308,6 @@  struct mlx5_flow_parse {
 	struct mlx5_flow_action actions; /**< Parsed action result. */
 };
 
-/** Structure for Drop queue. */
-struct rte_flow_drop {
-	struct ibv_rwq_ind_table *ind_table; /**< Indirection table. */
-	struct ibv_qp *qp; /**< Verbs queue pair. */
-	struct ibv_wq *wq; /**< Verbs work queue. */
-	struct ibv_cq *cq; /**< Verbs completion queue. */
-};
-
 static const struct rte_flow_ops mlx5_flow_ops = {
 	.validate = mlx5_flow_validate,
 	.create = mlx5_flow_create,
@@ -1052,8 +1064,8 @@  priv_flow_create_action_queue_drop(struct priv *priv,
 	rte_flow->ibv_attr = flow->ibv_attr;
 	if (!priv->dev->data->dev_started)
 		return rte_flow;
-	rte_flow->qp = priv->flow_drop_queue->qp;
-	rte_flow->ibv_flow = ibv_create_flow(rte_flow->qp,
+	rte_flow->drxq.hrxq.qp = priv->flow_drop_queue->qp;
+	rte_flow->ibv_flow = ibv_create_flow(rte_flow->drxq.hrxq.qp,
 					     rte_flow->ibv_attr);
 	if (!rte_flow->ibv_flow) {
 		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -1091,62 +1103,52 @@  priv_flow_create_action_queue(struct priv *priv,
 	assert(priv->pd);
 	assert(priv->ctx);
 	assert(!flow->actions.drop);
-	rte_flow = rte_calloc(__func__, 1, sizeof(*rte_flow), 0);
+	rte_flow =
+		rte_calloc(__func__, 1,
+			   sizeof(*flow) +
+			   flow->actions.queues_n * sizeof(uint16_t),
+			   0);
 	if (!rte_flow) {
 		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
 				   NULL, "cannot allocate flow memory");
 		return NULL;
 	}
-	for (i = 0; i != flow->actions.queues_n; ++i) {
-		struct mlx5_rxq_data *q =
-			(*priv->rxqs)[flow->actions.queues[i]];
-
-		q->mark |= flow->actions.mark;
-	}
 	rte_flow->mark = flow->actions.mark;
 	rte_flow->ibv_attr = flow->ibv_attr;
-	rte_flow->hash_fields = flow->hash_fields;
-	rte_flow->ind_table =
-		mlx5_priv_ind_table_ibv_get(priv, flow->actions.queues,
-					    flow->actions.queues_n);
-	if (!rte_flow->ind_table) {
-		rte_flow->ind_table =
-			mlx5_priv_ind_table_ibv_new(priv, flow->actions.queues,
-						    flow->actions.queues_n);
-		if (!rte_flow->ind_table) {
-			rte_flow_error_set(error, ENOMEM,
-					   RTE_FLOW_ERROR_TYPE_HANDLE,
-					   NULL,
-					   "cannot allocate indirection table");
-			goto error;
-		}
+	rte_flow->queues = (uint16_t (*)[])(rte_flow + 1);
+	memcpy(rte_flow->queues, flow->actions.queues,
+	       flow->actions.queues_n * sizeof(uint16_t));
+	rte_flow->queues_n = flow->actions.queues_n;
+	rte_flow->frxq.hash_fields = flow->hash_fields;
+	rte_flow->frxq.hrxq = mlx5_priv_hrxq_get(priv, rss_hash_default_key,
+						 rss_hash_default_key_len,
+						 flow->hash_fields,
+						 (*rte_flow->queues),
+						 rte_flow->queues_n);
+	if (rte_flow->frxq.hrxq) {
+		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
+				   NULL, "duplicated flow");
+		goto error;
 	}
-	rte_flow->qp = ibv_create_qp_ex(
-		priv->ctx,
-		&(struct ibv_qp_init_attr_ex){
-			.qp_type = IBV_QPT_RAW_PACKET,
-			.comp_mask =
-				IBV_QP_INIT_ATTR_PD |
-				IBV_QP_INIT_ATTR_IND_TABLE |
-				IBV_QP_INIT_ATTR_RX_HASH,
-			.rx_hash_conf = (struct ibv_rx_hash_conf){
-				.rx_hash_function =
-					IBV_RX_HASH_FUNC_TOEPLITZ,
-				.rx_hash_key_len = rss_hash_default_key_len,
-				.rx_hash_key = rss_hash_default_key,
-				.rx_hash_fields_mask = rte_flow->hash_fields,
-			},
-			.rwq_ind_tbl = rte_flow->ind_table->ind_table,
-			.pd = priv->pd
-		});
-	if (!rte_flow->qp) {
+	rte_flow->frxq.hrxq = mlx5_priv_hrxq_new(priv, rss_hash_default_key,
+						 rss_hash_default_key_len,
+						 flow->hash_fields,
+						 (*rte_flow->queues),
+						 rte_flow->queues_n);
+	if (!rte_flow->frxq.hrxq) {
 		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
-				   NULL, "cannot allocate QP");
+				   NULL, "cannot create hash rxq");
 		goto error;
 	}
+	for (i = 0; i != flow->actions.queues_n; ++i) {
+		struct mlx5_rxq_data *q =
+			(*priv->rxqs)[flow->actions.queues[i]];
+
+		q->mark |= flow->actions.mark;
+	}
 	if (!priv->dev->data->dev_started)
 		return rte_flow;
-	rte_flow->ibv_flow = ibv_create_flow(rte_flow->qp,
+	rte_flow->ibv_flow = ibv_create_flow(rte_flow->frxq.hrxq->qp,
 					     rte_flow->ibv_attr);
 	if (!rte_flow->ibv_flow) {
 		rte_flow_error_set(error, ENOMEM, RTE_FLOW_ERROR_TYPE_HANDLE,
@@ -1156,10 +1158,8 @@  priv_flow_create_action_queue(struct priv *priv,
 	return rte_flow;
 error:
 	assert(rte_flow);
-	if (rte_flow->qp)
-		ibv_destroy_qp(rte_flow->qp);
-	if (rte_flow->ind_table)
-		mlx5_priv_ind_table_ibv_release(priv, rte_flow->ind_table);
+	if (rte_flow->frxq.hrxq)
+		mlx5_priv_hrxq_release(priv, rte_flow->frxq.hrxq);
 	rte_free(rte_flow);
 	return NULL;
 }
@@ -1277,45 +1277,43 @@  priv_flow_destroy(struct priv *priv,
 		  struct rte_flow *flow)
 {
 	unsigned int i;
+	uint16_t *queues;
+	uint16_t queues_n;
 
-	TAILQ_REMOVE(&priv->flows, flow, next);
-	if (flow->ibv_flow)
-		claim_zero(ibv_destroy_flow(flow->ibv_flow));
-	if (flow->drop)
+	if (flow->drop || !flow->mark)
 		goto free;
-	if (flow->qp)
-		claim_zero(ibv_destroy_qp(flow->qp));
-	for (i = 0; i != flow->ind_table->queues_n; ++i) {
+	queues = flow->frxq.hrxq->ind_table->queues;
+	queues_n = flow->frxq.hrxq->ind_table->queues_n;
+	for (i = 0; i != queues_n; ++i) {
 		struct rte_flow *tmp;
-		struct mlx5_rxq_data *rxq_data =
-			(*priv->rxqs)[flow->ind_table->queues[i]];
+		struct mlx5_rxq_data *rxq_data = (*priv->rxqs)[queues[i]];
+		int mark = 0;
 
 		/*
 		 * To remove the mark from the queue, the queue must not be
 		 * present in any other marked flow (RSS or not).
 		 */
-		if (flow->mark) {
-			int mark = 0;
-
-			TAILQ_FOREACH(tmp, &priv->flows, next) {
-				unsigned int j;
-
-				if (tmp->drop)
-					continue;
-				if (!tmp->mark)
-					continue;
-				for (j = 0;
-				     (j != tmp->ind_table->queues_n) && !mark;
-				     j++)
-					if (tmp->ind_table->queues[j] ==
-					    flow->ind_table->queues[i])
-						mark = 1;
-			}
-			rxq_data->mark = mark;
+		TAILQ_FOREACH(tmp, &priv->flows, next) {
+			unsigned int j;
+
+			if (!tmp->mark)
+				continue;
+			for (j = 0;
+			     (j != tmp->frxq.hrxq->ind_table->queues_n) &&
+			     !mark;
+			     j++)
+				if (tmp->frxq.hrxq->ind_table->queues[j] ==
+				    queues[i])
+					mark = 1;
 		}
+		rxq_data->mark = mark;
 	}
-	mlx5_priv_ind_table_ibv_release(priv, flow->ind_table);
 free:
+	if (flow->ibv_flow)
+		claim_zero(ibv_destroy_flow(flow->ibv_flow));
+	if (!flow->drop)
+		mlx5_priv_hrxq_release(priv, flow->frxq.hrxq);
+	TAILQ_REMOVE(&priv->flows, flow, next);
 	rte_free(flow->ibv_attr);
 	DEBUG("Flow destroyed %p", (void *)flow);
 	rte_free(flow);
@@ -1389,7 +1387,7 @@  mlx5_flow_flush(struct rte_eth_dev *dev,
 static int
 priv_flow_create_drop_queue(struct priv *priv)
 {
-	struct rte_flow_drop *fdq = NULL;
+	struct mlx5_hrxq_drop *fdq = NULL;
 
 	assert(priv->pd);
 	assert(priv->ctx);
@@ -1472,7 +1470,7 @@  priv_flow_create_drop_queue(struct priv *priv)
 static void
 priv_flow_delete_drop_queue(struct priv *priv)
 {
-	struct rte_flow_drop *fdq = priv->flow_drop_queue;
+	struct mlx5_hrxq_drop *fdq = priv->flow_drop_queue;
 
 	if (!fdq)
 		return;
@@ -1504,9 +1502,12 @@  priv_flow_stop(struct priv *priv)
 	TAILQ_FOREACH_REVERSE(flow, &priv->flows, mlx5_flows, next) {
 		claim_zero(ibv_destroy_flow(flow->ibv_flow));
 		flow->ibv_flow = NULL;
+		mlx5_priv_hrxq_release(priv, flow->frxq.hrxq);
+		flow->frxq.hrxq = NULL;
 		if (flow->mark) {
 			unsigned int n;
-			struct mlx5_ind_table_ibv *ind_tbl = flow->ind_table;
+			struct mlx5_ind_table_ibv *ind_tbl =
+				flow->frxq.hrxq->ind_table;
 
 			for (n = 0; n < ind_tbl->queues_n; ++n)
 				(*priv->rxqs)[ind_tbl->queues[n]]->mark = 0;
@@ -1535,13 +1536,31 @@  priv_flow_start(struct priv *priv)
 	if (ret)
 		return -1;
 	TAILQ_FOREACH(flow, &priv->flows, next) {
-		struct ibv_qp *qp;
-
-		if (flow->drop)
-			qp = priv->flow_drop_queue->qp;
-		else
-			qp = flow->qp;
-		flow->ibv_flow = ibv_create_flow(qp, flow->ibv_attr);
+		if (flow->frxq.hrxq)
+			goto flow_create;
+		flow->frxq.hrxq =
+			mlx5_priv_hrxq_get(priv, rss_hash_default_key,
+					   rss_hash_default_key_len,
+					   flow->frxq.hash_fields,
+					   (*flow->queues),
+					   flow->queues_n);
+		if (flow->frxq.hrxq)
+			goto flow_create;
+		flow->frxq.hrxq =
+			mlx5_priv_hrxq_new(priv, rss_hash_default_key,
+					   rss_hash_default_key_len,
+					   flow->frxq.hash_fields,
+					   (*flow->queues),
+					   flow->queues_n);
+		if (!flow->frxq.hrxq) {
+			DEBUG("Flow %p cannot be applied",
+			      (void *)flow);
+			rte_errno = EINVAL;
+			return rte_errno;
+		}
+flow_create:
+		flow->ibv_flow = ibv_create_flow(flow->frxq.hrxq->qp,
+						 flow->ibv_attr);
 		if (!flow->ibv_flow) {
 			DEBUG("Flow %p cannot be applied", (void *)flow);
 			rte_errno = EINVAL;
@@ -1551,8 +1570,11 @@  priv_flow_start(struct priv *priv)
 		if (flow->mark) {
 			unsigned int n;
 
-			for (n = 0; n < flow->ind_table->queues_n; ++n) {
-				uint16_t idx = flow->ind_table->queues[n];
+			for (n = 0;
+			     n < flow->frxq.hrxq->ind_table->queues_n;
+			     ++n) {
+				uint16_t idx =
+					flow->frxq.hrxq->ind_table->queues[n];
 				(*priv->rxqs)[idx]->mark = 1;
 			}
 		}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 4a53282..b240c16 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -1775,3 +1775,168 @@  mlx5_priv_ind_table_ibv_verify(struct priv *priv)
 	}
 	return ret;
 }
+
+/**
+ * Create an Rx Hash queue.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rss_key
+ *   RSS key for the Rx hash queue.
+ * @param rss_key_len
+ *   RSS key length.
+ * @param hash_fields
+ *   Verbs protocol hash field to make the RSS on.
+ * @param queues
+ *   Queues entering in hash queue.
+ * @param queues_n
+ *   Number of queues.
+ *
+ * @return
+ *   An hash Rx queue on success.
+ */
+struct mlx5_hrxq*
+mlx5_priv_hrxq_new(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
+		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
+{
+	struct mlx5_hrxq *hrxq;
+	struct mlx5_ind_table_ibv *ind_tbl;
+	struct ibv_qp *qp;
+
+	ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
+	if (!ind_tbl)
+		ind_tbl = mlx5_priv_ind_table_ibv_new(priv, queues, queues_n);
+	if (!ind_tbl)
+		return NULL;
+	qp = ibv_create_qp_ex(
+		priv->ctx,
+		&(struct ibv_qp_init_attr_ex){
+			.qp_type = IBV_QPT_RAW_PACKET,
+			.comp_mask =
+				IBV_QP_INIT_ATTR_PD |
+				IBV_QP_INIT_ATTR_IND_TABLE |
+				IBV_QP_INIT_ATTR_RX_HASH,
+			.rx_hash_conf = (struct ibv_rx_hash_conf){
+				.rx_hash_function = IBV_RX_HASH_FUNC_TOEPLITZ,
+				.rx_hash_key_len = rss_key_len,
+				.rx_hash_key = rss_key,
+				.rx_hash_fields_mask = hash_fields,
+			},
+			.rwq_ind_tbl = ind_tbl->ind_table,
+			.pd = priv->pd,
+		});
+	if (!qp)
+		goto error;
+	hrxq = rte_calloc(__func__, 1, sizeof(*hrxq) + rss_key_len, 0);
+	if (!hrxq)
+		goto error;
+	hrxq->ind_table = ind_tbl;
+	hrxq->qp = qp;
+	hrxq->rss_key_len = rss_key_len;
+	hrxq->hash_fields = hash_fields;
+	memcpy(hrxq->rss_key, rss_key, rss_key_len);
+	rte_atomic32_inc(&hrxq->refcnt);
+	LIST_INSERT_HEAD(&priv->hrxqs, hrxq, next);
+	DEBUG("%p: Hash Rx queue %p: refcnt %d", (void *)priv,
+	      (void *)hrxq, rte_atomic32_read(&hrxq->refcnt));
+	return hrxq;
+error:
+	mlx5_priv_ind_table_ibv_release(priv, ind_tbl);
+	if (qp)
+		claim_zero(ibv_destroy_qp(qp));
+	return NULL;
+}
+
+/**
+ * Get an Rx Hash queue.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rss_conf
+ *   RSS configuration for the Rx hash queue.
+ * @param queues
+ *   Queues entering in hash queue.
+ * @param queues_n
+ *   Number of queues.
+ *
+ * @return
+ *   An hash Rx queue on success.
+ */
+struct mlx5_hrxq*
+mlx5_priv_hrxq_get(struct priv *priv, uint8_t *rss_key, uint8_t rss_key_len,
+		   uint64_t hash_fields, uint16_t queues[], uint16_t queues_n)
+{
+	struct mlx5_hrxq *hrxq;
+
+	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
+		struct mlx5_ind_table_ibv *ind_tbl;
+
+		if (hrxq->rss_key_len != rss_key_len)
+			continue;
+		if (memcmp(hrxq->rss_key, rss_key, rss_key_len))
+			continue;
+		if (hrxq->hash_fields != hash_fields)
+			continue;
+		ind_tbl = mlx5_priv_ind_table_ibv_get(priv, queues, queues_n);
+		if (!ind_tbl)
+			continue;
+		if (ind_tbl != hrxq->ind_table) {
+			mlx5_priv_ind_table_ibv_release(priv, ind_tbl);
+			continue;
+		}
+		rte_atomic32_inc(&hrxq->refcnt);
+		DEBUG("%p: Hash Rx queue %p: refcnt %d", (void *)priv,
+		      (void *)hrxq, rte_atomic32_read(&hrxq->refcnt));
+		return hrxq;
+	}
+	return NULL;
+}
+
+/**
+ * Release the hash Rx queue.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param hrxq
+ *   Pointer to Hash Rx queue to release.
+ *
+ * @return
+ *   0 on success, errno value on failure.
+ */
+int
+mlx5_priv_hrxq_release(struct priv *priv, struct mlx5_hrxq *hrxq)
+{
+	DEBUG("%p: Hash Rx queue %p: refcnt %d", (void *)priv,
+	      (void *)hrxq, rte_atomic32_read(&hrxq->refcnt));
+	if (rte_atomic32_dec_and_test(&hrxq->refcnt)) {
+		claim_zero(ibv_destroy_qp(hrxq->qp));
+		mlx5_priv_ind_table_ibv_release(priv, hrxq->ind_table);
+		LIST_REMOVE(hrxq, next);
+		rte_free(hrxq);
+		return 0;
+	}
+	claim_nonzero(mlx5_priv_ind_table_ibv_release(priv, hrxq->ind_table));
+	return EBUSY;
+}
+
+/**
+ * Verify the Rx Queue list is empty
+ *
+ * @param priv
+ *  Pointer to private structure.
+ *
+ * @return the number of object not released.
+ */
+int
+mlx5_priv_hrxq_ibv_verify(struct priv *priv)
+{
+	struct mlx5_hrxq *hrxq;
+	int ret = 0;
+
+	LIST_FOREACH(hrxq, &priv->hrxqs, next) {
+		DEBUG("%p: Verbs Hash Rx queue %p still referenced",
+		      (void *)priv, (void *)hrxq);
+		++ret;
+	}
+	return ret;
+}
diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index 1b6dc97..30ce810 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -165,6 +165,17 @@  struct mlx5_ind_table_ibv {
 	uint16_t queues[]; /**< Queue list. */
 };
 
+/* Hash Rx queue. */
+struct mlx5_hrxq {
+	LIST_ENTRY(mlx5_hrxq) next; /* Pointer to the next element. */
+	rte_atomic32_t refcnt; /* Reference counter. */
+	struct mlx5_ind_table_ibv *ind_table; /* Indirection table. */
+	struct ibv_qp *qp; /* Verbs queue pair. */
+	uint64_t hash_fields; /* Verbs Hash fields. */
+	uint8_t rss_key_len; /* Hash key length in bytes. */
+	uint8_t rss_key[]; /* Hash key. */
+};
+
 /* Hash RX queue types. */
 enum hash_rxq_type {
 	HASH_RXQ_TCPV4,
@@ -362,6 +373,12 @@  struct mlx5_ind_table_ibv *mlx5_priv_ind_table_ibv_get(struct priv *,
 						       uint16_t);
 int mlx5_priv_ind_table_ibv_release(struct priv *, struct mlx5_ind_table_ibv *);
 int mlx5_priv_ind_table_ibv_verify(struct priv *);
+struct mlx5_hrxq *mlx5_priv_hrxq_new(struct priv *, uint8_t *, uint8_t,
+				     uint64_t, uint16_t [], uint16_t);
+struct mlx5_hrxq *mlx5_priv_hrxq_get(struct priv *, uint8_t *, uint8_t,
+				     uint64_t, uint16_t [], uint16_t);
+int mlx5_priv_hrxq_release(struct priv *, struct mlx5_hrxq *);
+int mlx5_priv_hrxq_ibv_verify(struct priv *);
 
 /* mlx5_txq.c */