[dpdk-dev,2/2] net/mlx5: panic when destroying a queue in use

Message ID d3c84543c14abf4cade0195e3d6386e4f18bf746.1491924075.git.nelio.laranjeiro@6wind.com (mailing list archive)
State Accepted, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Nélio Laranjeiro April 11, 2017, 3:21 p.m. UTC
  Since the queue release API does not allow failures (return value is void)
and the flow API does not allow a queue to be released as long as a flow
rule depends on it, the only rational decision to avoid undefined behavior
is to panic in this situation.

Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 drivers/net/mlx5/mlx5.h      |  1 +
 drivers/net/mlx5/mlx5_flow.c | 31 +++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_rxq.c  |  4 ++++
 3 files changed, 36 insertions(+)
  

Comments

Ferruh Yigit April 12, 2017, 4:48 p.m. UTC | #1
On 4/11/2017 4:21 PM, Nelio Laranjeiro wrote:
> Since the queue release API does not allow failures (return value is void)
> and the flow API does not allow a queue to be released as long as a flow
> rule depends on it, the only rational decision to avoid undefined behavior
> is to panic in this situation.
> 
> Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

<...>

> @@ -1248,6 +1249,9 @@ mlx5_rx_queue_release(void *dpdk_rxq)
>  	rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
>  	priv = rxq_ctrl->priv;
>  	priv_lock(priv);
> +	if (priv_flow_rxq_in_use(priv, rxq))
> +		rte_panic("Rx queue %p is still used by a flow and cannot be"
> +			  " removed\n", (void *)rxq_ctrl);

Actually, this adds exit code to the PMD, not sure this is good idea.

There was a patch to remove them from libraries. The exit decision
should be belong to the application, not to the PMD, what do you think?

>  	for (i = 0; (i != priv->rxqs_n); ++i)
>  		if ((*priv->rxqs)[i] == rxq) {
>  			DEBUG("%p: removing RX queue %p from list",
>
  
Adrien Mazarguil April 12, 2017, 5:04 p.m. UTC | #2
On Wed, Apr 12, 2017 at 05:48:40PM +0100, Ferruh Yigit wrote:
> On 4/11/2017 4:21 PM, Nelio Laranjeiro wrote:
> > Since the queue release API does not allow failures (return value is void)
> > and the flow API does not allow a queue to be released as long as a flow
> > rule depends on it, the only rational decision to avoid undefined behavior
> > is to panic in this situation.
> > 
> > Signed-off-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
> > Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> 
> <...>
> 
> > @@ -1248,6 +1249,9 @@ mlx5_rx_queue_release(void *dpdk_rxq)
> >  	rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
> >  	priv = rxq_ctrl->priv;
> >  	priv_lock(priv);
> > +	if (priv_flow_rxq_in_use(priv, rxq))
> > +		rte_panic("Rx queue %p is still used by a flow and cannot be"
> > +			  " removed\n", (void *)rxq_ctrl);
> 
> Actually, this adds exit code to the PMD, not sure this is good idea.
> 
> There was a patch to remove them from libraries. The exit decision
> should be belong to the application, not to the PMD, what do you think?

I think rte_panic() makes sense in such cases. It's a voluntary crash
similar to assert() (BUG_ON() for kernel folks) to avoids memory corruption
and other nasty things which unfortunately can occur when applications
happen to do things in the wrong order.

In this specific case, it is caused by shortcomings in the current API,
namely the lack of a return value for this function, which cannot be changed
at this stage. This is not a fix as there is not really a commit to blame.

> 
> >  	for (i = 0; (i != priv->rxqs_n); ++i)
> >  		if ((*priv->rxqs)[i] == rxq) {
> >  			DEBUG("%p: removing RX queue %p from list",
> > 
>
  

Patch

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index e8bf361..71e19ec 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -313,5 +313,6 @@  int mlx5_flow_destroy(struct rte_eth_dev *, struct rte_flow *,
 int mlx5_flow_flush(struct rte_eth_dev *, struct rte_flow_error *);
 int priv_flow_start(struct priv *);
 void priv_flow_stop(struct priv *);
+int priv_flow_rxq_in_use(struct priv *, struct rxq *);
 
 #endif /* RTE_PMD_MLX5_H_ */
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index c735c43..8d62f85 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1535,3 +1535,34 @@  priv_flow_start(struct priv *priv)
 	}
 	return 0;
 }
+
+/**
+ * Verify if the Rx queue is used in a flow.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ * @param rxq
+ *   Pointer to the queue to search.
+ *
+ * @return
+ *   Nonzero if the queue is used by a flow.
+ */
+int
+priv_flow_rxq_in_use(struct priv *priv, struct rxq *rxq)
+{
+	struct rte_flow *flow;
+
+	for (flow = LIST_FIRST(&priv->flows);
+	     flow;
+	     flow = LIST_NEXT(flow, next)) {
+		unsigned int n;
+
+		if (flow->drop)
+			continue;
+		for (n = 0; n < flow->rxqs_n; ++n) {
+			if (flow->rxqs[n] == rxq)
+				return 1;
+		}
+	}
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 5566482..8b78233 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -59,6 +59,7 @@ 
 #include <rte_ethdev.h>
 #include <rte_common.h>
 #include <rte_interrupts.h>
+#include <rte_debug.h>
 #ifdef PEDANTIC
 #pragma GCC diagnostic error "-Wpedantic"
 #endif
@@ -1248,6 +1249,9 @@  mlx5_rx_queue_release(void *dpdk_rxq)
 	rxq_ctrl = container_of(rxq, struct rxq_ctrl, rxq);
 	priv = rxq_ctrl->priv;
 	priv_lock(priv);
+	if (priv_flow_rxq_in_use(priv, rxq))
+		rte_panic("Rx queue %p is still used by a flow and cannot be"
+			  " removed\n", (void *)rxq_ctrl);
 	for (i = 0; (i != priv->rxqs_n); ++i)
 		if ((*priv->rxqs)[i] == rxq) {
 			DEBUG("%p: removing RX queue %p from list",