[dpdk-dev] net/mlx5: fix deadlock of link status alarm

Message ID: 20180110174649.33662-1-yskoh@mellanox.com
State: Accepted, archived
Delegated to: Ferruh Yigit

Checks

Context                Check    Description
ci/checkpatch          success  coding style OK
ci/Intel-compilation   fail     Compilation issues

Commit Message

Yongseok Koh Jan. 10, 2018, 5:46 p.m. UTC
If mlx5_dev_link_status_handler() is executed while the alarm is being
canceled, a deadlock can occur: rte_eal_alarm_cancel() waits for all
executing callbacks to finish, and both the handler and the canceling
path are protected by priv->lock.

Fixes: 198a3c339a8f ("mlx5: handle link status interrupts")
Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
Acked-by: Nelio Laranjeiro <nelio.laranjeiro@6wind.com>
---
 drivers/net/mlx5/mlx5.h        | 16 ++++++++++++++++
 drivers/net/mlx5/mlx5_ethdev.c | 13 +++++++++----
 2 files changed, 25 insertions(+), 4 deletions(-)
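The lock ordering described in the commit message can be modeled outside DPDK with POSIX threads. This is a hypothetical sketch, not mlx5 code: `model_lock` stands in for `priv->lock`, `pthread_join()` stands in for the way `rte_eal_alarm_cancel()` waits for an executing callback, and the handler uses `pthread_mutex_timedlock()` with a 200 ms timeout only so the program reports the would-be deadlock instead of hanging. The function names are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical stand-in for priv->lock. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models the pre-fix mlx5_dev_link_status_handler(), which needs the lock.
 * A plain blocking lock would hang forever here; a 200 ms timed lock is
 * used so the sketch can detect the deadlock and still terminate.
 */
static void *old_handler(void *arg)
{
	int *would_deadlock = arg;
	struct timespec deadline;
	int rc;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 200L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}
	rc = pthread_mutex_timedlock(&model_lock, &deadline);
	if (rc != 0) {
		/* Lock never acquired: the canceler is holding it. */
		*would_deadlock = (rc == ETIMEDOUT);
		return NULL;
	}
	/* ... link status update would run here ... */
	pthread_mutex_unlock(&model_lock);
	return NULL;
}

/*
 * Models the pre-fix uninstall path: take the lock, then wait for the
 * in-flight callback to finish, as rte_eal_alarm_cancel() does.
 * Returns 1 if the callback could never have made progress.
 */
static int simulate_old_cancel(void)
{
	pthread_t cb;
	int would_deadlock = 0;

	pthread_mutex_lock(&model_lock);       /* priv_lock() */
	pthread_create(&cb, NULL, old_handler, &would_deadlock);
	pthread_join(cb, NULL);                /* "cancel" waits for callback */
	pthread_mutex_unlock(&model_lock);     /* priv_unlock() */
	return would_deadlock;
}
```

Because the canceler takes the lock before the callback thread starts and then blocks until that thread finishes, the callback can never obtain the lock; the result is the same regardless of thread scheduling.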
  

Comments

Shahaf Shuler Jan. 15, 2018, 6:45 a.m. UTC | #1
Wednesday, January 10, 2018 7:47 PM, Yongseok Koh:

Applied to next-net-mlx, thanks.
  

Patch

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 8ee522069..e740a4e77 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -180,6 +180,22 @@  priv_lock(struct priv *priv)
 }
 
 /**
+ * Try to lock private structure to protect it from concurrent access in the
+ * control path.
+ *
+ * @param priv
+ *   Pointer to private structure.
+ *
+ * @return
+ *   1 if the lock is successfully taken; 0 otherwise.
+ */
+static inline int
+priv_trylock(struct priv *priv)
+{
+	return rte_spinlock_trylock(&priv->lock);
+}
+
+/**
  * Unlock private structure.
  *
  * @param priv
diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index 278a4dfc3..618d13d5f 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -1187,8 +1187,12 @@  mlx5_dev_link_status_handler(void *arg)
 	struct priv *priv = dev->data->dev_private;
 	int ret;
 
-	priv_lock(priv);
-	assert(priv->pending_alarm == 1);
+	while (!priv_trylock(priv)) {
+		/* Alarm is being canceled. */
+		if (priv->pending_alarm == 0)
+			return;
+		rte_pause();
+	}
 	priv->pending_alarm = 0;
 	ret = priv_link_status_update(priv);
 	priv_unlock(priv);
@@ -1258,9 +1262,10 @@  priv_dev_interrupt_handler_uninstall(struct priv *priv, struct rte_eth_dev *dev)
 	if (priv->primary_socket)
 		rte_intr_callback_unregister(&priv->intr_handle_socket,
 					     mlx5_dev_handler_socket, dev);
-	if (priv->pending_alarm)
+	if (priv->pending_alarm) {
+		priv->pending_alarm = 0;
 		rte_eal_alarm_cancel(mlx5_dev_link_status_handler, dev);
-	priv->pending_alarm = 0;
+	}
 	priv->intr_handle.fd = 0;
 	priv->intr_handle.type = RTE_INTR_HANDLE_UNKNOWN;
 	priv->intr_handle_socket.fd = 0;
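The patched handshake can be sketched the same way. Again, this is a hypothetical POSIX-threads model rather than driver code: `pthread_mutex_trylock()` plays the role of `priv_trylock()`/`rte_spinlock_trylock()`, `sched_yield()` stands in for `rte_pause()`, and the flag is a C11 `atomic_int` because the handler reads it without holding the lock (the patch reads `priv->pending_alarm` directly). The uninstall side clears the flag while holding the lock and only then waits for the callback, mirroring the reordered code in `priv_dev_interrupt_handler_uninstall()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

static pthread_mutex_t fix_lock = PTHREAD_MUTEX_INITIALIZER; /* priv->lock */
static atomic_int pending_alarm;  /* models priv->pending_alarm */
static atomic_int updates;        /* counts simulated link updates */

/* Models the patched mlx5_dev_link_status_handler(). */
static void *fixed_handler(void *arg)
{
	(void)arg;
	while (pthread_mutex_trylock(&fix_lock) != 0) {
		/* Alarm is being canceled: bail out without the lock. */
		if (atomic_load(&pending_alarm) == 0)
			return NULL;
		sched_yield();            /* stand-in for rte_pause() */
	}
	atomic_store(&pending_alarm, 0);
	atomic_fetch_add(&updates, 1);    /* priv_link_status_update() */
	pthread_mutex_unlock(&fix_lock);
	return NULL;
}

/* Models the patched uninstall path, which runs with the lock held. */
static int simulate_fixed_cancel(void)
{
	pthread_t cb;

	atomic_store(&pending_alarm, 1);
	atomic_store(&updates, 0);
	pthread_mutex_lock(&fix_lock);
	pthread_create(&cb, NULL, fixed_handler, NULL); /* alarm fires */
	atomic_store(&pending_alarm, 0);  /* cleared before canceling */
	pthread_join(cb, NULL);           /* "cancel" waits; handler exits */
	pthread_mutex_unlock(&fix_lock);
	return atomic_load(&updates);     /* 0: handler gave up cleanly */
}

/* Uncontended case: the handler takes the lock and runs the update. */
static int simulate_normal_alarm(void)
{
	pthread_t cb;

	atomic_store(&pending_alarm, 1);
	atomic_store(&updates, 0);
	pthread_create(&cb, NULL, fixed_handler, NULL);
	pthread_join(cb, NULL);
	return atomic_load(&updates) == 1 && atomic_load(&pending_alarm) == 0;
}
```

Clearing the flag before waiting is what lets the handler tell "the lock is busy because cancellation is in progress" apart from ordinary contention. In the latter case it keeps spinning until the lock is free, and when the lock is uncontended it behaves exactly as the pre-fix handler did.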