patch 'net/mlx5: fix hairpin queue unbind' has been queued to stable release 22.11.4

Xueming Li xuemingl at nvidia.com
Mon Dec 11 11:11:55 CET 2023


Hi,

FYI, your patch has been queued to stable release 22.11.4

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 12/13/23. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://git.dpdk.org/dpdk-stable/log/?h=22.11-staging

This queued commit can be viewed at:
https://git.dpdk.org/dpdk-stable/commit/?h=22.11-staging&id=ab4675324505ddcdfecdafa1e336f2c83e1b80f8

Thanks.

Xueming Li <xuemingl at nvidia.com>

---
>From ab4675324505ddcdfecdafa1e336f2c83e1b80f8 Mon Sep 17 00:00:00 2001
From: Dariusz Sosnowski <dsosnowski at nvidia.com>
Date: Thu, 9 Nov 2023 20:01:09 +0200
Subject: [PATCH] net/mlx5: fix hairpin queue unbind
Cc: Xueming Li <xuemingl at nvidia.com>

[ upstream commit ab2439f80bdf94e2382efe941cf827da6710b5d7 ]

Let's take an application with the following configuration:

- It uses 2 ports.
- Each port has 3 Rx queues and 3 Tx queues.
- On each port, Rx queues have a following purposes:
  - Rx queue 0 - SW queue,
  - Rx queue 1 - hairpin queue, bound to Tx queue on the same port,
  - Rx queue 2 - hairpin queue, bound to Tx queue on another port.
- On each port, Tx queues have a following purposes:
  - Tx queue 0 - SW queue,
  - Tx queue 1 - hairpin queue, bound to Rx queue on the same port,
  - Tx queue 2 - hairpin queue, bound to Rx queue on another port.
- Application configured all of the hairpin queues for manual binding.

After ports are configured and queues are set up,
if the application does the following API call sequence:

1. rte_eth_dev_start(port_id=0)
2. rte_eth_hairpin_bind(tx_port=0, rx_port=0)
3. rte_eth_hairpin_bind(tx_port=0, rx_port=1)

mlx5 PMD fails to modify SQ and logs this error:

  mlx5_common: mlx5_devx_cmds.c:2079: mlx5_devx_cmd_modify_sq():
    Failed to modify SQ using DevX

This error was caused by an incorrect unbind operation taken during
error handling inside call (3).

(3) fails, because port 1 (Rx side of the hairpin) was not started.
As a result of this failure, PMD goes into error handling, where all
previously bound hairpin queues are unbound.
This is incorrect, since this error handling procedure
in rte_eth_hairpin_bind() implementation assumes that
all hairpin queues are bound to the same rx_port, which is not the case.
The following sequence of function calls appears:

- rte_eth_hairpin_queue_peer_unbind(rx_port=**1**, rx_queue=1, 0),
- mlx5_hairpin_queue_peer_unbind(dev=**port 0**, tx_queue=1, 1).

Which violates the hairpin queue destroy flow, by unbinding Tx queue 1
on port 0, before unbinding Rx queue 1 on port 1.

This patch fixes that behavior, by filtering Tx queues on which error
handling is done to only affect:

- hairpin queues (it also reduces unnecessary debug log messages),
- hairpin queues connected to the rx_port which is currently processed.

Fixes: 37cd4501e873 ("net/mlx5: support two ports hairpin mode")

Signed-off-by: Dariusz Sosnowski <dsosnowski at nvidia.com>
Acked-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
---
 drivers/net/mlx5/mlx5_trigger.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 5bf637a0cd..80d2a60473 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -845,6 +845,11 @@ error:
 		txq_ctrl = mlx5_txq_get(dev, i);
 		if (txq_ctrl == NULL)
 			continue;
+		if (!txq_ctrl->is_hairpin ||
+		    txq_ctrl->hairpin_conf.peers[0].port != rx_port) {
+			mlx5_txq_release(dev, i);
+			continue;
+		}
 		rx_queue = txq_ctrl->hairpin_conf.peers[0].queue;
 		rte_eth_hairpin_queue_peer_unbind(rx_port, rx_queue, 0);
 		mlx5_hairpin_queue_peer_unbind(dev, i, 1);
-- 
2.25.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2023-12-11 17:56:26.018245300 +0800
+++ 0090-net-mlx5-fix-hairpin-queue-unbind.patch	2023-12-11 17:56:23.187652300 +0800
@@ -1 +1 @@
-From ab2439f80bdf94e2382efe941cf827da6710b5d7 Mon Sep 17 00:00:00 2001
+From ab4675324505ddcdfecdafa1e336f2c83e1b80f8 Mon Sep 17 00:00:00 2001
@@ -4,0 +5,3 @@
+Cc: Xueming Li <xuemingl at nvidia.com>
+
+[ upstream commit ab2439f80bdf94e2382efe941cf827da6710b5d7 ]
@@ -56 +58,0 @@
-Cc: stable at dpdk.org
@@ -65 +67 @@
-index 7694140537..4b5becc10c 100644
+index 5bf637a0cd..80d2a60473 100644


More information about the stable mailing list