[PATCH 21.11] net/mlx5: fix hairpin queue unbind

Kevin Traynor ktraynor at redhat.com
Thu Nov 23 11:48:45 CET 2023


On 17/11/2023 09:48, Dariusz Sosnowski wrote:
> [ upstream commit ab2439f80bdf94e2382efe941cf827da6710b5d7 ]
> 
> Let's take an application with the following configuration:
> 
> - It uses 2 ports.
> - Each port has 3 Rx queues and 3 Tx queues.
> - On each port, Rx queues have the following purposes:
>    - Rx queue 0 - SW queue,
>    - Rx queue 1 - hairpin queue, bound to Tx queue on the same port,
>    - Rx queue 2 - hairpin queue, bound to Tx queue on another port.
> - On each port, Tx queues have the following purposes:
>    - Tx queue 0 - SW queue,
>    - Tx queue 1 - hairpin queue, bound to Rx queue on the same port,
>    - Tx queue 2 - hairpin queue, bound to Rx queue on another port.
> - The application configured all of the hairpin queues for manual binding.
> 
> After ports are configured and queues are set up,
> if the application does the following API call sequence:
> 
> 1. rte_eth_dev_start(port_id=0)
> 2. rte_eth_hairpin_bind(tx_port=0, rx_port=0)
> 3. rte_eth_hairpin_bind(tx_port=0, rx_port=1)
> 
> the mlx5 PMD fails to modify the SQ and logs this error:
> 
>    mlx5_common: mlx5_devx_cmds.c:2079: mlx5_devx_cmd_modify_sq():
>      Failed to modify SQ using DevX
> 
> This error was caused by an incorrect unbind operation taken during
> error handling inside call (3).
> 
> Call (3) fails because port 1 (the Rx side of the hairpin) was not started.
> As a result of this failure, the PMD goes into error handling, where all
> previously bound hairpin queues are unbound.
> This is incorrect, since the error handling procedure
> in the rte_eth_hairpin_bind() implementation assumes that
> all hairpin queues are bound to the same rx_port, which is not the case.
> The following sequence of function calls results:
> 
> - rte_eth_hairpin_queue_peer_unbind(rx_port=**1**, rx_queue=1, 0),
> - mlx5_hairpin_queue_peer_unbind(dev=**port 0**, tx_queue=1, 1).
> 
> This violates the hairpin queue destroy flow by unbinding Tx queue 1
> on port 0 before unbinding Rx queue 1 on port 1.
> 
> This patch fixes that behavior by filtering the Tx queues on which error
> handling is done, so that it only affects:
> 
> - hairpin queues (it also reduces unnecessary debug log messages),
> - hairpin queues connected to the rx_port which is currently processed.
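
As a rough illustration of the filtering described above, here is a minimal, self-contained C model. It does not use DPDK and is not the code from the patch: `struct txq_model`, `rollback_model()` and `txq1_stays_bound()` are hypothetical names invented for this sketch. It replays the port 0 state from the scenario (bind to port 0 succeeded, bind to port 1 failed) and contrasts an unfiltered rollback with one restricted to hairpin queues whose peer is the rx_port being processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a Tx queue; NOT the mlx5 PMD's real type. */
struct txq_model {
	bool is_hairpin;    /* false for a regular SW queue */
	uint16_t peer_port; /* Rx port this hairpin Tx queue is configured to bind to */
	bool bound;         /* true once the queue has been bound */
};

/*
 * Roll back Tx queues last..0 after a failed bind towards rx_port.
 * With filtered == false, every bound queue is unbound regardless of its
 * peer (the behavior the commit message describes as incorrect).
 * With filtered == true, only hairpin queues whose peer is rx_port are
 * considered. Returns the number of queues that were unbound.
 */
static unsigned int
rollback_model(struct txq_model *txqs, unsigned int last,
	       uint16_t rx_port, bool filtered)
{
	unsigned int unbound = 0;
	unsigned int i = last;

	do {
		struct txq_model *q = &txqs[i];

		/* Skip SW queues and hairpin queues bound to a different Rx port. */
		if (filtered && (!q->is_hairpin || q->peer_port != rx_port))
			continue;
		if (q->bound) {
			q->bound = false;
			unbound++;
		}
	} while (i--);
	return unbound;
}

/*
 * Port 0 from the scenario: call (2) bound Tx queue 1 to port 0,
 * call (3) failed while binding Tx queue 2 to port 1.
 * Reports whether Tx queue 1 is still bound after the rollback for rx_port 1.
 */
static bool
txq1_stays_bound(bool filtered)
{
	struct txq_model txqs[3] = {
		{ .is_hairpin = false, .peer_port = 0, .bound = false }, /* Tx queue 0: SW */
		{ .is_hairpin = true,  .peer_port = 0, .bound = true  }, /* Tx queue 1: same port */
		{ .is_hairpin = true,  .peer_port = 1, .bound = false }, /* Tx queue 2: other port */
	};

	rollback_model(txqs, 2, 1, filtered);
	return txqs[1].bound;
}
```

Under these assumptions, `txq1_stays_bound(false)` reports that Tx queue 1 lost its binding to port 0 during the rollback for port 1, while `txq1_stays_bound(true)` leaves that binding in place.
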
> 
> Fixes: 37cd4501e873 ("net/mlx5: support two ports hairpin mode")
> Cc: stable at dpdk.org
> 
> Signed-off-by: Dariusz Sosnowski <dsosnowski at nvidia.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
> ---
>   drivers/net/mlx5/mlx5_trigger.c | 5 +++++
>   1 file changed, 5 insertions(+)

Applied to 21.11 branch. Thanks for backporting.


