mlx5/net: fix release of SQ resources in error flow

Message ID 20201028171040.6476-1-talshn@nvidia.com (mailing list archive)
State Accepted, archived
Delegated to: Raslan Darawsheh
Series mlx5/net: fix release of SQ resources in error flow

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/iol-intel-Functional      success  Functional Testing PASS
ci/iol-testing               success  Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/Intel-compilation         success  Compilation OK
ci/travis-robot              success  Travis build: passed
ci/iol-mellanox-Performance  success  Performance Testing PASS

Commit Message

Tal Shnaiderman Oct. 28, 2020, 5:10 p.m. UTC
Fix an error flow in which the function mlx5_txq_release_devx_sq_resources
is called twice, by setting the released objects to NULL after the first call.

The incorrect flow was introduced in the work done on generic
object creation.

When an error occurs inside mlx5_txq_create_devx_sq_resources,
the function calls mlx5_txq_release_devx_sq_resources.
However, the released pointers are not set to NULL after the release
calls, so already released memory is released again when the same
function is called from mlx5_txq_release_devx_resources.

This results in calls to MLX5_FREE with
already released memory addresses and an assert in mlx5_release_dbr:

EAL: Error: Invalid memory
EAL: Error: Invalid memory

PANIC in mlx5_txq_release_devx_sq_resources():
assert "(mlx5_release_dbr(&txq_obj->txq_ctrl->priv->dbrpgs,
 mlx5_os_get_umem_id (txq_obj->sq_dbrec_page->umem),
 txq_obj->sq_dbrec_offset)) == 0" failed

The fix is to set the released pointers to NULL after the first release
calls, so that a second call finds nothing left to release.
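
The pattern can be illustrated outside the driver with a minimal,
hypothetical sketch. The names demo_obj, demo_release and demo_create
are invented for illustration and are not part of mlx5; plain
malloc/free stand in for the DevX, umem and doorbell-record resources.
Each pointer is cleared right after its release, so calling the release
function a second time does nothing:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue object owning two resources. */
struct demo_obj {
	void *buf;
	void *page;
};

/* Number of resources actually freed so far (illustration only). */
static int demo_free_count;

/*
 * Release each resource at most once: free it, then clear the pointer
 * so that a repeated call skips it.
 */
static void
demo_release(struct demo_obj *obj)
{
	assert(obj != NULL);
	if (obj->buf) {
		free(obj->buf);
		obj->buf = NULL;
		demo_free_count++;
	}
	if (obj->page) {
		free(obj->page);
		obj->page = NULL;
		demo_free_count++;
	}
}

/*
 * Simulated creation that always fails after allocating, and cleans up
 * on its own error path before returning to the caller.
 */
static int
demo_create(struct demo_obj *obj)
{
	obj->buf = malloc(64);
	obj->page = malloc(64);
	/* Pretend a later step failed: release what was allocated. */
	demo_release(obj);
	return -1;
}
```

A caller that sees demo_create() fail and then calls demo_release()
again during its own cleanup performs no further frees, because both
pointers are already NULL. Without the NULL assignments the second call
would pass stale addresses to free().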

Fixes: 86d259cec852 ("net/mlx5: separate Tx queue object creations")
Cc: stable@dpdk.org

Signed-off-by: Tal Shnaiderman <talshn@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
 drivers/net/mlx5/mlx5_devx.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)
  

Comments

Raslan Darawsheh Nov. 5, 2020, 12:08 p.m. UTC | #1
Hi,

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Tal Shnaiderman
> Sent: Wednesday, October 28, 2020 7:11 PM
> To: dev@dpdk.org
> Cc: NBU-Contact-Thomas Monjalon <thomas@monjalon.net>; Matan Azrad
> <matan@nvidia.com>; Shahaf Shuler <shahafs@nvidia.com>; Slava
> Ovsiienko <viacheslavo@nvidia.com>; stable@dpdk.org
> Subject: [dpdk-dev] [PATCH] mlx5/net: fix release of SQ resources in error
> flow
> 
Patch applied to next-net-mlx,
Kindest regards,
Raslan Darawsheh
  

Patch

diff --git a/drivers/net/mlx5/mlx5_devx.c b/drivers/net/mlx5/mlx5_devx.c
index 11bda32557..5f5f2f2444 100644
--- a/drivers/net/mlx5/mlx5_devx.c
+++ b/drivers/net/mlx5/mlx5_devx.c
@@ -948,17 +948,25 @@  mlx5_txq_obj_hairpin_new(struct rte_eth_dev *dev, uint16_t idx)
 static void
 mlx5_txq_release_devx_sq_resources(struct mlx5_txq_obj *txq_obj)
 {
-	if (txq_obj->sq_devx)
+	if (txq_obj->sq_devx) {
 		claim_zero(mlx5_devx_cmd_destroy(txq_obj->sq_devx));
-	if (txq_obj->sq_umem)
+		txq_obj->sq_devx = NULL;
+	}
+	if (txq_obj->sq_umem) {
 		claim_zero(mlx5_glue->devx_umem_dereg(txq_obj->sq_umem));
-	if (txq_obj->sq_buf)
+		txq_obj->sq_umem = NULL;
+	}
+	if (txq_obj->sq_buf) {
 		mlx5_free(txq_obj->sq_buf);
-	if (txq_obj->sq_dbrec_page)
+		txq_obj->sq_buf = NULL;
+	}
+	if (txq_obj->sq_dbrec_page) {
 		claim_zero(mlx5_release_dbr(&txq_obj->txq_ctrl->priv->dbrpgs,
 					    mlx5_os_get_umem_id
 						 (txq_obj->sq_dbrec_page->umem),
 					    txq_obj->sq_dbrec_offset));
+		txq_obj->sq_dbrec_page = NULL;
+	}
 }
 
 /**