[dpdk-dev] net/mlx5: fix verification of device context

Message ID 5ec0604196fb087a42fa75e6ddc2a04aab293593.1501047767.git.shacharbe@mellanox.com (mailing list archive)
State Rejected, archived
Checks

Context                Check    Description
ci/checkpatch          success  coding style OK
ci/Intel-compilation   success  Compilation OK

Commit Message

Shachar Beiser July 26, 2017, 5:43 a.m. UTC
The get interface name function lacks verification of the device context.
This might lead to a segmentation fault when trying to query the name
after the device is closed. Fix it by adding the missing verification.

Fixes: cd89f22a1e9770 ("net/mlx5: remove unused interface name query")
Cc: stable@dpdk.org

Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
---
 drivers/net/mlx5/mlx5_ethdev.c | 4 ++++
 1 file changed, 4 insertions(+)
  

Comments

Adrien Mazarguil July 26, 2017, 9:06 a.m. UTC | #1
Hi Shachar,

On Wed, Jul 26, 2017 at 05:43:24AM +0000, Shachar Beiser wrote:
> Get interface name function lacks verification of device context.
> It might lead to segmentation fault when trying to query the name
> after the device is closed.fixing it by adding the missing verification
> 

Thanks, however if by "close" you mean it may occur when applications use
ethdev callbacks after a call to rte_eth_dev_close(), I do not think PMDs
have to protect themselves against bad application behavior, otherwise there
is no end to such fixes.

The reverse of rte_eth_dev_close() is not rte_eth_dev_configure() nor any
other ethdev callback (see documentation), but a bus probe operation.

Perhaps I've missed something, so in case a crash occurs *while* calling
rte_eth_dev_close() I guess this patch is fine, but then please describe
the reason.

> Fixes: cd89f22a1e9770 ("net/mlx5: remove unused interface name query")

This commit doesn't look like the root cause of that issue?

> Cc: stable@dpdk.org
> 
> Signed-off-by: Shachar Beiser <shacharbe@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_ethdev.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
> index b70b7b9..6e67461 100644
> --- a/drivers/net/mlx5/mlx5_ethdev.c
> +++ b/drivers/net/mlx5/mlx5_ethdev.c
> @@ -173,6 +173,10 @@ struct priv *
>  	char match[IF_NAMESIZE] = "";
>  
>  	{
> +		if (priv->ctx == NULL) {
> +			DEBUG("The device is closed, cannot query interface name ");
> +			return -1;
> +		}

MKSTR() is at the beginning of this block because it defines a new variable
(path). For coding style consistency you should not put any code before
variable declarations, or at least insert an empty line between
them. Otherwise you could move this check to the parent block.
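
To make the placement concrete, here is a minimal sketch, not the driver's code. `struct fake_priv`, `fake_ifname_path()` and `MAKE_STR()` are hypothetical stand-ins; the macro only mimics the "declare a variable, then format into it" shape described for MKSTR(). The check sits in the parent block, so the nested block still begins with its declaration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: declares a buffer, then formats a string into it. */
#define MAKE_STR(name, ...) \
	char name[256]; \
	snprintf(name, sizeof(name), __VA_ARGS__)

/* Hypothetical private data; ibdev_path is NULL once the device is closed. */
struct fake_priv {
	const char *ibdev_path;
};

/* Copy "<ibdev_path>/device/net" into out; return 0 on success, -1 on error. */
static int
fake_ifname_path(const struct fake_priv *priv, char *out, size_t len)
{
	/* The guard lives in the parent block, ahead of the nested block. */
	if (priv == NULL || priv->ibdev_path == NULL)
		return -1;
	{
		/* The nested block still starts with its declaration. */
		MAKE_STR(path, "%s/device/net", priv->ibdev_path);

		if (strlen(path) >= len)
			return -1;
		strcpy(out, path);
	}
	return 0;
}
```

The alternative mentioned above (keeping the check inside the nested block) would need the check to come after the declaration, or at least be separated from it by an empty line.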

>  		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
>  
>  		dir = opendir(path);
> -- 
> 1.8.3.1
> 

I think this patch is not necessary unless proved otherwise. Have you
actually observed a crash addressed by it?
  
Shachar Beiser July 26, 2017, 9:21 a.m. UTC | #2
Hi,

       By "close" I mean mlx5_dev_close(). This function sets priv->ctx to NULL.
       We think this patch is required because we have an open bug: a segmentation fault when priv->ctx is accessed while it is NULL.
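
The sequence described here can be modelled in a few lines. This is an illustrative sketch only: `toy_ctx`, `toy_priv`, `toy_close()` and `toy_get_ifname()` are hypothetical names, not mlx5 functions, and the real close path presumably does much more than clear one pointer.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical device context, loosely modelled on what priv->ctx points to. */
struct toy_ctx {
	const char *name;
};

/* Hypothetical private data; ctx is NULL after close. */
struct toy_priv {
	struct toy_ctx *ctx;
};

/* Close releases the context; in this model that just clears the pointer. */
static void
toy_close(struct toy_priv *priv)
{
	priv->ctx = NULL;
}

/*
 * Query the name. Without the NULL check, priv->ctx->name would be
 * dereferenced after close, which is the crash being discussed.
 */
static int
toy_get_ifname(const struct toy_priv *priv, char *out, size_t len)
{
	if (priv->ctx == NULL)
		return -1;
	snprintf(out, len, "%s", priv->ctx->name);
	return 0;
}
```

In this model a query before `toy_close()` succeeds and a query after it returns -1 instead of faulting.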

                         -Shachar Beiser. 
         

  
Gaëtan Rivet July 26, 2017, 12:55 p.m. UTC | #3
Hi Shachar,

On Wed, Jul 26, 2017 at 09:21:27AM +0000, Shachar Beiser wrote:
> Hi ,
> 
>        When I say close I mean : " mlx5_dev_close" . This function set the priv->ctx to NULL.
>        We think this patch is required because we have an open bug of seg fault while accessing priv->ctx == NULL.
> 
>                          -Shachar Beiser. 
>          

This patch does not fix the root cause of the issue.
The root cause is a bug in the ether layer, together with missing flags in the MLX5 PMD.
So NACK on this patch; I will send the proper fix shortly.


Best regards,

  
Gaëtan Rivet July 26, 2017, 1:30 p.m. UTC | #4
Device detach in librte_ether is rough right now.

 - Device hotplug capability is not properly checked
 - Device state should be set after a successful detach
 - MLX drivers are lacking the relevant flag
 - And this flag should actually be removed, thus incurring an API change
   for v17.11. An announcement follows.

Without this series on an MLX4 port:

   testpmd> port close 0
   Closing ports...
   Port 0 is now not stopped
   Done
   testpmd> port stop 0
   Stopping ports...
   Checking link statuses...
   Done
   testpmd> port close 0
   Closing ports...
   Done
   testpmd> port detach 0
   Detaching a port...
   testpmd> show port info 0
   Segmentation fault (core dumped)

With this series:

   testpmd> port stop 0
   Stopping ports...
   Checking link statuses...
   Done
   testpmd> port detach 0
   Detaching a port...
   Please close port first
   testpmd> port close 0
   Closing ports...
   Done
   testpmd> port detach 0
   Detaching a port...
   Port '00:03.0' is detached. Now total ports is 0
   Done
   testpmd> show port info 0
   Invalid port 0
   Valid port range is [0]

Gaetan Rivet (6):
  ethdev: fix device state on detach
  ethdev: properly check detach capability
  net/mlx4: advertize the detach capability
  net/mlx5: advertize the detach capability
  app/testpmd: let the user know device detach failed
  doc: announce ethdev API change for detach flag

 app/test-pmd/testpmd.c               |   9 ++++++---
 core                                 | Bin 0 -> 114331648 bytes
 doc/guides/rel_notes/deprecation.rst |   6 ++++++
 drivers/net/mlx4/mlx4.c              |   1 +
 drivers/net/mlx5/mlx5.c              |   1 +
 lib/librte_ether/rte_ethdev.c        |  11 +----------
 6 files changed, 15 insertions(+), 13 deletions(-)
 create mode 100644 core
  
Gaëtan Rivet July 26, 2017, 1:35 p.m. UTC | #5
Device detach in librte_ether is rough right now.

 - Device hotplug capability is not properly checked
 - Device state should be set after a successful detach
 - MLX drivers are lacking the relevant flag
 - And this flag should actually be removed, thus incurring an API change
   for v17.11. An announcement follows.

Without this series on an MLX4 port:

   testpmd> port close 0
   Closing ports...
   Port 0 is now not stopped
   Done
   testpmd> port stop 0
   Stopping ports...
   Checking link statuses...
   Done
   testpmd> port close 0
   Closing ports...
   Done
   testpmd> port detach 0
   Detaching a port...
   testpmd> show port info 0
   Segmentation fault (core dumped)

With this series:

   testpmd> port stop 0
   Stopping ports...
   Checking link statuses...
   Done
   testpmd> port detach 0
   Detaching a port...
   Please close port first
   testpmd> port close 0
   Closing ports...
   Done
   testpmd> port detach 0
   Detaching a port...
   Port '00:03.0' is detached. Now total ports is 0
   Done
   testpmd> show port info 0
   Invalid port 0
   Valid port range is [0]
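
The behaviour shown in the two transcripts can be modelled roughly as follows. This is a sketch under assumptions, not the librte_ether or testpmd implementation: the state names, the `can_detach` field and the `model_*` functions are all hypothetical. The only points taken from the text above are that detach depends on a capability advertised by the driver, that a detach attempted before close is refused, and that the device state is updated after a successful detach so that later queries reject the port.

```c
#include <stdbool.h>

/* Hypothetical port states for this model. */
enum model_state { MODEL_UNUSED, MODEL_ATTACHED, MODEL_CLOSED };

struct model_port {
	enum model_state state;
	bool can_detach; /* capability advertised by the driver */
};

/* A port id is treated as valid only while the port is in use. */
static bool
model_port_is_valid(const struct model_port *port)
{
	return port->state != MODEL_UNUSED;
}

/* Closing only has an effect on an attached port. */
static void
model_close(struct model_port *port)
{
	if (port->state == MODEL_ATTACHED)
		port->state = MODEL_CLOSED;
}

/* Return 0 on success, -1 if detach is unsupported or the port is not closed. */
static int
model_detach(struct model_port *port)
{
	if (!port->can_detach)
		return -1;
	if (port->state != MODEL_CLOSED)
		return -1;
	port->state = MODEL_UNUSED; /* state changes only after success */
	return 0;
}
```

In this model, a "show port info" style query that first calls `model_port_is_valid()` would report an invalid port after a successful detach rather than touching released driver data.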

v2:

  - remove coredump from patchset

Gaetan Rivet (6):
  ethdev: fix device state on detach
  ethdev: properly check detach capability
  net/mlx4: advertize the detach capability
  net/mlx5: advertize the detach capability
  app/testpmd: let the user know device detach failed
  doc: announce ethdev API change for detach flag

 app/test-pmd/testpmd.c               |  9 ++++++---
 doc/guides/rel_notes/deprecation.rst |  6 ++++++
 drivers/net/mlx4/mlx4.c              |  1 +
 drivers/net/mlx5/mlx5.c              |  1 +
 lib/librte_ether/rte_ethdev.c        | 11 +----------
 5 files changed, 15 insertions(+), 13 deletions(-)
  
Shachar Beiser July 30, 2017, 7:33 a.m. UTC | #6
Tested-by: Shachar Beiser <shacharbe@mellanox.com>


The bug is fixed and now there is no crash: 

testpmd> port stop all
Stopping ports...
Done
testpmd> port close all
Closing ports...
Done
testpmd> port detach 0
Detaching a port...
Invalid port 0
Please close port first
testpmd> show port info 0
Invalid port 0
Valid port range is [0]
testpmd>


  
Adrien Mazarguil July 31, 2017, 8:57 a.m. UTC | #7
On Wed, Jul 26, 2017 at 03:35:51PM +0200, Gaetan Rivet wrote:
> Device detach in librte_ether is rough right now.
> 
>  - Device hotplug capability is not properly checked
>  - Device state should be set after a successful detach
>  - MLX drivers are lacking the relevant flag
>  - And this flag should actually be removed, thus occuring an API change
>    for v17.11. An announce follows.
> 
> Without this series on an MLX4 port:
> 
>    testpmd> port close 0
>    Closing ports...
>    Port 0 is now not stopped
>    Done
>    testpmd> port stop 0
>    Stopping ports...
>    Checking link statuses...
>    Done
>    testpmd> port close 0
>    Closing ports...
>    Done
>    testpmd> port detach 0
>    Detaching a port...
>    testpmd> show port info 0
>    Segmentation fault (core dumped)
> 
> With this series:
> 
>    testpmd> port stop 0
>    Stopping ports...
>    Checking link statuses...
>    Done
>    testpmd> port detach 0
>    Detaching a port...
>    Please close port first
>    testpmd> port close 0
>    Closing ports...
>    Done
>    testpmd> port detach 0
>    Detaching a port...
>    Port '00:03.0' is detached. Now total ports is 0
>    Done
>    testpmd> show port info 0
>    Invalid port 0
>    Valid port range is [0]
> 
> v2:
> 
>   - remove coredump from patchset
> 
> Gaetan Rivet (6):
>   ethdev: fix device state on detach
>   ethdev: properly check detach capability
>   net/mlx4: advertize the detach capability
>   net/mlx5: advertize the detach capability
>   app/testpmd: let the user know device detach failed
>   doc: announce ethdev API change for detach flag
> 
>  app/test-pmd/testpmd.c               |  9 ++++++---
>  doc/guides/rel_notes/deprecation.rst |  6 ++++++
>  drivers/net/mlx4/mlx4.c              |  1 +
>  drivers/net/mlx5/mlx5.c              |  1 +
>  lib/librte_ether/rte_ethdev.c        | 11 +----------
>  5 files changed, 15 insertions(+), 13 deletions(-)
> 
> -- 
> 2.1.4
> 

Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
  
Thomas Monjalon July 31, 2017, 9:31 a.m. UTC | #8
> > Gaetan Rivet (6):
> >   ethdev: fix device state on detach
> >   ethdev: properly check detach capability
> >   net/mlx4: advertize the detach capability
> >   net/mlx5: advertize the detach capability
> >   app/testpmd: let the user know device detach failed
> >   doc: announce ethdev API change for detach flag
> 
> Acked-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>

Applied without the last patch for deprecation notice, thanks

This deprecation notice requires more time to be reviewed.
  

Patch

diff --git a/drivers/net/mlx5/mlx5_ethdev.c b/drivers/net/mlx5/mlx5_ethdev.c
index b70b7b9..6e67461 100644
--- a/drivers/net/mlx5/mlx5_ethdev.c
+++ b/drivers/net/mlx5/mlx5_ethdev.c
@@ -173,6 +173,10 @@  struct priv *
 	char match[IF_NAMESIZE] = "";
 
 	{
+		if (priv->ctx == NULL) {
+			DEBUG("The device is closed, cannot query interface name ");
+			return -1;
+		}
 		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
 
 		dir = opendir(path);