[dpdk-dev,v1] net/mlx4: fix 'show port info all' during detach

Message ID 1519836450-11895-1-git-send-email-ophirmu@mellanox.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Ophir Munk Feb. 28, 2018, 4:47 p.m. UTC
  The following scenario causes a crash in function mlx4_get_ifname
1. On testpmd startup mlx4 device is probed and started
2. mlx4 sriov is disabled. As a result an RMV event is sent to
testpmd which closes the device and nullify the priv struct
members.
3. Running 'show port info all' in testpmd results in segmentation
fault because of accessing NULL pointer priv->ctx

The fix is to return with an error from mlx4_get_ifname() if priv->ctx
member is NULL.

Fixes: 61cbdd419478 ("net/mlx4: separate device control functions")
Cc: stable@dpdk.org

Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
---
 drivers/net/mlx4/mlx4_ethdev.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Adrien Mazarguil March 2, 2018, 7:12 p.m. UTC | #1
On Wed, Feb 28, 2018 at 04:47:30PM +0000, Ophir Munk wrote:
> The following scenario causes a crash in function mlx4_get_ifname
> 1. On testpmd startup mlx4 device is probed and started
> 2. mlx4 sriov is disabled. As a result an RMV event is sent to
> testpmd which closes the device and nullify the priv struct
> members.
> 3. Running 'show port info all' in testpmd results in segmentation
> fault because of accessing NULL pointer priv->ctx
> 
> The fix is to return with an error from mlx4_get_ifname() if priv->ctx
> member is NULL.

So priv->ctx is NULL after mlx4_dev_close() was called. In my opinion the
fact testpmd is allowed to call rte_eth_dev_info_get() on a closed port is
fishy, there are quite a few other ethdev callbacks that may end up
crashing in such a scenario.

Since commit 284c908cc588 ("app/testpmd: request device removal interrupt"),
testpmd automatically calls rte_eth_dev_close() and rte_eal_dev_detach()
after receiving a removal event on a port.

rte_eth_dev_close() documentation says:

 "Close a stopped Ethernet device. The device cannot be restarted!
  The function frees all resources except for needed by the
  closed state. To free these resources, call rte_eth_dev_detach()."

Unfortunately testpmd calls rte_*eal*_dev_detach() not
rte_*eth*_dev_detach(), the former doesn't release ethdev ports while the
latter does. I think it's a mistake, there's no point in keeping a closed
device around after it's been physically unplugged.

In short, rmv_event_callback() should call detach_port() instead of
rte_eal_dev_detach().

> Fixes: 61cbdd419478 ("net/mlx4: separate device control functions")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ophir Munk <ophirmu@mellanox.com>

The above suggests the problem is actually in testpmd and was introduced in
v17.05 by commit 284c908cc588. The proposed patch is a workaround that
doesn't address the underlying issue, thus NACK unless proven otherwise :)

> ---
>  drivers/net/mlx4/mlx4_ethdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
> index 3bc6927..cca5223 100644
> --- a/drivers/net/mlx4/mlx4_ethdev.c
> +++ b/drivers/net/mlx4/mlx4_ethdev.c
> @@ -67,6 +67,9 @@ mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
>  	char match[IF_NAMESIZE] = "";
>  
>  	{
> +		if (priv->ctx == NULL)
> +			return -ENOENT;
> +
>  		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
>  
>  		dir = opendir(path);
> -- 
> 2.7.4
>
  
Gaëtan Rivet March 20, 2018, 2:02 p.m. UTC | #2
Hi Ophir, Adrien,

On Fri, Mar 02, 2018 at 08:12:58PM +0100, Adrien Mazarguil wrote:
> On Wed, Feb 28, 2018 at 04:47:30PM +0000, Ophir Munk wrote:
> > The following scenario causes a crash in function mlx4_get_ifname
> > 1. On testpmd startup mlx4 device is probed and started
> > 2. mlx4 sriov is disabled. As a result an RMV event is sent to
> > testpmd which closes the device and nullify the priv struct
> > members.
> > 3. Running 'show port info all' in testpmd results in segmentation
> > fault because of accessing NULL pointer priv->ctx
> > 
> > The fix is to return with an error from mlx4_get_ifname() if priv->ctx
> > member is NULL.
> 
> So priv->ctx is NULL after mlx4_dev_close() was called. In my opinion the
> fact testpmd is allowed to call rte_eth_dev_info_get() on a closed port is
> fishy, there are quite a few other ethdev callbacks that may end up
> crashing in such a scenario.
> 
> Since commit 284c908cc588 ("app/testpmd: request device removal interrupt"),
> testpmd automatically calls rte_eth_dev_close() and rte_eal_dev_detach()
> after receiving a removal event on a port.
> 
> rte_eth_dev_close() documentation says:
> 
>  "Close a stopped Ethernet device. The device cannot be restarted!
>   The function frees all resources except for needed by the
>   closed state. To free these resources, call rte_eth_dev_detach()."
> 
> Unfortunately testpmd calls rte_*eal*_dev_detach() not
> rte_*eth*_dev_detach(), the former doesn't release ethdev ports while the
> latter does. I think it's a mistake, there's no point in keeping a closed
> device around after it's been physically unplugged.
> 
> In short, rmv_event_callback() should call detach_port() instead of
> rte_eal_dev_detach().
> 

This is correct, the issue is actually testpmd making a mistake when
calling directly rte_eal_dev_detach().

The fix should thus be simply to change this call.

> > Fixes: 61cbdd419478 ("net/mlx4: separate device control functions")
> > Cc: stable@dpdk.org
> > 
> > Signed-off-by: Ophir Munk <ophirmu@mellanox.com>
> 
> The above suggests the problem is actually in testpmd and was introduced in
> v17.05 by commit 284c908cc588. The proposed patch is a workaround that
> doesn't address the underlying issue, thus NACK unless proven otherwise :)
> 
> > ---
> >  drivers/net/mlx4/mlx4_ethdev.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
> > index 3bc6927..cca5223 100644
> > --- a/drivers/net/mlx4/mlx4_ethdev.c
> > +++ b/drivers/net/mlx4/mlx4_ethdev.c
> > @@ -67,6 +67,9 @@ mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
> >  	char match[IF_NAMESIZE] = "";
> >  
> >  	{
> > +		if (priv->ctx == NULL)
> > +			return -ENOENT;
> > +
> >  		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
> >  
> >  		dir = opendir(path);
> > -- 
> > 2.7.4
> > 
> 
> -- 
> Adrien Mazarguil
> 6WIND
  

Patch

diff --git a/drivers/net/mlx4/mlx4_ethdev.c b/drivers/net/mlx4/mlx4_ethdev.c
index 3bc6927..cca5223 100644
--- a/drivers/net/mlx4/mlx4_ethdev.c
+++ b/drivers/net/mlx4/mlx4_ethdev.c
@@ -67,6 +67,9 @@  mlx4_get_ifname(const struct priv *priv, char (*ifname)[IF_NAMESIZE])
 	char match[IF_NAMESIZE] = "";
 
 	{
+		if (priv->ctx == NULL)
+			return -ENOENT;
+
 		MKSTR(path, "%s/device/net", priv->ctx->device->ibdev_path);
 
 		dir = opendir(path);