[v2] dev: don't remove devargs that are still referenced

Message ID 20181121193827.62540-1-dariusz.stojaczyk@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v2] dev: don't remove devargs that are still referenced |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK
ci/mellanox-Performance-Testing success Performance Testing PASS
ci/intel-Performance-Testing success Performance Testing PASS

Commit Message

Stojaczyk, Dariusz Nov. 21, 2018, 7:38 p.m. UTC
  Even if a device failed to plug, it's still a device
object that references the devargs. Those devargs will
be freed automatically together with the device, but
freeing them any earlier - like it's done in the hotplug
error handling path right now - will give us a dangling
pointer and a segfault scenario.

Consider the following case:
 * secondary process receives the hotplug request IPC message
   * devargs are either created or updated
   * the bus is scanned
     * a new device object is created with the latest devargs
   * the device can't be plugged for whatever reason,
     bus->plug returns error
     * the devargs are freed, even though they're still referenced
       by the device object on the bus

For PCI devices, the generic device name comes from
a buffer within the devargs. Freeing those will make
EAL segfault whenever the device name is checked.

This patch just prevents the hotplug error handling
path from removing the devargs when there's a device
that references them. This is done by simply exiting
early from the hotplug function. As mentioned in the
beginning, those devargs will be freed later, together
with the device itself.

Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
Cc: gaetan.rivet@6wind.com
Cc: thomas@monjalon.net

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
---
 lib/librte_eal/common/eal_common_dev.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
  

Comments

Gaëtan Rivet Nov. 22, 2018, 9:54 a.m. UTC | #1
On Wed, Nov 21, 2018 at 08:38:27PM +0100, Darek Stojaczyk wrote:
> Even if a device failed to plug, it's still a device
> object that references the devargs. Those devargs will
> be freed automatically together with the device, but
> freeing them any earlier - like it's done in the hotplug
> error handling path right now - will give us a dangling
> pointer and a segfault scenario.
> 
> Consider the following case:
>  * secondary process receives the hotplug request IPC message
>    * devargs are either created or updated
>    * the bus is scanned
>      * a new device object is created with the latest devargs
>    * the device can't be plugged for whatever reason,
>      bus->plug returns error
>      * the devargs are freed, even though they're still referenced
>        by the device object on the bus
> 
> For PCI devices, the generic device name comes from
> a buffer within the devargs. Freeing those will make
> EAL segfault whenever the device name is checked.
> 
> This patch just prevents the hotplug error handling
> path from removing the devargs when there's a device
> that references them. This is done by simply exiting
> early from the hotplug function. As mentioned in the
> beginning, those devargs will be freed later, together
> with the device itself.
> 

This seems ok in conjunction with Thomas' patch on overwriting devargs
on insertion.

The only place a device will be freed is the unplug bus ops, it already
does remove the device devargs.

> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> Cc: gaetan.rivet@6wind.com
> Cc: thomas@monjalon.net
> 
> Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> ---
>  lib/librte_eal/common/eal_common_dev.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
> index 1fdc9ab17..b6fc5e437 100644
> --- a/lib/librte_eal/common/eal_common_dev.c
> +++ b/lib/librte_eal/common/eal_common_dev.c
> @@ -169,11 +169,10 @@ local_dev_probe(const char *devargs, struct rte_device **new_dev)
>  
>  	ret = dev->bus->plug(dev);
>  	if (ret) {
> -		if (rte_dev_is_probed(dev)) /* if already succeeded earlier */
> -			return ret; /* no rollback */
> -		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
> -			dev->name);
> -		goto err_devarg;
> +		if (!rte_dev_is_probed(dev)) /* if hasn't succeeded earlier */
> +			RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
> +				dev->name);

Maybe a comment here to describe that the devargs is still the
responsibility of the rte_device and should not be removed.

> +		return ret;
>  	}
>  
>  	*new_dev = dev;
> -- 
> 2.17.1
>
  
Stojaczyk, Dariusz Nov. 22, 2018, 12:54 p.m. UTC | #2
> -----Original Message-----
> From: Gaëtan Rivet [mailto:gaetan.rivet@6wind.com]
> Sent: Thursday, November 22, 2018 10:54 AM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>
> Cc: dev@dpdk.org; thomas@monjalon.net
> Subject: Re: [PATCH v2] dev: don't remove devargs that are still referenced
> 
> On Wed, Nov 21, 2018 at 08:38:27PM +0100, Darek Stojaczyk wrote:
> > Even if a device failed to plug, it's still a device
> > object that references the devargs. Those devargs will
> > be freed automatically together with the device, but
> > freeing them any earlier - like it's done in the hotplug
> > error handling path right now - will give us a dangling
> > pointer and a segfault scenario.
> >
> > Consider the following case:
> >  * secondary process receives the hotplug request IPC message
> >    * devargs are either created or updated
> >    * the bus is scanned
> >      * a new device object is created with the latest devargs
> >    * the device can't be plugged for whatever reason,
> >      bus->plug returns error
> >      * the devargs are freed, even though they're still referenced
> >        by the device object on the bus
> >
> > For PCI devices, the generic device name comes from
> > a buffer within the devargs. Freeing those will make
> > EAL segfault whenever the device name is checked.
> >
> > This patch just prevents the hotplug error handling
> > path from removing the devargs when there's a device
> > that references them. This is done by simply exiting
> > early from the hotplug function. As mentioned in the
> > beginning, those devargs will be freed later, together
> > with the device itself.
> >
> 
> This seems ok in conjunction with Thomas' patch on overwriting devargs
> on insertion.
> 
> The only place a device will be freed is the unplug bus ops, it already
> does remove the device devargs.
> 
> > Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> > Cc: gaetan.rivet@6wind.com
> > Cc: thomas@monjalon.net
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > ---
> >  lib/librte_eal/common/eal_common_dev.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_dev.c
> b/lib/librte_eal/common/eal_common_dev.c
> > index 1fdc9ab17..b6fc5e437 100644
> > --- a/lib/librte_eal/common/eal_common_dev.c
> > +++ b/lib/librte_eal/common/eal_common_dev.c
> > @@ -169,11 +169,10 @@ local_dev_probe(const char *devargs, struct
> rte_device **new_dev)
> >
> >  	ret = dev->bus->plug(dev);
> >  	if (ret) {
> > -		if (rte_dev_is_probed(dev)) /* if already succeeded earlier
> */
> > -			return ret; /* no rollback */
> > -		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
> > -			dev->name);
> > -		goto err_devarg;
> > +		if (!rte_dev_is_probed(dev)) /* if hasn't succeeded earlier */
> > +			RTE_LOG(ERR, EAL, "Driver cannot attach the device
> (%s)\n",
> > +				dev->name);
> 
> Maybe a comment here to describe that the devargs is still the
> responsibility of the rte_device and should not be removed.

Ack, I'll do that.

> 
> > +		return ret;
> >  	}
> >
> >  	*new_dev = dev;
> > --
> > 2.17.1
> >
> 
> --
> Gaëtan Rivet
> 6WIND
  

Patch

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 1fdc9ab17..b6fc5e437 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -169,11 +169,10 @@  local_dev_probe(const char *devargs, struct rte_device **new_dev)
 
 	ret = dev->bus->plug(dev);
 	if (ret) {
-		if (rte_dev_is_probed(dev)) /* if already succeeded earlier */
-			return ret; /* no rollback */
-		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
-			dev->name);
-		goto err_devarg;
+		if (!rte_dev_is_probed(dev)) /* if hasn't succeeded earlier */
+			RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
+				dev->name);
+		return ret;
 	}
 
 	*new_dev = dev;