[v3] dev: don't remove devargs that are still referenced

Message ID 20181123154328.97021-1-dariusz.stojaczyk@intel.com (mailing list archive)
State Accepted, archived

Checks

Context                          Check    Description
ci/checkpatch                    success  coding style OK
ci/Intel-compilation             success  Compilation OK
ci/mellanox-Performance-Testing  success  Performance Testing PASS
ci/intel-Performance-Testing     success  Performance Testing PASS

Commit Message

Stojaczyk, Dariusz Nov. 23, 2018, 3:43 p.m. UTC
  Even if a device failed to plug, it's still a device
object that references the devargs. Those devargs will
be freed automatically together with the device, but
freeing them any earlier - like it's done in the hotplug
error handling path right now - will give us a dangling
pointer and a segfault scenario.

Consider the following case:
 * secondary process receives the hotplug request IPC message
   * devargs are either created or updated
   * the bus is scanned
     * a new device object is created with the latest devargs
   * the device can't be plugged for whatever reason,
     bus->plug returns error
     * the devargs are freed, even though they're still referenced
       by the device object on the bus

For PCI devices, the generic device name comes from
a buffer within the devargs. Freeing those will make
EAL segfault whenever the device name is checked.

This patch just prevents the hotplug error handling
path from removing the devargs when there's a device
that references them. This is done by simply exiting
early from the hotplug function. As mentioned in the
beginning, those devargs will be freed later, together
with the device itself.

Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
Cc: gaetan.rivet@6wind.com
Cc: thomas@monjalon.net

Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
---
Changes since v2:
 * added an extra comment (Gaetan)

Changes since v1:
 * described the failing scenario in commit msg (Thomas)

 lib/librte_eal/common/eal_common_dev.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
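
The lifetime problem described in the commit message can be modelled with a small sketch. It uses hypothetical, simplified stand-ins (`toy_devargs`, `toy_device`, `toy_scan`), not the real DPDK structures or functions, and only assumes what the commit message states: the device name lives in a buffer inside the devargs, and the devargs are released together with the device.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the real DPDK types. */
struct toy_devargs {
	char name[32];               /* buffer that owns the device name */
};

struct toy_device {
	const char *name;            /* points into devargs->name */
	struct toy_devargs *devargs; /* reference held by the device */
};

/* Bus scan: create a device object that borrows its name from devargs. */
static struct toy_device *toy_scan(struct toy_devargs *da)
{
	struct toy_device *dev;

	assert(da != NULL);
	dev = calloc(1, sizeof(*dev));
	if (dev == NULL)
		return NULL;
	dev->devargs = da;
	dev->name = da->name;
	return dev;
}

/* The devargs are released together with the device, never before. */
static void toy_device_free(struct toy_device *dev)
{
	if (dev == NULL)
		return;
	free(dev->devargs);
	free(dev);
}

/* Returns 1 if the device name aliases the devargs buffer, -1 on OOM. */
int toy_name_aliases_devargs(void)
{
	struct toy_devargs *da = calloc(1, sizeof(*da));
	struct toy_device *dev;
	int aliased;

	if (da == NULL)
		return -1;
	snprintf(da->name, sizeof(da->name), "%s", "0000:00:04.0");
	dev = toy_scan(da);
	if (dev == NULL) {
		free(da);
		return -1;
	}
	/* Freeing 'da' here, while 'dev' is alive, would leave dev->name dangling. */
	aliased = (dev->name == da->name);
	toy_device_free(dev);
	return aliased;
}
```

Because `dev->name` and `da->name` are the same storage in this model, any path that frees the devargs while the device object still exists turns later name checks into use-after-free reads.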
  

Comments

Maxime Coquelin Nov. 23, 2018, 5:04 p.m. UTC | #1
Hi,

On 11/23/18 4:43 PM, Darek Stojaczyk wrote:
> Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")

Should you also cc stable?
Above commit is in since v17.08.

Other than that, it looks good to me:
Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Regards,
Maxime
  
Stojaczyk, Dariusz Nov. 23, 2018, 9:45 p.m. UTC | #2
> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, November 23, 2018 6:05 PM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; dev@dpdk.org
> Cc: gaetan.rivet@6wind.com; thomas@monjalon.net
> Subject: Re: [dpdk-dev] [PATCH v3] dev: don't remove devargs that are still
> referenced
> 
> Hi,
> 
> On 11/23/18 4:43 PM, Darek Stojaczyk wrote:
> > Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> 
> Should you also cc stable?
> Above commit is in since v17.08.
> 

Hi Maxime,

Stable could use a similar patch, but not exactly this one as it is now. I'll resubmit for stable once the one here gets approved.

Thank you,
D.  

  
Thomas Monjalon Nov. 25, 2018, 12:46 p.m. UTC | #3
23/11/2018 18:04, Maxime Coquelin:
> > Fixes: 7e8b26650146 ("eal: fix hotplug add / remove")
> 
> Should you also cc stable?
> Above commit is in since v17.08.
> 
> > Cc: gaetan.rivet@6wind.com
> > Cc: thomas@monjalon.net
> > 
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> Acked-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied (with rebase), thanks
  

Patch

diff --git a/lib/librte_eal/common/eal_common_dev.c b/lib/librte_eal/common/eal_common_dev.c
index 1fdc9ab17..d7950bc9a 100644
--- a/lib/librte_eal/common/eal_common_dev.c
+++ b/lib/librte_eal/common/eal_common_dev.c
@@ -166,14 +166,17 @@  local_dev_probe(const char *devargs, struct rte_device **new_dev)
 		ret = -ENODEV;
 		goto err_devarg;
 	}
+	/* Since there is a matching device, it is now its responsibility
+	 * to manage the devargs we've just inserted. From this point
+	 * those devargs shouldn't be removed manually anymore.
+	 */
 
 	ret = dev->bus->plug(dev);
 	if (ret) {
-		if (rte_dev_is_probed(dev)) /* if already succeeded earlier */
-			return ret; /* no rollback */
-		RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
-			dev->name);
-		goto err_devarg;
+		if (!rte_dev_is_probed(dev)) /* if hasn't succeeded earlier */
+			RTE_LOG(ERR, EAL, "Driver cannot attach the device (%s)\n",
+				dev->name);
+		return ret;
 	}
 
 	*new_dev = dev;
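
The control flow after this patch can be sketched in isolation as follows. This is a simplified model under stated assumptions, not the actual `local_dev_probe()` code: `toy_devargs`, `toy_device`, `toy_plug` and `toy_local_dev_probe` are hypothetical names, the bus scan is reduced to a `dev` argument that is NULL when no device matched, and the devargs list is reduced to an `inserted` flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real DPDK types. */
struct toy_devargs {
	int inserted;                /* 1 while present in the devargs list */
};

struct toy_device {
	struct toy_devargs *devargs; /* reference held by the device */
	int plug_result;             /* value the fake bus->plug returns */
};

/* Fake bus->plug: succeeds or fails as configured on the device. */
static int toy_plug(struct toy_device *dev)
{
	return dev->plug_result;
}

/*
 * Mirrors the patched flow: roll back the devargs only while no device
 * references them; after a failed plug, just return the error.
 * 'dev' stands for the result of the bus scan (NULL if nothing matched).
 */
static int toy_local_dev_probe(struct toy_devargs *da, struct toy_device *dev,
			       struct toy_device **new_dev)
{
	int ret;

	assert(da != NULL && new_dev != NULL);
	da->inserted = 1;

	if (dev == NULL) {
		da->inserted = 0; /* nothing references the devargs yet */
		return -ENODEV;
	}
	dev->devargs = da; /* from here on the device manages the devargs */

	ret = toy_plug(dev);
	if (ret != 0)
		return ret; /* no rollback: dev still points at da */

	*new_dev = dev;
	return 0;
}

/* Returns 1 if a failed plug leaves the devargs in place. */
int toy_failed_plug_keeps_devargs(void)
{
	struct toy_devargs da = { 0 };
	struct toy_device dev = { NULL, -EIO };
	struct toy_device *out = NULL;
	int ret = toy_local_dev_probe(&da, &dev, &out);

	return ret == -EIO && out == NULL && da.inserted == 1 &&
	       dev.devargs == &da;
}

/* Returns 1 if the no-device path still rolls the devargs back. */
int toy_no_device_rolls_back(void)
{
	struct toy_devargs da = { 0 };
	struct toy_device *out = NULL;
	int ret = toy_local_dev_probe(&da, NULL, &out);

	return ret == -ENODEV && da.inserted == 0;
}
```

The two helper functions cover the two error paths: when no device matched, the devargs are still rolled back; when a device exists but the plug fails, the devargs stay in place because the device object still references them.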