[dpdk-dev] [PATCH] dev: fix attach rollback of a device that was already attached

Stojaczyk, Dariusz dariusz.stojaczyk at intel.com
Fri Nov 23 21:29:04 CET 2018



> -----Original Message-----
> From: Zhang, Qi Z
> Sent: Friday, November 23, 2018 8:11 PM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk at intel.com>; dev at dpdk.org
> Cc: thomas at monjalon.net
> Subject: RE: [PATCH] dev: fix attach rollback of a device that was already
> attached
> 
> 
> 
> > -----Original Message-----
> > From: Stojaczyk, Dariusz
> > Sent: Friday, November 23, 2018 6:45 AM
> > To: dev at dpdk.org
> > Cc: thomas at monjalon.net; Stojaczyk, Dariusz <dariusz.stojaczyk at intel.com>;
> > Zhang, Qi Z <qi.z.zhang at intel.com>
> > Subject: [PATCH] dev: fix attach rollback of a device that was already attached
> >
> > When the primary process receives an IPC attach request for a device that's
> > already locally attached, it doesn't set up its variables properly and is
> > prone to segfaulting on a subsequent rollback.
> >
> > `ret = local_dev_probe(req->devargs, &dev)`
> >
> > The above function will set the `dev` pointer to the proper device *unless* it
> > returns with an error. One of those errors is -EEXIST, which the hotplug
> > function explicitly ignores. For -EEXIST, it proceeds with attaching the
> > device and expects the dev pointer to be valid.
> 
> Good catch.
> >
> > Despite this patch being a fix, it also introduces a design decision: when
> > any secondary process fails to attach a device, the primary process that
> > already had the device attached won't attempt to detach that device locally
> > as part of the rollback routine.
> > The primary process would have already printed a message "Failed to [...] on
> > secondary" and now it will also print a warning "Devices may not be in sync
> > [...]".
> 
> I have a slight concern about this.
> We should try to avoid the abnormal situation where devices are not in sync.
> The scenario you describe actually starts from an abnormal situation caused by
> some previous error,
> so isn't it better to always take the chance to end up in a normal state?
> 
> It would look better to me if we fixed it in local_dev_probe, so that it
> returns a valid device along with -EEXIST.

Actually, that was my original idea, but I gave up on it in the end.
OK, I'll do that in V2.
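
To make the suggested contract concrete, here is a minimal sketch, not the actual V2 change. Everything prefixed with `mock_` is a hypothetical stand-in, not an EAL type or function. It only illustrates a probe helper that still hands back a valid device pointer when it returns -EEXIST, so that a later rollback has something to remove.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_device; not the real DPDK type. */
struct mock_device {
	const char *name;
	int probed; /* non-zero once the device is attached locally */
};

/* Hypothetical table of devices known to this process. */
static struct mock_device mock_bus[] = {
	{ "0000:01:00.0", 1 }, /* already attached */
	{ "0000:02:00.0", 0 }, /* present, not attached yet */
};

/*
 * Mock probe following the suggested contract: *new_dev is set whenever
 * the device is found, including when -EEXIST is returned.
 */
static int
mock_dev_probe(const char *name, struct mock_device **new_dev)
{
	size_t i;

	*new_dev = NULL;
	for (i = 0; i < sizeof(mock_bus) / sizeof(mock_bus[0]); i++) {
		if (strcmp(mock_bus[i].name, name) != 0)
			continue;
		*new_dev = &mock_bus[i];
		if (mock_bus[i].probed)
			return -EEXIST;
		mock_bus[i].probed = 1;
		return 0;
	}
	return -ENODEV;
}

/* Mock removal; unlike the crash described above, it rejects NULL. */
static int
mock_dev_remove(struct mock_device *dev)
{
	if (dev == NULL)
		return -EINVAL;
	dev->probed = 0;
	return 0;
}

/* Caller that tolerates -EEXIST, as the hotplug handler does. */
static int
mock_attach(const char *name, struct mock_device **dev)
{
	int ret = mock_dev_probe(name, dev);

	if (ret != 0 && ret != -EEXIST)
		return ret;
	/* With the contract above, the pointer is valid in both cases. */
	assert(*dev != NULL);
	return 0;
}
```

With this shape, the rollback path could call the remove helper unconditionally after a tolerated -EEXIST, which is the "always end up in a normal state" behaviour asked for above.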

Thanks,
D.

> 
> >
> > Fixes: ac9e4a17370f ("eal: support attach/detach shared device from
> > secondary")
> > Cc: qi.z.zhang at intel.com
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk at intel.com>
> > ---
> >  lib/librte_eal/common/hotplug_mp.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/hotplug_mp.c b/lib/librte_eal/common/hotplug_mp.c
> > index 7c9fcc46c..7ee074a31 100644
> > --- a/lib/librte_eal/common/hotplug_mp.c
> > +++ b/lib/librte_eal/common/hotplug_mp.c
> > @@ -88,7 +88,7 @@ __handle_secondary_request(void *param)
> >  		(const struct eal_dev_mp_req *)msg->param;
> >  	struct eal_dev_mp_req tmp_req;
> >  	struct rte_devargs *da;
> > -	struct rte_device *dev;
> > +	struct rte_device *dev = NULL;
> >  	struct rte_bus *bus;
> >  	int ret = 0;
> >
> > @@ -168,7 +168,15 @@ __handle_secondary_request(void *param)
> >  	if (req->t == EAL_DEV_REQ_TYPE_ATTACH) {
> >  		tmp_req.t = EAL_DEV_REQ_TYPE_ATTACH_ROLLBACK;
> >  		eal_dev_hotplug_request_to_secondary(&tmp_req);
> > -		local_dev_remove(dev);
> > +		if (dev == NULL) {
> > +			/* device was already attached at the time we got the
> > +			 * request, don't detach it now.
> > +			 */
> > +			RTE_LOG(WARNING, EAL,
> > +				"Devices in secondary may not sync with primary\n");
> > +		} else {
> > +			local_dev_remove(dev);
> > +		}
> >  	} else {
> >  		tmp_req.t = EAL_DEV_REQ_TYPE_DETACH_ROLLBACK;
> >  		eal_dev_hotplug_request_to_secondary(&tmp_req);
> > --
> > 2.17.1


