[dpdk-dev] [PATCH] service: fix race in service on app lcore function

Bruce Richardson bruce.richardson at intel.com
Thu Nov 2 10:10:20 CET 2017


On Wed, Nov 01, 2017 at 05:59:51PM +0000, Van Haaren, Harry wrote:
> > From: Richardson, Bruce
> > Sent: Wednesday, November 1, 2017 5:09 PM
> > To: Van Haaren, Harry <harry.van.haaren at intel.com>
> > Cc: dev at dpdk.org; pbhagavatula at caviumnetworks.com; thomas at monjalon.net
> > Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore
> > function
> > 
> > On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> > > This commit fixes a possible race condition if an application
> > > uses the service-cores infrastructure and the function to run
> > > a service on an application lcore at the same time.
> > >
> > > The fix is to change the num_mapped_cores variable to be an
> > > atomic variable. This causes concurrent accesses by multiple
> > > threads to a service using rte_service_run_iter_on_app_lcore()
> > > to detect if another core is currently mapped to the service,
> > > and refuses to run if it is not multi-thread safe.
> > >
> > > No performance impact is expected as the mappings for the
> > > service-cores changes at control-path frequency, hence the
> > > change from an ordinary write to an atomic write will not
> > > have any significant impact.
> > >
> > > Two unit tests were added to verify the behaviour of the
> > > function to run a service on an application core, testing both
> > > a multi-thread safe service, and a multi-thread unsafe service.
> > >
> > > The doxygen API documentation for the function has been updated
> > > to reflect the current and correct behaviour.
> > >
> > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> > >
> > > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> > >
> > > ---
> > >
> > <snip>
> > @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t
> > service_mask)
> > >  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
> > >  {
> > >  	/* run service on calling core, using all-ones as the service mask */
> > > +	if (!service_valid(id))
> > > +		return -EINVAL;
> > > +
> > >  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> > > -	return service_run(id, cs, UINT64_MAX);
> > > +	struct rte_service_spec_impl *s = &rte_services[id];
> > > +
> > > +	/* Atomically add this core to the mapped cores first, then examine if
> > > +	 * we can run the service. This avoids a race condition between
> > > +	 * checking the value, and atomically adding to the mapped count.
> > > +	 */
> > > +	rte_atomic32_inc(&s->num_mapped_cores);
> > > +
> > > +	if (service_mt_safe(s) == 0 &&
> > > +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> > > +		rte_atomic32_dec(&s->num_mapped_cores);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	int ret = service_run(id, cs, UINT64_MAX);
> > > +	rte_atomic32_dec(&s->num_mapped_cores);
> > > +
> > > +	return ret;
> > >  }
> > 
> > Do we really need to do an atomic inc and dec in this function? If we
> > check that there are no service cores mapped, would that not be enough
> > for safety? If an app core is calling a service, the control plane is
> > unlikely to decide to start spawning off a service core for that
> > service simultaneously. Manipulating the count is safer, yes, but unlike
> > the other changes in this patch, this one will affect performance, so I
> > think we can go without. Similarly, for multiple data plane threads
> > calling the same service simultaneously: everything else dataplane is
> > done without locks so I think this should be too.
> 
> 
> I think the inc-dec is required with the current code; what if *two* application
> cores are calling the same service? If we don't have the atomic inc-dec of the
> num-mapped-cores, both app cores could simultaneously run a multi-thread unsafe service.
> 
> The other option is to demand that the *app* must serialize access to a particular service
> using this function - but in my eyes that limits the usefulness of this function, because
> it pushes the problem up a level instead of solving it.
> 
> I guess an application "in the know" of how its cores are acting would prefer to not
> have the atomics at the service-cores library level. As a compromise, the use of these
> atomics could be a parameter to the rte_service_run_iter_on_app_lcore() function.
> 
> That would allow the application to choose if it wants to avail of the functionality
> in the service cores library, or if it takes the responsibility to serialize access
> to multi-thread unsafe functions itself.
> 
> 
> I'll spin up a v2 with a parameter to the function.
> 
Ok, that sounds reasonable enough. Just so long as apps that don't need
the lock don't have to use it.

/Bruce


More information about the dev mailing list