[dpdk-stable] [PATCH v2 1/6] service: fix race condition for MT unsafe service

Van Haaren, Harry harry.van.haaren at intel.com
Fri May 1 16:21:53 CEST 2020


> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>
> Sent: Wednesday, April 29, 2020 11:49 PM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>; Phil Yang
> <Phil.Yang at arm.com>; dev at dpdk.org
> Cc: thomas at monjalon.net; david.marchand at redhat.com; Ananyev, Konstantin
> <konstantin.ananyev at intel.com>; jerinj at marvell.com;
> hemant.agrawal at nxp.com; Gavin Hu <Gavin.Hu at arm.com>; nd
> <nd at arm.com>; stable at dpdk.org; Eads, Gage <gage.eads at intel.com>;
> Richardson, Bruce <bruce.richardson at intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH v2 1/6] service: fix race condition for MT unsafe service
> 
> Hi Harry,
> 	Thanks for getting back on this.
> 
> <snip>
> 
> > > Subject: [PATCH v2 1/6] service: fix race condition for MT unsafe
> > > service
> > >
> > > From: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > >
> > > The MT unsafe service might get configured to run on another core
> > > while the service is running currently. This might result in the MT
> > > unsafe service running on multiple cores simultaneously. Use
> > > 'execute_lock' always when the service is MT unsafe.
> > >
> > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> > > Cc: stable at dpdk.org
> > >
> > > Signed-off-by: Honnappa Nagarahalli <honnappa.nagarahalli at arm.com>
> > > Reviewed-by: Phil Yang <phil.yang at arm.com>
> > > ---
> >
> > Thanks for spinning a new revision - based on ML discussion previously, it
> > seems like the "use service-run-count" to avoid this race would be a complex
> > solution.
> >
> > Suggesting the following;
> > 1) Take the approach as per this patch, to always take the atomic, fixing the
> > race condition.
> Ok

I've micro-benchmarked this code change inside the service cores autotest, and it
introduces around 35 cycles of overhead per service call.  This is not ideal, but given
it's a bugfix, and by far the simplest method to fix this race-condition. Having
discussed and investigated multiple other solutions, I believe this is the right solution.
Thanks Honnappa and Phil for identifying and driving a solution.

I suggest to post the benchmarking unit-test addition patch, and integrate that
*before* the series under review here gets merged? This makes benchmarking
of the "before bugfix" performance in future easier should it be required.


> > 2) Add an API to service-cores, which allows "committing" of mappings.
> > Committing the mapping would imply that the mappings will not be changed
> > in future. With runtime-remapping being removed from the equation, the
> > existing branch-over-atomic optimization is valid again.
> Ok. Just to make sure I understand this:
> a) on the data plane, if commit API is called (probably a new state variable) and
> num_mapped_cores is set to 1, there is no need to take the lock.
> b) possible implementation of the commit API would check if
> num_mapped_cores for the service is set to 1 and set a variable to indicate that
> the lock is not required.
> 
> What do you think about asking the application to set  the service capability to
> MT_SAFE if it knows that the service will run on a single core? This would
> require us to change the documentation and does not require additional code.

That's a nice idea - I like that if applications want to micro-optimize around
the atomic, that they have a workaround/solution to do so, particularly that it
doesn't require code-changes and backporting.

Will send review and send feedback on the patches themselves.
Regards, -Harry

> > So this would offer applications two situations
> > A) No application change: possible performance regression due to atomic
> > always taken.
> > B) Call "commit" API, and regain the performance as per previous DPDK
> > versions.
> >
> > Thoughts/opinions on the above?  I've flagged the rest of the patchset for
> > review ASAP. Regards, -Harry
> >
> > >  lib/librte_eal/common/rte_service.c | 11 +++++------
> > >  1 file changed, 5 insertions(+), 6 deletions(-)
<snip patch changes>



More information about the stable mailing list