[dpdk-dev] [RFC 0/3] tqs: add thread quiescent state library

Stephen Hemminger stephen at networkplumber.org
Fri Nov 30 19:32:07 CET 2018


On Fri, 30 Nov 2018 16:26:25 +0000
Luca Boccassi <bluca at debian.org> wrote:

> On Fri, 2018-11-30 at 02:13 +0000, Honnappa Nagarahalli wrote:
> > >   
> > > > > > Mixed feelings about this one.
> > > > > > 
> > > > > > Love to see RCU used for more things, since it is much
> > > > > > better than reader/writer locks for many applications. But I
> > > > > > hate to see DPDK reinventing every other library and not
> > > > > > reusing code. Userspace RCU (https://liburcu.org/) is widely
> > > > > > supported by distros, more thoroughly tested and documented,
> > > > > > and more flexible.
> > > > > > 
> > > > > > The issue with many of these reinventions is the tradeoff
> > > > > > of DPDK growing another dependency on an external library
> > > > > > versus using common code.
> > > > 
> > > > Agree with the dependency issues. Sometimes flexibility also
> > > > causes confusion, and features that are not necessarily required
> > > > for a targeted use case. I have seen that much of the
> > > > functionality that can be left to the application is implemented
> > > > as part of the library.
> > > > I think having it in DPDK will give us control over the amount
> > > > of capability this library will have, and freedom over changes
> > > > we would like to make to such a library. I also view DPDK as one
> > > > package where all things required for data plane development are
> > > > available.
> > > >   
> > > > > > For RCU, the big issue for me is the testing and
> > > > > > documentation of
> > > > > > how to do RCU safely. Many people get it wrong!  
> > > > 
> > > > Hopefully, we all will do a better job collectively :)
> > > 
> > > Reinventing RCU is not helping anyone.  
> > 
> > IMO, this depends on what rte_tqs has to offer and what the
> > requirements are. Before starting this patch, I looked at the
> > liburcu APIs. I have to say that, fairly quickly (no offense), I
> > concluded that they do not address DPDK's needs. I took a deeper
> > look at the APIs/code over the past day and reached the same
> > conclusion. My partial analysis (more APIs could be analyzed, but I
> > do not have the cycles at this point) is as follows:
> > 
> > The reader threads' information is maintained in a linked list [1].
> > This linked list is protected by a mutex [2], so additions,
> > deletions, and traversals of the list block one another and cannot
> > happen in parallel.
> > 
> > The API 'synchronize_rcu' [3] (similar in functionality to the
> > rte_tqs_check call) is a blocking call; no non-blocking option is
> > provided. The writer spins, burning cycles, while waiting for the
> > grace period to end.
> > 
> > 'synchronize_rcu' also takes a grace-period lock [4]. If I have
> > multiple writers running on data plane threads, I cannot call this
> > API to reclaim memory in the worker threads, as it will block the
> > other worker threads. This means an extra thread is required (on
> > the control plane?) to do garbage collection, along with a method
> > to push pointers from the worker threads to that garbage collection
> > thread. It also means the time from delete to free increases,
> > putting pressure on the amount of memory held up.
> > Since this API cannot be called concurrently by multiple writers,
> > each writer has to wait for the other writers' grace periods to end
> > (i.e. multiple writer threads cannot overlap their grace periods).
> > 
> > This API also has to traverse the linked list, which is not well
> > suited to being called on the data plane.
> > 
> > I have not gone too deeply into the rcu_thread_offline [5] API.
> > This again needs to be used on worker cores, and does not look
> > optimal.
> > 
> > I have glanced at rcu_quiescent_state [6]; it wakes up the thread
> > calling 'synchronize_rcu', which seems like a good amount of code
> > for the data plane.
> > 
> > [1] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L85
> > [2] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L68
> > [3] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L344
> > [4] https://github.com/urcu/userspace-rcu/blob/master/src/urcu-qsbr.c#L58
> > [5] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L211
> > [6] https://github.com/urcu/userspace-rcu/blob/master/include/urcu/static/urcu-qsbr.h#L193
> > 
> > Coming to what is provided in rte_tqs:
> > 
> > The synchronize_rcu functionality is split into two APIs:
> > rte_tqs_start and rte_tqs_check. The reader data is maintained as
> > an array.
> > 
> > Both APIs are lock-free, allowing them to be called from multiple
> > threads concurrently. This allows multiple writers to wait for
> > their grace periods concurrently, as well as to overlap their grace
> > periods. The rte_tqs_start API returns a token which provides the
> > ability to separate the quiescent-state waiting of different
> > writers. Hence, no writer waits for another writer's grace period
> > to end.
> > Since these two APIs are lock-free, they can be called from writers
> > running on worker cores as well, without the need for a separate
> > garbage collection thread.
> > 
> > The separation into two APIs gives writers the ability not to spin
> > cycles waiting for the grace period to end. This enables different
> > ways of doing garbage collection. For example: a data structure
> > delete API could remove the entry from the data structure, call
> > rte_tqs_start, and return to the caller. On the next API call into
> > the library, that API can call rte_tqs_check (which will usually
> > indicate that the grace period is complete) and free the previously
> > deleted entry.
> > 
> > rte_tqs_update (mapping to rcu_quiescent_state) is pretty small and
> > simple.
> > 
> > The rte_tqs_register and rte_tqs_unregister APIs are lock-free.
> > Hence, additional APIs like rcu_thread_online and
> > rcu_thread_offline are not required. The rte_tqs_unregister API
> > (compared to rcu_thread_offline) is much simpler and more conducive
> > to use in worker threads.  
> 
> liburcu has many flavours already, qsbr being one of them. If none of
> those are optimal for this use case, why not work with upstream to
> either improve the existing flavours, or add a new one?
> 
> You have the specific knowledge and expertise about the requirements
> for the implementation for this use case, and they have the long-time
> and extensive experience of maintaining the library on a wide range
> of systems and use cases. Why not combine both?
> I might be wrong, but to me, nothing described above seems to be
> particularly or uniquely tied to implementing a software dataplane or
> to DPDK. IMHO we should pool and share wherever possible, rather than
> build an ecosystem closed onto itself.
> 
> Just my 2c of course!
> 
> > > DPDK needs to fix its dependency model, and just admit that it is
> > > ok to build off
> > > of more than glibc.
> > > 
> > > Having used liburcu, it can be done in a small manner and really
> > > isn't that
> > > confusing.
> > > 
> > > Is your real issue the LGPL license of liburcu for your skittish
> > > customers?  
> > 
> > I have not had any discussions on this. Customers are mainly focused
> > on having a solution over which they have meaningful control. They
> > want to be able to submit a patch and change things if required.
> > For example: barriers for Arm [7] are not optimal. How easy is it
> > to change this and get it into distros (there are both internal and
> > external factors here)?  
> 
> It's just as easy (or as hard) as it is with DPDK. So it's either wait
> for the distros to update, or rebuild locally.
> 

Either way it would be useful to have the broader RCU community
in on the discussion.

