[dpdk-dev] Proposal of mbuf Race Condition Detection

Adrien Mazarguil adrien.mazarguil at 6wind.com
Thu Apr 13 16:59:10 CEST 2017


Hi Qinglai,

On Thu, Apr 13, 2017 at 04:38:19PM +0300, jigsaw wrote:
> Hi,
> 
> I have a proposal for mbuf race condition detection and I would like to
> get your opinions before committing any patch.
> 
> A race condition is the worst kind of bug I can think of: it causes a
> crash long after the first crime scene, and the consequences are
> unpredictable and difficult to understand.
> 
> Besides, race conditions are very difficult to reproduce. Usually we get
> a coredump from a live network, but the coredump itself delivers nearly
> zero information. We just know that the mbuf is somehow broken and may
> have been overwritten by another core at any moment.
> 
> There are tools such as Valgrind and ThreadSanitizer to catch this kind
> of fault, but the overhead they bring is too high for performance
> testing, not to mention a live network. And race conditions rarely
> happen under low pressure.
> 
> Even worse, these tools are not capable of capturing the scenario even
> in theory, because they require explicit calls to locking primitives
> such as pthread mutexes; without explicit lock/unlock calls, the
> semantics are not understood by the tools.
> 
> With these known restrictions in mind, my proposal is roughly as below.
> 
> The idea is to ask the coder to do an explicit lock/unlock on each mbuf
> that matters. The pseudo code is as below:
> 
>     RTE_MBUF_LOCK(m);
>     /* use mbuf as usual */
>     ...
>     RTE_MBUF_UNLOCK(m);
> 
> During the protected critical section, only the core which holds the
> lock can access the mbuf; other cores that dare to use the mbuf are
> punished with a segfault.
> 
> Since the proposal must be feasible at least in performance testing, the
> overhead of locking and punishment must be small, and access to the mbuf
> must remain as usual.
> 
> Hence, RTE_MBUF_LOCK actually does the following (pseudo code):
> 
> RTE_MBUF_LOCK(m)
> {
>     store_per_core_cache(m, m->buf_addr);
>     m->buf_addr = NULL;
>     mb(); // memory barrier
> }
> 
> And RTE_MBUF_UNLOCK is simply the reverse:
> 
> RTE_MBUF_UNLOCK(m)
> {
>     m->buf_addr = load_per_core_cache(m);
>     purge_per_core_cache(m);
>     mb();
> }
> 
> As we can see, the idea is to overwrite m->buf_addr with NULL and store
> the original value in a per-core cache. Other cores trying to access the
> mbuf during the critical section will certainly be punished.
> 
> Then of course we need to keep the owner core working. This is done by
> making changes to mtod, as below:
> 
> #define rte_pktmbuf_mtod_offset(m, t, o) \
>     ((t)((char *)(m)->buf_addr + (uintptr_t)load_per_core_cache(m) + \
>          (m)->data_off + (o)))
> 
> The per-core cache of mbufs works like a key-value store implemented as
> a direct-mapped cache. If an eviction happens (a newly arriving mbuf
> must kick out an old one), we restore the buf_addr of the evicted mbuf;
> RTE_MBUF_UNLOCK will then take care of that situation.
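> 
> A rough sketch of what these per-core cache helpers could look like is
> below. It only illustrates the direct-mapped idea; all names
> (MBUF_DBG_CACHE_SIZE, struct mbuf_dbg_slot, mbuf_dbg_slot_of, ...) are
> made up for this sketch and nothing here is existing DPDK code:
> 
>     #include <stdint.h>
>     #include <rte_mbuf.h>
>     #include <rte_lcore.h>
> 
>     /* One small direct-mapped array per lcore: key is the mbuf pointer,
>      * value is its saved buf_addr while the mbuf is locked. */
>     #define MBUF_DBG_CACHE_SIZE 256 /* power of two */
> 
>     struct mbuf_dbg_slot {
>         struct rte_mbuf *m; /* locked mbuf, NULL when the slot is free */
>         void *buf_addr;     /* saved buf_addr to restore on unlock */
>     };
> 
>     static struct mbuf_dbg_slot
>     mbuf_dbg_cache[RTE_MAX_LCORE][MBUF_DBG_CACHE_SIZE];
> 
>     static inline struct mbuf_dbg_slot *
>     mbuf_dbg_slot_of(const struct rte_mbuf *m)
>     {
>         /* mbufs are cache-line aligned, drop the low bits before
>          * masking so they spread over the slots. */
>         unsigned int idx =
>             ((uintptr_t)m >> 6) & (MBUF_DBG_CACHE_SIZE - 1);
> 
>         return &mbuf_dbg_cache[rte_lcore_id()][idx];
>     }
> 
>     static inline void
>     store_per_core_cache(struct rte_mbuf *m, void *addr)
>     {
>         struct mbuf_dbg_slot *s = mbuf_dbg_slot_of(m);
> 
>         /* Direct-mapped eviction: restore the previous occupant so it
>          * becomes usable by every core again before reusing its slot. */
>         if (s->m != NULL && s->m != m)
>             s->m->buf_addr = s->buf_addr;
>         s->m = m;
>         s->buf_addr = addr;
>     }
> 
>     static inline void *
>     load_per_core_cache(const struct rte_mbuf *m)
>     {
>         struct mbuf_dbg_slot *s = mbuf_dbg_slot_of(m);
> 
>         /* The owner core gets the saved buf_addr back; any other core
>          * gets NULL, so a dereference through mtod() segfaults near
>          * address zero instead of silently corrupting data. */
>         return (s->m == m) ? s->buf_addr : NULL;
>     }
> 
>     static inline void
>     purge_per_core_cache(struct rte_mbuf *m)
>     {
>         struct mbuf_dbg_slot *s = mbuf_dbg_slot_of(m);
> 
>         if (s->m == m)
>             s->m = NULL;
>     }
> 
> Note that for the modified rte_pktmbuf_mtod_offset() above to compile,
> the value returned by load_per_core_cache() has to be added as an
> integer (hence the uintptr_t cast); it is NULL for an unlocked mbuf, so
> the normal buf_addr path is unaffected.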
> 
> Note that we expect LOCK/UNLOCK to work even when the mbuf is freed and
> reallocated, since we don't trust the refcnt. A double free is actually
> very rare, but a data race can occur without one. Therefore, we need to
> survive mbuf allocation, that is rte_pktmbuf_init(), which resets
> buf_addr.
> 
> Further, other dereferences of buf_addr in rte_mbuf.h need to be
> revised, but that is not a big deal (I hope).
> 
> The runtime overhead of this implementation is very light, and it can
> probably be turned on even in a live network. And it is not intrusive at
> all: no change is needed in struct rte_mbuf; we just need O(N) space
> (where N is the number of cores) and O(1) runtime complexity.
> 
> Alright, that is basically what is on my mind. Before any patch is
> provided or any perf analysis is made, I would like to know what the
> risks are from a design point of view. Please leave your feedback.

Your proposal makes sense but I'm not sure where developers should call
RTE_MBUF_LOCK() and RTE_MBUF_UNLOCK(). Is it targeted at user applications
only? Is it to be used internally by DPDK as well? Does it include PMDs?

I think any overhead outside of debugging mode, as minimal as it is, is too
much of a "punishment" for well-behaved applications. The cost of a memory
barrier can be extremely high, and this is one of the reasons DPDK processes
mbufs in bulk whenever it gets the chance. RTE_MBUF_LOCK() and
RTE_MBUF_UNLOCK() should thus exist under some compilation option disabled
by default, and be thought of as an additional debugging tool.
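
For instance (just a sketch, the option name RTE_LIBRTE_MBUF_DEBUG_LOCK and
the rte_mbuf_debug_*() helpers are made up here), the macros could compile
away entirely when the option is off, so well-behaved applications pay
nothing:

    struct rte_mbuf;

    #ifdef RTE_LIBRTE_MBUF_DEBUG_LOCK
    /* Hypothetical debug implementations, built only with the option. */
    void rte_mbuf_debug_lock(struct rte_mbuf *m);
    void rte_mbuf_debug_unlock(struct rte_mbuf *m);
    #define RTE_MBUF_LOCK(m) rte_mbuf_debug_lock(m)
    #define RTE_MBUF_UNLOCK(m) rte_mbuf_debug_unlock(m)
    #else
    /* Compile to nothing: zero run-time cost when debugging is off. */
    #define RTE_MBUF_LOCK(m) do { (void)(m); } while (0)
    #define RTE_MBUF_UNLOCK(m) do { (void)(m); } while (0)
    #endif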

Also, the implementation would probably be more expensive than your
suggestion: in my opinion the reference count must be taken into account
and/or RTE_MBUF_LOCK() must be recursive, because this is how mbufs
work. Freeing mbufs multiple times is perfectly valid in many cases.
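
By recursive I mean something like the following (again a rough sketch,
extending the illustrative slot structure sketched in your proposal with a
made-up depth field): only the outermost unlock on the owning core restores
buf_addr, so nested lock/unlock pairs keep working:

    #include <rte_mbuf.h>
    #include <rte_atomic.h>

    struct mbuf_dbg_slot {
        struct rte_mbuf *m;   /* locked mbuf, NULL when the slot is free */
        void *buf_addr;       /* saved buf_addr to restore on unlock */
        unsigned int depth;   /* nesting level on the owning core */
    };

    static inline void
    mbuf_dbg_lock(struct rte_mbuf *m, struct mbuf_dbg_slot *s)
    {
        if (s->m == m) {
            s->depth++;       /* already locked by this core: just nest */
            return;
        }
        s->m = m;
        s->buf_addr = m->buf_addr;
        s->depth = 1;
        m->buf_addr = NULL;   /* punish the other cores */
        rte_smp_mb();
    }

    static inline void
    mbuf_dbg_unlock(struct rte_mbuf *m, struct mbuf_dbg_slot *s)
    {
        if (s->m != m || --s->depth != 0)
            return;           /* not the outermost unlock: stay locked */
        m->buf_addr = s->buf_addr;
        s->m = NULL;
        rte_smp_mb();
    }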

How can one lock several mbufs at once by the way?

Since it affects performance, this can only make sense as a debugging tool
developers can use when they encounter some corruption issue they cannot
identify, somewhat like poisoning buffers before they are freed. Not sure
it's worth the trouble.

Regards,

-- 
Adrien Mazarguil
6WIND

