[dpdk-stable] [PATCH v3 1/3] ring: read tail using atomic load

Jerin Jacob jerin.jacob at caviumnetworks.com
Mon Oct 8 08:06:31 CEST 2018


-----Original Message-----
> Date: Sun, 7 Oct 2018 20:44:54 +0000
> From: Ola Liljedahl <Ola.Liljedahl at arm.com>
> To: Jerin Jacob <jerin.jacob at caviumnetworks.com>
> CC: "dev at dpdk.org" <dev at dpdk.org>, Honnappa Nagarahalli
>  <Honnappa.Nagarahalli at arm.com>, "Ananyev, Konstantin"
>  <konstantin.ananyev at intel.com>, "Gavin Hu (Arm Technology China)"
>  <Gavin.Hu at arm.com>, Steve Capper <Steve.Capper at arm.com>, nd <nd at arm.com>,
>  "stable at dpdk.org" <stable at dpdk.org>
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
> user-agent: Microsoft-MacOutlook/10.11.0.180909
> 


Could you please fix the email client for inline reply.

https://www.kernel.org/doc/html/v4.19-rc7/process/email-clients.html


> 
> On 07/10/2018, 06:03, "Jerin Jacob" <jerin.jacob at caviumnetworks.com> wrote:
> 
>     In arm64 case, it will have ATOMIC_RELAXED followed by asm volatile ("":::"memory") of rte_pause().
>     I would n't have any issue, if the generated code code is same or better than the exiting case. but it not the case, Right?
> The existing case is actually not interesting (IMO) as it exposes undefined behaviour which allows the compiler to do anything. But you seem to be satisfied with "works for me, right here right now". I think the cost of avoiding undefined behaviour is acceptable (actually I don't think it even will be noticeable).

I am not convinced because of use of volatile in head and tail indexes.
For me that brings the defined behavior.That the reason why I shared
the generated assembly code. If you think other way, Pick any compiler
and see generated output.

And

Freebsd implementation of ring buffer(Which DPDK derived from), Don't have
such logic, See https://github.com/freebsd/freebsd/blob/master/sys/sys/buf_ring.h#L108

See below too.

> 
> Skipping the compiler memory barrier in rte_pause() potentially allows for optimisations that provide much more benefit, e.g. hiding some cache miss latency for later loads. The DPDK ring buffer implementation is defined so to enable inlining of enqueue/dequeue functions into the caller, any code could immediately follow these calls.
> 
> From INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x
> Programming languages — C
> 
> 5.1.2.4
> 4 Two expression evaluations conflict if one of them modifies a memory location and the other one reads or modifies the same memory location.
> 
> 25 The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

IMO, Both condition will satisfy if the variable is volatile and 32bit read will atomic
for 32b and 64b machines. If not, the problem persist for generic case
as well(lib/librte_ring/rte_ring_generic.h)


I agree with you on C11 memory model semantics usage. The reason why I
propose name for the file as rte_ring_c11_mem.h as DPDK it self did not
had definitions for load acquire and store release semantics.
I was looking for taking load acquire and store release semantics
from C11 instead of creating new API like Linux kernel for FreeBSD(APIs
like  atomic_load_acq_32(), atomic_store_rel_32()). If the file name is your
concern then we could create new abstractions as well. That would help
exiting KNI problem as well.

I think, currently it mixed usage because, the same variable declaration
used for C11 vs non C11 usage.Ideally we wont need "volatile" for C11
case. Either we need to change only to C11 mode OR have APIs for 
atomic_load_acq_() and atomic_store_rel_() to allow both models like
Linux kernel and FreeBSD.

> 
> -- Ola
> 
> 
> 


More information about the stable mailing list