[v3,2/5] ring: add a non-blocking implementation

Message ID 20190118152326.22686-3-gage.eads@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series Add non-blocking ring |

Checks

Context Check Description
ci/checkpatch warning coding style issues

Commit Message

Eads, Gage Jan. 18, 2019, 3:23 p.m. UTC
  This commit adds support for non-blocking circular ring enqueue and dequeue
functions. The ring uses a 128-bit compare-and-swap instruction, and thus
is currently limited to x86_64.

The algorithm is based on the original rte ring (derived from FreeBSD's
bufring.h) and inspired by Michael and Scott's non-blocking concurrent
queue. Importantly, it adds a modification counter to each ring entry to
ensure only one thread can write to an unused entry.
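
Concretely, each slot of a non-blocking ring is a 16-byte {pointer, counter}
pair; this mirrors the nb_ring_entry structure added in the diff below:

    struct nb_ring_entry {
            void *ptr;    /* data pointer */
            uint64_t cnt; /* modification counter; an unused slot at index
                           * 'tail' is expected to hold cnt == tail
                           */
    };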

-----
Algorithm:

Multi-producer non-blocking enqueue:
1. Move the producer head index 'n' locations forward, effectively
   reserving 'n' locations.
2. For each pointer:
 a. Read the producer tail index, then ring[tail]. If ring[tail]'s
    modification counter isn't 'tail', retry.
 b. Construct the new entry: {pointer, tail + ring size}
 c. Compare-and-swap the old entry with the new. If unsuccessful, the
    next loop iteration will try to enqueue this pointer again.
 d. Compare-and-swap the tail index with 'tail + 1', whether or not step 2c
    succeeded. This guarantees threads can make forward progress.
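
Sketched in C, the per-pointer loop (step 2) looks roughly like the following.
This is an illustration only, reusing the nb_ring_entry type, the
rte_atomic128_cmpset() helper, and the prod_64 fields from the diff below;
head reservation and size checks are omitted:

    struct nb_ring_entry *base = (struct nb_ring_entry *)&r[1];
    unsigned int i = 0;

    while (i < n) {
            uint64_t tail = r->prod_64.tail;                      /* step 2a */
            struct nb_ring_entry *slot = &base[tail & r->mask];
            struct nb_ring_entry old_ent = *slot, new_ent;

            if (old_ent.cnt != tail)    /* slot already written; retry 2a */
                    continue;

            new_ent.ptr = obj_table[i];                           /* step 2b */
            new_ent.cnt = tail + r->size;

            if (rte_atomic128_cmpset((volatile rte_int128_t *)slot,  /* 2c */
                                     (rte_int128_t *)&old_ent,
                                     (rte_int128_t *)&new_ent))
                    i++;

            /* step 2d: advance the tail whether or not 2c succeeded */
            __sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail + 1);
    }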

Multi-consumer non-blocking dequeue:
1. Move the consumer head index 'n' locations forward, effectively
   reserving 'n' pointers to be dequeued.
2. Copy 'n' pointers into the caller's object table (ignoring the
   modification counter), starting from ring[tail], then compare-and-swap
   the tail index with 'tail + n'.  If unsuccessful, repeat step 2.
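
The dequeue's copy-and-commit loop (step 2) reduces to the following sketch,
reusing the DEQUEUE_PTRS_NB macro and cons_64 fields from the diff below:

    while (1) {
            uint64_t tail = r->cons_64.tail;

            /* Copy n pointers starting at ring[tail]; if another thread
             * wins the CAS below, this copy is simply discarded and redone.
             */
            DEQUEUE_PTRS_NB(r, &r[1], tail, obj_table, n);

            if (__sync_bool_compare_and_swap(&r->cons_64.tail, tail, tail + n))
                    break;
    }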

-----
Discussion:

There are two cases where the ABA problem is mitigated:
1. Enqueueing a pointer to the ring: without a modification counter
   tied to the tail index, the index could become stale by the time the
   enqueue happens, causing it to overwrite valid data. Tying the
   counter to the tail index gives us an expected value (as opposed to,
   say, a monotonically incrementing counter).

   Since the counter will eventually wrap, there is potential for the ABA
   problem. However, using a 64-bit counter makes this likelihood
   effectively zero.

2. Updating a tail index: the ABA problem can occur if the thread is
   preempted and the tail index wraps around. However, using 64-bit indexes
   makes this likelihood effectively zero.
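   (For scale: even at a hypothetical rate of one tail update per nanosecond,
   a 64-bit index takes 2^64 ns, roughly 585 years, to wrap, so a preempted
   thread reappearing at the same index value is not a practical concern.)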

With no contention, an enqueue of n pointers uses (1 + 2n) CAS operations
and a dequeue of n pointers uses 2. This algorithm has worse average-case
performance than the regular rte ring (particularly for a highly contended
ring with large bulk accesses), however:
- For applications with preemptible pthreads, the regular rte ring's
  worst-case performance (i.e. one thread being preempted in the
  update_tail() critical section) is much worse than the non-blocking
  ring's.
- Software caching can mitigate the average-case performance penalty for
  ring-based algorithms; for example, a non-blocking ring based mempool (a
  likely use case for this ring) with per-thread caching.

The non-blocking ring is enabled via a new flag, RING_F_NB. Because the
ring's memsize is now a function of its flags (the non-blocking ring
requires 128b for each entry), this commit adds a new argument ('flags') to
rte_ring_get_memsize(). An API deprecation notice will be sent in a
separate commit.
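
For illustration, creating such a ring only requires passing the new flag
(hypothetical usage; RING_F_NB and the extra 'flags' argument to
rte_ring_get_memsize() are the only new pieces here):

    #include <rte_ring.h>
    #include <rte_lcore.h>

    static struct rte_ring *
    create_nb_ring(void)
    {
            /* memsize now depends on flags: RING_F_NB needs 16B per slot */
            ssize_t sz = rte_ring_get_memsize(1024, RING_F_NB);
            (void)sz; /* informational; rte_ring_create() sizes the memzone */

            /* fails with rte_errno = EINVAL on non-x86_64 platforms */
            return rte_ring_create("nb_ring", 1024, rte_socket_id(), RING_F_NB);
    }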

For ease-of-use, existing ring enqueue and dequeue functions work on both
regular and non-blocking rings. This introduces an additional branch in
the datapath, but this should be a highly predictable branch.
ring_perf_autotest shows a negligible performance impact; it's hard to
distinguish a real difference versus system noise.

             Test                 | ring_perf_autotest cycles
                                  |   (with branch - without branch)
------------------------------------------------------------------
SP/SC single enq/dequeue          | 0.33
MP/MC single enq/dequeue          | -4.00
SP/SC burst enq/dequeue (size 8)  | 0.00
MP/MC burst enq/dequeue (size 8)  | 0.00
SP/SC burst enq/dequeue (size 32) | 0.00
MP/MC burst enq/dequeue (size 32) | 0.00
SC empty dequeue                  | 1.00
MC empty dequeue                  | 0.00

Single lcore:
SP/SC bulk enq/dequeue (size 8)   | 0.49
MP/MC bulk enq/dequeue (size 8)   | 0.08
SP/SC bulk enq/dequeue (size 32)  | 0.07
MP/MC bulk enq/dequeue (size 32)  | 0.09

Two physical cores:
SP/SC bulk enq/dequeue (size 8)   | 0.19
MP/MC bulk enq/dequeue (size 8)   | -0.37
SP/SC bulk enq/dequeue (size 32)  | 0.09
MP/MC bulk enq/dequeue (size 32)  | -0.05

Two NUMA nodes:
SP/SC bulk enq/dequeue (size 8)   | -1.96
MP/MC bulk enq/dequeue (size 8)   | 0.88
SP/SC bulk enq/dequeue (size 32)  | 0.10
MP/MC bulk enq/dequeue (size 32)  | 0.46

Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
running on isolcpus cores with a tickless scheduler. Each test was run three
times and the results averaged.

Signed-off-by: Gage Eads <gage.eads@intel.com>
---
 lib/librte_ring/rte_ring.c           |  72 ++++-
 lib/librte_ring/rte_ring.h           | 550 +++++++++++++++++++++++++++++++++--
 lib/librte_ring/rte_ring_version.map |   7 +
 3 files changed, 587 insertions(+), 42 deletions(-)
  

Comments

Ola Liljedahl Jan. 22, 2019, 10:12 a.m. UTC | #1
On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> This commit adds support for non-blocking circular ring enqueue and
> dequeue
> functions. The ring uses a 128-bit compare-and-swap instruction, and
> thus
> is currently limited to x86_64.
>
> The algorithm is based on the original rte ring (derived from
> FreeBSD's
> bufring.h) and inspired by Michael and Scott's non-blocking
> concurrent
> queue. Importantly, it adds a modification counter to each ring entry
> to
> ensure only one thread can write to an unused entry.
> -----
> Algorithm:
>
> Multi-producer non-blocking enqueue:
> 1. Move the producer head index 'n' locations forward, effectively
>    reserving 'n' locations.
> 2. For each pointer:
>  a. Read the producer tail index, then ring[tail]. If ring[tail]'s
>     modification counter isn't 'tail', retry.
>  b. Construct the new entry: {pointer, tail + ring size}
>  c. Compare-and-swap the old entry with the new. If unsuccessful, the
>     next loop iteration will try to enqueue this pointer again.
>  d. Compare-and-swap the tail index with 'tail + 1', whether or not
> step 2c
>     succeeded. This guarantees threads can make forward progress.
>
> Multi-consumer non-blocking dequeue:
> 1. Move the consumer head index 'n' locations forward, effectively
>    reserving 'n' pointers to be dequeued.
> 2. Copy 'n' pointers into the caller's object table (ignoring the
>    modification counter), starting from ring[tail], then compare-and-
> swap
>    the tail index with 'tail + n'.  If unsuccessful, repeat step 2.
>
> -----
> Discussion:
>
> There are two cases where the ABA problem is mitigated:
> 1. Enqueueing a pointer to the ring: without a modification counter
>    tied to the tail index, the index could become stale by the time
> the
>    enqueue happens, causing it to overwrite valid data. Tying the
>    counter to the tail index gives us an expected value (as opposed
> to,
>    say, a monotonically incrementing counter).
>
>    Since the counter will eventually wrap, there is potential for the
> ABA
>    problem. However, using a 64-bit counter makes this likelihood
>    effectively zero.
>
> 2. Updating a tail index: the ABA problem can occur if the thread is
>    preempted and the tail index wraps around. However, using 64-bit
> indexes
>    makes this likelihood effectively zero.
>
> With no contention, an enqueue of n pointers uses (1 + 2n) CAS
> operations
> and a dequeue of n pointers uses 2. This algorithm has worse average-
> case
> performance than the regular rte ring (particularly a highly-
> contended ring
> with large bulk accesses), however:
> - For applications with preemptible pthreads, the regular rte ring's
>   worst-case performance (i.e. one thread being preempted in the
>   update_tail() critical section) is much worse than the non-blocking
>   ring's.
> - Software caching can mitigate the average case performance for
> ring-based
>   algorithms. For example, a non-blocking ring based mempool (a
> likely use
>   case for this ring) with per-thread caching.
>
> The non-blocking ring is enabled via a new flag, RING_F_NB. Because
> the
> ring's memsize is now a function of its flags (the non-blocking ring
> requires 128b for each entry), this commit adds a new argument
> ('flags') to
> rte_ring_get_memsize(). An API deprecation notice will be sent in a
> separate commit.
>
> For ease-of-use, existing ring enqueue and dequeue functions work on
> both
> regular and non-blocking rings. This introduces an additional branch
> in
> the datapath, but this should be a highly predictable branch.
> ring_perf_autotest shows a negligible performance impact; it's hard
> to
> distinguish a real difference versus system noise.
>
>                                   | ring_perf_autotest cycles with
> branch -
>              Test                 |   ring_perf_autotest cycles
> without
> ------------------------------------------------------------------
> SP/SC single enq/dequeue          | 0.33
> MP/MC single enq/dequeue          | -4.00
> SP/SC burst enq/dequeue (size 8)  | 0.00
> MP/MC burst enq/dequeue (size 8)  | 0.00
> SP/SC burst enq/dequeue (size 32) | 0.00
> MP/MC burst enq/dequeue (size 32) | 0.00
> SC empty dequeue                  | 1.00
> MC empty dequeue                  | 0.00
>
> Single lcore:
> SP/SC bulk enq/dequeue (size 8)   | 0.49
> MP/MC bulk enq/dequeue (size 8)   | 0.08
> SP/SC bulk enq/dequeue (size 32)  | 0.07
> MP/MC bulk enq/dequeue (size 32)  | 0.09
>
> Two physical cores:
> SP/SC bulk enq/dequeue (size 8)   | 0.19
> MP/MC bulk enq/dequeue (size 8)   | -0.37
> SP/SC bulk enq/dequeue (size 32)  | 0.09
> MP/MC bulk enq/dequeue (size 32)  | -0.05
>
> Two NUMA nodes:
> SP/SC bulk enq/dequeue (size 8)   | -1.96
> MP/MC bulk enq/dequeue (size 8)   | 0.88
> SP/SC bulk enq/dequeue (size 32)  | 0.10
> MP/MC bulk enq/dequeue (size 32)  | 0.46
>
> Test setup: x86_64 build with default config, dual-socket Xeon E5-
> 2699 v4,
> running on isolcpus cores with a tickless scheduler. Each test run
> three
> times and the results averaged.
>
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
>  lib/librte_ring/rte_ring.c           |  72 ++++-
>  lib/librte_ring/rte_ring.h           | 550 +++++++++++++++++++++++++++++++++--
>  lib/librte_ring/rte_ring_version.map |   7 +
>  3 files changed, 587 insertions(+), 42 deletions(-)
>
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index d215acecc..f3378dccd 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -45,9 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
>
>  /* return the size of memory occupied by a ring */
>  ssize_t
> -rte_ring_get_memsize(unsigned count)
> +rte_ring_get_memsize_v1905(unsigned int count, unsigned int flags)
>  {
> -ssize_t sz;
> +ssize_t sz, elt_sz;
>
>  /* count must be a power of 2 */
>  if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
> @@ -57,10 +57,23 @@ rte_ring_get_memsize(unsigned count)
>  return -EINVAL;
>  }
>
> -sz = sizeof(struct rte_ring) + count * sizeof(void *);
> +elt_sz = (flags & RING_F_NB) ? 2 * sizeof(void *) :
> sizeof(void *);
> +
> +sz = sizeof(struct rte_ring) + count * elt_sz;
>  sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
>  return sz;
>  }
> +BIND_DEFAULT_SYMBOL(rte_ring_get_memsize, _v1905, 19.05);
> +MAP_STATIC_SYMBOL(ssize_t rte_ring_get_memsize(unsigned int count,
> +       unsigned int flags),
> +  rte_ring_get_memsize_v1905);
> +
> +ssize_t
> +rte_ring_get_memsize_v20(unsigned int count)
> +{
> +return rte_ring_get_memsize_v1905(count, 0);
> +}
> +VERSION_SYMBOL(rte_ring_get_memsize, _v20, 2.0);
>
>  int
>  rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> @@ -82,8 +95,6 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  if (ret < 0 || ret >= (int)sizeof(r->name))
>  return -ENAMETOOLONG;
>  r->flags = flags;
> -r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP :
> __IS_MP;
> -r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC :
> __IS_MC;
>
>  if (flags & RING_F_EXACT_SZ) {
>  r->size = rte_align32pow2(count + 1);
> @@ -100,8 +111,30 @@ rte_ring_init(struct rte_ring *r, const char
> *name, unsigned count,
>  r->mask = count - 1;
>  r->capacity = r->mask;
>  }
> -r->prod.head = r->cons.head = 0;
> -r->prod.tail = r->cons.tail = 0;
> +
> +if (flags & RING_F_NB) {
> +uint64_t i;
> +
> +r->prod_64.single = (flags & RING_F_SP_ENQ) ?
> __IS_SP : __IS_MP;
> +r->cons_64.single = (flags & RING_F_SC_DEQ) ?
> __IS_SC : __IS_MC;
> +r->prod_64.head = r->cons_64.head = 0;
> +r->prod_64.tail = r->cons_64.tail = 0;
> +
> +for (i = 0; i < r->size; i++) {
> +struct nb_ring_entry *ring_ptr, *base;
> +
> +base = ((struct nb_ring_entry *)&r[1]);
> +
> +ring_ptr = &base[i & r->mask];
> +
> +ring_ptr->cnt = i;
> +}
> +} else {
> +r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP :
> __IS_MP;
> +r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC :
> __IS_MC;
> +r->prod.head = r->cons.head = 0;
> +r->prod.tail = r->cons.tail = 0;
> +}
>
>  return 0;
>  }
> @@ -123,11 +156,19 @@ rte_ring_create(const char *name, unsigned
> count, int socket_id,
>
>  ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head,
> rte_ring_list);
>
> +#if !defined(RTE_ARCH_X86_64)
> +if (flags & RING_F_NB) {
> +printf("RING_F_NB is only supported on x86-64
> platforms\n");
> +rte_errno = EINVAL;
> +return NULL;
> +}
> +#endif
> +
>  /* for an exact size ring, round up from count to a power of
> two */
>  if (flags & RING_F_EXACT_SZ)
>  count = rte_align32pow2(count + 1);
>
> -ring_size = rte_ring_get_memsize(count);
> +ring_size = rte_ring_get_memsize(count, flags);
>  if (ring_size < 0) {
>  rte_errno = ring_size;
>  return NULL;
> @@ -227,10 +268,17 @@ rte_ring_dump(FILE *f, const struct rte_ring
> *r)
>  fprintf(f, "  flags=%x\n", r->flags);
>  fprintf(f, "  size=%"PRIu32"\n", r->size);
>  fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
> -fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
> -fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
> -fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
> -fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
> +if (r->flags & RING_F_NB) {
> +fprintf(f, "  ct=%"PRIu64"\n", r->cons_64.tail);
> +fprintf(f, "  ch=%"PRIu64"\n", r->cons_64.head);
> +fprintf(f, "  pt=%"PRIu64"\n", r->prod_64.tail);
> +fprintf(f, "  ph=%"PRIu64"\n", r->prod_64.head);
> +} else {
> +fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
> +fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
> +fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
> +fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
> +}
>  fprintf(f, "  used=%u\n", rte_ring_count(r));
>  fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
>  }
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index b270a4746..08c9de6a6 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -134,6 +134,18 @@ struct rte_ring {
>   */
>  #define RING_F_EXACT_SZ 0x0004
>  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> +/**
> + * The ring uses non-blocking enqueue and dequeue functions. These
> functions
> + * do not have the "non-preemptive" constraint of a regular rte
> ring, and thus
> + * are suited for applications using preemptible pthreads. However,
> the
> + * non-blocking functions have worse average-case performance than
> their
> + * regular rte ring counterparts. When used as the handler for a
> mempool,
> + * per-thread caching can mitigate the performance difference by
> reducing the
> + * number (and contention) of ring accesses.
> + *
> + * This flag is only supported on x86_64 platforms.
> + */
> +#define RING_F_NB 0x0008
>
>  /* @internal defines for passing to the enqueue dequeue worker
> functions */
>  #define __IS_SP 1
> @@ -151,11 +163,15 @@ struct rte_ring {
>   *
>   * @param count
>   *   The number of elements in the ring (must be a power of 2).
> + * @param flags
> + *   The flags the ring will be created with.
>   * @return
>   *   - The memory size needed for the ring on success.
>   *   - -EINVAL if count is not a power of 2.
>   */
> -ssize_t rte_ring_get_memsize(unsigned count);
> +ssize_t rte_ring_get_memsize(unsigned int count, unsigned int
> flags);
> +ssize_t rte_ring_get_memsize_v20(unsigned int count);
> +ssize_t rte_ring_get_memsize_v1905(unsigned int count, unsigned int
> flags);
>
>  /**
>   * Initialize a ring structure.
> @@ -188,6 +204,10 @@ ssize_t rte_ring_get_memsize(unsigned count);
>   *    - RING_F_SC_DEQ: If this flag is set, the default behavior
> when
>   *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
>   *      is "single-consumer". Otherwise, it is "multi-consumers".
> + *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-
> power-of-2
> + *      number, but up to half the ring space may be wasted.
> + *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
> + *      non-blocking variants of the dequeue and enqueue functions.
>   * @return
>   *   0 on success, or a negative value on error.
>   */
> @@ -223,12 +243,17 @@ int rte_ring_init(struct rte_ring *r, const
> char *name, unsigned count,
>   *    - RING_F_SC_DEQ: If this flag is set, the default behavior
> when
>   *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
>   *      is "single-consumer". Otherwise, it is "multi-consumers".
> + *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-
> power-of-2
> + *      number, but up to half the ring space may be wasted.
> + *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
> + *      non-blocking variants of the dequeue and enqueue functions.
>   * @return
>   *   On success, the pointer to the new allocated ring. NULL on
> error with
>   *    rte_errno set appropriately. Possible errno values include:
>   *    - E_RTE_NO_CONFIG - function could not get pointer to
> rte_config structure
>   *    - E_RTE_SECONDARY - function was called from a secondary
> process instance
> - *    - EINVAL - count provided is not a power of 2
> + *    - EINVAL - count provided is not a power of 2, or RING_F_NB is
> used on an
> + *      unsupported platform
>   *    - ENOSPC - the maximum number of memzones has already been
> allocated
>   *    - EEXIST - a memzone with the same name already exists
>   *    - ENOMEM - no appropriate memory area found in which to create
> memzone
> @@ -284,6 +309,50 @@ void rte_ring_dump(FILE *f, const struct
> rte_ring *r);
>  } \
>  } while (0)
>
> +/* The actual enqueue of pointers on the ring.
> + * Used only by the single-producer non-blocking enqueue function,
> but
> + * out-lined here for code readability.
> + */
> +#define ENQUEUE_PTRS_NB(r, ring_start, prod_head, obj_table, n) do {
> \
> +unsigned int i; \
> +const uint32_t size = (r)->size; \
> +size_t idx = prod_head & (r)->mask; \
> +size_t new_cnt = prod_head + size; \
> +struct nb_ring_entry *ring = (struct nb_ring_entry
> *)ring_start; \
> +if (likely(idx + n < size)) { \
> +for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4,
> idx += 4) { \
> +ring[idx].ptr = obj_table[i]; \
> +ring[idx].cnt = new_cnt + i;  \
> +ring[idx + 1].ptr = obj_table[i + 1]; \
> +ring[idx + 1].cnt = new_cnt + i + 1;  \
> +ring[idx + 2].ptr = obj_table[i + 2]; \
> +ring[idx + 2].cnt = new_cnt + i + 2;  \
> +ring[idx + 3].ptr = obj_table[i + 3]; \
> +ring[idx + 3].cnt = new_cnt + i + 3;  \
> +} \
> +switch (n & 0x3) { \
> +case 3: \
> +ring[idx].cnt = new_cnt + i; \
> +ring[idx++].ptr = obj_table[i++]; /*
> fallthrough */ \
> +case 2: \
> +ring[idx].cnt = new_cnt + i; \
> +ring[idx++].ptr = obj_table[i++]; /*
> fallthrough */ \
> +case 1: \
> +ring[idx].cnt = new_cnt + i; \
> +ring[idx++].ptr = obj_table[i++]; \
> +} \
> +} else { \
> +for (i = 0; idx < size; i++, idx++) { \
> +ring[idx].cnt = new_cnt + i;  \
> +ring[idx].ptr = obj_table[i]; \
> +} \
> +for (idx = 0; i < n; i++, idx++) {    \
> +ring[idx].cnt = new_cnt + i;  \
> +ring[idx].ptr = obj_table[i]; \
> +} \
> +} \
> +} while (0)
> +
>  /* the actual copy of pointers on the ring to obj_table.
>   * Placed here since identical code needed in both
>   * single and multi consumer dequeue functions */
> @@ -315,6 +384,39 @@ void rte_ring_dump(FILE *f, const struct
> rte_ring *r);
>  } \
>  } while (0)
>
> +/* The actual copy of pointers on the ring to obj_table.
> + * Placed here since identical code needed in both
> + * single and multi consumer non-blocking dequeue functions.
> + */
> +#define DEQUEUE_PTRS_NB(r, ring_start, cons_head, obj_table, n) do {
> \
> +unsigned int i; \
> +size_t idx = cons_head & (r)->mask; \
> +const uint32_t size = (r)->size; \
> +struct nb_ring_entry *ring = (struct nb_ring_entry
> *)ring_start; \
> +if (likely(idx + n < size)) { \
> +for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx
> += 4) {\
> +obj_table[i] = ring[idx].ptr; \
> +obj_table[i + 1] = ring[idx + 1].ptr; \
> +obj_table[i + 2] = ring[idx + 2].ptr; \
> +obj_table[i + 3] = ring[idx + 3].ptr; \
> +} \
> +switch (n & 0x3) { \
> +case 3: \
> +obj_table[i++] = ring[idx++].ptr; /*
> fallthrough */ \
> +case 2: \
> +obj_table[i++] = ring[idx++].ptr; /*
> fallthrough */ \
> +case 1: \
> +obj_table[i++] = ring[idx++].ptr; \
> +} \
> +} else { \
> +for (i = 0; idx < size; i++, idx++) \
> +obj_table[i] = ring[idx].ptr; \
> +for (idx = 0; i < n; i++, idx++) \
> +obj_table[i] = ring[idx].ptr; \
> +} \
> +} while (0)
> +
> +
>  /* Between load and load. there might be cpu reorder in weak model
>   * (powerpc/arm).
>   * There are 2 choices for the users
> @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> rte_ring *r);
>  #endif
>  #include "rte_ring_generic_64.h"
>
> +/* @internal 128-bit structure used by the non-blocking ring */
> +struct nb_ring_entry {
> +void *ptr; /**< Data pointer */
> +uint64_t cnt; /**< Modification counter */
Why not make 'cnt' uintptr_t? This way 32-bit architectures will also
be supported.
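
For illustration, the 32-bit-friendly layout being suggested might look like
this (sketch only, not part of the patch):

    struct nb_ring_entry {
            void *ptr;     /* data pointer */
            uintptr_t cnt; /* modification counter; the entry is then
                            * 2 * sizeof(void *) on every architecture, so a
                            * double-width CAS (e.g. cmpxchg8b on 32-bit x86)
                            * could be used
                            */
    };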

> +};
> +
> +/* The non-blocking ring algorithm is based on the original rte ring
> (derived
> + * from FreeBSD's bufring.h) and inspired by Michael and Scott's
> non-blocking
> + * concurrent queue.
> + */
> +
> +/**
> + * @internal
> + *   Enqueue several objects on the non-blocking ring (single-
> producer only)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> the ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has
> finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const
> *obj_table,
> +    unsigned int n,
> +    enum rte_ring_queue_behavior behavior,
> +    unsigned int *free_space)
> +{
> +uint32_t free_entries;
> +size_t head, next;
> +
> +n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> + &head, &next,
> &free_entries);
> +if (n == 0)
> +goto end;
> +
> +ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> +
> +r->prod_64.tail += n;
Don't we need release order (or smp_wmb) between writing of the ring
pointers and the update of tail? By updating the tail pointer, we are
synchronising with a consumer.
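
For reference, one way to express the ordering being asked about (a sketch,
not part of the patch) is a release store on the tail, so the entry writes
done by ENQUEUE_PTRS_NB become visible to consumers before the new tail does:

    __atomic_store_n(&r->prod_64.tail, r->prod_64.tail + n, __ATOMIC_RELEASE);

    /* or, equivalently for this purpose: rte_smp_wmb(); r->prod_64.tail += n; */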

> +
> +end:
> +if (free_space != NULL)
> +*free_space = free_entries - n;
> +return n;
> +}
> +
> +/**
> + * @internal
> + *   Enqueue several objects on the non-blocking ring (multi-
> producer safe)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> the ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has
> finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const
> *obj_table,
> +    unsigned int n,
> +    enum rte_ring_queue_behavior behavior,
> +    unsigned int *free_space)
> +{
> +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> +RTE_SET_USED(r);
> +RTE_SET_USED(obj_table);
> +RTE_SET_USED(n);
> +RTE_SET_USED(behavior);
> +RTE_SET_USED(free_space);
> +#ifndef ALLOW_EXPERIMENTAL_API
> +printf("[%s()] RING_F_NB requires an experimental API."
> +       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> +       , __func__);
> +#endif
> +return 0;
> +#endif
> +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> +size_t head, next, tail;
> +uint32_t free_entries;
> +unsigned int i;
> +
> +n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> + &head, &next,
> &free_entries);
> +if (n == 0)
> +goto end;
> +
> +for (i = 0; i < n; /* i incremented if enqueue succeeds */)
> {
> +struct nb_ring_entry old_value, new_value;
> +struct nb_ring_entry *ring_ptr;
> +
> +/* Enqueue to the tail entry. If another thread wins the race,
> + * retry with the new tail.
> + */
> +tail = r->prod_64.tail;
> +
> +ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r->mask];
This is a very ugly cast. Also I think it is unnecessary. What's
preventing this from being written without a cast? Perhaps the ring
array needs to be a union of "void *" and struct nb_ring_entry?

> +
> +old_value = *ring_ptr;
> +
> +/* If the tail entry's modification counter doesn't match the
> + * producer tail index, it's already been updated.
> + */
> +if (old_value.cnt != tail)
> +continue;
Continue restarts the loop at the condition test in the for statement,
'i' and 'n' are unchanged. Then we re-read 'prod_64.tail' and
'ring[tail]'. If some other thread never updates 'prod_64.tail', the
test here (ring[tail].cnt != tail) will still be false and we will spin
forever.
Waiting for other threads <=> blocking behaviour so this is not a non-
blocking design.

> +
> +/* Prepare the new entry. The cnt field mitigates
> the ABA
> + * problem on the ring write.
> + */
> +new_value.ptr = obj_table[i];
> +new_value.cnt = tail + r->size;
> +
> +if (rte_atomic128_cmpset((volatile rte_int128_t
> *)ring_ptr,
> + (rte_int128_t *)&old_value,
> + (rte_int128_t
> *)&new_value))
> +i++;
> +
> +/* Every thread attempts the cmpset, so they don't have to wait
> + * for the thread that successfully enqueued to the ring.
> + * Using a 64-bit tail mitigates the ABA problem here.
> + *
> + * Built-in used to handle variable-sized tail index.
> + */
But prod_64.tail is 64 bits so not really variable size?

> +__sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail + 1);
What memory order is required here? Why not use
__atomic_compare_exchange() with explicit memory order parameters?
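
For reference, the __atomic form being suggested would look roughly like this
(a sketch, not part of the patch), with release semantics on success so the
slot write is published before the tail moves:

    uint64_t expected = tail;

    __atomic_compare_exchange_n(&r->prod_64.tail, &expected, tail + 1,
                                0 /* strong */,
                                __ATOMIC_RELEASE, __ATOMIC_RELAXED);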

> +}
> +
> +end:
> +if (free_space != NULL)
> +*free_space = free_entries - n;
> +return n;
> +#endif
> +}
> +
> +/**
> + * @internal Enqueue several objects on the non-blocking ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> the ring
> + * @param is_sp
> + *   Indicates whether to use single producer or multi-producer head
> update
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has
> finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue(struct rte_ring *r, void * const
> *obj_table,
> + unsigned int n, enum
> rte_ring_queue_behavior behavior,
> + unsigned int is_sp, unsigned int
> *free_space)
> +{
> +if (is_sp)
> +return __rte_ring_do_nb_enqueue_sp(r, obj_table, n,
> +   behavior,
> free_space);
> +else
> +return __rte_ring_do_nb_enqueue_mp(r, obj_table, n,
> +   behavior,
> free_space);
> +}
> +
> +/**
> + * @internal
> + *   Dequeue several objects from the non-blocking ring (single-
> consumer only)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> the ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue
> has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n
> only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue_sc(struct rte_ring *r, void **obj_table,
> +    unsigned int n,
> +    enum rte_ring_queue_behavior behavior,
> +    unsigned int *available)
> +{
> +size_t head, next;
> +uint32_t entries;
> +
> +n = __rte_ring_move_cons_head_64(r, 1, n, behavior,
> + &head, &next, &entries);
> +if (n == 0)
> +goto end;
> +
> +DEQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> +
> +r->cons_64.tail += n;
Memory ordering? Consumer synchronises with producer.

> +
> +end:
> +if (available != NULL)
> +*available = entries - n;
> +return n;
> +}
> +
> +/**
> + * @internal
> + *   Dequeue several objects from the non-blocking ring (multi-
> consumer safe)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> the ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue
> has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n
> only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue_mc(struct rte_ring *r, void **obj_table,
> +    unsigned int n,
> +    enum rte_ring_queue_behavior behavior,
> +    unsigned int *available)
> +{
> +size_t head, next;
> +uint32_t entries;
> +
> +n = __rte_ring_move_cons_head_64(r, 0, n, behavior,
> + &head, &next, &entries);
> +if (n == 0)
> +goto end;
> +
> +while (1) {
> +size_t tail = r->cons_64.tail;
> +
> +/* Dequeue from the cons tail onwards. If multiple
> threads read
> + * the same pointers, the thread that successfully
> performs the
> + * CAS will keep them and the other(s) will retry.
> + */
> +DEQUEUE_PTRS_NB(r, &r[1], tail, obj_table, n);
> +
> +next = tail + n;
> +
> +/* Built-in used to handle variable-sized tail
> index. */
> +if (__sync_bool_compare_and_swap(&r->cons_64.tail,
> tail, next))
> +/* There is potential for the ABA problem
> here, but
> + * that is mitigated by the large (64-bit)
> tail.
> + */
> +break;
> +}
> +
> +end:
> +if (available != NULL)
> +*available = entries - n;
> +return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the non-blocking ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> the ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue
> has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n
> only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue(struct rte_ring *r, void **obj_table,
> + unsigned int n, enum rte_ring_queue_behavior
> behavior,
> + unsigned int is_sc, unsigned int *available)
> +{
> +if (is_sc)
> +return __rte_ring_do_nb_dequeue_sc(r, obj_table, n,
> +   behavior,
> available);
> +else
> +return __rte_ring_do_nb_dequeue_mc(r, obj_table, n,
> +   behavior,
> available);
> +}
> +
>  /**
>   * @internal Enqueue several objects on the ring
>   *
> @@ -438,8 +853,14 @@ static __rte_always_inline unsigned int
>  rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const
> *obj_table,
>   unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -__IS_MP, free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> , __IS_MP,
> +free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> __IS_MP,
> +     free_space);
>  }
>
>  /**
> @@ -461,8 +882,14 @@ static __rte_always_inline unsigned int
>  rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const
> *obj_table,
>   unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -__IS_SP, free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> , __IS_SP,
> +free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> __IS_SP,
> +     free_space);
>  }
>
>  /**
> @@ -488,8 +915,14 @@ static __rte_always_inline unsigned int
>  rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>        unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -r->prod.single, free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> ,
> +r->prod_64.single,
> free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> +     r->prod.single,
> free_space);
>  }
>
>  /**
> @@ -572,8 +1005,14 @@ static __rte_always_inline unsigned int
>  rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  unsigned int n, unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -__IS_MC, available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> , __IS_MC,
> +available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> __IS_MC,
> +     available);
>  }
>
>  /**
> @@ -596,8 +1035,14 @@ static __rte_always_inline unsigned int
>  rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  unsigned int n, unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -__IS_SC, available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> , __IS_SC,
> +available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> __IS_SC,
> +     available);
>  }
>
>  /**
> @@ -623,8 +1068,14 @@ static __rte_always_inline unsigned int
>  rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned
> int n,
>  unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> RTE_RING_QUEUE_FIXED,
> -r->cons.single, available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_FIXED
> ,
> +r->cons_64.single,
> available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_FIXED,
> +     r->cons.single,
> available);
>  }
>
>  /**
> @@ -699,9 +1150,13 @@ rte_ring_dequeue(struct rte_ring *r, void
> **obj_p)
>  static inline unsigned
>  rte_ring_count(const struct rte_ring *r)
>  {
> -uint32_t prod_tail = r->prod.tail;
> -uint32_t cons_tail = r->cons.tail;
> -uint32_t count = (prod_tail - cons_tail) & r->mask;
> +uint32_t count;
> +
> +if (r->flags & RING_F_NB)
> +count = (r->prod_64.tail - r->cons_64.tail) & r-
> >mask;
> +else
> +count = (r->prod.tail - r->cons.tail) & r->mask;
> +
>  return (count > r->capacity) ? r->capacity : count;
>  }
>
> @@ -821,8 +1276,14 @@ static __rte_always_inline unsigned
>  rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const
> *obj_table,
>   unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> -RTE_RING_QUEUE_VARIABLE, __IS_MP,
> free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +__IS_MP,
> free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     __IS_MP, free_space);
>  }
>
>  /**
> @@ -844,8 +1305,14 @@ static __rte_always_inline unsigned
>  rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const
> *obj_table,
>   unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> -RTE_RING_QUEUE_VARIABLE, __IS_SP,
> free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +__IS_SP,
> free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     __IS_SP, free_space);
>  }
>
>  /**
> @@ -871,8 +1338,14 @@ static __rte_always_inline unsigned
>  rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>        unsigned int n, unsigned int *free_space)
>  {
> -return __rte_ring_do_enqueue(r, obj_table, n,
> RTE_RING_QUEUE_VARIABLE,
> -r->prod.single, free_space);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +r->prod_64.single,
> free_space);
> +else
> +return __rte_ring_do_enqueue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     r->prod.single,
> free_space);
>  }
>
>  /**
> @@ -899,8 +1372,14 @@ static __rte_always_inline unsigned
>  rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  unsigned int n, unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> -RTE_RING_QUEUE_VARIABLE, __IS_MC,
> available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +__IS_MC, available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     __IS_MC, available);
>  }
>
>  /**
> @@ -924,8 +1403,14 @@ static __rte_always_inline unsigned
>  rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  unsigned int n, unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> -RTE_RING_QUEUE_VARIABLE, __IS_SC,
> available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +__IS_SC, available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     __IS_SC, available);
>  }
>
>  /**
> @@ -951,9 +1436,14 @@ static __rte_always_inline unsigned
>  rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  unsigned int n, unsigned int *available)
>  {
> -return __rte_ring_do_dequeue(r, obj_table, n,
> -RTE_RING_QUEUE_VARIABLE,
> -r->cons.single, available);
> +if (r->flags & RING_F_NB)
> +return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +RTE_RING_QUEUE_VARIA
> BLE,
> +r->cons_64.single,
> available);
> +else
> +return __rte_ring_do_dequeue(r, obj_table, n,
> +     RTE_RING_QUEUE_VARIABLE
> ,
> +     r->cons.single,
> available);
>  }
>
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_version.map
> b/lib/librte_ring/rte_ring_version.map
> index d935efd0d..8969467af 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -17,3 +17,10 @@ DPDK_2.2 {
>  rte_ring_free;
>
>  } DPDK_2.0;
> +
> +DPDK_19.05 {
> +global:
> +
> +rte_ring_get_memsize;
> +
> +} DPDK_2.2;
  
Ola Liljedahl Jan. 22, 2019, 2:49 p.m. UTC | #2
(resending without the confidential footer, think I figured it out, ignore the
previous email from me in this thread)

-- Ola

On Fri, 2019-01-18 at 09:23 -0600, Gage Eads wrote:
> This commit adds support for non-blocking circular ring enqueue and dequeue
> functions. The ring uses a 128-bit compare-and-swap instruction, and thus
> is currently limited to x86_64.
> 
> The algorithm is based on the original rte ring (derived from FreeBSD's
> bufring.h) and inspired by Michael and Scott's non-blocking concurrent
> queue. Importantly, it adds a modification counter to each ring entry to
> ensure only one thread can write to an unused entry.
> 
> -----
> Algorithm:
> 
> Multi-producer non-blocking enqueue:
> 1. Move the producer head index 'n' locations forward, effectively
>    reserving 'n' locations.
> 2. For each pointer:
>  a. Read the producer tail index, then ring[tail]. If ring[tail]'s
>     modification counter isn't 'tail', retry.
>  b. Construct the new entry: {pointer, tail + ring size}
>  c. Compare-and-swap the old entry with the new. If unsuccessful, the
>     next loop iteration will try to enqueue this pointer again.
>  d. Compare-and-swap the tail index with 'tail + 1', whether or not step 2c
>     succeeded. This guarantees threads can make forward progress.
> 
> Multi-consumer non-blocking dequeue:
> 1. Move the consumer head index 'n' locations forward, effectively
>    reserving 'n' pointers to be dequeued.
> 2. Copy 'n' pointers into the caller's object table (ignoring the
>    modification counter), starting from ring[tail], then compare-and-swap
>    the tail index with 'tail + n'.  If unsuccessful, repeat step 2.
> 
> -----
> Discussion:
> 
> There are two cases where the ABA problem is mitigated:
> 1. Enqueueing a pointer to the ring: without a modification counter
>    tied to the tail index, the index could become stale by the time the
>    enqueue happens, causing it to overwrite valid data. Tying the
>    counter to the tail index gives us an expected value (as opposed to,
>    say, a monotonically incrementing counter).
> 
>    Since the counter will eventually wrap, there is potential for the ABA
>    problem. However, using a 64-bit counter makes this likelihood
>    effectively zero.
> 
> 2. Updating a tail index: the ABA problem can occur if the thread is
>    preempted and the tail index wraps around. However, using 64-bit indexes
>    makes this likelihood effectively zero.
> 
> With no contention, an enqueue of n pointers uses (1 + 2n) CAS operations
> and a dequeue of n pointers uses 2. This algorithm has worse average-case
> performance than the regular rte ring (particularly a highly-contended ring
> with large bulk accesses), however:
> - For applications with preemptible pthreads, the regular rte ring's
>   worst-case performance (i.e. one thread being preempted in the
>   update_tail() critical section) is much worse than the non-blocking
>   ring's.
> - Software caching can mitigate the average case performance for ring-based
>   algorithms. For example, a non-blocking ring based mempool (a likely use
>   case for this ring) with per-thread caching.
> 
> The non-blocking ring is enabled via a new flag, RING_F_NB. Because the
> ring's memsize is now a function of its flags (the non-blocking ring
> requires 128b for each entry), this commit adds a new argument ('flags') to
> rte_ring_get_memsize(). An API deprecation notice will be sent in a
> separate commit.
> 
> For ease-of-use, existing ring enqueue and dequeue functions work on both
> regular and non-blocking rings. This introduces an additional branch in
> the datapath, but this should be a highly predictable branch.
> ring_perf_autotest shows a negligible performance impact; it's hard to
> distinguish a real difference versus system noise.
> 
>                                   | ring_perf_autotest cycles with branch -
>              Test                 |   ring_perf_autotest cycles without
> ------------------------------------------------------------------
> SP/SC single enq/dequeue          | 0.33
> MP/MC single enq/dequeue          | -4.00
> SP/SC burst enq/dequeue (size 8)  | 0.00
> MP/MC burst enq/dequeue (size 8)  | 0.00
> SP/SC burst enq/dequeue (size 32) | 0.00
> MP/MC burst enq/dequeue (size 32) | 0.00
> SC empty dequeue                  | 1.00
> MC empty dequeue                  | 0.00
> 
> Single lcore:
> SP/SC bulk enq/dequeue (size 8)   | 0.49
> MP/MC bulk enq/dequeue (size 8)   | 0.08
> SP/SC bulk enq/dequeue (size 32)  | 0.07
> MP/MC bulk enq/dequeue (size 32)  | 0.09
> 
> Two physical cores:
> SP/SC bulk enq/dequeue (size 8)   | 0.19
> MP/MC bulk enq/dequeue (size 8)   | -0.37
> SP/SC bulk enq/dequeue (size 32)  | 0.09
> MP/MC bulk enq/dequeue (size 32)  | -0.05
> 
> Two NUMA nodes:
> SP/SC bulk enq/dequeue (size 8)   | -1.96
> MP/MC bulk enq/dequeue (size 8)   | 0.88
> SP/SC bulk enq/dequeue (size 32)  | 0.10
> MP/MC bulk enq/dequeue (size 32)  | 0.46
> 
> Test setup: x86_64 build with default config, dual-socket Xeon E5-2699 v4,
> running on isolcpus cores with a tickless scheduler. Each test run three
> times and the results averaged.
> 
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
>  lib/librte_ring/rte_ring.c           |  72 ++++-
>  lib/librte_ring/rte_ring.h           | 550 +++++++++++++++++++++++++++++++++--
>  lib/librte_ring/rte_ring_version.map |   7 +
>  3 files changed, 587 insertions(+), 42 deletions(-)
> 
> diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
> index d215acecc..f3378dccd 100644
> --- a/lib/librte_ring/rte_ring.c
> +++ b/lib/librte_ring/rte_ring.c
> @@ -45,9 +45,9 @@ EAL_REGISTER_TAILQ(rte_ring_tailq)
>  
>  /* return the size of memory occupied by a ring */
>  ssize_t
> -rte_ring_get_memsize(unsigned count)
> +rte_ring_get_memsize_v1905(unsigned int count, unsigned int flags)
>  {
> -	ssize_t sz;
> +	ssize_t sz, elt_sz;
>  
>  	/* count must be a power of 2 */
>  	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
> @@ -57,10 +57,23 @@ rte_ring_get_memsize(unsigned count)
>  		return -EINVAL;
>  	}
>  
> -	sz = sizeof(struct rte_ring) + count * sizeof(void *);
> +	elt_sz = (flags & RING_F_NB) ? 2 * sizeof(void *) : sizeof(void *);
> +
> +	sz = sizeof(struct rte_ring) + count * elt_sz;
>  	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
>  	return sz;
>  }
> +BIND_DEFAULT_SYMBOL(rte_ring_get_memsize, _v1905, 19.05);
> +MAP_STATIC_SYMBOL(ssize_t rte_ring_get_memsize(unsigned int count,
> +					       unsigned int flags),
> +		  rte_ring_get_memsize_v1905);
> +
> +ssize_t
> +rte_ring_get_memsize_v20(unsigned int count)
> +{
> +	return rte_ring_get_memsize_v1905(count, 0);
> +}
> +VERSION_SYMBOL(rte_ring_get_memsize, _v20, 2.0);
>  
>  int
>  rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
> @@ -82,8 +95,6 @@ rte_ring_init(struct rte_ring *r, const char *name, unsigned
> count,
>  	if (ret < 0 || ret >= (int)sizeof(r->name))
>  		return -ENAMETOOLONG;
>  	r->flags = flags;
> -	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> -	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
>  
>  	if (flags & RING_F_EXACT_SZ) {
>  		r->size = rte_align32pow2(count + 1);
> @@ -100,8 +111,30 @@ rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>  		r->mask = count - 1;
>  		r->capacity = r->mask;
>  	}
> -	r->prod.head = r->cons.head = 0;
> -	r->prod.tail = r->cons.tail = 0;
> +
> +	if (flags & RING_F_NB) {
> +		uint64_t i;
> +
> +		r->prod_64.single = (flags & RING_F_SP_ENQ) ? __IS_SP :
> __IS_MP;
> +		r->cons_64.single = (flags & RING_F_SC_DEQ) ? __IS_SC :
> __IS_MC;
> +		r->prod_64.head = r->cons_64.head = 0;
> +		r->prod_64.tail = r->cons_64.tail = 0;
> +
> +		for (i = 0; i < r->size; i++) {
> +			struct nb_ring_entry *ring_ptr, *base;
> +
> +			base = ((struct nb_ring_entry *)&r[1]);
> +
> +			ring_ptr = &base[i & r->mask];
> +
> +			ring_ptr->cnt = i;
> +		}
> +	} else {
> +		r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
> +		r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
> +		r->prod.head = r->cons.head = 0;
> +		r->prod.tail = r->cons.tail = 0;
> +	}
>  
>  	return 0;
>  }
> @@ -123,11 +156,19 @@ rte_ring_create(const char *name, unsigned count, int
> socket_id,
>  
>  	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
>  
> +#if !defined(RTE_ARCH_X86_64)
> +	if (flags & RING_F_NB) {
> +		printf("RING_F_NB is only supported on x86-64 platforms\n");
> +		rte_errno = EINVAL;
> +		return NULL;
> +	}
> +#endif
> +
>  	/* for an exact size ring, round up from count to a power of two */
>  	if (flags & RING_F_EXACT_SZ)
>  		count = rte_align32pow2(count + 1);
>  
> -	ring_size = rte_ring_get_memsize(count);
> +	ring_size = rte_ring_get_memsize(count, flags);
>  	if (ring_size < 0) {
>  		rte_errno = ring_size;
>  		return NULL;
> @@ -227,10 +268,17 @@ rte_ring_dump(FILE *f, const struct rte_ring *r)
>  	fprintf(f, "  flags=%x\n", r->flags);
>  	fprintf(f, "  size=%"PRIu32"\n", r->size);
>  	fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
> -	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
> -	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
> -	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
> -	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
> +	if (r->flags & RING_F_NB) {
> +		fprintf(f, "  ct=%"PRIu64"\n", r->cons_64.tail);
> +		fprintf(f, "  ch=%"PRIu64"\n", r->cons_64.head);
> +		fprintf(f, "  pt=%"PRIu64"\n", r->prod_64.tail);
> +		fprintf(f, "  ph=%"PRIu64"\n", r->prod_64.head);
> +	} else {
> +		fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
> +		fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
> +		fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
> +		fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
> +	}
>  	fprintf(f, "  used=%u\n", rte_ring_count(r));
>  	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
>  }
> diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
> index b270a4746..08c9de6a6 100644
> --- a/lib/librte_ring/rte_ring.h
> +++ b/lib/librte_ring/rte_ring.h
> @@ -134,6 +134,18 @@ struct rte_ring {
>   */
>  #define RING_F_EXACT_SZ 0x0004
>  #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
> +/**
> + * The ring uses non-blocking enqueue and dequeue functions. These functions
> + * do not have the "non-preemptive" constraint of a regular rte ring, and
> thus
> + * are suited for applications using preemptible pthreads. However, the
> + * non-blocking functions have worse average-case performance than their
> + * regular rte ring counterparts. When used as the handler for a mempool,
> + * per-thread caching can mitigate the performance difference by reducing the
> + * number (and contention) of ring accesses.
> + *
> + * This flag is only supported on x86_64 platforms.
> + */
> +#define RING_F_NB 0x0008
>  
>  /* @internal defines for passing to the enqueue dequeue worker functions */
>  #define __IS_SP 1
> @@ -151,11 +163,15 @@ struct rte_ring {
>   *
>   * @param count
>   *   The number of elements in the ring (must be a power of 2).
> + * @param flags
> + *   The flags the ring will be created with.
>   * @return
>   *   - The memory size needed for the ring on success.
>   *   - -EINVAL if count is not a power of 2.
>   */
> -ssize_t rte_ring_get_memsize(unsigned count);
> +ssize_t rte_ring_get_memsize(unsigned int count, unsigned int flags);
> +ssize_t rte_ring_get_memsize_v20(unsigned int count);
> +ssize_t rte_ring_get_memsize_v1905(unsigned int count, unsigned int flags);
>  
>  /**
>   * Initialize a ring structure.
> @@ -188,6 +204,10 @@ ssize_t rte_ring_get_memsize(unsigned count);
>   *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
>   *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
>   *      is "single-consumer". Otherwise, it is "multi-consumers".
> + *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-power-of-2
> + *      number, but up to half the ring space may be wasted.
> + *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
> + *      non-blocking variants of the dequeue and enqueue functions.
>   * @return
>   *   0 on success, or a negative value on error.
>   */
> @@ -223,12 +243,17 @@ int rte_ring_init(struct rte_ring *r, const char *name,
> unsigned count,
>   *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
>   *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
>   *      is "single-consumer". Otherwise, it is "multi-consumers".
> + *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-power-of-2
> + *      number, but up to half the ring space may be wasted.
> + *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
> + *      non-blocking variants of the dequeue and enqueue functions.
>   * @return
>   *   On success, the pointer to the new allocated ring. NULL on error with
>   *    rte_errno set appropriately. Possible errno values include:
>   *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config
> structure
>   *    - E_RTE_SECONDARY - function was called from a secondary process
> instance
> - *    - EINVAL - count provided is not a power of 2
> + *    - EINVAL - count provided is not a power of 2, or RING_F_NB is used on
> an
> + *      unsupported platform
>   *    - ENOSPC - the maximum number of memzones has already been allocated
>   *    - EEXIST - a memzone with the same name already exists
>   *    - ENOMEM - no appropriate memory area found in which to create memzone
> @@ -284,6 +309,50 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
>  	} \
>  } while (0)
>  
> +/* The actual enqueue of pointers on the ring.
> + * Used only by the single-producer non-blocking enqueue function, but
> + * out-lined here for code readability.
> + */
> +#define ENQUEUE_PTRS_NB(r, ring_start, prod_head, obj_table, n) do { \
> +	unsigned int i; \
> +	const uint32_t size = (r)->size; \
> +	size_t idx = prod_head & (r)->mask; \
> +	size_t new_cnt = prod_head + size; \
> +	struct nb_ring_entry *ring = (struct nb_ring_entry *)ring_start; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) {
> \
> +			ring[idx].ptr = obj_table[i]; \
> +			ring[idx].cnt = new_cnt + i;  \
> +			ring[idx + 1].ptr = obj_table[i + 1]; \
> +			ring[idx + 1].cnt = new_cnt + i + 1;  \
> +			ring[idx + 2].ptr = obj_table[i + 2]; \
> +			ring[idx + 2].cnt = new_cnt + i + 2;  \
> +			ring[idx + 3].ptr = obj_table[i + 3]; \
> +			ring[idx + 3].cnt = new_cnt + i + 3;  \
> +		} \
> +		switch (n & 0x3) { \
> +		case 3: \
> +			ring[idx].cnt = new_cnt + i; \
> +			ring[idx++].ptr = obj_table[i++]; /* fallthrough */ \
> +		case 2: \
> +			ring[idx].cnt = new_cnt + i; \
> +			ring[idx++].ptr = obj_table[i++]; /* fallthrough */ \
> +		case 1: \
> +			ring[idx].cnt = new_cnt + i; \
> +			ring[idx++].ptr = obj_table[i++]; \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++) { \
> +			ring[idx].cnt = new_cnt + i;  \
> +			ring[idx].ptr = obj_table[i]; \
> +		} \
> +		for (idx = 0; i < n; i++, idx++) {    \
> +			ring[idx].cnt = new_cnt + i;  \
> +			ring[idx].ptr = obj_table[i]; \
> +		} \
> +	} \
> +} while (0)
> +
>  /* the actual copy of pointers on the ring to obj_table.
>   * Placed here since identical code needed in both
>   * single and multi consumer dequeue functions */
> @@ -315,6 +384,39 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
>  	} \
>  } while (0)
>  
> +/* The actual copy of pointers on the ring to obj_table.
> + * Placed here since identical code needed in both
> + * single and multi consumer non-blocking dequeue functions.
> + */
> +#define DEQUEUE_PTRS_NB(r, ring_start, cons_head, obj_table, n) do { \
> +	unsigned int i; \
> +	size_t idx = cons_head & (r)->mask; \
> +	const uint32_t size = (r)->size; \
> +	struct nb_ring_entry *ring = (struct nb_ring_entry *)ring_start; \
> +	if (likely(idx + n < size)) { \
> +		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
> +			obj_table[i] = ring[idx].ptr; \
> +			obj_table[i + 1] = ring[idx + 1].ptr; \
> +			obj_table[i + 2] = ring[idx + 2].ptr; \
> +			obj_table[i + 3] = ring[idx + 3].ptr; \
> +		} \
> +		switch (n & 0x3) { \
> +		case 3: \
> +			obj_table[i++] = ring[idx++].ptr; /* fallthrough */ \
> +		case 2: \
> +			obj_table[i++] = ring[idx++].ptr; /* fallthrough */ \
> +		case 1: \
> +			obj_table[i++] = ring[idx++].ptr; \
> +		} \
> +	} else { \
> +		for (i = 0; idx < size; i++, idx++) \
> +			obj_table[i] = ring[idx].ptr; \
> +		for (idx = 0; i < n; i++, idx++) \
> +			obj_table[i] = ring[idx].ptr; \
> +	} \
> +} while (0)
> +
> +
>  /* Between load and load. there might be cpu reorder in weak model
>   * (powerpc/arm).
>   * There are 2 choices for the users
> @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct rte_ring *r);
>  #endif
>  #include "rte_ring_generic_64.h"
>  
> +/* @internal 128-bit structure used by the non-blocking ring */
> +struct nb_ring_entry {
> +	void *ptr; /**< Data pointer */
> +	uint64_t cnt; /**< Modification counter */
Why not make 'cnt' uintptr_t? This way 32-bit architectures will also
be supported. I think there are some claims that DPDK still supports e.g. ARMv7a
and possibly also 32-bit x86?
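For illustration, roughly this (just a sketch; on a 64-bit target it is identical to the
proposed layout, while on 32-bit the entry shrinks to 8 bytes so a 64-bit CAS would be enough):

	struct nb_ring_entry {
		void *ptr;     /**< Data pointer */
		uintptr_t cnt; /**< Modification counter, pointer-sized */
	};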

> +};
> +
> +/* The non-blocking ring algorithm is based on the original rte ring (derived
> + * from FreeBSD's bufring.h) and inspired by Michael and Scott's non-blocking
> + * concurrent queue.
> + */
> +
> +/**
> + * @internal
> + *   Enqueue several objects on the non-blocking ring (single-producer only)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const *obj_table,
> +			    unsigned int n,
> +			    enum rte_ring_queue_behavior behavior,
> +			    unsigned int *free_space)
> +{
> +	uint32_t free_entries;
> +	size_t head, next;
> +
> +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> +					 &head, &next, &free_entries);
> +	if (n == 0)
> +		goto end;
> +
> +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> +
> +	r->prod_64.tail += n;
Don't we need release order when (or smp_wmb between) writing of the ring
pointers and the update of tail? By updating the tail pointer, we are
synchronising with a consumer.

I prefer using __atomic operations even for loads and stores. You can then see which
parts of the code synchronise with each other, e.g. a store-release to some
location synchronises with load-acquire from the same location. If you don't
know how different threads synchronise with each other, you are very likely to
make mistakes.
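E.g. for the single-producer tail update above, something along these lines (only a
sketch, assuming prod_64.tail is a plain 64-bit integer):

	/* Order the ENQUEUE_PTRS_NB stores before publishing the new tail;
	 * pairs with an acquire load of prod_64.tail on the consumer side.
	 */
	__atomic_store_n(&r->prod_64.tail, r->prod_64.tail + n,
			 __ATOMIC_RELEASE);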

> +
> +end:
> +	if (free_space != NULL)
> +		*free_space = free_entries - n;
> +	return n;
> +}
> +
> +/**
> + * @internal
> + *   Enqueue several objects on the non-blocking ring (multi-producer safe)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const *obj_table,
> +			    unsigned int n,
> +			    enum rte_ring_queue_behavior behavior,
> +			    unsigned int *free_space)
> +{
> +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> +	RTE_SET_USED(r);
> +	RTE_SET_USED(obj_table);
> +	RTE_SET_USED(n);
> +	RTE_SET_USED(behavior);
> +	RTE_SET_USED(free_space);
> +#ifndef ALLOW_EXPERIMENTAL_API
> +	printf("[%s()] RING_F_NB requires an experimental API."
> +	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> +	       , __func__);
> +#endif
> +	return 0;
> +#endif
> +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> +	size_t head, next, tail;
> +	uint32_t free_entries;
> +	unsigned int i;
> +
> +	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> +					 &head, &next, &free_entries);
> +	if (n == 0)
> +		goto end;
> +
> +	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
> +		struct nb_ring_entry old_value, new_value;
> +		struct nb_ring_entry *ring_ptr;
> +
> +		/* Enqueue to the tail entry. If another thread wins the race,
> +		 * retry with the new tail.
> +		 */
> +		tail = r->prod_64.tail;
> +
> +		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r->mask];
This is an ugly expression and cast. Also I think it is unnecessary. What's
preventing this from being written without a cast? Perhaps the ring array needs
to be a union of "void *" and struct nb_ring_entry?

> +
> +		old_value = *ring_ptr;
> +
> +		/* If the tail entry's modification counter doesn't match the
> +		 * producer tail index, it's already been updated.
> +		 */
> +		if (old_value.cnt != tail)
> +			continue;
Continue restarts the loop at the condition test in the for statement,
'i' and 'n' are unchanged. Then we re-read 'prod_64.tail' and
'ring[tail & mask]'. If some other thread never updates 'prod_64.tail', the
test here (ring[tail].cnt != tail) will still be true and we will spin
forever.

Waiting for other threads <=> blocking behaviour so this is not a non-
blocking design.

> +
> +		/* Prepare the new entry. The cnt field mitigates the ABA
> +		 * problem on the ring write.
> +		 */
> +		new_value.ptr = obj_table[i];
> +		new_value.cnt = tail + r->size;
> +
> +		if (rte_atomic128_cmpset((volatile rte_int128_t *)ring_ptr,
> +					 (rte_int128_t *)&old_value,
> +					 (rte_int128_t *)&new_value))
> +			i++;
> +
> +		/* Every thread attempts the cmpset, so they don't have to wait
> +		 * for the thread that successfully enqueued to the ring.
> +		 * Using a 64-bit tail mitigates the ABA problem here.
> +		 *
> +		 * Built-in used to handle variable-sized tail index.
> +		 */
But prod_64.tail is 64 bits so not really variable size?

> +		__sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail + 1);
What memory order is required here? Why not use
__atomic_compare_exchange() with explicit memory order parameters?
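E.g. (the orders here are just an example of making them explicit, not a claim about
what the algorithm requires; assumes prod_64.tail is a uint64_t):

	uint64_t expected = tail;

	/* Advance the producer tail by one slot, whether or not the
	 * 128-bit CAS above succeeded.
	 */
	__atomic_compare_exchange_n(&r->prod_64.tail, &expected, tail + 1,
				    0 /* strong */,
				    __ATOMIC_RELEASE, __ATOMIC_RELAXED);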

> +	}
> +
> +end:
> +	if (free_space != NULL)
> +		*free_space = free_entries - n;
> +	return n;
> +#endif
> +}
> +
> +/**
> + * @internal Enqueue several objects on the non-blocking ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to add in the ring from the obj_table.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
> + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
> + * @param is_sp
> + *   Indicates whether to use single producer or multi-producer head update
> + * @param free_space
> + *   returns the amount of space after the enqueue operation has finished
> + * @return
> + *   Actual number of objects enqueued.
> + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_enqueue(struct rte_ring *r, void * const *obj_table,
> +			 unsigned int n, enum rte_ring_queue_behavior behavior,
> +			 unsigned int is_sp, unsigned int *free_space)
> +{
> +	if (is_sp)
> +		return __rte_ring_do_nb_enqueue_sp(r, obj_table, n,
> +						   behavior, free_space);
> +	else
> +		return __rte_ring_do_nb_enqueue_mp(r, obj_table, n,
> +						   behavior, free_space);
> +}
> +
> +/**
> + * @internal
> + *   Dequeue several objects from the non-blocking ring (single-consumer only)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue_sc(struct rte_ring *r, void **obj_table,
> +			    unsigned int n,
> +			    enum rte_ring_queue_behavior behavior,
> +			    unsigned int *available)
> +{
> +	size_t head, next;
> +	uint32_t entries;
> +
> +	n = __rte_ring_move_cons_head_64(r, 1, n, behavior,
> +					 &head, &next, &entries);
> +	if (n == 0)
> +		goto end;
> +
> +	DEQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> +
> +	r->cons_64.tail += n;
Memory ordering? Consumer synchronises with producer.

> +
> +end:
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +/**
> + * @internal
> + *   Dequeue several objects from the non-blocking ring (multi-consumer safe)
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
> + * @param available
> + *   returns the number of remaining ring entries after the dequeue has finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue_mc(struct rte_ring *r, void **obj_table,
> +			    unsigned int n,
> +			    enum rte_ring_queue_behavior behavior,
> +			    unsigned int *available)
> +{
> +	size_t head, next;
> +	uint32_t entries;
> +
> +	n = __rte_ring_move_cons_head_64(r, 0, n, behavior,
> +					 &head, &next, &entries);
> +	if (n == 0)
> +		goto end;
> +
> +	while (1) {
> +		size_t tail = r->cons_64.tail;
> +
> +		/* Dequeue from the cons tail onwards. If multiple threads read
> +		 * the same pointers, the thread that successfully performs the
> +		 * CAS will keep them and the other(s) will retry.
> +		 */
> +		DEQUEUE_PTRS_NB(r, &r[1], tail, obj_table, n);
> +
> +		next = tail + n;
> +
> +		/* Built-in used to handle variable-sized tail index. */
> +		if (__sync_bool_compare_and_swap(&r->cons_64.tail, tail, next))
> +			/* There is potential for the ABA problem here, but
> +			 * that is mitigated by the large (64-bit) tail.
> +			 */
> +			break;
> +	}
> +
> +end:
> +	if (available != NULL)
> +		*available = entries - n;
> +	return n;
> +}
> +
> +/**
> + * @internal Dequeue several objects from the non-blocking ring
> + *
> + * @param r
> + *   A pointer to the ring structure.
> + * @param obj_table
> + *   A pointer to a table of void * pointers (objects).
> + * @param n
> + *   The number of objects to pull from the ring.
> + * @param behavior
> + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
> + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
> + *   returns the number of remaining ring entries after the dequeue has finished
> finished
> + * @return
> + *   - Actual number of objects dequeued.
> + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> + */
> +static __rte_always_inline unsigned int
> +__rte_ring_do_nb_dequeue(struct rte_ring *r, void **obj_table,
> +		 unsigned int n, enum rte_ring_queue_behavior behavior,
> +		 unsigned int is_sc, unsigned int *available)
> +{
> +	if (is_sc)
> +		return __rte_ring_do_nb_dequeue_sc(r, obj_table, n,
> +						   behavior, available);
> +	else
> +		return __rte_ring_do_nb_dequeue_mc(r, obj_table, n,
> +						   behavior, available);
> +}
> +
>  /**
>   * @internal Enqueue several objects on the ring
>   *
> @@ -438,8 +853,14 @@ static __rte_always_inline unsigned int
>  rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			__IS_MP, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED, __IS_MP,
> +						free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED, __IS_MP,
> +					     free_space);
>  }
>  
>  /**
> @@ -461,8 +882,14 @@ static __rte_always_inline unsigned int
>  rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			__IS_SP, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED, __IS_SP,
> +						free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED, __IS_SP,
> +					     free_space);
>  }
>  
>  /**
> @@ -488,8 +915,14 @@ static __rte_always_inline unsigned int
>  rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			r->prod.single, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED,
> +						r->prod_64.single, free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED,
> +					     r->prod.single, free_space);
>  }
>  
>  /**
> @@ -572,8 +1005,14 @@ static __rte_always_inline unsigned int
>  rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			__IS_MC, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED, __IS_MC,
> +						available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED, __IS_MC,
> +					     available);
>  }
>  
>  /**
> @@ -596,8 +1035,14 @@ static __rte_always_inline unsigned int
>  rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -			__IS_SC, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED, __IS_SC,
> +						available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED, __IS_SC,
> +					     available);
>  }
>  
>  /**
> @@ -623,8 +1068,14 @@ static __rte_always_inline unsigned int
>  rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
>  		unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
> -				r->cons.single, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_FIXED,
> +						r->cons_64.single, available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_FIXED,
> +					     r->cons.single, available);
>  }
>  
>  /**
> @@ -699,9 +1150,13 @@ rte_ring_dequeue(struct rte_ring *r, void **obj_p)
>  static inline unsigned
>  rte_ring_count(const struct rte_ring *r)
>  {
> -	uint32_t prod_tail = r->prod.tail;
> -	uint32_t cons_tail = r->cons.tail;
> -	uint32_t count = (prod_tail - cons_tail) & r->mask;
> +	uint32_t count;
> +
> +	if (r->flags & RING_F_NB)
> +		count = (r->prod_64.tail - r->cons_64.tail) & r->mask;
> +	else
> +		count = (r->prod.tail - r->cons.tail) & r->mask;
> +
>  	return (count > r->capacity) ? r->capacity : count;
>  }
>  
> @@ -821,8 +1276,14 @@ static __rte_always_inline unsigned
>  rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						__IS_MP, free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     __IS_MP, free_space);
>  }
>  
>  /**
> @@ -844,8 +1305,14 @@ static __rte_always_inline unsigned
>  rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  			 unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						__IS_SP, free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     __IS_SP, free_space);
>  }
>  
>  /**
> @@ -871,8 +1338,14 @@ static __rte_always_inline unsigned
>  rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
>  		      unsigned int n, unsigned int *free_space)
>  {
> -	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
> -			r->prod.single, free_space);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_enqueue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						r->prod_64.single, free_space);
> +	else
> +		return __rte_ring_do_enqueue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     r->prod.single, free_space);
>  }
>  
>  /**
> @@ -899,8 +1372,14 @@ static __rte_always_inline unsigned
>  rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						__IS_MC, available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     __IS_MC, available);
>  }
>  
>  /**
> @@ -924,8 +1403,14 @@ static __rte_always_inline unsigned
>  rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						__IS_SC, available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     __IS_SC, available);
>  }
>  
>  /**
> @@ -951,9 +1436,14 @@ static __rte_always_inline unsigned
>  rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
>  		unsigned int n, unsigned int *available)
>  {
> -	return __rte_ring_do_dequeue(r, obj_table, n,
> -				RTE_RING_QUEUE_VARIABLE,
> -				r->cons.single, available);
> +	if (r->flags & RING_F_NB)
> +		return __rte_ring_do_nb_dequeue(r, obj_table, n,
> +						RTE_RING_QUEUE_VARIABLE,
> +						r->cons_64.single, available);
> +	else
> +		return __rte_ring_do_dequeue(r, obj_table, n,
> +					     RTE_RING_QUEUE_VARIABLE,
> +					     r->cons.single, available);
>  }
>  
>  #ifdef __cplusplus
> diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
> index d935efd0d..8969467af 100644
> --- a/lib/librte_ring/rte_ring_version.map
> +++ b/lib/librte_ring/rte_ring_version.map
> @@ -17,3 +17,10 @@ DPDK_2.2 {
>  	rte_ring_free;
>  
>  } DPDK_2.0;
> +
> +DPDK_19.05 {
> +	global:
> +
> +	rte_ring_get_memsize;
> +
> +} DPDK_2.2;
  
Eads, Gage Jan. 22, 2019, 9:31 p.m. UTC | #3
Hi Ola,

<snip>

> > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > rte_ring *r);
> >  #endif
> >  #include "rte_ring_generic_64.h"
> >
> > +/* @internal 128-bit structure used by the non-blocking ring */
> > +struct nb_ring_entry {
> > +	void *ptr; /**< Data pointer */
> > +	uint64_t cnt; /**< Modification counter */
> Why not make 'cnt' uintptr_t? This way 32-bit architectures will also be
> supported. I think there are some claims that DPDK still supports e.g. ARMv7a
> and possibly also 32-bit x86?

I chose a 64-bit modification counter because (practically speaking) the ABA problem will not occur with such a large counter -- definitely not within my lifetime. See the "Discussion" section of the commit message for more information.

With a 32-bit counter, there is a very (very) low likelihood of it, but it is possible. Personally, I don't feel comfortable providing such code, because a) I doubt all users would understand the implementation well enough to do the risk/reward analysis, and b) such a bug would be near impossible to reproduce and root-cause if it did occur.
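For a rough sense of scale (back-of-the-envelope, not from the patch itself): at 10^9 tail updates per second, a 64-bit counter needs 2^64 / 10^9 seconds -- roughly 585 years -- before it can return to a previously observed value, whereas a 32-bit counter can wrap back after about 4.3 seconds of preemption at the wrong moment.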

> 
> > +};
> > +
> > +/* The non-blocking ring algorithm is based on the original rte ring
> > +(derived
> > + * from FreeBSD's bufring.h) and inspired by Michael and Scott's
> > +non-blocking
> > + * concurrent queue.
> > + */
> > +
> > +/**
> > + * @internal
> > + *   Enqueue several objects on the non-blocking ring
> > +(single-producer only)
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > +ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > +the ring
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has
> > +finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const *obj_table,
> > +			    unsigned int n,
> > +			    enum rte_ring_queue_behavior behavior,
> > +			    unsigned int *free_space)
> > +{
> > +	uint32_t free_entries;
> > +	size_t head, next;
> > +
> > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > +					 &head, &next, &free_entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > +
> > +	r->prod_64.tail += n;
> Don't we need release order when (or smp_wmb between) writing of the ring
> pointers and the update of tail? By updating the tail pointer, we are
> synchronising with a consumer.
> 
> I prefer using __atomic operations even for load and store. You can see which
> parts of the code that synchronise with each other, e.g. store-release to some
> location synchronises with load-acquire from the same location. If you don't
> know how different threads synchronise with each other, you are very likely to
> make mistakes.
> 

You can tell this code was written when I thought x86-64 was the only viable target :). Yes, you are correct.

With regards to using __atomic intrinsics, I'm planning on taking a similar approach to the functions duplicated in rte_ring_generic.h and rte_ring_c11_mem.h: one version that uses rte_atomic functions (and thus stricter memory ordering) and one that uses __atomic intrinsics (and thus can benefit from more relaxed memory ordering).

> > +
> > +end:
> > +	if (free_space != NULL)
> > +		*free_space = free_entries - n;
> > +	return n;
> > +}
> > +
> > +/**
> > + * @internal
> > + *   Enqueue several objects on the non-blocking ring (multi-producer
> > +safe)
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > +ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > +the ring
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has
> > +finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const *obj_table,
> > +			    unsigned int n,
> > +			    enum rte_ring_queue_behavior behavior,
> > +			    unsigned int *free_space)
> > +{
> > +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> > +	RTE_SET_USED(r);
> > +	RTE_SET_USED(obj_table);
> > +	RTE_SET_USED(n);
> > +	RTE_SET_USED(behavior);
> > +	RTE_SET_USED(free_space);
> > +#ifndef ALLOW_EXPERIMENTAL_API
> > +	printf("[%s()] RING_F_NB requires an experimental API."
> > +	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> > +	       , __func__);
> > +#endif
> > +	return 0;
> > +#endif
> > +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> > +	size_t head, next, tail;
> > +	uint32_t free_entries;
> > +	unsigned int i;
> > +
> > +	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> > +					 &head, &next, &free_entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
> > +		struct nb_ring_entry old_value, new_value;
> > +		struct nb_ring_entry *ring_ptr;
> > +
> > +		/* Enqueue to the tail entry. If another thread wins the
> > race,
> > +		 * retry with the new tail.
> > +		 */
> > +		tail = r->prod_64.tail;
> > +
> > +		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r->mask];
> This is an ugly expression and cast. Also I think it is unnecessary. What's
> preventing this from being written without a cast? Perhaps the ring array needs
> to be a union of "void *" and struct nb_ring_entry?

The cast is necessary for the correct pointer arithmetic (let "uintptr_t base == &r[1]"):
- With cast: ring_ptr = base + sizeof(struct nb_ring_entry) * (tail & r->mask);
- W/o cast: ring_ptr = base + sizeof(struct rte_ring) * (tail & r->mask);

FWIW, this is essentially the same as is done with the second argument (&r[1]) to ENQUEUE_PTRS and DEQUEUE_PTRS, but there it's split across multiple lines of code. The equivalent here would be:
 
struct nb_ring_entry *ring_base = (struct nb_ring_entry*)&r[1];
ring_ptr = &ring_base[tail & r->mask];

Which is more legible, I think.

There is no ring array structure in which to add a union; the ring array is a contiguous chunk of memory that immediately follows after the end of a struct rte_ring. We interpret the memory there according to the ring entry data type (void * for regular rings and struct nb_ring_entry for non-blocking rings).

> 
> > +
> > +		old_value = *ring_ptr;
> > +
> > +		/* If the tail entry's modification counter doesn't match the
> > +		 * producer tail index, it's already been updated.
> > +		 */
> > +		if (old_value.cnt != tail)
> > +			continue;
> Continue restarts the loop at the condition test in the for statement, 'i' and 'n'
> are unchanged. Then we re-read 'prod_64.tail' and 'ring[tail & mask]'. If some
> other thread never updates 'prod_64.tail', the test here (ring[tail].cnt != tail) will
> still be true and we will spin forever.
> 
> Waiting for other threads <=> blocking behaviour so this is not a non- blocking
> design.
> 

You're absolutely right. The if-statement was added as an optimization to avoid 128-bit cmpset operations that are known to fail, but in this form it violates the non-blocking design.

I see two solutions: 1) drop the if-statement altogether, or 2) attempt to update prod_64.tail before continuing. Both require every thread to attempt to update prod_64.tail on every iteration, but #2 will result in fewer failed 128-bit cmpsets.
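A rough fragment of what #2 would look like inside the loop above (untested):

		/* If the tail entry's modification counter doesn't match the
		 * producer tail index, another enqueue already landed in this
		 * slot. Help advance the tail before retrying, so forward
		 * progress never depends on the winning thread being scheduled.
		 */
		if (old_value.cnt != tail) {
			__sync_bool_compare_and_swap(&r->prod_64.tail,
						     tail, tail + 1);
			continue;
		}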

> > +
> > +		/* Prepare the new entry. The cnt field mitigates the ABA
> > +		 * problem on the ring write.
> > +		 */
> > +		new_value.ptr = obj_table[i];
> > +		new_value.cnt = tail + r->size;
> > +
> > +		if (rte_atomic128_cmpset((volatile rte_int128_t *)ring_ptr,
> > +					 (rte_int128_t *)&old_value,
> > +					 (rte_int128_t *)&new_value))
> > +			i++;
> > +
> > +		/* Every thread attempts the cmpset, so they don't have to
> > wait
> > +		 * for the thread that successfully enqueued to the ring.
> > +		 * Using a 64-bit tail mitigates the ABA problem here.
> > +		 *
> > +		 * Built-in used to handle variable-sized tail index.
> > +		 */
> But prod_64.tail is 64 bits so not really variable size?
> 

(See next comment)

> > +		__sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail +
> > 1);
> What memory order is required here? Why not use
> __atomic_compare_exchange() with explicit memory order parameters?
> 

This is an artifact from an older patchset that used uintptr_t, written before I learned that other platforms could support this non-blocking algorithm (hence the __sync-type intrinsic seemed sufficient at the time).

At any rate, as described earlier in this response, I plan on writing these functions using the __atomic builtins in the next patchset.

> > +	}
> > +
> > +end:
> > +	if (free_space != NULL)
> > +		*free_space = free_entries - n;
> > +	return n;
> > +#endif
> > +}
> > +
> > +/**
> > + * @internal Enqueue several objects on the non-blocking ring
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to add in the ring from the obj_table.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > +ring
> > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > +the ring
> > + * @param is_sp
> > + *   Indicates whether to use single producer or multi-producer head
> > +update
> > + * @param free_space
> > + *   returns the amount of space after the enqueue operation has
> > +finished
> > + * @return
> > + *   Actual number of objects enqueued.
> > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_nb_enqueue(struct rte_ring *r, void * const *obj_table,
> > +			 unsigned int n, enum rte_ring_queue_behavior
> > behavior,
> > +			 unsigned int is_sp, unsigned int *free_space) {
> > +	if (is_sp)
> > +		return __rte_ring_do_nb_enqueue_sp(r, obj_table, n,
> > +						   behavior, free_space);
> > +	else
> > +		return __rte_ring_do_nb_enqueue_mp(r, obj_table, n,
> > +						   behavior, free_space);
> > +}
> > +
> > +/**
> > + * @internal
> > + *   Dequeue several objects from the non-blocking ring
> > +(single-consumer
> > only)
> > + *
> > + * @param r
> > + *   A pointer to the ring structure.
> > + * @param obj_table
> > + *   A pointer to a table of void * pointers (objects).
> > + * @param n
> > + *   The number of objects to pull from the ring.
> > + * @param behavior
> > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> > + the ring
> > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> > + the ring
> > + * @param available
> > + *   returns the number of remaining ring entries after the dequeue
> > + has
> > finished
> > + * @return
> > + *   - Actual number of objects dequeued.
> > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > + */
> > +static __rte_always_inline unsigned int
> > +__rte_ring_do_nb_dequeue_sc(struct rte_ring *r, void **obj_table,
> > +			    unsigned int n,
> > +			    enum rte_ring_queue_behavior behavior,
> > +			    unsigned int *available)
> > +{
> > +	size_t head, next;
> > +	uint32_t entries;
> > +
> > +	n = __rte_ring_move_cons_head_64(r, 1, n, behavior,
> > +					 &head, &next, &entries);
> > +	if (n == 0)
> > +		goto end;
> > +
> > +	DEQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > +
> > +	r->cons_64.tail += n;
> Memory ordering? Consumer synchronises with producer.
> 

Agreed, that is missing here. Will fix.

Thanks,
Gage

> --
> Ola Liljedahl, Networking System Architect, Arm Phone +46706866373, Skype
> ola.liljedahl
  
Ola Liljedahl Jan. 23, 2019, 10:16 a.m. UTC | #4
On Tue, 2019-01-22 at 21:31 +0000, Eads, Gage wrote:
> Hi Ola,
> 
> <snip>
> 
> > 
> > > 
> > > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > > rte_ring *r);
> > >  #endif
> > >  #include "rte_ring_generic_64.h"
> > > 
> > > +/* @internal 128-bit structure used by the non-blocking ring */
> > > +struct nb_ring_entry {
> > > +	void *ptr; /**< Data pointer */
> > > +	uint64_t cnt; /**< Modification counter */
> > Why not make 'cnt' uintptr_t? This way 32-bit architectures will also be
> > supported. I think there are some claims that DPDK still supports e.g.
> > ARMv7a
> > and possibly also 32-bit x86?
> I chose a 64-bit modification counter because (practically speaking) the ABA
> problem will not occur with such a large counter -- definitely not within my
> lifetime. See the "Discussion" section of the commit message for more
> information.
> 
> With a 32-bit counter, there is a very (very) low likelihood of it, but it is
> possible. Personally, I don't feel comfortable providing such code, because a)
> I doubt all users would understand the implementation well enough to do the
> risk/reward analysis, and b) such a bug would be near impossible to reproduce
> and root-cause if it did occur.
With a 64-bit counter (and 32-bit pointer), 32-bit architectures (e.g. ARMv7a
and probably x86 as well) won't be able to support this as they at best support
64-bit CAS (ARMv7a has LDREXD/STREXD). So you are essentially putting a 64-bit
(and 128-bit CAS) requirement on the implementation.

> 
> > 
> > 
> > > 
> > > +};
> > > +
> > > +/* The non-blocking ring algorithm is based on the original rte ring
> > > +(derived
> > > + * from FreeBSD's bufring.h) and inspired by Michael and Scott's
> > > +non-blocking
> > > + * concurrent queue.
> > > + */
> > > +
> > > +/**
> > > + * @internal
> > > + *   Enqueue several objects on the non-blocking ring
> > > +(single-producer only)
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > > +ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > > +the ring
> > > + * @param free_space
> > > + *   returns the amount of space after the enqueue operation has
> > > +finished
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const *obj_table,
> > > +			    unsigned int n,
> > > +			    enum rte_ring_queue_behavior behavior,
> > > +			    unsigned int *free_space)
> > > +{
> > > +	uint32_t free_entries;
> > > +	size_t head, next;
> > > +
> > > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > > +					 &head, &next, &free_entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > +
> > > +	r->prod_64.tail += n;
> > Don't we need release order when (or smp_wmb between) writing of the ring
> > pointers and the update of tail? By updating the tail pointer, we are
> > synchronising with a consumer.
> > 
> > I prefer using __atomic operations even for load and store. You can see
> > which
> > parts of the code that synchronise with each other, e.g. store-release to
> > some
> > location synchronises with load-acquire from the same location. If you don't
> > know how different threads synchronise with each other, you are very likely
> > to
> > make mistakes.
> > 
> You can tell this code was written when I thought x86-64 was the only viable
> target :). Yes, you are correct.
> 
> With regards to using __atomic intrinsics, I'm planning on taking a similar
> approach to the functions duplicated in rte_ring_generic.h and
> rte_ring_c11_mem.h: one version that uses rte_atomic functions (and thus
> stricter memory ordering) and one that uses __atomic intrinsics (and thus can
> benefit from more relaxed memory ordering).
What's the advantage of having two different implementations? What is the
disadvantage?

The existing ring buffer code originally had only the "legacy" implementation
which was kept when the __atomic implementation was added. The reason claimed
was that some older compilers for x86 do not support GCC __atomic builtins. But
I thought there was consensus that new functionality could have only __atomic
implementations.

Does the non-blocking ring buffer implementation have to support these older
compilers? Will the applications that require these older compilers be updated to
utilise the non-blocking ring buffer?

> 
> > 
> > > 
> > > +
> > > +end:
> > > +	if (free_space != NULL)
> > > +		*free_space = free_entries - n;
> > > +	return n;
> > > +}
> > > +
> > > +/**
> > > + * @internal
> > > + *   Enqueue several objects on the non-blocking ring (multi-producer
> > > +safe)
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > > +ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > > +the ring
> > > + * @param free_space
> > > + *   returns the amount of space after the enqueue operation has
> > > +finished
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const *obj_table,
> > > +			    unsigned int n,
> > > +			    enum rte_ring_queue_behavior behavior,
> > > +			    unsigned int *free_space)
> > > +{
> > > +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> > > +	RTE_SET_USED(r);
> > > +	RTE_SET_USED(obj_table);
> > > +	RTE_SET_USED(n);
> > > +	RTE_SET_USED(behavior);
> > > +	RTE_SET_USED(free_space);
> > > +#ifndef ALLOW_EXPERIMENTAL_API
> > > +	printf("[%s()] RING_F_NB requires an experimental API."
> > > +	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> > > +	       , __func__);
> > > +#endif
> > > +	return 0;
> > > +#endif
> > > +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> > > +	size_t head, next, tail;
> > > +	uint32_t free_entries;
> > > +	unsigned int i;
> > > +
> > > +	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> > > +					 &head, &next, &free_entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
> > > +		struct nb_ring_entry old_value, new_value;
> > > +		struct nb_ring_entry *ring_ptr;
> > > +
> > > +		/* Enqueue to the tail entry. If another thread wins the
> > > race,
> > > +		 * retry with the new tail.
> > > +		 */
> > > +		tail = r->prod_64.tail;
> > > +
> > > +		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r-
> > > >mask];
> > This is an ugly expression and cast. Also I think it is unnecessary. What's
> > preventing this from being written without a cast? Perhaps the ring array
> > needs
> > to be a union of "void *" and struct nb_ring_entry?
> The cast is necessary for the correct pointer arithmetic (let "uintptr_t base
> == &r[1]"):
Yes I know the C language.

> - With cast: ring_ptr = base + sizeof(struct nb_ring_entry) * (tail & r-
> >mask);
> - W/o cast: ring_ptr = base + sizeof(struct rte_ring) * (tail & r->mask);
> 
> FWIW, this is essentially the same as is done with the second argument (&r[1])
> to ENQUEUE_PTRS and DEQUEUE_PTRS, but there it's split across multiple lines
> of code. The equivalent here would be:
>  
> struct nb_ring_entry *ring_base = (struct nb_ring_entry*)&r[1];
> ring_ptr = &ring_base[tail & r->mask];
> 
> Which is more legible, I think.
The RTE ring buffer code is not very legible to start with.

> 
> There is no ring array structure in which to add a union; the ring array is a
> contiguous chunk of memory that immediately follows after the end of a struct
> rte_ring. We interpret the memory there according to the ring entry data type
> (void * for regular rings and struct nb_ring_entry for non-blocking rings).
My worry is that we are abusing the C language and creating a monster of fragile
C code that will be more and more difficult to understand and to maintain. At
some point you have to think the question "Are we doing the right thing?".

> 
> > 
> > 
> > > 
> > > +
> > > +		old_value = *ring_ptr;
> > > +
> > > +		/* If the tail entry's modification counter doesn't match
> > > the
> > > +		 * producer tail index, it's already been updated.
> > > +		 */
> > > +		if (old_value.cnt != tail)
> > > +			continue;
> > Continue restarts the loop at the condition test in the for statement, 'i'
> > and 'n'
> > are unchanged. Then we re-read 'prod_64.tail' and 'ring[tail & mask]'. If
> > some
> > other thread never updates 'prod_64.tail', the test here (ring[tail].cnt !=
> > tail) will
> > still be true and we will spin forever.
> > 
> > Waiting for other threads <=> blocking behaviour so this is not a non-
> > blocking
> > design.
> > 
> You're absolutely right. The if-statement was added as optimization to avoid
> 128-bit cmpset operations that are known to fail, but in this form it violates
> the non-blocking design.
> 
> I see two solutions: 1) drop the if-statement altogether, or 2) attempt to
> update prod_64.tail before continuing. Both require every thread to attempt to
> update prod_64.tail on every iteration, but #2 will result in fewer failed
> 128-bit cmpsets.
> 
> > 
> > > 
> > > +
> > > +		/* Prepare the new entry. The cnt field mitigates the ABA
> > > +		 * problem on the ring write.
> > > +		 */
> > > +		new_value.ptr = obj_table[i];
> > > +		new_value.cnt = tail + r->size;
> > > +
> > > +		if (rte_atomic128_cmpset((volatile rte_int128_t
> > > *)ring_ptr,
> > > +					 (rte_int128_t *)&old_value,
> > > +					 (rte_int128_t *)&new_value))
> > > +			i++;
> > > +
> > > +		/* Every thread attempts the cmpset, so they don't have
> > > to
> > > wait
> > > +		 * for the thread that successfully enqueued to the ring.
> > > +		 * Using a 64-bit tail mitigates the ABA problem here.
> > > +		 *
> > > +		 * Built-in used to handle variable-sized tail index.
> > > +		 */
> > But prod_64.tail is 64 bits so not really variable size?
> > 
> (See next comment)
> 
> > 
> > > 
> > > +		__sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail
> > > +
> > > 1);
> > What memory order is required here? Why not use
> > __atomic_compare_exchange() with explicit memory order parameters?
> > 
> This is an artifact from an older patchset that used uintptr_t, and before I
> learned that other platforms could support this non-blocking algorithm (hence
> the __sync type intrinsic was sufficient).
> 
> At any rate, as described earlier in this response, I plan on writing these
> functions using the __atomic builtins in the next patchset.
Great.

> 
> > 
> > > 
> > > +	}
> > > +
> > > +end:
> > > +	if (free_space != NULL)
> > > +		*free_space = free_entries - n;
> > > +	return n;
> > > +#endif
> > > +}
> > > +
> > > +/**
> > > + * @internal Enqueue several objects on the non-blocking ring
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to add in the ring from the obj_table.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the
> > > +ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to
> > > +the ring
> > > + * @param is_sp
> > > + *   Indicates whether to use single producer or multi-producer head
> > > +update
> > > + * @param free_space
> > > + *   returns the amount of space after the enqueue operation has
> > > +finished
> > > + * @return
> > > + *   Actual number of objects enqueued.
> > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_nb_enqueue(struct rte_ring *r, void * const *obj_table,
> > > +			 unsigned int n, enum rte_ring_queue_behavior
> > > behavior,
> > > +			 unsigned int is_sp, unsigned int *free_space) {
> > > +	if (is_sp)
> > > +		return __rte_ring_do_nb_enqueue_sp(r, obj_table, n,
> > > +						   behavior, free_space);
> > > +	else
> > > +		return __rte_ring_do_nb_enqueue_mp(r, obj_table, n,
> > > +						   behavior, free_space);
> > > +}
> > > +
> > > +/**
> > > + * @internal
> > > + *   Dequeue several objects from the non-blocking ring
> > > +(single-consumer
> > > only)
> > > + *
> > > + * @param r
> > > + *   A pointer to the ring structure.
> > > + * @param obj_table
> > > + *   A pointer to a table of void * pointers (objects).
> > > + * @param n
> > > + *   The number of objects to pull from the ring.
> > > + * @param behavior
> > > + *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from
> > > + the ring
> > > + *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from
> > > + the ring
> > > + * @param available
> > > + *   returns the number of remaining ring entries after the dequeue
> > > + has
> > > finished
> > > + * @return
> > > + *   - Actual number of objects dequeued.
> > > + *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > + */
> > > +static __rte_always_inline unsigned int
> > > +__rte_ring_do_nb_dequeue_sc(struct rte_ring *r, void **obj_table,
> > > +			    unsigned int n,
> > > +			    enum rte_ring_queue_behavior behavior,
> > > +			    unsigned int *available)
> > > +{
> > > +	size_t head, next;
> > > +	uint32_t entries;
> > > +
> > > +	n = __rte_ring_move_cons_head_64(r, 1, n, behavior,
> > > +					 &head, &next, &entries);
> > > +	if (n == 0)
> > > +		goto end;
> > > +
> > > +	DEQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > +
> > > +	r->cons_64.tail += n;
> > Memory ordering? Consumer synchronises with producer.
> > 
> Agreed, that is missing here. Will fix.
> 
> Thanks,
> Gage
> 
> > 
> > --
> > Ola Liljedahl, Networking System Architect, Arm Phone +46706866373, Skype
> > ola.liljedahl
  
Eads, Gage Jan. 25, 2019, 5:21 p.m. UTC | #5
> -----Original Message-----
> From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> Sent: Wednesday, January 23, 2019 4:16 AM
> To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> arybchenko@solarflare.com; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
> 
> On Tue, 2019-01-22 at 21:31 +0000, Eads, Gage wrote:
> > Hi Ola,
> >
> > <snip>
> >
> > >
> > > >
> > > > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > > > rte_ring *r);
> > > >  #endif
> > > >  #include "rte_ring_generic_64.h"
> > > >
> > > > +/* @internal 128-bit structure used by the non-blocking ring */
> > > > +struct nb_ring_entry {
> > > > +	void *ptr; /**< Data pointer */
> > > > +	uint64_t cnt; /**< Modification counter */
> > > Why not make 'cnt' uintptr_t? This way 32-bit architectures will
> > > also be supported. I think there are some claims that DPDK still supports e.g.
> > > ARMv7a
> > > and possibly also 32-bit x86?
> > I chose a 64-bit modification counter because (practically speaking)
> > the ABA problem will not occur with such a large counter -- definitely
> > not within my lifetime. See the "Discussion" section of the commit
> > message for more information.
> >
> > With a 32-bit counter, there is a very (very) low likelihood of it,
> > but it is possible. Personally, I don't feel comfortable providing
> > such code, because a) I doubt all users would understand the
> > implementation well enough to do the risk/reward analysis, and b) such
> > a bug would be near impossible to reproduce and root-cause if it did occur.
> With a 64-bit counter (and 32-bit pointer), 32-bit architectures (e.g. ARMv7a and
> probably x86 as well) won't be able to support this as they at best support 64-bit
> CAS (ARMv7a has LDREXD/STREXD). So you are essentially putting a 64-bit (and
> 128-bit CAS) requirement on the implementation.
> 

Yes, I am. I tried to make that clear in the cover letter.

> >
> > >
> > >
> > > >
> > > > +};
> > > > +
> > > > +/* The non-blocking ring algorithm is based on the original rte
> > > > +ring (derived
> > > > + * from FreeBSD's bufring.h) and inspired by Michael and Scott's
> > > > +non-blocking
> > > > + * concurrent queue.
> > > > + */
> > > > +
> > > > +/**
> > > > + * @internal
> > > > + *   Enqueue several objects on the non-blocking ring
> > > > +(single-producer only)
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_table
> > > > + *   A pointer to a table of void * pointers (objects).
> > > > + * @param n
> > > > + *   The number of objects to add in the ring from the obj_table.
> > > > + * @param behavior
> > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to
> > > > +the ring
> > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > > > +to the ring
> > > > + * @param free_space
> > > > + *   returns the amount of space after the enqueue operation has
> > > > +finished
> > > > + * @return
> > > > + *   Actual number of objects enqueued.
> > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const *obj_table,
> > > > +			    unsigned int n,
> > > > +			    enum rte_ring_queue_behavior behavior,
> > > > +			    unsigned int *free_space)
> > > > +{
> > > > +	uint32_t free_entries;
> > > > +	size_t head, next;
> > > > +
> > > > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > > > +					 &head, &next, &free_entries);
> > > > +	if (n == 0)
> > > > +		goto end;
> > > > +
> > > > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > > +
> > > > +	r->prod_64.tail += n;
> > > Don't we need release order when (or smp_wmb between) writing of the
> > > ring pointers and the update of tail? By updating the tail pointer,
> > > we are synchronising with a consumer.
> > >
> > > I prefer using __atomic operations even for load and store. You can
> > > see which parts of the code that synchronise with each other, e.g.
> > > store-release to some location synchronises with load-acquire from
> > > the same location. If you don't know how different threads
> > > synchronise with each other, you are very likely to make mistakes.
> > >
> > You can tell this code was written when I thought x86-64 was the only
> > viable target :). Yes, you are correct.
> >
> > With regards to using __atomic intrinsics, I'm planning on taking a
> > similar approach to the functions duplicated in rte_ring_generic.h and
> > rte_ring_c11_mem.h: one version that uses rte_atomic functions (and
> > thus stricter memory ordering) and one that uses __atomic intrinsics
> > (and thus can benefit from more relaxed memory ordering).
> What's the advantage of having two different implementations? What is the
> disadvantage?
> 
> The existing ring buffer code originally had only the "legacy" implementation
> which was kept when the __atomic implementation was added. The reason
> claimed was that some older compilers for x86 do not support GCC __atomic
> builtins. But I thought there was consensus that new functionality could have
> only __atomic implementations.
> 

When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left disabled for thunderx[1] for performance reasons. Assuming that hasn't changed, the advantage to having two versions is to best support all of DPDK's platforms. The disadvantage is of course duplicated code and the additional maintenance burden.

That said, if the thunderx maintainers are ok with it, I'm certainly open to only doing the __atomic version. Note that even in the __atomic version, based on Honnapa's findings[2], using a DPDK-defined rte_atomic128_cmpset() (with additional arguments to support machines with weak consistency) appears to be a better option than __atomic_compare_exchange_16.
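To make "additional arguments" a bit more concrete, I'm picturing something like the following (hypothetical prototype only, not an existing API):

	/* 128-bit compare-and-set taking explicit success/failure memory
	 * orders, so weakly ordered machines aren't forced to full barriers.
	 */
	static inline int
	rte_atomic128_cmpset(volatile rte_int128_t *dst, rte_int128_t *exp,
			     const rte_int128_t *src, unsigned int weak,
			     int success, int failure);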

I couldn't find the discussion about new functionality using __atomic going forward -- can you send a link?

[1] https://mails.dpdk.org/archives/dev/2017-December/082853.html
[2] http://mails.dpdk.org/archives/dev/2019-January/124002.html

> Does the non-blocking ring buffer implementation have to support these older
> compilers? Will the applications that require these older compiler be updated to
> utilise the non-blocking ring buffer?
> 

(See above -- compiler versions wasn't a consideration here.)

> >
> > >
> > > >
> > > > +
> > > > +end:
> > > > +	if (free_space != NULL)
> > > > +		*free_space = free_entries - n;
> > > > +	return n;
> > > > +}
> > > > +
> > > > +/**
> > > > + * @internal
> > > > + *   Enqueue several objects on the non-blocking ring
> > > > +(multi-producer
> > > > +safe)
> > > > + *
> > > > + * @param r
> > > > + *   A pointer to the ring structure.
> > > > + * @param obj_table
> > > > + *   A pointer to a table of void * pointers (objects).
> > > > + * @param n
> > > > + *   The number of objects to add in the ring from the obj_table.
> > > > + * @param behavior
> > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to
> > > > +the ring
> > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > > > +to the ring
> > > > + * @param free_space
> > > > + *   returns the amount of space after the enqueue operation has
> > > > +finished
> > > > + * @return
> > > > + *   Actual number of objects enqueued.
> > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > + */
> > > > +static __rte_always_inline unsigned int
> > > > +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const
> *obj_table,
> > > > +			    unsigned int n,
> > > > +			    enum rte_ring_queue_behavior behavior,
> > > > +			    unsigned int *free_space)
> > > > +{
> > > > +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> > > > +	RTE_SET_USED(r);
> > > > +	RTE_SET_USED(obj_table);
> > > > +	RTE_SET_USED(n);
> > > > +	RTE_SET_USED(behavior);
> > > > +	RTE_SET_USED(free_space);
> > > > +#ifndef ALLOW_EXPERIMENTAL_API
> > > > +	printf("[%s()] RING_F_NB requires an experimental API."
> > > > +	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> > > > +	       , __func__);
> > > > +#endif
> > > > +	return 0;
> > > > +#endif
> > > > +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> > > > +	size_t head, next, tail;
> > > > +	uint32_t free_entries;
> > > > +	unsigned int i;
> > > > +
> > > > +	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> > > > +					 &head, &next, &free_entries);
> > > > +	if (n == 0)
> > > > +		goto end;
> > > > +
> > > > +	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
> > > > +		struct nb_ring_entry old_value, new_value;
> > > > +		struct nb_ring_entry *ring_ptr;
> > > > +
> > > > +		/* Enqueue to the tail entry. If another thread wins the
> > > > race,
> > > > +		 * retry with the new tail.
> > > > +		 */
> > > > +		tail = r->prod_64.tail;
> > > > +
> > > > +		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r-
> > > > >mask];
> > > This is an ugly expression and cast. Also I think it is unnecessary.
> > > What's preventing this from being written without a cast? Perhaps
> > > the ring array needs to be a union of "void *" and struct
> > > nb_ring_entry?
> > The cast is necessary for the correct pointer arithmetic (let
> > "uintptr_t base == &r[1]"):
> Yes I know the C language.
> 
> > - With cast: ring_ptr = base + sizeof(struct nb_ring_entry) * (tail &
> > r-
> > >mask);
> > - W/o cast: ring_ptr = base + sizeof(struct rte_ring) * (tail &
> > r->mask);
> >
> > FWIW, this is essentially the same as is done with the second argument
> > (&r[1]) to ENQUEUE_PTRS and DEQUEUE_PTRS, but there it's split across
> > multiple lines of code. The equivalent here would be:
> >
> > struct nb_ring_entry *ring_base = (struct nb_ring_entry*)&r[1];
> > ring_ptr = ring_base[tail & r->mask];
> >
> > Which is more legible, I think.
> The RTE ring buffer code is not very legible to start with.
> 
> >
> > There is no ring array structure in which to add a union; the ring
> > array is a contiguous chunk of memory that immediately follows after
> > the end of a struct rte_ring. We interpret the memory there according
> > to the ring entry data type (void * for regular rings and struct nb_ring_entry for
> non-blocking rings).
> My worry is that we are abusing the C language and creating a monster of
> fragile C code that will be more and more difficult to understand and to
> maintain. At some point you have to think the question "Are we doing the right
> thing?".
>

I'm not aware of any fragility/maintainability issues in the ring code (though perhaps the maintainers have a different view!), and personally I find the code fairly legible. If you have a specific suggestion, I'll look into incorporating it.

Thanks,
Gage

</snip>
  
Ola Liljedahl Jan. 28, 2019, 10:35 a.m. UTC | #6
On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> 
> > 
> > -----Original Message-----
> > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > Sent: Wednesday, January 23, 2019 4:16 AM
> > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > arybchenko@solarflare.com; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > implementation
> > 
> > On Tue, 2019-01-22 at 21:31 +0000, Eads, Gage wrote:
> > > 
> > > Hi Ola,
> > > 
> > > <snip>
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > > > > rte_ring *r);
> > > > >  #endif
> > > > >  #include "rte_ring_generic_64.h"
> > > > > 
> > > > > +/* @internal 128-bit structure used by the non-blocking ring */
> > > > > +struct nb_ring_entry {
> > > > > +	void *ptr; /**< Data pointer */
> > > > > +	uint64_t cnt; /**< Modification counter */
> > > > Why not make 'cnt' uintptr_t? This way 32-bit architectures will
> > > > also be supported. I think there are some claims that DPDK still
> > > > supports e.g.
> > > > ARMv7a
> > > > and possibly also 32-bit x86?
> > > I chose a 64-bit modification counter because (practically speaking)
> > > the ABA problem will not occur with such a large counter -- definitely
> > > not within my lifetime. See the "Discussion" section of the commit
> > > message for more information.
> > > 
> > > With a 32-bit counter, there is a very (very) low likelihood of it,
> > > but it is possible. Personally, I don't feel comfortable providing
> > > such code, because a) I doubt all users would understand the
> > > implementation well enough to do the risk/reward analysis, and b) such
> > > a bug would be near impossible to reproduce and root-cause if it did
> > > occur.
> > With a 64-bit counter (and 32-bit pointer), 32-bit architectures (e.g.
> > ARMv7a and
> > probably x86 as well) won't be able to support this as they at best support
> > 64-bit
> > CAS (ARMv7a has LDREXD/STREXD). So you are essentially putting a 64-bit (and
> > 128-bit CAS) requirement on the implementation.
> > 
> Yes, I am. I tried to make that clear in the cover letter.
> 
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > +};
> > > > > +
> > > > > +/* The non-blocking ring algorithm is based on the original rte
> > > > > +ring (derived
> > > > > + * from FreeBSD's bufring.h) and inspired by Michael and Scott's
> > > > > +non-blocking
> > > > > + * concurrent queue.
> > > > > + */
> > > > > +
> > > > > +/**
> > > > > + * @internal
> > > > > + *   Enqueue several objects on the non-blocking ring
> > > > > +(single-producer only)
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_table
> > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > + * @param n
> > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > + * @param behavior
> > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to
> > > > > +the ring
> > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > > > > +to the ring
> > > > > + * @param free_space
> > > > > + *   returns the amount of space after the enqueue operation has
> > > > > +finished
> > > > > + * @return
> > > > > + *   Actual number of objects enqueued.
> > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > + */
> > > > > +static __rte_always_inline unsigned int
> > > > > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const
> > > > > *obj_table,
> > > > > +			    unsigned int n,
> > > > > +			    enum rte_ring_queue_behavior behavior,
> > > > > +			    unsigned int *free_space)
> > > > > +{
> > > > > +	uint32_t free_entries;
> > > > > +	size_t head, next;
> > > > > +
> > > > > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > > > > +					 &head, &next,
> > > > > &free_entries);
> > > > > +	if (n == 0)
> > > > > +		goto end;
> > > > > +
> > > > > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > > > +
> > > > > +	r->prod_64.tail += n;
> > > > Don't we need release order when (or smp_wmb between) writing of the
> > > > ring pointers and the update of tail? By updating the tail pointer,
> > > > we are synchronising with a consumer.
> > > > 
> > > > I prefer using __atomic operations even for load and store. You can
> > > > see which parts of the code that synchronise with each other, e.g.
> > > > store-release to some location synchronises with load-acquire from
> > > > the same location. If you don't know how different threads
> > > > synchronise with each other, you are very likely to make mistakes.
> > > > 
> > > You can tell this code was written when I thought x86-64 was the only
> > > viable target :). Yes, you are correct.
> > > 
> > > With regards to using __atomic intrinsics, I'm planning on taking a
> > > similar approach to the functions duplicated in rte_ring_generic.h and
> > > rte_ring_c11_mem.h: one version that uses rte_atomic functions (and
> > > thus stricter memory ordering) and one that uses __atomic intrinsics
> > > (and thus can benefit from more relaxed memory ordering).
From a code point of view, I strongly prefer the atomic operations to be visible
in the top-level code, not hidden in subroutines. For correctness, it is vital
that memory accesses are performed with the required ordering and that acquire
and release match up. Hiding e.g. load-acquire and store-release in
subroutines (in a different file!) makes this difficult. There have already been
such bugs found in rte_ring.
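
As a minimal, self-contained sketch of what "visible in the top-level code"
means (hypothetical names, not the actual rte_ring code): the store-release in
the producer and the load-acquire in the consumer target the same location, and
keeping both in the top-level code makes it easy to check that they match up.

#include <stdint.h>
#include <stddef.h>

#define DEMO_RING_SIZE 1024	/* power of two */

struct demo_ring {
	uint64_t prod_tail;	/* written by producer, read by consumer */
	void *slots[DEMO_RING_SIZE];
};

/* Producer: fill the slot, then publish it with a store-release. */
static inline void
demo_publish(struct demo_ring *r, uint64_t head, void *obj)
{
	r->slots[head & (DEMO_RING_SIZE - 1)] = obj;
	__atomic_store_n(&r->prod_tail, head + 1, __ATOMIC_RELEASE);
}

/* Consumer: the load-acquire of prod_tail synchronizes with the store-release
 * above, so the slot contents are guaranteed visible whenever head != tail.
 */
static inline void *
demo_consume(struct demo_ring *r, uint64_t head)
{
	uint64_t tail = __atomic_load_n(&r->prod_tail, __ATOMIC_ACQUIRE);

	if (head == tail)
		return NULL;	/* ring empty */
	return r->slots[head & (DEMO_RING_SIZE - 1)];
}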

> > What's the advantage of having two different implementations? What is the
> > disadvantage?
> > 
> > The existing ring buffer code originally had only the "legacy"
> > implementation
> > which was kept when the __atomic implementation was added. The reason
> > claimed was that some older compilers for x86 do not support GCC __atomic
> > builtins. But I thought there was consensus that new functionality could
> > have
> > only __atomic implementations.
> > 
> When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left disabled
> for thunderx[1] for performance reasons. Assuming that hasn't changed, the
> advantage to having two versions is to best support all of DPDK's platforms.
> The disadvantage is of course duplicated code and the additional maintenance
> burden.
The only way I see that a C11 memory model implementation can be slower than
using smp_wmb/rmb is if you need to order loads before a synchronizing store and
there are also outstanding stores which do not require ordering. smp_rmb()
handles this while store-release will also (unnecessarily) order those
outstanding stores. This situation occurs e.g. in ring buffer dequeue operations
where ring slots are read (and possibly written to thread-private memory) before
the ring slots are released (e.g. using CAS-release or store-release).

I imagine that the LSU/cache subsystem on ThunderX/OCTEON-TX also has something
to do with this problem. If there is a large number of stores pending in the
load/store unit, store-release might have to wait for a long time before the
synchronizing store can complete.
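
To make this concrete, a sketch of the two dequeue-side tail updates (field
layout simplified; this is not the actual ring code):

#include <stdint.h>
#include <rte_atomic.h>		/* rte_smp_rmb() */

/* Barrier style: a read barrier orders the slot loads, then a plain store
 * publishes the new tail; unrelated pending stores are left unordered.
 */
static inline void *
dequeue_one_barrier(void **slots, uint32_t idx, volatile uint64_t *tail,
		    uint64_t new_tail)
{
	void *obj = slots[idx];		/* critical section: reads only */

	rte_smp_rmb();
	*tail = new_tail;
	return obj;
}

/* C11 style: the store-release also orders any earlier *stores* (e.g. copies
 * into thread-private memory), which is unnecessary when the critical section
 * only reads the ring.
 */
static inline void *
dequeue_one_c11(void **slots, uint32_t idx, uint64_t *tail, uint64_t new_tail)
{
	void *obj = slots[idx];

	__atomic_store_n(tail, new_tail, __ATOMIC_RELEASE);
	return obj;
}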

> 
> That said, if the thunderx maintainers are ok with it, I'm certainly open to
> only doing the __atomic version. Note that even in the __atomic version, based
> on Honnapa's findings[2], using a DPDK-defined rte_atomic128_cmpset() (with
> additional arguments to support machines with weak consistency) appears to be
> a better option than __atomic_compare_exchange_16.
__atomic_compare_exchange_16() is not guaranteed to be lock-free. It is not
lock-free on ARM/AArch64 and the support in GCC is formally broken (can't use
cmpxchg16b to implement __atomic_load_16).

So yes, I think DPDK will have to define and implement the 128-bit atomic
compare and exchange operation (whatever it will be called). For compatibility
with ARMv8.0, we can't require the "old" value returned by a failed compare-
exchange operation to be read atomically (LDXP does not guarantee atomicity by
itself). But this is seldom a problem: many designs read the memory location
using two separate 64-bit loads (so not atomic) anyway; it is a successful
atomic compare-exchange operation which provides atomicity.
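
To make the last point concrete, a sketch of a producer-side retry loop that
tolerates a torn read of the old value (hypothetical names; the 128-bit CAS
primitive is only declared here, since the final DPDK API did not exist yet):

#include <stdint.h>

struct entry128 {
	void *ptr;
	uint64_t cnt;
};

/* Assumed primitive: atomically replace *slot with *want iff it still equals
 * *seen; returns nonzero on success.
 */
int cas128(struct entry128 *slot, const struct entry128 *seen,
	   const struct entry128 *want);

static int
try_claim_slot(struct entry128 *slot, uint64_t expected_cnt, void *obj,
	       uint64_t new_cnt)
{
	struct entry128 seen, want;

	do {
		/* Two separate 64-bit loads; not atomic, and that is fine: a
		 * torn snapshot only makes the CAS below fail and the loop
		 * retry.
		 */
		seen.ptr = slot->ptr;
		seen.cnt = slot->cnt;
		if (seen.cnt != expected_cnt)
			return 0;	/* slot already claimed by someone else */
		want.ptr = obj;
		want.cnt = new_cnt;
	} while (!cas128(slot, &seen, &want));

	return 1;
}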

> 
> I couldn't find the discussion about new functionality using __atomic going
> forward -- can you send a link?
> 
> [1] https://mails.dpdk.org/archives/dev/2017-December/082853.html
> [2] http://mails.dpdk.org/archives/dev/2019-January/124002.html
> 
> > 
> > Does the non-blocking ring buffer implementation have to support these older
> > compilers? Will the applications that require these older compiler be
> > updated to
> > utilise the non-blocking ring buffer?
> > 
> (See above -- compiler versions wasn't a consideration here.)
> 
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > +
> > > > > +end:
> > > > > +	if (free_space != NULL)
> > > > > +		*free_space = free_entries - n;
> > > > > +	return n;
> > > > > +}
> > > > > +
> > > > > +/**
> > > > > + * @internal
> > > > > + *   Enqueue several objects on the non-blocking ring
> > > > > +(multi-producer
> > > > > +safe)
> > > > > + *
> > > > > + * @param r
> > > > > + *   A pointer to the ring structure.
> > > > > + * @param obj_table
> > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > + * @param n
> > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > + * @param behavior
> > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to
> > > > > +the ring
> > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible
> > > > > +to the ring
> > > > > + * @param free_space
> > > > > + *   returns the amount of space after the enqueue operation has
> > > > > +finished
> > > > > + * @return
> > > > > + *   Actual number of objects enqueued.
> > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > + */
> > > > > +static __rte_always_inline unsigned int
> > > > > +__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const
> > *obj_table,
> > > 
> > > > 
> > > > > 
> > > > > +			    unsigned int n,
> > > > > +			    enum rte_ring_queue_behavior behavior,
> > > > > +			    unsigned int *free_space)
> > > > > +{
> > > > > +#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
> > > > > +	RTE_SET_USED(r);
> > > > > +	RTE_SET_USED(obj_table);
> > > > > +	RTE_SET_USED(n);
> > > > > +	RTE_SET_USED(behavior);
> > > > > +	RTE_SET_USED(free_space);
> > > > > +#ifndef ALLOW_EXPERIMENTAL_API
> > > > > +	printf("[%s()] RING_F_NB requires an experimental API."
> > > > > +	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n"
> > > > > +	       , __func__);
> > > > > +#endif
> > > > > +	return 0;
> > > > > +#endif
> > > > > +#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
> > > > > +	size_t head, next, tail;
> > > > > +	uint32_t free_entries;
> > > > > +	unsigned int i;
> > > > > +
> > > > > +	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
> > > > > +					 &head, &next,
> > > > > &free_entries);
> > > > > +	if (n == 0)
> > > > > +		goto end;
> > > > > +
> > > > > +	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
> > > > > +		struct nb_ring_entry old_value, new_value;
> > > > > +		struct nb_ring_entry *ring_ptr;
> > > > > +
> > > > > +		/* Enqueue to the tail entry. If another thread wins
> > > > > the
> > > > > race,
> > > > > +		 * retry with the new tail.
> > > > > +		 */
> > > > > +		tail = r->prod_64.tail;
> > > > > +
> > > > > +		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r-
> > > > > > 
> > > > > > mask];
> > > > This is an ugly expression and cast. Also I think it is unnecessary.
> > > > What's preventing this from being written without a cast? Perhaps
> > > > the ring array needs to be a union of "void *" and struct
> > > > nb_ring_entry?
> > > The cast is necessary for the correct pointer arithmetic (let
> > > "uintptr_t base == &r[1]"):
> > Yes I know the C language.
> > 
> > > 
> > > - With cast: ring_ptr = base + sizeof(struct nb_ring_entry) * (tail &
> > > r-
> > > > 
> > > > mask);
> > > - W/o cast: ring_ptr = base + sizeof(struct rte_ring) * (tail &
> > > r->mask);
> > > 
> > > FWIW, this is essentially the same as is done with the second argument
> > > (&r[1]) to ENQUEUE_PTRS and DEQUEUE_PTRS, but there it's split across
> > > multiple lines of code. The equivalent here would be:
> > > 
> > > struct nb_ring_entry *ring_base = (struct nb_ring_entry*)&r[1];
> > > ring_ptr = ring_base[tail & r->mask];
> > > 
> > > Which is more legible, I think.
> > The RTE ring buffer code is not very legible to start with.
> > 
> > > 
> > > 
> > > There is no ring array structure in which to add a union; the ring
> > > array is a contiguous chunk of memory that immediately follows after
> > > the end of a struct rte_ring. We interpret the memory there according
> > > to the ring entry data type (void * for regular rings and struct
> > > nb_ring_entry for
> > non-blocking rings).
> > My worry is that we are abusing the C language and creating a monster of
> > fragile C code that will be more and more difficult to understand and to
> > maintain. At some point you have to think the question "Are we doing the
> > right
> > thing?".
> > 
> I'm not aware of any fragility/maintainability issues in the ring code (though
> perhaps the maintainers have a different view!), and personally I find the
> code fairly legible. If you have a specific suggestion, I'll look into
> incorporating it.
> 
> Thanks,
> Gage
> 
> </snip>
  
Jerin Jacob Kollanukkaran Jan. 28, 2019, 1:34 p.m. UTC | #7
On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > -----Original Message-----
> > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > Sent: Wednesday, January 23, 2019 4:16 AM
> > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > arybchenko@solarflare.com; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > implementation
> > 
> > s.
> > > > 
> > > You can tell this code was written when I thought x86-64 was the
> > > only
> > > viable target :). Yes, you are correct.
> > > 
> > > With regards to using __atomic intrinsics, I'm planning on taking
> > > a
> > > similar approach to the functions duplicated in
> > > rte_ring_generic.h and
> > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > (and
> > > thus stricter memory ordering) and one that uses __atomic
> > > intrinsics
> > > (and thus can benefit from more relaxed memory ordering).
> > What's the advantage of having two different implementations? What
> > is the
> > disadvantage?
> > 
> > The existing ring buffer code originally had only the "legacy"
> > implementation
> > which was kept when the __atomic implementation was added. The
> > reason
> > claimed was that some older compilers for x86 do not support GCC
> > __atomic
> > builtins. But I thought there was consensus that new functionality
> > could have
> > only __atomic implementations.
> > 
> 
> When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> disabled for thunderx[1] for performance reasons. Assuming that
> hasn't changed, the advantage to having two versions is to best
> support all of DPDK's platforms. The disadvantage is of course
> duplicated code and the additional maintenance burden.
> 
> That said, if the thunderx maintainers are ok with it, I'm certainly 

The ring code is such a fundamental building block for DPDK, there was a
difference in performance, and there was already legacy code, so
introducing C11_MEM_MODEL was justified IMO.

For the non-blocking implementation, I am happy to test with
three ARM64 microarchitectures and share the results for C11_MEM_MODEL
vs non-C11_MEM_MODEL performance. We may need to consider PPC here as
well. So IMO, based on the overall performance results, maybe we can
decide the new code direction.
  
Ola Liljedahl Jan. 28, 2019, 1:43 p.m. UTC | #8
On Mon, 2019-01-28 at 13:34 +0000, Jerin Jacob Kollanukkaran wrote:
> On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > 
> > > 
> > > -----Original Message-----
> > > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > > arybchenko@solarflare.com; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > implementation
> > > 
> > > s.
> > > > 
> > > > > 
> > > > > 
> > > > You can tell this code was written when I thought x86-64 was the
> > > > only
> > > > viable target :). Yes, you are correct.
> > > > 
> > > > With regards to using __atomic intrinsics, I'm planning on taking
> > > > a
> > > > similar approach to the functions duplicated in
> > > > rte_ring_generic.h and
> > > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > > (and
> > > > thus stricter memory ordering) and one that uses __atomic
> > > > intrinsics
> > > > (and thus can benefit from more relaxed memory ordering).
> > > What's the advantage of having two different implementations? What
> > > is the
> > > disadvantage?
> > > 
> > > The existing ring buffer code originally had only the "legacy"
> > > implementation
> > > which was kept when the __atomic implementation was added. The
> > > reason
> > > claimed was that some older compilers for x86 do not support GCC
> > > __atomic
> > > builtins. But I thought there was consensus that new functionality
> > > could have
> > > only __atomic implementations.
> > > 
> > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> > disabled for thunderx[1] for performance reasons. Assuming that
> > hasn't changed, the advantage to having two versions is to best
> > support all of DPDK's platforms. The disadvantage is of course
> > duplicated code and the additional maintenance burden.
> > 
> > That said, if the thunderx maintainers are ok with it, I'm certainly 
> The ring code was so fundamental building block for DPDK, there was 
> difference in performance and there was already legacy code so
> introducing C11_MEM_MODEL was justified IMO. 
> 
> For the nonblocking implementation, I am happy to test with
> three ARM64 microarchitectures and share the result with C11_MEM_MODEL
> vs non C11_MEM_MODLE performance.
We should ensure the C11 memory model version enforces minimal ordering
requirements:
1) When computing the number of available slots, allow for underflow (head and
tail observed in unexpected order) instead of imposing read order with an
additional read barrier.
2) We could cheat a little and use an explicit LoadStore barrier instead of
store-release/cas-release in dequeue (which only reads the ring). At least see
if this improves performance (a sketch of this follows below). See such a patch
here:
https://github.com/ARM-software/progress64/commit/84c48e9c84100eb5b2d15e54f0dbf78dfa468805

Ideally, C/C++ would have an __ATOMIC_RELEASE_READSONLY memory model to use in
situations where the shared data was only read before being released.
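
A minimal sketch of the point-2 cheat mentioned above (illustrative only; it
mirrors the idea in the linked commit, not its actual code). Since the dequeue
critical section only reads the ring, an acquire fence -- which GCC maps to
'dmb ishld' on AArch64, i.e. LoadLoad + LoadStore ordering -- followed by a
relaxed store can stand in for a store-release. Formally this is outside the
C11 model, hence "cheat":

#include <stdint.h>

static inline void
dequeue_publish_tail(uint64_t *tail, uint64_t new_tail)
{
	/* Order the preceding ring-slot loads before the tail store; this can
	 * be cheaper than a full store-release on some microarchitectures.
	 */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	__atomic_store_n(tail, new_tail, __ATOMIC_RELAXED);
}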

>  We may need to consider PPC also
> here. So IMO, based on the overall performance result may be can decide
> the new code direction.
Does PPC (64-bit POWER?) have support for double-word (128-bit) CAS?

>
  
Jerin Jacob Kollanukkaran Jan. 28, 2019, 2:04 p.m. UTC | #9
On Mon, 2019-01-28 at 13:43 +0000, Ola Liljedahl wrote:
> On Mon, 2019-01-28 at 13:34 +0000, Jerin Jacob Kollanukkaran wrote:
> > On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > > > -----Original Message-----
> > > > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > > > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > > > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > > > arybchenko@solarflare.com; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > > implementation
> > > > 
> > > > s.
> > > > > > 
> > > > > You can tell this code was written when I thought x86-64 was
> > > > > the
> > > > > only
> > > > > viable target :). Yes, you are correct.
> > > > > 
> > > > > With regards to using __atomic intrinsics, I'm planning on
> > > > > taking
> > > > > a
> > > > > similar approach to the functions duplicated in
> > > > > rte_ring_generic.h and
> > > > > rte_ring_c11_mem.h: one version that uses rte_atomic
> > > > > functions
> > > > > (and
> > > > > thus stricter memory ordering) and one that uses __atomic
> > > > > intrinsics
> > > > > (and thus can benefit from more relaxed memory ordering).
> > > > What's the advantage of having two different implementations?
> > > > What
> > > > is the
> > > > disadvantage?
> > > > 
> > > > The existing ring buffer code originally had only the "legacy"
> > > > implementation
> > > > which was kept when the __atomic implementation was added. The
> > > > reason
> > > > claimed was that some older compilers for x86 do not support
> > > > GCC
> > > > __atomic
> > > > builtins. But I thought there was consensus that new
> > > > functionality
> > > > could have
> > > > only __atomic implementations.
> > > > 
> > > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was
> > > left
> > > disabled for thunderx[1] for performance reasons. Assuming that
> > > hasn't changed, the advantage to having two versions is to best
> > > support all of DPDK's platforms. The disadvantage is of course
> > > duplicated code and the additional maintenance burden.
> > > 
> > > That said, if the thunderx maintainers are ok with it, I'm
> > > certainly 
> > The ring code was so fundamental building block for DPDK, there
> > was 
> > difference in performance and there was already legacy code so
> > introducing C11_MEM_MODEL was justified IMO. 
> > 
> > For the nonblocking implementation, I am happy to test with
> > three ARM64 microarchitectures and share the result with
> > C11_MEM_MODEL
> > vs non C11_MEM_MODLE performance.
> We should ensure the C11 memory model version enforces minimal
> ordering
> requirements:

I agree.

I think we should have enough test cases for performance measurement in
order to choose algorithms and quantify the other variables like C11 vs
non-C11, LDXP/STXP vs CASP, etc.
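
For example, a minimal sketch of the kind of measurement meant here (names and
parameters are illustrative, not the actual DPDK ring tests): time bulk
enqueue/dequeue pairs and report cycles per object, so C11 and non-C11 builds
can be compared on the same hardware.

#include <stdint.h>
#include <rte_cycles.h>
#include <rte_ring.h>

static double
ring_cycles_per_obj(struct rte_ring *r, void **objs, unsigned int burst,
		    unsigned int iters)
{
	uint64_t start = rte_rdtsc();
	unsigned int i;

	for (i = 0; i < iters; i++) {
		rte_ring_enqueue_bulk(r, objs, burst, NULL);
		rte_ring_dequeue_bulk(r, objs, burst, NULL);
	}

	return (double)(rte_rdtsc() - start) / ((double)iters * burst);
}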


> 1) when computing number of available slots, allow for underflow
> (head and tail
> observed in unexpected order) instead of imposing read order with an
> additional
> read barrier.
> 2) We could cheat a little and use an explicit LoadStore barrier
> instead of
>  store-release/cas-release in dequeue (which only reads the ring). At
> least see
> if this improves performance. See such a patch here:
> https://github.com/ARM-software/progress64/commit/84c48e9c84100eb5b2d15e54f0dbf7
> 8dfa468805
> 
> Ideally, C/C++ would have an __ATOMIC_RELEASE_READSONLY memory model
> to use in
> situations where the shared data was only read before being released.
> 
> >  We may need to consider PPC also
> > here. So IMO, based on the overall performance result may be can
> > decide
> > the new code direction.
> Does PPC (64-bit POWER?) have support for double-word (128-bit) CAS?

I don't know, I was referring to the C11 memory model in general for PPC.

> 
> -- 
> Ola Liljedahl, Networking System Architect, Arm
> Phone +46706866373, Skype ola.liljedahl
>
  
Ola Liljedahl Jan. 28, 2019, 2:06 p.m. UTC | #10
On Mon, 2019-01-28 at 14:04 +0000, Jerin Jacob Kollanukkaran wrote:
> > Does PPC (64-bit POWER?) have support for double-word (128-bit) CAS?
> 
> I dont know, I was telling wrt in general C11 mem model for PPC.
Sorry, I misunderstood.
  
Eads, Gage Jan. 28, 2019, 6:54 p.m. UTC | #11
> -----Original Message-----
> From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> Sent: Monday, January 28, 2019 4:36 AM
> To: jerinj@marvell.com; mczekaj@marvell.com; Eads, Gage
> <gage.eads@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> arybchenko@solarflare.com; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
> 
> On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> >
> > >
> > > -----Original Message-----
> > > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > > arybchenko@solarflare.com; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > implementation
> > >
> > > On Tue, 2019-01-22 at 21:31 +0000, Eads, Gage wrote:
> > > >
> > > > Hi Ola,
> > > >
> > > > <snip>
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > > > > > rte_ring *r);
> > > > > >  #endif
> > > > > >  #include "rte_ring_generic_64.h"
> > > > > >
> > > > > > +/* @internal 128-bit structure used by the non-blocking ring
> > > > > > +*/ struct nb_ring_entry {
> > > > > > +	void *ptr; /**< Data pointer */
> > > > > > +	uint64_t cnt; /**< Modification counter */
> > > > > Why not make 'cnt' uintptr_t? This way 32-bit architectures will
> > > > > also be supported. I think there are some claims that DPDK still
> > > > > supports e.g.
> > > > > ARMv7a
> > > > > and possibly also 32-bit x86?
> > > > I chose a 64-bit modification counter because (practically
> > > > speaking) the ABA problem will not occur with such a large counter
> > > > -- definitely not within my lifetime. See the "Discussion" section
> > > > of the commit message for more information.
> > > >
> > > > With a 32-bit counter, there is a very (very) low likelihood of
> > > > it, but it is possible. Personally, I don't feel comfortable
> > > > providing such code, because a) I doubt all users would understand
> > > > the implementation well enough to do the risk/reward analysis, and
> > > > b) such a bug would be near impossible to reproduce and root-cause
> > > > if it did occur.
> > > With a 64-bit counter (and 32-bit pointer), 32-bit architectures (e.g.
> > > ARMv7a and
> > > probably x86 as well) won't be able to support this as they at best
> > > support 64-bit CAS (ARMv7a has LDREXD/STREXD). So you are
> > > essentially putting a 64-bit (and 128-bit CAS) requirement on the
> > > implementation.
> > >
> > Yes, I am. I tried to make that clear in the cover letter.
> >
> > >
> > > >
> > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > +};
> > > > > > +
> > > > > > +/* The non-blocking ring algorithm is based on the original
> > > > > > +rte ring (derived
> > > > > > + * from FreeBSD's bufring.h) and inspired by Michael and
> > > > > > +Scott's non-blocking
> > > > > > + * concurrent queue.
> > > > > > + */
> > > > > > +
> > > > > > +/**
> > > > > > + * @internal
> > > > > > + *   Enqueue several objects on the non-blocking ring
> > > > > > +(single-producer only)
> > > > > > + *
> > > > > > + * @param r
> > > > > > + *   A pointer to the ring structure.
> > > > > > + * @param obj_table
> > > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > > + * @param n
> > > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > > + * @param behavior
> > > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items
> > > > > > +to the ring
> > > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as
> > > > > > +possible to the ring
> > > > > > + * @param free_space
> > > > > > + *   returns the amount of space after the enqueue operation
> > > > > > +has finished
> > > > > > + * @return
> > > > > > + *   Actual number of objects enqueued.
> > > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
> > > > > > + */
> > > > > > +static __rte_always_inline unsigned int
> > > > > > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const
> > > > > > *obj_table,
> > > > > > +			    unsigned int n,
> > > > > > +			    enum rte_ring_queue_behavior behavior,
> > > > > > +			    unsigned int *free_space) {
> > > > > > +	uint32_t free_entries;
> > > > > > +	size_t head, next;
> > > > > > +
> > > > > > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > > > > > +					 &head, &next,
> > > > > > &free_entries);
> > > > > > +	if (n == 0)
> > > > > > +		goto end;
> > > > > > +
> > > > > > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > > > > +
> > > > > > +	r->prod_64.tail += n;
> > > > > Don't we need release order when (or smp_wmb between) writing of
> > > > > the ring pointers and the update of tail? By updating the tail
> > > > > pointer, we are synchronising with a consumer.
> > > > >
> > > > > I prefer using __atomic operations even for load and store. You
> > > > > can see which parts of the code that synchronise with each other, e.g.
> > > > > store-release to some location synchronises with load-acquire
> > > > > from the same location. If you don't know how different threads
> > > > > synchronise with each other, you are very likely to make mistakes.
> > > > >
> > > > You can tell this code was written when I thought x86-64 was the
> > > > only viable target :). Yes, you are correct.
> > > >
> > > > With regards to using __atomic intrinsics, I'm planning on taking
> > > > a similar approach to the functions duplicated in
> > > > rte_ring_generic.h and
> > > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > > (and thus stricter memory ordering) and one that uses __atomic
> > > > intrinsics (and thus can benefit from more relaxed memory ordering).
> From a code point of view, I strongly prefer the atomic operations to be visible
> in the top level code, not hidden in subroutines. For correctness, it is vital that
> memory accesses are performed with the required ordering and that acquire and
> release matches up. Hiding e.g. load-acquire and store-release in subroutines (in
> a different file!) make this difficult. There have already been such bugs found in
> rte_ring.
> 

After working on the acq/rel ordering this weekend, I agree. This'll be easier/cleaner if we end up only using the C11 version.

> > > What's the advantage of having two different implementations? What
> > > is the disadvantage?
> > >
> > > The existing ring buffer code originally had only the "legacy"
> > > implementation
> > > which was kept when the __atomic implementation was added. The
> > > reason claimed was that some older compilers for x86 do not support
> > > GCC __atomic builtins. But I thought there was consensus that new
> > > functionality could have only __atomic implementations.
> > >
> > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> > disabled for thunderx[1] for performance reasons. Assuming that hasn't
> > changed, the advantage to having two versions is to best support all of DPDK's
> platforms.
> > The disadvantage is of course duplicated code and the additional
> > maintenance burden.
> The only way I see that a C11 memory model implementation can be slower
> than using smp_wmb/rmb is if you need to order loads before a synchronizing
> store and there are also outstanding stores which do not require ordering.
> smp_rmb() handles this while store-release will also (unnecessarily) order those
> outstanding stores. This situation occurs e.g. in ring buffer dequeue operations
> where ring slots are read (and possibly written to thread-private memory) before
> the ring slots are release (e.g. using CAS-release or store-release).
> 
> I imagine that the LSU/cache subsystem on ThunderX/OCTEON-TX also have
> something to do with this problem. If there are a large amounts of stores
> pending in the load/store unit, store-release might have to wait for a long time
> before the synchronizing store can complete.
> 
> >
> > That said, if the thunderx maintainers are ok with it, I'm certainly
> > open to only doing the __atomic version. Note that even in the
> > __atomic version, based on Honnapa's findings[2], using a DPDK-defined
> > rte_atomic128_cmpset() (with additional arguments to support machines
> > with weak consistency) appears to be a better option than
> __atomic_compare_exchange_16.
> __atomic_compare_exchange_16() is not guaranteed to be lock-free. It is not
> lock-free on ARM/AArch64 and the support in GCC is formally broken (can't use
> cmpexchg16b to implement __atomic_load_16).
> 
> So yes, I think DPDK will have to define and implement the 128-bit atomic
> compare and exchange operation (whatever it will be called). For compatibility
> with ARMv8.0, we can't require the "old" value returned by a failed compare-
> exchange operation to be read atomically (LDXP does not guaranteed atomicity
> by itself). But this is seldom a problem, many designs read the memory location
> using two separate 64-bit loads (so not atomic) anyway, it is a successful atomic
> compare exchange operation which provides atomicity.
> 

Ok. I agree, I don't expect that to be a problem. The 128-bit CAS patch I just submitted[1] (which was developed before reading this) will have to be changed.

[1] http://mails.dpdk.org/archives/dev/2019-January/124159.html

Thanks,
Gage

</snip>
  
Eads, Gage Jan. 28, 2019, 6:59 p.m. UTC | #12
> -----Original Message-----
> From: Jerin Jacob Kollanukkaran [mailto:jerinj@marvell.com]
> Sent: Monday, January 28, 2019 7:34 AM
> To: Ola.Liljedahl@arm.com; Maciej Czekaj <mczekaj@marvell.com>; Eads, Gage
> <gage.eads@intel.com>; dev@dpdk.org
> Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd@arm.com;
> Richardson, Bruce <bruce.richardson@intel.com>; arybchenko@solarflare.com;
> Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking implementation
> 
> On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > > -----Original Message-----
> > > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > > arybchenko@solarflare.com; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > implementation
> > >
> > > s.
> > > > >
> > > > You can tell this code was written when I thought x86-64 was the
> > > > only viable target :). Yes, you are correct.
> > > >
> > > > With regards to using __atomic intrinsics, I'm planning on taking
> > > > a similar approach to the functions duplicated in
> > > > rte_ring_generic.h and
> > > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > > (and thus stricter memory ordering) and one that uses __atomic
> > > > intrinsics (and thus can benefit from more relaxed memory
> > > > ordering).
> > > What's the advantage of having two different implementations? What
> > > is the disadvantage?
> > >
> > > The existing ring buffer code originally had only the "legacy"
> > > implementation
> > > which was kept when the __atomic implementation was added. The
> > > reason claimed was that some older compilers for x86 do not support
> > > GCC __atomic builtins. But I thought there was consensus that new
> > > functionality could have only __atomic implementations.
> > >
> >
> > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> > disabled for thunderx[1] for performance reasons. Assuming that hasn't
> > changed, the advantage to having two versions is to best support all
> > of DPDK's platforms. The disadvantage is of course duplicated code and
> > the additional maintenance burden.
> >
> > That said, if the thunderx maintainers are ok with it, I'm certainly
> 
> The ring code was so fundamental building block for DPDK, there was difference
> in performance and there was already legacy code so introducing
> C11_MEM_MODEL was justified IMO.
> 
> For the nonblocking implementation, I am happy to test with three ARM64
> microarchitectures and share the result with C11_MEM_MODEL vs non
> C11_MEM_MODLE performance. We may need to consider PPC also here. So
> IMO, based on the overall performance result may be can decide the new code
> direction.

Appreciate the help. Please hold off any testing until we've had a chance to incorporate ideas from lfring, which will definitely affect performance.
  
Ola Liljedahl Jan. 28, 2019, 10:31 p.m. UTC | #13
On Mon, 2019-01-28 at 18:54 +0000, Eads, Gage wrote:
> 
> > 
> > -----Original Message-----
> > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > Sent: Monday, January 28, 2019 4:36 AM
> > To: jerinj@marvell.com; mczekaj@marvell.com; Eads, Gage
> > <gage.eads@intel.com>; dev@dpdk.org
> > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > arybchenko@solarflare.com; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > implementation
> > 
> > On Fri, 2019-01-25 at 17:21 +0000, Eads, Gage wrote:
> > > 
> > > 
> > > > 
> > > > 
> > > > -----Original Message-----
> > > > From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
> > > > Sent: Wednesday, January 23, 2019 4:16 AM
> > > > To: Eads, Gage <gage.eads@intel.com>; dev@dpdk.org
> > > > Cc: olivier.matz@6wind.com; stephen@networkplumber.org; nd
> > > > <nd@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> > > > arybchenko@solarflare.com; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v3 2/5] ring: add a non-blocking
> > > > implementation
> > > > 
> > > > On Tue, 2019-01-22 at 21:31 +0000, Eads, Gage wrote:
> > > > > 
> > > > > 
> > > > > Hi Ola,
> > > > > 
> > > > > <snip>
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > @@ -331,6 +433,319 @@ void rte_ring_dump(FILE *f, const struct
> > > > > > > rte_ring *r);
> > > > > > >  #endif
> > > > > > >  #include "rte_ring_generic_64.h"
> > > > > > > 
> > > > > > > +/* @internal 128-bit structure used by the non-blocking ring
> > > > > > > +*/ struct nb_ring_entry {
> > > > > > > +	void *ptr; /**< Data pointer */
> > > > > > > +	uint64_t cnt; /**< Modification counter */
> > > > > > Why not make 'cnt' uintptr_t? This way 32-bit architectures will
> > > > > > also be supported. I think there are some claims that DPDK still
> > > > > > supports e.g.
> > > > > > ARMv7a
> > > > > > and possibly also 32-bit x86?
> > > > > I chose a 64-bit modification counter because (practically
> > > > > speaking) the ABA problem will not occur with such a large counter
> > > > > -- definitely not within my lifetime. See the "Discussion" section
> > > > > of the commit message for more information.
> > > > > 
> > > > > With a 32-bit counter, there is a very (very) low likelihood of
> > > > > it, but it is possible. Personally, I don't feel comfortable
> > > > > providing such code, because a) I doubt all users would understand
> > > > > the implementation well enough to do the risk/reward analysis, and
> > > > > b) such a bug would be near impossible to reproduce and root-cause
> > > > > if it did occur.
> > > > With a 64-bit counter (and 32-bit pointer), 32-bit architectures (e.g.
> > > > ARMv7a and
> > > > probably x86 as well) won't be able to support this as they at best
> > > > support 64-bit CAS (ARMv7a has LDREXD/STREXD). So you are
> > > > essentially putting a 64-bit (and 128-bit CAS) requirement on the
> > > > implementation.
> > > > 
> > > Yes, I am. I tried to make that clear in the cover letter.
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > +};
> > > > > > > +
> > > > > > > +/* The non-blocking ring algorithm is based on the original
> > > > > > > +rte ring (derived
> > > > > > > + * from FreeBSD's bufring.h) and inspired by Michael and
> > > > > > > +Scott's non-blocking
> > > > > > > + * concurrent queue.
> > > > > > > + */
> > > > > > > +
> > > > > > > +/**
> > > > > > > + * @internal
> > > > > > > + *   Enqueue several objects on the non-blocking ring
> > > > > > > +(single-producer only)
> > > > > > > + *
> > > > > > > + * @param r
> > > > > > > + *   A pointer to the ring structure.
> > > > > > > + * @param obj_table
> > > > > > > + *   A pointer to a table of void * pointers (objects).
> > > > > > > + * @param n
> > > > > > > + *   The number of objects to add in the ring from the obj_table.
> > > > > > > + * @param behavior
> > > > > > > + *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items
> > > > > > > +to the ring
> > > > > > > + *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as
> > > > > > > +possible to the ring
> > > > > > > + * @param free_space
> > > > > > > + *   returns the amount of space after the enqueue operation
> > > > > > > +has finished
> > > > > > > + * @return
> > > > > > > + *   Actual number of objects enqueued.
> > > > > > > + *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n
> > > > > > > only.
> > > > > > > + */
> > > > > > > +static __rte_always_inline unsigned int
> > > > > > > +__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const
> > > > > > > *obj_table,
> > > > > > > +			    unsigned int n,
> > > > > > > +			    enum rte_ring_queue_behavior
> > > > > > > behavior,
> > > > > > > +			    unsigned int *free_space) {
> > > > > > > +	uint32_t free_entries;
> > > > > > > +	size_t head, next;
> > > > > > > +
> > > > > > > +	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
> > > > > > > +					 &head, &next,
> > > > > > > &free_entries);
> > > > > > > +	if (n == 0)
> > > > > > > +		goto end;
> > > > > > > +
> > > > > > > +	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
> > > > > > > +
> > > > > > > +	r->prod_64.tail += n;
> > > > > > Don't we need release order when (or smp_wmb between) writing of
> > > > > > the ring pointers and the update of tail? By updating the tail
> > > > > > pointer, we are synchronising with a consumer.
> > > > > > 
> > > > > > I prefer using __atomic operations even for load and store. You
> > > > > > can see which parts of the code that synchronise with each other,
> > > > > > e.g.
> > > > > > store-release to some location synchronises with load-acquire
> > > > > > from the same location. If you don't know how different threads
> > > > > > synchronise with each other, you are very likely to make mistakes.
> > > > > > 
> > > > > You can tell this code was written when I thought x86-64 was the
> > > > > only viable target :). Yes, you are correct.
> > > > > 
> > > > > With regards to using __atomic intrinsics, I'm planning on taking
> > > > > a similar approach to the functions duplicated in
> > > > > rte_ring_generic.h and
> > > > > rte_ring_c11_mem.h: one version that uses rte_atomic functions
> > > > > (and thus stricter memory ordering) and one that uses __atomic
> > > > > intrinsics (and thus can benefit from more relaxed memory ordering).
> > From a code point of view, I strongly prefer the atomic operations to be
> > visible
> > in the top level code, not hidden in subroutines. For correctness, it is
> > vital that
> > memory accesses are performed with the required ordering and that acquire
> > and
> > release matches up. Hiding e.g. load-acquire and store-release in
> > subroutines (in
> > a different file!) make this difficult. There have already been such bugs
> > found in
> > rte_ring.
> > 
> After working on the acq/rel ordering this weekend, I agree. This'll be
> easier/cleaner if we end up only using the C11 version.
Fabulous!

As I wrote in a response to Jerin, with a small cheat (LoadStore fence+store-
relaxed instead of store-release in the dequeue function where we only read
shared data in the critical section), C11 should provide the same ordering and
thus the same performance as the explicit barrier version. Benchmarking will
show.

> 
> > 
> > > 
> > > > 
> > > > What's the advantage of having two different implementations? What
> > > > is the disadvantage?
> > > > 
> > > > The existing ring buffer code originally had only the "legacy"
> > > > implementation
> > > > which was kept when the __atomic implementation was added. The
> > > > reason claimed was that some older compilers for x86 do not support
> > > > GCC __atomic builtins. But I thought there was consensus that new
> > > > functionality could have only __atomic implementations.
> > > > 
> > > When CONFIG_RTE_RING_USE_C11_MEM_MODEL was introduced, it was left
> > > disabled for thunderx[1] for performance reasons. Assuming that hasn't
> > > changed, the advantage to having two versions is to best support all of
> > > DPDK's
> > platforms.
> > > 
> > > The disadvantage is of course duplicated code and the additional
> > > maintenance burden.
> > The only way I see that a C11 memory model implementation can be slower
> > than using smp_wmb/rmb is if you need to order loads before a synchronizing
> > store and there are also outstanding stores which do not require ordering.
> > smp_rmb() handles this while store-release will also (unnecessarily) order
> > those
> > outstanding stores. This situation occurs e.g. in ring buffer dequeue
> > operations
> > where ring slots are read (and possibly written to thread-private memory)
> > before
> > the ring slots are release (e.g. using CAS-release or store-release).
> > 
> > I imagine that the LSU/cache subsystem on ThunderX/OCTEON-TX also have
> > something to do with this problem. If there are a large amounts of stores
> > pending in the load/store unit, store-release might have to wait for a long
> > time
> > before the synchronizing store can complete.
> > 
> > > 
> > > 
> > > That said, if the thunderx maintainers are ok with it, I'm certainly
> > > open to only doing the __atomic version. Note that even in the
> > > __atomic version, based on Honnapa's findings[2], using a DPDK-defined
> > > rte_atomic128_cmpset() (with additional arguments to support machines
> > > with weak consistency) appears to be a better option than
> > __atomic_compare_exchange_16.
> > __atomic_compare_exchange_16() is not guaranteed to be lock-free. It is not
> > lock-free on ARM/AArch64 and the support in GCC is formally broken (can't
> > use
> > cmpexchg16b to implement __atomic_load_16).
> > 
> > So yes, I think DPDK will have to define and implement the 128-bit atomic
> > compare and exchange operation (whatever it will be called). For
> > compatibility
> > with ARMv8.0, we can't require the "old" value returned by a failed compare-
> > exchange operation to be read atomically (LDXP does not guaranteed atomicity
> > by itself). But this is seldom a problem, many designs read the memory
> > location
> > using two separate 64-bit loads (so not atomic) anyway, it is a successful
> > atomic
> > compare exchange operation which provides atomicity.
> > 
> Ok. I agree, I don't expect that to be a problem. The 128-bit CAS patch I just
> submitted[1] (which was developed before reading this) will have to be
> changed.
> 
> [1] http://mails.dpdk.org/archives/dev/2019-January/124159.html
I will take a look and comment on this.

> 
> Thanks,
> Gage
> 
> </snip>
  

Patch

diff --git a/lib/librte_ring/rte_ring.c b/lib/librte_ring/rte_ring.c
index d215acecc..f3378dccd 100644
--- a/lib/librte_ring/rte_ring.c
+++ b/lib/librte_ring/rte_ring.c
@@ -45,9 +45,9 @@  EAL_REGISTER_TAILQ(rte_ring_tailq)
 
 /* return the size of memory occupied by a ring */
 ssize_t
-rte_ring_get_memsize(unsigned count)
+rte_ring_get_memsize_v1905(unsigned int count, unsigned int flags)
 {
-	ssize_t sz;
+	ssize_t sz, elt_sz;
 
 	/* count must be a power of 2 */
 	if ((!POWEROF2(count)) || (count > RTE_RING_SZ_MASK )) {
@@ -57,10 +57,23 @@  rte_ring_get_memsize(unsigned count)
 		return -EINVAL;
 	}
 
-	sz = sizeof(struct rte_ring) + count * sizeof(void *);
+	elt_sz = (flags & RING_F_NB) ? 2 * sizeof(void *) : sizeof(void *);
+
+	sz = sizeof(struct rte_ring) + count * elt_sz;
 	sz = RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE);
 	return sz;
 }
+BIND_DEFAULT_SYMBOL(rte_ring_get_memsize, _v1905, 19.05);
+MAP_STATIC_SYMBOL(ssize_t rte_ring_get_memsize(unsigned int count,
+					       unsigned int flags),
+		  rte_ring_get_memsize_v1905);
+
+ssize_t
+rte_ring_get_memsize_v20(unsigned int count)
+{
+	return rte_ring_get_memsize_v1905(count, 0);
+}
+VERSION_SYMBOL(rte_ring_get_memsize, _v20, 2.0);
 
 int
 rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
@@ -82,8 +95,6 @@  rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 	if (ret < 0 || ret >= (int)sizeof(r->name))
 		return -ENAMETOOLONG;
 	r->flags = flags;
-	r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
-	r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
 
 	if (flags & RING_F_EXACT_SZ) {
 		r->size = rte_align32pow2(count + 1);
@@ -100,8 +111,30 @@  rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
 		r->mask = count - 1;
 		r->capacity = r->mask;
 	}
-	r->prod.head = r->cons.head = 0;
-	r->prod.tail = r->cons.tail = 0;
+
+	if (flags & RING_F_NB) {
+		uint64_t i;
+
+		r->prod_64.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
+		r->cons_64.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+		r->prod_64.head = r->cons_64.head = 0;
+		r->prod_64.tail = r->cons_64.tail = 0;
+
+		for (i = 0; i < r->size; i++) {
+			struct nb_ring_entry *ring_ptr, *base;
+
+			base = ((struct nb_ring_entry *)&r[1]);
+
+			ring_ptr = &base[i & r->mask];
+
+			ring_ptr->cnt = i;
+		}
+	} else {
+		r->prod.single = (flags & RING_F_SP_ENQ) ? __IS_SP : __IS_MP;
+		r->cons.single = (flags & RING_F_SC_DEQ) ? __IS_SC : __IS_MC;
+		r->prod.head = r->cons.head = 0;
+		r->prod.tail = r->cons.tail = 0;
+	}
 
 	return 0;
 }
@@ -123,11 +156,19 @@  rte_ring_create(const char *name, unsigned count, int socket_id,
 
 	ring_list = RTE_TAILQ_CAST(rte_ring_tailq.head, rte_ring_list);
 
+#if !defined(RTE_ARCH_X86_64)
+	if (flags & RING_F_NB) {
+		printf("RING_F_NB is only supported on x86-64 platforms\n");
+		rte_errno = EINVAL;
+		return NULL;
+	}
+#endif
+
 	/* for an exact size ring, round up from count to a power of two */
 	if (flags & RING_F_EXACT_SZ)
 		count = rte_align32pow2(count + 1);
 
-	ring_size = rte_ring_get_memsize(count);
+	ring_size = rte_ring_get_memsize(count, flags);
 	if (ring_size < 0) {
 		rte_errno = ring_size;
 		return NULL;
@@ -227,10 +268,17 @@  rte_ring_dump(FILE *f, const struct rte_ring *r)
 	fprintf(f, "  flags=%x\n", r->flags);
 	fprintf(f, "  size=%"PRIu32"\n", r->size);
 	fprintf(f, "  capacity=%"PRIu32"\n", r->capacity);
-	fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
-	fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
-	fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
-	fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
+	if (r->flags & RING_F_NB) {
+		fprintf(f, "  ct=%"PRIu64"\n", r->cons_64.tail);
+		fprintf(f, "  ch=%"PRIu64"\n", r->cons_64.head);
+		fprintf(f, "  pt=%"PRIu64"\n", r->prod_64.tail);
+		fprintf(f, "  ph=%"PRIu64"\n", r->prod_64.head);
+	} else {
+		fprintf(f, "  ct=%"PRIu32"\n", r->cons.tail);
+		fprintf(f, "  ch=%"PRIu32"\n", r->cons.head);
+		fprintf(f, "  pt=%"PRIu32"\n", r->prod.tail);
+		fprintf(f, "  ph=%"PRIu32"\n", r->prod.head);
+	}
 	fprintf(f, "  used=%u\n", rte_ring_count(r));
 	fprintf(f, "  avail=%u\n", rte_ring_free_count(r));
 }
diff --git a/lib/librte_ring/rte_ring.h b/lib/librte_ring/rte_ring.h
index b270a4746..08c9de6a6 100644
--- a/lib/librte_ring/rte_ring.h
+++ b/lib/librte_ring/rte_ring.h
@@ -134,6 +134,18 @@  struct rte_ring {
  */
 #define RING_F_EXACT_SZ 0x0004
 #define RTE_RING_SZ_MASK  (0x7fffffffU) /**< Ring size mask */
+/**
+ * The ring uses non-blocking enqueue and dequeue functions. These functions
+ * do not have the "non-preemptive" constraint of a regular rte ring, and thus
+ * are suited for applications using preemptible pthreads. However, the
+ * non-blocking functions have worse average-case performance than their
+ * regular rte ring counterparts. When used as the handler for a mempool,
+ * per-thread caching can mitigate the performance difference by reducing the
+ * number (and contention) of ring accesses.
+ *
+ * This flag is only supported on x86_64 platforms.
+ */
+#define RING_F_NB 0x0008
 
 /* @internal defines for passing to the enqueue dequeue worker functions */
 #define __IS_SP 1
@@ -151,11 +163,15 @@  struct rte_ring {
  *
  * @param count
  *   The number of elements in the ring (must be a power of 2).
+ * @param flags
+ *   The flags the ring will be created with.
  * @return
  *   - The memory size needed for the ring on success.
  *   - -EINVAL if count is not a power of 2.
  */
-ssize_t rte_ring_get_memsize(unsigned count);
+ssize_t rte_ring_get_memsize(unsigned int count, unsigned int flags);
+ssize_t rte_ring_get_memsize_v20(unsigned int count);
+ssize_t rte_ring_get_memsize_v1905(unsigned int count, unsigned int flags);
 
 /**
  * Initialize a ring structure.
@@ -188,6 +204,10 @@  ssize_t rte_ring_get_memsize(unsigned count);
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
  *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-power-of-2
+ *      number, but up to half the ring space may be wasted.
+ *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
+ *      non-blocking variants of the dequeue and enqueue functions.
  * @return
  *   0 on success, or a negative value on error.
  */
@@ -223,12 +243,17 @@  int rte_ring_init(struct rte_ring *r, const char *name, unsigned count,
  *    - RING_F_SC_DEQ: If this flag is set, the default behavior when
  *      using ``rte_ring_dequeue()`` or ``rte_ring_dequeue_bulk()``
  *      is "single-consumer". Otherwise, it is "multi-consumers".
+ *    - RING_F_EXACT_SZ: If this flag is set, count can be a non-power-of-2
+ *      number, but up to half the ring space may be wasted.
+ *    - RING_F_NB: (x86_64 only) If this flag is set, the ring uses
+ *      non-blocking variants of the dequeue and enqueue functions.
  * @return
  *   On success, the pointer to the new allocated ring. NULL on error with
  *    rte_errno set appropriately. Possible errno values include:
  *    - E_RTE_NO_CONFIG - function could not get pointer to rte_config structure
  *    - E_RTE_SECONDARY - function was called from a secondary process instance
- *    - EINVAL - count provided is not a power of 2
+ *    - EINVAL - count provided is not a power of 2, or RING_F_NB is used on an
+ *      unsupported platform
  *    - ENOSPC - the maximum number of memzones has already been allocated
  *    - EEXIST - a memzone with the same name already exists
  *    - ENOMEM - no appropriate memory area found in which to create memzone
@@ -284,6 +309,50 @@  void rte_ring_dump(FILE *f, const struct rte_ring *r);
 	} \
 } while (0)
 
+/* The actual enqueue of pointers on the ring.
+ * Used only by the single-producer non-blocking enqueue function, but
+ * outlined here for code readability.
+ */
+#define ENQUEUE_PTRS_NB(r, ring_start, prod_head, obj_table, n) do { \
+	unsigned int i; \
+	const uint32_t size = (r)->size; \
+	size_t idx = prod_head & (r)->mask; \
+	size_t new_cnt = prod_head + size; \
+	struct nb_ring_entry *ring = (struct nb_ring_entry *)ring_start; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & ((~(unsigned)0x3))); i += 4, idx += 4) { \
+			ring[idx].ptr = obj_table[i]; \
+			ring[idx].cnt = new_cnt + i;  \
+			ring[idx + 1].ptr = obj_table[i + 1]; \
+			ring[idx + 1].cnt = new_cnt + i + 1;  \
+			ring[idx + 2].ptr = obj_table[i + 2]; \
+			ring[idx + 2].cnt = new_cnt + i + 2;  \
+			ring[idx + 3].ptr = obj_table[i + 3]; \
+			ring[idx + 3].cnt = new_cnt + i + 3;  \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			ring[idx].cnt = new_cnt + i; \
+			ring[idx++].ptr = obj_table[i++]; /* fallthrough */ \
+		case 2: \
+			ring[idx].cnt = new_cnt + i; \
+			ring[idx++].ptr = obj_table[i++]; /* fallthrough */ \
+		case 1: \
+			ring[idx].cnt = new_cnt + i; \
+			ring[idx++].ptr = obj_table[i++]; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) { \
+			ring[idx].cnt = new_cnt + i;  \
+			ring[idx].ptr = obj_table[i]; \
+		} \
+		for (idx = 0; i < n; i++, idx++) {    \
+			ring[idx].cnt = new_cnt + i;  \
+			ring[idx].ptr = obj_table[i]; \
+		} \
+	} \
+} while (0)
+
 /* the actual copy of pointers on the ring to obj_table.
  * Placed here since identical code needed in both
  * single and multi consumer dequeue functions */
@@ -315,6 +384,39 @@  void rte_ring_dump(FILE *f, const struct rte_ring *r);
 	} \
 } while (0)
 
+/* The actual copy of pointers on the ring to obj_table.
+ * Placed here since identical code needed in both
+ * single and multi consumer non-blocking dequeue functions.
+ */
+#define DEQUEUE_PTRS_NB(r, ring_start, cons_head, obj_table, n) do { \
+	unsigned int i; \
+	size_t idx = cons_head & (r)->mask; \
+	const uint32_t size = (r)->size; \
+	struct nb_ring_entry *ring = (struct nb_ring_entry *)ring_start; \
+	if (likely(idx + n < size)) { \
+		for (i = 0; i < (n & (~(unsigned)0x3)); i += 4, idx += 4) {\
+			obj_table[i] = ring[idx].ptr; \
+			obj_table[i + 1] = ring[idx + 1].ptr; \
+			obj_table[i + 2] = ring[idx + 2].ptr; \
+			obj_table[i + 3] = ring[idx + 3].ptr; \
+		} \
+		switch (n & 0x3) { \
+		case 3: \
+			obj_table[i++] = ring[idx++].ptr; /* fallthrough */ \
+		case 2: \
+			obj_table[i++] = ring[idx++].ptr; /* fallthrough */ \
+		case 1: \
+			obj_table[i++] = ring[idx++].ptr; \
+		} \
+	} else { \
+		for (i = 0; idx < size; i++, idx++) \
+			obj_table[i] = ring[idx].ptr; \
+		for (idx = 0; i < n; i++, idx++) \
+			obj_table[i] = ring[idx].ptr; \
+	} \
+} while (0)
+
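(Editorial sketch: the two macros above are 4-way unrolled and split around the wrap point; semantically they reduce to the loops below, written against the same nb_ring_entry layout and the macros' local names r, ring, obj_table, prod_head/cons_head and n.)

	unsigned int i;

	/* Enqueue: store the object and stamp the slot with the counter the
	 * next producer to use this slot will expect (absolute index + size).
	 */
	for (i = 0; i < n; i++) {
		size_t slot = (prod_head + i) & r->mask;

		ring[slot].cnt = prod_head + r->size + i;
		ring[slot].ptr = obj_table[i];
	}

	/* Dequeue: copy the objects out; the counter is left in place and is
	 * only checked and overwritten by the next enqueue to the slot.
	 */
	for (i = 0; i < n; i++)
		obj_table[i] = ring[(cons_head + i) & r->mask].ptr;
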
 /* Between load and load. there might be cpu reorder in weak model
  * (powerpc/arm).
  * There are 2 choices for the users
@@ -331,6 +433,319 @@  void rte_ring_dump(FILE *f, const struct rte_ring *r);
 #endif
 #include "rte_ring_generic_64.h"
 
+/* @internal 128-bit structure used by the non-blocking ring */
+struct nb_ring_entry {
+	void *ptr; /**< Data pointer */
+	uint64_t cnt; /**< Modification counter */
+};
+
+/* The non-blocking ring algorithm is based on the original rte ring (derived
+ * from FreeBSD's bufring.h) and inspired by Michael and Scott's non-blocking
+ * concurrent queue.
+ */
+
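(Editorial worked example of the modification counter: take a ring of size 4 whose entries start out with cnt equal to their slot index, 0..3; that initial value is an assumption about the init code, which is not part of this hunk. An enqueue that reads producer tail 0 proceeds only if ring[0].cnt == 0, and its 128-bit CAS installs {obj, 0 + 4} = {obj, 4}. Slot 0 is written next when the 64-bit producer tail reaches 4, which is exactly the counter now stored there, so the "cnt == tail" check holds again. A stale producer still holding tail 0 instead finds cnt == 4 and retries rather than overwriting live data. The counter sequence for slot 0 is thus 0, 4, 8, 12, ...)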
+/**
+ * @internal
+ *   Enqueue several objects on the non-blocking ring (single-producer only)
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_enqueue_sp(struct rte_ring *r, void * const *obj_table,
+			    unsigned int n,
+			    enum rte_ring_queue_behavior behavior,
+			    unsigned int *free_space)
+{
+	uint32_t free_entries;
+	size_t head, next;
+
+	n = __rte_ring_move_prod_head_64(r, 1, n, behavior,
+					 &head, &next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	ENQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
+
+	r->prod_64.tail += n;
+
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+}
+
+/**
+ * @internal
+ *   Enqueue several objects on the non-blocking ring (multi-producer safe)
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_enqueue_mp(struct rte_ring *r, void * const *obj_table,
+			    unsigned int n,
+			    enum rte_ring_queue_behavior behavior,
+			    unsigned int *free_space)
+{
+#if !defined(RTE_ARCH_X86_64) || !defined(ALLOW_EXPERIMENTAL_API)
+	RTE_SET_USED(r);
+	RTE_SET_USED(obj_table);
+	RTE_SET_USED(n);
+	RTE_SET_USED(behavior);
+	RTE_SET_USED(free_space);
+#ifndef ALLOW_EXPERIMENTAL_API
+	printf("[%s()] RING_F_NB requires an experimental API."
+	       " Recompile with ALLOW_EXPERIMENTAL_API to use it.\n",
+	       __func__);
+#endif
+	return 0;
+#endif
+#if defined(RTE_ARCH_X86_64) && defined(ALLOW_EXPERIMENTAL_API)
+	size_t head, next, tail;
+	uint32_t free_entries;
+	unsigned int i;
+
+	n = __rte_ring_move_prod_head_64(r, 0, n, behavior,
+					 &head, &next, &free_entries);
+	if (n == 0)
+		goto end;
+
+	for (i = 0; i < n; /* i incremented if enqueue succeeds */) {
+		struct nb_ring_entry old_value, new_value;
+		struct nb_ring_entry *ring_ptr;
+
+		/* Enqueue to the tail entry. If another thread wins the race,
+		 * retry with the new tail.
+		 */
+		tail = r->prod_64.tail;
+
+		ring_ptr = &((struct nb_ring_entry *)&r[1])[tail & r->mask];
+
+		old_value = *ring_ptr;
+
+		/* If the tail entry's modification counter doesn't match the
+		 * producer tail index, it's already been updated.
+		 */
+		if (old_value.cnt != tail)
+			continue;
+
+		/* Prepare the new entry. The cnt field mitigates the ABA
+		 * problem on the ring write.
+		 */
+		new_value.ptr = obj_table[i];
+		new_value.cnt = tail + r->size;
+
+		if (rte_atomic128_cmpset((volatile rte_int128_t *)ring_ptr,
+					 (rte_int128_t *)&old_value,
+					 (rte_int128_t *)&new_value))
+			i++;
+
+		/* Every thread attempts the cmpset, so they don't have to wait
+		 * for the thread that successfully enqueued to the ring.
+		 * Using a 64-bit tail mitigates the ABA problem here.
+		 *
+		 * Built-in used to handle variable-sized tail index.
+		 */
+		__sync_bool_compare_and_swap(&r->prod_64.tail, tail, tail + 1);
+	}
+
+end:
+	if (free_space != NULL)
+		*free_space = free_entries - n;
+	return n;
+#endif
+}
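(Editorial sketch of the 128-bit compare-and-swap the loop above depends on. rte_atomic128_cmpset() is provided elsewhere in this series; an equivalent operation can be written with GCC's generic __atomic built-ins on a 16-byte, 16-byte-aligned struct, shown here only to clarify the semantics and not as the patch's implementation. On x86_64 this typically needs -mcx16 and may otherwise fall back to libatomic. The type and function names are hypothetical.)

	#include <stdbool.h>
	#include <stdint.h>

	struct entry128 {
		void *ptr;
		uint64_t cnt;
	} __attribute__((aligned(16)));

	/* Atomically replace *slot with 'want' iff it still equals 'expect'. */
	static inline bool
	entry_cas128(struct entry128 *slot, struct entry128 expect,
		     struct entry128 want)
	{
		return __atomic_compare_exchange(slot, &expect, &want,
						 0, /* strong CAS */
						 __ATOMIC_RELEASE,
						 __ATOMIC_RELAXED);
	}
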
+
+/**
+ * @internal Enqueue several objects on the non-blocking ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to add in the ring from the obj_table.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Enqueue a fixed number of items to the ring
+ *   RTE_RING_QUEUE_VARIABLE: Enqueue as many items as possible to the ring
+ * @param is_sp
+ *   Indicates whether to use single producer or multi-producer head update
+ * @param free_space
+ *   returns the amount of space after the enqueue operation has finished
+ * @return
+ *   Actual number of objects enqueued.
+ *   If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_enqueue(struct rte_ring *r, void * const *obj_table,
+			 unsigned int n, enum rte_ring_queue_behavior behavior,
+			 unsigned int is_sp, unsigned int *free_space)
+{
+	if (is_sp)
+		return __rte_ring_do_nb_enqueue_sp(r, obj_table, n,
+						   behavior, free_space);
+	else
+		return __rte_ring_do_nb_enqueue_mp(r, obj_table, n,
+						   behavior, free_space);
+}
+
+/**
+ * @internal
+ *   Dequeue several objects from the non-blocking ring (single-consumer only)
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_dequeue_sc(struct rte_ring *r, void **obj_table,
+			    unsigned int n,
+			    enum rte_ring_queue_behavior behavior,
+			    unsigned int *available)
+{
+	size_t head, next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head_64(r, 1, n, behavior,
+					 &head, &next, &entries);
+	if (n == 0)
+		goto end;
+
+	DEQUEUE_PTRS_NB(r, &r[1], head, obj_table, n);
+
+	r->cons_64.tail += n;
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * @internal
+ *   Dequeue several objects from the non-blocking ring (multi-consumer safe)
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_dequeue_mc(struct rte_ring *r, void **obj_table,
+			    unsigned int n,
+			    enum rte_ring_queue_behavior behavior,
+			    unsigned int *available)
+{
+	size_t head, next;
+	uint32_t entries;
+
+	n = __rte_ring_move_cons_head_64(r, 0, n, behavior,
+					 &head, &next, &entries);
+	if (n == 0)
+		goto end;
+
+	while (1) {
+		size_t tail = r->cons_64.tail;
+
+		/* Dequeue from the cons tail onwards. If multiple threads read
+		 * the same pointers, the thread that successfully performs the
+		 * CAS will keep them and the other(s) will retry.
+		 */
+		DEQUEUE_PTRS_NB(r, &r[1], tail, obj_table, n);
+
+		next = tail + n;
+
+		/* Built-in used to handle variable-sized tail index. */
+		if (__sync_bool_compare_and_swap(&r->cons_64.tail, tail, next))
+			/* There is potential for the ABA problem here, but
+			 * that is mitigated by the large (64-bit) tail.
+			 */
+			break;
+	}
+
+end:
+	if (available != NULL)
+		*available = entries - n;
+	return n;
+}
+
+/**
+ * @internal Dequeue several objects from the non-blocking ring
+ *
+ * @param r
+ *   A pointer to the ring structure.
+ * @param obj_table
+ *   A pointer to a table of void * pointers (objects).
+ * @param n
+ *   The number of objects to pull from the ring.
+ * @param behavior
+ *   RTE_RING_QUEUE_FIXED:    Dequeue a fixed number of items from the ring
+ *   RTE_RING_QUEUE_VARIABLE: Dequeue as many items as possible from the ring
+ * @param is_sc
+ *   Indicates whether to use single consumer or multi-consumer head update
+ * @param available
+ *   returns the number of remaining ring entries after the dequeue has finished
+ * @return
+ *   - Actual number of objects dequeued.
+ *     If behavior == RTE_RING_QUEUE_FIXED, this will be 0 or n only.
+ */
+static __rte_always_inline unsigned int
+__rte_ring_do_nb_dequeue(struct rte_ring *r, void **obj_table,
+		 unsigned int n, enum rte_ring_queue_behavior behavior,
+		 unsigned int is_sc, unsigned int *available)
+{
+	if (is_sc)
+		return __rte_ring_do_nb_dequeue_sc(r, obj_table, n,
+						   behavior, available);
+	else
+		return __rte_ring_do_nb_dequeue_mc(r, obj_table, n,
+						   behavior, available);
+}
+
 /**
  * @internal Enqueue several objects on the ring
  *
@@ -438,8 +853,14 @@  static __rte_always_inline unsigned int
 rte_ring_mp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MP, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED, __IS_MP,
+						free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED, __IS_MP,
+					     free_space);
 }
 
 /**
@@ -461,8 +882,14 @@  static __rte_always_inline unsigned int
 rte_ring_sp_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SP, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED, __IS_SP,
+						free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED, __IS_SP,
+					     free_space);
 }
 
 /**
@@ -488,8 +915,14 @@  static __rte_always_inline unsigned int
 rte_ring_enqueue_bulk(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			r->prod.single, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED,
+						r->prod_64.single, free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED,
+					     r->prod.single, free_space);
 }
 
 /**
@@ -572,8 +1005,14 @@  static __rte_always_inline unsigned int
 rte_ring_mc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_MC, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED, __IS_MC,
+						available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED, __IS_MC,
+					     available);
 }
 
 /**
@@ -596,8 +1035,14 @@  static __rte_always_inline unsigned int
 rte_ring_sc_dequeue_bulk(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-			__IS_SC, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED, __IS_SC,
+						available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED, __IS_SC,
+					     available);
 }
 
 /**
@@ -623,8 +1068,14 @@  static __rte_always_inline unsigned int
 rte_ring_dequeue_bulk(struct rte_ring *r, void **obj_table, unsigned int n,
 		unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n, RTE_RING_QUEUE_FIXED,
-				r->cons.single, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_FIXED,
+						r->cons_64.single, available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_FIXED,
+					     r->cons.single, available);
 }
 
 /**
@@ -699,9 +1150,13 @@  rte_ring_dequeue(struct rte_ring *r, void **obj_p)
 static inline unsigned
 rte_ring_count(const struct rte_ring *r)
 {
-	uint32_t prod_tail = r->prod.tail;
-	uint32_t cons_tail = r->cons.tail;
-	uint32_t count = (prod_tail - cons_tail) & r->mask;
+	uint32_t count;
+
+	if (r->flags & RING_F_NB)
+		count = (r->prod_64.tail - r->cons_64.tail) & r->mask;
+	else
+		count = (r->prod.tail - r->cons.tail) & r->mask;
+
 	return (count > r->capacity) ? r->capacity : count;
 }
 
@@ -821,8 +1276,14 @@  static __rte_always_inline unsigned
 rte_ring_mp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MP, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						__IS_MP, free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     __IS_MP, free_space);
 }
 
 /**
@@ -844,8 +1305,14 @@  static __rte_always_inline unsigned
 rte_ring_sp_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 			 unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SP, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						__IS_SP, free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     __IS_SP, free_space);
 }
 
 /**
@@ -871,8 +1338,14 @@  static __rte_always_inline unsigned
 rte_ring_enqueue_burst(struct rte_ring *r, void * const *obj_table,
 		      unsigned int n, unsigned int *free_space)
 {
-	return __rte_ring_do_enqueue(r, obj_table, n, RTE_RING_QUEUE_VARIABLE,
-			r->prod.single, free_space);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_enqueue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						r->prod_64.single, free_space);
+	else
+		return __rte_ring_do_enqueue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     r->prod.single, free_space);
 }
 
 /**
@@ -899,8 +1372,14 @@  static __rte_always_inline unsigned
 rte_ring_mc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_MC, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						__IS_MC, available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     __IS_MC, available);
 }
 
 /**
@@ -924,8 +1403,14 @@  static __rte_always_inline unsigned
 rte_ring_sc_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-			RTE_RING_QUEUE_VARIABLE, __IS_SC, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						__IS_SC, available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     __IS_SC, available);
 }
 
 /**
@@ -951,9 +1436,14 @@  static __rte_always_inline unsigned
 rte_ring_dequeue_burst(struct rte_ring *r, void **obj_table,
 		unsigned int n, unsigned int *available)
 {
-	return __rte_ring_do_dequeue(r, obj_table, n,
-				RTE_RING_QUEUE_VARIABLE,
-				r->cons.single, available);
+	if (r->flags & RING_F_NB)
+		return __rte_ring_do_nb_dequeue(r, obj_table, n,
+						RTE_RING_QUEUE_VARIABLE,
+						r->cons_64.single, available);
+	else
+		return __rte_ring_do_dequeue(r, obj_table, n,
+					     RTE_RING_QUEUE_VARIABLE,
+					     r->cons.single, available);
 }
 
 #ifdef __cplusplus
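(Editorial usage note: since every inline wrapper above branches on RING_F_NB, application code does not change when a ring is switched to the non-blocking variant. A minimal sketch follows, assuming 'in' and 'out' were created with or without RING_F_NB; RTE_DIM() comes from rte_common.h and the function name is hypothetical.)

	#include <rte_common.h>
	#include <rte_ring.h>

	static void
	forward_objects(struct rte_ring *in, struct rte_ring *out)
	{
		void *objs[32];
		unsigned int n;

		/* Identical calls for blocking and non-blocking rings;
		 * the dispatch happens inside the inline wrappers.
		 */
		n = rte_ring_dequeue_burst(in, objs, RTE_DIM(objs), NULL);
		if (n > 0)
			rte_ring_enqueue_burst(out, objs, n, NULL);
		/* Objects that do not fit in 'out' are dropped in this
		 * simplified sketch.
		 */
	}
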
diff --git a/lib/librte_ring/rte_ring_version.map b/lib/librte_ring/rte_ring_version.map
index d935efd0d..8969467af 100644
--- a/lib/librte_ring/rte_ring_version.map
+++ b/lib/librte_ring/rte_ring_version.map
@@ -17,3 +17,10 @@  DPDK_2.2 {
 	rte_ring_free;
 
 } DPDK_2.0;
+
+DPDK_19.05 {
+	global:
+
+	rte_ring_get_memsize;
+
+} DPDK_2.2;
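
(Editorial note: the new version node above pairs with the _v20/_v1905 prototypes added to rte_ring.h. How rte_ring.c binds them is not shown in this excerpt; a plausible sketch using DPDK's rte_compat.h helpers, offered as an assumption rather than the patch's actual code, would be:)

	#include <rte_compat.h>
	#include <rte_ring.h>

	/* rte_ring_get_memsize_v1905() is assumed to be defined in rte_ring.c;
	 * bind it as the default for new binaries and for static linking.
	 */
	BIND_DEFAULT_SYMBOL(rte_ring_get_memsize, _v1905, 19.05);
	MAP_STATIC_SYMBOL(ssize_t rte_ring_get_memsize(unsigned int count,
						       unsigned int flags),
			  rte_ring_get_memsize_v1905);

	/* Assumed compatibility behaviour: old binaries get the one-argument
	 * variant, i.e. no flags and the regular (blocking) ring layout.
	 */
	ssize_t
	rte_ring_get_memsize_v20(unsigned int count)
	{
		return rte_ring_get_memsize_v1905(count, 0);
	}
	VERSION_SYMBOL(rte_ring_get_memsize, _v20, 2.0);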