[dpdk-dev] [PATCH v2 0/5] lib/stack: improve lockfree C11 implementation
Steven Lariau
steven.lariau at arm.com
Fri Sep 25 19:43:34 CEST 2020
One implementation of the DPDK stack library is lockfree,
based on the C11 memory model for atomics.
Some of its atomic operations use stronger memory orderings than
necessary, and these can be relaxed.
This patch series relaxes some of these operations in order to improve
the performance of the stack library.
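As an illustration of the kind of orderings involved, here is a minimal sketch of a lockfree stack using C11 atomics. It is not the rte_stack code: the types `struct node` and `struct lf_stack` and the functions `lf_push` / `lf_pop` are hypothetical names chosen for this example, and the sketch ignores the ABA problem, so it is only safe where popped nodes are not reused concurrently. It shows a weak CAS used inside a retry loop, a failed CAS reloading the head, and a length counter updated with relaxed ordering.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the rte_stack definitions. */
struct node {
	struct node *next;
	int value;
};

struct lf_stack {
	_Atomic(struct node *) head;
	atomic_size_t len;
};

static void lf_push(struct lf_stack *s, struct node *n)
{
	/* No acquire needed here: push never dereferences the old head. */
	struct node *old = atomic_load_explicit(&s->head,
						memory_order_relaxed);

	do {
		n->next = old;
		/* Release publishes n->next and n->value to later poppers.
		 * A weak CAS is enough because we retry in a loop; on
		 * failure, 'old' is reloaded with the current head. */
	} while (!atomic_compare_exchange_weak_explicit(&s->head, &old, n,
			memory_order_release, memory_order_relaxed));

	/* The counter only needs atomicity in this sketch, not ordering. */
	atomic_fetch_add_explicit(&s->len, 1, memory_order_relaxed);
}

static struct node *lf_pop(struct lf_stack *s)
{
	/* Acquire pairs with the release in lf_push so that old->next
	 * is visible before it is read below. */
	struct node *old = atomic_load_explicit(&s->head,
						memory_order_acquire);

	while (old != NULL &&
	       !atomic_compare_exchange_weak_explicit(&s->head, &old,
			old->next,
			memory_order_acquire, memory_order_acquire))
		; /* failed CAS reloaded 'old'; retry with the new head */

	if (old != NULL)
		atomic_fetch_sub_explicit(&s->len, 1, memory_order_relaxed);
	return old;
}
```

Whether a given ordering can really be relaxed depends on what each thread reads after the operation, which is why each change in the series needs its own justification.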
The series was tested on several architectures, both to check that
the implementation is correct and to measure performance.
Below are the results on a few platforms for the multithreaded
lockfree stack performance test.
The cycle count is the average number of cycles per item needed to
perform a bulk push / pop.
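For readers who want to reproduce the arithmetic, the figures can be read as in the sketch below. This is an assumption about how the numbers are derived, not the test's actual code: `cycles_per_item` and `percent_diff` are hypothetical helpers, and the sketch assumes each percentage is the relative change of the patched cycle count against main, so that a negative value means fewer cycles.

```c
#include <stdint.h>

/* Hypothetical helper: average cycles spent per item over a timed run
 * of 'iterations' bulk operations of 'bulk_size' items each. */
static double cycles_per_item(uint64_t total_cycles, uint64_t iterations,
			      unsigned int bulk_size)
{
	return (double)total_cycles / ((double)iterations * bulk_size);
}

/* Hypothetical helper: relative change in percent of the patched
 * cycle count against main; negative means the patched build is faster. */
static double percent_diff(double main_cycles, double patched_cycles)
{
	return (patched_cycles - main_cycles) / main_cycles * 100.0;
}
```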
$ sudo ./builddir/app/dpdk-test
RTE>>stack_lf_perf_autotest
difference compared to main
Cycles count on ThunderX2
2 cores, bulk size = 8: -15.85%
2 cores, bulk size = 32: -04.56%
4 cores, bulk size = 8: -05.00%
4 cores, bulk size = 32: -04.35%
16 cores, bulk size = 8: -02.38%
16 cores, bulk size = 32: -01.88%
difference compared to main
Cycles count on N1SDP
2 cores, bulk size = 8: +00.77%
2 cores, bulk size = 32: -16.00%
difference compared to main
Cycles count on Skylake
2 cores, bulk size = 8: -00.18%
2 cores, bulk size = 32: -00.95%
4 cores, bulk size = 8: -01.19%
4 cores, bulk size = 32: +00.64%
16 cores, bulk size = 8: +01.20%
16 cores, bulk size = 32: +00.48%
v2: add a comment explaining why a relaxed pop head CAS is valid
    add Fixes information
Steven Lariau (5):
lib/stack: fix inconsistent weak / strong cas
lib/stack: remove push acquire fence
lib/stack: remove redundant orderings for list->len
lib/stack: reload head when pop fails
lib/stack: remove pop cas release ordering
lib/librte_stack/rte_stack_lf_c11.h | 32 +++++++++++++++++++----------
1 file changed, 21 insertions(+), 11 deletions(-)
--
2.17.1