[dpdk-stable] patch 'test/service: fix race in attr check' has been queued to stable release 20.11.4

Xueming Li xuemingl at nvidia.com
Wed Nov 10 07:30:03 CET 2021


Hi,

FYI, your patch has been queued to stable release 20.11.4

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 11/12/21. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/steevenlee/dpdk

This queued commit can be viewed at:
https://github.com/steevenlee/dpdk/commit/cc7ded572d96e2438836ae07bba0473f9aa5f055

Thanks.

Xueming Li <xuemingl at nvidia.com>

---
>From cc7ded572d96e2438836ae07bba0473f9aa5f055 Mon Sep 17 00:00:00 2001
From: David Marchand <david.marchand at redhat.com>
Date: Mon, 11 Oct 2021 16:54:30 +0200
Subject: [PATCH] test/service: fix race in attr check
Cc: Xueming Li <xuemingl at nvidia.com>

[ upstream commit 4a985f4e84c56fa6d562b4e920c4912203b95eb6 ]

The CI reported rare (and cryptic) failures like:

RTE>>service_autotest
 + ------------------------------------------------------- +
 + Test Suite : service core test suite
 + ------------------------------------------------------- +
 + TestCase [ 0] : unregister_all succeeded
 + TestCase [ 1] : service_name succeeded
 + TestCase [ 2] : service_get_by_name succeeded
Service dummy_service Summary
  dummy_service: stats 1	calls 0	cycles 0	avg: 0
Service dummy_service Summary
  dummy_service: stats 0	calls 0	cycles 0	avg: 0
 + TestCase [ 3] : service_dump succeeded
 + TestCase [ 4] : service_attr_get failed
 + TestCase [ 5] : service_lcore_attr_get succeeded
 + TestCase [ 6] : service_probe_capability succeeded
 + TestCase [ 7] : service_start_stop succeeded
 + TestCase [ 8] : service_lcore_add_del succeeded
 + TestCase [ 9] : service_lcore_start_stop succeeded
 + TestCase [10] : service_lcore_en_dis_able succeeded
 + TestCase [11] : service_mt_unsafe_poll succeeded
 + TestCase [12] : service_mt_safe_poll succeeded
perf test for MT Safe: 42.7 cycles per call
 + TestCase [13] : service_app_lcore_mt_safe succeeded
perf test for MT Unsafe: 73.3 cycles per call
 + TestCase [14] : service_app_lcore_mt_unsafe succeeded
 + TestCase [15] : service_may_be_active succeeded
 + TestCase [16] : service_active_two_cores succeeded
 + ------------------------------------------------------- +
 + Test Suite Summary : service core test suite
 + ------------------------------------------------------- +
 + Tests Total :       17
 + Tests Skipped :      0
 + Tests Executed :    17
 + Tests Unsupported:   0
 + Tests Passed :      16
 + Tests Failed :       1
 + ------------------------------------------------------- +
Test Failed
RTE>>
stderr:
EAL: Detected CPU lcores: 16
EAL: Detected NUMA nodes: 2
EAL: Detected static linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/service_autotest/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 1048576 kB hugepages reported
EAL: VFIO support initialized
EAL: Device 0000:03:00.0 is not NUMA-aware, defaulting socket to 0
APP: HPET is not enabled, using TSC as default timer
EAL: Test assert service_attr_get line 340 failed: attr_get() call didn't
 get call count (zero)

According to API, trying to stop a service lcore is not possible if this
lcore is the only one associated to a service.
Doing this will result in a -EBUSY return code from
rte_service_lcore_stop() which the service_attr_get subtest was not
checking.
This left the service lcore running, and a race existed with the main
lcore on checking the service attributes which triggered this CI
failure.

To fix this, dissociate the service lcore with current service.

Once fixed this first issue, a race still exists, because the
wait_slcore_inactive helper added in a previous fix was not
paired with a check that the service lcore _did_ stop.

Add missing check on rte_service_lcore_may_be_active.

Fixes: 4d55194d76a4 ("service: add attribute get function")
Fixes: 52bb6be259ff ("test/service: fix race condition on stopping lcore")

Signed-off-by: David Marchand <david.marchand at redhat.com>
Acked-by: Aaron Conole <aconole at redhat.com>
Acked-by: Harry van Haaren <harry.van.haaren at intel.com>
---
 app/test/test_service_cores.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 8b61dcfd30..0aee8c04e3 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -314,10 +314,16 @@ service_attr_get(void)
 	TEST_ASSERT_EQUAL(1, cycles_gt_zero,
 			"attr_get() failed to get cycles (expected > zero)");
 
-	rte_service_lcore_stop(slcore_id);
+	TEST_ASSERT_EQUAL(0, rte_service_map_lcore_set(id, slcore_id, 0),
+			"Disabling valid service and core failed");
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_stop(slcore_id),
+			"Failed to stop service lcore");
 
 	wait_slcore_inactive(slcore_id);
 
+	TEST_ASSERT_EQUAL(0, rte_service_lcore_may_be_active(slcore_id),
+			  "Service lcore not stopped after waiting.");
+
 	TEST_ASSERT_EQUAL(0, rte_service_attr_get(id, attr_calls, &attr_value),
 			"Valid attr_get() call didn't return success");
 	TEST_ASSERT_EQUAL(1, (attr_value > 0),
-- 
2.33.0

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2021-11-10 14:17:07.269361164 +0800
+++ 0119-test-service-fix-race-in-attr-check.patch	2021-11-10 14:17:01.884079315 +0800
@@ -1 +1 @@
-From 4a985f4e84c56fa6d562b4e920c4912203b95eb6 Mon Sep 17 00:00:00 2001
+From cc7ded572d96e2438836ae07bba0473f9aa5f055 Mon Sep 17 00:00:00 2001
@@ -4,0 +5,3 @@
+Cc: Xueming Li <xuemingl at nvidia.com>
+
+[ upstream commit 4a985f4e84c56fa6d562b4e920c4912203b95eb6 ]
@@ -79 +81,0 @@
-Cc: stable at dpdk.org
@@ -89 +91 @@
-index 8659a1526c..ced6ed0081 100644
+index 8b61dcfd30..0aee8c04e3 100644
@@ -92 +94 @@
-@@ -318,10 +318,16 @@ service_attr_get(void)
+@@ -314,10 +314,16 @@ service_attr_get(void)


More information about the stable mailing list