[v1,1/1] test/distributor: prevent return buffer overload

Message ID: 20210119035910.8324-2-l.wojciechow@partner.samsung.com
State: Accepted, archived
Delegated to: David Marchand
Series: distributor test fix

Checks

Context                      Check    Description
ci/checkpatch                success  coding style OK
ci/Intel-compilation         success  Compilation OK
ci/intel-Testing             success  Testing PASS
ci/iol-broadcom-Functional   success  Functional Testing PASS
ci/iol-intel-Functional      success  Functional Testing PASS
ci/iol-intel-Performance     success  Performance Testing PASS
ci/iol-mellanox-Functional   success  Functional Testing PASS
ci/iol-abi-testing           success  Testing PASS
ci/iol-testing               warning  Testing issues
ci/iol-mellanox-Performance  success  Performance Testing PASS

Commit Message

Lukasz Wojciechowski Jan. 19, 2021, 3:59 a.m. UTC
The distributor library implementation uses a cyclic queue to store
packets returned from workers. These packets can be collected later
with a call to rte_distributor_returned_pkts().
However, the queue has limited capacity: it can hold only
127 packets (RTE_DISTRIB_RETURNS_MASK).

The big burst test sent 1024 packets in bursts of 32 packets without
waiting until they were processed by the distributor. When the test was
run with a large number of worker threads, more than 127 packets could
be returned from workers and put into the cyclic queue.
The queue then dropped the excess packets, so they could never be
collected with rte_distributor_returned_pkts().
The test, however, waited indefinitely for all packets to be returned.

This patch fixes the big burst test by not allowing more packets than
the queue capacity to be processed at the same time, which makes it
impossible for any packet to be dropped.
It also cleans up duplicated code in the same test.
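
For illustration only (this is not part of the patch), the effect of the
throttling can be sketched with a counters-only toy model. Everything named
toy_* or run_* below is hypothetical and does not exist in DPDK; the
finish_every field is an assumed stand-in for worker timing, meaning the
workers hand back everything they hold on every Nth collect call. Only the
capacity of 127, the burst size of 32 and the batch size of 1024 come from
the description above.

```c
#include <assert.h>

#define RETURNS_CAPACITY 127u /* value quoted above for RTE_DISTRIB_RETURNS_MASK */
#define BURST 32u
#define BIG_BATCH 1024u

/* Hypothetical counters-only model; this is not the real rte_distributor. */
struct toy_distributor {
	unsigned int in_workers;   /* packets handed out, not yet returned */
	unsigned int queued;       /* packets sitting in the return queue */
	unsigned int dropped;      /* packets lost because the queue was full */
	unsigned int polls;        /* number of collect calls so far */
	unsigned int finish_every; /* assumption: workers return on every Nth poll */
};

/* Hand n packets to the (simulated) workers. */
static void toy_process(struct toy_distributor *d, unsigned int n)
{
	d->in_workers += n;
}

/* Collect returned packets; on every Nth call the workers first push
 * everything they hold into the bounded queue, dropping the overflow. */
static unsigned int toy_returned_pkts(struct toy_distributor *d)
{
	unsigned int room, accepted, collected;

	assert(d->finish_every > 0);
	if (++d->polls % d->finish_every == 0) {
		room = RETURNS_CAPACITY - d->queued;
		accepted = d->in_workers < room ? d->in_workers : room;
		d->queued += accepted;
		d->dropped += d->in_workers - accepted;
		d->in_workers = 0;
	}
	collected = d->queued;
	d->queued = 0;
	return collected;
}

/* Old test shape: one collect per burst, no limit on packets in flight. */
static unsigned int run_unthrottled(struct toy_distributor *d)
{
	unsigned int i, returned = 0;

	for (i = 0; i < BIG_BATCH / BURST; i++) {
		toy_process(d, BURST);
		returned += toy_returned_pkts(d);
	}
	/* Give the workers one more full cycle to hand back what is left. */
	for (i = 0; i < d->finish_every; i++)
		returned += toy_returned_pkts(d);
	return returned;
}

/* Patched test shape: keep collecting until another burst fits. */
static unsigned int run_throttled(struct toy_distributor *d)
{
	unsigned int i, count, returned = 0, in_flight = 0;

	for (i = 0; i < BIG_BATCH / BURST; i++) {
		toy_process(d, BURST);
		in_flight += BURST;
		do {
			count = toy_returned_pkts(d);
			in_flight -= count;
			returned += count;
		} while (in_flight + BURST > RETURNS_CAPACITY);
	}
	/* Drain the remainder; safe here because nothing was dropped. */
	while (in_flight > 0) {
		count = toy_returned_pkts(d);
		in_flight -= count;
		returned += count;
	}
	return returned;
}
```

With finish_every set to 8, the unthrottled variant of this model collects
only 508 of the 1024 packets and drops the other 516, while the throttled
variant collects all 1024. The throttled loop never has more than 127
packets in flight, so the workers can never return more than the queue can
hold.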

Bugzilla ID: 612
Fixes: c0de0eb82e40 ("distributor: switch over to new API")
Cc: david.hunt@intel.com
Cc: stable@dpdk.org

Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
---
 app/test/test_distributor.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)
  

Comments

David Marchand Jan. 28, 2021, 2:10 p.m. UTC | #1
On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
<l.wojciechow@partner.samsung.com> wrote:
>
> The distributor library implementation uses a cyclic queue to store
> packets returned from workers. These packets can be later collected
> with rte_distributor_returned_pkts() call.
> However the queue has limited capacity. It is able to contain only
> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>
> Big burst tests sent 1024 packets in 32 packets bursts without waiting
> until they are processed by the distributor. In case when tests were
> run with big number of worker threads, it happened that more than
> 127 packets were returned from workers and put into cyclic queue.
> This caused packets to be dropped by the queue, making them impossible
> to be collected later with rte_distributor_returned_pkts() calls.
> However the test waited for all packets to be returned infinitely.
>
> This patch fixes the big burst test by not allowing more than
> queue capacity packets to be processed at the same time, making
> impossible to drop any packets.
> It also cleans up duplicated code in the same test.
>
> Bugzilla ID: 612
> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> Cc: david.hunt@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>

Pasting from my reply to the cover letter:

> I reproduced the issue with starting a testpmd on the same cores in this system.
> I usually reproduce it after 1-2 minutes of continuously running the
> distributor_autotest unit test.
>
> I've applied your fix in my tree and I will let this loop run for a while.

This has been running fine for more than 30 minutes on my 28-core x86 system.
Tested-by: David Marchand <david.marchand@redhat.com>
  
Hunt, David Jan. 28, 2021, 4:46 p.m. UTC | #2
Hi Lukasz,

On 19/1/2021 3:59 AM, Lukasz Wojciechowski wrote:
> The distributor library implementation uses a cyclic queue to store
> packets returned from workers. These packets can be later collected
> with rte_distributor_returned_pkts() call.
> However the queue has limited capacity. It is able to contain only
> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>
> Big burst tests sent 1024 packets in 32 packets bursts without waiting
> until they are processed by the distributor. In case when tests were
> run with big number of worker threads, it happened that more than
> 127 packets were returned from workers and put into cyclic queue.
> This caused packets to be dropped by the queue, making them impossible
> to be collected later with rte_distributor_returned_pkts() calls.
> However the test waited for all packets to be returned infinitely.
>
> This patch fixes the big burst test by not allowing more than
> queue capacity packets to be processed at the same time, making
> impossible to drop any packets.
> It also cleans up duplicated code in the same test.
>
> Bugzilla ID: 612
> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> Cc: david.hunt@intel.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> ---


This patch cleans up the code nicely, and it makes sense to return 
packets in the do..while. LGTM.


Reviewed-by: David Hunt <david.hunt@intel.com>
  
David Marchand Jan. 29, 2021, 8:03 a.m. UTC | #3
On Thu, Jan 28, 2021 at 3:10 PM David Marchand
<david.marchand@redhat.com> wrote:
>
> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
> <l.wojciechow@partner.samsung.com> wrote:
> >
> > The distributor library implementation uses a cyclic queue to store
> > packets returned from workers. These packets can be later collected
> > with rte_distributor_returned_pkts() call.
> > However the queue has limited capacity. It is able to contain only
> > 127 packets (RTE_DISTRIB_RETURNS_MASK).
> >
> > Big burst tests sent 1024 packets in 32 packets bursts without waiting
> > until they are processed by the distributor. In case when tests were
> > run with big number of worker threads, it happened that more than
> > 127 packets were returned from workers and put into cyclic queue.
> > This caused packets to be dropped by the queue, making them impossible
> > to be collected later with rte_distributor_returned_pkts() calls.
> > However the test waited for all packets to be returned infinitely.
> >
> > This patch fixes the big burst test by not allowing more than
> > queue capacity packets to be processed at the same time, making
> > impossible to drop any packets.
> > It also cleans up duplicated code in the same test.
> >
> > Bugzilla ID: 612
> > Fixes: c0de0eb82e40 ("distributor: switch over to new API")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Tested-by: David Marchand <david.marchand@redhat.com>
Reviewed-by: David Hunt <david.hunt@intel.com>

Applied, thanks Lukasz.

This should fix the issue seen at UNH on the ARM server.
  
Lukasz Wojciechowski Jan. 29, 2021, 12:36 p.m. UTC | #4
Thank you guys!

On 29.01.2021 at 09:03, David Marchand wrote:
> On Thu, Jan 28, 2021 at 3:10 PM David Marchand
> <david.marchand@redhat.com> wrote:
>> On Tue, Jan 19, 2021 at 4:59 AM Lukasz Wojciechowski
>> <l.wojciechow@partner.samsung.com> wrote:
>>> The distributor library implementation uses a cyclic queue to store
>>> packets returned from workers. These packets can be later collected
>>> with rte_distributor_returned_pkts() call.
>>> However the queue has limited capacity. It is able to contain only
>>> 127 packets (RTE_DISTRIB_RETURNS_MASK).
>>>
>>> Big burst tests sent 1024 packets in 32 packets bursts without waiting
>>> until they are processed by the distributor. In case when tests were
>>> run with big number of worker threads, it happened that more than
>>> 127 packets were returned from workers and put into cyclic queue.
>>> This caused packets to be dropped by the queue, making them impossible
>>> to be collected later with rte_distributor_returned_pkts() calls.
>>> However the test waited for all packets to be returned infinitely.
>>>
>>> This patch fixes the big burst test by not allowing more than
>>> queue capacity packets to be processed at the same time, making
>>> impossible to drop any packets.
>>> It also cleans up duplicated code in the same test.
>>>
>>> Bugzilla ID: 612
>>> Fixes: c0de0eb82e40 ("distributor: switch over to new API")
>>> Cc: stable@dpdk.org
>>>
>>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
>> Tested-by: David Marchand <david.marchand@redhat.com>
> Reviewed-by: David Hunt <david.hunt@intel.com>
>
> Applied, thanks Lukasz.
>
> This should fix the issue seen at UNH on the ARM server.
>
  

Patch

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index f4c6229f1..961f326cd 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -217,6 +217,8 @@  sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	clear_packet_count();
 	struct rte_mbuf *many_bufs[BIG_BATCH], *return_bufs[BIG_BATCH];
 	unsigned num_returned = 0;
+	unsigned int num_being_processed = 0;
+	unsigned int return_buffer_capacity = 127;/* RTE_DISTRIB_RETURNS_MASK */
 
 	/* flush out any remaining packets */
 	rte_distributor_flush(db);
@@ -233,16 +235,16 @@  sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
 		rte_distributor_process(db,
 				&many_bufs[i*BURST], BURST);
-		count = rte_distributor_returned_pkts(db,
-				&return_bufs[num_returned],
-				BIG_BATCH - num_returned);
-		num_returned += count;
+		num_being_processed += BURST;
+		do {
+			count = rte_distributor_returned_pkts(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_being_processed -= count;
+			num_returned += count;
+			rte_distributor_flush(db);
+		} while (num_being_processed + BURST > return_buffer_capacity);
 	}
-	rte_distributor_flush(db);
-	count = rte_distributor_returned_pkts(db,
-		&return_bufs[num_returned],
-			BIG_BATCH - num_returned);
-	num_returned += count;
 	retries = 0;
 	do {
 		rte_distributor_flush(db);