[dpdk-stable] [PATCH v1 1/6] app/test: fix deadlock in distributor test

Lukasz Wojciechowski l.wojciechow at partner.samsung.com
Thu Sep 17 16:01:24 CEST 2020


Hi David,

W dniu 17.09.2020 o 13:21, David Hunt pisze:
> Hi Lukasz,
>
> On 15/9/2020 8:34 PM, Lukasz Wojciechowski wrote:
>> The sanity test with worker shutdown delegates all bufs
>> to be processed by a single lcore worker, then it freezes
>> one of the lcore workers and continues to send more bufs.
>>
>> Problem occurred if freezed lcore is the same as the one
>> that is processing the mbufs. The lcore processing mbufs
>> might be different every time test is launched.
>> This is caused by keeping the value of wkr static variable
>> in rte_distributor_process function between running test cases.
>>
>> Test freezed always lcore with 0 id. The patch changes avoids
>> possible collision by freezing lcore with zero_idx. The lcore
>> that receives the data updates the zero_idx, so it is not freezed
>> itself.
>>
>> To reproduce the issue fixed by this patch, please run
>> distributor_autotest command in test app several times in a row.
>>
>> Fixes: c3eabff124e6 ("distributor: add unit tests")
>> Cc: bruce.richardson at intel.com
>> Cc: stable at dpdk.org
>>
>> Signed-off-by: Lukasz Wojciechowski <l.wojciechow at partner.samsung.com>
>> ---
>>   app/test/test_distributor.c | 22 ++++++++++++++++++++--
>>   1 file changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
>> index ba1f81cf8..35b25463a 100644
>> --- a/app/test/test_distributor.c
>> +++ b/app/test/test_distributor.c
>> @@ -28,6 +28,7 @@ struct worker_params worker_params;
>>   static volatile int quit;      /**< general quit variable for all 
>> threads */
>>   static volatile int zero_quit; /**< var for when we just want thr0 
>> to quit*/
>>   static volatile unsigned worker_idx;
>> +static volatile unsigned zero_idx;
>>     struct worker_stats {
>>       volatile unsigned handled_packets;
>> @@ -346,27 +347,43 @@ handle_work_for_shutdown_test(void *arg)
>>       unsigned int total = 0;
>>       unsigned int i;
>>       unsigned int returned = 0;
>> +    unsigned int zero_id = 0;
>>       const unsigned int id = __atomic_fetch_add(&worker_idx, 1,
>>               __ATOMIC_RELAXED);
>>         num = rte_distributor_get_pkt(d, id, buf, buf, num);
>>   +    zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
>> +    if (id == zero_id && num > 0) {
>> +        zero_id = (zero_id + 1) % __atomic_load_n(&worker_idx,
>> +            __ATOMIC_ACQUIRE);
>> +        __atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE);
>> +    }
>> +
>>       /* wait for quit single globally, or for worker zero, wait
>>        * for zero_quit */
>> -    while (!quit && !(id == 0 && zero_quit)) {
>> +    while (!quit && !(id == zero_id && zero_quit)) {
>>           worker_stats[id].handled_packets += num;
>>           count += num;
>>           for (i = 0; i < num; i++)
>>               rte_pktmbuf_free(buf[i]);
>>           num = rte_distributor_get_pkt(d,
>>                   id, buf, buf, num);
>> +
>> +        zero_id = __atomic_load_n(&zero_idx, __ATOMIC_ACQUIRE);
>> +        if (id == zero_id && num > 0) {
>> +            zero_id = (zero_id + 1) % __atomic_load_n(&worker_idx,
>> +                __ATOMIC_ACQUIRE);
>> +            __atomic_store_n(&zero_idx, zero_id, __ATOMIC_RELEASE);
>> +        }
>> +
>>           total += num;
>>       }
>>       worker_stats[id].handled_packets += num;
>>       count += num;
>>       returned = rte_distributor_return_pkt(d, id, buf, num);
>>   -    if (id == 0) {
>> +    if (id == zero_id) {
>>           /* for worker zero, allow it to restart to pick up last packet
>>            * when all workers are shutting down.
>>            */
>> @@ -586,6 +603,7 @@ quit_workers(struct worker_params *wp, struct 
>> rte_mempool *p)
>>       rte_eal_mp_wait_lcore();
>>       quit = 0;
>>       worker_idx = 0;
>> +    zero_idx = 0;
>>   }
>>     static int
>
>
> Lockup reproducable if you run the distributor_autotest 19 times in 
> succesion. I was able to run the test many times more than that with 
> the patch applied. Thanks.
The number depends on number of lcores on your test environment.
>
> Tested-by: David Hunt <david.hunt at intel.com>


Thank you very much for reviewing and testing whole series.

>
>
>
-- 
Lukasz Wojciechowski
Principal Software Engineer

Samsung R&D Institute Poland
Samsung Electronics
Office +48 22 377 88 25
l.wojciechow at partner.samsung.com



More information about the stable mailing list