Bug 804 - distributor: exiting workers can hang distributor core
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: examples
Version: 21.08
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assignee: dev
Reported: 2021-09-02 20:03 CEST by Brian Poole
Modified: 2021-09-14 17:17 CEST

Attachments
patch to avoid hanging (430 bytes, patch)
2021-09-02 20:03 CEST, Brian Poole

Description Brian Poole 2021-09-02 20:03:40 CEST
Created attachment 171
patch to avoid hanging

Hello,

I have been testing the distributor example with one interface and five cores (./build/distributor_app -l 10-14 -- -p 1), which leaves me with one worker thread. The application often hangs after I send it SIGINT, and I have to kill the process manually.

I added some debugging and found that the distributor core never returns from rte_distributor_flush(). Inside that function there is a loop that waits for total_outstanding() to return 0. If I understand correctly, that requires all workers to have returned all of their in-flight packets.

The worker core calls rte_distributor_get_pkt(), does its processing, then loops back to the start of the while(!quit_signal_work) block. On exit, quit_signal_work is set, so the worker leaves the loop immediately without returning its last batch of packets. I believe this is why the distributor core fails to exit: it keeps looping, waiting for workers to return their buffers.

As a test, I added a call to rte_distributor_return_pkt(d, id, buf, num) after the worker's while() loop, just before the thread exits. With this change I have run many tests and have not seen the process hang once.
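
The accounting behind this can be illustrated with a small, self-contained C sketch. It is a toy model only and does not use DPDK: every toy_* name, the TOY_BURST size, and the single outstanding counter are assumptions made for illustration, not the library's actual internals. It shows that a worker which leaves its loop without handing back its last batch leaves a non-zero outstanding count, which is the condition a flush loop waiting for zero would spin on.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BURST 8 /* hypothetical burst size, chosen for illustration */

/* Hypothetical stand-in for the distributor's bookkeeping; not a DPDK type. */
struct toy_distributor {
    unsigned int outstanding; /* packets held by the worker, not yet returned */
    unsigned int backlog;     /* packets still waiting to be handed out */
};

/* Models a get_pkt-style call: hand back the previous batch, take the next. */
static unsigned int toy_get_pkt(struct toy_distributor *d, unsigned int returned)
{
    assert(returned <= d->outstanding);
    d->outstanding -= returned;

    unsigned int n = d->backlog < TOY_BURST ? d->backlog : TOY_BURST;
    d->backlog -= n;
    d->outstanding += n;
    return n;
}

/* Models a return_pkt-style call: hand back packets without taking more. */
static void toy_return_pkt(struct toy_distributor *d, unsigned int num)
{
    assert(num <= d->outstanding);
    d->outstanding -= num;
}

/*
 * Models the worker: the for loop stands in for while(!quit_signal_work),
 * with the quit flag observed after `iterations` passes. Returns the number
 * of packets the distributor still considers outstanding at worker exit.
 */
static unsigned int toy_worker(struct toy_distributor *d,
                               unsigned int iterations, bool return_on_exit)
{
    unsigned int num = 0;

    for (unsigned int i = 0; i < iterations; i++)
        num = toy_get_pkt(d, num);

    if (return_on_exit)
        toy_return_pkt(d, num); /* the extra call made after the loop */

    return d->outstanding;
}
```

Starting from a backlog of 20 packets and running two loop passes, toy_worker() reports 8 packets outstanding when return_on_exit is false and 0 when it is true.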
Comment 1 Brian Poole 2021-09-03 16:21:49 CEST
The app/test/test_distributor.c file uses exactly the pattern I described above for the distributor example: it calls rte_distributor_get_pkt() inside the loop, then rte_distributor_return_pkt() outside the loop.
