Bug 804

Summary: distributor: exiting workers can hang distributor core
Product: DPDK Reporter: Brian Poole (brian90013)
Component: examples Assignee: dev
Status: UNCONFIRMED ---    
Severity: normal CC: david.hunt
Priority: Normal    
Version: 21.08   
Target Milestone: ---   
Hardware: All   
OS: All   
Attachments: patch to avoid hanging

Description Brian Poole 2021-09-02 20:03:40 CEST
Created attachment 171
patch to avoid hanging

Hello,

I have been testing the distributor example using one interface and five cores: ./build/distributor_app -l 10-14 -- -p 1. This leaves me with one worker thread. I noticed the application often hangs after I send SIGINT, and I have to kill the process manually.

I added some additional debugging and discovered the distributor core is not returning from rte_distributor_flush(). Looking inside that function, I see the loop waiting for total_outstanding() to return 0. I believe that requires all workers to have returned all in-flight packets?

Moving to the worker core: it calls rte_distributor_get_pkt(), does its processing, then loops back to the start of the while(!quit_signal_work) block. On shutdown, quit_signal_work is set, so the worker exits the loop immediately, without returning its last batch of packets. I believe this is what prevents the distributor core from exiting: it keeps looping, waiting for workers to return their buffers.

As a test, I added a call to rte_distributor_return_pkt(d, id, buf, num) outside the worker while() loop but before the thread exits. I've run many tests and have not seen the process hang once.
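
The effect of that change can be sketched with a small toy model. This is not DPDK code: toy_distributor, toy_get_pkt(), toy_return_pkt(), run_worker() and BURST are hypothetical names invented for illustration, and the model assumes that each get call also hands back the previous burst, which is how I read the example's worker loop. It only tracks a count of outstanding packets, standing in for what total_outstanding() reports.

```c
#include <assert.h>
#include <stdbool.h>

#define BURST 8 /* hypothetical burst size, for illustration only */

/* Toy stand-in for the distributor's bookkeeping; NOT the DPDK API. */
struct toy_distributor {
    unsigned int outstanding; /* packets held by the worker, not yet returned */
};

/* Assumption: a get call returns the previous burst and hands out a new one. */
static unsigned int toy_get_pkt(struct toy_distributor *d, unsigned int returned)
{
    assert(returned <= d->outstanding);
    d->outstanding -= returned;
    d->outstanding += BURST;
    return BURST;
}

/* Give packets back without asking for more. */
static void toy_return_pkt(struct toy_distributor *d, unsigned int num)
{
    assert(num <= d->outstanding);
    d->outstanding -= num;
}

/*
 * Run the worker loop for `iterations` passes (must be >= 1), then quit.
 * Returns the number of packets still outstanding when the worker exits.
 */
static unsigned int run_worker(struct toy_distributor *d,
                               unsigned int iterations, bool return_on_exit)
{
    bool quit_signal_work = false;
    unsigned int num = 0;
    unsigned int i = 0;

    assert(iterations >= 1);
    while (!quit_signal_work) {
        num = toy_get_pkt(d, num);
        /* ... per-packet processing would happen here ... */
        if (++i == iterations)
            quit_signal_work = true; /* stands in for the SIGINT handler */
    }

    /* The proposed change: hand the last burst back before exiting. */
    if (return_on_exit)
        toy_return_pkt(d, num);

    return d->outstanding;
}
```

In this model, run_worker() with return_on_exit set to false leaves one burst outstanding, so a flush loop waiting for zero would never finish. With return_on_exit set to true the count drops to zero.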

Comment 1 Brian Poole 2021-09-03 16:21:49 CEST
Looking at the app/test/test_distributor.c file, the test does exactly what I proposed above for the distributor example: it calls rte_distributor_get_pkt() inside the loop, then rte_distributor_return_pkt() after the loop.