[PATCH v2] net/pcap: fix timeout of stopping device

Ferruh Yigit ferruh.yigit at amd.com
Fri Dec 2 12:19:46 CET 2022


On 12/2/2022 10:13 AM, Zhou, YidingX wrote:
> 
> 
>>>>> On Tue,  6 Sep 2022 16:05:11 +0800
>>>>> Yiding Zhou <yidingx.zhou at intel.com> wrote:
>>>>>
>>>>>> The pcap file is synchronized to disk when the device is stopped.
>>>>>> If the file is large, this takes a long time, which can cause the
>>>>>> 'detach sync request' to time out when the device is closed in a
>>>>>> multi-process scenario.
>>>>>>
>>>>>> This commit fixes the issue by using an alarm handler to release the dumper.
>>>>>>
>>>>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
>>>>>> Cc: stable at dpdk.org
>>>>>>
>>>>>> Signed-off-by: Yiding Zhou <yidingx.zhou at intel.com>
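The approach described in the patch, taking the slow file sync off the stop path so the reply to the peer is not held up, can be illustrated with a minimal POSIX-only sketch. It is an analogue, not the patch itself: the patch uses an EAL alarm handler, while this sketch swaps in a plain pthread, and `struct dumper`, `release_dumper()` and `stop_device()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the per-process pcap dumper. */
struct dumper {
    FILE *fp;
};

/* Runs after the stop path has returned. Flushing and syncing a large
 * capture file to disk is the slow part. */
static void *release_dumper(void *arg)
{
    struct dumper *d = arg;

    fflush(d->fp);             /* push stdio buffers to the kernel */
    fsync(fileno(d->fp));      /* force the data to disk (slow for big files) */
    fclose(d->fp);
    free(d);
    return NULL;
}

/* Hypothetical stop routine: hands the dumper to a background thread and
 * returns at once, so a peer waiting for the MP reply is not blocked by
 * the sync. The caller keeps *release_tid so it can join the thread later. */
static int stop_device(struct dumper *d, pthread_t *release_tid)
{
    assert(d != NULL && d->fp != NULL);
    return pthread_create(release_tid, NULL, release_dumper, d);
}
```

A caller would call `stop_device()`, send its reply immediately, and join the release thread later. Deferring the release does not make the sync any cheaper: the data is only guaranteed to be on disk once that thread has finished.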
>>>>>
>>>>>
>>>>> I think you need to redesign the handshake if this is the case.
>>>>> Forcing a 30-second delay at the end of all uses of pcap is not acceptable.
>>>>
>>>> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
>>>
>>> Hi, Ferruh
>>> Sorry for the late reply.
>>> I did not receive your email of Oct 6; I got your comments from patchwork.
>>>
>>> "Can you please provide more details on the multi-process communication
>>> and call trace, to help us think about a solution that addresses this
>>> issue in a more generic way (not just for pcap, but for any case where
>>> device close takes longer than the multi-process timeout)?"
>>>
>>> Let me try to explain this issue with a sequence diagram; I hope it displays correctly in the mail.
>>>
>>>   thread of        intr thread of      intr thread of         thread of
>>>   secondary           secondary            primary             primary
>>>       |                   |                   |                   |
>>> rte_eal_hotplug_remove
>>> rte_dev_remove
>>> eal_dev_hotplug_request_to_primary
>>> rte_mp_request_sync --------------------------------------------->|
>>>       |                   |                   |                   |
>>>       |                   |                   |       handle_secondary_request
>>>       |                   |                   |<------------------|
>>>       |                   |      __handle_secondary_request
>>>       |                   |      eal_dev_hotplug_request_to_secondary
>>>       |<------------------------ rte_mp_request_sync
>>> handle_primary_request -->|
>>>       |       __handle_primary_request
>>>       |       local_dev_remove (this takes a long time)
>>>       |       rte_mp_reply ------------------>|
>>>       |                   |           local_dev_remove
>>>       |<----------------------------- rte_mp_reply
>>>
>>> The marked local_dev_remove() in the secondary process performs a pcap file
>>> synchronization. When the pcap file is large, this takes a long time (in my
>>> test, 100G took 20+ seconds), which causes the processing of the hotplug
>>> message to time out.
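The failure mode in the diagram, a synchronous request whose peer does slow work before replying, can be reproduced outside DPDK with a small pthread simulation. This is a sketch under stated assumptions: `request_sync()`, `handler()` and `struct mp_sim` are hypothetical, the millisecond values are scaled down from the real 5-second MP timeout and 20+ second sync, and only the timing is modelled, not DPDK's MP channel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical shared state between the requester and the handler that
 * runs the slow local_dev_remove() before replying. */
struct mp_sim {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool replied;
    long work_ms;              /* how long the simulated removal takes */
};

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *handler(void *arg)
{
    struct mp_sim *s = arg;

    sleep_ms(s->work_ms);      /* the slow device removal / file sync */
    pthread_mutex_lock(&s->lock);
    s->replied = true;         /* the reply is only sent afterwards */
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* Returns 0 if the reply arrives before the deadline, ETIMEDOUT otherwise. */
static int request_sync(long work_ms, long timeout_ms)
{
    struct mp_sim s = { .replied = false, .work_ms = work_ms };
    struct timespec deadline;
    pthread_t tid;
    int rc = 0;

    assert(work_ms >= 0 && timeout_ms >= 0);
    pthread_mutex_init(&s.lock, NULL);
    pthread_cond_init(&s.cond, NULL);

    /* Absolute deadline, as pthread_cond_timedwait() expects. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    if (pthread_create(&tid, NULL, handler, &s) != 0) {
        pthread_cond_destroy(&s.cond);
        pthread_mutex_destroy(&s.lock);
        return EAGAIN;
    }

    pthread_mutex_lock(&s.lock);
    while (!s.replied && rc == 0)   /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&s.cond, &s.lock, &deadline);
    if (s.replied)
        rc = 0;
    pthread_mutex_unlock(&s.lock);

    /* Wait for the late handler so 's' stays valid; the timeout is
     * already recorded in rc. */
    pthread_join(tid, NULL);
    pthread_cond_destroy(&s.cond);
    pthread_mutex_destroy(&s.lock);
    return rc;
}
```

With `request_sync(10, 1000)` the reply beats the deadline and 0 is returned; with `request_sync(300, 50)` the handler is still "syncing" when the deadline passes, so the requester sees `ETIMEDOUT`, which is the hotplug timeout in miniature.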
>>
>> Hi Yiding,
>>
>> Thanks for the information,
>>
>> Right now the timeout for all MP operations is hardcoded in the code, and it
>> is 5 seconds.
>> Do you think it would work to have an API to set a custom timeout, something
>> like `rte_mp_timeout_set()`, and call it from pdump?
>>
>> This gives a generic solution for similar cases, not just for pcap.
>> But my concern is whether this exposes too much multi-process internal
>> detail; @Anatoly may comment on this.
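The proposal above could look roughly like the following sketch. It is hypothetical throughout: `rte_mp_timeout_set()` is only a name suggested in this mail, and `mp_timeout_set()`, `mp_timeout_get()` and `mp_request_timeout()` are made-up names. The idea is to replace the compile-time 5-second constant with a process-wide value that a component such as pdump can raise; the resulting `struct timespec` is what a caller would pass to `rte_mp_request_sync()`, which takes its timeout as a `const struct timespec *`.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Default mirrors the hardcoded 5 seconds mentioned in this thread. */
#define MP_DEFAULT_TIMEOUT_SEC 5

/* Hypothetical process-wide setting read by every synchronous MP request. */
static atomic_uint mp_timeout_sec = MP_DEFAULT_TIMEOUT_SEC;

/* Hypothetical analogue of the proposed rte_mp_timeout_set().
 * Stores the previous value in *old (if non-NULL) so the caller can restore
 * it. Returns 0 on success, -EINVAL for a zero timeout. */
static int mp_timeout_set(unsigned int sec, unsigned int *old)
{
    if (sec == 0)
        return -EINVAL;
    unsigned int prev = atomic_exchange(&mp_timeout_sec, sec);
    if (old != NULL)
        *old = prev;
    return 0;
}

static unsigned int mp_timeout_get(void)
{
    return atomic_load(&mp_timeout_sec);
}

/* Builds the timeout a synchronous request would use, instead of a
 * compile-time constant. */
static struct timespec mp_request_timeout(void)
{
    unsigned int sec = mp_timeout_get();

    assert(sec > 0);           /* the setter never stores 0 */
    struct timespec ts = { .tv_sec = (time_t)sec, .tv_nsec = 0 };
    return ts;
}
```

A caller expecting a slow detach could raise the value with `mp_timeout_set(60, &old)`, perform the detach, and then restore it with `mp_timeout_set(old, NULL)`. Handing back the previous value is a design choice in this sketch so that one component's setting does not silently persist for the rest of the process.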
> 
> Hi, Ferruh
> For the pdump case alone, I think the required time depends on the pcap file size and on other system factors, such as the filesystem type and the system memory size.
> It may be difficult to predict a specific timeout value to set.

It doesn't have to be specific.

The point here is to have a multi-process API to set the timeout, instead of
putting a hardcoded timeout in the pcap PMD.


