[PATCH v2] net/pcap: fix timeout of stopping device

Zhou, YidingX yidingx.zhou at intel.com
Mon Dec 5 02:58:49 CET 2022


> >>>>> On Tue,  6 Sep 2022 16:05:11 +0800 Yiding Zhou
> >>>>> <mailto:yidingx.zhou at intel.com> wrote:
> >>>>>
> >>>>>> The pcap file will be synchronized to the disk when stopping the device.
> >>>>>> It takes a long time if the file is large that would cause the
> >>>>>> 'detach sync request' timeout when the device is closed under
> >>>>>> multi-process scenario.
> >>>>>>
> >>>>>> This commit fixes the issue by using alarm handler to release dumper.
> >>>>>>
> >>>>>> Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
> >>>>>> Cc: mailto:stable at dpdk.org
> >>>>>>
> >>>>>> Signed-off-by: Yiding Zhou <mailto:yidingx.zhou at intel.com>
> >>>>>
> >>>>>
> >>>>> I think you need to redesign the handshake if this the case.
> >>>>> Forcing 30 second delay at the end of all uses of pcap is not acceptable.
> >>>>
> >>>> @Zhang, Qi Z Do we need to redesign the handshake to fix this?
> >>>
> >>> Hi, Ferruh
> >>> Sorry for the late reply.
> >>> I did not receive your email on Oct 6, I got your comments from patchwork.
> >>>
> >>> "Can you please provide more details on multi-process communication
> >>> and call trace, to help us think about a solution to address this
> >>> issue in a more generic way (not just for pcap but for any case
> >>> device close takes more than multi-process timeout)?"
> >>>
> >>> I try to explain this issue with a sequence diagram, hope it can be
> >>> displayed
> >> correctly in the mail.
> >>>
> >>>        thread                                 intr thread           intr thread             thread
> >>>     of secondary                       of secondary          of primary          of
> primary
> >>>              |                                              |                         |                          |
> >>>              |                                              |                         |                          |
> >>> rte_eal_hotplug_remove
> >>> rte_dev_remove
> >>> eal_dev_hotplug_request_to_primary
> >>> rte_mp_request_sync ------------------------------------------------------->|
> >>>
> >>> |
> >>>
> >> handle_secondary_request
> >>>                                                                                          |<-----------------|
> >>>                                                                                          |
> >>>                                                                    __handle_secondary_request
> >>>
> eal_dev_hotplug_request_to_secondary
> >>>            |<------------------------------------- rte_mp_request_sync
> >>>            |
> >>> handle_primary_request--------->|
> >>>                                                            |
> >>>                             __handle_primary_request
> >>>                                local_dev_remove(this will take long time)
> >>>                                             rte_mp_reply -------------------------------->|
> >>>                                                                                          |
> >>>                                                                              local_dev_remove
> >>>           |<-------------------------------------------------
> >>> rte_mp_reply
> >>>
> >>> The marked 'local_dev_remove()' in the secondary process will
> >>> perform a
> >> pcap file synchronization operation.
> >>> When the pcap file is too large, it will take a lot of time
> >>> (according to my test
> >> 100G takes 20+ seconds).
> >>> This caused the processing of hot_plug message to time out.
> >>
> >> Hi Yiding,
> >>
> >> Thanks for the information,
> >>
> >> Right now all MP operations timeout is hardcoded in the code and it
> >> is 5 seconds.
> >> Do you think does it work to have an API to set custom timeout,
> >> something like `rte_mp_timeout_set()`, and call this from pdump?
> >>
> >> This gives a generic solution for similar cases, not just for pcap.
> >> But my concern is if this is too much multi-process related internal
> >> detail to update, @Anatoly may comment on this.
> >
> > Hi, Ferruh
> > For pdump case only, I think the timeout is affected by pcap's size and other
> system components, such as the type of FS, system memory size.
> > It may be difficult to predict the specific time value for setting.
> 
> It doesn't have to be specific.
> 
> Point here is to have a multi process API to set timeout, instead of put a
> hardcoded timeout in pcap PMD.

OK, I understood.


More information about the stable mailing list