[PATCH v2] net/pcap: fix timeout of stopping device

Stephen Hemminger stephen at networkplumber.org
Tue Nov 22 18:28:57 CET 2022


On Tue, 22 Nov 2022 09:25:33 +0000
"Zhou, YidingX" <yidingx.zhou at intel.com> wrote:

> > -----Original Message-----
> > From: Zhou, YidingX <yidingx.zhou at intel.com>
> > Sent: Wednesday, September 21, 2022 3:15 PM
> > To: Stephen Hemminger <stephen at networkplumber.org>; Zhang, Qi Z
> > <qi.z.zhang at intel.com>
> > Cc: dev at dpdk.org; Burakov, Anatoly <anatoly.burakov at intel.com>; He,
> > Xingguang <xingguang.he at intel.com>; stable at dpdk.org
> > Subject: RE: [PATCH v2] net/pcap: fix timeout of stopping device
> > 
> > 
> >   
> > > -----Original Message-----
> > > From: Stephen Hemminger <stephen at networkplumber.org>
> > > Sent: Tuesday, September 6, 2022 10:58 PM
> > > To: Zhou, YidingX <yidingx.zhou at intel.com>
> > > Cc: dev at dpdk.org; Zhang, Qi Z <qi.z.zhang at intel.com>; Burakov, Anatoly
> > > <anatoly.burakov at intel.com>; He, Xingguang <xingguang.he at intel.com>;
> > > stable at dpdk.org
> > > Subject: Re: [PATCH v2] net/pcap: fix timeout of stopping device
> > >
> > > On Tue,  6 Sep 2022 16:05:11 +0800
> > > Yiding Zhou <yidingx.zhou at intel.com> wrote:
> > >  
> > > > The pcap file is synchronized to disk when the device is stopped.
> > > > If the file is large, this takes a long time, which can cause the
> > > > 'detach sync request' to time out when the device is closed in a
> > > > multi-process scenario.
> > > >
> > > > This commit fixes the issue by using an alarm handler to release the dumper.
> > > >
> > > > Fixes: 0ecfb6c04d54 ("net/pcap: move handler to process private")
> > > > Cc: stable at dpdk.org
> > > >
> > > > Signed-off-by: Yiding Zhou <yidingx.zhou at intel.com>
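For illustration only, here is a minimal sketch of the deferral idea described in the quoted patch. It is not the patch: it assumes a POSIX thread in place of the alarm handler the patch uses, and a plain FILE * standing in for the pcap dumper (in the pcap PMD the release would be something like pcap_dump_close()). defer_close() and wait_deferred_close() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Worker: performs the potentially slow close off the caller's thread.
 * A plain FILE * is a hypothetical stand-in for a pcap_dumper_t. */
static void *close_worker(void *arg)
{
	FILE *f = arg;

	/* fclose() flushes stdio buffers and releases the stream;
	 * its result is passed back through the thread's return value. */
	return (void *)(intptr_t)fclose(f);
}

/* Start the close in the background and return immediately, so a
 * stop path does not block on it. Returns 0 on success, otherwise the
 * pthread_create() error number. The caller must join *tid later. */
static int defer_close(FILE *f, pthread_t *tid)
{
	return pthread_create(tid, NULL, close_worker, f);
}

/* Wait for a deferred close; returns fclose()'s result, or EOF if the
 * thread could not be joined. */
static int wait_deferred_close(pthread_t tid)
{
	void *res = NULL;

	if (pthread_join(tid, &res) != 0)
		return EOF;
	return (int)(intptr_t)res;
}
```

Note that the work is moved, not removed: the thread still has to be joined (or the alarm has to fire) before the process exits, so some part of shutdown still waits for the close to finish.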
> > >
> > >
> > > I think you need to redesign the handshake if this is the case.
> > > Forcing a 30-second delay at the end of every use of pcap is not acceptable.
> > 
> > @Zhang, Qi Z Do we need to redesign the handshake to fix this?  
> 
> Hi, Ferruh
> Sorry for the late reply.
> I did not receive your email of Oct 6; I found your comments on Patchwork.
> 
> "Can you please provide more details on the multi-process communication and the
> call trace, to help us think about a solution that addresses this issue in a
> more generic way (not just for pcap, but for any case where device close takes
> longer than the multi-process timeout)?"
> 
> I'll try to explain this issue with a sequence diagram; I hope it displays correctly in the mail.
> 
>  thread of          intr thread of        intr thread of           thread of
>  secondary             secondary              primary               primary
>      |                     |                     |                     |
> rte_eal_hotplug_remove
> rte_dev_remove
> eal_dev_hotplug_request_to_primary
> rte_mp_request_sync -------------------------------------------------->|
>                                                            handle_secondary_request
>                                                  |<--------------------|
>                                     __handle_secondary_request
>                                eal_dev_hotplug_request_to_secondary
>      |<-------------------------------- rte_mp_request_sync
> handle_primary_request --->|
>                __handle_primary_request
>                local_dev_remove (this takes a long time)
>                rte_mp_reply --------------------->|
>                                          local_dev_remove
>      |<----------------------------------- rte_mp_reply
> 
> The marked local_dev_remove() in the secondary process performs a pcap file synchronization.
> When the pcap file is large, this takes a long time (in my test, a 100 GB file took more than 20 seconds),
> which causes processing of the hotplug message to time out.
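The timeout in the diagram can be reproduced in miniature. The sketch below is a hypothetical model, not EAL code: request_sync() waits on a condition variable with an absolute deadline, roughly as a synchronous request with a timeout does, while a responder thread sleeps for work_ms to stand in for the slow local_dev_remove(). All names here are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* State shared by the requester and the (simulated) responder. */
struct mp_call {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool replied;
	long work_ms; /* simulated cost of the slow local_dev_remove() */
};

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
		; /* resume with the remaining time */
}

/* Responder: does the slow work first, then "replies". */
static void *responder(void *arg)
{
	struct mp_call *c = arg;

	sleep_ms(c->work_ms);
	pthread_mutex_lock(&c->lock);
	c->replied = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
	return NULL;
}

/* Requester: waits at most timeout_ms for the reply.
 * Returns 0 if the reply arrived in time, ETIMEDOUT if it did not,
 * or another error number on failure. */
static int request_sync(long work_ms, long timeout_ms)
{
	struct mp_call c = { .replied = false, .work_ms = work_ms };
	struct timespec deadline;
	pthread_t tid;
	int rc = 0;

	pthread_mutex_init(&c.lock, NULL);
	pthread_cond_init(&c.cond, NULL);
	if (pthread_create(&tid, NULL, responder, &c) != 0) {
		pthread_cond_destroy(&c.cond);
		pthread_mutex_destroy(&c.lock);
		return EAGAIN;
	}

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c.lock);
	while (!c.replied && rc == 0)
		rc = pthread_cond_timedwait(&c.cond, &c.lock, &deadline);
	if (c.replied)
		rc = 0; /* the reply may race with the deadline */
	pthread_mutex_unlock(&c.lock);

	/* The slow work keeps running past the timeout; join so the
	 * stack-allocated state outlives the responder thread. */
	pthread_join(tid, NULL);
	pthread_cond_destroy(&c.cond);
	pthread_mutex_destroy(&c.lock);
	return rc;
}
```

For example, request_sync(10, 1000) returns 0, while request_sync(300, 50) returns ETIMEDOUT: the requester gives up even though the work still completes afterwards, which matches the symptom described above.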


Part of the problem may be a hidden file sync in some library.
Normally, closing a file should be fast even with lots of outstanding data;
the actual write-back is done by the OS from the page cache and continues after close.
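One way to check whether the close itself is slow, or whether something is forcing a sync, is to time the final step with and without an explicit fsync(). This is a minimal POSIX sketch; write_and_close() is a hypothetical helper, and absolute timings depend on the filesystem, mount options, and memory pressure, so no particular numbers are implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static double now_secs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Write `total` bytes to fd, then optionally fsync() before close().
 * Returns 0 on success, -1 on error. On success, *finish_secs holds
 * the time spent in the (fsync +) close step only. */
static int write_and_close(int fd, size_t total, bool do_fsync,
			   double *finish_secs)
{
	static char buf[1 << 16];
	double t0;

	memset(buf, 0xab, sizeof(buf));
	while (total > 0) {
		size_t n = total < sizeof(buf) ? total : sizeof(buf);
		ssize_t w = write(fd, buf, n);

		if (w < 0) {
			close(fd);
			return -1;
		}
		total -= (size_t)w;
	}

	t0 = now_secs();
	/* fsync() blocks until the data has been handed to storage. */
	if (do_fsync && fsync(fd) != 0) {
		close(fd);
		return -1;
	}
	/* Without fsync(), dirty pages stay in the page cache after close. */
	if (close(fd) != 0)
		return -1;
	*finish_secs = now_secs() - t0;
	return 0;
}
```

If the no-fsync close is fast on its own but the device stop is still slow, that points at a sync happening elsewhere.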

I wonder if some kind of fadvise call might help; see POSIX_FADV_SEQUENTIAL or POSIX_FADV_DONTNEED.
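A minimal sketch of the fadvise suggestion, assuming the capture is written through a stdio FILE * (libpcap exposes the FILE * behind a dumper via pcap_dump_file()). hint_sequential() and close_with_dontneed() are hypothetical helpers. Per the Linux posix_fadvise(2) man page, POSIX_FADV_DONTNEED may attempt to write back dirty pages, but this is not guaranteed, and dirty pages that have not been written are not freed; whether this shortens the stop path is untested here.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>

/* Hint that the capture file is accessed sequentially.
 * posix_fadvise() returns an error number directly (not -1/errno). */
static int hint_sequential(FILE *f)
{
	return posix_fadvise(fileno(f), 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* Flush stdio buffers, advise the kernel that the cached pages of the
 * whole file are no longer needed, then close. The kernel may attempt
 * write-back of dirty pages (not guaranteed); dirty pages not yet
 * written are not freed. Returns 0 on success, -1 otherwise. */
static int close_with_dontneed(FILE *f)
{
	int rc = 0;

	if (fflush(f) != 0)
		rc = -1;
	else if (posix_fadvise(fileno(f), 0, 0, POSIX_FADV_DONTNEED) != 0)
		rc = -1;
	if (fclose(f) != 0)
		rc = -1;
	return rc;
}
```

Calling hint_sequential() right after opening the file and close_with_dontneed() in place of a plain close would be one way to try the idea.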

