[PATCH v3] test/service: fix spurious failures by extending timeout

Van Haaren, Harry harry.van.haaren at intel.com
Tue Jan 31 18:24:40 CET 2023


> -----Original Message-----
> From: David Marchand <david.marchand at redhat.com>
> Sent: Thursday, January 26, 2023 9:30 AM
> To: Van Haaren, Harry <harry.van.haaren at intel.com>
> Cc: dev at dpdk.org; dpdklab at iol.unh.edu; ci at dpdk.org;
> Honnappa.Nagarahalli at arm.com; mattias.ronnblom
> <mattias.ronnblom at ericsson.com>; thomas at monjalon.net; Morten Brørup
> <mb at smartsharesystems.com>; Tyler Retzlaff <roretzla at linux.microsoft.com>;
> Aaron Conole <aconole at redhat.com>
> Subject: Re: [PATCH v3] test/service: fix spurious failures by extending timeout
> 
> Hello Harry,

Hi David,

> On Thu, Oct 6, 2022 at 9:33 PM David Marchand <david.marchand at redhat.com>
> wrote:
> >
> > On Thu, Oct 6, 2022 at 3:27 PM Morten Brørup <mb at smartsharesystems.com>
> wrote:
> > > > This commit extends the timeout for service_may_be_active()
> > > > from 100ms to 1000ms. Local testing on an idle and loaded system
> > > > (compiling DPDK with all cores) always completes after 1 ms.
> > > >
> > > > The wait time for a service-lcore to finish is also extended
> > > > from 100ms to 1000ms.
> > > >
> > > > The same timeout waiting code was duplicated in two tests, and
> > > > is now refactored to a standalone function avoiding duplication.
> > > >
> > > > Reported-by: David Marchand <david.marchand at redhat.com>
> > > > Suggested-by: Mattias Ronnblom <mattias.ronnblom at ericsson.com>
> > > > Signed-off-by: Harry van Haaren <harry.van.haaren at intel.com>
> > > Acked-by: Morten Brørup <mb at smartsharesystems.com>
> > Reviewed-by: Mattias Rönnblom <mattias.ronnblom at ericsson.com>
> >
> > Ok, let's see if the situation gets better with this.
> > Applied, thanks.
> 
> I took a look at the January failures at UNH.
> 
> Downloads/dpdk_31608e4db568_2023-01-03_06-58-00_NA/out/testlog.txt:EAL:
> Test assert service_lcore_attr_get line 422 failed: Service lcore not
> stopped after waiting.
> Extending the timeout just made it less likely.

Aha, okay.

<snip>
> The timeout approach just does not have its place in a functional test.
> Either this test is rewritten, or it must go to the performance tests
> list so that we stop getting false positives.
> Can you work on this?

I'll investigate various approaches on Thursday and reply here with suggested next steps.
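
For context, the wait-with-timeout pattern under discussion has roughly the shape sketched below. This is a minimal standalone illustration, not the code in the DPDK test suite: wait_until(), poll_fn, now_ms() and countdown_done() are hypothetical names, and plain POSIX clock_gettime()/nanosleep() are used so the sketch builds without DPDK. It shows why the check can fail spuriously: a -1 return only means the deadline passed first, which on a heavily loaded CI machine can happen even when the code under test is correct.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical predicate type: returns true once the awaited condition holds. */
typedef bool (*poll_fn)(void *arg);

/* Monotonic clock reading in milliseconds. */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000u + (uint64_t)ts.tv_nsec / 1000000u;
}

/*
 * Poll `poll` about once per millisecond until it returns true or
 * `timeout_ms` has elapsed. Returns 0 on success, -1 on timeout.
 */
static int wait_until(poll_fn poll, void *arg, uint64_t timeout_ms)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 1000000 };
	const uint64_t deadline = now_ms() + timeout_ms;

	assert(poll != NULL);
	for (;;) {
		if (poll(arg))
			return 0;
		if (now_ms() >= deadline)
			return -1; /* deadline hit: slow system or real bug, indistinguishable */
		nanosleep(&pause, NULL);
	}
}

/* Hypothetical example predicate: becomes true after *arg polls. */
static bool countdown_done(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0)
		(*remaining)--;
	return *remaining == 0;
}
```

A caller would pass a predicate that checks the state being waited for, e.g. wait_until(countdown_done, &n, 1000). Raising the timeout only shrinks the window for a false failure; it does not remove the dependency on wall-clock time.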

Regards, -Harry

