[dpdk-dev] [PATCH] i40e: improve performance of vector PMD

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Apr 14 16:00:11 CEST 2016



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> Sent: Thursday, April 14, 2016 2:50 PM
> To: dev at dpdk.org
> Cc: Zhang, Helin; Wu, Jingjing
> Subject: Re: [dpdk-dev] [PATCH] i40e: improve performance of vector PMD
> 
> On Thu, Apr 14, 2016 at 11:15:21AM +0100, Bruce Richardson wrote:
> > An analysis of the i40e code using Intel® VTune™ Amplifier 2016 showed
> > that the code was unexpectedly causing stalls due to "Loads blocked by
> > Store Forwards". This can occur when a load from memory has to wait
> > due to the prior store being to the same address, but being of a smaller
> > size i.e. the stored value cannot be directly returned to the loader.
> > [See ref: https://software.intel.com/en-us/node/544454]
> >
> > These stalls are due to the way in which the data_len values are handled
> > in the driver. The lengths are extracted using vector operations, but those
> > 16-bit lengths are then assigned using scalar operations i.e. 16-bit
> > stores.
> >
> > These regular 16-bit stores actually have two effects in the code:
> > * they cause the "Loads blocked by Store Forwards" issues reported
> > * they also cause the previous loads in the RX function to actually be a
> > load followed by a store to an address on the stack, because the 16-bit
> > assignment can't be done to an xmm register.
> >
> > By converting the 16-bit store operations into a sequence of SSE blend
> > operations, we can ensure that the descriptor loads only occur once, and
> > avoid both the additional store and loads from the stack, as well as the
> > stalls due to the second loads being blocked.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> >
> Self-NAK on this version. The blend instruction used is SSE4.1 so breaks the
> "default" build.
> 
> Two obvious options to fix this:
> 1. Keep the old code with SSE4.1 #ifdefs separating old and new
> 2. Update the vPMD requirement to SSE4.1, and factor that in during runtime
> selection of the RX code path.
> 
> Personally, I prefer the second option. Any objections?

+1 for second one.
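
A rough sketch of what option 2 could look like, assuming the CPU capability
is passed in as a plain boolean. In DPDK that value would typically come from
something like rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_1). The names
rx_burst_fn, rx_burst_scalar, rx_burst_vec and select_rx_burst are
hypothetical and are not taken from the i40e driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RX burst signature, for illustration only. */
typedef uint16_t (*rx_burst_fn)(void *rxq, void **pkts, uint16_t nb_pkts);

/* Placeholder scalar path: always usable, regardless of CPU features. */
static uint16_t rx_burst_scalar(void *rxq, void **pkts, uint16_t nb_pkts)
{
    (void)rxq; (void)pkts; (void)nb_pkts;
    return 0;
}

/* Placeholder vector path: would contain the SSE4.1 blend-based code. */
static uint16_t rx_burst_vec(void *rxq, void **pkts, uint16_t nb_pkts)
{
    (void)rxq; (void)pkts; (void)nb_pkts;
    return 0;
}

/*
 * Pick the RX path once, at setup time. The vector path is chosen only
 * when the queue configuration allows it AND the CPU reports SSE4.1;
 * otherwise fall back to the scalar path.
 */
static rx_burst_fn select_rx_burst(bool vec_allowed, bool cpu_has_sse4_1)
{
    if (vec_allowed && cpu_has_sse4_1)
        return rx_burst_vec;
    return rx_burst_scalar;
}
```

With this shape the default build never executes an SSE4.1 instruction on a
CPU that lacks it, since the check happens before the function pointer is set.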

> 
> /Bruce
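
To illustrate the difference described in the patch, here is a minimal sketch
contrasting a 16-bit scalar store followed by a 128-bit reload with an SSE4.1
blend that keeps the data in registers. This is not the code from the patch:
LEN_LANE, set_len_scalar and set_len_blend are hypothetical names, the lane
position is arbitrary, and the target attribute assumes GCC or Clang on x86:

```c
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1: _mm_blend_epi16 */

/* Hypothetical 16-bit lane that holds the length field. */
#define LEN_LANE 2

/*
 * Scalar-style update: spill the vector to memory, patch one 16-bit
 * element, reload all 128 bits. The wide load overlaps the narrow store,
 * which is the pattern that can trigger "Loads blocked by Store Forwards".
 */
static __m128i set_len_scalar(__m128i fields, uint16_t len)
{
    uint16_t tmp[8];

    _mm_storeu_si128((__m128i *)tmp, fields);
    tmp[LEN_LANE] = len;                          /* narrow 16-bit store */
    return _mm_loadu_si128((const __m128i *)tmp); /* wide 128-bit load */
}

/*
 * Blend-style update: take lane LEN_LANE from 'lens' and every other
 * lane from 'fields'. Nothing goes through memory.
 */
__attribute__((target("sse4.1")))
static __m128i set_len_blend(__m128i fields, __m128i lens)
{
    return _mm_blend_epi16(fields, lens, 1 << LEN_LANE);
}
```

Both functions yield the same vector for the same inputs; only the second one
requires SSE4.1, which is why the build requirement comes up above.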

