[dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx

Vladyslav Buslov Vladyslav.Buslov at harmonicinc.com
Tue Nov 15 14:27:16 CET 2016


> -----Original Message-----
> From: Ferruh Yigit [mailto:ferruh.yigit at intel.com]
> Sent: Tuesday, November 15, 2016 2:19 PM
> To: Ananyev, Konstantin; Richardson, Bruce
> Cc: Vladyslav Buslov; Wu, Jingjing; Zhang, Helin; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch
> instructions for bulk rx
> 
> On 10/13/2016 11:30 AM, Ananyev, Konstantin wrote:
> 
> <...>
> 
> >>>>
> >>>> Actually I can see some valid use cases where it is beneficial to
> >>>> have this prefetch in the driver.
> >>>> In our sw distributor case it is trivial to just prefetch the next
> >>>> packet on each iteration because packets are processed one by one.
> >>>> However, when we move this functionality to hw by means of
> >>>> RSS/vfunction/FlowDirector (our long-term goal), worker threads will
> >>>> receive packets directly from the rx queues of the NIC.
> >>>> The first operation of a worker thread is to perform a bulk lookup in
> >>>> a hash table by destination MAC. This will cause a cache miss on
> >>>> accessing each eth header and can't be easily mitigated in
> >>>> application code.
> >>>> I assume it is a ubiquitous use case for DPDK.
> >>>
> >>> Yes, it is quite a common use-case.
> >>> Though in many cases it is possible to reorder user code to hide (or
> >>> minimize) that data-access latency.
> >>> On the other hand, there are scenarios where this prefetch is
> >>> excessive and can cause some drop in performance.
> >>> Again, as far as I know, none of the PMDs for Intel devices prefetches
> >>> the packet's data in simple (single segment) RX mode.
> >>> Another thing that some people may argue then: why is only one cache
> >>> line prefetched, when some use-cases might need to look at the 2nd one?
> >>>
> >> There is a build-time config setting for this behaviour for exactly
> >> the reasons called out here - in some apps you get a benefit, in
> >> others you see a perf hit. The default is "on", which makes sense for
> >> most cases, I think.
> >> From common_base:
> >>
> >> CONFIG_RTE_PMD_PACKET_PREFETCH=y
> >
> > Yes, but right now i40e and ixgbe non-scattered RX (both vector and
> > scalar) just ignore that flag.
> > Though yes, it might be a good thing to make them obey that flag
> > properly.
> 
> Hi Vladyslav,
> 
> According to Konstantin's comment, what do you think about updating the
> patch to do the prefetch within a CONFIG_RTE_PMD_PACKET_PREFETCH ifdef?
> 
> But since the config option is enabled by default, the performance
> concern is still valid and needs to be investigated.
> 
> Thanks,
> ferruh

Hi Ferruh,

I'll update my patch according to code review suggestions.
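
For reference, the guarded prefetch discussed above could look roughly like the sketch below. This is not the actual patch: it assumes the build exposes the option as a macro named RTE_PMD_PACKET_PREFETCH, and rx_pkt_prefetch, demo_mbuf and demo_rx_bulk are hypothetical stand-ins so the example compiles without DPDK. __builtin_prefetch is the GCC/Clang builtin, used here in place of whatever prefetch helper the driver would really call.

```c
#include <stddef.h>

/* Stand-in for the build-time option from common_base (assumed macro name);
 * comment this out to compile the prefetch away. */
#define RTE_PMD_PACKET_PREFETCH 1

#ifdef RTE_PMD_PACKET_PREFETCH
/* Hint the cache line at p for reading, with high temporal locality. */
#define rx_pkt_prefetch(p) __builtin_prefetch((p), 0, 3)
#define RX_PREFETCH_ENABLED 1
#else
/* Option disabled: the hint becomes a no-op. */
#define rx_pkt_prefetch(p) do { (void)(p); } while (0)
#define RX_PREFETCH_ENABLED 0
#endif

/* Hypothetical, simplified buffer; not the real struct rte_mbuf. */
struct demo_mbuf {
	unsigned char data[64]; /* first cache line of packet data */
};

/* Hand nb buffers from the ring to the caller, issuing a prefetch hint
 * for the first cache line of each packet's data along the way. */
static size_t
demo_rx_bulk(struct demo_mbuf **ring, struct demo_mbuf **rx_pkts, size_t nb)
{
	size_t i;

	for (i = 0; i < nb; i++) {
		rx_pkt_prefetch(ring[i]->data);
		rx_pkts[i] = ring[i];
	}
	return nb;
}
```

With the option switched off the loop only copies pointers, so applications that see a perf hit from the extra prefetch can disable it at build time. Since the option defaults to on, the performance concern raised above still has to be measured separately.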

Regards,
Vlad

