[dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx

Wu, Jingjing jingjing.wu at intel.com
Mon Oct 10 15:25:57 CEST 2016



> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Wednesday, September 14, 2016 9:25 PM
> To: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com>; Zhang, Helin
> <helin.zhang at intel.com>; Wu, Jingjing <jingjing.wu at intel.com>
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch instructions for bulk rx
> 
> On 7/14/2016 6:27 PM, Vladyslav Buslov wrote:
> > Added prefetch of first packet payload cacheline in i40e_rx_scan_hw_ring
> > Added prefetch of second mbuf cacheline in i40e_rx_alloc_bufs
> >
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com>
> > ---
> >  drivers/net/i40e/i40e_rxtx.c | 7 +++++--
> >  1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > index d3cfb98..e493fb4 100644
> > --- a/drivers/net/i40e/i40e_rxtx.c
> > +++ b/drivers/net/i40e/i40e_rxtx.c
> > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> >                 /* Translate descriptor info to mbuf parameters */
> >                 for (j = 0; j < nb_dd; j++) {
> >                         mb = rxep[j].mbuf;
> > +                       rte_prefetch0(RTE_PTR_ADD(mb->buf_addr, RTE_PKTMBUF_HEADROOM));

Why prefetch here? If the application needs to touch the packet data, I think it is more suitable to issue this prefetch in the application.
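A minimal sketch of what an application-side prefetch could look like, written as a standalone mock rather than real DPDK code. The names `pkt_buf`, `HEADROOM`, `payload` and `process_burst` are hypothetical stand-ins (for an mbuf, `RTE_PKTMBUF_HEADROOM`, and the application's per-burst handler), and `__builtin_prefetch` (GCC/Clang) stands in for `rte_prefetch0`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADROOM 128 /* hypothetical stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical, heavily reduced stand-in for an mbuf. */
struct pkt_buf {
    void *buf_addr;    /* start of the data buffer */
    uint16_t data_len; /* bytes of payload after the headroom */
};

/* Address of the first payload byte, as in the patch's RTE_PTR_ADD(). */
static inline uint8_t *payload(const struct pkt_buf *m)
{
    return (uint8_t *)m->buf_addr + HEADROOM;
}

/*
 * Application-side handler for one received burst: prefetch the payload
 * of packet i + 1 while working on packet i. The "work" here is only a
 * sum of the first payload byte of each packet.
 */
static uint32_t process_burst(struct pkt_buf **pkts, uint16_t nb_rx)
{
    uint32_t sum = 0;

    assert(pkts != NULL || nb_rx == 0);

    if (nb_rx > 0)
        __builtin_prefetch(payload(pkts[0]), 0, 3);

    for (uint16_t i = 0; i < nb_rx; i++) {
        if (i + 1 < nb_rx)
            __builtin_prefetch(payload(pkts[i + 1]), 0, 3);
        if (pkts[i]->data_len > 0)
            sum += payload(pkts[i])[0];
    }
    return sum;
}
```

With this arrangement, an application that only forwards packets (such as testpmd in iofwd mode) never pays for a payload prefetch it does not use, while an application that parses headers can still hide the miss latency.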

> >                         qword1 = rte_le_to_cpu_64(\
> >                                 rxdp[j].wb.qword1.status_error_len);
> >                         pkt_len = ((qword1 & I40E_RXD_QW1_LENGTH_PBUF_MASK) >>
> > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue *rxq)
> >
> >         rxdp = &rxq->rx_ring[alloc_idx];
> >         for (i = 0; i < rxq->rx_free_thresh; i++) {
> > -               if (likely(i < (rxq->rx_free_thresh - 1)))
> > +               if (likely(i < (rxq->rx_free_thresh - 1))) {
> >                         /* Prefetch next mbuf */
> > -                       rte_prefetch0(rxep[i + 1].mbuf);
> > +                       rte_prefetch0(&rxep[i + 1].mbuf->cacheline0);
> > +                       rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
> > +               }
I agree with this change. However, when I tested it with testpmd in iofwd mode, I observed no performance increase, only a minor decrease.
Can you share with us in which scenario it benefits performance?
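For reference, the pattern in the second hunk can be sketched in isolation. This is a standalone mock, not the i40e code: `two_line_mbuf`, `mock_rx_entry`, `refill` and the 64-byte `CACHE_LINE` are assumptions for illustration, and `__builtin_prefetch` (GCC/Clang) stands in for `rte_prefetch0`. It assumes the refill loop writes fields in both halves of the buffer header:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size */

/* Hypothetical buffer header that spans two cache lines. */
struct two_line_mbuf {
    alignas(CACHE_LINE) uint8_t cacheline0[CACHE_LINE];
    alignas(CACHE_LINE) uint8_t cacheline1[CACHE_LINE];
};

static_assert(offsetof(struct two_line_mbuf, cacheline1) == CACHE_LINE,
              "second half must start on its own cache line");

/* Hypothetical software-ring entry, like rxep[] in the patch. */
struct mock_rx_entry {
    struct two_line_mbuf *mbuf;
};

/*
 * Initialise n ring entries. While entry i is being written, both cache
 * lines of entry i + 1 are prefetched for writing. Returns the number of
 * entries initialised.
 */
static uint16_t refill(struct mock_rx_entry *rxep, uint16_t n)
{
    uint16_t done = 0;

    for (uint16_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            __builtin_prefetch(rxep[i + 1].mbuf->cacheline0, 1, 3);
            __builtin_prefetch(rxep[i + 1].mbuf->cacheline1, 1, 3);
        }
        rxep[i].mbuf->cacheline0[0] = 1; /* stands in for first-line fields */
        rxep[i].mbuf->cacheline1[0] = 1; /* stands in for second-line fields */
        done++;
    }
    return done;
}
```

Whether the second prefetch pays off depends on whether the loop really touches the second cache line and on whether the buffers are already cache-resident, which may explain why a small forwarding test shows no gain.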


Thanks
Jingjing

