[dpdk-dev] [PATCH v3 2/4] ixgbe: implement vector PMD for arm architecture

Bruce Richardson bruce.richardson at intel.com
Wed May 25 14:53:33 CEST 2016


On Wed, May 25, 2016 at 05:59:38PM +0530, Jerin Jacob wrote:
> On Fri, May 06, 2016 at 11:55:46AM +0530, Jianbo Liu wrote:
> > use ARM NEON intrinsic to implement ixgbe vPMD
> > 
> > Signed-off-by: Jianbo Liu <jianbo.liu at linaro.org>
> > ---
> >  drivers/net/ixgbe/Makefile              |   4 +
> >  drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 561 ++++++++++++++++++++++++++++++++
> >  2 files changed, 565 insertions(+)
> >  create mode 100644 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c
> > 
<snip>
> > +	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
> > +			pos += RTE_IXGBE_DESCS_PER_LOOP,
> > +			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
> > +		uint64x2_t descs[RTE_IXGBE_DESCS_PER_LOOP];
> > +		uint8x16_t pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
> > +		uint8x16x2_t sterr_tmp1, sterr_tmp2;
> > +		uint64x2_t mbp1, mbp2;
> > +		uint8x16_t staterr;
> > +		uint16x8_t tmp;
> > +		uint32_t stat;
> > +
> > +		/* B.1 load 1 mbuf point */
> > +		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
> > +
> > +		/* Read desc statuses backwards to avoid race condition */
> > +		/* A.1 load 4 pkts desc */
> > +		descs[3] =  vld1q_u64((uint64_t *)(rxdp + 3));
> > +		rte_rmb();
> 
> Any specific reason to add rte_rmb() here, If there is no performance
> drop then it makes sense to add before descs[3] uses it.i.e
> at rte_compiler_barrier() place in x86 code.
> 
> > +
> > +		/* B.2 copy 2 mbuf point into rx_pkts  */
> > +		vst1q_u64((uint64_t *)&rx_pkts[pos], mbp1);
> > +
> > +		/* B.1 load 1 mbuf point */
> > +		mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]);
> > +
> > +		descs[2] =  vld1q_u64((uint64_t *)(rxdp + 2));
> > +		/* B.1 load 2 mbuf point */
> > +		descs[1] =  vld1q_u64((uint64_t *)(rxdp + 1));
> > +		descs[0] =  vld1q_u64((uint64_t *)(rxdp));
> > +
> > +		/* B.2 copy 2 mbuf point into rx_pkts  */
> > +		vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2);
> > +
> > +		if (split_packet) {
> > +			rte_prefetch_non_temporal(&rx_pkts[pos]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos+1]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos+2]->cacheline1);
> > +			rte_prefetch_non_temporal(&rx_pkts[pos+3]->cacheline1);
> 
> replace with rte_mbuf_prefetch_part2 or equivalent
> 
Hi Jerin, Jianbo,

since this patch has already been applied and these are not critical issues with
it, can a new patch please be submitted to propose these additional changes on
top of what's on next-net now.

Thanks,
/Bruce


More information about the dev mailing list