[dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy
thomas.monjalon at 6wind.com
Tue Jan 17 16:08:42 CET 2017
2016-12-08 10:18, Yuanhan Liu:
> On Tue, Dec 06, 2016 at 08:31:06PM -0500, Zhihong Wang wrote:
> > This patch optimizes rte_memcpy for well-aligned cases, where both
> > the dst and src addresses are aligned to the maximum MOV width. It
> > introduces a dedicated function, rte_memcpy_aligned, to handle the
> > aligned cases with a simplified instruction stream. The existing
> > rte_memcpy is renamed to rte_memcpy_generic. The selection between
> > the two is done at the entry of rte_memcpy.
> > The existing rte_memcpy is for generic cases: it handles unaligned
> > copies and makes stores aligned, and it even makes loads aligned for
> > microarchitectures like Ivy Bridge. However, alignment handling comes
> > at a price: it adds extra load/store instructions, which can
> > sometimes cause complications.
> > Take DPDK Vhost memcpy with the Mergeable Rx Buffer feature as an
> > example: the copy is aligned and remote, and it is accompanied by a
> > header write that is also remote. In this case the memcpy instruction
> > stream should be simplified to cut the extra loads/stores, which
> > reduces the probability of pipeline stalls caused by a full
> > load/store buffer, so that the actual memcpy instructions are issued
> > and the H/W prefetcher goes to work as early as possible.
> > This patch was tested on Ivy Bridge, Haswell and Skylake. It provides
> > up to 20% gain for Virtio Vhost PVP traffic, with packet sizes ranging
> > from 64 to 1500 bytes.
> > The test can also be conducted without a NIC, by setting up loopback
> > traffic between Virtio and Vhost. For example, set the macro
> > TXONLY_DEF_PACKET_LEN in testpmd.h to the requested packet size,
> > rebuild and start testpmd in both host and guest, then run "start" on
> > one side and "start tx_first 32" on the other.
> > Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>
> Reviewed-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>
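For readers of the archive, the entry-point selection described in the
commit message above can be sketched roughly as follows. This is only an
illustration, not the code from the patch: every sketch_* name and the
SKETCH_ALIGN_MASK value (32 bytes, assuming an AVX2-class target) are
hypothetical, and both copy paths are stand-ins that simply call memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mask: 32-byte alignment, assumed to be the widest MOV. */
#define SKETCH_ALIGN_MASK ((uintptr_t)0x1F)

/* Nonzero when both pointers are aligned to the assumed MOV width. */
static inline int
sketch_both_aligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) & SKETCH_ALIGN_MASK) == 0;
}

/* Stand-in for the aligned path; a real one would use plain wide moves. */
static inline void *
sketch_memcpy_aligned(void *dst, const void *src, size_t n)
{
	assert(sketch_both_aligned(dst, src)); /* precondition of this path */
	return memcpy(dst, src, n);
}

/* Stand-in for the generic path, which would handle unaligned copies. */
static inline void *
sketch_memcpy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Entry point: one OR and one AND on the addresses pick the path. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (sketch_both_aligned(dst, src))
		return sketch_memcpy_aligned(dst, src, n);
	return sketch_memcpy_generic(dst, src, n);
}
```

ORing the two addresses before masking means a single branch covers
both pointers, so the check adds very little to the aligned path.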