[dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy

Yuanhan Liu yuanhan.liu at linux.intel.com
Thu Dec 8 03:18:43 CET 2016


On Tue, Dec 06, 2016 at 08:31:06PM -0500, Zhihong Wang wrote:
> This patch optimizes rte_memcpy for well aligned cases, where both
> dst and src addr are aligned to the maximum MOV width. It introduces
> a dedicated function called rte_memcpy_aligned to handle the aligned
> cases with a simplified instruction stream. The existing rte_memcpy
> is renamed to rte_memcpy_generic. The selection between the two is
> done at the entry of rte_memcpy.
> 
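
The entry-point selection described above can be sketched roughly as
follows. This is only an illustration, not the code from the patch: the
sketch_* names and SKETCH_ALIGNMENT_MASK are hypothetical, the mask
assumes a 32-byte maximum MOV width (it would differ for other vector
widths), and both copy paths simply fall back to memcpy() as stand-ins
for the real instruction streams.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: widest MOV is 32 bytes, so the low 5 address bits must be 0. */
#define SKETCH_ALIGNMENT_MASK 0x1F

/* Stand-in for the generic path, which would handle unaligned copies. */
static void *sketch_memcpy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Stand-in for the aligned path, which would skip alignment handling. */
static void *sketch_memcpy_aligned(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* OR-ing both addresses lets a single mask test cover dst and src. */
static inline int sketch_both_aligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) & SKETCH_ALIGNMENT_MASK) == 0;
}

/* Entry point: choose the aligned or generic path once per call. */
static inline void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (sketch_both_aligned(dst, src))
		return sketch_memcpy_aligned(dst, src, n);
	return sketch_memcpy_generic(dst, src, n);
}
```

With this shape the only cost added to every call is one OR, one AND
and one branch, which is presumably why the check can sit at the entry
of the function.
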
> The existing rte_memcpy is for generic cases: it handles unaligned
> copies and makes stores aligned, and it even makes loads aligned for
> microarchitectures like Ivy Bridge. However, alignment handling comes
> at a price: it adds extra load/store instructions, which can
> sometimes cause complications.
> 
> Take DPDK Vhost memcpy with the Mergeable Rx Buffer feature as an
> example: the copy is aligned and remote, and it is accompanied by a
> header write that is also remote. In this case the memcpy instruction
> stream should be simplified to reduce extra loads/stores, and
> therefore reduce the probability of pipeline stalls caused by a full
> load/store buffer, so that the actual memcpy instructions are issued
> and the H/W prefetcher goes to work as early as possible.
> 
> This patch was tested on Ivy Bridge, Haswell and Skylake. It provides
> up to 20% gain for Virtio Vhost PVP traffic, with packet sizes ranging
> from 64 to 1500 bytes.
> 
> The test can also be conducted without a NIC, by setting up loopback
> traffic between Virtio and Vhost. For example, modify the macro
> TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
> rebuild and start testpmd on both host and guest, then "start" on
> one side and "start tx_first 32" on the other.
> 
> 
> Signed-off-by: Zhihong Wang <zhihong.wang at intel.com>

Reviewed-by: Yuanhan Liu <yuanhan.liu at linux.intel.com>

	--yliu

