[dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset and related test

Maxime Coquelin maxime.coquelin at redhat.com
Tue Dec 6 09:29:37 CET 2016



On 12/06/2016 07:33 AM, Yang, Zhiyong wrote:
> Hi, Maxime:
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
>> Sent: Friday, December 2, 2016 6:01 PM
>> To: Yang, Zhiyong <zhiyong.yang at intel.com>; dev at dpdk.org
>> Cc: yuanhan.liu at linux.intel.com; Richardson, Bruce
>> <bruce.richardson at intel.com>; Ananyev, Konstantin
>> <konstantin.ananyev at intel.com>
>> Subject: Re: [dpdk-dev] [PATCH 0/4] eal/common: introduce rte_memset
>> and related test
>>
>> Hi Zhiyong,
>>
>> On 12/05/2016 09:26 AM, Zhiyong Yang wrote:
>>> DPDK code has met performance drop badly in some case when calling
>>> glibc function memset. Reference to discussions about memset in
>>> http://dpdk.org/ml/archives/dev/2016-October/048628.html
>>> It is necessary to introduce more high efficient function to fix it.
>>> One important thing about rte_memset is that we can get clear control
>>> on what instruction flow is used.
>>>
>>> This patchset introduces rte_memset to bring more high efficient
>>> implementation, and will bring obvious perf improvement, especially
>>> for small N bytes in the most application scenarios.
>>>
>>> Patch 1 implements rte_memset in the file rte_memset.h on IA platform
>>> The file supports three types of instruction sets including sse & avx
>>> (128bits), avx2(256bits) and avx512(512bits). rte_memset makes use of
>>> vectorization and inline function to improve the perf on IA. In
>>> addition, cache line and memory alignment are fully taken into
>> consideration.
>>>
>>> Patch 2 implements functional autotest to validates the function
>>> whether to work in a right way.
>>>
>>> Patch 3 implements performance autotest separately in cache and memory.
>>>
>>> Patch 4 Using rte_memset instead of copy_virtio_net_hdr can bring
>>> 3%~4% performance improvements on IA platform from virtio/vhost
>>> non-mergeable loopback testing.
>>>
>>> Zhiyong Yang (4):
>>>   eal/common: introduce rte_memset on IA platform
>>>   app/test: add functional autotest for rte_memset
>>>   app/test: add performance autotest for rte_memset
>>>   lib/librte_vhost: improve vhost perf using rte_memset
>>>
>>>  app/test/Makefile                                  |   3 +
>>>  app/test/test_memset.c                             | 158 +++++++++
>>>  app/test/test_memset_perf.c                        | 348 +++++++++++++++++++
>>>  doc/guides/rel_notes/release_17_02.rst             |  11 +
>>>  .../common/include/arch/x86/rte_memset.h           | 376
>> +++++++++++++++++++++
>>>  lib/librte_eal/common/include/generic/rte_memset.h |  51 +++
>>>  lib/librte_vhost/virtio_net.c                      |  18 +-
>>>  7 files changed, 958 insertions(+), 7 deletions(-)  create mode
>>> 100644 app/test/test_memset.c  create mode 100644
>>> app/test/test_memset_perf.c  create mode 100644
>>> lib/librte_eal/common/include/arch/x86/rte_memset.h
>>>  create mode 100644
>> lib/librte_eal/common/include/generic/rte_memset.h
>>>
>>
>> Thanks for the series, idea looks good to me.
>>
>> Wouldn't be worth to also use rte_memset in Virtio PMD (not
>> compiled/tested)? :
>>
>
> I think  rte_memset  maybe can bring some benefit here,  but , I'm not clear how to
> enter the branch and test it. :)

Indeed, you will need Pierre's patch:
[dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set

Thanks,
Maxime
>
> thanks
> Zhiyong
>
>> diff --git a/drivers/net/virtio/virtio_rxtx.c
>> b/drivers/net/virtio/virtio_rxtx.c
>> index 22d97a4..a5f70c4 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -287,7 +287,7 @@ virtqueue_enqueue_xmit(struct virtnet_tx *txvq,
>> struct rte_mbuf *cookie,
>>                          rte_pktmbuf_prepend(cookie, head_size);
>>                  /* if offload disabled, it is not zeroed below, do it now */
>>                  if (offload == 0)
>> -                       memset(hdr, 0, head_size);
>> +                       rte_memset(hdr, 0, head_size);
>>          } else if (use_indirect) {
>>                  /* setup tx ring slot to point to indirect
>>                   * descriptor list stored in reserved region.
>>
>> Cheers,
>> Maxime


More information about the dev mailing list