[dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue function

Jason Wang jasowang at redhat.com
Thu Jul 11 11:54:45 CEST 2019


On 2019/7/11 5:49 PM, Liu, Yong wrote:
>
>> -----Original Message-----
>> From: Jason Wang [mailto:jasowang at redhat.com]
>> Sent: Thursday, July 11, 2019 12:11 PM
>> To: Liu, Yong <yong.liu at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>;
>> maxime.coquelin at redhat.com; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast enqueue
>> function
>>
>>
>> On 2019/7/10 3:30 PM, Liu, Yong wrote:
>>>> -----Original Message-----
>>>> From: Jason Wang [mailto:jasowang at redhat.com]
>>>> Sent: Wednesday, July 10, 2019 12:28 PM
>>>> To: Liu, Yong <yong.liu at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>;
>>>> maxime.coquelin at redhat.com; dev at dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC PATCH 02/13] add vhost packed ring fast
>> enqueue
>>>> function
>>>>
>>>>
>>>> On 2019/7/9 1:13 AM, Marvin Liu wrote:
>>>>> The fast enqueue function first checks whether the descriptors are
>>>>> cache aligned, and checks its other prerequisites at the beginning.
>>>>> The fast enqueue function does not support chained mbufs; the normal
>>>>> function handles those.
>>>>>
>>>>> Signed-off-by: Marvin Liu <yong.liu at intel.com>
>>>> Any reason for not letting the compiler unroll the loops?
>>>>
>>> Hi Jason,
>>> I'm not sure how much the compiler can help with unrolling loops, as it
>>> can't know how many iterations there will be in one call.
>>> After forcing loop unrolling off with "-fno-unroll-loops", the size of the
>>> virtio_dev_rx_packed function remained the same.
>>> So it looks like GCC's loop-unrolling optimization does not help here.
>>
>> I meant something like "#pragma GCC unroll N" just before the loop you
>> want unrolled.
>>
>> Thanks
>>
> Hi Jason,
> Just tried with GCC 8.3.0 and the master code: only a 0.1 Mpps performance gain with "#pragma GCC unroll".
> I think this compiler pragma does not help in a big loop that contains so many function calls.
>
> Thanks,
> Marvin


Yes, it probably needs some tricks, e.g. breaking the big loop into small ones.
What I want to do here is unroll the loop based on
PACKED_DESC_PER_CACHELINE instead of a hard-coded 4.

Thanks


>>> And the fast enqueue function not only unrolls the loop; it also checks cache
>>> alignment, which can help performance in another way.
>>> Regards,
>>> Marvin
>>>
>>>> Thanks
>>>>

