[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx

Xie, Huawei huawei.xie at intel.com
Fri Jun 3 10:18:23 CEST 2016


On 6/1/2016 2:53 PM, Yuanhan Liu wrote:
> On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
>>>  	/* Retrieve all of the head indexes first to avoid caching issues. */
>>>  	for (i = 0; i < count; i++) {
>>> -		desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
>>> -					(vq->size - 1)];
>>> +		used_idx = (vq->last_used_idx + i) & (vq->size - 1);
>>> +		desc_indexes[i] = vq->avail->ring[used_idx];
>>> +
>>> +		vq->used->ring[used_idx].id  = desc_indexes[i];
>>> +		vq->used->ring[used_idx].len = 0;
>>> +		vhost_log_used_vring(dev, vq,
>>> +				offsetof(struct vring_used, ring[used_idx]),
>>> +				sizeof(vq->used->ring[used_idx]));
>>>  	}
>>>  
>>>  	/* Prefetch descriptor index. */
>>>  	rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>> -	rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>>> -
>>>  	for (i = 0; i < count; i++) {
>>>  		int err;
>>>  
>>> -		if (likely(i + 1 < count)) {
>>> +		if (likely(i + 1 < count))
>>>  			rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>> -			rte_prefetch0(&vq->used->ring[(used_idx + 1) &
>>> -						      (vq->size - 1)]);
>>> -		}
>>>  
>>>  		pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>>  		if (unlikely(pkts[i] == NULL)) {
>>> @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>>  			rte_pktmbuf_free(pkts[i]);
>>>  			break;
>>>  		}
>>> -
>>> -		used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> -		vq->used->ring[used_idx].id  = desc_indexes[i];
>>> -		vq->used->ring[used_idx].len = 0;
>>> -		vhost_log_used_vring(dev, vq,
>>> -				offsetof(struct vring_used, ring[used_idx]),
>>> -				sizeof(vq->used->ring[used_idx]));
>>>  	}
>> Had tried post-updating the used ring in batch, but I forget what the
>> perf change was.
> I would assume pre-updating gives better performance gain, as we are
> fiddling with avail and used ring together, which would be more cache
> friendly.
 
The distance between an avail ring entry and the corresponding used ring
entry is at least 8 cache lines, so touching them together isn't
especially cache friendly.
The benefit comes from the batched updates, if applicable.

>
>> One optimization would be on vhost_log_used_vring.
>> I have two ideas,
>> a) On the QEMU side, we always assume the used ring will be changed, so
>> that we don't need to log the used ring in vhost.
>>
>> Michael: feasible in QEMU? comments on this?
>>
>> b) We could always mark the total used ring modified rather than entry
>> by entry.
> I doubt it's worthwhile. One fact is that vhost_log_used_vring is
> a no-op most of the time: it takes action only during the short
> window of live migration.
>
> And FYI, I even tried removing all the vhost_log_xxx calls; it showed
> no performance boost at all. Therefore, it's not a factor that
> impacts performance.

I knew this.

> 	--yliu
>


