[dpdk-dev] [PATCH 1/3] vhost: pre update used ring for Tx and Rx
Xie, Huawei
huawei.xie at intel.com
Fri Jun 3 10:18:23 CEST 2016
On 6/1/2016 2:53 PM, Yuanhan Liu wrote:
> On Wed, Jun 01, 2016 at 06:40:41AM +0000, Xie, Huawei wrote:
>>> /* Retrieve all of the head indexes first to avoid caching issues. */
>>> for (i = 0; i < count; i++) {
>>> - desc_indexes[i] = vq->avail->ring[(vq->last_used_idx + i) &
>>> - (vq->size - 1)];
>>> + used_idx = (vq->last_used_idx + i) & (vq->size - 1);
>>> + desc_indexes[i] = vq->avail->ring[used_idx];
>>> +
>>> + vq->used->ring[used_idx].id = desc_indexes[i];
>>> + vq->used->ring[used_idx].len = 0;
>>> + vhost_log_used_vring(dev, vq,
>>> + offsetof(struct vring_used, ring[used_idx]),
>>> + sizeof(vq->used->ring[used_idx]));
>>> }
>>>
>>> /* Prefetch descriptor index. */
>>> rte_prefetch0(&vq->desc[desc_indexes[0]]);
>>> - rte_prefetch0(&vq->used->ring[vq->last_used_idx & (vq->size - 1)]);
>>> -
>>> for (i = 0; i < count; i++) {
>>> int err;
>>>
>>> - if (likely(i + 1 < count)) {
>>> + if (likely(i + 1 < count))
>>> rte_prefetch0(&vq->desc[desc_indexes[i + 1]]);
>>> - rte_prefetch0(&vq->used->ring[(used_idx + 1) &
>>> - (vq->size - 1)]);
>>> - }
>>>
>>> pkts[i] = rte_pktmbuf_alloc(mbuf_pool);
>>> if (unlikely(pkts[i] == NULL)) {
>>> @@ -916,18 +920,12 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
>>> rte_pktmbuf_free(pkts[i]);
>>> break;
>>> }
>>> -
>>> - used_idx = vq->last_used_idx++ & (vq->size - 1);
>>> - vq->used->ring[used_idx].id = desc_indexes[i];
>>> - vq->used->ring[used_idx].len = 0;
>>> - vhost_log_used_vring(dev, vq,
>>> - offsetof(struct vring_used, ring[used_idx]),
>>> - sizeof(vq->used->ring[used_idx]));
>>> }
>> Had tried post-updating the used ring in batch before, but I forget the
>> perf change it made.
> I would assume pre-updating gives better performance gain, as we are
> fiddling with avail and used ring together, which would be more cache
> friendly.
The entries for the avail ring and the used ring at the same index are at
least 8 cache lines apart.
The benefit comes from batching the updates, where applicable.
>
>> One optimization would be on vhost_log_used_vring.
>> I have two ideas:
>> a) On the QEMU side, always assume the used ring has been changed, so
>> that we don't need to log the used ring in vhost.
>>
>> Michael: is this feasible in QEMU? Any comments on it?
>>
>> b) We could always mark the whole used ring as modified, rather than
>> logging it entry by entry.
> I doubt it's worthwhile. One fact is that vhost_log_used_vring is
> a no-op most of the time: it takes action only during the short
> window of live migration.
>
> And FYI, I even tried with all the vhost_log_xxx calls removed; it showed
> no performance boost at all. Therefore, it's not a factor that will
> impact performance.
I knew this.
> --yliu
>