[dpdk-dev] virtio optimization idea
Tetsuya Mukawa
mukawa at igel.co.jp
Tue Sep 8 10:21:09 CEST 2015
On 2015/09/05 1:50, Xie, Huawei wrote:
> There is some format issue with the ascii chart of the tx ring. Update
> that chart.
> Sorry for the trouble.
Hi Xie,
Thanks for sharing a way to optimize virtio.
I have a few questions.
>
> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>> Hi:
>>
>> Recently I have done one virtio optimization proof of concept. The
>> optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With the optimizations, we could get a severalfold performance boost
>> for pure vhost-virtio throughput.
When you measured performance, did you optimize only the virtio-net driver?
If so, could the vhost backend (librte_vhost) also be optimized using the
same approach?
>>
>> Here I will only cover the first part, which is the prerequisite for the
>> second part.
>> Let us first take RX for example. Currently when we fill the avail ring
>> with guest mbufs, we need to:
>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>> b) set the idx of the desc into the entry of the avail ring
>> c) set the addr/len fields of the descriptor to point to the guest's
>> blank mbuf data area
>>
>> Those operations take time, and step b in particular leaves the avail
>> ring's cache line in the modified (M) state on the virtio processing
>> core. When vhost processes the avail ring, transferring that cache line
>> from the virtio processing core to the vhost processing core costs a
>> significant number of CPU cycles.
>> To solve this problem, this is the arrangement of the RX ring for the
>> DPDK PMD (for the non-mergeable case).
>>
>>                     avail
>>                     idx
>>                     +
>>                     |
>> +----+----+---+-------------+------+
>> | 0  | 1  | 2 | ... |  254  | 255  | avail ring
>> +-+--+-+--+-+-+---------+---+--+---+
>>   |    |    |       |   |      |
>>   |    |    |       |   |      |
>>   v    v    v       |   v      v
>> +-+--+-+--+-+-+---------+---+--+---+
>> | 0  | 1  | 2 | ... |  254  | 255  | desc ring
>> +----+----+---+-------------+------+
>>                     |
>>                     |
>> +----+----+---+-------------+------+
>> | 0  | 1  | 2 |     |  254  | 255  | used ring
>> +----+----+---+-------------+------+
>>                     |
>>                     +
>> The avail ring is initialized with fixed descriptors and is never changed,
>> i.e., the index value of the nth avail ring entry is always n, which
>> means the virtio PMD actually refills only the desc ring, without having
>> to change the avail ring.
For example, the avail ring looks like this:

    struct vring_avail {
            uint16_t flags;
            uint16_t idx;
            uint16_t ring[QUEUE_SIZE];
    };

My understanding is that the virtio-net driver still needs to update
avail_ring.idx, but doesn't need to change avail_ring.ring[].
Is this correct?
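
To make the question concrete, here is a minimal sketch of how I read the
proposal, written against a plain C model rather than the real PMD. The
helper names (avail_ring_init_fixed, rx_refill), the 256-entry queue size and
the struct vring_desc definition are my assumptions for illustration (the
descriptor layout follows the virtio spec); memory barriers and the device
notification are left out.

```c
#include <stdint.h>

#define QUEUE_SIZE 256          /* assumed ring size, as in the chart */
#define VRING_DESC_F_WRITE 2    /* device writes into this buffer (RX) */

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QUEUE_SIZE];
};

/* One-time setup: entry n of the avail ring always names descriptor n. */
static void avail_ring_init_fixed(struct vring_avail *avail)
{
    avail->flags = 0;
    avail->idx = 0;
    for (uint16_t i = 0; i < QUEUE_SIZE; i++)
        avail->ring[i] = i;
}

/*
 * Hypothetical refill helper: writes only the descriptors, then publishes
 * them by advancing idx. avail->ring[] is never written here.
 */
static void rx_refill(struct vring_desc *desc, struct vring_avail *avail,
                      const uint64_t *buf_addr, uint32_t buf_len,
                      uint16_t count)
{
    for (uint16_t i = 0; i < count; i++) {
        uint16_t slot = (uint16_t)(avail->idx + i) & (QUEUE_SIZE - 1);

        desc[slot].addr = buf_addr[i];
        desc[slot].len = buf_len;
        desc[slot].flags = VRING_DESC_F_WRITE;
        desc[slot].next = 0;
    }
    /* A real driver needs a write barrier before this store. */
    avail->idx = (uint16_t)(avail->idx + count);
}
```

If this reading is right, the only avail ring field the driver writes per
burst is idx; ring[] is written once at setup and never again.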
Tetsuya
>> When vhost fetches avail ring, if not evicted, it is always in its first
>> level cache.
>>
>> When RX receives packets from used ring, we use the used->idx as the
>> desc idx. This requires that vhost processes and returns descs from
>> avail ring to used ring in order, which is true for both current dpdk
>> vhost and kernel vhost implementation. In my understanding, there is no
>> necessity for vhost net to process descriptors out of order (OOO). One
>> case could be zero copy: for example, if one descriptor doesn't meet the
>> zero copy requirement, we could directly return it to the used ring,
>> earlier than the descriptors in front of it.
>> To enforce this, I want to use a reserved bit to indicate in-order
>> processing of descriptors.
>>
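
A minimal sketch of the in-order receive path described above, as I
understand it. It assumes vhost returns descriptors strictly in order, so the
position in the used ring doubles as the descriptor index and the id field
never has to be read. The helper name rx_harvest_in_order and the last_used
counter are hypothetical; the used ring structs follow the virtio spec
layout, and read barriers are omitted.

```c
#include <stdint.h>

#define QUEUE_SIZE 256  /* assumed ring size, as in the chart */

struct vring_used_elem {
    uint32_t id;
    uint32_t len;
};

struct vring_used {
    uint16_t flags;
    uint16_t idx;
    struct vring_used_elem ring[QUEUE_SIZE];
};

/*
 * Collect up to max completions. Because processing is in order, the
 * descriptor index is derived from the driver's own counter instead of
 * being loaded from used->ring[].id.
 * Returns the number of completions written to the output arrays.
 */
static uint16_t rx_harvest_in_order(const struct vring_used *used,
                                    uint16_t *last_used,
                                    uint16_t *desc_idx_out,
                                    uint32_t *len_out, uint16_t max)
{
    /* 16-bit subtraction handles wrap-around of the free-running idx. */
    uint16_t pending = (uint16_t)(used->idx - *last_used);
    uint16_t n = pending < max ? pending : max;

    for (uint16_t i = 0; i < n; i++) {
        uint16_t slot = (uint16_t)(*last_used + i) & (QUEUE_SIZE - 1);

        desc_idx_out[i] = slot;            /* slot == descriptor index */
        len_out[i] = used->ring[slot].len; /* only len is read */
    }
    *last_used = (uint16_t)(*last_used + n);
    return n;
}
```

This only holds while the backend never completes descriptors out of order,
which is why a feature bit to negotiate in-order processing would be needed.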
>> For tx ring, the arrangement is like below. Each transmitted mbuf needs
>> a desc for virtio_net_hdr, so actually we have only 128 free slots.
>>
>>
>>
>>                          ++
>>                          ||
>>                          ||
>> +-----+-----+-----+--------------+------+------+------+
>> |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  | avail ring
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    |     |            |  ||  |      |             |
>>    v     v            v  ||  v      v             v
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>> | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  | desc ring for virtio_net_hdr
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    |     |            |  ||  |      |             |
>>    v     v            v  ||  v      v             v
>> +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>> |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  | desc ring for tx data
>>
>>
>>
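
A rough sketch of how I read the tx arrangement: each of the 128 slots owns a
fixed header descriptor that chains to a fixed data descriptor, and both
halves of the avail ring name the same header descriptors. I am assuming the
header descriptor for slot i sits at index 128 + i; the chart labels that row
starting at 127, so the exact offset is a guess on my side. The helper names
(tx_ring_init_fixed, tx_enqueue) and the hdr_base/hdr_len parameters are
hypothetical, and barriers and notification are omitted.

```c
#include <stdint.h>

#define QUEUE_SIZE 256                 /* assumed ring size */
#define TX_SLOTS (QUEUE_SIZE / 2)      /* 128 usable tx slots */
#define VRING_DESC_F_NEXT 1            /* descriptor chains via next */

struct vring_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[QUEUE_SIZE];
};

/* One-time setup of the fixed header -> data chains and the avail ring. */
static void tx_ring_init_fixed(struct vring_desc *desc,
                               struct vring_avail *avail,
                               uint64_t hdr_base, uint32_t hdr_len)
{
    for (uint16_t slot = 0; slot < TX_SLOTS; slot++) {
        /* Assumed placement: headers occupy the upper half of the ring. */
        uint16_t hdr = (uint16_t)(TX_SLOTS + slot);

        desc[hdr].addr = hdr_base + (uint64_t)slot * hdr_len;
        desc[hdr].len = hdr_len;
        desc[hdr].flags = VRING_DESC_F_NEXT;
        desc[hdr].next = slot;         /* data descriptor for this slot */

        desc[slot].flags = 0;          /* end of chain, device reads it */
        desc[slot].next = 0;

        /* Both halves of the avail ring point at the same headers. */
        avail->ring[slot] = hdr;
        avail->ring[slot + TX_SLOTS] = hdr;
    }
    avail->flags = 0;
    avail->idx = 0;
}

/*
 * Per packet, only the data descriptor and idx are written. The caller
 * must keep at most TX_SLOTS packets in flight.
 */
static void tx_enqueue(struct vring_desc *desc, struct vring_avail *avail,
                       uint64_t data_addr, uint32_t data_len)
{
    uint16_t slot = avail->idx & (TX_SLOTS - 1);

    desc[slot].addr = data_addr;
    desc[slot].len = data_len;
    /* A real driver needs a write barrier before this store. */
    avail->idx = (uint16_t)(avail->idx + 1);
}
```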
>> /huawei
>>