[dpdk-dev] virtio optimization idea

Xie, Huawei huawei.xie at intel.com
Tue Sep 8 11:42:27 CEST 2015


On 9/8/2015 4:21 PM, Tetsuya Mukawa wrote:
> On 2015/09/05 1:50, Xie, Huawei wrote:
>> There is some format issue with the ascii chart of the tx ring. Update
>> that chart.
>> Sorry for the trouble.
> Hi Xie,
>
> Thanks for sharing a way to optimize virtio.
> I have a few questions.
>
>> On 9/4/2015 4:25 PM, Xie, Huawei wrote:
>>> Hi:
>>>
>>> Recently I have done a virtio optimization proof of concept. The
>>> optimization includes two parts:
>>> 1) avail ring set up with fixed descriptors
>>> 2) RX vectorization
>>> With these optimizations, we can get a several-fold performance boost
>>> for pure vhost-virtio throughput.
> When you measured performance, did you optimize only the virtio-net driver?
> If so, can we also optimize the vhost backend (librte_vhost) using your
> optimization approach?

We could do some optimization to vhost based on the same vring layout,
but as vhost needs to support legacy virtio front ends as well, it cannot
make this assumption.
>>> Here I will only cover the first part, which is the prerequisite for the
>>> second part.
>>> Let us first take RX as an example. Currently, when we fill the avail ring
>>> with guest mbufs, we need to:
>>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>>> b) set the idx of that desc into the entry of the avail ring
>>> c) set the addr/len fields of the descriptor to point to the guest's blank
>>> mbuf data area
>>>
>>> Those operations take time, and step b in particular leaves the cache
>>> line holding the avail ring in Modified (M) state on the virtio processing
>>> core. When vhost processes the avail ring, transferring that cache line
>>> from the virtio processing core to the vhost processing core costs quite a
>>> few CPU cycles.
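To make the cost of step b concrete, here is a toy C sketch of the conventional refill path. The struct names and field widths are simplified stand-ins for illustration, not the actual DPDK/virtio definitions:

```c
#include <stdint.h>

/* Toy ring layouts loosely modeled on the virtio split-ring spec. */
#define Q 256

struct desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct avail { uint16_t flags; uint16_t idx; uint16_t ring[Q]; };

/* Conventional refill: steps a)-c) from the mail.  Step b) writes
 * avail->ring[], which puts that cache line into Modified state on the
 * virtio core and forces a transfer when vhost later reads it. */
static void refill_one(struct desc *descs, struct avail *avail,
                       uint16_t free_desc, uint64_t buf, uint32_t len)
{
    /* a) take one free descriptor, c) point it at the guest mbuf */
    descs[free_desc].addr = buf;
    descs[free_desc].len  = len;
    /* b) publish its index in the avail ring: the costly write */
    avail->ring[avail->idx % Q] = free_desc;
    avail->idx++;                /* make the new entry visible to vhost */
}
```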
>>> To solve this problem, below is the arrangement of the RX ring for the
>>> DPDK PMD (for the non-mergeable case).
>>>    
>>>                     avail                      
>>>                     idx                        
>>>                     +                          
>>>                     |                          
>>> +----+----+---+-------------+------+           
>>> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
>>> +-+--+-+--+-+-+---------+---+--+---+           
>>>   |    |    |       |   |      |               
>>>   |    |    |       |   |      |               
>>>   v    v    v       |   v      v               
>>> +-+--+-+--+-+-+---------+---+--+---+           
>>> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
>>> +----+----+---+-------------+------+           
>>>                     |                          
>>>                     |                          
>>> +----+----+---+-------------+------+           
>>> | 0  | 1  | 2 |     |  254  | 255  |  used ring
>>> +----+----+---+-------------+------+           
>>>                     |                          
>>>                     +    
>>> The avail ring is initialized with fixed descriptor indexes and is never
>>> changed, i.e., the index value of the nth avail ring entry is always n,
>>> which means the virtio PMD actually refills only the desc ring, without
>>> having to change the avail ring.
> For example, avail ring is like below.
> struct vring_avail {
>         uint16_t flags;
>         uint16_t idx;
>         uint16_t ring[QUEUE_SIZE];
> };
>
> My understanding is that the virtio-net driver still needs to change
> avail_ring.idx, but doesn't need to change avail_ring.ring[].
> Is this correct?

Yes, avail_ring.ring[] is initialized once and never gets updated
afterwards; only avail_ring.idx changes. In effect, the virtio frontend is
driving the descriptor ring alone.
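For illustration, the one-time setup could look like this toy C sketch (hypothetical names, not the actual PMD code):

```c
#include <stdint.h>

#define Q 256
struct vring_avail_toy { uint16_t flags; uint16_t idx; uint16_t ring[Q]; };

/* One-time setup: entry n always carries descriptor index n, so after
 * this the PMD only ever touches the descriptor ring (plus avail->idx),
 * never avail->ring[] itself. */
static void avail_init_fixed(struct vring_avail_toy *avail)
{
    for (uint16_t n = 0; n < Q; n++)
        avail->ring[n] = n;
}
```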
>
> Tetsuya
>
>>> When vhost fetches the avail ring, if it hasn't been evicted, it is
>>> always in vhost's first-level cache.
>>>
>>> When RX receives packets from the used ring, we use used->idx as the
>>> desc idx. This requires that vhost processes and returns descriptors from
>>> the avail ring to the used ring in order, which is true for both the
>>> current DPDK vhost and the kernel vhost implementations. In my
>>> understanding, there is no necessity for vhost-net to process descriptors
>>> out of order. One case could be zero copy: for example, if one descriptor
>>> doesn't meet the zero-copy requirement, we could directly return it to the
>>> used ring, earlier than the descriptors in front of it.
>>> To enforce this, I want to use a reserved bit to indicate in-order
>>> processing of descriptors.
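A toy C sketch of what in-order harvesting buys on the RX side: the desc index can be inferred from a running counter instead of being read from used->ring[].id. Names here are hypothetical, not the real PMD code:

```c
#include <stdint.h>

#define Q 256
struct used_elem_toy { uint32_t id; uint32_t len; };
struct used_toy { uint16_t flags; uint16_t idx; struct used_elem_toy ring[Q]; };

/* With in-order completion, the descriptor index of the k-th newly used
 * entry is simply (last_used + k) % Q, so the PMD never has to read
 * used->ring[].id.  Returns the number of descriptors harvested. */
static uint16_t rx_harvest(const struct used_toy *used,
                           uint16_t *last_used, uint16_t out[], uint16_t max)
{
    uint16_t n = 0;
    while (*last_used != used->idx && n < max) {
        out[n++] = *last_used % Q;   /* desc idx inferred from position */
        (*last_used)++;
    }
    return n;
}
```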
>>>
>>> For the TX ring, the arrangement is as below. Each transmitted mbuf needs
>>> a desc for the virtio_net_hdr, so in practice we have only 128 free slots.
>>>                                                                                       
>>>
>>>                            
>>>                             ++                                                           
>>>                             ||                                                           
>>>                             ||                                                           
>>>    +-----+-----+-----+--------------+------+------+------+                               
>>>    |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring                  
>>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+                               
>>>       |     |            |  ||  |      |             |                                   
>>>       v     v            v  ||  v      v             v                                   
>>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+                               
>>>    | 127 | 128 | ... |  255 || 127  | 128  | ...  | 255  |   desc ring for virtio_net_hdr
>>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+                               
>>>       |     |            |  ||  |      |             |                                   
>>>       v     v            v  ||  v      v             v                                   
>>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+                               
>>>    |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
>>>
>>>
>>>                      
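A toy C sketch of one possible fixed TX chaining along these lines. The exact index mapping (header desc 128+i chained to data desc i) is my assumption for illustration, not necessarily the exact layout in the chart above:

```c
#include <stdint.h>

#define Q         256
#define TX_SLOTS  (Q / 2)   /* each packet burns 2 descs: hdr + data */

struct desc_toy { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
#define F_NEXT 1            /* stand-in for VRING_DESC_F_NEXT */

/* Hypothetical fixed chaining: slot i uses desc TX_SLOTS+i for the
 * virtio_net_hdr and desc i for packet data.  The header desc is set up
 * once; only addr/len of the data desc change per packet. */
static void tx_slot_init(struct desc_toy *d, uint16_t i,
                         uint64_t hdr_addr, uint32_t hdr_len)
{
    d[TX_SLOTS + i].addr  = hdr_addr;
    d[TX_SLOTS + i].len   = hdr_len;
    d[TX_SLOTS + i].flags = F_NEXT;
    d[TX_SLOTS + i].next  = i;       /* chain to the data descriptor */
}

static void tx_fill(struct desc_toy *d, uint16_t i,
                    uint64_t pkt_addr, uint32_t pkt_len)
{
    d[i].addr  = pkt_addr;
    d[i].len   = pkt_len;
    d[i].flags = 0;                  /* end of chain */
    d[i].next  = 0;
}
```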
>>> /huawei
>>>
>


