[dpdk-dev] virtio optimization idea

Xie, Huawei huawei.xie at intel.com
Tue Sep 8 17:52:35 CEST 2015


On 9/8/2015 11:39 PM, Stephen Hemminger wrote:
> On Fri, 4 Sep 2015 08:25:05 +0000
> "Xie, Huawei" <huawei.xie at intel.com> wrote:
>
>> Hi:
>>
>> Recently I did a virtio optimization proof of concept. The
>> optimization includes two parts:
>> 1) avail ring set with fixed descriptors
>> 2) RX vectorization
>> With these optimizations, we get a several-fold performance boost
>> for pure vhost-virtio throughput.
>>
>> Here I will only cover the first part, which is the prerequisite for
>> the second part.
>> Let us take RX as an example first. Currently, when we fill the avail
>> ring with guest mbufs, we need to:
>> a) allocate one descriptor (for a non-sg mbuf) from the free descriptors
>> b) set the idx of that desc into the corresponding entry of the avail ring
>> c) set the addr/len fields of the descriptor to point to the guest's
>> blank mbuf data area
>>
>> These operations take time, and step b) in particular leaves the
>> cache line holding the avail ring in modified (M) state on the virtio
>> processing core. When vhost then processes the avail ring,
>> transferring that cache line from the virtio processing core to the
>> vhost processing core costs quite a few CPU cycles.
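>>
>> A minimal sketch of this conventional refill path (struct vring_desc,
>> struct vring_avail and VRING_DESC_F_WRITE are the standard layouts
>> from <linux/virtio_ring.h>; a 256-entry ring is assumed, and
>> refill_one is an illustrative name, not the actual PMD code):
>>
>>     static void refill_one(struct vring_desc *desc,
>>                            struct vring_avail *avail,
>>                            uint16_t free_idx,
>>                            uint64_t buf_addr, uint32_t buf_len)
>>     {
>>         /* a) one descriptor taken from the free list */
>>         struct vring_desc *d = &desc[free_idx];
>>
>>         /* c) point it at the blank mbuf data area */
>>         d->addr  = buf_addr;
>>         d->len   = buf_len;
>>         d->flags = VRING_DESC_F_WRITE;  /* device writes into it */
>>
>>         /* b) publish the desc idx in the avail ring; this store is
>>          * what dirties the avail ring cache line */
>>         avail->ring[avail->idx & 255] = free_idx;
>>         avail->idx++;
>>     }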
>> To solve this problem, the RX ring for the DPDK PMD is arranged as
>> below (for the non-mergeable case).
>>    
>>                     avail                      
>>                     idx                        
>>                     +                          
>>                     |                          
>> +----+----+---+-------------+------+           
>> | 0  | 1  | 2 | ... |  254  | 255  |  avail ring
>> +-+--+-+--+-+-+---------+---+--+---+           
>>   |    |    |       |   |      |               
>>   |    |    |       |   |      |               
>>   v    v    v       |   v      v               
>> +-+--+-+--+-+-+---------+---+--+---+           
>> | 0  | 1  | 2 | ... |  254  | 255  |  desc ring
>> +----+----+---+-------------+------+           
>>                     |                          
>>                     |                          
>> +----+----+---+-------------+------+           
>> | 0  | 1  | 2 | ... |  254  | 255  |  used ring
>> +----+----+---+-------------+------+           
>>                     |                          
>>                     +    
>> The avail ring is initialized with fixed descriptors and is never
>> changed, i.e., the index value of the nth avail ring entry is always
>> n. This means the virtio PMD actually refills only the desc ring,
>> without having to touch the avail ring at all.
>> When vhost fetches the avail ring, unless it has been evicted, it is
>> always in vhost's first-level cache.
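>>
>> A sketch of that one-time setup and the resulting refill loop (same
>> assumptions as above; illustrative only):
>>
>>     /* init: avail entry n permanently holds desc index n */
>>     for (i = 0; i < 256; i++)
>>         avail->ring[i] = i;
>>
>>     /* per-refill: only the desc ring is rewritten, so the avail
>>      * ring cache line is never dirtied again */
>>     for (i = 0; i < nb_refill; i++) {
>>         uint16_t slot = (avail->idx + i) & 255;
>>         desc[slot].addr  = buf_addr[i];
>>         desc[slot].len   = buf_len;
>>         desc[slot].flags = VRING_DESC_F_WRITE;
>>     }
>>     avail->idx += nb_refill;  /* idx advances, ring[] is untouched */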
>>
>> When RX receives packets from the used ring, we use used->idx as the
>> desc idx. This requires that vhost processes descriptors from the
>> avail ring and returns them to the used ring in order, which is true
>> for both the current DPDK vhost and the kernel vhost implementations.
>> In my understanding, there is no need for vhost-net to process
>> descriptors out of order. One case could be zero copy: for example,
>> if one descriptor doesn't meet the zero-copy requirement, we could
>> return it to the used ring directly, earlier than the descriptors in
>> front of it.
>> To enforce in-order processing, I want to use a reserved bit to
>> indicate it.
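>>
>> With in-order completion, the RX dequeue sketch becomes (used_cons is
>> the PMD's running consumed counter and deliver_mbuf a hypothetical
>> helper; struct vring_used is again from <linux/virtio_ring.h>):
>>
>>     uint16_t nb = used->idx - used_cons;  /* descs vhost returned */
>>
>>     for (i = 0; i < nb; i++) {
>>         /* in order, so the desc idx is implied by the counter and
>>          * we never need to read used->ring[].id */
>>         uint16_t desc_idx = (used_cons + i) & 255;
>>         deliver_mbuf(desc_idx, used->ring[desc_idx].len);
>>     }
>>     used_cons += nb;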
>>
>> For the TX ring, the arrangement is as below. Each transmitted mbuf
>> needs a desc for the virtio_net_hdr, so actually we have only 128
>> free slots.
>>
>>                            ++
>>                            ||
>>                            ||
>>    +-----+-----+-----+--------------+------+------+------+
>>    |  0  |  1  | ... |  127 || 128  | 129  | ...  | 255  |   avail ring with fixed descriptor
>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>       |     |            |  ||  |      |            |
>>       v     v            v  ||  v      v            v
>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    | 128 | 129 | ... |  255 || 128  | 129  | ...  | 255  |   desc ring for virtio_net_hdr
>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>       |     |            |  ||  |      |            |
>>       v     v            v  ||  v      v            v
>>    +--+--+--+--+-----+---+------+---+--+---+------+--+---+
>>    |  0  |  1  | ... |  127 ||  0   |  1   | ...  | 127  |   desc ring for tx data
>>    +-----+-----+-----+--------------+------+------+------+
>>
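>> A sketch of that fixed TX chaining (one-time init plus per-packet
>> enqueue; hdr_paddr() and the other names are illustrative, and the
>> structs and flags are from the standard virtio headers):
>>
>>     /* init: every avail entry points at a header desc, and header
>>      * desc 128+i is permanently chained to data desc i */
>>     for (i = 0; i < 256; i++)
>>         avail->ring[i] = 128 + (i & 127);
>>
>>     for (i = 0; i < 128; i++) {
>>         desc[128 + i].addr  = hdr_paddr(i);  /* virtio_net_hdr buf */
>>         desc[128 + i].len   = sizeof(struct virtio_net_hdr);
>>         desc[128 + i].flags = VRING_DESC_F_NEXT;
>>         desc[128 + i].next  = i;
>>     }
>>
>>     /* per-packet: only data desc 'slot' (0..127) is rewritten */
>>     desc[slot].addr  = pkt_addr;
>>     desc[slot].len   = pkt_len;
>>     desc[slot].flags = 0;
>>     avail->idx++;
>>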
> Does this still work with a Linux (or BSD) guest/host?
> If you are assuming both virtio and vhost are DPDK, this is never
> going to be usable.
It works with both the DPDK vhost and kernel vhost implementations.
But to enforce this, we had better add a new feature bit.
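
A hypothetical sketch of how such a bit could be negotiated (the name
and bit position are made up for illustration; this is not an existing
flag):

    /* host acks this only if it returns descriptors in order */
    #define VIRTIO_F_DESC_IN_ORDER  (1ULL << 35)

    if (negotiated_features & VIRTIO_F_DESC_IN_ORDER)
        vq_use_fixed_avail_ring(vq);  /* hypothetical fast-path switch */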
>
> On a related note, have you looked at getting virtio to support the
> new standard (not legacy) mode?
Yes, we have added virtio 1.0 support to our plan.


