[dpdk-dev] vhost: virtio-net rx-ring stop work after work many hours, bug?

Xie, Huawei huawei.xie at intel.com
Wed Jan 28 10:51:07 CET 2015



> -----Original Message-----
> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
> Sent: Tuesday, January 27, 2015 3:57 PM
> To: dev at dpdk.org; Michael S. Tsirkin
> Cc: lilijun; liuyongan at huawei.com; Xie, Huawei
> Subject: vhost: virtio-net rx-ring stop work after work many hours,bug?
> 
> Hi,all
> 
> I use vhost-user to send data to a VM. At first it works well, but after many
> hours the VM can no longer receive data, although it can still send data.
> 
> (gdb)p avail_idx
> $4 = 2668
> (gdb)p free_entries
> $5 = 0
> (gdb)l
>         /* check that we have enough buffers */
>         if (unlikely(count > free_entries))
>             count = free_entries;
> 
>         if (count == 0){
>             int b=0;
>             if(b) { // when set b=1 to notify guest rx_ring will restart to work
>                 if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT)) {
> 
>                     eventfd_write(vq->callfd, 1);
>                 }
>             }
>             return 0;
>         }
> 
> Some info I printed in the guest:
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1644, avail->idx=1644
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1645, avail->idx=1645
> 
> net eth3:vi->num=199
> net eth3:rvq info: num_free=57, used->idx=2668, avail->idx=2668
> net eth3:svq info: num_free=254, used->idx=1646, avail->idx=1646
> 
> # free
>              total       used       free     shared    buffers     cached
> Mem:      3924100      337252    3586848          0      95984     138060
> -/+ buffers/cache:     103208    3820892
> Swap:       970748          0     970748
> 
> I have two questions:
> 1. Do we need to notify the guest when there is no buffer in vq->avail?
> 2. Why does virtio_net stop filling avail?
> 
>

Haifeng:
Thanks for reporting this issue.
It might not be vhost-user specific, because as long as vhost-user has received all the vring information correctly, it shares the same packet receive/transmit code with vhost-cuse.
Are you using the latest patch or the old patch?
1. Did you disable mergeable feature support in the vhost example? There is a bug in vhost-user feature negotiation which is fixed in the latest patch. It could cause the guest to receive no packets at all. So if you are testing only with the Linux net device, this isn't the cause.
2. Is the failing setup still available? Could you check whether there are available descriptors by inspecting the desc ring, or even dump the vring status? Could you also check the notify_on_empty flag Michael mentioned? I found a bug in the vhost library when processing three or more chained descriptors. But if you never re-configure eth0 with different features, this isn't the cause.
3. Is this reproducible? Next time you run a long stability test, could you try disabling the guest virtio features?
-device virtio-net-pci,netdev=mynet0,mac=54:00:00:54:00:01,csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off
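
To make the "dump the vring status" request in item 2 concrete, here is a minimal sketch of the values worth capturing on the host side. The names ring_snapshot, ring_free_entries and ring_dump are hypothetical and not part of the vhost library; last_used_idx is assumed to be the host's private count of avail entries it has already consumed. The indices are free-running 16-bit counters, so the subtraction has to wrap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of one virtqueue; not a DPDK structure. */
struct ring_snapshot {
    uint16_t avail_idx;     /* avail->idx, written by the guest */
    uint16_t used_idx;      /* used->idx, written by the host */
    uint16_t last_used_idx; /* assumed: avail entries already consumed by the host */
    uint16_t avail_flags;   /* avail->flags, e.g. VRING_AVAIL_F_NO_INTERRUPT */
};

/* Buffers the guest has offered that the host has not consumed yet.
 * Both indices wrap at 65536, so the difference is taken modulo 2^16. */
static uint16_t ring_free_entries(const struct ring_snapshot *s)
{
    assert(s != NULL);
    return (uint16_t)(s->avail_idx - s->last_used_idx);
}

/* Format the snapshot into buf; returns what snprintf returns. */
static int ring_dump(const struct ring_snapshot *s, char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    return snprintf(buf, len,
                    "avail->idx=%u used->idx=%u last_used_idx=%u "
                    "free_entries=%u avail->flags=0x%x",
                    (unsigned)s->avail_idx, (unsigned)s->used_idx,
                    (unsigned)s->last_used_idx,
                    (unsigned)ring_free_entries(s),
                    (unsigned)s->avail_flags);
}
```

With the values from the report (avail->idx and the consumed index both at 2668), ring_free_entries returns 0, which matches the free_entries seen in gdb: the host has used every buffer the guest offered, and the guest has not posted new ones.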

I have run nightly tests lasting more than ten hours many times before, and haven't hit this issue.
We will check (1) whether there is an issue in the vhost code that delivers interrupts to the guest which could cause a deadlock, and (2) whether there are places where we should deliver an interrupt to the guest but fail to.
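
As a rough illustration of the interrupt rule we would be checking against, the sketch below follows the legacy virtio behaviour: the guest may suppress interrupts with VRING_AVAIL_F_NO_INTERRUPT, but if VIRTIO_F_NOTIFY_ON_EMPTY was negotiated the host should still interrupt once it runs out of available buffers. should_notify_guest is a hypothetical helper, not the current vhost code; the two constants are defined locally with the values used by the Linux virtio headers so that the example is self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in the Linux virtio headers, repeated here for self-containment. */
#define VRING_AVAIL_F_NO_INTERRUPT 1   /* avail->flags bit: guest asks for no interrupt */
#define VIRTIO_F_NOTIFY_ON_EMPTY   24  /* feature bit number */

/* Hypothetical helper: decide whether the host should kick the guest
 * (e.g. via eventfd_write(vq->callfd, 1)) after touching the ring.
 *   features     - negotiated feature bits
 *   avail_flags  - current avail->flags
 *   free_entries - available buffers not yet consumed by the host */
static bool should_notify_guest(uint64_t features, uint16_t avail_flags,
                                uint16_t free_entries)
{
    /* Ring is exhausted and the guest asked to be told about it:
     * notify even if interrupts are otherwise suppressed. */
    if ((features & (1ULL << VIRTIO_F_NOTIFY_ON_EMPTY)) && free_entries == 0)
        return true;

    /* Otherwise honour the guest's suppression flag. */
    return !(avail_flags & VRING_AVAIL_F_NO_INTERRUPT);
}
```

In the reported state (free_entries == 0), this helper returns true only if notify-on-empty was negotiated or the guest has not set the suppression flag, which is why both values are worth checking on the stuck queue.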

> 
> 
> 
> 
> --
> Regards,
> Haifeng


