Bug 337

Summary: Live migration with dpdk(in host)+vhost-user+dpdk(in guest) fails: Failed to load virtio-net:virtio
Product: DPDK
Reporter: Pei Zhang (pezhang)
Component: vhost/virtio
Assignee: Maxime Coquelin (maxime.coquelin)
Status: CONFIRMED
Severity: normal
Priority: Normal
CC: ajit.khaparde, amorenoz
Version: 19.08
Target Milestone: ---
Hardware: All
OS: All

Description Pei Zhang 2019-08-12 05:10:17 CEST
Live migration with dpdk(in host)+vhost-user+dpdk(in guest) fails when there are packets flowing through the VM.


This is a regression: v19.08-rc2 is the first version to hit this problem.

commit 07efd6ddc0499688eb11ae4866d3532295d6db2b (tag: v19.05, origin/releases)  works well

commit cc091931dc05212db32ddbd7da3031104ca4963f (tag: v19.08-rc1)               works well

commit 83a124fb73c50b051ee20ef6b1998c81be7e65df (tag: v19.08-rc2)               fails

commit 0710d87b7f5d0a2cd01861d44c4689efd4714b5f (tag: v19.08-rc4)               fails
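
Given the range above, the offending commit can be narrowed down with a standard bisect between the last good and the first bad tag, e.g.:

# git bisect start
# git bisect bad v19.08-rc2
# git bisect good v19.08-rc1

then rebuild, rerun the migration test, and mark each step good or bad until git reports the first bad commit.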


Steps:

1. Boot testpmd on the source and destination hosts, refer to [1]

2. Start the VM on the source host, refer to [2]

3. Start testpmd in the guest and start MoonGen on another host; the guest can receive packets, refer to [3]

4. Migrate the guest from the source to the destination host; it fails.

# /bin/virsh migrate --verbose --persistent --live rhel8.0 qemu+ssh://192.168.1.2/system
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2019-08-08T04:53:36.910570Z qemu-system-x86_64: -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser0.sock,server
2019-08-08T04:53:37.158587Z qemu-system-x86_64: -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser1.sock,server
2019-08-08T04:53:52.371518Z qemu-system-x86_64: VQ 0 size 0x400 < last_avail_idx 0x4d6b - used_idx 0x3ffa
2019-08-08T04:53:52.371633Z qemu-system-x86_64: Failed to load virtio-net:virtio
2019-08-08T04:53:52.371639Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0:00.0/virtio-net'
2019-08-08T04:53:52.373273Z qemu-system-x86_64: load of migration failed: Operation not permitted
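
The error line already shows what goes wrong arithmetically: the destination computes last_avail_idx - used_idx = 0x4d6b - 0x3ffa = 0xd71 (3441) descriptors supposedly in flight, which is larger than the ring size 0x400 (1024), so QEMU refuses to load the device state (see comment 2 for the exact check).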


[1]
# /home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 2,4,6,8,10 \
--socket-mem 1024,1024 \
-n 4 \
-d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
--vdev net_vhost0,iface=/tmp/vhostuser0.sock,queues=1,client=1,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhostuser1.sock,queues=1,client=1,iommu-support=1 \
-- \
--portmask=f \
-i \
--rxd=512 --txd=512 \
--rxq=1 --txq=1 \
--nb-cores=4 \
--forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start
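
(In io forwarding mode, "set portlist 0,2,1,3" pairs consecutive ports in the list, so traffic is forwarded between ports 0 and 2 and between ports 1 and 3; in this setup that wires each physical NIC to a vhost-user port.)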

[2]
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:02'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:03'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
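
(The iommu='on' and ats='on' driver attributes put the virtio devices behind the guest vIOMMU, matching the iommu-support=1 option on the vhost-user ports in [1]; the guest is assumed to have a vIOMMU device configured, not shown here.)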


[3]
/home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 1,2,3 \
-n 4 \
-d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_virtio.so \
-w 0000:06:00.0 -w 0000:07:00.0 \
-- \
--nb-cores=2 \
-i \
--disable-rss \
--rxd=512 --txd=512 \
--rxq=1 --txq=1

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1381992    RX-missed: 0          RX-bytes:  82919520
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1177961    TX-errors: 0          TX-bytes:  70677660

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 1178923    RX-missed: 0          RX-bytes:  70735380
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1381077    TX-errors: 0          TX-bytes:  82864620

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################
testpmd>
Comment 1 Ajit Khaparde 2019-08-15 02:51:18 CEST
Maxime, can you please take a look? Thanks
Comment 2 Adrian Moreno 2019-09-17 09:41:51 CEST
I have been able to reproduce this issue and bisected it to the following commit:

commit bbe29a9bd7ab6feab9a52051c32092a94ee886eb
Author: Jerin Jacob <jerinj@marvell.com>
Date:   Mon Jul 22 14:56:53 2019 +0200

    eal/linux: select IOVA as VA mode for default case
    
    When bus layer reports the preferred mode as RTE_IOVA_DC then
    select the RTE_IOVA_VA mode:
    
    - All drivers work in RTE_IOVA_VA mode, irrespective of physical
    address availability.
    
    - By default, a mempool asks for IOVA-contiguous memory using
    RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
    may affect the application boot time.
    
    Signed-off-by: Jerin Jacob <jerinj@marvell.com>
    Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
    Signed-off-by: David Marchand <david.marchand@redhat.com>

This commit only changes the default IOVA mode from IOVA_PA to IOVA_VA, so it is just revealing an underlying problem.
Confirmed this by verifying that the affected version (19.08) works fine and stable with "--iova-mode pa", and that the stable version (18.11) fails in the same manner if "--iova-mode va" is used.
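
For example, the host testpmd command from [1] becomes stable again with the mode forced in the EAL options:

# /home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
--iova-mode pa \
... (remaining options as in [1])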

Looking at QEMU, the code that detects the error is:

            vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
                                vdev->vq[i].used_idx);
            if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
                error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
                             "used_idx 0x%x",
                             i, vdev->vq[i].vring.num,
                             vdev->vq[i].last_avail_idx,
                             vdev->vq[i].used_idx);
                return -1;
            }

On one of the occasions I reproduced it, I looked at the index values on the sending QEMU just before the vmstate was sent:
size 0x100 | last_avail_idx 0x3aa0 | used_idx 0x3aa0
And just after loading the vmstate at the receiving QEMU:
VQ 0 size 0x100 < last_avail_idx 0x3aa0 - used_idx 0xbda0
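
Note that the subtraction in the check wraps: (uint16_t)(0x3aa0 - 0xbda0) = 0x7d00, i.e. 32000 descriptors supposedly in use, far above the ring size 0x100 (256), so the check fires.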

At first I suspected an endianness issue, but then confirmed that virtio_lduw_phys_cached handles it properly.

So, it might be that the memory caches don't get properly synchronized before the migration takes place.
Comment 3 Adrian Moreno 2019-09-17 09:44:08 CEST
Further research confirms that the vhost backend is not logging the dirty pages properly when IOVA_VA is used.
This makes QEMU send outdated pages to the destination, which causes the failure.
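
To illustrate the failure mode, here is a minimal sketch (assumed names, not the actual DPDK code) of a vhost dirty-logging path: the migration log QEMU consumes is a bitmap indexed by guest-physical page, so if the backend logs the IOVA taken from a descriptor instead of translating it first, the wrong bits get set:

    #include <stdint.h>

    #define VHOST_LOG_PAGE 4096

    /* Mark one guest-physical page dirty in the log bitmap shared with QEMU. */
    static void log_page_dirty(uint8_t *log_base, uint64_t gpa)
    {
        uint64_t page = gpa / VHOST_LOG_PAGE;

        __sync_fetch_and_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)));
    }

    /* Hypothetical IOVA-to-guest-physical translation (in IOVA_VA mode with a
     * vIOMMU, descriptor addresses are IOVAs, not guest-physical addresses). */
    uint64_t iova_to_gpa(uint64_t iova);

    /* Log a guest write of 'len' bytes starting at a descriptor address. */
    static void log_write(uint8_t *log_base, uint64_t desc_addr, uint64_t len)
    {
        uint64_t gpa = iova_to_gpa(desc_addr); /* the step that must not be skipped */
        uint64_t off;

        for (off = 0; off < len; off += VHOST_LOG_PAGE)
            log_page_dirty(log_base, gpa + off);
    }

If log_write() is instead called with the raw desc_addr, QEMU never re-sends the pages that actually changed, which is consistent with the stale used_idx observed above.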