Live migration with dpdk (in host) + vhost-user + dpdk (in guest) fails when there are packets flowing through the VM.

This is a regression: v19.08-rc2 is the first version to hit this problem.

commit 07efd6ddc0499688eb11ae4866d3532295d6db2b (tag: v19.05, origin/releases) works well
commit cc091931dc05212db32ddbd7da3031104ca4963f (tag: v19.08-rc1) works well
commit 83a124fb73c50b051ee20ef6b1998c81be7e65df (tag: v19.08-rc2) fails
commit 0710d87b7f5d0a2cd01861d44c4689efd4714b5f (tag: v19.08-rc4) fails

Steps:
1. Boot testpmd on the src and dest hosts, refer to [1]
2. Start the VM, refer to [2]
3. Start testpmd in the guest and start MoonGen on another host; the guest can receive packets, refer to [3]
4. Migrate the guest from the src host to the dest host; this fails:

# /bin/virsh migrate --verbose --persistent --live rhel8.0 qemu+ssh://192.168.1.2/system
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor:
2019-08-08T04:53:36.910570Z qemu-system-x86_64: -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser0.sock,server
2019-08-08T04:53:37.158587Z qemu-system-x86_64: -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser1.sock,server
2019-08-08T04:53:52.371518Z qemu-system-x86_64: VQ 0 size 0x400 < last_avail_idx 0x4d6b - used_idx 0x3ffa
2019-08-08T04:53:52.371633Z qemu-system-x86_64: Failed to load virtio-net:virtio
2019-08-08T04:53:52.371639Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0:00.0/virtio-net'
2019-08-08T04:53:52.373273Z qemu-system-x86_64: load of migration failed: Operation not permitted

[1]
# /home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
  -l 2,4,6,8,10 \
  --socket-mem 1024,1024 \
  -n 4 \
  -d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
  --vdev net_vhost0,iface=/tmp/vhostuser0.sock,queues=1,client=1,iommu-support=1 \
  --vdev net_vhost1,iface=/tmp/vhostuser1.sock,queues=1,client=1,iommu-support=1 \
  -- \
  --portmask=f \
  -i \
  --rxd=512 --txd=512 \
  --rxq=1 --txq=1 \
  --nb-cores=4 \
  --forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start

[2]
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:02'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:03'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>

[3]
# /home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
  -l 1,2,3 \
  -n 4 \
  -d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_virtio.so \
  -w 0000:06:00.0 -w 0000:07:00.0 \
  -- \
  --nb-cores=2 \
  -i \
  --disable-rss \
  --rxd=512 --txd=512 \
  --rxq=1 --txq=1

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1381992    RX-missed: 0         RX-bytes:  82919520
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 1177961    TX-errors: 0         TX-bytes:  70677660

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 1178923    RX-missed: 0         RX-bytes:  70735380
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 1381077    TX-errors: 0         TX-bytes:  82864620

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################
testpmd>
Maxime, can you please take a look? Thanks
I have been able to reproduce this issue and bisected it to the following commit:

commit bbe29a9bd7ab6feab9a52051c32092a94ee886eb
Author: Jerin Jacob <jerinj@marvell.com>
Date:   Mon Jul 22 14:56:53 2019 +0200

    eal/linux: select IOVA as VA mode for default case

    When bus layer reports the preferred mode as RTE_IOVA_DC then select
    the RTE_IOVA_VA mode:
    - All drivers work in RTE_IOVA_VA mode, irrespective of physical
      address availability.
    - By default, a mempool asks for IOVA-contiguous memory using
      RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
      may affect the application boot time.

    Signed-off-by: Jerin Jacob <jerinj@marvell.com>
    Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
    Signed-off-by: David Marchand <david.marchand@redhat.com>

This commit only changes the default IOVA mode from IOVA_PA to IOVA_VA, so it is just revealing an underlying problem. I confirmed this by verifying that the affected version (19.08) works fine and stable with "--iova-mode pa", and that 18.11 fails in the same manner if "--iova-mode va" is used.

On the QEMU side, the code that detects the error is:

        vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
                                       vdev->vq[i].used_idx);
        if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
            error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
                         "used_idx 0x%x",
                         i, vdev->vq[i].vring.num,
                         vdev->vq[i].last_avail_idx, vdev->vq[i].used_idx);
            return -1;
        }

One of the times I reproduced it, I looked at the index values on the sending QEMU just before it sent the vmstates:

    size 0x100 | last_avail_idx 0x3aa0 | used_idx 0x3aa0

And just after loading the vmstates on the receiving QEMU:

    VQ 0 size 0x100 < last_avail_idx 0x3aa0 - used_idx 0xbda0

At first I suspected an endianness issue, but then confirmed that virtio_lduw_phys_cached handles it properly. So it might be that the memory caches don't get properly synchronized before the migration takes place.
Further research confirms that the vhost backend is not logging the dirty pages properly when IOVA_VA is used. This makes QEMU send outdated pages to the destination, which causes the failure.