Bug 337 - Live migration with dpdk(in host)+vhost-user+dpdk(in guest) fails: Failed to load virtio-net:virtio
Summary: Live migration with dpdk(in host)+vhost-user+dpdk(in guest) fails: Failed to load virtio-net:virtio
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: vhost/virtio
Version: 19.08
Hardware: All
OS: All
Importance: Normal (priority), normal (severity)
Target Milestone: ---
Assignee: Maxime Coquelin
URL:
Depends on:
Blocks:
 
Reported: 2019-08-12 05:10 CEST by Pei Zhang
Modified: 2019-09-17 09:44 CEST
CC List: 2 users



Attachments

Description Pei Zhang 2019-08-12 05:10:17 CEST
Live migration with dpdk(in host)+vhost-user+dpdk(in guest) fails when there are packets flowing through the VM.


This is a regression: v19.08-rc2 is the first version that hits this problem.

commit 07efd6ddc0499688eb11ae4866d3532295d6db2b (tag: v19.05, origin/releases)  works well

commit cc091931dc05212db32ddbd7da3031104ca4963f (tag: v19.08-rc1)               works well

commit 83a124fb73c50b051ee20ef6b1998c81be7e65df (tag: v19.08-rc2)               fails

commit 0710d87b7f5d0a2cd01861d44c4689efd4714b5f (tag: v19.08-rc4)               fails


Steps:

1. Boot testpmd on the source and destination hosts, refer to [1]

2. Start the VM on the source host; the vhost-user interface configuration is shown in [2]

3. Start testpmd in the guest and start MoonGen on another host; the guest can receive packets, refer to [3]

4. Migrate the guest from the source host to the destination host; the migration fails:

# /bin/virsh migrate --verbose --persistent --live rhel8.0 qemu+ssh://192.168.1.2/system
Migration: [100 %]error: internal error: qemu unexpectedly closed the monitor: 2019-08-08T04:53:36.910570Z qemu-system-x86_64: -chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser0.sock,server
2019-08-08T04:53:37.158587Z qemu-system-x86_64: -chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server: info: QEMU waiting for connection on: disconnected:unix:/tmp/vhostuser1.sock,server
2019-08-08T04:53:52.371518Z qemu-system-x86_64: VQ 0 size 0x400 < last_avail_idx 0x4d6b - used_idx 0x3ffa
2019-08-08T04:53:52.371633Z qemu-system-x86_64: Failed to load virtio-net:virtio
2019-08-08T04:53:52.371639Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:06.0:00.0/virtio-net'
2019-08-08T04:53:52.373273Z qemu-system-x86_64: load of migration failed: Operation not permitted
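
(0x4d6b - 0x3ffa = 0xd71 descriptors in flight, which exceeds the ring size of 0x400; this is the consistency check that makes QEMU refuse to load the virtio-net state.)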


[1]
# /home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 2,4,6,8,10 \
--socket-mem 1024,1024 \
-n 4 \
-d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_vhost.so \
--vdev net_vhost0,iface=/tmp/vhostuser0.sock,queues=1,client=1,iommu-support=1 \
--vdev net_vhost1,iface=/tmp/vhostuser1.sock,queues=1,client=1,iommu-support=1 \
-- \
--portmask=f \
-i \
--rxd=512 --txd=512 \
--rxq=1 --txq=1 \
--nb-cores=4 \
--forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start
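
(With --forward-mode=io and this port list, consecutive pairs of ports are cross-connected, i.e. (0,2) and (1,3); assuming the two physical NICs probe as ports 0 and 1 and the vhost-user vdevs as ports 2 and 3, traffic from MoonGen is relayed into the guest through the vhost-user sockets and back.)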

[2]
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:02'/>
      <source type='unix' path='/tmp/vhostuser0.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:5f:dd:03'/>
      <source type='unix' path='/tmp/vhostuser1.sock' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' rx_queue_size='1024' iommu='on' ats='on'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
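
(Both interfaces enable the virtio IOMMU feature, iommu='on' with ats='on', matching iommu-support=1 on the vhost vdevs in [1]; as a result the guest driver fills the rings with I/O virtual addresses rather than plain guest-physical addresses, which is what makes the IOVA handling discussed in comments 2 and 3 below relevant.)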


[3]
/home/dpdk-latest/x86_64-native-linuxapp-gcc/build/app/test-pmd/testpmd \
-l 1,2,3 \
-n 4 \
-d /home/dpdk-latest/x86_64-native-linuxapp-gcc/lib/librte_pmd_virtio.so \
-w 0000:06:00.0 -w 0000:07:00.0 \
-- \
--nb-cores=2 \
-i \
--disable-rss \
--rxd=512 --txd=512 \
--rxq=1 --txq=1

testpmd> show port stats all 

  ######################## NIC statistics for port 0  ########################
  RX-packets: 1381992    RX-missed: 0          RX-bytes:  82919520
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1177961    TX-errors: 0          TX-bytes:  70677660

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 1178923    RX-missed: 0          RX-bytes:  70735380
  RX-errors: 0
  RX-nombuf:  0         
  TX-packets: 1381077    TX-errors: 0          TX-bytes:  82864620

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################
testpmd>
Comment 1 Ajit Khaparde 2019-08-15 02:51:18 CEST
Maxime, can you please take a look? Thanks
Comment 2 Adrian Moreno 2019-09-17 09:41:51 CEST
I have been able to reproduce this issue and bisected it to the following commit:

commit bbe29a9bd7ab6feab9a52051c32092a94ee886eb
Author: Jerin Jacob <jerinj@marvell.com>
Date:   Mon Jul 22 14:56:53 2019 +0200

    eal/linux: select IOVA as VA mode for default case
    
    When bus layer reports the preferred mode as RTE_IOVA_DC then
    select the RTE_IOVA_VA mode:
    
    - All drivers work in RTE_IOVA_VA mode, irrespective of physical
    address availability.
    
    - By default, a mempool asks for IOVA-contiguous memory using
    RTE_MEMZONE_IOVA_CONTIG. This is slow in RTE_IOVA_PA mode and it
    may affect the application boot time.
    
    Signed-off-by: Jerin Jacob <jerinj@marvell.com>
    Acked-by: Anatoly Burakov <anatoly.burakov@intel.com>
    Signed-off-by: David Marchand <david.marchand@redhat.com>

This commit only changes the default IOVA mode from IOVA_PA to IOVA_VA, so it merely reveals an underlying problem.
I confirmed this by verifying that the affected version (19.08) works fine with "--iova-mode pa", and that the stable version (18.11) fails in the same manner if "--iova-mode va" is used.
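
(For reference, "--iova-mode pa" is an EAL option, so it is added before the "--" separator of the testpmd command lines shown in [1]/[3].)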

On the QEMU side, the code that detects the error is:

            vdev->vq[i].inuse = (uint16_t)(vdev->vq[i].last_avail_idx -
                                           vdev->vq[i].used_idx);
            if (vdev->vq[i].inuse > vdev->vq[i].vring.num) {
                error_report("VQ %d size 0x%x < last_avail_idx 0x%x - "
                             "used_idx 0x%x",
                             i, vdev->vq[i].vring.num,
                             vdev->vq[i].last_avail_idx,
                             vdev->vq[i].used_idx);
                return -1;
            }
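
(The cast to uint16_t makes the subtraction wrap modulo 2^16: inuse is the number of descriptors the guest has made available but the device has not yet marked used, and it can never legitimately exceed the ring size vring.num.)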

One of the times I reproduced it, I looked at the index values on the sending QEMU just before the vmstate was sent:
size 0x100 | last_avail_idx 0x3aa0 | used_idx 0x3aa0
And just after loading the vmstate on the receiving QEMU:
VQ 0 size 0x100 < last_avail_idx 0x3aa0 - used_idx 0xbda0
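
(With 16-bit wrap-around, 0x3aa0 - 0xbda0 = 0x7d00, far larger than the ring size 0x100. Since used_idx is read back from the used ring in the migrated guest memory while last_avail_idx comes from the device vmstate, the mismatch between the destination value 0xbda0 and the 0x3aa0 seen on the source is consistent with the used-ring page arriving outdated, as comment 3 confirms.)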

At first I suspected an endianness issue, but then confirmed that virtio_lduw_phys_cached handles it properly.

So, it might be that the memory caches don't get properly synchronized before the migration takes place.
Comment 3 Adrian Moreno 2019-09-17 09:44:08 CEST
Further research confirms that the vhost backend is not logging the dirty pages properly when IOVA_VA is used.
This makes QEMU send outdated pages to the destination, which causes the failure.
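
To illustrate the failure mode (a conceptual sketch only, with made-up helper names; this is not the actual rte_vhost code): the dirty-log bitmap QEMU hands to the backend is indexed by guest-physical address, so logging a write with the address taken from the descriptor only works while that address happens to equal the GPA. Once the ring carries I/O virtual addresses (e.g. guest virtual addresses when the guest DPDK runs in IOVA_VA mode), the wrong bits are set and QEMU never re-sends the pages that were really dirtied.

#include <stdint.h>

#define LOG_PAGE_SIZE 4096

/* Dirty-log bitmap set up via VHOST_USER_SET_LOG_BASE; indexed by GPA. */
static void log_dirty_page(uint8_t *log_base, uint64_t gpa)
{
    uint64_t page = gpa / LOG_PAGE_SIZE;
    __atomic_fetch_or(&log_base[page / 8], (uint8_t)(1u << (page % 8)),
                      __ATOMIC_RELAXED);
}

/* Behaviour described above: log the descriptor address directly.  This
 * works as long as that address equals the GPA, but not when the guest
 * driver fills the ring with IOVAs. */
static void log_write_buggy(uint8_t *log_base, uint64_t desc_addr, uint64_t len)
{
    for (uint64_t off = 0; off < len; off += LOG_PAGE_SIZE)
        log_dirty_page(log_base, desc_addr + off);  /* wrong page if addr != GPA */
}

/* What is needed instead: translate the descriptor address back to a
 * guest-physical address (e.g. via the IOTLB/memory-table mappings the
 * frontend provided) before setting the bit. */
static void log_write_translated(uint8_t *log_base, uint64_t desc_addr,
                                 uint64_t len,
                                 uint64_t (*addr_to_gpa)(uint64_t))
{
    for (uint64_t off = 0; off < len; off += LOG_PAGE_SIZE)
        log_dirty_page(log_base, addr_to_gpa(desc_addr + off));
}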
