[dpdk-dev] virtio-pci and vdev=eth_pcap combined lead to a hang on virtnet_set_rx_mode (lcore-slave)

Christian Ehrhardt christian.ehrhardt at canonical.com
Fri Mar 18 09:36:03 CET 2016


Hi,
TL;DR
When using --vdev=eth_pcap0 on testpmd "quit" the guest runs into a blocked
cpu on virtnet_set_rx_mode (lcore-slave).
Using these devices actually appers to be fine, but on the release path
(testpmd quit) the issue triggers.
Maybe virtio-pci and vdev=eth_pcap are just incompatible, but then there
could be more graceful exits :-)


Background:
I was "reusing" a testpmd commandline I had used on a physical system on a
KVM guest.
Doing so caused weird hangs.
Unfortunately I didn't have the time to debug it down to a fix, but along
the way of checking what actually was going on I collected quite a few logs
that I wanted to share.
If anybody recognizes (or even debugs and fixes) it right away great, if
not it will at least be "google'able" by the next with a similar issue.


Steps to reproduce:
Failing Commandline:
sudo /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048
--vdev=eth_pcap0,iface=eth1 --vdev=eth_pcap1,iface=eth2 -- --interactive
--total-num-mbufs=2048

Note: Working Commandline is the same just without vdev's
sudo /usr/bin/testpmd --pci-blacklist 0000:00:03.0 --socket-mem 2048 --
--interactive --total-num-mbufs=2048

The blacklist on 03.0 is the device that I use to run ssh into the guest.


Setup:
(The setup of the physical system is equivalent, just ixgbe cards instead
of virtio)
- DPDK 2.2 / Qemu 2.5
- guest has 3 virtio network devices
- one to actually connect to it via ssh
- two to be dedicated for dpdk experiments
- I kept them at the virtio-pci driver, dpdk_nic_bind status:
  Network devices using kernel driver
  ===================================
  0000:00:03.0 'Virtio network device' if= drv=virtio-pci unused=
  0000:00:04.0 'Virtio network device' if= drv=virtio-pci unused=
  0000:00:05.0 'Virtio network device' if= drv=virtio-pci unused=
- ~3G hugepages are allocated (in 1536 2M hugepages)
- 4 virtual CPUs in the guest
- virtio-net devices have 4 queues each, combined with ethtool
  ethtool -L eth1
  current values: tx 0 rx 0 other 0 combined 4


After start testpmd initialization is ok
- got into interactive shell of testpmd, all fine
- ping Host to guest main dev still ok after testpmd start, protected by
blacklist (expected)
- ping Host to guest dpdk used dev stops (expected)

Nothing else interesting on journal / console of host or guest
Checking config from testpmds interactive console also didn't show anything
obvious to me.


Run "start tx_first"
- still all fine; ping to main console and guest ssh connectivity ok.
- host saw something happening on one of the dpdk ports, but nothing
stopped any active conenction
  kernel: localbridge0: port 2(vnet1) neighbor 8000.52:54:00:0f:a3:5f lost
  kernel: localbridge0: topology change detected, propagating


But then "quit" in testpmd seems to upset something in the guest
>From the quit itselt I get a hanging:
testpmd> quit
Stopping port 0...done
Stopping port 1...done
Stopping port 2...done
Stopping port 3...done
bye...
(hangs here)

The guest console at the same time triggers a hang check
[ 1712.124021] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[lcore-slave-1:5653]


The guests journal list the soft lockup deatils as:
dpdk-guest1 kernel: device eth2 left promiscuous mode
dpdk-guest1 kernel: NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s!
[lcore-slave-1:5653]
dpdk-guest1 kernel: Modules linked in: xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp bridge stp llc
iptable_filter ip_tables x_tables isofs ppdev kvm_intel kvm irqbypass
crct10dif_pclmul crc32_pclmul joydev serio_raw parport_pc parport iscsi_tcp
libiscsi_tcp libiscsi scsi_transport_iscsi autofs4 btrfs raid10 raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
libcrc32c raid1 raid0 multipath linear aesni_intel psmouse aes_x86_64
glue_helper lrw gf128mul ablk_helper cryptd floppy
dpdk-guest1 kernel: CPU: 1 PID: 5653 Comm: lcore-slave-1 Not tainted
4.4.0-13-generic #29-Ubuntu
dpdk-guest1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
dpdk-guest1 kernel: task: ffff8801b7626040 ti: ffff8800ba944000 task.ti:
ffff8800ba944000
dpdk-guest1 kernel: RIP: 0010:[<ffffffff815eaa13>]  [<ffffffff815eaa13>]
virtnet_send_command+0xf3/0x150
dpdk-guest1 kernel: RSP: 0018:ffff8800ba947a20  EFLAGS: 00000246
dpdk-guest1 kernel: RAX: 0000000000000000 RBX: ffff8800bbb0e840 RCX:
ffff8801b658d000
dpdk-guest1 kernel: RDX: 000000000000c050 RSI: ffff8800ba947a24 RDI:
ffff8800bbaf0c00
dpdk-guest1 kernel: RBP: ffff8800ba947ab8 R08: 0000000000000004 R09:
ffff8801b9001b00
dpdk-guest1 kernel: R10: ffff8800ba676340 R11: ffff8801b76d1000 R12:
0000000000000002
dpdk-guest1 kernel: R13: ffff8800ba947a48 R14: 0000000000000000 R15:
0000000000000000
dpdk-guest1 kernel: FS:  00007f2a7a9fe700(0000) GS:ffff8801bfc80000(0000)
knlGS:0000000000000000
dpdk-guest1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
dpdk-guest1 kernel: CR2: 00007fed60775000 CR3: 0000000001e0a000 CR4:
00000000001406e0
dpdk-guest1 kernel: Stack:
dpdk-guest1 kernel:  0000000000000000 ffffea0002eec382 0000000200000940
0000000000000000
dpdk-guest1 kernel:  0000000000000000 ffffea0002eec382 0000000100000942
0000000000000000
dpdk-guest1 kernel:  0000000000000000 ffff8800ba947a28 ffff8800ba947ac8
ffff8800ba947a48
dpdk-guest1 kernel: Call Trace:
dpdk-guest1 kernel:  [<ffffffff815eaf25>] virtnet_set_rx_mode+0xb5/0x2f0
dpdk-guest1 kernel:  [<ffffffff8171625f>] __dev_set_rx_mode+0x3f/0x90
dpdk-guest1 kernel:  [<ffffffff817162d6>] dev_set_rx_mode+0x26/0x40
dpdk-guest1 kernel:  [<ffffffff817164fc>] dev_set_promiscuity+0x3c/0x50
dpdk-guest1 kernel:  [<ffffffff817fc517>] packet_dev_mc+0x77/0x90
dpdk-guest1 kernel:  [<ffffffff817fe227>] packet_release+0x147/0x360
dpdk-guest1 kernel:  [<ffffffff816f53ef>] sock_release+0x1f/0x80
dpdk-guest1 kernel:  [<ffffffff816f5462>] sock_close+0x12/0x20
dpdk-guest1 kernel:  [<ffffffff8120ccf4>] __fput+0xe4/0x220
dpdk-guest1 kernel:  [<ffffffff8120ce6e>] ____fput+0xe/0x10
dpdk-guest1 kernel:  [<ffffffff8109d803>] task_work_run+0x73/0x90
dpdk-guest1 kernel:  [<ffffffff81082a54>] do_exit+0x2e4/0xae0
dpdk-guest1 kernel:  [<ffffffff810832d3>] do_group_exit+0x43/0xb0
dpdk-guest1 kernel:  [<ffffffff8108f392>] get_signal+0x292/0x600
dpdk-guest1 kernel:  [<ffffffff81017517>] do_signal+0x37/0x6f0
dpdk-guest1 kernel:  [<ffffffff8100320c>] exit_to_usermode_loop+0x8c/0xd0
dpdk-guest1 kernel:  [<ffffffff81003c5e>] syscall_return_slowpath+0x4e/0x60
dpdk-guest1 kernel:  [<ffffffff8181ced0>] int_ret_from_sys_call+0x25/0x8f
dpdk-guest1 kernel: Code: 44 89 e2 4c 89 6c c5 b0 e8 9b de ec ff 48 8b 7b
08 e8 52 de ec ff 84 c0 75 11 eb 24 48 8b 7b 08 e8 33 d9 ec ff 84 c0 75 17
f3 90 <48> 8b 7b 08 48 8d b5 6c ff ff ff e8 ad e2 ec ff 48 85 c0 74 dc

Afterwards this hang continues to catch reocurring
- "NMI watchdog: BUG: soft lockup"
- "rcu_sched detected stalls on CPUs/tasks"

The backtrace matches the former message of "device eth2 left promiscuous
mode".
It seems something on the path of "giving back" the device that kills it.


### More Logs ###


# EAL Log so far after initialization: #
dpdk-guest1 testpmd[5650]: EAL: TSC frequency is ~2397227 KHz
dpdk-guest1 testpmd[5650]: EAL: WARNING: cpu flags constant_tsc=yes
nonstop_tsc=no -> using unreliable clock cycles !
dpdk-guest1 testpmd[5650]: EAL: Master lcore 0 is ready
(tid=dfc058c0;cpuset=[0])
dpdk-guest1 testpmd[5650]: PMD: Initializing pmd_pcap for eth_pcap0
dpdk-guest1 kernel: device eth1 entered promiscuous mode
dpdk-guest1 testpmd[5650]: PMD: Creating pcap-backed ethdev on numa socket 0
dpdk-guest1 testpmd[5650]: PMD: Initializing pmd_pcap for eth_pcap1
dpdk-guest1 kernel: device eth2 entered promiscuous mode
dpdk-guest1 testpmd[5650]: PMD: Creating pcap-backed ethdev on numa socket 0
dpdk-guest1 testpmd[5650]: EAL: lcore 3 is ready (tid=799fc700;cpuset=[3])
dpdk-guest1 testpmd[5650]: EAL: lcore 1 is ready (tid=7a9fe700;cpuset=[1])
dpdk-guest1 testpmd[5650]: EAL: lcore 2 is ready (tid=7a1fd700;cpuset=[2])
dpdk-guest1 testpmd[5650]: EAL: PCI device 0000:00:03.0 on NUMA socket -1
dpdk-guest1 testpmd[5650]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
dpdk-guest1 testpmd[5650]: EAL:   Device is blacklisted, not initializing
dpdk-guest1 testpmd[5650]: EAL: PCI device 0000:00:04.0 on NUMA socket -1
dpdk-guest1 testpmd[5650]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
dpdk-guest1 testpmd[5650]: EAL: PCI device 0000:00:05.0 on NUMA socket -1
dpdk-guest1 testpmd[5650]: EAL:   probe driver: 1af4:1000 rte_virtio_pmd
dpdk-guest1 testpmd[5650]: PMD: gntalloc: ioctl error

=> All fine, the gntalloc is no issue (avoided by the testpmd patch I
submitted a while ago)


# Config info from testpmd's console: #
testpmd> show config cores
List of forwarding lcores:  1  2  3
testpmd> show config fwd
io packet forwarding - ports=4 - cores=1 - streams=4 - NUMA support
disabled, MP over anonymous pages disabled
Logical Core 1 (socket 0) forwards packets on 4 streams:
  RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01
  RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
  RX P=2/Q=0 (socket 0) -> TX P=3/Q=0 (socket 0) peer=02:00:00:00:00:03
  RX P=3/Q=0 (socket 0) -> TX P=2/Q=0 (socket 0) peer=02:00:00:00:00:02

testpmd> show port info all

********************* Infos for port 0  *********************
MAC address: 00:00:00:01:02:03
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
Promiscuous mode: disabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 1
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip on
  filter on
  qinq(extend) off
No flow type is supported.
Max possible RX queues: 1
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Max possible TX queues: 1
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1

********************* Infos for port 1  *********************
MAC address: 00:00:00:01:02:03
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
Promiscuous mode: disabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 1
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip on
  filter on
  qinq(extend) off
No flow type is supported.
Max possible RX queues: 1
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Max possible TX queues: 1
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1

********************* Infos for port 2  *********************
MAC address: 52:54:00:1D:22:E1
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 64
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip on
  filter on
  qinq(extend) off
No flow type is supported.
Max possible RX queues: 4
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Max possible TX queues: 4
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1

********************* Infos for port 3  *********************
MAC address: 52:54:00:4C:B0:4A
Connect to socket: 0
memory allocation on the socket: 0
Link status: up
Link speed: 10000 Mbps
Link duplex: full-duplex
Promiscuous mode: enabled
Allmulticast mode: disabled
Maximum number of MAC addresses: 64
Maximum number of MAC addresses of hash filtering: 0
VLAN offload:
  strip on
  filter on
  qinq(extend) off
No flow type is supported.
Max possible RX queues: 4
Max possible number of RXDs per queue: 65535
Min possible number of RXDs per queue: 0
RXDs number alignment: 1
Max possible TX queues: 4
Max possible number of TXDs per queue: 65535
Min possible number of TXDs per queue: 0
TXDs number alignment: 1


# Running testpmd "start tx_first" #
testpmd> start tx_first
  io packet forwarding - CRC stripping disabled - packets/burst=32
  nb forwarding cores=1 - nb forwarding ports=4
  RX queues=1 - RX desc=128 - RX free threshold=0
  RX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX queues=1 - TX desc=512 - TX free threshold=0
  TX threshold registers: pthresh=0 hthresh=0 wthresh=0
  TX RS bit threshold=0 - TXQ flags=0x0
testpmd> stop
Telling cores to stop...
Waiting for lcores to finish...

  ---------------------- Forward statistics for port 0
 ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 32             TX-dropped: 0             TX-total: 32

----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 1
 ----------------------
  RX-packets: 0              RX-dropped: 0             RX-total: 0
  TX-packets: 32             TX-dropped: 0             TX-total: 32

----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 2
 ----------------------
  RX-packets: 266            RX-dropped: 0             RX-total: 266
  TX-packets: 44             TX-dropped: 0             TX-total: 44

----------------------------------------------------------------------------

  ---------------------- Forward statistics for port 3
 ----------------------
  RX-packets: 12             RX-dropped: 0             RX-total: 12
  TX-packets: 183            TX-dropped: 115           TX-total: 298

----------------------------------------------------------------------------

  +++++++++++++++ Accumulated forward statistics for all
ports+++++++++++++++
  RX-packets: 278            RX-dropped: 0             RX-total: 278
  TX-packets: 291            TX-dropped: 115           TX-total: 406

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Done.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd


More information about the dev mailing list