[dpdk-users] OpenVSwitch with DPDK causes pmd segfault

Sergey Matov smatov@mirantis.com
Mon Jun 27 17:53:05 CEST 2016


Hello dear community.

I've hit an unexpected segmentation fault running Open vSwitch with DPDK
under OpenStack.

On Ubuntu 14.04 with the 3.13 kernel we are running the Neutron Open vSwitch
ML2 driver accelerated with DPDK. Booting a VM with 3 interfaces and vCPU
pinning works fine: the VM answers all pings and SSH succeeds.

Host-related configuration:

Hugepages:
HugePages_Total:   39552
HugePages_Free:    31360
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
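(For reference, these counters come from /proc/meminfo. A rough sketch of how
we check and allocate the 2 MB pages on the host; the page count below is an
example value, not necessarily what our deployment scripts use:)

# Show the hugepage counters (same source as the numbers above)
grep Huge /proc/meminfo

# Allocate 2 MB hugepages at runtime (example count)
sysctl -w vm.nr_hugepages=39552

# Mount hugetlbfs so OVS/DPDK and QEMU can back memory with hugepages
mkdir -p /dev/hugepages
mount -t hugetlbfs nodev /dev/hugepages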

NUMA topology:
root@node-1:~# lscpu | grep NUMA
NUMA node(s):          4
NUMA node0 CPU(s):     0-4,20-24
NUMA node1 CPU(s):     5-9,25-29
NUMA node2 CPU(s):     10-14,30-34
NUMA node3 CPU(s):     15-19,35-39

OpenStack release: stable/Mitaka
OVS version: 2.4.1 (slightly patched for DPDK 2.1 support)
DPDK: 2.1
QEMU: 2.3
Libvirt: 1.2.9.3

OVS coremask 0x1
OVS PMD CPU mask 0x308426
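For context, this is roughly how those masks get applied with OVS 2.4; the
exact EAL arguments on the ovs-vswitchd command line are from memory, so
treat this as a sketch rather than our exact init scripts:

# DPDK lcore mask is passed to ovs-vswitchd as EAL arguments (OVS 2.4 style)
ovs-vswitchd --dpdk -c 0x1 -n 4 --socket-mem 1024,1024,1024,1024 \
    -- unix:/var/run/openvswitch/db.sock --pidfile --detach

# PMD threads are placed via the pmd-cpu-mask knob
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x308426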

The VM is based on Ubuntu 14.04 with the 4.2 kernel. The guest has 8 GB of
RAM, an 8 GB disk and 4 vCPUs.
All pinned vCPUs come from the same NUMA node. We also made sure that every
NUMA node has > 12 GB of HugePages.
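A quick per-node check (sysfs paths for the 2 MB page size) looks like this:

# Free vs. total 2 MB hugepages per NUMA node
for n in /sys/devices/system/node/node*; do
    echo "$n: $(cat $n/hugepages/hugepages-2048kB/free_hugepages) free" \
         "of $(cat $n/hugepages/hugepages-2048kB/nr_hugepages)"
done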

Brief guest memory and vCPU configuration (libvirt):

  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='2048' unit='KiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>
  <vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='24'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='3'/>
    <vcpupin vcpu='3' cpuset='23'/>
    <emulatorpin cpuset='3-4,23-24'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
    <memnode cellid='0' mode='strict' nodeset='0'/>
  </numatune>

The guest VM runs with VirtIO vNICs, backed by the dpdkvhostuser ports on the host.
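(On the host side each dpdkvhostuser port has a UNIX socket that QEMU
connects to; assuming the default OVS run directory, they can be listed
like this:)

# vhost-user sockets created by OVS for the dpdkvhostuser ports
ls -l /var/run/openvswitch/vhu*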


After the VM is set up successfully, we try to run a DPDK application inside
the guest (for example, testpmd or pktgen). The application works for a
while, but after a short time, OR after restarting the DPDK application, the
OVS DPDK-based ports (dpdk and vhost-user) go down with "Cannot allocate
memory":

root@node-1:~# ovs-vsctl show
49af53c7-e8fc-46b2-b077-f5afd64302a1
    Bridge br-floating
        Port phy-br-floating
            Interface phy-br-floating
                type: patch
                options: {peer=int-br-floating}
        Port "p_ff798dba-0"
            Interface "p_ff798dba-0"
                type: internal
        Port br-floating
            Interface br-floating
                type: internal
    Bridge br-int
        fail_mode: secure
        Port "vhu775c67a4-1c"
            tag: 2
            Interface "vhu775c67a4-1c"
                type: dpdkvhostuser
                error: "could not open network device vhu775c67a4-1c (Cannot
allocate memory)"
        Port "vhu50d83b4f-ed"
            tag: 4
            Interface "vhu50d83b4f-ed"
                type: dpdkvhostuser
                error: "could not open network device vhu50d83b4f-ed (Cannot
allocate memory)"
        Port int-br-prv
            Interface int-br-prv
                type: patch
                options: {peer=phy-br-prv}
        Port "fg-52a823ea-0f"
            tag: 1
            Interface "fg-52a823ea-0f"
                type: internal
        Port br-int
            Interface br-int
                type: internal
        Port "qr-96fc706d-60"
            tag: 2
            Interface "qr-96fc706d-60"
                type: internal
        Port int-br-floating
            Interface int-br-floating
                type: patch
                options: {peer=phy-br-floating}
        Port "vhu10b484ac-d9"
            tag: 3
            Interface "vhu10b484ac-d9"
                type: dpdkvhostuser
                error: "could not open network device vhu10b484ac-d9 (Cannot
allocate memory)"
    Bridge br-prv
        Port br-prv
            Interface br-prv
                type: internal
        Port phy-br-prv
            Interface phy-br-prv
                type: patch
                options: {peer=int-br-prv}
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                error: "could not open network device dpdk0 (Cannot
allocate memory)"
    ovs_version: "2.4.1"

And dmesg shows the following error:
[11364.145636] pmd35[5499]: segfault at 8 ip 00007fe3abe4ea4e sp
00007fe325ffa7b0 error 4 in libdpdk.so[7fe3abcee000+1bb000]
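If it helps, the offset of the faulting instruction inside libdpdk.so can be
mapped back to a function/line with addr2line (the library path here is just
an example, whatever our build installs):

# ip - mapping base = offset inside libdpdk.so:
# 0x7fe3abe4ea4e - 0x7fe3abcee000 = 0x160a4e
addr2line -f -e /usr/lib/libdpdk.so 0x160a4e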

GDB backtrace:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8c56ffd700 (LWP 5650)]
__netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00,
pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized
out>) at lib/netdev-dpdk.c:1054
1054	            while (!rte_vring_available_entries(virtio_dev, VIRTIO_RXQ)) {
(gdb) bt
#0  __netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00,
pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized
out>) at lib/netdev-dpdk.c:1054
#1  0x00000000005048f7 in netdev_dpdk_vhost_send (netdev=<optimized
out>, qid=<optimized out>, pkts=0x7f8c56ff8fa0, cnt=32,
may_steal=<optimized out>) at lib/netdev-dpdk.c:1196
#2  0x0000000000476950 in netdev_send (netdev=<optimized out>,
qid=<optimized out>, buffers=buffers@entry=0x7f8c56ff8fa0,
cnt=cnt@entry=32, may_steal=may_steal@entry=true) at lib/netdev.c:740
#3  0x000000000045c28c in dp_execute_cb
(aux_=aux_@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0,
cnt=cnt@entry=32, a=a@entry=0x7f8c9c009884, may_steal=true) at
lib/dpif-netdev.c:3396
#4  0x000000000047c451 in odp_execute_actions
(dp=dp@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0,
cnt=32, steal=steal@entry=true, actions=<optimized out>,
actions_len=<optimized out>,
dp_execute_action=dp_execute_action@entry=0x45c120 <dp_execute_cb>)
    at lib/odp-execute.c:518
#5  0x000000000045bcce in dp_netdev_execute_actions
(actions_len=<optimized out>, actions=<optimized out>, may_steal=true,
cnt=<optimized out>, packets=<optimized out>, pmd=0x1c12e00) at
lib/dpif-netdev.c:3536
#6  packet_batch_execute (now=<optimized out>, pmd=<optimized out>,
batch=0x7f8c56ff8f88) at lib/dpif-netdev.c:3084
#7  dp_netdev_input (pmd=pmd@entry=0x1c12e00,
packets=packets@entry=0x7f8c56ffc920, cnt=<optimized out>) at
lib/dpif-netdev.c:3320
#8  0x000000000045bf22 in dp_netdev_process_rxq_port
(pmd=pmd@entry=0x1c12e00, port=0x17b8f50, rxq=<optimized out>) at
lib/dpif-netdev.c:2513
#9  0x000000000045cf93 in pmd_thread_main (f_=0x1c12e00) at
lib/dpif-netdev.c:2661
#10 0x00000000004aec34 in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:340
#11 0x00007f8d21e1b184 in start_thread (arg=0x7f8c56ffd700) at
pthread_create.c:312
#12 0x00007f8d21b4837d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
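Frame #0 dereferences the virtio device in rte_vring_available_entries(), so
my suspicion is that restarting the DPDK application in the guest leaves the
PMD thread with a stale/NULL virtio_dev. Something along these lines on the
core dump should confirm it (the core file name is just an example, and
virtio_dev may show as optimized out):

gdb -q /usr/sbin/ovs-vswitchd core.4628 \
    -ex bt \
    -ex 'frame 0' \
    -ex 'print virtio_dev' \
    -ex quit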


OVS log error:

2016-06-27T15:32:31.235Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid
4628 died, killed (Segmentation fault), core dumped, restarting

This behavior is quite similar to
https://bugzilla.redhat.com/show_bug.cgi?id=1293495.

I can reproduce this scenario 100% of the time by running DPDK 2.1.0 and the
latest pktgen, or the in-tree testpmd, on the guest.
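For reference, the reproduction in the guest is nothing special; roughly as
follows (PCI addresses, masks and counts are example values):

# Inside the guest: hugepages + bind the VirtIO NICs to igb_uio
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
modprobe uio
insmod ./x86_64-native-linuxapp-gcc/kmod/igb_uio.ko
./tools/dpdk_nic_bind.py --bind=igb_uio 0000:00:04.0 0000:00:05.0

# Start testpmd, quit it, start it again -- the second start (or a pktgen
# restart) is when the host-side PMD thread segfaults
./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 -- -i --portmask=0x3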

Can anyone please clarify the issue? Does the vhost-user code in this
particular DPDK v2.1.0 suffer from a known issue of this kind (I couldn't
find one)?


-- 
*Best Regards*
*Sergey Matov*
*Mirantis Inc*

