[dpdk-users] OpenVSwitch with DPDK causes pmd segfault
Sergey Matov
smatov at mirantis.com
Mon Jun 27 17:53:05 CEST 2016
Hello dear community.
I've hit an unexpected segmentation fault running Open vSwitch with DPDK
under OpenStack.
On Ubuntu 14.04 with the 3.13 kernel we run the Open vSwitch ML2 Neutron
driver accelerated with DPDK. Booting a VM with 3 interfaces and vCPU
pinning goes fine: the VM answers all pings and SSH successfully.
Host related configuration:
Hugepages:
HugePages_Total: 39552
HugePages_Free: 31360
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
NUMA topology:
root@node-1:~# lscpu | grep NUMA
NUMA node(s): 4
NUMA node0 CPU(s): 0-4,20-24
NUMA node1 CPU(s): 5-9,25-29
NUMA node2 CPU(s): 10-14,30-34
NUMA node3 CPU(s): 15-19,35-39
OpenStack release: stable/Mitaka
OVS version: 2.4.1 (bit patched for DPDK 2.1 support)
DPDK: 2.1
QEMU: 2.3
Libvirt: 1.2.9.3
OVS coremask 0x1
OVS PMD CPU mask 0x308426
The VM is based on Ubuntu 14.04 with a 4.2 kernel. The guest has 8 GB RAM,
an 8 GB disk and 4 vCPUs.
All pinned vCPUs are from the same NUMA node. We also make sure that every
NUMA node has > 12 GB of HugePages.
Brief Guest vCPU configuration:
<memory unit='KiB'>8388608</memory>
<currentMemory unit='KiB'>8388608</currentMemory>
<memoryBacking>
<hugepages>
<page size='2048' unit='KiB' nodeset='0'/>
</hugepages>
</memoryBacking>
<vcpu placement='static'>4</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='24'/>
<vcpupin vcpu='1' cpuset='4'/>
<vcpupin vcpu='2' cpuset='3'/>
<vcpupin vcpu='3' cpuset='23'/>
<emulatorpin cpuset='3-4,23-24'/>
</cputune>
<numatune>
<memory mode='strict' nodeset='0'/>
<memnode cellid='0' mode='strict' nodeset='0'/>
</numatune>
Guest VM running with VirtIO vNICs.
After successfully setting up the VM, we try to run a DPDK application on
the guest (for example testpmd or pktgen). The application works for some
time, but after a short while, OR after restarting the DPDK application,
the OVS DPDK-based ports (dpdk and vhost-user) go down with "Cannot
allocate memory":
root@node-1:~# ovs-vsctl show
49af53c7-e8fc-46b2-b077-f5afd64302a1
Bridge br-floating
Port phy-br-floating
Interface phy-br-floating
type: patch
options: {peer=int-br-floating}
Port "p_ff798dba-0"
Interface "p_ff798dba-0"
type: internal
Port br-floating
Interface br-floating
type: internal
Bridge br-int
fail_mode: secure
Port "vhu775c67a4-1c"
tag: 2
Interface "vhu775c67a4-1c"
type: dpdkvhostuser
error: "could not open network device vhu775c67a4-1c (Cannot
allocate memory)"
Port "vhu50d83b4f-ed"
tag: 4
Interface "vhu50d83b4f-ed"
type: dpdkvhostuser
error: "could not open network device vhu50d83b4f-ed (Cannot
allocate memory)"
Port int-br-prv
Interface int-br-prv
type: patch
options: {peer=phy-br-prv}
Port "fg-52a823ea-0f"
tag: 1
Interface "fg-52a823ea-0f"
type: internal
Port br-int
Interface br-int
type: internal
Port "qr-96fc706d-60"
tag: 2
Interface "qr-96fc706d-60"
type: internal
Port int-br-floating
Interface int-br-floating
type: patch
options: {peer=phy-br-floating}
Port "vhu10b484ac-d9"
tag: 3
Interface "vhu10b484ac-d9"
type: dpdkvhostuser
error: "could not open network device vhu10b484ac-d9 (Cannot
allocate memory)"
Bridge br-prv
Port br-prv
Interface br-prv
type: internal
Port phy-br-prv
Interface phy-br-prv
type: patch
options: {peer=int-br-prv}
Port "dpdk0"
Interface "dpdk0"
type: dpdk
error: "could not open network device dpdk0 (Cannot
allocate memory)"
ovs_version: "2.4.1"
And dmesg shows the following error:
[11364.145636] pmd35[5499]: segfault at 8 ip 00007fe3abe4ea4e sp
00007fe325ffa7b0 error 4 in libdpdk.so[7fe3abcee000+1bb000]
GDB backtrace:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8c56ffd700 (LWP 5650)]
__netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00,
pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized
out>) at lib/netdev-dpdk.c:1054
1054 while (!rte_vring_available_entries(virtio_dev, VIRTIO_RXQ)) {
(gdb) bt
#0 __netdev_dpdk_vhost_send (netdev=0x7f8d1c803f00,
pkts=pkts@entry=0x7f8c56ff8fa0, cnt=cnt@entry=32, may_steal=<optimized
out>) at lib/netdev-dpdk.c:1054
#1 0x00000000005048f7 in netdev_dpdk_vhost_send (netdev=<optimized
out>, qid=<optimized out>, pkts=0x7f8c56ff8fa0, cnt=32,
may_steal=<optimized out>) at lib/netdev-dpdk.c:1196
#2 0x0000000000476950 in netdev_send (netdev=<optimized out>,
qid=<optimized out>, buffers=buffers@entry=0x7f8c56ff8fa0,
cnt=cnt@entry=32, may_steal=may_steal@entry=true) at lib/netdev.c:740
#3 0x000000000045c28c in dp_execute_cb
(aux_=aux_@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0,
cnt=cnt@entry=32, a=a@entry=0x7f8c9c009884, may_steal=true) at
lib/dpif-netdev.c:3396
#4 0x000000000047c451 in odp_execute_actions
(dp=dp@entry=0x7f8c56ffc820, packets=packets@entry=0x7f8c56ff8fa0,
cnt=32, steal=steal@entry=true, actions=<optimized out>,
actions_len=<optimized out>,
dp_execute_action=dp_execute_action@entry=0x45c120 <dp_execute_cb>)
at lib/odp-execute.c:518
#5 0x000000000045bcce in dp_netdev_execute_actions
(actions_len=<optimized out>, actions=<optimized out>, may_steal=true,
cnt=<optimized out>, packets=<optimized out>, pmd=0x1c12e00) at
lib/dpif-netdev.c:3536
#6 packet_batch_execute (now=<optimized out>, pmd=<optimized out>,
batch=0x7f8c56ff8f88) at lib/dpif-netdev.c:3084
#7 dp_netdev_input (pmd=pmd@entry=0x1c12e00,
packets=packets@entry=0x7f8c56ffc920, cnt=<optimized out>) at
lib/dpif-netdev.c:3320
#8 0x000000000045bf22 in dp_netdev_process_rxq_port
(pmd=pmd@entry=0x1c12e00, port=0x17b8f50, rxq=<optimized out>) at
lib/dpif-netdev.c:2513
#9 0x000000000045cf93 in pmd_thread_main (f_=0x1c12e00) at
lib/dpif-netdev.c:2661
#10 0x00000000004aec34 in ovsthread_wrapper (aux_=<optimized out>) at
lib/ovs-thread.c:340
#11 0x00007f8d21e1b184 in start_thread (arg=0x7f8c56ffd700) at
pthread_create.c:312
#12 0x00007f8d21b4837d in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:111
OVS log error:
2016-06-27T15:32:31.235Z|00003|daemon_unix(monitor)|ERR|1 crashes: pid
4628 died, killed (Segmentation fault), core dumped, restarting
This behavior is quite similar to
https://bugzilla.redhat.com/show_bug.cgi?id=1293495.
I can reproduce this scenario 100% of the time by running DPDK 2.1.0 and
the latest pktgen or the in-tree testpmd on the guest.
Can anyone clarify the issue? Is the vhost-user code in this particular
DPDK version (2.1.0) affected by a known issue of this kind? (I can't
find one.)
--
*Best Regards*
*Sergey Matov*
*Mirantis Inc*