[dpdk-dev] [RFC] vhost user: add error handling for fd > 1023

Christian Ehrhardt christian.ehrhardt at canonical.com
Thu Apr 7 16:49:27 CEST 2016


Hi Patrik,

On Tue, Apr 5, 2016 at 10:40 AM, Patrik Andersson R
<patrik.r.andersson at ericsson.com> wrote:
>
> The described fault situation arises due to the fact that there is a bug
> in an OpenStack component, Neutron or Nova, that fails to release ports
> on VM deletion. This typically leads to an accumulation of 1-2 file
> descriptors per unreleased port. It could also arise when allocating a
> large number (~500?) of vhost user ports and connecting them all to VMs.
>

I can confirm that I'm able to trigger this without OpenStack, using
DPDK 2.2 and Open vSwitch 2.5.
Initially I had at least two guests attached to the first two ports, but
that turns out not to be necessary, which makes reproducing it as easy as:
ovs-vsctl add-br ovsdpdkbr0 -- set bridge ovsdpdkbr0 datapath_type=netdev
ovs-vsctl add-port ovsdpdkbr0 dpdk0 -- set Interface dpdk0 type=dpdk
for idx in {1..1023}; do
    ovs-vsctl add-port ovsdpdkbr0 vhost-user-${idx} -- \
        set Interface vhost-user-${idx} type=dpdkvhostuser
done

=> as soon as the associated fd is > 1023, the vhost_user socket still gets
created, but just afterwards I see the crash Patrik mentioned:

#0  0x00007f51cb187518 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f51cb1890ea in __GI_abort () at abort.c:89
#2  0x00007f51cb1c98c4 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f51cb2e1584 "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f51cb26af94 in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x7f51cb2e1515 "buffer overflow detected") at fortify_fail.c:37
#4  0x00007f51cb268fa0 in __GI___chk_fail () at chk_fail.c:28
#5  0x00007f51cb26aee7 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
#6  0x00007f51cbd6d665 in fdset_fill (pfdset=0x7f51cc03dfa0 <g_vhost_server+8192>, wfset=0x7f51c78e4a30, rfset=0x7f51c78e49b0) at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:110
#7  fdset_event_dispatch (pfdset=pfdset@entry=0x7f51cc03dfa0 <g_vhost_server+8192>) at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/fd_man.c:243
#8  0x00007f51cbdc1b00 in rte_vhost_driver_session_start () at /build/dpdk-3lQdSB/dpdk-2.2.0/lib/librte_vhost/vhost_user/vhost-net-user.c:525
#9  0x00000000005061ab in start_vhost_loop (dummy=<optimized out>) at ../lib/netdev-dpdk.c:2047
#10 0x00000000004c2c64 in ovsthread_wrapper (aux_=<optimized out>) at ../lib/ovs-thread.c:340
#11 0x00007f51cba346fa in start_thread (arg=0x7f51c78e5700) at pthread_create.c:333
#12 0x00007f51cb2592dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
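
The abort is glibc's fortify check rather than anything DPDK-specific:
fdset_fill() calls FD_SET() for every descriptor it tracks, but an fd_set
is a fixed bitmap that can only represent fds 0..FD_SETSIZE-1 (1023). With
_FORTIFY_SOURCE enabled (the Ubuntu toolchain default), __fdelt_chk catches
the out-of-range fd before it corrupts memory. Below is a minimal standalone
sketch of the same failure, independent of DPDK; it assumes the process's
hard fd limit allows more than 1024 open files:

/*
 * Minimal reproducer sketch, not DPDK code. Built with the Ubuntu
 * defaults (gcc -O2 -D_FORTIFY_SOURCE=2) it aborts in __fdelt_chk just
 * like frame #5 above; built without fortify, FD_SET() silently writes
 * past the end of the fd_set instead.
 */
#include <stdio.h>
#include <sys/resource.h>
#include <sys/select.h>
#include <unistd.h>

int main(void)
{
    struct rlimit rl;

    /* Raise the soft fd limit so dup() can get past 1024; this assumes
     * the hard limit permits it, which it typically does. */
    if (getrlimit(RLIMIT_NOFILE, &rl) == 0 && rl.rlim_cur <= FD_SETSIZE) {
        rl.rlim_cur = rl.rlim_max;
        setrlimit(RLIMIT_NOFILE, &rl);
    }

    /* Burn through descriptors until we hold one >= FD_SETSIZE; the
     * leaked duplicates are irrelevant for a reproducer. */
    int fd;
    while ((fd = dup(STDIN_FILENO)) >= 0 && fd < FD_SETSIZE)
        ;
    if (fd < 0) {
        perror("dup");
        return 1;
    }
    printf("got fd %d, FD_SETSIZE is %d\n", fd, FD_SETSIZE);

    /* This is the fdset_fill() situation: an fd that select()'s
     * fixed-size bitmap cannot represent. */
    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);  /* aborts here under _FORTIFY_SOURCE */
    return 0;
}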

Like Patrik, I don't have a "pure" DPDK test yet, but at least OpenStack
is out of scope now, which should help.

[...]


> The key point, I think, is that more than one file descriptor is used per
> vhost user device. This means that there is no real relation between the
> number of devices and the number of file descriptors in use.


Well, it is "one per vhost_user device" as far as I've seen, but those are
not the only fds used overall.
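(For anyone who wants to watch this happen: counting the entries in
/proc/$(pidof ovs-vswitchd)/fd while adding ports shows it directly.
Each dpdkvhostuser port holds at least its listening socket, and every
connected guest adds a connection fd plus kick/call eventfds per
virtqueue, so the fd count climbs well ahead of the port count.)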

[...]


> In my opinion the problem is that the assumption: number of vhost user
> devices == number of file descriptors does not hold. What the actual
> relation might be is hard to determine with any certainty.
>

I totally agree that there is no deterministic rule for what to expect.
The only certainty is that the number of fds is always greater than the
number of vhost_user devices. In various setup variants I've crossed
fd 1024 at anywhere between 475 and 970 vhost_user ports.
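
For reference, the shape of fix I'd expect is roughly the following. This
is only my sketch of the kind of guard being discussed, not Patrik's
actual patch; the function name and error handling are my assumptions:

/* Hypothetical guard, my sketch only: refuse descriptors that the
 * select()-based dispatch loop in fd_man.c can never handle, instead
 * of letting FD_SET() overflow the fd_set later. */
#include <sys/select.h>

static int
fdset_fd_ok(int fd)
{
    /* fd_set is a fixed bitmap of FD_SETSIZE (1024) slots, so any
     * fd >= FD_SETSIZE must be rejected at add time. */
    return fd >= 0 && fd < FD_SETSIZE;
}

/* A caller such as fdset_add() would then fail gracefully:
 *
 *     if (!fdset_fd_ok(fd))
 *         return -1;   // caller closes the fd, rejects the connection
 */

Longer term, moving fd_man.c from select() to poll() or epoll would remove
the FD_SETSIZE ceiling altogether, but that is a bigger change than this
RFC aims for.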

Once the discussion continues and we have an updated version of the patch
with some more agreement, I hope I can help to test it.

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

