+++ This bug was initially created as a clone of Bug #611 +++

Recently we have seen a deadlock in the vdpa example app when a SIGTERM signal is sent at an early start stage. After debugging, we suspect it is because vhost_user.mutex is requested twice: once by the start sequence start_vdpa(), and once by the SIGTERM signal handler, which calls close_vdpa(). We also checked the other example apps, and they all seem to have the same issue.

+++

Testpmd also seems to be impacted. Indeed, pmd_test_exit() gets called, which in turn calls the dev_stop() and dev_close() ethdev ops, almost all of which take spinlocks or mutexes.
It turns out I unintentionally reproduced it today:

# ./build_rc2/app/dpdk-testpmd --file-prefix=vhost --no-pci --vdev eth_vhost0,iface=/tmp/vhost-user1 -- -i
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/vhost/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: Probing VFIO support...
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Failed to set MTU to 1500 for port 0
testpmd: create a new mbuf pool <mb_pool_0>: n=523456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=523456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
^C Signal 2 received, preparing to exit...
Stopping port 0...
Stopping ports...
Done
Shutting down port 0...
Closing ports...
Port 0 is closed
Done
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...
^C Signal 2 received, preparing to exit...

Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfos, use: dnf debuginfo-install numactl-libs-2.0.14-1.fc33.x86_64
rte_rwlock_write_lock (rwl=0x100000010) at ../lib/librte_eal/include/generic/rte_rwlock.h:161
161             while (success == 0) {
(gdb) bt
#0  rte_rwlock_write_lock (rwl=0x100000010) at ../lib/librte_eal/include/generic/rte_rwlock.h:161
#1  rte_memzone_free (mz=0x1000070d8) at ../lib/librte_eal/common/eal_common_memzone.c:268
#2  0x0000000000ab3612 in rte_mempool_free_memchunks (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:285
#3  0x0000000000ab4618 in rte_mempool_free (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:726
#4  rte_mempool_free (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:703
#5  0x000000000097074d in pmd_test_exit () at ../app/test-pmd/testpmd.c:3084
#6  0x00000000009707e1 in force_quit () at ../app/test-pmd/testpmd.c:3705
#7  signal_handler (signum=2) at ../app/test-pmd/testpmd.c:3740
#8  signal_handler (signum=2) at ../app/test-pmd/testpmd.c:3727
#9  <signal handler called>
#10 __GI___mmap64 (offset=0, fd=210, flags=32785, prot=3, len=1073741824, addr=0x1200000000) at ../sysdeps/unix/sysv/linux/mmap64.c:59
#11 __GI___mmap64 (addr=addr@entry=0x1200000000, len=1073741824, prot=prot@entry=3, flags=32785, fd=fd@entry=210, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:47
#12 0x0000000000ad92e5 in alloc_seg (ms=0x1180000030, addr=0x1200000000, socket_id=1, hi=<optimized out>, list_idx=2, seg_idx=1) at ../lib/librte_eal/linux/eal_memalloc.c:583
#13 0x0000000000ad9d27 in alloc_seg_walk (msl=<optimized out>, arg=0x7fff971e9f30) at ../lib/librte_eal/linux/eal_memalloc.c:875
#14 0x0000000000abfb2b in rte_memseg_list_walk_thread_unsafe (func=func@entry=0xad9bc0 <alloc_seg_walk>, arg=arg@entry=0x7fff971e9f30) at ../lib/librte_eal/common/eal_common_memory.c:760
#15 0x0000000000504846 in eal_memalloc_alloc_seg_bulk (ms=ms@entry=0x22b4960, n_segs=n_segs@entry=2, page_sz=page_sz@entry=1073741824, socket=socket@entry=1, exact=exact@entry=true) at ../lib/librte_eal/linux/eal_memalloc.c:1040
#16 0x0000000000acbf36 in alloc_pages_on_heap (heap=0x100004b80, pg_sz=1073741824, elt_size=1239546816, socket=1, flags=2, align=64, bound=0, contig=false, ms=0x22b4960, n_segs=2) at ../lib/librte_eal/common/malloc_heap.c:313
#17 0x0000000000acc0f1 in try_expand_heap_primary (contig=false, bound=0, align=64, flags=2, socket=1, elt_size=1239546816, pg_sz=1073741824, heap=0x100004b80) at ../lib/librte_eal/common/malloc_heap.c:409
#18 try_expand_heap (heap=0x100004b80, pg_sz=1073741824, elt_size=1239546816, socket=1, flags=2, align=64, bound=0, contig=false) at ../lib/librte_eal/common/malloc_heap.c:500
#19 0x0000000000acc5b0 in alloc_more_mem_on_socket (heap=heap@entry=0x100004b80, size=size@entry=1239546816, socket=socket@entry=1, flags=flags@entry=6, align=64, bound=bound@entry=0, contig=false) at ../lib/librte_eal/common/malloc_heap.c:609
#20 0x0000000000acc9e2 in malloc_heap_alloc_on_heap_id (size=1239546816, heap_id=1, flags=6, align=<optimized out>, bound=0, contig=<optimized out>, type=<optimized out>) at ../lib/librte_eal/common/malloc_heap.c:684
#21 0x0000000000accc3d in malloc_heap_alloc (type=type@entry=0x0, size=size@entry=1239546816, socket_arg=<optimized out>, flags=flags@entry=6, align=align@entry=64, bound=bound@entry=0, contig=false) at ../lib/librte_eal/common/malloc_heap.c:722
#22 0x0000000000ac06fe in memzone_reserve_aligned_thread_unsafe (bound=0, align=64, flags=6, socket_id=<optimized out>, len=1239546816, name=0x7fff971eb330 "MP_mb_pool_1_0") at ../lib/librte_eal/common/eal_common_memzone.c:150
#23 rte_memzone_reserve_thread_safe (name=0x7fff971eb330 "MP_mb_pool_1_0", len=1239546815, socket_id=1, flags=6, align=64, bound=0) at ../lib/librte_eal/common/eal_common_memzone.c:202
#24 0x0000000000ab415e in rte_mempool_populate_default (mp=mp@entry=0x11ffe7df00) at ../lib/librte_mempool/rte_mempool.c:564
#25 0x0000000000aae243 in rte_pktmbuf_pool_create_by_ops (name=<optimized out>, n=<optimized out>, cache_size=<optimized out>, priv_size=<optimized out>, data_room_size=<optimized out>, socket_id=<optimized out>, ops_name=0x0)
Maxime, are you planning to look at this? Or do you want someone else to take a look?