Bug 628 - Possible deadlock in testpmd
Summary: Possible deadlock in testpmd
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: testpmd
Version: 20.08
Hardware: x86 All
Importance: High normal
Target Milestone: ---
Assignee: dev
URL:
Depends on: 611
Blocks:
Reported: 2021-01-28 17:21 CET by Maxime Coquelin
Modified: 2021-03-09 04:59 CET
CC: 4 users

Description Maxime Coquelin 2021-01-28 17:21:24 CET
+++ This bug was initially created as a clone of Bug #611 +++

Recently we have seen a deadlock in the vdpa example app when a SIGTERM signal is sent at an early start stage. After debugging, we suspect it is because vhost_user.mutex is requested twice: once by the start sequence, start_vdpa(), and once by the SIGTERM signal handler, which calls close_vdpa().

Other example apps were also checked; they all seem to have the same issue.

+++

Testpmd also seems to be impacted. Indeed, pmd_test_exit() gets called from the signal handler, and it in turn calls the dev_stop() and dev_close() ethdev ops, almost all of which take spinlocks or mutexes. A minimal sketch of the pattern follows.
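
To illustrate the failure mode, here is a minimal, hypothetical C sketch (not DPDK code): a signal handler runs cleanup that takes a non-recursive lock, and if the signal lands while the interrupted thread already holds that lock, the second acquisition can never succeed and the process hangs.

#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static pthread_mutex_t res_lock = PTHREAD_MUTEX_INITIALIZER;

static void cleanup_resources(void)
{
    /* Deadlocks if the interrupted thread already holds res_lock:
     * a default pthread mutex is not recursive. */
    pthread_mutex_lock(&res_lock);
    /* ... release resources ... */
    pthread_mutex_unlock(&res_lock);
}

static void sig_handler(int signum)
{
    (void)signum;
    cleanup_resources(); /* unsafe: takes a lock inside a signal handler */
    _exit(0);
}

int main(void)
{
    signal(SIGTERM, sig_handler);
    for (;;) {
        pthread_mutex_lock(&res_lock);
        /* A SIGTERM delivered in this window never returns. */
        sleep(1);
        pthread_mutex_unlock(&res_lock);
    }
}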
Comment 1 Maxime Coquelin 2021-02-01 10:25:32 CET
It turns out I unintentionally reproduced it today:

# ./build_rc2/app/dpdk-testpmd --file-prefix=vhost --no-pci --vdev eth_vhost0,iface=/tmp/vhost-user1 -- -i
EAL: Detected 48 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/vhost/mp_socket
EAL: Selected IOVA mode 'PA'
EAL: No available 2048 kB hugepages reported
EAL: Probing VFIO support...
EAL: No legacy callbacks, legacy socket not created
Interactive-mode selected
Failed to set MTU to 1500 for port 0
testpmd: create a new mbuf pool <mb_pool_0>: n=523456, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
testpmd: create a new mbuf pool <mb_pool_1>: n=523456, size=2176, socket=1
testpmd: preferred mempool ops selected: ring_mp_mc
^C
Signal 2 received, preparing to exit...

Stopping port 0...
Stopping ports...
Done

Shutting down port 0...
Closing ports...
Port 0 is closed
Done



^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...
^C
Signal 2 received, preparing to exit...

Using host libthread_db library "/lib64/libthread_db.so.1".
Missing separate debuginfos, use: dnf debuginfo-install numactl-libs-2.0.14-1.fc33.x86_64
rte_rwlock_write_lock (rwl=0x100000010) at ../lib/librte_eal/include/generic/rte_rwlock.h:161
161		while (success == 0) {
(gdb) bt
#0  rte_rwlock_write_lock (rwl=0x100000010) at ../lib/librte_eal/include/generic/rte_rwlock.h:161
#1  rte_memzone_free (mz=0x1000070d8) at ../lib/librte_eal/common/eal_common_memzone.c:268
#2  0x0000000000ab3612 in rte_mempool_free_memchunks (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:285
#3  0x0000000000ab4618 in rte_mempool_free (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:726
#4  rte_mempool_free (mp=0x17f91b980) at ../lib/librte_mempool/rte_mempool.c:703
#5  0x000000000097074d in pmd_test_exit () at ../app/test-pmd/testpmd.c:3084
#6  0x00000000009707e1 in force_quit () at ../app/test-pmd/testpmd.c:3705
#7  signal_handler (signum=2) at ../app/test-pmd/testpmd.c:3740
#8  signal_handler (signum=2) at ../app/test-pmd/testpmd.c:3727
#9  <signal handler called>
#10 __GI___mmap64 (offset=0, fd=210, flags=32785, prot=3, len=1073741824, addr=0x1200000000) at ../sysdeps/unix/sysv/linux/mmap64.c:59
#11 __GI___mmap64 (addr=addr@entry=0x1200000000, len=1073741824, prot=prot@entry=3, flags=32785, fd=fd@entry=210, offset=0)
    at ../sysdeps/unix/sysv/linux/mmap64.c:47
#12 0x0000000000ad92e5 in alloc_seg (ms=0x1180000030, addr=0x1200000000, socket_id=1, hi=<optimized out>, list_idx=2, seg_idx=1)
    at ../lib/librte_eal/linux/eal_memalloc.c:583
#13 0x0000000000ad9d27 in alloc_seg_walk (msl=<optimized out>, arg=0x7fff971e9f30) at ../lib/librte_eal/linux/eal_memalloc.c:875
#14 0x0000000000abfb2b in rte_memseg_list_walk_thread_unsafe (func=func@entry=0xad9bc0 <alloc_seg_walk>, arg=arg@entry=0x7fff971e9f30)
    at ../lib/librte_eal/common/eal_common_memory.c:760
#15 0x0000000000504846 in eal_memalloc_alloc_seg_bulk (ms=ms@entry=0x22b4960, n_segs=n_segs@entry=2, page_sz=page_sz@entry=1073741824,
    socket=socket@entry=1, exact=exact@entry=true) at ../lib/librte_eal/linux/eal_memalloc.c:1040
#16 0x0000000000acbf36 in alloc_pages_on_heap (heap=0x100004b80, pg_sz=1073741824, elt_size=1239546816, socket=1, flags=2, align=64,
    bound=0, contig=false, ms=0x22b4960, n_segs=2) at ../lib/librte_eal/common/malloc_heap.c:313
#17 0x0000000000acc0f1 in try_expand_heap_primary (contig=false, bound=0, align=64, flags=2, socket=1, elt_size=1239546816,
    pg_sz=1073741824, heap=0x100004b80) at ../lib/librte_eal/common/malloc_heap.c:409
#18 try_expand_heap (heap=0x100004b80, pg_sz=1073741824, elt_size=1239546816, socket=1, flags=2, align=64, bound=0, contig=false)
    at ../lib/librte_eal/common/malloc_heap.c:500
#19 0x0000000000acc5b0 in alloc_more_mem_on_socket (heap=heap@entry=0x100004b80, size=size@entry=1239546816, socket=socket@entry=1,
    flags=flags@entry=6, align=64, bound=bound@entry=0, contig=false) at ../lib/librte_eal/common/malloc_heap.c:609
#20 0x0000000000acc9e2 in malloc_heap_alloc_on_heap_id (size=1239546816, heap_id=1, flags=6, align=<optimized out>, bound=0,
    contig=<optimized out>, type=<optimized out>) at ../lib/librte_eal/common/malloc_heap.c:684
#21 0x0000000000accc3d in malloc_heap_alloc (type=type@entry=0x0, size=size@entry=1239546816, socket_arg=<optimized out>,
    flags=flags@entry=6, align=align@entry=64, bound=bound@entry=0, contig=false) at ../lib/librte_eal/common/malloc_heap.c:722
#22 0x0000000000ac06fe in memzone_reserve_aligned_thread_unsafe (bound=0, align=64, flags=6, socket_id=<optimized out>, len=1239546816,
    name=0x7fff971eb330 "MP_mb_pool_1_0") at ../lib/librte_eal/common/eal_common_memzone.c:150
#23 rte_memzone_reserve_thread_safe (name=0x7fff971eb330 "MP_mb_pool_1_0", len=1239546815, socket_id=1, flags=6, align=64, bound=0)
    at ../lib/librte_eal/common/eal_common_memzone.c:202
#24 0x0000000000ab415e in rte_mempool_populate_default (mp=mp@entry=0x11ffe7df00) at ../lib/librte_mempool/rte_mempool.c:564
#25 0x0000000000aae243 in rte_pktmbuf_pool_create_by_ops (name=<optimized out>, n=<optimized out>, cache_size=<optimized out>,
    priv_size=<optimized out>, data_room_size=<optimized out>, socket_id=<optimized out>, ops_name=0x0)
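
The backtrace is consistent with a self-deadlock on the memory configuration rwlock: frame #23 (rte_memzone_reserve_thread_safe) appears to hold the lock in write mode when the signal arrives, and frame #1 (rte_memzone_free), running in the handler on the same thread, then requests the same write lock, so the spin in frame #0 can never succeed because rte_rwlock is not recursive. That would also explain why the repeated SIGINTs above fail to terminate the process: the handler never returns. The usual mitigation, sketched below as a hypothetical example (not the actual testpmd fix), is to keep the handler async-signal-safe: it only sets a volatile sig_atomic_t flag, and the main loop performs the lock-taking cleanup afterwards.

#include <signal.h>

static volatile sig_atomic_t quit_requested;

static void cleanup_resources(void)
{
    /* stop/close ports, free mempools, ... (may take locks) */
}

static void sig_handler(int signum)
{
    (void)signum;
    quit_requested = 1; /* only async-signal-safe work here */
}

int main(void)
{
    signal(SIGINT, sig_handler);
    signal(SIGTERM, sig_handler);

    while (!quit_requested) {
        /* normal work: allocations, device setup, forwarding, ... */
    }

    /* Safe: this thread holds no lock when cleanup runs. */
    cleanup_resources();
    return 0;
}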
Comment 2 Ajit Khaparde 2021-03-09 04:59:18 CET
Maxime, are you planning to look at this, or do you want someone to take a look?
