This problem occurs when a memory free event reaches the mlx5 callback while no mlx5 device has been initialised (yet).

Looking at the code, the mlx5 driver always registers a memory event callback:

RTE_INIT(rte_mlx5_pmd_init);
static void
rte_mlx5_pmd_init(void)
{
	...
	rte_mem_event_callback_register("MLX5_MEM_EVENT_CB",
					mlx5_mr_mem_event_cb, NULL);
}

When invoked, this callback tries to take a lock:

void
mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
		     size_t len, void *arg __rte_unused)
{
	struct priv *priv;
	struct mlx5_dev_list *dev_list = &mlx5_shared_data->mem_event_cb_list;

	switch (event_type) {
	case RTE_MEM_EVENT_FREE:
		rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
		/* Iterate all the existing mlx5 devices. */
		...

But this lock is not initialised unless an mlx5 device has been probed. Its initialisation happens in mlx5_prepare_shared_data(), which is only called from mlx5_pci_probe().

Reproducing the issue is not straightforward. I forced an allocation followed by a free in the testpmd code to make sure a free event would be triggered:

root@ubuntu1604:~/dpdk# git diff
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 35cf266..79c9531 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2772,6 +2772,8 @@ main(int argc, char** argv)
 	}
 #endif
 
+	rte_free(rte_malloc(NULL, 10000000, 0));
+
 #ifdef RTE_LIBRTE_CMDLINE
 	if (strlen(cmdline_filename) != 0)
 		cmdline_read_from_file(cmdline_filename);

Then:

root@ubuntu1604:~/dpdk# LD_LIBRARY_PATH=/root/rdma-core/build/lib ./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
...
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 90MB
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=2048, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
Done
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'Segmentation fault (core dumped)

root@ubuntu1604:~/dpdk# gdb ./build/app/testpmd core
...
Core was generated by `./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_rwlock_write_lock (rwl=<optimized out>) at /root/dpdk/build/include/generic/rte_rwlock.h:103
103		x = rwl->cnt;
[Current thread is 1 (Thread 0x7f1871022c00 (LWP 5732))]
(gdb) bt
#0  rte_rwlock_write_lock (rwl=<optimized out>) at /root/dpdk/build/include/generic/rte_rwlock.h:103
#1  mlx5_mr_mem_event_cb (event_type=RTE_MEM_EVENT_FREE, addr=0x7f1474a00000, len=10485760, arg=<optimized out>)
    at /root/dpdk/drivers/net/mlx5/mlx5_mr.c:884
#2  0x000000000054ae86 in eal_memalloc_mem_event_notify ()
#3  0x0000000000558994 in malloc_heap_free ()
#4  0x000000000055445f in rte_free ()
#5  0x0000000000477231 in main ()
Fixed by the following commit:
net/mlx5: register memory callback only when probing
https://git.dpdk.org/dpdk/commit/?id=44b1d513d58db0ac54e67715470f9a50643a593b