Bug 56 - crash when freeing memory with no mlx5 device attached
Status: RESOLVED FIXED
Alias: None
Product: DPDK
Classification: Unclassified
Component: other
Version: 18.05
Hardware: All All
Importance: Normal critical
Target Milestone: ---
Assignee: dev
 
Reported: 2018-05-30 15:39 CEST by David Marchand
Modified: 2018-06-28 21:01 CEST

Description David Marchand 2018-05-30 15:39:45 CEST
This problem is produced when a memory free event reaches the mlx5 callback, but no mlx5 device has been initialised (yet).

Looking at the code, the mlx5 driver always registers a memory callback:

RTE_INIT(rte_mlx5_pmd_init);
static void
rte_mlx5_pmd_init(void)
{
...
	rte_mem_event_callback_register("MLX5_MEM_EVENT_CB",
					mlx5_mr_mem_event_cb, NULL);
}

When invoked, this callback tries to take a lock:

void
mlx5_mr_mem_event_cb(enum rte_mem_event event_type, const void *addr,
                     size_t len, void *arg __rte_unused)
{
        struct priv *priv;
        struct mlx5_dev_list *dev_list = &mlx5_shared_data->mem_event_cb_list;

        switch (event_type) {
        case RTE_MEM_EVENT_FREE:
                rte_rwlock_write_lock(&mlx5_shared_data->mem_event_rwlock);
                /* Iterate all the existing mlx5 devices. */

However, this lock is only initialised once an mlx5 device has been probed: its initialisation is done in mlx5_prepare_shared_data(), which is called from mlx5_pci_probe().
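
The ordering problem described above can be modelled without DPDK. The following is only a minimal sketch: every name in it (model_shared, model_probe, model_mem_free_cb, model_scenario) is hypothetical and not part of the mlx5 driver, and it assumes the shared state is reached through a pointer that stays NULL until probe time. The NULL check in the callback is there to make the hazard visible, not as a description of how the driver behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's shared state (not the real mlx5 layout). */
struct model_shared_data {
	int lock_taken; /* counts how many times the "lock" was taken */
};

static struct model_shared_data model_storage;
/* Assumption: stays NULL until model_probe() runs, like state set up at probe time. */
static struct model_shared_data *model_shared;

/* Models the probe path: the only place where the shared state is set up. */
static void
model_probe(void)
{
	model_storage.lock_taken = 0;
	model_shared = &model_storage;
}

/*
 * Models a memory event callback that was registered unconditionally at
 * load time. Returns false when it bails out because nothing was probed;
 * without the NULL check, this is where the invalid access would happen.
 */
static bool
model_mem_free_cb(void)
{
	if (model_shared == NULL)
		return false;
	model_shared->lock_taken++;
	return true;
}

/* Runs the sequence from this report: a free event first, then a probe. */
int
model_scenario(void)
{
	bool before = model_mem_free_cb(); /* no device probed yet */
	model_probe();
	bool after = model_mem_free_cb();  /* shared state is now valid */

	assert(!before && after);
	return model_shared->lock_taken;
}
```

Calling model_scenario() once from a main() shows that the first free event finds no usable state and only the second one reaches the lock.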


Reproducing the issue is not straightforward. I forced an allocation followed by a free in the testpmd code to make sure a free event would be triggered:

root@ubuntu1604:~/dpdk# git diff
diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 35cf266..79c9531 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -2772,6 +2772,8 @@ main(int argc, char** argv)
        }
 #endif
 
+       rte_free(rte_malloc(NULL, 10000000, 0));
+
 #ifdef RTE_LIBRTE_CMDLINE
        if (strlen(cmdline_filename) != 0)
                cmdline_read_from_file(cmdline_filename);


Then:

root@ubuntu1604:~/dpdk# LD_LIBRARY_PATH=/root/rdma-core/build/lib ./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048
EAL: Detected lcore 0 as core 0 on socket 0
EAL: Detected lcore 1 as core 0 on socket 0
EAL: Detected lcore 2 as core 0 on socket 0
...
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 90MB
Interactive-mode selected
testpmd: create a new mbuf pool <mbuf_pool_socket_0>: n=2048, size=2176, socket=0
testpmd: preferred mempool ops selected: ring_mp_mc
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
Done
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 10MB
EAL: Calling mem event callback 'MLX5_MEM_EVENT_CB:(nil)'Segmentation fault (core dumped)


root@ubuntu1604:~/dpdk# gdb ./build/app/testpmd core
...
Core was generated by `./build/app/testpmd --log-level .*,8 -c 0x6 -- -i --total-num-mbufs 2048'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  rte_rwlock_write_lock (rwl=<optimized out>) at /root/dpdk/build/include/generic/rte_rwlock.h:103
103			x = rwl->cnt;
[Current thread is 1 (Thread 0x7f1871022c00 (LWP 5732))]
(gdb) bt
#0  rte_rwlock_write_lock (rwl=<optimized out>) at /root/dpdk/build/include/generic/rte_rwlock.h:103
#1  mlx5_mr_mem_event_cb (event_type=RTE_MEM_EVENT_FREE, addr=0x7f1474a00000, len=10485760, arg=<optimized out>) at /root/dpdk/drivers/net/mlx5/mlx5_mr.c:884
#2  0x000000000054ae86 in eal_memalloc_mem_event_notify ()
#3  0x0000000000558994 in malloc_heap_free ()
#4  0x000000000055445f in rte_free ()
#5  0x0000000000477231 in main ()
Comment 1 Yongseok Koh 2018-06-28 21:01:46 CEST
Fixed by the following commit:
  net/mlx5: register memory callback only when probing
  https://git.dpdk.org/dpdk/commit/?id=44b1d513d58db0ac54e67715470f9a50643a593b
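
The idea in the commit title (register the callback only when probing) can be sketched as follows. This is not the code from the commit: the one-slot registry and all fix_* names are hypothetical stand-ins, chosen so that the example builds without DPDK. It only illustrates the ordering, where the shared state is prepared first and the callback is registered afterwards, once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*fix_cb_t)(void);

/* Hypothetical one-slot callback registry standing in for the EAL one. */
static fix_cb_t fix_registered_cb;
static int fix_register_calls;

static bool fix_shared_ready;  /* true once the shared state is set up */
static int fix_events_handled; /* free events seen by the callback */

/* Callback: by construction it can only run after fix_probe(). */
static void
fix_mem_event_cb(void)
{
	assert(fix_shared_ready);
	fix_events_handled++;
}

static void
fix_register(fix_cb_t cb)
{
	fix_registered_cb = cb;
	fix_register_calls++;
}

/* Probe path: prepare the shared state first, then register exactly once. */
static void
fix_probe(void)
{
	if (fix_shared_ready)
		return; /* a later device reuses the existing registration */
	fix_shared_ready = true;
	fix_register(fix_mem_event_cb);
}

/* Models the allocator notifying listeners about a free event. */
static void
fix_notify_free(void)
{
	if (fix_registered_cb != NULL)
		fix_registered_cb();
}
```

With this ordering, a free event that arrives before any probe finds no registered callback and is ignored, and probing a second device does not register the callback again.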
