Bug 1234

Summary: iavf : crash observed during rte_eth_dev_stop
Product: DPDK Reporter: Sahithi Singam (sahithi.singam)
Component: ethdev Assignee: beilei.xing
Status: UNCONFIRMED ---    
Severity: major CC: david.marchand
Priority: Normal    
Version: 21.11   
Target Milestone: ---   
Hardware: x86   
OS: Linux   

Description Sahithi Singam 2023-05-19 13:59:22 CEST
Our application uses DPDK and consists of two processes: a primary DPDK process and a secondary DPDK process.
The primary process calls all of the DPDK initialization routines (rte_eal_init, dev_configure, rx/tx queue setup and dev_start), while the secondary process invokes only rte_eal_init.

In DPDK, rte_eth_dev->data->rx_queues and rte_eth_dev->data->tx_queues are data structures shared between the primary and secondary processes.

In the iavf PMD, each rxq (an entry of rx_queues above) and each txq (an entry of tx_queues above) holds a pointer to a function (e.g. rx_queues[index]->ops->release_mbufs) that is invoked during rte_eth_dev_stop.

A call to iavf_set_rx_function modifies this function pointer, i.e. release_mbufs.

This function pointer is initially set by the primary process to an address in its own address space -> rte_eth_dev_start() -> iavf_init_queues() -> iavf_set_rx_function().

Later, the secondary process overwrites this function pointer with an address from its own address space ->
rte_eal_init() -> iavf_dev_init() -> iavf_set_rx_function(). This address is not valid in the primary process's address space.
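
The overwrite described above, together with one possible guard against it, can be sketched as follows. This is not iavf code: `proc_role`, `shared_queue`, `local_dev` and `set_rx_function_sketch` are hypothetical names used for illustration only, and the sketch assumes that only the primary process should write into the shared queue structure, while each process keeps its own choice in private memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these are DPDK types. */
enum proc_role { ROLE_PRIMARY, ROLE_SECONDARY };

struct shared_queue;
typedef void (*release_fn)(struct shared_queue *q);

/* Lives in memory mapped by both processes (like dev->data->rx_queues[i]). */
struct shared_queue {
    release_fn release_mbufs;
};

/* Lives in each process's private memory. */
struct local_dev {
    release_fn last_selected;
};

static void release_scalar(struct shared_queue *q) { (void)q; }
static void release_vector(struct shared_queue *q) { (void)q; }

/* Selects a release routine; only the primary publishes it to shared memory. */
static void set_rx_function_sketch(enum proc_role role, struct local_dev *dev,
                                   struct shared_queue *q, release_fn fn)
{
    assert(dev != NULL && q != NULL && fn != NULL);
    dev->last_selected = fn;          /* per-process: always safe to write */
    if (role == ROLE_PRIMARY)
        q->release_mbufs = fn;        /* shared: written by the primary only */
}
```

In a real DPDK driver the role would presumably come from rte_eal_process_type() compared against RTE_PROC_PRIMARY. Whether such a guard fits the iavf code paths is an assumption that has not been checked here.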

During application shutdown, the primary process calls rte_eth_dev_stop, which invokes the release_mbufs function. Because the address now stored in the release_mbufs function pointer is not valid in the primary process, the primary process crashes every time.
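
A common way to avoid this class of crash is to keep only a small index in the shared queue structure and to resolve it through a function table that each process owns in its own address space. The sketch below illustrates that idea under stated assumptions: `demo_rxq`, `rel_kind`, `release_table` and `demo_release_mbufs` are hypothetical names and are not taken from the iavf sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not taken from the iavf driver. */
enum rel_kind { REL_DEFAULT = 0, REL_VECTOR, REL_KIND_COUNT };

/* Would live in shared memory: it holds an index, never a code address. */
struct demo_rxq {
    unsigned int rel_kind;   /* means the same thing in every process */
    int released_by;         /* records which routine ran (demo only) */
};

static void release_default(struct demo_rxq *q) { q->released_by = 1; }
static void release_vector(struct demo_rxq *q)  { q->released_by = 2; }

typedef void (*rel_fn)(struct demo_rxq *q);

/* Per-process table: each process resolves the index to its own addresses. */
static const rel_fn release_table[REL_KIND_COUNT] = {
    [REL_DEFAULT] = release_default,
    [REL_VECTOR]  = release_vector,
};

/* Dispatches through the local table instead of a shared function pointer. */
static void demo_release_mbufs(struct demo_rxq *q)
{
    assert(q != NULL && q->rel_kind < REL_KIND_COUNT);
    release_table[q->rel_kind](q);
}
```

With this layout a secondary process can still change rel_kind, but whatever value it writes stays meaningful in the primary, because the primary looks it up in its own copy of the table.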

Note:
This bug will also be observed in other PMDs, such as ice and ixgbe, that use a similar code/design.
Comment 1 Sahithi Singam 2023-05-23 12:15:36 CEST
This issue can also be reproduced with the DPDK multi-process example application (dpdk-mp_server and dpdk-mp_client).

=======================================================================
[root@dpdk /]#/boot/examples/dpdk-mp_server -l 2-3 -n 4 --allow 0000:00:0f.0 --allow 0000:00:0d.0 --proc-type=primary  -- -p 0x3 -n 1 
.......
.......
.......

PORTS
-----
Port 0: 'FA:16:42:B2:E4:70'     Port 1: 'FA:16:42:68:9A:7C'

Port 0 - rx:      1500  tx:       650
Port 1 - rx:      1111  tx:       845

CLIENTS
-------
Client  0 - rx:      1495, rx_drop:      1116
            tx:      1495, tx_drop:         0

^CSegmentation fault (core dumped)

=======================================================================
[root@dpdk /]# /boot/examples/dpdk-mp_client -l 4-5 -n 4 --allow 0000:00:0f.0 --allow 0000:00:0d.0 --proc-type=secondary -- -n 0


================This is the crash file bt in gdb===================
(gdb) bt
#0  0x0000000000a96477 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) file /home/dpdk/examples/dpdk-mp_server
Reading symbols from /home/dpdk/examples/dpdk-mp_server...done.
(gdb) bt
#0  0x0000000000a96477 in i40e_flow_parse_fdir_filter () at dpdk/drivers/net/i40e/i40e_flow.c:3272
#1  0x0000000000ab983d in iavf_stop_queues () at dpdk/drivers/net/iavf/iavf_rxtx.c:1036
#2  0x0000000000591aa1 in iavf_dev_stop (dev=0x15a0240 <rte_eth_devices>)
    at dpdk/drivers/net/iavf/iavf_ethdev.c:1019
#3  0x00000000008d7900 in rte_eth_dev_stop () at dpdk/lib/ethdev/rte_ethdev.c:1883
#4  0x00000000006a2a44 in signal_handler (signal=<optimized out>)
    at dpdk/examples/multi_process/client_server_mp/mp_server/main.c:284
#5  0x00007ffff648cb80 in ?? ()
#6  0x0000000000000007 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb)
Comment 2 David Marchand 2023-10-16 09:40:32 CEST
There may be an issue in the example code, but in doubt, assigning to Beilei.