Bug 1234 - iavf : crash observed during rte_eth_dev_stop
Summary: iavf : crash observed during rte_eth_dev_stop
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: 21.11
Hardware: x86 Linux
Importance: Normal major
Target Milestone: ---
Assignee: beilei.xing
URL:
Depends on:
Blocks:
 
Reported: 2023-05-19 13:59 CEST by Sahithi Singam
Modified: 2023-10-16 09:40 CEST

Description Sahithi Singam 2023-05-19 13:59:22 CEST
Our application uses DPDK and runs two processes: a primary DPDK process and a secondary DPDK process.
The primary process calls all the DPDK initialization routines (rte_eal_init, dev_configure, rx/tx queue setup, and dev_start), while the secondary process calls only rte_eal_init.

In DPDK, rte_eth_dev->data->rx_queues and rte_eth_dev->data->tx_queues are data structures shared between the primary and secondary processes.

In the iavf PMD, each rxq (an entry in rx_queues above) and each txq (an entry in tx_queues above) holds a pointer to a function (e.g. rx_queues[index]->ops->release_mbufs) that is invoked during rte_eth_dev_stop.

A call to iavf_set_rx_function modifies this function pointer, i.e. release_mbufs.

The primary process initially sets this function pointer to an address via rte_eth_dev_start() -> iavf_init_queues() -> iavf_set_rx_function().

Later, the secondary process overwrites this function pointer with an address from its own address space via rte_eal_init() -> iavf_dev_init() -> iavf_set_rx_function(). This address is invalid in the primary process's address space.

During application shutdown, we invoke rte_eth_dev_stop from the primary process, which calls the release_mbufs function. Because the release_mbufs function pointer now holds an address that is invalid in the primary process, the primary process always crashes.

Note:
This bug will also be observed in other PMDs, such as ice and ixgbe, which use a similar code/design.
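
One possible way to avoid storing a process-specific address in shared memory is to keep only a small index in the shared queue structure and let each process resolve that index through its own function table. The following is a minimal sketch of that idea under stated assumptions, not the iavf driver's code: every name in it (shared_rxq, release_kind, release_table, rxq_release_mbufs) is hypothetical, and the release functions only reset a counter as a stand-in for freeing mbufs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers; they carry the same meaning in every process. */
enum release_kind {
    RELEASE_KIND_DEFAULT = 0,
    RELEASE_KIND_VEC,
    RELEASE_KIND_MAX
};

/* Stand-in for a queue structure that lives in memory shared by the
 * primary and secondary processes. It stores an index, not an address. */
struct shared_rxq {
    uint32_t pending_mbufs;
    uint8_t  release_kind;
};

typedef void (*release_fn)(struct shared_rxq *q);

/* Placeholder release routines: a real driver would free mbufs here. */
static void release_default(struct shared_rxq *q) { q->pending_mbufs = 0; }
static void release_vec(struct shared_rxq *q)     { q->pending_mbufs = 0; }

/* Process-local table. Each process fills it with addresses that are
 * valid in its own address space, so nothing process-specific is shared. */
static const release_fn release_table[RELEASE_KIND_MAX] = {
    [RELEASE_KIND_DEFAULT] = release_default,
    [RELEASE_KIND_VEC]     = release_vec,
};

/* Resolve the shared index through the local table, then call it. */
static void rxq_release_mbufs(struct shared_rxq *q)
{
    assert(q->release_kind < RELEASE_KIND_MAX);
    release_table[q->release_kind](q);
}
```

With this layout, a secondary process that writes release_kind into the shared queue stores only a small integer, so the primary process can still resolve a valid function when it stops the device.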
Comment 1 Sahithi Singam 2023-05-23 12:15:36 CEST
This issue can be reproduced even with the DPDK multi-process example application.

=======================================================================
[root@dpdk /]#/boot/examples/dpdk-mp_server -l 2-3 -n 4 --allow 0000:00:0f.0 --allow 0000:00:0d.0 --proc-type=primary  -- -p 0x3 -n 1 
.......
.......
.......

PORTS
-----
Port 0: 'FA:16:42:B2:E4:70'     Port 1: 'FA:16:42:68:9A:7C'

Port 0 - rx:      1500  tx:       650
Port 1 - rx:      1111  tx:       845

CLIENTS
-------
Client  0 - rx:      1495, rx_drop:      1116
            tx:      1495, tx_drop:         0

^CSegmentation fault (core dumped)

=======================================================================
[root@dpdk /]# /boot/examples/dpdk-mp_client -l 4-5 -n 4 --allow 0000:00:0f.0 --allow 0000:00:0d.0 --proc-type=secondary -- -n 0


================This is the crash file bt in gdb===================
(gdb) bt
#0  0x0000000000a96477 in ?? ()
#1  0x0000000000000000 in ?? ()
(gdb) file /home/dpdk/examples/dpdk-mp_server
Reading symbols from /home/dpdk/examples/dpdk-mp_server...done.
(gdb) bt
#0  0x0000000000a96477 in i40e_flow_parse_fdir_filter () at dpdk/drivers/net/i40e/i40e_flow.c:3272
#1  0x0000000000ab983d in iavf_stop_queues () at dpdk/drivers/net/iavf/iavf_rxtx.c:1036
#2  0x0000000000591aa1 in iavf_dev_stop (dev=0x15a0240 <rte_eth_devices>)
    at dpdk/drivers/net/iavf/iavf_ethdev.c:1019
#3  0x00000000008d7900 in rte_eth_dev_stop () at dpdk/lib/ethdev/rte_ethdev.c:1883
#4  0x00000000006a2a44 in signal_handler (signal=<optimized out>)
    at dpdk/examples/multi_process/client_server_mp/mp_server/main.c:284
#5  0x00007ffff648cb80 in ?? ()
#6  0x0000000000000007 in ?? ()
#7  0x0000000000000000 in ?? ()
(gdb)
Comment 2 David Marchand 2023-10-16 09:40:32 CEST
There may be an issue in the example code, but in doubt, assigning to Beilei.
