Bug 366 - i40e PMD returns 0 for secondary invoking rx_burst on queue 0, when Primary dies
Summary: i40e PMD returns 0 for secondary invoking rx_burst on queue 0, when Primary dies
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: ethdev
Version: unspecified
Hardware: x86 Linux
Importance: Normal major
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2019-11-13 16:43 CET by Vipin Varghese
Modified: 2023-11-20 02:46 CET

Attachments
test app used for reproducing the issue (5.93 KB, text/x-csrc)
2019-11-13 16:45 CET, Vipin Varghese

Description Vipin Varghese 2019-11-13 16:43:26 CET
DPDK version: dpdk-stable-19.08.1
Linux: 4.15.0-66-generic
i40e: driverversion=2.1.14-k firmware=6.01
DPDK build: make EXTRA_CFLAGS="-g -O0"

BT:
(gdb) bt
#0  i40e_recv_pkts_vec (rx_queue=0x17ff32c40, rx_pkts=0x7fffffffe010, nb_pkts=<optimized out>)
    at /home/saesrv02/Downloads/dpdksrc/dpdk-stable-19.08.1/drivers/net/i40e/i40e_rxtx_vec_sse.c:470
#1  0x00005555555b6d42 in rte_eth_rx_burst (nb_pkts=32, rx_pkts=0x7fffffffe010, queue_id=0, port_id=0)
    at /home/saesrv02/Downloads/dpdksrc/dpdk-stable-19.08.1//x86_64-native-linuxapp-gcc/include/rte_ethdev.h:4097
#2  lcore_main () at /home/saesrv02/Downloads/dpdksrc/dpdk-stable-19.08.1/examples/skeleton/basicfwd.c:135
#3  0x00005555555b6f1d in main (argc=<optimized out>, argv=<optimized out>)
    at /home/saesrv02/Downloads/dpdksrc/dpdk-stable-19.08.1/examples/skeleton/basicfwd.c:226

Application:
Primary: initializes X710 via i40e PMD and iterates in while(1)
Secondary: probes rx_burst successfully with packets until the primary is killed.

test app:
CMD-1: ./build/basicfwd -w 08:00.2 -l 3
CMD-2: gdb --args ./build/basicfwd -w 08:00.2 -l 4 --proc-type=secondary

traffic generator: ./build/l2fwd -w 83:00.2 --vdev net_tap0,iface=test -l 5 --file-prefix vipin -- -p 3 -T 1
Comment 1 Vipin Varghese 2019-11-13 16:45:51 CET
Created attachment 74
test app used for reproducing the issue
Comment 3 Vipin Varghese 2019-11-14 04:48:45 CET
Tested with dpdk-19.11-rc2 from the announcement "A new DPDK release candidate is ready for testing": https://git.dpdk.org/dpdk/tag/?id=v19.11-rc2. The issue persists.

Logs:

Primary:
sudo kill -SIGTERM 62209
sudo kill -SIGTERM 62209
sudo kill -SIGTERM 62209
sudo kill -9 62209

```
Signal 15 received, preparing to exit...
stats: ipackets 55 opackets 55 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
stats: ipackets 4 opackets 4 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
stats: ipackets 4 opackets 4 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
Killed
```

Secondary:
sudo kill -SIGTERM 62215
sudo kill -SIGTERM 62215
sudo kill -SIGTERM 62215

```
stats: ipackets 0 opackets 0 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
stats: ipackets 0 opackets 0 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
stats: ipackets 0 opackets 0 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 15 received, preparing to exit...
```
Comment 4 Ajit Khaparde 2019-11-26 20:50:18 CET
Beilei,
Can you check what is happening?

Thanks
Ajit
Comment 5 beilei.xing 2019-11-27 04:10:58 CET
@Xiao, could you please help to check?
Comment 6 Xiao Zhang 2019-11-28 02:33:57 CET
Hi Ajit,

When the primary process dies, the driver releases the hardware resources. (For example, if the device is bound to igb_uio, igbuio_pci_release is called when the primary process exits; it disables interrupts and stops the device from performing further DMA.)
Since the secondary process shares the device resources, it will not work as expected.

BTW, can you share why you need the secondary process to keep working after the primary process is dead?

Thanks,
Xiao
Comment 7 Vipin Varghese 2019-11-29 09:41:01 CET
Hi Ajit,

Can you please help with updating the following:

1. http://doc.dpdk.org/guides/nics/i40e.html with this information.
2. If this is related to the rte_eth_dev library, then http://doc.dpdk.org/api/rte__ethdev_8h.html

Reasons:
a. The only limitation mentioned in http://doc.dpdk.org/api/rte__ethdev_8h.html concerns configuration for vdev, not physical NICs.
b. http://doc.dpdk.org/guides/prog_guide/multi_proc_support.html?highlight=secondary mentions that interrupts are delivered to the primary, but not that the driver and its resources are released.

Given the above, on failure of the primary all devices it created have to be marked as unusable in the secondary. Is there any API that can be invoked from the secondary to check this?

Note: application test model used is
Primary: initializes X710 via i40e PMD and iterates in while(1)
Secondary: probes rx_burst successfully with packets until the primary is killed.
Comment 8 Xiao Zhang 2019-12-11 03:04:43 CET
Hi Vipin,

We think it should be common sense that secondary processes need to run alongside the primary process, so this is not a limitation but by design.
And there is currently no API to check the status of the primary process. I think one way to check usability is to get the primary process status with a Linux API.
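
A minimal sketch of the Linux-API approach mentioned above, using POSIX `kill()` with signal 0 as an existence check. The helper name `pid_is_alive` is hypothetical, and how the secondary learns the primary's PID (for example from a command-line argument) is an assumption left to the application; nothing here is part of DPDK.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not a DPDK API.
 * Returns 1 if a process with this PID exists, 0 if it does not,
 * and -1 if the PID is not a valid single-process identifier.
 */
static int pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;       /* signal 0 performs the checks but sends nothing */

    /* EPERM: the process exists but belongs to another user.
     * ESRCH: no such process. */
    return errno == EPERM ? 1 : 0;
}
```

A secondary could call `pid_is_alive(primary_pid)` periodically and stop polling the port once it returns 0. Two caveats apply: PIDs can be reused by unrelated processes, and a zombie primary that has not yet been reaped still counts as existing.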

Thanks,
Xiao
Comment 9 Vipin Varghese 2020-01-27 05:35:37 CET
Hi Xiao,

Thanks for your update on the ticket and information shared from your team.

I am not clear on, and do not agree with, your views on the following:

1. 'it should be common sense that secondary processes need to run alongside the primary process, so this is not a limitation but by design.'
[VV] The application requires primary and secondary in such a model.

2. 'there is currently no API to check the status of the primary process.'
[VV] Is the API 'rte_eal_primary_proc_alive' not meant to do this?

3. 'I think one way to check usability is to get the primary process status with a Linux API.'
[VV] I disagree, based on the explanation in point 2.

New findings:

1. Assuming this is indeed rte_eth_dev API behaviour, I tested the same with virtio-interfaces and found the behaviour totally baffling.

Expected result: as per the Fortville team, vdev in the secondary should stop working.

Current result: unlike the Fortville physical function, killing the primary did not stop the secondary from receiving and transmitting packets.

Logs:

```
primary:
^Cstats: ipackets 21 opackets 21 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 2 received, preparing to exit...
^Cstats: ipackets 4 opackets 4 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 2 received, preparing to exit...
^Cstats: ipackets 1 opackets 1 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0

Signal 2 received, preparing to exit...
Killed


Secondary:

Signal 2 received, preparing to exit...
^Cstats: ipackets 21 opackets 21 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=

Signal 2 received, preparing to exit...
^Cstats: ipackets 1 opackets 1 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=

Signal 2 received, preparing to exit...
^Cstats: ipackets 1 opackets 1 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=

Signal 2 received, preparing to exit...
^Cstats: ipackets 1 opackets 1 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=

Signal 2 received, preparing to exit...
^Cstats: ipackets 1 opackets 1 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=

Signal 2 received, preparing to exit...
^Cstats: ipackets 347 opackets 347 imissed 0 ierrors 0 oerrors 0 rx_nombuf 0*=*=*=*=*=

Signal 2 received, preparing to exit...
```

Hence the assumption that the 'rte_eth_dev' API is programmed to do the same is not true.
Comment 10 Vipin Varghese 2020-01-27 05:39:42 CET
@Ajit I am re-opening the ticket and assigning this to you. I can only deduce that the right behaviour would be one of:

1. Documentation clarity on the expected behaviour for physical and virtual NICs in DPDK.

or

2. Standardize similar functional behaviour across all NICs.

or

3. Update all PMDs to make use of proc_alive to stop in the secondary.

I fail to reproduce the i40e behaviour on DPDK release 19.11 with:

```
CMD-1: ./build/basicfwd --vdev net_tap0,iface=test -l 3 --no-pci
CMD-2: gdb --args ./build/basicfwd --vdev net_tap0,iface=test -l 4 --proc-type=secondary
```
Comment 11 Ajit Khaparde 2020-09-17 00:12:36 CEST
Assigning to default. Will visit and reassign appropriately.
Comment 12 dengkaiwen 2023-10-31 03:33:20 CET
Hi All,

There have been no replies for over a month, so I'm going to close this ticket for now. Please contact me if you still have questions.

Thanks
Deng Kaiwen
Comment 13 Stephen Hemminger 2023-10-31 16:53:57 CET
The DPDK primary/secondary process design requires that the primary process keeps running. And rx_burst() only returns 0..N packets. Therefore the reported behavior is already correct.

A secondary process needs to have code to monitor the connection with the primary process. The packet capture application uses an alarm handler to call rte_eal_primary_proc_alive() periodically and exits if the primary process exits.
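
Since a return value of 0 from rx_burst() cannot signal that the primary is gone, the liveness check has to sit next to the polling loop. The following is a rough sketch of that idea, not the packet capture implementation: instead of an EAL alarm it checks liveness inline every `check_every` polls. The callback types, `guarded_rx_loop`, and the `fake_*` stand-ins are all hypothetical names introduced for illustration. In a real secondary, the `alive` callback would presumably wrap `rte_eal_primary_proc_alive(NULL)` and the `rx` callback would wrap `rte_eth_rx_burst()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types; not part of DPDK. */
typedef int (*alive_fn)(void *ctx);      /* 1 = primary alive, 0 = gone */
typedef uint16_t (*rx_fn)(void *ctx);    /* packets received in one burst */

enum poll_result { POLL_PRIMARY_GONE, POLL_BUDGET_DONE };

/*
 * Poll rx up to max_polls times, checking primary liveness before every
 * check_every-th poll. Received packet counts are added to *total_rx.
 */
static enum poll_result
guarded_rx_loop(rx_fn rx, alive_fn alive, void *ctx,
                uint64_t max_polls, uint64_t check_every, uint64_t *total_rx)
{
    if (check_every == 0)
        check_every = 1;                 /* avoid modulo by zero */

    for (uint64_t i = 0; i < max_polls; i++) {
        if (i % check_every == 0 && !alive(ctx))
            return POLL_PRIMARY_GONE;    /* stop touching the port */
        *total_rx += rx(ctx);
    }
    return POLL_BUDGET_DONE;
}

/* Stand-ins for experimentation: the "primary" dies after a poll count. */
struct fake_state { uint64_t polls; uint64_t primary_dies_at; };

static int fake_alive(void *ctx)
{
    const struct fake_state *s = ctx;
    return s->polls < s->primary_dies_at;
}

static uint16_t fake_rx(void *ctx)
{
    struct fake_state *s = ctx;
    s->polls++;
    return 1;                            /* pretend one packet per burst */
}
```

Checking only every `check_every` polls keeps the liveness check off the per-burst fast path, at the cost of continuing to poll a dead port for up to `check_every - 1` extra bursts.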
