Bug 73 - Secondary processes can not set up ports: document limitation and return error in appropriate functions
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: other
Version: 18.02
Hardware: x86 Linux
Importance: Normal normal
Target Milestone: ---
Assignee: John McNamara
Reported: 2018-07-17 15:34 CEST by Guillaume Girard
Modified: 2018-10-08 15:13 CEST

Description Guillaume Girard 2018-07-17 15:34:07 CEST
We have built an application on top of DPDK that listens to certain packet streams and reports statistics on them. The application works with a single Ethernet port specified on the command line.

The application is multi-process aware, so you can start a primary and several secondaries, all listening to their own ports. This works perfectly fine on an X550T-based setup with up to 8 ports (driver is ixgbe).

However, when moving to an X772T-based setup (driver is i40e), the secondaries stopped receiving any packets. The primary works as usual, and sees all packets for its own port. The secondaries get "packets" that consist of 60 to 300 bytes, all zeroes, at a rate of one per second or so.

I tried to run the same setup with multiple primaries separated by their file prefix, but the same issue happened (although which process gets the packets changes).

I finally refactored the application to run separate threads in the same process instead, and that worked fine. All threads see all the packets they expect.

I suspect that ixgbe and i40e differ in how they handle this, and that might cause the bug, but obviously, there might be other differences between hosts that might be causing this as well. Not being familiar with DPDK, I haven't searched any further, but I'll be happy to do so if somebody can give me a hint of where to start looking.
Comment 1 Anatoly Burakov 2018-07-17 16:35:17 CEST
Please forgive me for kind of an obvious question, but did you blacklist/whitelist your hardware devices for each process?
Comment 2 Guillaume Girard 2018-07-17 16:48:42 CEST
I'm not sure how to answer that question in whitelisting/blacklisting terms. Each process works with a single, different device index and a different lcore index. They do not overlap. Is there some additional explicit "masking" to do on top of that?
Comment 3 Anatoly Burakov 2018-07-17 16:56:18 CEST
Generally, PCI devices cannot be used by different primary processes - if you're running multiple primaries, you have to whitelist devices to prevent them from being mapped into and used by multiple processes.

That, however, should not apply to running secondary processes, so I guess we can rule that out as a cause :)
Comment 4 Guillaume Girard 2018-07-17 16:58:54 CEST
I see. So my trials with multiple primary processes might be invalid; however, the primary/secondary scheme should work.

By PCI device, do you mean device or function? The X7xx setup is a single PCI device with 4 functions.
Comment 5 Anatoly Burakov 2018-07-17 17:07:02 CEST
I mean function, of course.

Yes, I would expect the primary/secondary case to work.
Comment 6 Ajit Khaparde 2018-08-02 00:56:50 CEST
What is the next step on this? Who owns the next action item? Guillaume, do you need anything else?
Comment 7 Mihai Mihalache 2018-08-09 16:21:42 CEST
Hi,
I'm working with Guillaume on the same project. While he is out of office, I'm taking care of the work. I have further findings regarding this topic:

 > Running multiple processes with --file-prefix, a different core mask, dedicated memory, and blacklisting of the NICs used by previous processes seems to work fine.

 > Regarding the primary/secondary approach, here are my findings: our current implementation passes a unique set of NIC, port, IP, etc. to each instance, so the primary DPDK instance configures only its own NIC, and each secondary is expected to do exactly the same for its own NIC. However, this didn't seem to take effect, so I moved the configuration of all NICs into the primary process. The secondaries no longer configure anything; they just use the rings and mempool created by the primary process. By configuration I mean the calls to rte_eth_dev_configure, rte_eth_tx_queue_setup, rte_eth_rx_queue_setup, and rte_eth_dev_start. With this arrangement I got it working.
   
 However, I couldn't find any statement in the DPDK docs (apologies if I missed it) that secondaries are not allowed to do full hardware configuration and that the primary must therefore do it for all secondaries. I did find in the DPDK docs that memory allocation must be done by the primary only and that secondaries must do a lookup.
 I did a further experiment on the configuration of the NICs used by secondaries: I moved the calls to rte_eth_dev_configure and rte_eth_rx_queue_setup/rte_eth_tx_queue_setup into the primary DPDK process and left rte_eth_dev_start to be called later by the secondary DPDK process (with the same port ID, obviously).
 However, this failed. I cannot say whether only rte_eth_dev_start has a limitation in secondaries or whether all of the configuration functions above do.
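
The arrangement that worked (the primary configures and starts every port, the secondaries only attach) can be sketched roughly as below. This is a simplified model rather than DPDK code: `proc_role`, `port_ops`, `setup_all_ports` and the stub backend are hypothetical names, and the four callbacks merely stand in for rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_tx_queue_setup and rte_eth_dev_start, whose real signatures take more arguments. In a real application the role would presumably come from rte_eal_process_type().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the value rte_eal_process_type() reports. */
enum proc_role { ROLE_PRIMARY, ROLE_SECONDARY };

/* Hypothetical callbacks standing in for the four ethdev setup calls
 * named in the comment above; the real functions take more arguments. */
struct port_ops {
    int (*configure)(int port_id);
    int (*rx_queue_setup)(int port_id);
    int (*tx_queue_setup)(int port_id);
    int (*start)(int port_id);
};

/* Stub backend for demonstration only: counts calls, always succeeds. */
static int setup_calls;
static int stub_step(int port_id) { (void)port_id; setup_calls++; return 0; }
static const struct port_ops stub_ops = {
    stub_step, stub_step, stub_step, stub_step
};

/* In the primary, configure and start every port, including the ports
 * that secondaries will poll. In a secondary, do nothing and return 0.
 * Returns 0 on success or the first negative error code. */
int setup_all_ports(enum proc_role role, const struct port_ops *ops,
                    const int *port_ids, size_t n_ports)
{
    assert(ops != NULL);

    if (role != ROLE_PRIMARY)
        return 0; /* secondaries skip hardware setup entirely */

    for (size_t i = 0; i < n_ports; i++) {
        int rc;
        if ((rc = ops->configure(port_ids[i])) < 0) return rc;
        if ((rc = ops->rx_queue_setup(port_ids[i])) < 0) return rc;
        if ((rc = ops->tx_queue_setup(port_ids[i])) < 0) return rc;
        if ((rc = ops->start(port_ids[i])) < 0) return rc;
    }
    return 0;
}
```

Keeping all four steps for a port in one process also avoids the split that failed in the experiment above, where the port was configured in the primary but started from a secondary.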

So I have 2 final comments here:

 1. If my statements above are right, please state clearly in the DPDK docs that the primary should do the NIC configuration for all instances (rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_dev_start, etc.), or point me to the doc where this is already written. Also, returning an error to secondaries that try to call those functions would be useful, as it would clearly tell the application that its implementation is wrong.
 2. Why did the old implementation work with the ixgbe driver? Does it somehow retain enough hardware configuration to work?

Bottom line: we now have the setup working. However, I'm looking forward to your feedback on my analysis above.

Thanks & Regards,
Mihai.
Comment 8 Anatoly Burakov 2018-08-09 16:46:25 CEST
As far as memory goes, the picture is a little bit more complicated.

Prior to release 18.05, primary initialized all of the memory, and then secondary could allocate to its heart's content. Since memory map is static, primary could even die, and secondary would still be able to allocate, provided there was enough memory left on the malloc heap.

Starting with 18.05, primary process is responsible for allocating/freeing pages at runtime. Without the primary, secondary can only allocate already existing memory, but not add new memory to DPDK.

So, technically, prior to 18.05, nothing stopped secondaries from initializing the devices. Since 18.05, it would still be possible, but one of the following must be true:

1) primary process is alive
2) malloc heap has enough space to satisfy allocation requests for the device

That said, what the drivers do in primary and secondary processes is generally up to the drivers themselves. As far as I know, in practice most (if not all) of them do what you described - primary process allocates, secondary process looks up. We do support device hotplug, but we do not support device hotplug in secondary processes - this is kind of why initialization of hardware devices does not work in the secondary processes right now (because there's no way to tell if a secondary is meant to initialize the device, or merely attach to an already initialized one).

That said, I'm not that well versed in device behavior specifics, so if someone else knows better - by all means, please comment :)

As to why ixgbe was working in that scenario - that is beyond my knowledge. If it's not meant to be working but does - it's not the weirdest thing in ixgbe that I've seen working when it shouldn't :)
Comment 9 Mihai Mihalache 2018-08-10 15:19:54 CEST
Thanks for your feedback. As I mentioned above, it would be good to state clearly somewhere in the DPDK docs that hardware configuration is not supported in secondaries. It would also be good to return an error if hardware configuration is attempted from a secondary; otherwise, as in our case, the user cannot tell where the issue comes from.

Thanks & Regards,
Mihai.
Comment 10 Guillaume Girard 2018-08-27 13:18:59 CEST
I'm reformulating the bug summary to reflect the latest findings:

* Documentation on primary/secondary processes should mention that secondary processes are not allowed to configure/start Ethernet devices.

* Functions involved in setting up devices should not fail silently when called from a secondary process.
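
As an illustration of the second point, a guard that fails loudly instead of silently could look roughly like this. It is a self-contained sketch, not an existing DPDK API: `proc_role`, `port_setup_fn`, `primary_only` and `stub_configure` are hypothetical names, and `-EPERM` is just a placeholder error. DPDK's `rte_errno.h` has an `E_RTE_SECONDARY` code that looks suited to this, and a real guard would presumably compare rte_eal_process_type() against RTE_PROC_PRIMARY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the value rte_eal_process_type() reports. */
enum proc_role { ROLE_PRIMARY, ROLE_SECONDARY };

/* Hypothetical signature for one device setup step. */
typedef int (*port_setup_fn)(int port_id);

/* Stub setup step for demonstration only: always succeeds. */
static int stub_configure(int port_id) { (void)port_id; return 0; }

/* Run a setup step only in the primary process. A secondary gets
 * -EPERM back instead of a call that silently has no effect. */
int primary_only(enum proc_role role, port_setup_fn fn, int port_id)
{
    assert(fn != NULL);

    if (role != ROLE_PRIMARY)
        return -EPERM; /* placeholder error code, see note above */

    return fn(port_id);
}
```

With such a guard, an application that configures a port from a secondary sees a negative return value at the call site rather than all-zero packets later on.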

I'm resetting the assignee.
Comment 11 Ajit Khaparde 2018-08-29 20:17:32 CEST
John,
It looks like a documentation update is expected in this case.
Can you please take care of it?

Thanks
Ajit
Comment 12 Ferruh YIGIT 2018-10-08 15:13:20 CEST
The device configuration APIs are not thread-safe, but I think this doesn't mean they can't be called by a secondary.

If the application knows what it is doing, it can configure the device from a secondary, but only from a _single_ secondary of course. But this is error prone, and for example, the enic PMD has an explicit secondary-process check in its APIs and returns an error for secondaries.
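
One way an application might enforce the "single configurer" rule itself is a per-port claim flag, sketched below with C11 atomics. This is an assumption-laden illustration, not something DPDK provides: `port_claim`, `claim_port_config` and `demo_claim` are hypothetical names, and for real multi-process use the flag would have to live in memory shared by all processes rather than in an ordinary static object as it does here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port ownership flag. In a multi-process application
 * it would need to be placed in memory visible to every process. */
struct port_claim {
    atomic_flag taken;
};

/* Demonstration instance, initially unclaimed. */
static struct port_claim demo_claim = { ATOMIC_FLAG_INIT };

/* Returns true for exactly one caller: the first one to set the flag.
 * Every later caller gets false and should not configure the port. */
bool claim_port_config(struct port_claim *claim)
{
    assert(claim != NULL);
    return !atomic_flag_test_and_set(&claim->taken);
}
```

A process would call `claim_port_config` before any device setup and skip configuration when it returns false.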

This is something we, as DPDK, have left to the application to manage.

I am in favor of clearly documenting this rather than forbidding secondaries from configuring devices.