We have built an application on top of DPDK that listens to certain streams of packets and reports statistics on them. The application works with a specific Ethernet port specified on the command line. It is multi-process aware, so you can start a primary and several secondaries, each listening on its own port. This works perfectly fine on an X550T-based setup with up to 8 ports (driver is ixgbe). However, when moving to an X772T-based setup (driver is i40e), the secondaries stopped receiving any packets. The primary works as usual and sees all packets for its own port. The secondaries get "packets" of 60 to 300 bytes, all zeroes, at a rate of about one per second. I tried running the same setup with multiple primaries separated by their file prefix, but the same issue occurred (although which process gets the packets changes). I finally refactored the application to run separate threads in the same process instead, and that worked fine: all threads see all the packets they expect. I suspect that ixgbe and i40e differ in how they handle this, and that might cause the bug, but obviously there might be other differences between the hosts causing this as well. Not being familiar with DPDK, I haven't searched any further, but I'll be happy to do so if somebody can give me a hint of where to start looking.
Please forgive me for kind of an obvious question, but did you blacklist/whitelist your hardware devices for each process?
I'm not sure how to answer that question in whitelisting/blacklisting terms. Each process works with a different device index and a different lcore index; they do not overlap. Is there some additional explicit "masking" to do on top of that?
Generally, PCI devices cannot be used by different primary processes - if you're running multiple primaries, you have to whitelist devices to prevent them from being mapped into and used by multiple processes. That, however, should not apply to running secondary processes, so I guess we can rule that out as a cause :)
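As a hedged illustration of the multiple-primaries arrangement described above (the application name, core lists, and PCI addresses are hypothetical), each primary would get its own file prefix and an explicit PCI allow list so no device is mapped into two processes:

```shell
# Two independent primary processes, each with its own cores, hugepage
# file prefix, and a single allowed PCI function.
# Newer DPDK releases use -a/--allow; older ones use -w/--pci-whitelist.
./stats-app -l 0-1 -n 4 --file-prefix proc1 -a 0000:03:00.0 -- --port 0 &
./stats-app -l 2-3 -n 4 --file-prefix proc2 -a 0000:03:00.1 -- --port 0 &
```

With the allow list in place, each EAL instance probes only its own function, which is what prevents the double-mapping problem for primaries.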
I see. So my trials with multiple primary processes might be invalid; the primary/secondary scheme, however, should work. By PCI device, do you mean device or function? The X7xx setup is a single PCI device with 4 functions.
I mean function, of course. Yes, I would expect the primary/secondary case to work.
What is the next step on this? Who owns the next action item? Guillaume, do you need anything else?
Hi, I'm working with Guillaume on the same project; while he's OOO I'm taking care of the work. I have further findings on this topic:

* Running multiple primary processes with --file-prefix, different core masks, dedicated memory, and blacklisting the NICs used by the other processes seems to work fine.
* Regarding the primary/secondary approach: our current code passes a unique set of NIC, port, IP, etc. to each instance, so the primary DPDK instance configures only its own NIC, and the secondaries are expected to do exactly the same for theirs. However, this didn't seem to happen, so I moved the configuration of all NICs into the primary process; the secondaries no longer configure anything and just use the rings and mempool created by the primary. By configuration I mean the calls to rte_eth_dev_configure, rte_eth_tx_queue_setup, rte_eth_rx_queue_setup, and rte_eth_dev_start. With this arrangement I got it working. However, I couldn't find anywhere in the DPDK docs (apologies if I missed it) a statement that secondaries are not allowed to do full hardware configuration and that the primary must therefore do it for all of them. I did find in the DPDK docs that memory allocation must be done by the primary only, and that secondaries must do a lookup.
* I did a further experiment on the configuration of the NICs used by the secondaries: I moved the calls to rte_eth_dev_configure and rx/tx_queue_setup into the primary DPDK process and kept rte_eth_dev_start to be called later by the secondary DPDK process (with the same portid, obviously). This failed, so I cannot say whether only rte_eth_dev_start has a limitation in secondaries, or all of the configuration functions above do.

So I have two final comments here:

1. If my statements above are right, please state clearly in the DPDK docs that the primary should do the NIC configuration for all instances (rte_eth_dev_configure, rte_eth_rx_queue_setup, rte_eth_dev_start, etc.), or point me to the doc where this is already written. Also, some kind of error returned to secondaries when they try to call those functions would have been useful, as it would tell the user application clearly that its implementation is wrong.
2. Why was the old implementation working with the ixgbe driver? Does it somehow keep enough hardware configuration around to work fine?

Bottom line: we now have the setup working, but I'm looking forward to your feedback on the analysis above.

Thanks & Regards,
Mihai.
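The split that ended up working (primary configures every port, secondaries only look up shared objects and poll) could be sketched roughly as below. This is a hedged sketch assuming a DPDK build environment; NB_PORTS, the pool parameters, and the "MBUF_POOL" name are made-up illustration values, not anything from the report:

```c
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define NB_PORTS 4
#define RX_DESC 1024
#define TX_DESC 1024

static struct rte_mempool *pool;

/* Primary-only: configure and start every port, including the ones
 * that secondaries will poll. */
static int setup_all_ports(void)
{
    struct rte_eth_conf conf = {0};
    uint16_t port;

    for (port = 0; port < NB_PORTS; port++) {
        if (rte_eth_dev_configure(port, 1, 1, &conf) < 0)
            return -1;
        if (rte_eth_rx_queue_setup(port, 0, RX_DESC,
                                   rte_eth_dev_socket_id(port),
                                   NULL, pool) < 0)
            return -1;
        if (rte_eth_tx_queue_setup(port, 0, TX_DESC,
                                   rte_eth_dev_socket_id(port),
                                   NULL) < 0)
            return -1;
        if (rte_eth_dev_start(port) < 0)
            return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
        pool = rte_pktmbuf_pool_create("MBUF_POOL", 8192, 256, 0,
                                       RTE_MBUF_DEFAULT_BUF_SIZE,
                                       rte_socket_id());
        if (pool == NULL || setup_all_ports() < 0)
            return 1;
    } else {
        /* Secondary: no hardware configuration at all - just look up
         * the mempool the primary created and start polling. */
        pool = rte_mempool_lookup("MBUF_POOL");
        if (pool == NULL)
            return 1;
    }
    /* ... per-process RX loop with rte_eth_rx_burst(my_port, 0, ...) ... */
    return 0;
}
```

The key point of the sketch is the `rte_eal_process_type()` branch: all four configuration calls named in the findings happen only on the primary side.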
As far as memory goes, the picture is a little more complicated. Prior to release 18.05, the primary initialized all of the memory, and then a secondary could allocate to its heart's content. Since the memory map was static, the primary could even die and the secondary would still be able to allocate, provided there was enough memory left on the malloc heap. Starting with 18.05, the primary process is responsible for allocating/freeing pages at runtime; without the primary, a secondary can only allocate from already-existing memory, but cannot add new memory to DPDK.

So, technically, prior to 18.05 nothing stopped secondaries from initializing devices. Since 18.05, it would still be possible, but one of the following must be true: 1) the primary process is alive, or 2) the malloc heap has enough space to satisfy the device's allocation requests.

That said, what the drivers do in primary and secondary processes is generally up to the drivers themselves. As far as I know, in practice most (if not all) of them did what you described: the primary process allocates, the secondary process looks up. We do support device hotplug, but we do not support device hotplug in secondary processes - this is part of why initialization of hardware devices does not work in secondary processes right now (there is no way to tell whether a secondary is meant to initialize a device or merely attach to an already-initialized one). That said, I'm not that well versed in device behavior specifics, so if someone else knows better - by all means, please comment :)

As to why ixgbe was working in that scenario - that is beyond my knowledge. If it's not meant to work but does, it's not the weirdest thing in ixgbe I've seen working when it shouldn't :)
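The post-18.05 behavior described above could be probed from a secondary with a small sketch like this (DPDK build environment assumed; the allocation tag "probe" and the 1 MB size are arbitrary illustration values):

```c
#include <rte_eal.h>
#include <rte_malloc.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* Run with --proc-type=secondary against an existing primary
     * (same --file-prefix) so EAL attaches to its memory map. */
    if (rte_eal_init(argc, argv) < 0)
        return 1;

    /* Since 18.05 this succeeds only if the request fits in the
     * already-mapped heap, or a live primary can map more pages
     * on the secondary's behalf. */
    void *buf = rte_malloc("probe", 1 << 20, 0);
    if (buf == NULL)
        printf("allocation failed: heap exhausted, no primary to grow it?\n");
    else
        rte_free(buf);
    return 0;
}
```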
Thanks for your feedback. As I mentioned above, it would be good to state clearly somewhere in the DPDK docs that hardware configuration is not supported in secondaries. It would also be good to have some kind of error returned if hardware configuration is attempted from a secondary; otherwise, as in our case, the user cannot tell where the issue comes from. Thanks & Regards, Mihai.
I'm reformulating the bug summary to reflect the latest findings:

* Documentation on primary/secondary processes should mention that secondary processes are not allowed to configure/start Ethernet devices.
* Functions involved in setting up devices should not fail silently when called from a secondary process.

I'm resetting the assignee.
John, it looks like a documentation update is expected in this case. Can you please take care of it? Thanks, Ajit
The device configuration APIs are not thread-safe, but I don't think this means they can't be called by a secondary. If the application knows what it is doing, it can configure the device from a secondary - but only from a _single_ secondary, of course. This is error-prone, though; for example, the enic PMD has an explicit secondary-process check in its APIs and returns an error for secondaries. This is something we, as DPDK, have pushed to the application to manage. I am for clearly documenting this rather than forbidding secondaries to configure devices.
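The enic-style guard mentioned above could look roughly like this in a PMD callback. This is a hypothetical sketch, not the actual enic code; it uses the real `rte_eal_process_type()` check and the DPDK `E_RTE_SECONDARY` errno ("operation not allowed in secondary processes"):

```c
#include <rte_eal.h>
#include <rte_errno.h>
#include <rte_ethdev.h>
#include <rte_log.h>

/* Hypothetical PMD dev_configure callback: refuse hardware
 * configuration when invoked from a secondary process instead of
 * failing silently, as requested in this report. */
static int
example_dev_configure(struct rte_eth_dev *dev)
{
    if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
        RTE_LOG(ERR, PMD,
            "device configuration is only supported in the primary process\n");
        return -E_RTE_SECONDARY;
    }
    /* ... actual per-device configuration would go here ... */
    return 0;
}
```

Returning an explicit error here is exactly what would have surfaced the misconfiguration in the original application instead of the silent all-zero "packets".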