[dpdk-users] running multiple independent dpdk applications randomly locks up machines

Zhongming Qu zhongming at luminatewireless.com
Fri Aug 26 19:55:30 CEST 2016


Hi,


Just an update.

Thanks for all the input. I feel obliged to post the latest findings
here so that this thread may be useful to other people.

As it turned out, the rx/tx queue conflict was not the real problem. Here
is why:
Our use model is to run two different *primary* dpdk processes, each of
which binds to a different port. Both ports are on the same 82599ES NIC.
They are separate ports that have independent rx/tx queues (in the sense of
BARs and the BAR0-based registers).

The actual problem was that our application never called
rte_eth_dev_stop() to properly shut down the device. Simply making
sure that rte_eth_dev_stop() is called solved our problem.
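
A minimal sketch of such a shutdown path (illustrative only, not the
application's actual code). It assumes the process is normally ended by
SIGINT or SIGTERM; the helper names, the poll_once callback, and the
HAVE_DPDK build flag are hypothetical. The return type and port_id width
of rte_eth_dev_stop() differ between DPDK releases, so the result is
discarded with a cast and the stand-in below only mirrors one variant.

```c
#include <signal.h>
#include <stdint.h>

#ifdef HAVE_DPDK /* hypothetical build flag for this sketch, not a DPDK macro */
#include <rte_ethdev.h>
#else
/* Stand-in so the sketch compiles without DPDK; it only counts calls. */
static int stop_calls;
static int rte_eth_dev_stop(uint16_t port_id)
{
    (void)port_id;
    stop_calls++;
    return 0;
}
#endif

/* Set from the signal handler, polled by the packet loop. */
static volatile sig_atomic_t quit_requested;

static void on_signal(int signum)
{
    (void)signum;
    quit_requested = 1; /* only set a flag; no DPDK calls inside a handler */
}

/* Hypothetical helper: install the handlers before entering the loop. */
static void install_quit_handlers(void)
{
    signal(SIGINT, on_signal);
    signal(SIGTERM, on_signal);
}

/* Hypothetical helper: run the packet loop, then always stop the port. */
static void run_until_quit(uint16_t port_id, void (*poll_once)(uint16_t))
{
    while (!quit_requested)
        poll_once(port_id); /* the application's rx/tx work goes here */

    /* Stop the device on every exit path that reaches this point. */
    (void)rte_eth_dev_stop(port_id);
}
```

The handler only sets a flag, so the device is stopped from normal
context after the loop exits rather than from inside the signal handler.
A process killed with SIGKILL would still skip this path.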

From the standpoint of a user of the dpdk library, the problem is solved.
BUT it is not yet understood how exactly failing to call
rte_eth_dev_stop() could have caused machine lockups. Could someone shed
light upon this question by
  a) simply confirming that I am not the only person seeing this problem,
  b) explaining how, at a very low level, race conditions, memory
corruption, or anything else could cause a kernel panic, or
  c) providing pointers to potentially relevant information?



Thanks a lot!
Zhongming

On Fri, Aug 19, 2016 at 6:30 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Fri, 19 Aug 2016 18:19:21 -0700
> Zhongming Qu <zhongming at luminatewireless.com> wrote:
>
> > Thanks!
> >
> > I did use a hard coded queue_id of 0 when initializing the rx/tx queues,
> > i.e., rte_eth_rx/tx_queue_setup(). So that is a problem to solve. Will
> > fix that and try again.
> >
> > When A and B run at the same time, this lockup problem can be explained
> > by the conflicting queue usage. But the lockup happens even in the use
> > case where only one dpdk process is running. That is, A and B take turns
> > to run but do not run at the same time.
> >
> > Thanks for pointing out an alternative approach. That sounds really
> > promising. A concern came up when that idea was talked over: What would
> > happen if the primary process dies? Would all the secondary processes
> > eventually go awry at some point? Would `--proc-type auto` solve this
> > problem?
> >
>
> I haven't actually used primary/secondary model, but the recommendation
> is that the primary process does nothing (or is a watchdog) so it would
> be pretty much impossible to crash unless killed by malicious entity.
>
> All the packet logic would be in the secondary.
>

