[dpdk-dev] How to approach packet TX lockups

Matt Laswell laswell at infiniteio.com
Tue Nov 17 01:49:15 CET 2015


Hey Stephen,

Thanks a lot; that's really useful information.  Unfortunately, I'm at a
stage in our release cycle where upgrading to a new version of DPDK isn't
feasible.  Any chance you (or others reading this) have a pointer to the
relevant changes?  While I can't afford to upgrade DPDK entirely,
backporting targeted fixes is more doable.

Again, thanks.

- Matt


On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
stephen at networkplumber.org> wrote:

> On Mon, 16 Nov 2015 17:48:35 -0600
> Matt Laswell <laswell at infiniteio.com> wrote:
>
> > Hey Folks,
> >
> > I sent this to the users email list, but I'm not sure how many people are
> > actively reading that list at this point.  I'm dealing with a situation in
> > which my application loses the ability to transmit packets out of a port
> > during times of moderate stress.  I'd love to hear suggestions for how to
> > approach this problem, as I'm a bit at a loss at the moment.
> >
> > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on Haswell
> > processors.  I'm using the 82599 controller, configured to spread packets
> > across multiple queues.  Each queue is accessed by a different lcore in my
> > application; there is therefore concurrent access to the controller, but
> > not to any of the queues.  We're binding the ports to the igb_uio driver.
> > The symptoms I see are these:
> >
> >
> >    - All transmit out of a particular port stops
> >    - rte_eth_tx_burst() indicates that it is sending all of the packets
> >    that I give to it
> >    - rte_eth_stats_get() gives me stats indicating that no packets are
> >    being sent on the affected port.  Also, no tx errors, and no pause frames
> >    sent or received (opackets = 0, obytes = 0, oerrors = 0, etc.)
> >    - All other ports continue to work normally
> >    - The affected port continues to receive packets without problems; only
> >    TX is affected
> >    - Resetting the port via rte_eth_dev_stop() and rte_eth_dev_start()
> >    restores things and packets can flow again
> >    - The problem is replicable on multiple devices, and doesn't follow one
> >    particular port
> >
> > I've tried calling rte_mbuf_sanity_check() on all packets before sending
> > them.  I've also instrumented my code to look for packets that have already
> > been sent or freed, as well as cycles in chained packets being sent.  I
> > also put a lock around all accesses to rte_eth* calls to synchronize access
> > to the NIC.  Given some recent discussion here, I also tried changing the
> > TX RS threshold from 0 to 32, 16, and 1.  None of these strategies proved
> > effective.
> >
> > Like I said at the top, I'm a little at a loss at this point.  If you were
> > dealing with this set of symptoms, how would you proceed?
> >
>
> I remember some issues with old DPDK 1.6 with some of the prefetch
> thresholds on 82599. You would be better off going to a later DPDK
> version.
>
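
The symptoms quoted above (rte_eth_tx_burst() accepting packets while
opackets stops advancing, with rte_eth_dev_stop()/rte_eth_dev_start()
restoring traffic) suggest a stopgap watchdog until targeted fixes can be
backported.  The following is only a minimal sketch of that idea, not
anything taken from DPDK: struct tx_watchdog and its two functions are
hypothetical names, and the code is plain C with no DPDK dependency.  It
assumes the caller keeps a running total of the values returned by
rte_eth_tx_burst() for the port and periodically reads opackets via
rte_eth_stats_get().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of DPDK: tracks whether packets that the
 * TX path accepted are showing up in the port's output counter. */
struct tx_watchdog {
    uint64_t last_accepted;  /* cumulative packets accepted, at last poll */
    uint64_t last_opackets;  /* opackets counter, at last poll */
    unsigned stalled_polls;  /* consecutive polls: accepted moved, opackets did not */
    unsigned threshold;      /* stalled polls to tolerate before reporting */
};

static void tx_watchdog_init(struct tx_watchdog *w, unsigned threshold)
{
    w->last_accepted = 0;
    w->last_opackets = 0;
    w->stalled_polls = 0;
    w->threshold = threshold;
}

/* Call periodically.  'accepted' is the caller's running sum of TX burst
 * return values; 'opackets' is the port's output packet counter.
 * Returns true when the port looks wedged and a reset is worth trying. */
static bool tx_watchdog_poll(struct tx_watchdog *w,
                             uint64_t accepted, uint64_t opackets)
{
    bool submitted  = accepted != w->last_accepted;
    bool progressed = opackets != w->last_opackets;

    /* Only count intervals where work was handed to the NIC but the
     * counter stayed flat; an idle port is not a stalled port. */
    if (submitted && !progressed)
        w->stalled_polls++;
    else
        w->stalled_polls = 0;

    w->last_accepted = accepted;
    w->last_opackets = opackets;

    if (w->stalled_polls >= w->threshold) {
        w->stalled_polls = 0;  /* re-arm after reporting */
        return true;
    }
    return false;
}
```

When tx_watchdog_poll() returns true, the application would do the same
stop/start sequence described in the symptom list.  The threshold is there
because packets accepted in one interval may only be counted in the next;
a suitable value is an assumption that would need tuning against real
traffic.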

