[dpdk-dev] How to approach packet TX lockups

Matt Laswell laswell at infiniteio.com
Tue Nov 17 17:25:56 CET 2015


Thanks, I'll give that a try.

In my environment, I'm pretty sure we're using the fully-featured
ixgbe_xmit_pkts() and not _simple().   If setting rs_thresh=1 is safer,
I'll stick with that.
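
For anyone following along, here is a rough sketch of the two threshold
rules mentioned in this thread, written as a standalone C check.  The
struct, the enum, and check_tx_thresholds() are hypothetical names made up
for illustration; they are not DPDK's real rte_eth_txconf or the PMD's own
validation code.  The sketch also assumes tx_rs_thresh holds the effective
value, so a 0 meaning "use the driver default" has to be resolved first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the TX queue threshold settings discussed in
 * this thread; NOT the real struct rte_eth_txconf. */
struct tx_thresholds {
    uint8_t  pthresh;      /* prefetch threshold */
    uint8_t  hthresh;      /* host threshold */
    uint8_t  wthresh;      /* write-back threshold */
    uint16_t tx_rs_thresh; /* effective RS threshold (0 not modelled) */
};

/* Which ixgbe transmit function the queue ends up using. */
enum tx_path { TX_PATH_FULL, TX_PATH_SIMPLE };

/* Returns NULL if the settings satisfy the rules quoted in this thread,
 * otherwise a short description of the first rule that is violated. */
const char *check_tx_thresholds(const struct tx_thresholds *t,
                                enum tx_path path)
{
    /* Rule taken from the PMD init panic message quoted in the thread. */
    if (t->tx_rs_thresh > 1 && t->wthresh != 0)
        return "TX WTHRESH must be set to 0 if tx_rs_thresh is greater than 1";

    /* Rule from Konstantin's advice for ixgbe_xmit_pkts_simple(). */
    if (path == TX_PATH_SIMPLE && t->tx_rs_thresh > 32)
        return "tx_rs_thresh should not exceed 32 on the simple TX path";

    return NULL;
}

/* Example: the combinations tried in this thread. */
void demo_thread_settings(void)
{
    struct tx_thresholds linux_like = {
        .pthresh = 32, .hthresh = 1, .wthresh = 1, .tx_rs_thresh = 1 };
    struct tx_thresholds panics = {
        .pthresh = 32, .hthresh = 1, .wthresh = 1, .tx_rs_thresh = 32 };

    assert(check_tx_thresholds(&linux_like, TX_PATH_FULL) == NULL);
    assert(check_tx_thresholds(&panics, TX_PATH_FULL) != NULL);
}
```

Passing this check only means the settings are consistent with the rules
above; it says nothing about whether they avoid the lockup itself.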

Again, thanks to all for the assistance.

- Matt

On Tue, Nov 17, 2015 at 10:20 AM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

> Hi Matt,
>
>
>
> As I said, at least try to upgrade the contents of the shared code to the
> latest version.
>
> In previous releases it lived in lib/librte_pmd_ixgbe/ixgbe; it is now
> located under drivers/net/ixgbe/.
>
>
>
> > For reference, my transmit function is  rte_eth_tx_burst().
>
> I meant which ixgbe TX function it points to: ixgbe_xmit_pkts() or
> ixgbe_xmit_pkts_simple()?
>
> For ixgbe_xmit_pkts_simple(), don’t set tx_rs_thresh > 32;
>
> for ixgbe_xmit_pkts(), the safest way is to set tx_rs_thresh=1.
>
> Though, as I understand from your previous mails, you already did that and
> it didn’t help.
>
> Konstantin
>
>
>
>
>
> *From:* Matt Laswell [mailto:laswell at infiniteio.com]
> *Sent:* Tuesday, November 17, 2015 3:05 PM
> *To:* Ananyev, Konstantin
> *Cc:* Stephen Hemminger; dev at dpdk.org
>
> *Subject:* Re: [dpdk-dev] How to approach packet TX lockups
>
>
>
> Hey Konstantin,
>
>
>
> Moving from 1.6r2 to 2.2 is going to be a pretty significant change due to
> things like changes in the MBuf format, API differences, etc.  Even as an
> experiment, that's an awfully large change to absorb.  Is there a subset
> that you're referring to that could be more readily included without
> modifying so many touch points into DPDK?
>
>
>
> For reference, my transmit function is  rte_eth_tx_burst().  It seems to
> reliably tell me that it has enqueued all of the packets that I gave it,
> however the stats from rte_eth_stats_get() indicate that no packets are
> actually being sent.
>
>
>
> Thanks,
>
>
>
> - Matt
>
>
>
> On Tue, Nov 17, 2015 at 8:44 AM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Matt Laswell
> > Sent: Tuesday, November 17, 2015 2:24 PM
> > To: Stephen Hemminger
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] How to approach packet TX lockups
> >
> > Yes, we're on 1.6r2.  That said, I've tried a number of different values
> > for the thresholds without a lot of luck.  Setting wthresh/hthresh/pthresh
> > to 0/0/32 or 0/0/0 doesn't appear to fix things.  And, as Matthew
> > suggested, I'm pretty sure using 0 for the thresholds leads to
> > auto-config by the driver.  I also tried 1/1/32, which required that I
> > also change the rs_thresh value from 0 to 1 to work around a panic in PMD
> > initialization ("TX WTHRESH must be set to 0 if tx_rs_thresh is greater
> > than 1").
> >
> > Any other suggestions?
>
> It's not only the DPDK code that has changed since 1.6.
> I am pretty sure that we have also had a new update of the shared code
> since then (and, as I remember, probably more than one).
> One suggestion would be to at least try upgrading the shared code to the
> latest.
> Another: even if you can't upgrade to 2.2 in your production environment,
> it is probably worth doing that in some test environment and then checking
> whether the problem persists.
> If it does, then we'll need some guidance on how to reproduce it.
>
> Another question: it is not clear which TX function you use.
> Konstantin
>
>
> >
> > On Mon, Nov 16, 2015 at 7:31 PM, Stephen Hemminger <
> > stephen at networkplumber.org> wrote:
> >
> > > On Mon, 16 Nov 2015 18:49:15 -0600
> > > Matt Laswell <laswell at infiniteio.com> wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > Thanks a lot; that's really useful information.  Unfortunately, I'm at
> > > > a stage in our release cycle where upgrading to a new version of DPDK
> > > > isn't feasible.  Any chance you (or others reading this) have a pointer
> > > > to the relevant changes?  While I can't afford to upgrade DPDK entirely,
> > > > backporting targeted fixes is more doable.
> > > >
> > > > Again, thanks.
> > > >
> > > > - Matt
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 6:12 PM, Stephen Hemminger <
> > > > stephen at networkplumber.org> wrote:
> > > >
> > > > > On Mon, 16 Nov 2015 17:48:35 -0600
> > > > > Matt Laswell <laswell at infiniteio.com> wrote:
> > > > >
> > > > > > Hey Folks,
> > > > > >
> > > > > > I sent this to the users email list, but I'm not sure how many people
> > > > > > are actively reading that list at this point.  I'm dealing with a
> > > > > > situation in which my application loses the ability to transmit
> > > > > > packets out of a port during times of moderate stress.  I'd love to
> > > > > > hear suggestions for how to approach this problem, as I'm a bit at a
> > > > > > loss at the moment.
> > > > > >
> > > > > > Specifically, I'm using DPDK 1.6r2 running on Ubuntu 14.04LTS on
> > > > > > Haswell processors.  I'm using the 82599 controller, configured to
> > > > > > spread packets across multiple queues.  Each queue is accessed by a
> > > > > > different lcore in my application; there is therefore concurrent
> > > > > > access to the controller, but not to any of the queues.  We're
> > > > > > binding the ports to the igb_uio driver.
> > > > > > The symptoms I see are these:
> > > > > >
> > > > > >
> > > > > >    - All transmit out of a particular port stops
> > > > > >    - rte_eth_tx_burst() indicates that it is sending all of the
> > > > > >    packets that I give to it
> > > > > >    - rte_eth_stats_get() gives me stats indicating that no packets
> > > > > >    are being sent on the affected port.  Also, no tx errors, and no
> > > > > >    pause frames sent or received (opackets = 0, obytes = 0,
> > > > > >    oerrors = 0, etc.)
> > > > > >    - All other ports continue to work normally
> > > > > >    - The affected port continues to receive packets without
> > > > > >    problems; only TX is affected
> > > > > >    - Resetting the port via rte_eth_dev_stop() and
> > > > > >    rte_eth_dev_start() restores things and packets can flow again
> > > > > >    - The problem is replicable on multiple devices, and doesn't
> > > > > >    follow one particular port
> > > > > >
> > > > > > I've tried calling rte_mbuf_sanity_check() on all packets before
> > > > > > sending them.  I've also instrumented my code to look for packets
> > > > > > that have already been sent or freed, as well as cycles in chained
> > > > > > packets being sent.  I also put a lock around all accesses to
> > > > > > rte_eth* calls to synchronize access to the NIC.  Given some recent
> > > > > > discussion here, I also tried changing the TX RS threshold from 0 to
> > > > > > 32, 16, and 1.  None of these strategies proved effective.
> > > > > >
> > > > > > Like I said at the top, I'm a little at a loss at this point.  If
> > > > > > you were dealing with this set of symptoms, how would you proceed?
> > > > > >
> > > > >
> > > > > I remember some issues with old DPDK 1.6 with some of the prefetch
> > > > > thresholds on 82599. You would be better off going to a later DPDK
> > > > > version.
> > > > >
> > >
> > > I hope you are on 1.6.0r2 at least??
> > >
> > > With older DPDK there was no way to get driver to tell you what the
> > > preferred settings were for pthresh/hthresh/wthresh. And the values
> > > in Intel sample applications were broken on some hardware.
> > >
> > > I remember reverse engineering the safe values from reading the Linux
> > > driver.
> > >
> > > The Linux driver is much better tested than the DPDK one...
> > > In the Linux driver, the Transmit Descriptor Controller (txdctl)
> > > is fixed at (for transmit)
> > >    wthresh = 1
> > >    hthresh = 1
> > >    pthresh = 32
> > >
> > > The DPDK 2.2 driver uses:
> > >     wthresh = 0
> > >     hthresh = 0
> > >     pthresh = 32

