[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

Zhang, Helin helin.zhang at intel.com
Tue Aug 25 22:13:45 CEST 2015


Yes, I got the perfect answers. Thank you very much!
I just wanted to make sure the test case was OK with the limit of maximum number of descriptors, as I heard there is a hang issue on other NICs of using more descriptors than hardware allowed.
OK. I am still waiting for the answers/confirmation from x540 hardware designers. We need all agree on your patches to avoid risks.

Regards,
Helin

From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
Sent: Tuesday, August 25, 2015 12:30 PM
To: Zhang, Helin
Cc: Lu, Wenzhuo; dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598


On Aug 25, 2015 22:16, "Zhang, Helin" <helin.zhang at intel.com<mailto:helin.zhang at intel.com>> wrote:
>
>
>
> > -----Original Message-----
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>]
> > Sent: Tuesday, August 25, 2015 11:53 AM
> > To: Zhang, Helin
> > Cc: Lu, Wenzhuo; dev at dpdk.org<mailto:dev at dpdk.org>
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for
> > all NICs but 82598
> >
> >
> >
> > On 08/25/15 21:43, Zhang, Helin wrote:
> > >
> > > Hi Vlad
> > >
> > > I think this could possibly be the root cause of your TX hang issue.
> > > Please try to limit the number to 8 or less, and then see if the issue
> > > will still be there or not?
> > >
> >
> > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > 7.2.1.1 of x540 devices spec:
> >
> > A packet (or multiple packets in transmit segmentation) can span any number of
> > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see
> > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH
> > details). For best performance it is recommended to minimize the number of
> > buffers as possible.
> >
> > Could u, pls., clarify why do u think that the maximum number of data buffers is
> > limited by 8?
> OK, i40e hardware is 8

For i40 it's a bit more complicated than just "not more than 8" - it's not more than 8 for a non-TSO packet and not more than 8 for each MSS including headers buffers for TSO. But this thread is not about i40e so this doesn't seem to be relevant anyway.

, so I'd assume x540 could have a similar one.

x540 spec assumes otherwise... 😉

Yes, in your case,
> the limit could be around 38, right?

If by "around 38" u mean "exactly 38" then u are absolutely right... 😉

> Could you help to make sure there is no packet to be transmitted uses more than
> 38 descriptors?

Just like i've already mentioned, we limit the cluster by at most 33 data segments. Therefore we are good here...

> I heard that there is a similar hang issue on X710 if using more than 8 descriptors for
> a single packet. I am wondering if the issue is similar on x540.

What's x710? If that's xl710 40G nics (i40e driver), then it has its own specs with its own HW limitations i've mentioned above. It has nothing to do with this thread that is all about 10G nics managed by ixgbe driver.

There is a different thread, where i've raised the 40G NICs xmit issues. See "i40e xmit path HW limitation" thread.

>
> Regards,
> Helin
>
> >
> > thanks,
> > vlad
> >
> > > It does not have any check for the number of descriptors to be used
> > > for a single packet, and it relies on the users to give correct mbuf
> > > chains.
> > >
> > > We may need a check of this somewhere. Of cause the point you
> > > indicated we also need to carefully investigate or fix.
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > > *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>]
> > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > *To:* Zhang, Helin
> > > *Cc:* Lu, Wenzhuo; dev at dpdk.org<mailto:dev at dpdk.org>
> > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > >
> > >
> > > On Aug 25, 2015 21:14, "Zhang, Helin" <helin.zhang at intel.com<mailto:helin.zhang at intel.com>
> > > <mailto:helin.zhang at intel.com<mailto:helin.zhang at intel.com>>> wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > In addition, I’d double check with you what’s the maximum number of
> > > descriptors would be used for a single packet transmitting?
> > > >
> > > > Datasheet said that it supports up to 8. I am wondering if more than
> > > 8 were used in your case?
> > >
> > > If memory serves me well the maximum number of data descriptors per
> > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > > is always zero it gives us 38 segments. We limit them by 33.
> > >
> > > >
> > > > Thank you very much!
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Zhang, Helin
> > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > To: Vladislav Zolotarov
> > > > Cc: Lu, Wenzhuo; dev at dpdk.org<mailto:dev at dpdk.org> <mailto:dev at dpdk.org<mailto:dev at dpdk.org>>
> > > >
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > Thank you very much for the patches! Give me a few more time to
> > > double check with more guys, and possibly hardware experts.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>
> > > <mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>>]
> > > > Sent: Tuesday, August 18, 2015 9:56 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev at dpdk.org<mailto:dev at dpdk.org> <mailto:dev at dpdk.org<mailto:dev at dpdk.org>>; Zhang, Helin
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > >
> > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" <wenzhuo.lu at intel.com<mailto:wenzhuo.lu at intel.com>
> > > <mailto:wenzhuo.lu at intel.com<mailto:wenzhuo.lu at intel.com>>> wrote:
> > > > >
> > > > > Hi Helin,
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: dev [mailto:dev-bounces at dpdk.org<mailto:dev-bounces at dpdk.org>
> > > <mailto:dev-bounces at dpdk.org<mailto:dev-bounces at dpdk.org>>] On Behalf Of Vlad Zolotarov
> > > > > > Sent: Friday, August 14, 2015 1:38 PM
> > > > > > To: Zhang, Helin; dev at dpdk.org<mailto:dev at dpdk.org> <mailto:dev at dpdk.org<mailto:dev at dpdk.org>>
> > > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid
> > > tx_rs_thresh above 1 for
> > > > > > all NICs but 82598
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > > > > Hi Vlad
> > > > > > >
> > > > > > > I don't think the changes are needed. It says in datasheet
> > > that the RS
> > > > > > > bit should be set on the last descriptor of every packet, ONLY
> > > WHEN
> > > > > > TXDCTL.WTHRESH equals to ZERO.
> > > > > >
> > > > > > Of course it's needed! See below.
> > > > > > Exactly the same spec a few lines above the place u've just
> > > quoted states:
> > > > > >
> > > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is
> > > greater than
> > > > > > zero."
> > > > > >
> > > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS
> > > bit notation ixgbe PMD
> > > > > > is actually not supporting any value of WTHRESH different from zero.
> > > > > I think Vlad is right. We need to fix this issue. Any suggestion?
> > > If not, I'd like to ack this patch.
> > > >
> > > > Pls., note that there is a v2 of this patch on the list. I forgot to
> > > patch ixgbevf_dev_info_get() in v1.
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > > Helin
> > > > > > >
> > > > > > >> -----Original Message-----
> > > > > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>
> > > <mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>>]
> > > > > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > > > > >> To: dev at dpdk.org<mailto:dev at dpdk.org> <mailto:dev at dpdk.org<mailto:dev at dpdk.org>>
> > > > > > >> Cc: Zhang, Helin; Ananyev, Konstantin;
> > > avi at cloudius-systems.com<mailto:avi at cloudius-systems.com> <mailto:avi at cloudius-systems.com<mailto:avi at cloudius-systems.com>>; Vlad
> > > > > > >> Zolotarov
> > > > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for
> > > > > > all
> > > > > > >> NICs but 82598
> > > > > > >>
> > > > > > >> According to 82599 and x540 HW specifications RS bit *must*
> > > be set in the
> > > > > > last
> > > > > > >> descriptor of *every* packet.
> > > > > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > > > > >
> > > > > > Right and ixgbe PMD requires this condition to be fulfilled in
> > > order to
> > > > > > function. See above.
> > > > > >
> > > > > > >
> > > > > > >> This patch fixes the Tx hang we were constantly hitting with
> > > a seastar-based
> > > > > > >> application on x540 NIC.
> > > > > > > Could you help to share with us how to reproduce the tx hang
> > > issue, with using
> > > > > > > typical DPDK examples?
> > > > > >
> > > > > > Sorry. I'm not very familiar with the typical DPDK examples to
> > > help u
> > > > > > here. However this is quite irrelevant since without this this
> > > > > > patch ixgbe PMD obviously abuses the HW spec as has been explained
> > above.
> > > > > >
> > > > > > We saw the issue when u stressed the xmit path with a lot of
> > > > > > highly fragmented TCP frames (packets with up to 33 fragments
> > > > > > with
> > > non-headers
> > > > > > fragments as small as 4 bytes) with all offload features enabled.
> > > > > >
> > > > > > Thanks,
> > > > > > vlad
> > > > > > >
> > > > > > >> Signed-off-by: Vlad Zolotarov <vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>
> > > <mailto:vladz at cloudius-systems.com<mailto:vladz at cloudius-systems.com>>>
> > > > > > >> ---
> > > > > > >>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +++++++++
> > > > > > >>   drivers/net/ixgbe/ixgbe_rxtx.c  | 23 ++++++++++++++++++++++-
> > > > > > >>   2 files changed, 31 insertions(+), 1 deletion(-)
> > > > > > >>
> > > > > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> index b8ee1e9..6714fd9 100644
> > > > > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev
> > > *dev,
> > > > > > struct
> > > > > > >> rte_eth_dev_info *dev_info)
> > > > > > >>            .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > > > > > >> ETH_TXQ_FLAGS_NOOFFLOADS,
> > > > > > >>    };
> > > > > > >> +
> > > > > > >> +  /*
> > > > > > >> +   * According to 82599 and x540 specifications RS bit
> > > *must* be set on
> > > > > > the
> > > > > > >> +   * last descriptor of *every* packet. Therefore we will
> > > not allow the
> > > > > > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > >> +   */
> > > > > > >> +  if (hw->mac.type > ixgbe_mac_82598EB)
> > > > > > >> + dev_info->default_txconf.tx_rs_thresh = 1;
> > > > > > >> +
> > > > > > >>    dev_info->hash_key_size = IXGBE_HKEY_MAX_INDEX *
> > > sizeof(uint32_t);
> > > > > > >>    dev_info->reta_size = ETH_RSS_RETA_SIZE_128;
> > > > > > >> dev_info->flow_type_rss_offloads = IXGBE_RSS_OFFLOAD_ALL;
> > > > > > >> diff --
> > > > > > git
> > > > > > >> a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > b/drivers/net/ixgbe/ixgbe_rxtx.c index
> > > > > > >> 91023b9..8dbdffc 100644
> > > > > > >> --- a/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > >> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c
> > > > > > >> @@ -2085,11 +2085,19 @@ ixgbe_dev_tx_queue_setup(struct
> > > rte_eth_dev
> > > > > > >> *dev,
> > > > > > >>    struct ixgbe_tx_queue *txq;
> > > > > > >>    struct ixgbe_hw     *hw;
> > > > > > >>    uint16_t tx_rs_thresh, tx_free_thresh;
> > > > > > >> +  bool rs_deferring_allowed;
> > > > > > >>
> > > > > > >>    PMD_INIT_FUNC_TRACE();
> > > > > > >>    hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
> > > > > > >>
> > > > > > >>    /*
> > > > > > >> +   * According to 82599 and x540 specifications RS bit
> > > *must* be set on
> > > > > > the
> > > > > > >> +   * last descriptor of *every* packet. Therefore we will
> > > not allow the
> > > > > > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > > > > > >> +   */
> > > > > > >> +  rs_deferring_allowed = (hw->mac.type <=
> > > > > > >> + ixgbe_mac_82598EB);
> > > > > > >> +
> > > > > > >> +  /*
> > > > > > >>     * Validate number of transmit descriptors.
> > > > > > >>     * It must not exceed hardware maximum, and must be multiple
> > > > > > >>     * of IXGBE_ALIGN.
> > > > > > >> @@ -2110,6 +2118,8 @@ ixgbe_dev_tx_queue_setup(struct
> > > > > > >> rte_eth_dev
> > > > > > *dev,
> > > > > > >>     * to transmit a packet is greater than the number of free TX
> > > > > > >>     * descriptors.
> > > > > > >>     * The following constraints must be satisfied:
> > > > > > >> +   *  tx_rs_thresh must be less than 2 for NICs for which RS
> > > deferring is
> > > > > > >> +   *  forbidden (all but 82598).
> > > > > > >>     *  tx_rs_thresh must be greater than 0.
> > > > > > >>     *  tx_rs_thresh must be less than the size of the ring
> > > minus 2.
> > > > > > >>     *  tx_rs_thresh must be less than or equal to tx_free_thresh.
> > > > > > >> @@ -2121,9 +2131,20 @@ ixgbe_dev_tx_queue_setup(struct
> > > rte_eth_dev
> > > > > > *dev,
> > > > > > >>     * When set to zero use default values.
> > > > > > >>     */
> > > > > > >>    tx_rs_thresh = (uint16_t)((tx_conf->tx_rs_thresh) ?
> > > > > > >> - tx_conf->tx_rs_thresh : DEFAULT_TX_RS_THRESH);
> > > > > > >> + tx_conf->tx_rs_thresh :
> > > > > > >> + (rs_deferring_allowed ? DEFAULT_TX_RS_THRESH : 1));
> > > > > > >>    tx_free_thresh = (uint16_t)((tx_conf->tx_free_thresh) ?
> > > > > > >> tx_conf->tx_free_thresh : DEFAULT_TX_FREE_THRESH);
> > > > > > >> +
> > > > > > >> +  if (!rs_deferring_allowed && tx_rs_thresh > 1) {
> > > > > > >> +          PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > 2 since RS
> > > > > > "
> > > > > > >> +                            "must be set for every packet
> > > for this HW. "
> > > > > > >> + "(tx_rs_thresh=%u port=%d queue=%d)",
> > > > > > >> +                       (unsigned int)tx_rs_thresh,
> > > > > > >> + (int)dev->data->port_id, (int)queue_idx);
> > > > > > >> +          return -(EINVAL);
> > > > > > >> +  }
> > > > > > >> +
> > > > > > >>    if (tx_rs_thresh >= (nb_desc - 2)) {
> > > > > > >>            PMD_INIT_LOG(ERR, "tx_rs_thresh must be less than
> > > > > > >> the
> > > > > > number "
> > > > > > >>                         "of TX descriptors minus 2.
> > > (tx_rs_thresh=%u "
> > > > > > >> --
> > > > > > >> 2.1.0
> > > > >
> > >
>


More information about the dev mailing list