[dpdk-dev] [PATCH] net/ixgbe: fix reconfiguration of rx queues

Zhang, Helin helin.zhang at intel.com
Thu Mar 29 07:14:26 CEST 2018



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, January 31, 2018 9:39 PM
> To: Chas Williams; dev at dpdk.org
> Cc: Lu, Wenzhuo
> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: fix reconfiguration of rx queues
> 
> 
> >
> >
> > > > >
> > > > > From: "Charles (Chas) Williams" <chas3 at att.com>
> > > > >
> > > > > .dev_configure() may be called again after RX queues have been setup.
> > > > > This has the effect of clearing whatever setting the RX queues
> > > > > made for rx_bulk_alloc_allowed or rx_vec_allowed.  Only reset
> > > > > this configuration is there aren't any currently allocated queues.
> > > > >
> > > > > Fixes: 01fa1d6215fa ("ixgbe: unify Rx setup")
> > > > >
> > > > > Signed-off-by: Chas Williams <chas3 at att.com>
> > > > > ---
> > > > >  drivers/net/ixgbe/ixgbe_ethdev.c | 18 ++++++++++++++++--
> > > > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > index 37eb668..b39249a 100644
> > > > > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > > @@ -2348,6 +2348,7 @@ ixgbe_dev_configure(struct rte_eth_dev
> > > > >*dev)
> > > > >       struct ixgbe_adapter *adapter =
> > > > >               (struct ixgbe_adapter *)dev->data->dev_private;
> > > > >       int ret;
> > > > > +     uint16_t i;
> > > > >
> > > > >       PMD_INIT_FUNC_TRACE();
> > > > >       /* multipe queue mode checking */  @@ -2363,11 +2364,17 @@
> > > > >ixgbe_dev_configure(struct rte_eth_dev *dev)
> > > > >
> > > > >       /*
> > > > >        * Initialize to TRUE. If any of Rx queues doesn't meet
> > > > >the bulk
> > > > > -      * allocation or vector Rx preconditions we will reset it.
> > > > > +      * allocation or vector Rx preconditions we will reset it.
> > > > > +We
> > > > > +      * can only do this is there aren't any existing RX queues.
> > > > >        */
> > > > > +     for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > > > +             if (dev->data->rx_queues[i])
> > > > > +                     goto out;
> > > > > +     }
> > > > I don't see why this is needed.
> > > > It surely should be possible to reconfigure device with different
> > > > number of queues.
> > > > Konstantin
> > > >
> > > > Yes, you can add new queues but you shouldn't reset the bulk and
> > > > vec settings that have already been chosen by the previously allocated
> queues.
> > > Why is that? Might be in new queue setup user will change settings?
> > >
> > > There is no requirement that the user allocates all the RX queues in the
> same way.
> > > Some could have a different numbers of descs which is one of the
> > > checks in
> > > check_rx_burst_bulk_alloc_preconditions()
> > Exactly. That's  why after dev_configure() user has to call
> > queue_setup() for *all* queues he plans to use.
> >
> > Where the API or documentation does it say that this is necessary?  If
> > this was a requirement then rte_eth_dev_configure() should drop every
> > allocated queue.  Since it doesn't do this I can only assume that you
> > are allowed to keep using queues after calling rte_eth_dev_configure()
> without having to set them up again.
> >
> >
> > >
> > >
> > >
> > > > If those queues
> > > > set rx_bulk_alloc_allowed to be false, then this is going to cause
> > > > an issue with queue release later on.
> > >
> > > Could you be a bit more specific here:
> > > What you think will be broken in ixgbe_rx_queue_release() in that case?
> > >
> > > Sorry, I mispoke.  It's this function, ixgbe_reset_rx_queue(),
> > >
> > >         /*
> > >          * By default, the Rx queue setup function allocates enough
> > > memory for
> > >          * IXGBE_MAX_RING_DESC.  The Rx Burst bulk allocation
> > > function requires
> > >          * extra memory at the end of the descriptor ring to be zero'd out.
> > >          */
> > >         if (adapter->rx_bulk_alloc_allowed)
> > >                 /* zero out extra memory */
> > >                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> > >
> > >         /*
> > >          * Zero out HW ring memory. Zero out extra memory at the end
> > > of
> > >          * the H/W ring so look-ahead logic in Rx Burst bulk alloc
> > > function
> > >          * reads extra memory as zeros.
> > >          */
> > >         for (i = 0; i < len; i++) {
> > >                 rxq->rx_ring[i] = zeroed_desc;
> > >         }
> > >
> > > So you potentially write past the rx_ring[] you allocated.
> >
> > We always allocate rx_ring[] to maximum possible size plus space for fake
> descriptors:
> >
> > drivers/net/ixgbe/ixgbe_rxtx.h:
> > #define RX_RING_SZ ((IXGBE_MAX_RING_DESC +
> RTE_PMD_IXGBE_RX_MAX_BURST)
> > * \
> >                     sizeof(union ixgbe_adv_rx_desc))
> >
> > then at ixgbe_dev_rx_queue_setup(...):
> >
> >   /*
> >     * Allocate RX ring hardware descriptors. A memzone large enough to
> >     * handle the maximum ring size is allocated in order to allow for
> >     * resizing in later calls to the queue setup function.
> >     */
> >     rz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_idx,
> >                                       RX_RING_SZ, IXGBE_ALIGN,
> > socket_id);
> >     ...
> >     rxq->rx_ring = (union ixgbe_adv_rx_desc *) rz->addr;
> >
> > What about here?
> >
> >         len = nb_desc;
> >         if (adapter->rx_bulk_alloc_allowed)
> >                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> >
> >         rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
> >                                           sizeof(struct
> > ixgbe_rx_entry) * len,
> >                                           RTE_CACHE_LINE_SIZE,
> > socket_id);
> >
> > This is later walked and reset in ixgbe_reset_rx_queue()
> >
> >         if (adapter->rx_bulk_alloc_allowed)
> >                 /* zero out extra memory */
> >                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> >
> >         /*
> >          * Zero out HW ring memory. Zero out extra memory at the end
> > of
> >          * the H/W ring so look-ahead logic in Rx Burst bulk alloc
> > function
> >          * reads extra memory as zeros.
> >          */
> >         for (i = 0; i < len; i++) {
> >                 rxq->rx_ring[i] = zeroed_desc;
> >         }
> >
> >         /*
> >          * initialize extra software ring entries. Space for these
> > extra
> >          * entries is always allocated
> >          */
> >         memset(&rxq->fake_mbuf, 0x0, sizeof(rxq->fake_mbuf));
> >         for (i = rxq->nb_rx_desc; i < len; ++i) {
> >                 rxq->sw_ring[i].mbuf = &rxq->fake_mbuf;
> >         }
> >
> 
> As I can see, right now well behaving applications (one that always call
> queue_setup() after dev_configure) wouldn't be affected.
> But you right - it is a potential issue and better be addressed.
> Though to fix it all we need is either always allocate space for fake_mbufs
> inside sw_ring, or just store number of fake mbufs inside rxq.
> 
> > Clearly, rx_bulk_alloc_allowed must remain consistent here.
> 
> I don't think so, see above.
> 
> >
> >
> >
> >
> > >
> > > You can't change rx_bulk_alloc_allowed once it has been set and you have
> allocated queues.
> > > In fact, you can't really let some later queue change this setting
> > > after the first queue decides what It should be.
> >
> > See above.
> >
> > Again, I have pointed out where this is a problem.
> >
> 
> Again, this is based on your assumption (wrong one) that it is ok not to call
> queue_setup() after dev_configure().
> 
> >
> > >
> > >
> > >
> > > > This breaks:
> > > >
> > > > rte_eth_dev_configure(..., 1, 1, ...);
> > > > rx_queue_setup(1)
> > > >    [rx_queue_setup decides that it can't support
> > > > rx_bulk_alloc_allowed] ..
> > > >
> > > > Later, you want to add some more queues,  you call
> > > >
> > > > eth_eth_dev_configure(..., 2, 2, ...);
> > >
> > > After you call dev_configure, you'll have to do queue_setup() for all your
> queues.
> > > dev_configure() can changes some global device settings, so each
> > > queue has to be reconfigured.
> > > In your example It should be:
> > > eth_eth_dev_configure(..., 2, 2, ...); rx_queue_setup(...,0, ...);
> > > rx_queue_setup(...,1, ...);
> > >
> > > Konstantin
> > >
> > >
> > > Nothing in the API says this must happen.  If there where true,
> > > shouldn't rte_eth_dev_configure() automatically drop any existing queues?
> >
> > One of the fists things that ixgbe_dev_rx_queue_setup() - calls
> ixgbe_rx_queue_release().
> > In theory that probably could be moved to rte_eth_dev_configure(), but I
> think current model is ok too.
> >
> > >There is existing code that doesn't do this.
> >
> > Then I think it is a bug in that code.
> >
> > > Show me
> > > in the API where I am required to setup all my queues _again_ after
> > > calling rte_eth_dev_configure()
> >
> > If you think the documentation doesn't explain that properly  -  feel
> > free to raise a bug agains doc and/or provide a patch for it.
> > As I  explained above - dev_configure() might change some global
> > settings (max_rx_pkt, flow-director settings, etc.)  that would affect each
> queue in that device.
> > So queue_setup() is necessary after dev_configure() to make sure it will
> operate properly.
> >
> > If you think the documentation is unclear, then *you* feel free to
> > open a bug to clarify what you believe is the correct behavior.  Yes,
> rte_eth_dev_configure() is free to change globals.
> > Any of the API routines are free to change globals.  It is up to your
> > PMD to deal with the fallout from changing such globals.
> 
> It is, but for that application have to follow the defined procedure.
> Current procedure is: dev_configure(); queue_setup(); dev_start().
> You can't skip second step.
> Most drivers allocate and fill their private queue structures here.
> Some of the values inside these structures are calculated based on dev_conf
> parameters.
> Obviously if device configuration might been changed - you need to
> reconfigure your queues too.
> 
> >  Bonding and vmxnet3 are just two examples of this.
> 
> If some PMDs for some cases can work without queue reconfiguration - great.
> But it doesn't mean that all  PMDs would.
> Most Intel PMDs (ixgbe, i40e, fm10k) do expect that behavior.
> From my code readings - mlx and qede too.
> Probably some other PMDs - I didn't check all of them.
> 
> > The "final" queue "setup" isn't done until .dev_start().  At that point, the
> queues are updated.
> 
> In theory,  it is probably possible to do things in a way you suggest:
> make queue_setup() just to store input queue parameters and move all the
> allocation/configuration job into dev_start.
> Though it would require quite a lot of changes in many PMDs and might
> complicate things.
> Personally I don't see big advantage of such change, while the required effort
> would be substantial.
> Especially if we going to move to the model with a separate rx/tx function per
> queue anyway.
> 
> > Again, there are existing examples in the existing code base that do
> > not follow your idea of what the API should do.  Are they wrong as well?
> 
> I think so.
> As I said it might work for some particular PMD, but if it is a generic app, then it
> sounds like a bug to me.
> What particular app we are talking about?
> 
> >  The preponderance of the evidence seems  to be against your idea of
> >what the API should be doing.
> >
> > Queue setup is an expensive operation.  If some PMD doesn't need to
> > drop and setup queues again after an rte_eth_dev_configure() why are you
> forcing it to do so?
> 
> I don't think queue_setup() is that expensive to worry about the price.
> We are in control path here.
> But It is up to PMD what to put inside that routine.
> If PMD is smart enough to detect that nothing really changed from last
> invocation of queue_setup() and no extra work required - then it can just
> return success straightway.
> Again, nothing prevents user to make some stub layer on top of current
> rte_ethdev API to minimize number of eth_queue_setup() invocations.
> 
> >  It was the choice
> > of the author in the ixgbe driver to use globals the way they did.
> >The current code is still
> > somewhat wrong even after the patch I propose.  The configuration of
> >each queue depends  on what the previous queues did.  Since there isn't
> >a burst per queue routine, there needs  to be a single decision made
> >after _all_ of the queues are configured.  That is the only  time when you
> have complete knowledge to make your decisions about which RX mode to use.
> >
> >
> > Konstantin
> >
> > >
> > >
> > > > rx_queue_setup(2)
> > > >   [rx_queue_setup hopefully makes the same choice as rxqid = 1?]
> > > > ...
> > > >
> > > > Is one supposed to release all queues before calling
> > > > rte_eth_dev_configure()? If that is true, it seems like the
> > > > change_mtu examples I see are possibly wrong.  As suggested in
> kenrel_nic_interface.rst:
> > > >
> > > >
> > > >         ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
> > > >         if (ret < 0) {
> > > >             RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n",
> > > > port_id);
> > > >             return ret;
> > > >         }
> > > >
> > > >         /* Restart specific port */
> > > >
> > > >         ret = rte_eth_dev_start(port_id);
> > > >         if (ret < 0) {
> > > >              RTE_LOG(ERR, APP, "Fail to restart port %d\n",
> > > > port_id);
> > > >             return ret;
> > > >         }
> > > >
> > > > This is will obviously reset the rx_bulk_alloc_allowed and not reallocated
> the RX queues.
> > > >
> > > >
> > > > >       adapter->rx_bulk_alloc_allowed = true;
> > > > >       adapter->rx_vec_allowed = true;
> > > > >
> > > > > +out:
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > @@ -4959,6 +4966,7 @@ ixgbevf_dev_configure(struct rte_eth_dev
> > > > >*dev)
> > > > >       struct rte_eth_conf *conf = &dev->data->dev_conf;
> > > > >       struct ixgbe_adapter *adapter =
> > > > >                       (struct ixgbe_adapter
> > > > >*)dev->data->dev_private;
> > > > > +     uint16_t i;
> > > > >
> > > > >       PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id:
> > > > >%d",
> > > > >                    dev->data->port_id);  @@ -4981,11 +4989,17 @@
> > > > >ixgbevf_dev_configure(struct rte_eth_dev *dev)
> > > > >
> > > > >       /*
> > > > >        * Initialize to TRUE. If any of Rx queues doesn't meet
> > > > >the bulk
> > > > > -      * allocation or vector Rx preconditions we will reset it.
> > > > > +      * allocation or vector Rx preconditions we will reset it.
> > > > > +We
> > > > > +      * can only do this is there aren't any existing RX queues.
> > > > >        */
> > > > > +     for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > > > +             if (dev->data->rx_queues[i])
> > > > > +                     goto out;
> > > > > +     }
> > > > >       adapter->rx_bulk_alloc_allowed = true;
> > > > >       adapter->rx_vec_allowed = true;
> > > > >
> > > > > +out:
> > > > >       return 0;
> > > > >  }
> > > > >
> > > > > --
> > > > > 2.9.5

I'd reject this patch for now, according to the discussion above.
Feel free to work our new versions with some agreement. Sorry and thanks!

/Helin


More information about the dev mailing list