[dpdk-dev] [PATCH] net/ixgbe: fix reconfiguration of rx queues

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Jan 31 14:38:33 CET 2018


> 
> 
> > > >
> > > > From: "Charles (Chas) Williams" <chas3 at att.com>
> > > >
> > > > .dev_configure() may be called again after RX queues have been setup.
> > > > This has the effect of clearing whatever setting the RX queues made for
> > > > rx_bulk_alloc_allowed or rx_vec_allowed.  Only reset this configuration
> > > > is there aren't any currently allocated queues.
> > > >
> > > > Fixes: 01fa1d6215fa ("ixgbe: unify Rx setup")
> > > >
> > > > Signed-off-by: Chas Williams <chas3 at att.com>
> > > > ---
> > > >  drivers/net/ixgbe/ixgbe_ethdev.c | 18 ++++++++++++++++--
> > > >  1 file changed, 16 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > index 37eb668..b39249a 100644
> > > > --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > > @@ -2348,6 +2348,7 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
> > > >       struct ixgbe_adapter *adapter =
> > > >               (struct ixgbe_adapter *)dev->data->dev_private;
> > > >       int ret;
> > > > +     uint16_t i;
> > > >
> > > >       PMD_INIT_FUNC_TRACE();
> > > >       /* multipe queue mode checking */
> > > > @@ -2363,11 +2364,17 @@ ixgbe_dev_configure(struct rte_eth_dev *dev)
> > > >
> > > >       /*
> > > >        * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
> > > > -      * allocation or vector Rx preconditions we will reset it.
> > > > +      * allocation or vector Rx preconditions we will reset it. We
> > > > +      * can only do this is there aren't any existing RX queues.
> > > >        */
> > > > +     for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > > +             if (dev->data->rx_queues[i])
> > > > +                     goto out;
> > > > +     }
> > > I don't see why this is needed.
> > > It surely should be possible to reconfigure device with different
> > > number of queues.
> > > Konstantin
> > >
> > > Yes, you can add new queues but you shouldn't reset the bulk and vec settings
> > > that have already been chosen by the previously allocated queues.
> > Why is that? Might be in new queue setup user will change settings?
> >
> > There is no requirement that the user allocates all the RX queues in the same way.
> > Some could have a different numbers of descs which is one of the checks in
> > check_rx_burst_bulk_alloc_preconditions()
> Exactly. That's  why after dev_configure() user has to call queue_setup() for *all*
> queues he plans to use.
> 
> Where the API or documentation does it say that this is necessary?  If this was a requirement
> then rte_eth_dev_configure() should drop every allocated queue.  Since it doesn't do this
> I can only assume that you are allowed to keep using queues after calling rte_eth_dev_configure()
> without having to set them up again.
> 
> 
> >
> >
> >
> > > If those queues
> > > set rx_bulk_alloc_allowed to be false, then this is going to cause an issue with queue
> > > release later on.
> >
> > Could you be a bit more specific here:
> > What you think will be broken in ixgbe_rx_queue_release() in that case?
> >
> > Sorry, I mispoke.  It's this function, ixgbe_reset_rx_queue(),
> >
> >         /*
> >          * By default, the Rx queue setup function allocates enough memory for
> >          * IXGBE_MAX_RING_DESC.  The Rx Burst bulk allocation function requires
> >          * extra memory at the end of the descriptor ring to be zero'd out.
> >          */
> >         if (adapter->rx_bulk_alloc_allowed)
> >                 /* zero out extra memory */
> >                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> >
> >         /*
> >          * Zero out HW ring memory. Zero out extra memory at the end of
> >          * the H/W ring so look-ahead logic in Rx Burst bulk alloc function
> >          * reads extra memory as zeros.
> >          */
> >         for (i = 0; i < len; i++) {
> >                 rxq->rx_ring[i] = zeroed_desc;
> >         }
> >
> > So you potentially write past the rx_ring[] you allocated.
> 
> We always allocate rx_ring[] to maximum possible size plus space for fake descriptors:
> 
> drivers/net/ixgbe/ixgbe_rxtx.h:
> #define RX_RING_SZ ((IXGBE_MAX_RING_DESC + RTE_PMD_IXGBE_RX_MAX_BURST) * \
>                     sizeof(union ixgbe_adv_rx_desc))
> 
> then at ixgbe_dev_rx_queue_setup(...):
> 
>   /*
>     * Allocate RX ring hardware descriptors. A memzone large enough to
>     * handle the maximum ring size is allocated in order to allow for
>     * resizing in later calls to the queue setup function.
>     */
>     rz = rte_eth_dma_zone_reserve(dev, "rx_ring", queue_idx,
>                                       RX_RING_SZ, IXGBE_ALIGN, socket_id);
>     ...
>     rxq->rx_ring = (union ixgbe_adv_rx_desc *) rz->addr;
> 
> What about here?
> 
>         len = nb_desc;
>         if (adapter->rx_bulk_alloc_allowed)
>                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> 
>         rxq->sw_ring = rte_zmalloc_socket("rxq->sw_ring",
>                                           sizeof(struct ixgbe_rx_entry) * len,
>                                           RTE_CACHE_LINE_SIZE, socket_id);
> 
> This is later walked and reset in ixgbe_reset_rx_queue()
> 
>         if (adapter->rx_bulk_alloc_allowed)
>                 /* zero out extra memory */
>                 len += RTE_PMD_IXGBE_RX_MAX_BURST;
> 
>         /*
>          * Zero out HW ring memory. Zero out extra memory at the end of
>          * the H/W ring so look-ahead logic in Rx Burst bulk alloc function
>          * reads extra memory as zeros.
>          */
>         for (i = 0; i < len; i++) {
>                 rxq->rx_ring[i] = zeroed_desc;
>         }
> 
>         /*
>          * initialize extra software ring entries. Space for these extra
>          * entries is always allocated
>          */
>         memset(&rxq->fake_mbuf, 0x0, sizeof(rxq->fake_mbuf));
>         for (i = rxq->nb_rx_desc; i < len; ++i) {
>                 rxq->sw_ring[i].mbuf = &rxq->fake_mbuf;
>         }
> 

As I can see, right now well behaving applications
(one that always call queue_setup() after dev_configure)
wouldn't be affected.
But you right - it is a potential issue and better be addressed.
Though to fix it all we need is either always allocate space for fake_mbufs inside sw_ring,
or just store number of fake mbufs inside rxq.

> Clearly, rx_bulk_alloc_allowed must remain consistent here.

I don't think so, see above.

> 
> 
> 
> 
> >
> > You can't change rx_bulk_alloc_allowed once it has been set and you have allocated queues.
> > In fact, you can't really let some later queue change this setting after the first queue decides
> > what It should be.
> 
> See above.
> 
> Again, I have pointed out where this is a problem.
> 

Again, this is based on your assumption (wrong one) that it is ok not to call queue_setup()
after dev_configure(). 

> 
> >
> >
> >
> > > This breaks:
> > >
> > > rte_eth_dev_configure(..., 1, 1, ...);
> > > rx_queue_setup(1)
> > >    [rx_queue_setup decides that it can't support rx_bulk_alloc_allowed]
> > > ..
> > >
> > > Later, you want to add some more queues,  you call
> > >
> > > eth_eth_dev_configure(..., 2, 2, ...);
> >
> > After you call dev_configure, you'll have to do queue_setup() for all your queues.
> > dev_configure() can changes some global device settings, so each queue has to be
> > reconfigured.
> > In your example It should be:
> > eth_eth_dev_configure(..., 2, 2, ...);
> > rx_queue_setup(...,0, ...);
> > rx_queue_setup(...,1, ...);
> >
> > Konstantin
> >
> >
> > Nothing in the API says this must happen.  If there where true, shouldn't rte_eth_dev_configure()
> > automatically drop any existing queues?
> 
> One of the fists things that ixgbe_dev_rx_queue_setup() - calls ixgbe_rx_queue_release().
> In theory that probably could be moved to rte_eth_dev_configure(), but I think current model is ok too.
> 
> >There is existing code that doesn't do this.
> 
> Then I think it is a bug in that code.
> 
> > Show me
> > in the API where I am required to setup all my queues _again_ after calling rte_eth_dev_configure()
> 
> If you think the documentation doesn't explain that properly  -  feel free to raise a bug agains doc
> and/or provide a patch for it.
> As I  explained above - dev_configure() might change some global settings
> (max_rx_pkt, flow-director settings, etc.)  that would affect each queue in that device.
> So queue_setup() is necessary after dev_configure() to make sure it will operate properly.
> 
> If you think the documentation is unclear, then *you* feel free to open a bug to clarify what
> you believe is the correct behavior.  Yes, rte_eth_dev_configure() is free to change globals.
> Any of the API routines are free to change globals.  It is up to your PMD to deal with the
> fallout from changing such globals.

It is, but for that application have to follow the defined procedure.
Current procedure is: dev_configure(); queue_setup(); dev_start().
You can't skip second step.
Most drivers allocate and fill their private queue structures here.
Some of the values inside these structures are calculated based on dev_conf parameters.
Obviously if device configuration might been changed - you need to reconfigure your queues too.

>  Bonding and vmxnet3 are just two examples of this.

If some PMDs for some cases can work without queue reconfiguration - great.
But it doesn't mean that all  PMDs would.
Most Intel PMDs (ixgbe, i40e, fm10k) do expect that behavior.
From my code readings - mlx and qede too.
Probably some other PMDs - I didn't check all of them. 

> The "final" queue "setup" isn't done until .dev_start().  At that point, the queues are updated.

In theory,  it is probably possible to do things in a way you suggest:
make queue_setup() just to store input queue parameters and
move all the allocation/configuration job into dev_start.
Though it would require quite a lot of changes in many PMDs and might complicate things. 
Personally I don't see big advantage of such change, while the required effort would be substantial.  
Especially if we going to move to the model with a separate rx/tx function per queue anyway.

> Again, there are existing examples in the existing code base that do not follow your idea of
> what the API should do.  Are they wrong as well?

I think so.
As I said it might work for some particular PMD, but if it is a generic app, then it sounds like a bug to me.
What particular app we are talking about?

>  The preponderance of the evidence seems
> to be against your idea of what the API should be doing.
> 
> Queue setup is an expensive operation.  If some PMD doesn't need to drop and setup queues
> again after an rte_eth_dev_configure() why are you forcing it to do so?

I don't think queue_setup() is that expensive to worry about the price.
We are in control path here.
But It is up to PMD what to put inside that routine.
If PMD is smart enough to detect that nothing really changed from last invocation
of queue_setup() and no extra work required - then it can just return success straightway.
Again, nothing prevents user to make some stub layer on top of current rte_ethdev API to
minimize number of eth_queue_setup() invocations.

>  It was the choice
> of the author in the ixgbe driver to use globals the way they did.
>The current code is still
> somewhat wrong even after the patch I propose.  The configuration of each queue depends
> on what the previous queues did.  Since there isn't a burst per queue routine, there needs
> to be a single decision made after _all_ of the queues are configured.  That is the only
> time when you have complete knowledge to make your decisions about which RX mode to use.
> 
> 
> Konstantin
> 
> >
> >
> > > rx_queue_setup(2)
> > >   [rx_queue_setup hopefully makes the same choice as rxqid = 1?]
> > > ...
> > >
> > > Is one supposed to release all queues before calling rte_eth_dev_configure()? If
> > > that is true, it seems like the change_mtu examples I see are possibly wrong.  As
> > > suggested in kenrel_nic_interface.rst:
> > >
> > >
> > >         ret = rte_eth_dev_configure(port_id, 1, 1, &conf);
> > >         if (ret < 0) {
> > >             RTE_LOG(ERR, APP, "Fail to reconfigure port %d\n", port_id);
> > >             return ret;
> > >         }
> > >
> > >         /* Restart specific port */
> > >
> > >         ret = rte_eth_dev_start(port_id);
> > >         if (ret < 0) {
> > >              RTE_LOG(ERR, APP, "Fail to restart port %d\n", port_id);
> > >             return ret;
> > >         }
> > >
> > > This is will obviously reset the rx_bulk_alloc_allowed and not reallocated the RX queues.
> > >
> > >
> > > >       adapter->rx_bulk_alloc_allowed = true;
> > > >       adapter->rx_vec_allowed = true;
> > > >
> > > > +out:
> > > >       return 0;
> > > >  }
> > > >
> > > > @@ -4959,6 +4966,7 @@ ixgbevf_dev_configure(struct rte_eth_dev *dev)
> > > >       struct rte_eth_conf *conf = &dev->data->dev_conf;
> > > >       struct ixgbe_adapter *adapter =
> > > >                       (struct ixgbe_adapter *)dev->data->dev_private;
> > > > +     uint16_t i;
> > > >
> > > >       PMD_INIT_LOG(DEBUG, "Configured Virtual Function port id: %d",
> > > >                    dev->data->port_id);
> > > > @@ -4981,11 +4989,17 @@ ixgbevf_dev_configure(struct rte_eth_dev *dev)
> > > >
> > > >       /*
> > > >        * Initialize to TRUE. If any of Rx queues doesn't meet the bulk
> > > > -      * allocation or vector Rx preconditions we will reset it.
> > > > +      * allocation or vector Rx preconditions we will reset it. We
> > > > +      * can only do this is there aren't any existing RX queues.
> > > >        */
> > > > +     for (i = 0; i < dev->data->nb_rx_queues; i++) {
> > > > +             if (dev->data->rx_queues[i])
> > > > +                     goto out;
> > > > +     }
> > > >       adapter->rx_bulk_alloc_allowed = true;
> > > >       adapter->rx_vec_allowed = true;
> > > >
> > > > +out:
> > > >       return 0;
> > > >  }
> > > >
> > > > --
> > > > 2.9.5



More information about the dev mailing list