[dpdk-dev] [PATCH v3 2/2] mempool: pktmbuf pool default fallback for mempool ops error

Hemant Agrawal hemant.agrawal at nxp.com
Thu Dec 8 09:57:49 CET 2016


Hi Olivier,
	Apology for a delayed response.

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Tuesday, November 22, 2016 2:55 PM
> To: Hemant Agrawal <hemant.agrawal at nxp.com>
> Cc: dev at dpdk.org; jerin.jacob at caviumnetworks.com; david.hunt at intel.com
> Subject: Re: [PATCH v3 2/2] mempool: pktmbuf pool default fallback for
> mempool ops error
> 
> Hi Hemant,
> 
> Back on this topic, please see some comments below.
> 
> On 11/07/2016 01:30 PM, Hemant Agrawal wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> >> Sent: Friday, October 14, 2016 5:41 PM
> >>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
> >>>> Hi Olivier
> >>>>
> >>>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
> >>>>> Hi Hemant,
> >>>>>
> >>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
> >>>>>> In the rte_pktmbuf_pool_create, if the default external mempool
> >>>>>> is not available, the implementation can default to "ring_mp_mc",
> >>>>>> which is an software implementation.
> >>>>>>
> >>>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal at nxp.com>
> >>>>>> ---
> >>>>>> Changes in V3:
> >>>>>> * adding warning message to say that falling back to default sw
> >>>>>> pool
> >>>>>> ---
> >>>>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
> >>>>>>  1 file changed, 8 insertions(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> >>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644
> >>>>>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
> >>>>>> unsigned n,
> >>>>>>
> >>>>>>      rte_errno = rte_mempool_set_ops_byname(mp,
> >>>>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
> >>>>>> +
> >>>>>> +    /* on error, try falling back to the software based default
> >>>>>> pool */
> >>>>>> +    if (rte_errno == -EOPNOTSUPP) {
> >>>>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not
> supported. "
> >>>>>> +            "falling back to sw mempool \"ring_mp_mc\"");
> >>>>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
> >>>>>> NULL);
> >>>>>> +    }
> >>>>>> +
> >>>>>>      if (rte_errno != 0) {
> >>>>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
> >>>>>>          return NULL;
> >>>>>>
> >>>>>
> >>>>> Without adding a new method ".supported()", the first call to
> >>>>> rte_mempool_populate() could return the same error ENOTSUP. In
> >>>>> this case, it is still possible to fallback.
> >>>>>
> >>>> It will be bit late.
> >>>>
> >>>> On failure, than we have to set the default ops and do a goto
> >>>> before rte_pktmbuf_pool_init(mp, &mbp_priv);
> >>
> >> I still think we can do the job without adding the .supported() method.
> >> The following code is just an (untested) example:
> >>
> >> struct rte_mempool *
> >> rte_pktmbuf_pool_create(const char *name, unsigned n,
> >>     unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
> >>     int socket_id)
> >> {
> >>     struct rte_mempool *mp;
> >>     struct rte_pktmbuf_pool_private mbp_priv;
> >>     unsigned elt_size;
> >>     int ret;
> >>     const char *ops[] = {
> >>         RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
> >>     };
> >>     const char **op;
> >>
> >>     if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
> >>         RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
> >>             priv_size);
> >>         rte_errno = EINVAL;
> >>         return NULL;
> >>     }
> >>     elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
> >>         (unsigned)data_room_size;
> >>     mbp_priv.mbuf_data_room_size = data_room_size;
> >>     mbp_priv.mbuf_priv_size = priv_size;
> >>
> >>     for (op = &ops[0]; *op != NULL; op++) {
> >>         mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
> >>             sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
> >>         if (mp == NULL)
> >>             return NULL;
> >>
> >>         ret = rte_mempool_set_ops_byname(mp, *op, NULL);
> >>         if (ret != 0) {
> >>             RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
> >>             rte_mempool_free(mp);
> >>             if (ret == -ENOTSUP)
> >>                 continue;
> >>             rte_errno = -ret;
> >>             return NULL;
> >>         }
> >>         rte_pktmbuf_pool_init(mp, &mbp_priv);
> >>
> >>         ret = rte_mempool_populate_default(mp);
> >>         if (ret < 0) {
> >>             rte_mempool_free(mp);
> >>             if (ret == -ENOTSUP)
> >>                 continue;
> >>             rte_errno = -ret;
> >>             return NULL;
> >>         }
> >>     }
> >>
> >>     rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
> >>
> >>     return mp;
> >> }
> >>
> >>
> > [Hemant]  This look fine to me. Please submit a patch for the same.
> >
> >>>>> I've just submitted an RFC, which I think is quite linked:
> >>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
> >>>>> Assuming a new parameter "mempool_ops" is added to
> >>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
> >>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application
> >>>>> could do the job and decide which sw fallback to use.
> >>>>
> >>>> We ran into this issue when trying to run the standard DPDK
> >>>> examples
> >>>> (l3fwd) in VM. Do you think, is it practical to add fallback
> >>>> handling in each of the DPDK examples?
> >>
> >> OK. What is still unclear for me, is how the software is aware of the
> >> different hardware-assisted handlers. Moreover, we could imagine more
> >> software handlers, which could be used depending on the use case.
> >>
> >> I think this choice has to be made by the user or the application:
> >>
> >> - the application may want to use a specific (sw or hw) handler: in
> >>   this case, it want to be notified if it fails, instead of having
> >>   a quiet fallback to ring_mp_mc
> >> - if several handlers are available, the application may want to
> >>   try them in a specific order
> >> - maybe some handlers will have some limitations with some
> >>   configurations or driver? The application could decide to use
> >>   a different handler according the configuration.
> >>
> >> So that's why I think this is an application decision.
> >>
> > [Hemant]  We should simplify it:  if the application has supplied the handler, it
> is application's responsibility to take care of failure. Only if the application want
> to use the default handler, the implementation can fallback.  The fallback
> handler can again be configurable.
> 
> Honestly, I'm not very convinced that having a quiet fallback is the proper
> solution: the application won't be notified that the fallback occured.
> 
> I understand your patch solves your use-case, but my fear is that we just
> integrate this minimal patch and never do the harder work of solving all the use-
> cases. I think we need to give the tools to the applications to control what
> occurs because we also need to solve these issues:
> - you have a hardware handler, but you want to use the software
>   handler
> - you have several software handlers, and you (as an administrator) or
>   the application knows which one is the best to use in your case.
> 
> An alternative (which I don't like either) to solve your issue without modifying
> the mbuf code is to have your own handler fallbacking to ring_mp_mc.
> 
[Hemant] are you suggesting that we also configure a fallback handler in addition to default? 

I agree that we need to provide full flexibility to the applications, who are willing to implement the fallback mechanism. 

> 
> >>>> Typically when someone is writing a application on host, he need
> >>>> not worry non-availability of the hw offloaded mempool. He may also
> >>>> want to run the same binary in virtual machine. In VM, it is not
> >>>> guaranteed that hw offloaded mempools will be available.
> >>
> >> Running the same binary is of course a need. But if your VM does not
> >> provide the same virtualized hardware than the host, I think the
> >> command line option makes sense.
> >>
> >> AFAIU, on the host, you can use a hw mempool handler or a sw one, and
> >> on the guest, only a sw one. So you need an option to select the
> >> behavior you want on the host, without recompiling.
> >>
> >> I understand that modifying all the applications is not a good option
> >> either. I'm thinking a a EAL parameter that would allow to configure
> >> a library (mbuf in this case). Something like kernel boot options.
> >> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]
> >>
> >> I don't know if this is feasible, this has to be discussed first, but
> >> what do you think on the principle? The value could also be an
> >> ordered list if we want a fallback, and this option could be
> >> overriden by the application before creating the mbuf pool. There was
> >> some discussions about a kind of dpdk global configuration some time ago.
> >>
> > [Hemant] I agree that command line option provide a better control in this
> case.
> > On the flipside, We need to be careful that we do not end up having too many
> command line options.
> 
> Yes, we should avoid having too many command line arguments.
> I think we should go in the direction of having a sort of key/value database in
> EAL. It could be set by:
> - a generic command line argument
> - a config file (maybe later)
> - the application through an API
> 
> Then, it would be used by:
> - dpdk libraries
> - the application (maybe)
> 
> This has been discussed some times, the latest was probably:
> http://dpdk.org/ml/archives/dev/2016-June/040079.html
> 
> I think it would be a good tool for dpdk configurability, and it would help to
> remove some compile-time options and maybe some eal options.
> Unfortunately, I don't have the time to work on this at the moment.
> 
> Regards,
> Olivier



More information about the dev mailing list