[dpdk-stable] [PATCH v6 1/1] fbarray: fix duplicated fbarray file in secondary

David Marchand david.marchand at redhat.com
Tue Nov 5 11:13:57 CET 2019


Hello Anatoly, Yasufumi,

On Mon, Nov 4, 2019 at 11:20 AM Burakov, Anatoly
<anatoly.burakov at intel.com> wrote:
>
> On 01-Nov-19 9:04 AM, yasufum.o at gmail.com wrote:
> > From: Yasufumi Ogawa <ogawa.yasufumi at lab.ntt.co.jp>
> >
> > In secondary_msl_create_walk(), it creates a file for fbarrays with its
> > PID for reserving unique name among secondary processes. However, it
> > does not work if several secondaries run as app containers because each
> > of containerized secondary has PID 1, and failed to reserve unique name
> > other than first one. To reserve unique name in each of containers, use
> > hostname in addition to PID.
> >
> > Cc: stable at dpdk.org

We can't backport this as is, see below.


> >
> > Signed-off-by: Yasufumi Ogawa <yasufum.o at gmail.com>
> > ---
> >   lib/librte_eal/common/include/rte_fbarray.h |  2 +-
> >   lib/librte_eal/linux/eal/eal_memalloc.c     | 11 ++++++++---
> >   2 files changed, 9 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/include/rte_fbarray.h b/lib/librte_eal/common/include/rte_fbarray.h
> > index 6dccdbec9..5c2815093 100644
> > --- a/lib/librte_eal/common/include/rte_fbarray.h
> > +++ b/lib/librte_eal/common/include/rte_fbarray.h
> > @@ -39,7 +39,7 @@ extern "C" {
> >   #include <rte_compat.h>
> >   #include <rte_rwlock.h>
> >
> > -#define RTE_FBARRAY_NAME_LEN 64
> > +#define RTE_FBARRAY_NAME_LEN NAME_MAX

The change on RTE_FBARRAY_NAME_LEN breaks the ABI, so we cannot
backport this as is.
For 19.11, we can allow this breakage, but we need an update of the
release notes.

Besides, what is the impact in terms of memory consumption?


> >
> >   struct rte_fbarray {
> >       char name[RTE_FBARRAY_NAME_LEN]; /**< name associated with an array */
> > diff --git a/lib/librte_eal/linux/eal/eal_memalloc.c b/lib/librte_eal/linux/eal/eal_memalloc.c
> > index af6d0d023..24f0275c9 100644
> > --- a/lib/librte_eal/linux/eal/eal_memalloc.c
> > +++ b/lib/librte_eal/linux/eal/eal_memalloc.c
> > @@ -1365,6 +1365,7 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
> >       struct rte_memseg_list *primary_msl, *local_msl;
> >       char name[PATH_MAX];
> >       int msl_idx, ret;
> > +     char hostname[HOST_NAME_MAX] = { 0 };
> >
> >       if (msl->external)
> >               return 0;
> > @@ -1373,9 +1374,13 @@ secondary_msl_create_walk(const struct rte_memseg_list *msl,
> >       primary_msl = &mcfg->memsegs[msl_idx];
> >       local_msl = &local_memsegs[msl_idx];
> >
> > -     /* create distinct fbarrays for each secondary */
> > -     snprintf(name, RTE_FBARRAY_NAME_LEN, "%s_%i",
> > -             primary_msl->memseg_arr.name, getpid());
> > +     /* Create distinct fbarrays for each secondary by using PID and
> > +      * hostname. The reason why using hostname is because PID could be
> > +      * duplicated among secondaries if it is launched in a container.
> > +      */
> > +     gethostname(hostname, HOST_NAME_MAX);

Personal preference, s/HOST_NAME_MAX/sizeof(hostname)/.


hostname[] is HOST_NAME_MAX bytes long.
In the worst case, we can get a non NULL terminated hostname string.
"
       gethostname() returns the null-terminated hostname in the
character array name, which has a length of len bytes.  If the
null-terminated hostname is too large to fit, then the name is
truncated, and
       no error is returned (but see NOTES below).  POSIX.1-2001 says
that if such truncation occurs, then it is unspecified whether the
returned buffer includes a terminating null byte.
...
NOTES
       SUSv2 guarantees that "Host names are limited to 255 bytes".
POSIX.1-2001 guarantees that "Host names (not including the
terminating null byte) are  limited  to  HOST_NAME_MAX  bytes".   On
Linux,
       HOST_NAME_MAX is defined with the value 64, which has been the
limit since Linux 1.0 (earlier kernels imposed a limit of 8 bytes).
"

How about making hostname[] HOST_NAME_MAX+1 bytes long?

> > +     snprintf(name, RTE_FBARRAY_NAME_LEN, "%s_%s_%d",
> > +                     primary_msl->memseg_arr.name, hostname, (int)getpid());
> >
> >       ret = rte_fbarray_init(&local_msl->memseg_arr, name,
> >               primary_msl->memseg_arr.len,
> >
>
> I think the order should be reversed. Both containers and non-containers
> can have their hostname set, and RTE_FBARRAY_NAME_LEN is of fairly
> limited length, so if the hostname is long enough, the PID never gets
> into the name string, resulting in duplicates. It is better have pid first.

Anatoly,

On the principle, it seems better, yes.
Just the comment on RTE_FBARRAY_NAME_LEN indicates that you missed the
change at the top of the patch.
What do you think of this change?


-- 
David Marchand



More information about the stable mailing list