[dpdk-stable] [dpdk-dev] [PATCH 0/4] Extend --lcores to run on cores > RTE_MAX_LCORE

Stephen Hemminger stephen at networkplumber.org
Fri Feb 21 17:38:28 CET 2020


On Fri, 21 Feb 2020 09:48:58 -0500
Aaron Conole <aconole at redhat.com> wrote:

> David Marchand <david.marchand at redhat.com> writes:
> 
> > On Fri, Feb 21, 2020 at 9:19 AM Song, Keesang <Keesang.Song at amd.com> wrote:  
> >>
> >> [AMD Official Use Only - Internal Distribution Only]  
> >
> > Please, get this header removed.
> > This is a public mailing list.
> >
> >  
> >> Thanks Thomas for bringing this up.
> >> I consider this is not a new feature, but rather a fix to address
> >> the issue with statically assigned maximum lcore limit on
> >> high-density CPU platform such as AMD Epyc.
> >> As I see a lot of DPDK adopters are still using LTS 18.11 & 19.11,
> >> and they have 1~2 yrs of lifetime left, we like to backport this to
> >> LTS 18.11 & 19.11 at least.  
> >
> > It is not a fix.
> >
> > The use of static arrays is a design choice that goes back to the
> > early days in dpdk.
> > The addition of --lcores came in after this, but it was introduced for
> > a different use case than placing lcores on any physical core.
> > And there was no claim that a core > RTE_MAX_LCORE would be usable.
> >
> >
> > When testing on a new hardware, it is normal to observe some limitations.
> > Running DPDK on those platforms should be possible: "should be"
> > because I do not have access to this hardware and saw neither tests
> > reports nor performance numbers.
> > Before this patch, the limitation is that on Epyc, cores >
> > RTE_MAX_LCORE are not usable.
> >
> >
> > Now, this change is quite constrained.
> > If we backport it, I don't expect issues in the main dpdk components
> > (based on code review and ovs tests with a RTE_MAX_LCORE set to 16 on
> > a 24 cores system).
> > There might be issues in some examples or not widely used library
> > which uses a physical core id instead of a lcore id.
> >
> >
> > This is the same recurring question "do we allow new features in a
> > stable branch?".  
> 
> Usually, the answer is 'no'.  But we do allow some "new" things to be
> backported (pci ids, etc) that might be required to enable older
> functionality.  Additionally, I'm sure if some feature were required to
> mitigate a CVE, we'd rather favor backporting it.
> 
> I guess we could pose a litmus test:
> 
>   1. Is the problem this feature solves so widespread that it needs to
>      be addressed ASAP?
>   2. Is there a known workaround to the problem this is solving?
>   3. How intrusive is the feature?
>   4. Is it shown to be stable in the mainline (number of fixes, testing,
>      etc)?
>   5. Is it constrained enough that we know we can support it with even
>      higher priority than other things?
> 
> Probably other questions that will need to be asked.
> 
> And even in that list of question, I'm not sure I'd be able to advocate
> backporting this in the upstream branches - it hasn't had much testing.
> It's unstable.  It's "difficult" to use.  It is not widespread that
> people have so many cores.  The workaround is much simpler than
> supporting this (recompile).
> 
> >
> > --
> > David Marchand  
> 

RTE_MAX_LCORES is exposed in API/ABI to application.
Many applications use that to size internal data structures.
Having rte_lcore_id() potentially return a larger value would cause
out of bounds access (and crash) in that application.


More information about the stable mailing list