[PATCH v2 2/2] eal: remove NUMFLAGS enumeration

Ferruh Yigit ferruh.yigit at amd.com
Wed Sep 27 16:09:29 CEST 2023


On 9/27/2023 2:48 PM, Stanisław Kardach wrote:
> On Wed, Sep 27, 2023 at 1:55 PM Ferruh Yigit <ferruh.yigit at amd.com> wrote:
>>
>> On 9/21/2023 3:49 PM, Stanisław Kardach wrote:
>>> On Thu, Sep 21, 2023, 15:18 Tummala, Sivaprasad
>>> <Sivaprasad.Tummala at amd.com> wrote:
>>>
>>>     [AMD Official Use Only - General]
>>>
>>>     > -----Original Message-----
>>>     > From: David Marchand <david.marchand at redhat.com>
>>>     > Sent: Wednesday, September 20, 2023 1:05 PM
>>>     > To: Stanisław Kardach <kda at semihalf.com>; Tummala, Sivaprasad
>>>     > <Sivaprasad.Tummala at amd.com>
>>>     > Cc: Ruifeng Wang <ruifeng.wang at arm.com>; Min Zhou <zhoumin at loongson.cn>;
>>>     > David Christensen <drc at linux.vnet.ibm.com>; Bruce Richardson
>>>     > <bruce.richardson at intel.com>; Konstantin Ananyev
>>>     > <konstantin.v.ananyev at yandex.ru>; dev <dev at dpdk.org>; Yigit, Ferruh
>>>     > <Ferruh.Yigit at amd.com>; Thomas Monjalon <thomas at monjalon.net>
>>>     > Subject: Re: [PATCH v2 2/2] eal: remove NUMFLAGS enumeration
>>>     >
>>>     >
>>>     >
>>>     > On Wed, Sep 20, 2023 at 8:01 AM Stanisław Kardach
>>>     <kda at semihalf.com> wrote:
>>>     > >
>>>     > > On Tue, Sep 19, 2023 at 4:47 PM David Marchand
>>>     > <david.marchand at redhat.com> wrote:
>>>     > > <snip>
>>>     > > > > Also I see you're still removing the RTE_CPUFLAG_NUMFLAGS
>>>     (what I call a
>>>     > last element canary). Why? If you're concerned with ABI, then
>>>     we're talking about
>>>     > an application linking dynamically with DPDK or talking via some
>>>     RPC channel with
>>>     > another DPDK application. So clashing with this definition does
>>>     not come into
>>>     > question. One should rather use rte_cpu_get_flag_enabled().
>>>     > > > > Also if you want to introduce new features, one would add
>>>     them to the
>>>     > rte_cpuflags headers, unless you'd like to not add those and keep an
>>>     > undocumented list "above" the last defined element.
>>>     > > > > Could you explain a bit more Your use-case?
>>>     > > >
>>>     > > > Hey Stanislaw,
>>>     > > >
>>>     > > > Talking generically, one problem with such pattern (having a LAST,
>>>     > > > or MAX enum) is when an array sized with such a symbol is exposed.
>>>     > > > As I mentioned in the past, this can have unwanted effects:
>>>     > > >
>>>     https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493
>>>     <https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493>
>>>     > > > -1-david.marchand at redhat.com/
>>>     <http://1-david.marchand@redhat.com/>
>>>     >
>>>     > Argh... who broke copy/paste in my browser ?!
>>>     > Wrt to MAX and arrays, I wanted to point at:
>>>     >
>>>     http://inbox.dpdk.org/dev/CAJFAV8xs5CVdE2xwRtaxk5vE_PiQMV5LY5tKStk3R1gOuR <http://inbox.dpdk.org/dev/CAJFAV8xs5CVdE2xwRtaxk5vE_PiQMV5LY5tKStk3R1gOuR>
>>>     > TsUw at mail.gmail.com/ <http://TsUw@mail.gmail.com/>
>>>     >
>>>     > > I agree, though I'd argue "LAST" and "MAX" semantics are a bit
>>>     different. "LAST"
>>>     > delimits the known enumeration territory while "MAX" is more of a
>>>     `constexpr`
>>>     > value type.
>>>     > > >
>>>     > > > Another issue is when an existing enum meaning changes: from the
>>>     > > > application pov, the (old) MAX value is incorrect, but for the
>>>     > > > library pov, a new meaning has been associated.
>>>     > > > This may trigger bugs in the application when calling a function
>>>     > > > that returns such an enum which never returned this MAX value in
>>>     the past.
>>>     > > >
>>>     > > > For at least those two reasons, removing those canary elements is
>>>     > > > being done in DPDK.
>>>     > > >
>>>     > > > This specific removal has been announced:
>>>     > > >
>>>     https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493
>>>     <https://patchwork.dpdk.org/project/dpdk/patch/20230919140430.3251493>
>>>     > > > -1-david.marchand at redhat.com/
>>>     <http://1-david.marchand@redhat.com/>
>>>     > > Thanks for pointing this out but did you mean to link to the
>>>     patch again here?
>>>     >
>>>     > Sorry, same here, bad copy/paste :-(.
>>>     >
>>>     > The intended link is:
>>>     https://git.dpdk.org/dpdk/commit/?id=5da7c13521
>>>     > The deprecation notice was badly formulated and this patch here is
>>>     consistent with
>>>     > it.
>>>     >
>>>     >
>>>     > > >
>>>     > > > Now, practically, when I look at the cpuflags API, I don't see us
>>>     > > > exposed to those two issues wrt rte_cpu_flag_t, so maybe this
>>>     change
>>>     > > > is unneeded.
>>>     > > > But on the other hand, is it really an issue for an application to
>>>     > > > lose this (internal) information?
>>>     > > I doubt it, maybe it could be used as a sanity check for
>>>     choosing proper functors
>>>     > in the application. Though the initial description of the reason
>>>     behind this patch was
>>>     > to not break the ABI and I don't think it does that. What it does
>>>     is enforces users to
>>>     > use explicit cpu flag values which is a good thing. Though if so,
>>>     then it should be
>>>     > stated in the commit description.
>>>     >
>>>     > I agree.
>>>     > Siva, can you work on a new revision?
>>>     >
>>>     David, Stanislaw,
>>>
>>>     The original motivation of this patch was to avoid ABI breakage with
>>>     the introduction of new CPU flag
>>>     "RTE_CPUFLAG_MONITORX"
>>>     (http://mails.dpdk.org/archives/test-report/2023-April/382489.html).
>>>
>>>     Because of ABI breakage, the feature was postponed to this release.
>>>     https://patchwork.dpdk.org/project/dpdk/patch/20230413115334.43172-3-sivaprasad.tummala@amd.com/
>>>
>>> This test is flawed: NUMFLAGS should be treated as a canary rather than
>>> as a flag value, and the test does not take that into account.
>>>
>>
>> Hi Stanislaw,
>>
>> Why is the test flawed?
>>
>> The enum is in the public header, and so is the 'RTE_CPUFLAG_NUMFLAGS'
>> enum item. There are APIs using the enum, so the enum is exchanged
>> between the shared library and the application.
> In a similar way lots of Linux uapi headers contain bits that should
> not be used directly, even though they are defined there. The reason
> for that is the C language syntax, not necessarily the intent of a
> developer.
> Since NUMFLAGS was a canary to make the flag handling code easier, it
> should not be treated as a "real" value and hence my suggestion of a
> flawed test. That said, NUMFLAGS does not bring enough value to not
> remove it. :)
>

Both: it doesn't bring enough value to keep, and we have no control over
how the application uses it once the library exposes it.


>>
>> A similar thing was discussed before: when an enum is exchanged between
>> the application and a shared library, there is an ABI breakage risk when
>> the enum is extended, and the general tendency is to eliminate the MAX
>> value to reduce the risk.
> Agreed, though as I have mentioned before, "MAX" has different
> semantics than "NUM". Then again, since we have rte_cpu_feature_table,
> we can use RTE_DIM to check the user input.
>

I think the usage of and the intention behind both are the same. Can you
please elaborate on the difference between MAX and NUM enum items that
are added as the last item in an enum?
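
To make the RTE_DIM suggestion concrete, here is a minimal sketch of a
library validating a flag value against the size of its own private
table instead of against a public NUM/MAX enum item. All names below
(demo_flag, DEMO_DIM, demo_feature_table, demo_flag_name) are
hypothetical stand-ins for illustration and are not the DPDK
definitions; DEMO_DIM only mimics the idea of RTE_DIM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CPU flag enum; note there is no NUM/MAX entry. */
enum demo_flag {
	DEMO_FLAG_A = 0,
	DEMO_FLAG_B,
	DEMO_FLAG_C,
};

/* Same idea as RTE_DIM: number of elements of a real array. */
#define DEMO_DIM(a) (sizeof(a) / sizeof((a)[0]))

/* Library-private table; its size is never visible to the application. */
static const char *const demo_feature_table[] = {
	[DEMO_FLAG_A] = "A",
	[DEMO_FLAG_B] = "B",
	[DEMO_FLAG_C] = "C",
};

/* Build-time check that the table covers every flag defined so far. */
static_assert(DEMO_DIM(demo_feature_table) == DEMO_FLAG_C + 1,
	      "table must cover every flag");

/* Return the flag name, or NULL when the value is outside the table. */
static const char *
demo_flag_name(unsigned int flag)
{
	if (flag >= DEMO_DIM(demo_feature_table))
		return NULL;
	return demo_feature_table[flag];
}
```

With this shape the bound lives only inside the library, so appending a
flag changes the table and the check together and nothing about the
limit is compiled into the application.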


>>
>>
>> When an enum value is sent from the library to the application, it is
>> clearer that this can cause an ABI breakage, because the application can
>> receive a value that it was not aware of at build time, which can cause
>> unexpected behavior.
>> Simply think about a case where the application allocated an array of
>> 'RTE_CPUFLAG_NUMFLAGS' size and directly accesses the array index based
>> on the returned enum item value: if the enum is extended in the new
>> version of the shared library, this can cause an invalid memory access
>> in the application.
> Using the NUM enum element (which serves as a last item canary) to
> size an array is not a good idea unless it's returned from a runtime
> call. Otherwise one hits issues that you've described.
>

I agree :), but it is a way to describe how this can be a problem.
Also, last time I argued something similar to what you said, that the
application should check against the MAX value before using it, but I
was told not to assume what the application does. My takeaway is: as a
library-side developer, expect the worst from the application.
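
As a rough illustration of the array case, assuming a hypothetical enum
with three flags and a canary (app_view_flag, LIB_NEW_FLAG_D, hits and
count_flag are made-up names, not DPDK symbols): the application sizes
an array with the canary it saw at build time, while a newer library
may hand back a flag whose value equals that old canary.

```c
#include <assert.h>

/* Hypothetical enum as the application saw it at build time. */
enum app_view_flag {
	APP_FLAG_A = 0,
	APP_FLAG_B,
	APP_FLAG_C,
	APP_FLAG_NUMFLAGS, /* canary: 3 in the headers the app was built with */
};

/* Hypothetical value a newer shared library might return: a flag appended
 * after C, which takes the numeric slot the old canary used to occupy. */
#define LIB_NEW_FLAG_D 3

static_assert(LIB_NEW_FLAG_D == APP_FLAG_NUMFLAGS,
	      "new flag lands on the old canary value");

/* Application-side array sized with the build-time canary. */
static unsigned int hits[APP_FLAG_NUMFLAGS];

/* Count a flag returned by the library; return -1 instead of writing
 * past the end of hits[] when the value is unknown to this build. */
static int
count_flag(int flag_from_lib)
{
	if (flag_from_lib < 0 || flag_from_lib >= APP_FLAG_NUMFLAGS)
		return -1; /* without this check: out-of-bounds write */
	hits[flag_from_lib]++;
	return 0;
}
```

The range check is exactly the step the library cannot rely on the
application having; an application that indexes hits[] directly would
write one element past the array when it receives the new flag.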


>>
>> When an enum value is sent from the application to the library, I am not
>> quite sure how problematic it is, to be honest. That is the case for the
>> 'rte_cpu_get_flag_enabled()' & 'rte_cpu_get_flag_name()' APIs in question.
>> Only when the application sends 'RTE_CPUFLAG_NUMFLAGS' to
>> 'rte_cpu_get_flag_name()' does it expect NULL to be returned, but this
>> won't happen in the new version of the shared library; I am not sure if
>> this can cause any problem for the application.
>> But as I mentioned, the general guidance is to eliminate this kind of MAX
>> enum value usage.
>>
>>
>> And for this specific issue, although it is not clear whether the usage
>> of the enum in the 'rte_cpu_get_flag_enabled()' &
>> 'rte_cpu_get_flag_name()' APIs causes an ABI breakage, the enum being
>> embedded into 'struct rte_bbdev_driver_info' leaves no question: this
>> struct is returned from the library to the application, so a change in
>> the enum causes an ABI breakage.
> Enum size does not change irrespective of changing its values. So
> size-wise it's not an ABI breakage. Re-ordering values is an ABI
> breakage.
>

Agreed, it is not a size-wise issue. But it is still an issue.
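
A small sketch of the distinction, under the assumption of a made-up
enum and a made-up struct embedding it (flag_v1, flag_v2, info_v1,
info_v2 and layout_is_same are hypothetical, not the bbdev
definitions): on common compilers the struct layout is identical
before and after a flag is appended, yet the same numeric value means
something different to each side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical enum as seen by an application built against old headers. */
enum flag_v1 { V1_A, V1_B, V1_C, V1_NUMFLAGS };

/* Hypothetical enum in the newer library: D appended, canary shifted. */
enum flag_v2 { V2_A, V2_B, V2_C, V2_D, V2_NUMFLAGS };

/* Hypothetical info struct embedding the enum, one per version. */
struct info_v1 { int id; enum flag_v1 flag; };
struct info_v2 { int id; enum flag_v2 flag; };

/* Layout check: 1 when the size and member offset match in both versions. */
static int
layout_is_same(void)
{
	return sizeof(struct info_v1) == sizeof(struct info_v2) &&
	       offsetof(struct info_v1, flag) == offsetof(struct info_v2, flag);
}

/* Meaning check: the value the new library uses for D is the one the
 * old application knows as the canary. */
static_assert((int)V2_D == (int)V1_NUMFLAGS, "same value, different meaning");
```

So a size or offset comparison alone would pass here, while the
application could still misread a valid new flag as the "one past the
end" marker.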


>>
>>
>> Briefly, I think even appending to the end of 'enum rte_cpu_flag_t'
>> causes an ABI breakage, and removing 'RTE_CPUFLAG_NUMFLAGS' helps to
>> extend this enum in the future.
>> And an outstanding deprecation notice already exists for this:
>> https://git.dpdk.org/dpdk/tree/doc/guides/rel_notes/deprecation.rst?h=v23.07#n63
>>
>>
>>> Your change did not break the ABI because you have properly added the
>>> new flag at the end.
>>> So I would ask to change the commit description to mention that NUMFLAGS
>>> is removed to:
>>> 1. Prevent users from treating it as a usable value or an array size.
>>> 2. Prevent false-positive failures in the ABI test.
>>>
>>> Also it would be good to link to the aforementioned ABI test failure to
>>> give readers some context when inspecting the git tree.
>>>
>>>
>>>
>>>     Can you please add what exactly needs to be reworked in the new version.
>>>
>>>     >
>>>     > Thanks.
>>>     >
>>>     > --
>>>     > David Marchand
>>>
>>
> 
> 


