[dpdk-dev] [PATCH v4] eal: add cache-line demote support
Maslekar, Omkar
omkar.maslekar at intel.com
Tue Sep 22 23:53:25 CEST 2020
Hi Bruce,
My comments are inline.
>-----Original Message-----
>From: Bruce Richardson <bruce.richardson at intel.com>
>Sent: Tuesday, September 22, 2020 1:28 AM
>To: Maslekar, Omkar <omkar.maslekar at intel.com>
>Cc: dev at dpdk.org; Loftus, Ciara <ciara.loftus at intel.com>
>Subject: Re: [PATCH v4] eal: add cache-line demote support
>
>On Mon, Sep 21, 2020 at 06:59:27PM -0700, Omkar Maslekar wrote:
>> rte_cldemote is similar to a prefetch hint - in reverse.
>> cldemote(addr) enables software to hint to hardware that line is likely to be
>> shared.
>> Useful in core-to-core communications where cache-line is likely to be
>> shared. ARM and PPC implementation is provided with NOP and can be
>> added if any equivalent instructions could be used for implementation
>> on those architectures.
>>
>> Signed-off-by: Omkar Maslekar <omkar.maslekar at intel.com>
>>
>Few minor suggestions below. With those fixed, feel free to add my ack to
>future versions of this patch.
>
>Acked-by: Bruce Richardson <bruce.richardson at intel.com>
>
>> ---
>> v4: updated bold text for title and fixed margin in release notes
>> *
>> v3: fixed warning regarding whitespace
>> *
>> v2: documentation updated
>> ---
>> ---
>> doc/guides/rel_notes/release_20_11.rst | 6 ++++++
>> lib/librte_eal/arm/include/rte_prefetch_32.h | 5 +++++
>> lib/librte_eal/arm/include/rte_prefetch_64.h | 5 +++++
>> lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
>> lib/librte_eal/ppc/include/rte_prefetch.h | 5 +++++
>> lib/librte_eal/x86/include/rte_prefetch.h | 9 +++++++++
>> 6 files changed, 43 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/release_20_11.rst
>> b/doc/guides/rel_notes/release_20_11.rst
>> index df227a1..b844b96 100644
>> --- a/doc/guides/rel_notes/release_20_11.rst
>> +++ b/doc/guides/rel_notes/release_20_11.rst
>> @@ -55,6 +55,12 @@ New Features
>> Also, make sure to start the actual text at the margin.
>> =======================================================
>>
>> +* **Added new function rte_cldemote in rte_prefetch.h.**
>> +
>> + Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
>> + CLDEMOTE moves the cache line to the more remote cache, where it
>> + expects sharing to be efficient. Moving the cache line to a level
>> + more distant from the processor helps to accelerate core-to-core communication.
>>
>
>I think you need two blank lines between sections here, not just one.
[OM] You are right, I will fix this in v5.
>
>> Removed Items
>> -------------
>> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
>> b/lib/librte_eal/arm/include/rte_prefetch_32.h
>> index e53420a..ad91edd 100644
>> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
>> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
>> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>> rte_prefetch0(p);
>> }
>>
>> +static inline void rte_cldemote(const volatile void *p)
>> +{
>> + RTE_SET_USED(p);
>> +}
>> +
>> #ifdef __cplusplus
>> }
>> #endif
>> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
>> b/lib/librte_eal/arm/include/rte_prefetch_64.h
>> index fc2b391..35d278a 100644
>> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
>> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
>> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>> asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>> }
>>
>> +static inline void rte_cldemote(const volatile void *p)
>> +{
>> + RTE_SET_USED(p);
>> +}
>> +
>> #ifdef __cplusplus
>> }
>> #endif
>> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
>> b/lib/librte_eal/include/generic/rte_prefetch.h
>> index 6e47bdf..8742412 100644
>> --- a/lib/librte_eal/include/generic/rte_prefetch.h
>> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
>> @@ -51,4 +51,17 @@
>> */
>> static inline void rte_prefetch_non_temporal(const volatile void *p);
>>
>> +/**
>> + * Demote a cache line to a more distant level of cache from the processor.
>> + *
>> + * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
>> + * the processor to a level more distant from the processor. It is a hint and
>> + * not guarantee. rte_cldemote is intended to speed up things at the producer,
>> + * in the producer-consumer case.
>> + *
>
>Two thoughts here:
>1. Is it not more the consumer who benefits more since they are the ones
>receiving the demoted value, while the producer pays a higher cost since
>they have to demote the value on send?
[OM] CLDEMOTE benefits the consumer. My statement "speed up things at the producer" was meant to indicate proximity, in that the distance is reduced.
But I will make it simpler and more readable.
>2. Rather than talking about producer consumer case specifically, I think it
>would be good to replace the last sentence with what you have in the cover
>letter about it being for sharing, and to indicate that a line may be accessed
>by a different core in the future.
[OM] Good point, many other cores could benefit, not just a single consumer. I will update this.
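For illustration only, here is a minimal sketch of the sharing case discussed above: one core fills a cache-line-sized slot and then hints that the line may soon be accessed by a different core. It does not use DPDK. demo_cldemote(), demo_publish(), struct demo_msg and DEMO_CACHE_LINE are hypothetical names made up for this sketch, the 64-byte line size is an assumption, and the "memory" clobber is an addition that is not in the patch. The byte sequence is the one from the x86 hunk of the patch, and other architectures get a no-op, as in the ARM and PPC hunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CACHE_LINE 64 /* assumed cache-line size, not taken from the patch */

/*
 * Hypothetical stand-in for rte_cldemote(): same raw bytes as the x86 hunk
 * of the patch, no-op on other architectures. The "memory" clobber is an
 * assumption added here so the compiler keeps earlier stores before the hint.
 */
static inline void demo_cldemote(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p) : "memory");
#else
	(void)p;
#endif
}

/* One cache-line-sized, cache-line-aligned slot shared between cores. */
struct demo_msg {
	_Alignas(DEMO_CACHE_LINE) uint8_t payload[DEMO_CACHE_LINE];
};

/*
 * Writer side: copy the data into the slot, then hint that the line is
 * likely to be read by another core. The hint may be ignored by hardware,
 * so correctness must never depend on it.
 */
static inline void demo_publish(struct demo_msg *slot, const void *data,
				size_t len)
{
	assert(len <= sizeof(slot->payload));
	memcpy(slot->payload, data, len);
	demo_cldemote(slot);
}
```

A writer would call demo_publish(&slot, buf, len) and then signal readers through whatever synchronization it already uses. The demote is only a hint, so it replaces neither that synchronization nor any memory barrier.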
>
>> + * @param p
>> + * Address to demote
>> + */
>> +static inline void rte_cldemote(const volatile void *p);
>> +
>> #endif /* _RTE_PREFETCH_H_ */
>> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h
>> b/lib/librte_eal/ppc/include/rte_prefetch.h
>> index 9ba07c8..3fe9655 100644
>> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
>> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
>> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>> rte_prefetch0(p);
>> }
>>
>> +static inline void rte_cldemote(const volatile void *p)
>> +{
>> + RTE_SET_USED(p);
>> +}
>> +
>> #ifdef __cplusplus
>> }
>> #endif
>> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h
>> b/lib/librte_eal/x86/include/rte_prefetch.h
>> index 384c6b3..029d06e 100644
>> --- a/lib/librte_eal/x86/include/rte_prefetch.h
>> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
>> @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>> asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
>> }
>>
>> +/*
>> + * we're using raw byte codes for now as only the newest compiler
>> + * versions support this instruction natively.
>> + */
>> +static inline void rte_cldemote(const volatile void *p)
>> +{
>> + asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
>> +}
>> +
>> #ifdef __cplusplus
>> }
>> #endif
>> --
>> 1.8.3.1
>>