[dpdk-dev] [PATCH v4] eal: add cache-line demote support

Maslekar, Omkar omkar.maslekar at intel.com
Tue Sep 22 23:53:25 CEST 2020


Hi Bruce,

My comments are inline

 >-----Original Message-----
 >From: Bruce Richardson <bruce.richardson at intel.com>
 >Sent: Tuesday, September 22, 2020 1:28 AM
 >To: Maslekar, Omkar <omkar.maslekar at intel.com>
 >Cc: dev at dpdk.org; Loftus, Ciara <ciara.loftus at intel.com>
 >Subject: Re: [PATCH v4] eal: add cache-line demote support
 >
 >On Mon, Sep 21, 2020 at 06:59:27PM -0700, Omkar Maslekar wrote:
 >> rte_cldemote is similar to a prefetch hint - in reverse.
 >> cldemote(addr) enables software to hint to hardware that line is likely to be
 >shared.
 >> Useful in core-to-core communications where cache-line is likely to be
 >> shared. ARM and PPC implementation is provided with NOP and can be
 >> added if any equivalent instructions could be used for implementation
 >> on those architectures.
 >>
 >> Signed-off-by: Omkar Maslekar <omkar.maslekar at intel.com>
 >>
 >Few minor suggestions below. With those fixed, feel free to add my ack to
 >future versions of this patch.
 >
 >Acked-by: Bruce Richardson <bruce.richardson at intel.com>
 >
 >> ---
 >> v4: updated bold text for title and fixed margin in release notes
 >> *
 >> v3: fixed warning regarding whitespace
 >> *
 >> v2: documentation updated
 >> ---
 >> ---
 >>  doc/guides/rel_notes/release_20_11.rst        |  6 ++++++
 >>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 >> lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 >> lib/librte_eal/include/generic/rte_prefetch.h | 13 +++++++++++++
 >>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 >>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 >>  6 files changed, 43 insertions(+)
 >>
 >> diff --git a/doc/guides/rel_notes/release_20_11.rst
 >> b/doc/guides/rel_notes/release_20_11.rst
 >> index df227a1..b844b96 100644
 >> --- a/doc/guides/rel_notes/release_20_11.rst
 >> +++ b/doc/guides/rel_notes/release_20_11.rst
 >> @@ -55,6 +55,12 @@ New Features
 >>       Also, make sure to start the actual text at the margin.
 >>       =======================================================
 >>
 >> +* **Added new function rte_cldemote in rte_prefetch.h.**
 >> +
 >> +  Added a hardware hint CLDEMOTE, which is similar to prefetch in
 >reverse.
 >> +  CLDEMOTE moves the cache line to the more remote cache, where it
 >> + expects  sharing to be efficient. Moving the cache line to a level
 >> + more distant from  the processor helps to accelerate core-to-core
 >communication.
 >>
 >
 >I think you need two blank lines between sections here, not just one.
[om] you are right, I will fix in v5. 
 >
 >>  Removed Items
 >>  -------------
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> index e53420a..ad91edd 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> index fc2b391..35d278a 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
 >> b/lib/librte_eal/include/generic/rte_prefetch.h
 >> index 6e47bdf..8742412 100644
 >> --- a/lib/librte_eal/include/generic/rte_prefetch.h
 >> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
 >> @@ -51,4 +51,17 @@
 >>   */
 >>  static inline void rte_prefetch_non_temporal(const volatile void *p);
 >>
 >> +/**
 >> + * Demote a cache line to a more distant level of cache from the
 >processor.
 >> + *
 >> + * CLDEMOTE hints to hardware to move (demote) a cache line from the
 >> +closest to
 >> + * the processor to a level more distant from the processor. It is a
 >> +hint and
 >> + * not guarantee. rte_cldemote is intended to speed up things at the
 >> +producer,
 >> + * in the producer-consumer case.
 >> + *
 >
 >Two thoughts here:
 >1. Is it not more the consumer who benefits more since they are the ones
 >receiving the demoted value, while the producer pays a higher cost since
 >they have to demote the value on send?

[OM] CLDEMOTE benefits the consumer. My statement "speed up things at the producer" indicate proximity where the distance is reduced.
But I will make it simple and more readable. 

 >2. Rather than talking about producer consumer case specifically, I think it
 >would be good to replace the last sentence with what you have in the cover
 >letter about it being for sharing, and to indicate that a line may be accessed
 >by a different core in the future.

 
[OM] Good point, there could be many other cores that can benefit instead of just a single consumer. I will update this.

 >
 >> + * @param p
 >> + *   Address to demote
 >> + */
 >> +static inline void rte_cldemote(const volatile void *p);
 >> +
 >>  #endif /* _RTE_PREFETCH_H_ */
 >> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h
 >> b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> index 9ba07c8..3fe9655 100644
 >> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
 >> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/x86/include/rte_prefetch.h
 >> b/lib/librte_eal/x86/include/rte_prefetch.h
 >> index 384c6b3..029d06e 100644
 >> --- a/lib/librte_eal/x86/include/rte_prefetch.h
 >> +++ b/lib/librte_eal/x86/include/rte_prefetch.h
 >> @@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char
 >> *)p));  }
 >>
 >> +/*
 >> + * we're using raw byte codes for now as only the newest compiler
 >> + * versions support this instruction natively.
 >> + */
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p)); }
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> --
 >> 1.8.3.1
 >>


More information about the dev mailing list