EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h

Message ID 1599700614-22809-2-git-send-email-omkar.maslekar@intel.com (mailing list archive)
State Superseded, archived
Delegated to: David Marchand
Headers
Series EAL: An addition of cache line demote (CLDEMOTE) in rte_prefetch.h |

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/iol-broadcom-Performance success Performance Testing PASS
ci/iol-broadcom-Functional success Functional Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/iol-intel-Performance success Performance Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-testing success Testing PASS
ci/iol-mellanox-Performance success Performance Testing PASS
ci/travis-robot success Travis build: passed

Commit Message

Omkar Maslekar Sept. 10, 2020, 1:16 a.m. UTC
  rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
enables software to hint to hardware that line is likely to be shared.
Useful in core-to-core communications where cache-line is likely to be
shared. ARM and PPC implementation is provided with NOP and can be added
if any equivalent instructions could be used for implementation on those
architectures.

Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
 lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
 lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 6 files changed, 35 insertions(+), 22 deletions(-)
  

Comments

Bruce Richardson Sept. 10, 2020, 8:55 a.m. UTC | #1
On Wed, Sep 09, 2020 at 06:16:54PM -0700, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> ---

Hi Omkar,

please see some review comments inline below.

Regards,
/Bruce

>  doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
>  lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
>  lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
>  6 files changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index df227a1..c4a4362 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -27,29 +27,11 @@ New Features
>  .. This section should contain new features added in this release.
>     Sample format:
>  
> -   * **Add a title in the past tense with a full stop.**
> +Added new instruction CLDEMOTE in rte_prefetch.h.

You need to prefix this with the library it is in, in this case EAL. Also,
since this is C code, you are adding a function, not an instruction.

>  
> -     Add a short 1-2 sentence description in the past tense.
> -     The description should be enough to allow someone scanning
> -     the release notes to understand the new feature.
> -
> -     If the feature adds a lot of sub-features you can use a bullet list
> -     like this:
> -
> -     * Added feature foo to do something.
> -     * Enhanced feature bar to do something else.
> -
> -     Refer to the previous release notes for examples.
> -
> -     Suggested order in release notes items:
> -     * Core libs (EAL, mempool, ring, mbuf, buses)
> -     * Device abstraction libs and PMDs
> -       - ethdev (lib, PMDs)
> -       - cryptodev (lib, PMDs)
> -       - eventdev (lib, PMDs)
> -       - etc
> -     * Other libs
> -     * Apps, Examples, Tools (if significant)

Don't remove these lines, they are all also part of the same comment as
below where it says "Do not overwrite or remove it" :-)

> +     Added a hardware hint CLDEMOTE which is similar to prefetch in reverse.
> +     CLDEMOTES moves the cache line to the last shared cache, where it expects
> +     sharing to be efficient.
>  

Reading the instruction description in the Intel instruction set reference,
it says about moving the cache line to a more remote cache-line, rather
than guaranteeing that it goes to the last level cache. Therefore, to
ensure compatiblity with the current spec and make it more flexible to meet
any other hardware implementations, I suggest changing the "last shared
cache ..." to "more remote cache where sharing may be more efficient".

>       This section is a comment. Do not overwrite or remove it.
>       Also, make sure to start the actual text at the margin.
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
> index e53420a..ad91edd 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	rte_prefetch0(p);
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
> index fc2b391..35d278a 100644
> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
>  }
>  
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>  #ifdef __cplusplus
>  }
>  #endif
> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
> index 6e47bdf..89ec69c 100644
> --- a/lib/librte_eal/include/generic/rte_prefetch.h
> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
> @@ -51,4 +51,11 @@
>   */
>  static inline void rte_prefetch_non_temporal(const volatile void *p);
>  
> +/**
> + * Demote a cache line into the last shared cache level.

Same comment as above. Since this will make it into the official API
doxygen documentation, I think a bit fuller of a description would be good
also.
  
David Christensen Sept. 10, 2020, 10:04 p.m. UTC | #2
On 9/9/20 6:16 PM, Omkar Maslekar wrote:
> rte_cldemote is similar to a prefetch hint - in reverse. cldemote(addr)
> enables software to hint to hardware that line is likely to be shared.
> Useful in core-to-core communications where cache-line is likely to be
> shared. ARM and PPC implementation is provided with NOP and can be added
> if any equivalent instructions could be used for implementation on those
> architectures.
> 
> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
> ---

...

> diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
> index 9ba07c8..3fe9655 100644
> --- a/lib/librte_eal/ppc/include/rte_prefetch.h
> +++ b/lib/librte_eal/ppc/include/rte_prefetch.h
> @@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
>   	rte_prefetch0(p);
>   }
> 
> +static inline void rte_cldemote(const volatile void *p)
> +{
> +	RTE_SET_USED(p);
> +}
> +
>   #ifdef __cplusplus
>   }
>   #endif

For POWER there's an instruction defined in the ISA which is the most 
similar, miso (i.e. Make It So), but the instruction is interpreted as a 
NOP in the POWER8/POWER9 CPUs, so NOP is the right choice for PPC at 
this time.

Dave
  
Omkar Maslekar Sept. 10, 2020, 11:30 p.m. UTC | #3
Hi Bruce,

 >-----Original Message-----
 >From: Bruce Richardson <bruce.richardson@intel.com>
 >Sent: Thursday, September 10, 2020 1:55 AM
 >To: Maslekar, Omkar <omkar.maslekar@intel.com>
 >Cc: dev@dpdk.org; Loftus, Ciara <ciara.loftus@intel.com>
 >Subject: Re: [PATCH] EAL: An addition of cache line demote (CLDEMOTE) in
 >rte_prefetch.h
 >
 >On Wed, Sep 09, 2020 at 06:16:54PM -0700, Omkar Maslekar wrote:
 >> rte_cldemote is similar to a prefetch hint - in reverse.
 >> cldemote(addr) enables software to hint to hardware that line is likely to be
 >shared.
 >> Useful in core-to-core communications where cache-line is likely to be
 >> shared. ARM and PPC implementation is provided with NOP and can be
 >> added if any equivalent instructions could be used for implementation
 >> on those architectures.
 >>
 >> Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
 >> ---
 >
 >Hi Omkar,
 >
 >please see some review comments inline below.
 >
 >Regards,
 >/Bruce
 >
 >>  doc/guides/rel_notes/release_20_11.rst        | 26 ++++----------------------
 >>  lib/librte_eal/arm/include/rte_prefetch_32.h  |  5 +++++
 >> lib/librte_eal/arm/include/rte_prefetch_64.h  |  5 +++++
 >> lib/librte_eal/include/generic/rte_prefetch.h |  7 +++++++
 >>  lib/librte_eal/ppc/include/rte_prefetch.h     |  5 +++++
 >>  lib/librte_eal/x86/include/rte_prefetch.h     |  9 +++++++++
 >>  6 files changed, 35 insertions(+), 22 deletions(-)
 >>
 >> diff --git a/doc/guides/rel_notes/release_20_11.rst
 >> b/doc/guides/rel_notes/release_20_11.rst
 >> index df227a1..c4a4362 100644
 >> --- a/doc/guides/rel_notes/release_20_11.rst
 >> +++ b/doc/guides/rel_notes/release_20_11.rst
 >> @@ -27,29 +27,11 @@ New Features
 >>  .. This section should contain new features added in this release.
 >>     Sample format:
 >>
 >> -   * **Add a title in the past tense with a full stop.**
 >> +Added new instruction CLDEMOTE in rte_prefetch.h.
 >
 >You need to prefix this with the library it is in, in this case EAL. Also, since this
 >is C code, you are adding a function, not an instruction.

[I will fix these release notes] 

 >
 >>
 >> -     Add a short 1-2 sentence description in the past tense.
 >> -     The description should be enough to allow someone scanning
 >> -     the release notes to understand the new feature.
 >> -
 >> -     If the feature adds a lot of sub-features you can use a bullet list
 >> -     like this:
 >> -
 >> -     * Added feature foo to do something.
 >> -     * Enhanced feature bar to do something else.
 >> -
 >> -     Refer to the previous release notes for examples.
 >> -
 >> -     Suggested order in release notes items:
 >> -     * Core libs (EAL, mempool, ring, mbuf, buses)
 >> -     * Device abstraction libs and PMDs
 >> -       - ethdev (lib, PMDs)
 >> -       - cryptodev (lib, PMDs)
 >> -       - eventdev (lib, PMDs)
 >> -       - etc
 >> -     * Other libs
 >> -     * Apps, Examples, Tools (if significant)
 >
 >Don't remove these lines, they are all also part of the same comment as
 >below where it says "Do not overwrite or remove it" :-)

[I will revert original comment and add appropriate] 

 >
 >> +     Added a hardware hint CLDEMOTE which is similar to prefetch in
 >reverse.
 >> +     CLDEMOTES moves the cache line to the last shared cache, where it
 >expects
 >> +     sharing to be efficient.
 >>
 >
 >Reading the instruction description in the Intel instruction set reference, it
 >says about moving the cache line to a more remote cache-line, rather than
 >guaranteeing that it goes to the last level cache. Therefore, to ensure
 >compatiblity with the current spec and make it more flexible to meet any
 >other hardware implementations, I suggest changing the "last shared cache
 >..." to "more remote cache where sharing may be more efficient".

[I will make these changes as per suggestion and make sure it is in sync with software development manual ] 

 >
 >>       This section is a comment. Do not overwrite or remove it.
 >>       Also, make sure to start the actual text at the margin.
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> index e53420a..ad91edd 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
 >> @@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	rte_prefetch0(p);
 >>  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> index fc2b391..35d278a 100644
 >> --- a/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> +++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
 >> @@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const
 >volatile void *p)
 >>  	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));  }
 >>
 >> +static inline void rte_cldemote(const volatile void *p) {
 >> +	RTE_SET_USED(p);
 >> +}
 >> +
 >>  #ifdef __cplusplus
 >>  }
 >>  #endif
 >> diff --git a/lib/librte_eal/include/generic/rte_prefetch.h
 >> b/lib/librte_eal/include/generic/rte_prefetch.h
 >> index 6e47bdf..89ec69c 100644
 >> --- a/lib/librte_eal/include/generic/rte_prefetch.h
 >> +++ b/lib/librte_eal/include/generic/rte_prefetch.h
 >> @@ -51,4 +51,11 @@
 >>   */
 >>  static inline void rte_prefetch_non_temporal(const volatile void *p);
 >>
 >> +/**
 >> + * Demote a cache line into the last shared cache level.
 >
 >Same comment as above. Since this will make it into the official API doxygen
 >documentation, I think a bit fuller of a description would be good also.

[I will add more documentation]
  

Patch

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index df227a1..c4a4362 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -27,29 +27,11 @@  New Features
 .. This section should contain new features added in this release.
    Sample format:
 
-   * **Add a title in the past tense with a full stop.**
+Added new instruction CLDEMOTE in rte_prefetch.h.
 
-     Add a short 1-2 sentence description in the past tense.
-     The description should be enough to allow someone scanning
-     the release notes to understand the new feature.
-
-     If the feature adds a lot of sub-features you can use a bullet list
-     like this:
-
-     * Added feature foo to do something.
-     * Enhanced feature bar to do something else.
-
-     Refer to the previous release notes for examples.
-
-     Suggested order in release notes items:
-     * Core libs (EAL, mempool, ring, mbuf, buses)
-     * Device abstraction libs and PMDs
-       - ethdev (lib, PMDs)
-       - cryptodev (lib, PMDs)
-       - eventdev (lib, PMDs)
-       - etc
-     * Other libs
-     * Apps, Examples, Tools (if significant)
+     Added a hardware hint CLDEMOTE which is similar to prefetch in reverse.
+     CLDEMOTES moves the cache line to the last shared cache, where it expects
+     sharing to be efficient.
 
      This section is a comment. Do not overwrite or remove it.
      Also, make sure to start the actual text at the margin.
diff --git a/lib/librte_eal/arm/include/rte_prefetch_32.h b/lib/librte_eal/arm/include/rte_prefetch_32.h
index e53420a..ad91edd 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_32.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_32.h
@@ -33,6 +33,11 @@  static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/arm/include/rte_prefetch_64.h b/lib/librte_eal/arm/include/rte_prefetch_64.h
index fc2b391..35d278a 100644
--- a/lib/librte_eal/arm/include/rte_prefetch_64.h
+++ b/lib/librte_eal/arm/include/rte_prefetch_64.h
@@ -32,6 +32,11 @@  static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/include/generic/rte_prefetch.h b/lib/librte_eal/include/generic/rte_prefetch.h
index 6e47bdf..89ec69c 100644
--- a/lib/librte_eal/include/generic/rte_prefetch.h
+++ b/lib/librte_eal/include/generic/rte_prefetch.h
@@ -51,4 +51,11 @@ 
  */
 static inline void rte_prefetch_non_temporal(const volatile void *p);
 
+/**
+ * Demote a cache line into the last shared cache level.
+ * @param p
+ *   Address to demote
+ */
+static inline void rte_cldemote(const volatile void *p);
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/librte_eal/ppc/include/rte_prefetch.h b/lib/librte_eal/ppc/include/rte_prefetch.h
index 9ba07c8..3fe9655 100644
--- a/lib/librte_eal/ppc/include/rte_prefetch.h
+++ b/lib/librte_eal/ppc/include/rte_prefetch.h
@@ -34,6 +34,11 @@  static inline void rte_prefetch_non_temporal(const volatile void *p)
 	rte_prefetch0(p);
 }
 
+static inline void rte_cldemote(const volatile void *p)
+{
+	RTE_SET_USED(p);
+}
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_eal/x86/include/rte_prefetch.h b/lib/librte_eal/x86/include/rte_prefetch.h
index 384c6b3..029d06e 100644
--- a/lib/librte_eal/x86/include/rte_prefetch.h
+++ b/lib/librte_eal/x86/include/rte_prefetch.h
@@ -32,6 +32,15 @@  static inline void rte_prefetch_non_temporal(const volatile void *p)
 	asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
 }
 
+/*
+ * we're using raw byte codes for now as only the newest compiler
+ * versions support this instruction natively.
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+	asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
 #ifdef __cplusplus
 }
 #endif