[v8] eal: add cache-line demote support
Commit Message
rte_cldemote is similar to a prefetch hint, but in reverse. cldemote(addr)
lets software hint to hardware that a cache line is likely to be shared,
so that it can be moved to a cache level more distant from the core. This
is useful in core-to-core communication, where a line written by one core
is soon read by another. The ARM and PPC implementations are no-ops; real
implementations can be added if equivalent instructions exist on those
architectures.
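
As an illustration of the intended use, here is a minimal sketch of a
producer that fills a shared slot and then demotes it. It is not part of the
patch: demote_line(), struct msg_slot, publish() and the 64-byte
CACHE_LINE_SIZE are hypothetical names and assumptions, and demote_line()
only stands in for rte_cldemote() so the sketch builds without DPDK headers.

```c
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Hypothetical stand-in for rte_cldemote(); builds without DPDK. */
static inline void demote_line(const volatile void *p)
{
#if defined(__x86_64__) || defined(__i386__)
	/* Same raw encoding as the x86 hunk of this patch. */
	__asm__ __volatile__(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
#else
	(void)p; /* no-op, like the ARM and PPC versions in this patch */
#endif
}

/* Hypothetical one-cache-line slot shared by a producer and a consumer. */
struct msg_slot {
	uint8_t payload[CACHE_LINE_SIZE];
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Fill the slot, then hint that another core is likely to read it. */
static inline void publish(struct msg_slot *slot, const void *data, size_t len)
{
	if (len > sizeof(slot->payload))
		len = sizeof(slot->payload);
	memcpy(slot->payload, data, len);
	demote_line(slot);
}
```

Because the demote is only a hint, it does not replace whatever
synchronization the producer and consumer already use to hand the slot over;
whether it helps at all depends on the hardware.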
Signed-off-by: Omkar Maslekar <omkar.maslekar@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
Acked-by: David Christensen <drc@linux.vnet.ibm.com>
Acked-by: Jerin Jacob <jerinj@marvell.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v8: removed unnecessary comment in test_prefetch.h
    removed header file rte_compat.h from specific arch
    rearranged sequence in the release notes
    fixed coding style in test_prefetch.h and grammar issue in documentation
    added tag Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
v7: fixed experimental tag
v6: marked rte_cldemote as experimental
    added rte_cldemote call in existing app/test_prefetch.c
v5: documentation updated
    fixed formatting issue in release notes
    added Acked-by: Bruce Richardson <bruce.richardson@intel.com>
v4: updated bold text for title and fixed margin in release notes
v3: fixed warning regarding whitespace
v2: documentation updated
---
app/test/test_prefetch.c | 2 ++
doc/guides/rel_notes/release_20_11.rst | 8 ++++++++
lib/librte_eal/arm/include/rte_prefetch_32.h | 5 +++++
lib/librte_eal/arm/include/rte_prefetch_64.h | 5 +++++
lib/librte_eal/include/generic/rte_prefetch.h | 18 ++++++++++++++++++
lib/librte_eal/ppc/include/rte_prefetch.h | 5 +++++
lib/librte_eal/x86/include/rte_prefetch.h | 9 +++++++++
7 files changed, 52 insertions(+)
@@ -30,6 +30,8 @@
rte_prefetch1_write(&a);
rte_prefetch2_write(&a);
+ rte_cldemote(&a);
+
return 0;
}
@@ -68,6 +68,14 @@ New Features
which allow the programmer to prefetch a cache line and also indicate
the intention to write.
+* **Added new function rte_cldemote in rte_prefetch.h.**
+
+ Added a hardware hint CLDEMOTE, which is similar to prefetch in reverse.
+ CLDEMOTE moves the cache line to the more remote cache, where it expects
+ sharing to be efficient. Moving the cache line to a level more distant from
+ the processor helps to accelerate core-to-core communication. This is an
+ x86-specific implementation.
+
* **Updated CRC modules of the net library.**
* Added runtime selection of the optimal architecture-specific CRC path.
@@ -33,6 +33,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
rte_prefetch0(p);
}
+static inline void rte_cldemote(const volatile void *p)
+{
+ RTE_SET_USED(p);
+}
+
#ifdef __cplusplus
}
#endif
@@ -32,6 +32,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
asm volatile ("PRFM PLDL1STRM, [%0]" : : "r" (p));
}
+static inline void rte_cldemote(const volatile void *p)
+{
+ RTE_SET_USED(p);
+}
+
#ifdef __cplusplus
}
#endif
@@ -116,4 +116,22 @@
__builtin_prefetch(p, 1, 1);
}
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Demote a cache line to a more distant level of cache from the processor.
+ * CLDEMOTE hints to hardware to move (demote) a cache line from the closest to
+ * the processor to a level more distant from the processor. It is a hint and
+ * not guaranteed. rte_cldemote is intended to move the cache line to the more
+ * remote cache, where it expects sharing to be efficient and to indicate that
+ * a line may be accessed by a different core in the future.
+ *
+ * @param p
+ * Address to demote
+ */
+__rte_experimental
+static inline void
+rte_cldemote(const volatile void *p);
+
#endif /* _RTE_PREFETCH_H_ */
@@ -34,6 +34,11 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
rte_prefetch0(p);
}
+static inline void rte_cldemote(const volatile void *p)
+{
+ RTE_SET_USED(p);
+}
+
#ifdef __cplusplus
}
#endif
@@ -32,6 +32,15 @@ static inline void rte_prefetch_non_temporal(const volatile void *p)
asm volatile ("prefetchnta %[p]" : : [p] "m" (*(const volatile char *)p));
}
+/*
+ * Use raw byte codes for now, as only the newest compiler versions
+ * support this instruction natively (ModRM 0x06 addresses via (R/E)SI).
+ */
+static inline void rte_cldemote(const volatile void *p)
+{
+ asm volatile(".byte 0x0f, 0x1c, 0x06" :: "S" (p));
+}
+
#ifdef __cplusplus
}
#endif