[dpdk-dev,v3,1/8] eal: introduce DMA memory barriers

Message ID 20180119004430.15305-2-yskoh@mellanox.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Yongseok Koh Jan. 19, 2018, 12:44 a.m. UTC
  This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to
guarantee the ordering of coherent shared memory between the CPU and a DMA
capable device.

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 lib/librte_eal/common/include/generic/rte_atomic.h | 52 ++++++++++++++++++++++
 1 file changed, 52 insertions(+)
  

Comments

Andrew Rybchenko Jan. 19, 2018, 7:16 a.m. UTC | #1
On 01/19/2018 03:44 AM, Yongseok Koh wrote:
> This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to
> guarantee the ordering of coherent shared memory between the CPU and a DMA
> capable device.
>
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

It is already really good. Many thanks.

Maybe it would be useful to:
  - avoid duplication of so long explanations (put in in one place and 
add reference?)
  - explain why it is bound to DMA or call it in a different way, since 
right now it is bound
    to coherent-mapped IO (rte_cio_rmb() ?). Yes, I see benefits to 
follow Linux
    terminology, but may be DPDK can do better :) I just add my 
concerns, but let
    EAL code maintainers to decide
  - as I understand right now there is no control over DMA mapping since
    mapping is done by UIO/VFIO drivers. Should documentation be updated 
that
    DMA mapping is assumed to be coherent?
  - when it is applied, may be it makes sense to send HEADS UP to dev@,
    it definitely deserves to be mentioned in the release notes

<...>
  
Yongseok Koh Jan. 22, 2018, 6:29 p.m. UTC | #2
> On Jan 18, 2018, at 11:16 PM, Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> 
> On 01/19/2018 03:44 AM, Yongseok Koh wrote:
>> This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to
>> guarantee the ordering of coherent shared memory between the CPU and a DMA
>> capable device.
>> 
>> Signed-off-by: Yongseok Koh 
>> <yskoh@mellanox.com>
> 
> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> 
> It is already really good. Many thanks.

Thank you!

> Maybe it would be useful to:
>  - avoid duplication of so long explanations (put in in one place and add reference?)

May have to ask Thomas how to do this. Thomas?

>  - explain why it is bound to DMA or call it in a different way, since right now it is bound
>    to coherent-mapped IO (rte_cio_rmb() ?). Yes, I see benefits to follow Linux
>    terminology, but may be DPDK can do better :) I just add my concerns, but let
>    EAL code maintainers to decide

Good idea. Like to hear from other people. But, following linux terms sometime
could be good to welcome developers from kernel community to DPDK world. :-)

To people in the cc list, any other concerns?
Especially ARM users - Jianbo and Jerin?

Thanks,
Yongseok
  
Thomas Monjalon Jan. 22, 2018, 8:59 p.m. UTC | #3
22/01/2018 19:29, Yongseok Koh:
> > On Jan 18, 2018, at 11:16 PM, Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> > Maybe it would be useful to:
> >  - avoid duplication of so long explanations (put in in one place and add reference?)
> 
> May have to ask Thomas how to do this. Thomas?

You can group barriers by type (SMP, IO, DMA) with this doxygen syntax:
	https://www.stack.nl/~dimitri/doxygen/manual/grouping.html#memgroup
So you can have a common description of the group, plus a description
of each function.
  
Jerin Jacob Jan. 23, 2018, 4:35 a.m. UTC | #4
-----Original Message-----
> Date: Mon, 22 Jan 2018 18:29:31 +0000
> From: Yongseok Koh <yskoh@mellanox.com>
> To: Andrew Rybchenko <arybchenko@solarflare.com>, Thomas Monjalon
>  <thomas@monjalon.net>, "jianbo.liu@arm.com" <jianbo.liu@arm.com>, Jerin
>  Jacob <jerin.jacob@caviumnetworks.com>
> CC: Adrien Mazarguil <adrien.mazarguil@6wind.com>, Nélio Laranjeiro
>  <nelio.laranjeiro@6wind.com>, "bruce.richardson@intel.com"
>  <bruce.richardson@intel.com>, "Ananyev, Konstantin"
>  <konstantin.ananyev@intel.com>, Chao Zhu <chaozhu@linux.vnet.ibm.com>,
>  "dev@dpdk.org" <dev@dpdk.org>
> Subject: Re: [dpdk-dev] [PATCH v3 1/8] eal: introduce DMA memory barriers
> 
> 
> > On Jan 18, 2018, at 11:16 PM, Andrew Rybchenko <arybchenko@solarflare.com> wrote:
> > 
> > On 01/19/2018 03:44 AM, Yongseok Koh wrote:
> >> This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to
> >> guarantee the ordering of coherent shared memory between the CPU and a DMA
> >> capable device.
> >> 
> >> Signed-off-by: Yongseok Koh 
> >> <yskoh@mellanox.com>
> > 
> > Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>
> > 
> > It is already really good. Many thanks.
> 
> Thank you!
> 
> > Maybe it would be useful to:
> >  - avoid duplication of so long explanations (put in in one place and add reference?)
> 
> May have to ask Thomas how to do this. Thomas?
> 
> >  - explain why it is bound to DMA or call it in a different way, since right now it is bound
> >    to coherent-mapped IO (rte_cio_rmb() ?). Yes, I see benefits to follow Linux
> >    terminology, but may be DPDK can do better :) I just add my concerns, but let
> >    EAL code maintainers to decide
> 
> Good idea. Like to hear from other people. But, following linux terms sometime
> could be good to welcome developers from kernel community to DPDK world. :-)
> 
> To people in the cc list, any other concerns?
> Especially ARM users - Jianbo and Jerin?

I like Andrew's suggestion. IMO, rte_cio_?mb() makes more sense.

> 
> Thanks,
> Yongseok
  
Yongseok Koh Jan. 25, 2018, 7:08 p.m. UTC | #5
> On Jan 22, 2018, at 8:35 PM, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote:

> 

> -----Original Message-----

>> Date: Mon, 22 Jan 2018 18:29:31 +0000

>> From: Yongseok Koh <yskoh@mellanox.com>

>> To: Andrew Rybchenko <arybchenko@solarflare.com>, Thomas Monjalon

>> <thomas@monjalon.net>, "jianbo.liu@arm.com" <jianbo.liu@arm.com>, Jerin

>> Jacob <jerin.jacob@caviumnetworks.com>

>> CC: Adrien Mazarguil <adrien.mazarguil@6wind.com>, Nélio Laranjeiro

>> <nelio.laranjeiro@6wind.com>, "bruce.richardson@intel.com"

>> <bruce.richardson@intel.com>, "Ananyev, Konstantin"

>> <konstantin.ananyev@intel.com>, Chao Zhu <chaozhu@linux.vnet.ibm.com>,

>> "dev@dpdk.org" <dev@dpdk.org>

>> Subject: Re: [dpdk-dev] [PATCH v3 1/8] eal: introduce DMA memory barriers

>> 

>> 

>>> On Jan 18, 2018, at 11:16 PM, Andrew Rybchenko <arybchenko@solarflare.com> wrote:

>>> 

>>> On 01/19/2018 03:44 AM, Yongseok Koh wrote:

>>>> This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to

>>>> guarantee the ordering of coherent shared memory between the CPU and a DMA

>>>> capable device.

>>>> 

>>>> Signed-off-by: Yongseok Koh 

>>>> <yskoh@mellanox.com>

>>> 

>>> Acked-by: Andrew Rybchenko <arybchenko@solarflare.com>

>>> 

>>> It is already really good. Many thanks.

>> 

>> Thank you!

>> 

>>> Maybe it would be useful to:

>>> - avoid duplication of so long explanations (put in in one place and add reference?)

>> 

>> May have to ask Thomas how to do this. Thomas?

>> 

>>> - explain why it is bound to DMA or call it in a different way, since right now it is bound

>>>   to coherent-mapped IO (rte_cio_rmb() ?). Yes, I see benefits to follow Linux

>>>   terminology, but may be DPDK can do better :) I just add my concerns, but let

>>>   EAL code maintainers to decide

>> 

>> Good idea. Like to hear from other people. But, following linux terms sometime

>> could be good to welcome developers from kernel community to DPDK world. :-)

>> 

>> To people in the cc list, any other concerns?

>> Especially ARM users - Jianbo and Jerin?

> 

> I like Andrew's suggestion. IMO, rte_cio_?mb() makes more sense.


If there's no more suggestion or objection, will send out v3 with the changes requested here by Andrew.

Thank you for reviews and comments.

Yongseok
  

Patch

diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
index 3ba7245a3..1ffa51e31 100644
--- a/lib/librte_eal/common/include/generic/rte_atomic.h
+++ b/lib/librte_eal/common/include/generic/rte_atomic.h
@@ -98,6 +98,58 @@  static inline void rte_io_wmb(void);
  */
 static inline void rte_io_rmb(void);
 
+/**
+ * Write memory barrier for coherent memory between lcore and IO device
+ *
+ * Guarantees that the STORE operations on coherent memory that
+ * precede the rte_dma_wmb() call are visible to I/O device before the
+ * STORE operations that follow it.
+ *
+ * DMA memory barrier is a lightweight version of I/O device barriers
+ * which are system-wide data synchronization barriers. This is for
+ * only coherent memory domain between lcore and IO device but it is
+ * same as the I/O device barriers in most of architectures. However,
+ * some architecture provides even lighter barriers which are
+ * somewhere in between I/O device barriers and SMP barriers. For
+ * example, in case of ARMv8, data memory barrier can have different
+ * shareability domains - inner-shareable and outer-shareable. And
+ * inner-shareable data memory barrier fits for SMP barriers and
+ * outer-shareable one for DMA barriers, which acts on coherent
+ * memory.
+ *
+ * In most cases, I/O device barriers are safer but if operations are
+ * on coherent memory instead of incoherent MMIO region of a device,
+ * then DMA barriers can be used and this could bring performance gain
+ * depending on architectures.
+ */
+static inline void rte_dma_wmb(void);
+
+/**
+ * Read memory barrier for coherent memory between lcore and IO device
+ *
+ * Guarantees that the LOAD operations on coherent memory updated by
+ * IO device that precede the rte_dma_rmb() call are visible to CPU
+ * before the LOAD operations that follow it.
+ *
+ * DMA memory barrier is a lightweight version of I/O device barriers
+ * which are system-wide data synchronization barriers. This is for
+ * only coherent memory domain between lcore and IO device but it is
+ * same as the I/O device barriers in most of architectures. However,
+ * some architecture provides even lighter barriers which are
+ * somewhere in between I/O device barriers and SMP barriers. For
+ * example, in case of ARMv8, data memory barrier can have different
+ * shareability domains - inner-shareable and outer-shareable. And
+ * inner-shareable data memory barrier fits for SMP barriers and
+ * outer-shareable one for DMA barriers, which acts on coherent
+ * memory.
+ *
+ * In most cases, I/O device barriers are safer but if operations are
+ * on coherent memory instead of incoherent MMIO region of a device,
+ * then DMA barriers can be used and this could bring performance gain
+ * depending on architectures.
+ */
+static inline void rte_dma_rmb(void);
+
 #endif /* __DOXYGEN__ */
 
 /**