[v3] dmadev: introduce DMA device library

Message ID 1626179263-14645-1-git-send-email-fengchengwen@huawei.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Headers
Series [v3] dmadev: introduce DMA device library

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/github-robot success github build: passed
ci/iol-abi-testing success Testing PASS
ci/iol-intel-Functional success Functional Testing PASS
ci/Intel-compilation success Compilation OK
ci/iol-testing fail Testing issues
ci/intel-Testing success Testing PASS
ci/iol-intel-Performance fail Performance Testing issues

Commit Message

fengchengwen July 13, 2021, 12:27 p.m. UTC
  This patch introduces 'dmadevice', which is a generic type of DMA
device.

The APIs of the dmadev library expose generic operations that enable
configuration of and I/O with DMA devices.

Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
---
v3:
* rm reset and fill_sg ops.
* rm MT-safe capabilities.
* add submit flag.
* redefine rte_dma_sg to implement asymmetric copy.
* delete some fields reserved for future use.
* rearrange the rte_dmadev/rte_dmadev_data structs.
* refresh rte_dmadev.h copyright.
* update vchan setup parameters.
* modify some inappropriate descriptions.
* arrange version.map alphabetically.
* other minor modifications from review comments.
---
 MAINTAINERS                  |   4 +
 config/rte_config.h          |   3 +
 lib/dmadev/meson.build       |   7 +
 lib/dmadev/rte_dmadev.c      | 561 +++++++++++++++++++++++++
 lib/dmadev/rte_dmadev.h      | 968 +++++++++++++++++++++++++++++++++++++++++++
 lib/dmadev/rte_dmadev_core.h | 161 +++++++
 lib/dmadev/rte_dmadev_pmd.h  |  72 ++++
 lib/dmadev/version.map       |  37 ++
 lib/meson.build              |   1 +
 9 files changed, 1814 insertions(+)
 create mode 100644 lib/dmadev/meson.build
 create mode 100644 lib/dmadev/rte_dmadev.c
 create mode 100644 lib/dmadev/rte_dmadev.h
 create mode 100644 lib/dmadev/rte_dmadev_core.h
 create mode 100644 lib/dmadev/rte_dmadev_pmd.h
 create mode 100644 lib/dmadev/version.map
  

Comments

fengchengwen July 13, 2021, 1:06 p.m. UTC | #1
Thank you for your valuable comments; I think we've taken a big step forward.

@andrew Could you provide the copyright line so that I can add it to the relevant files?

@bruce, @jerin Some review comments that were not incorporated are answered here:

1.
COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
to be wrapped so aggressively.

REPLY: Our CI still has an 80-character limit, and from my review most frameworks still comply.

2.
COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
RTE_DMA_DIRECTION_...

REPLY: adding 'DIRECTION' may make the macro too long; I prefer to keep it simple.

3.
COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
We are not making release a public API in other device classes. See ethdev spec.
bbdev/eventdev/rawdev

REPLY: because ethdev's queues are hardware queues, while here we have software-defined channels,
I think release is OK. BTW: bbdev/eventdev also have release ops.

4.
COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future fields */
> +};
Please add the capability for each counter in info structure as one
device may support all
the counters.

REPLY: This is a statistics function. If it is not supported, the driver simply does not need
to implement the stats ops; it could also set the unimplemented counters to zero.

5.
COMMENT: > +#endif
> +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
Instead of every driver setting the NOP function, if the CAPA is not set,
the common code can set a NOP function for this with a <0 return value.

REPLY: I don't think it's a good idea to add such checks in the I/O path; it's the application's
duty to ensure it doesn't call APIs the driver doesn't support (which it can learn from the capabilities).
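
To illustrate, the kind of application-side gating I mean (just a sketch;
it assumes 'dev_id', 'vchan' and the fill arguments are already in scope):

	struct rte_dmadev_info info;

	/* Check the capability once at setup time, not in the I/O path. */
	if (rte_dmadev_info_get(dev_id, &info) == 0 &&
	    (info.dev_capa & RTE_DMA_DEV_CAPA_OPS_FILL))
		ret = rte_dmadev_fill(dev_id, vchan, pattern, dst, length, flags);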

6.
COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +                          const uint16_t nb_status, uint32_t *status,
uint32_t -> enum rte_dma_status_code

REPLY: I'm still evaluating this. It takes the driver a long time to perform error code
conversion in this API. Do we need to provide a separate error code conversion function?

7.
COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> +                                struct rte_dmadev_info *dev_info);
Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
as this header is exported.

REPLY: I prefer not to add the 'rte_' prefix; it makes the defines too long.

8.
COMMENT: > + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.
Please rename this to "completed_status" to allow the return of information
other than just errors. As I suggested before, I think this should also be
usable as a slower version of "completed" even in the case where there are
no errors, in that it returns status information for each and every job
rather than just returning as soon as it hits a failure.

REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
and we can build the slow-path function on top of the two APIs.

9.
COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support mem-to-mem transfer.
Do we need this? Can we assume that any device appearing as a dmadev can
do mem-to-mem copies, and drop the capability for mem-to-mem and the
capability for copying?
also for RTE_DMA_DEV_CAPA_OPS_COPY

REPLY: yes, I insist on adding this for the sake of conceptual integrity.
The ioat driver just needs to state the capability.

10.
COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> +};
Let's add rte_dmadev_conf struct into this to return the configuration
settings.

REPLY: If we add rte_dmadev_conf in, it may break ABI whenever rte_dmadev_conf adds fields.


[snip]

On 2021/7/13 20:27, Chengwen Feng wrote:
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
> 
> The APIs of the dmadev library expose generic operations that enable
> configuration of and I/O with DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> v3:
> * rm reset and fill_sg ops.
> * rm MT-safe capabilities.
> * add submit flag.
> * redefine rte_dma_sg to implement asymmetric copy.
> * delete some fields reserved for future use.
> * rearrange the rte_dmadev/rte_dmadev_data structs.
> * refresh rte_dmadev.h copyright.
> * update vchan setup parameters.
> * modify some inappropriate descriptions.
> * arrange version.map alphabetically.
> * other minor modifications from review comments.
> ---
>  MAINTAINERS                  |   4 +
>  config/rte_config.h          |   3 +
>  lib/dmadev/meson.build       |   7 +
>  lib/dmadev/rte_dmadev.c      | 561 +++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 968 +++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h | 161 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |  72 ++++
>  lib/dmadev/version.map       |  37 ++
>  lib/meson.build              |   1 +
  
Bruce Richardson July 13, 2021, 1:37 p.m. UTC | #2
On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> Thank you for your valuable comments; I think we've taken a big step forward.
> 
> @andrew Could you provide the copyright line so that I can add it to the relevant files?
> 
> @bruce, @jerin Some review comments that were not incorporated are answered here:

Thanks. Some further comments inline below. Most points you make I'm ok
with, but I do disagree on a number of others.

/Bruce

> 
> 1.
> COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> to be wrapped so aggressively.
> 
> REPLY: Our CI still has an 80-character limit, and from my review most frameworks still comply.
> 
Ok.

> 2.
> COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> RTE_DMA_DIRECTION_...
> 
> REPLY: adding 'DIRECTION' may make the macro too long; I prefer to keep it simple.
> 
DIRECTION could be shortened to DIR, but I think this is probably ok as is
too.

> 3.
> COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> We are not making release a public API in other device classes. See ethdev spec.
> bbdev/eventdev/rawdev
> 
> REPLY: because ethdev's queues are hardware queues, while here we have software-defined
> channels, I think release is OK. BTW: bbdev/eventdev also have release ops.
> 
Ok

> 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> fields */
> > +};
> Please add the capability for each counter in info structure as one
> device may support all the counters.
> 
> REPLY: This is a statistics function. If it is not supported, the driver
> simply does not need to implement the stats ops; it could also set the
> unimplemented counters to zero.
> 
+1
The stats functions should be a minimum set that is supported by all
drivers. Each of these stats can be easily tracked by software if HW
support for it is not available, so I agree that we should not have each
stat as a capability.
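
As a sketch of what I mean by tracking in software (the driver internals
here - struct driver_priv, enqueue_hw_desc() and the stats layout - are
hypothetical, not from this patch):

	static int
	driver_copy(struct rte_dmadev *dev, uint16_t vchan, rte_iova_t src,
		    rte_iova_t dst, uint32_t length, uint64_t flags)
	{
		struct driver_priv *priv = dev->data->dev_private;
		int ret;

		/* Enqueue to HW, then count the submission in software. */
		ret = enqueue_hw_desc(priv, vchan, src, dst, length, flags);
		if (ret >= 0)
			priv->stats[vchan].submitted_count++;
		return ret;
	}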

> 5.
> COMMENT: > +#endif
> > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> Instead of every driver setting the NOP function, if the CAPA is not set,
> the common code can set a NOP function for this with a <0 return value.
> 
> REPLY: I don't think it's a good idea to add such checks in the I/O path; it's the application's
> duty to ensure it doesn't call APIs the driver doesn't support (which it can learn from the capabilities).
> 
For datapath functions, +1.

> 6.
> COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > +                          const uint16_t nb_status, uint32_t *status,
> uint32_t -> enum rte_dma_status_code
> 
> REPLY: I'm still evaluating this. It takes the driver a long time to perform error code
> conversion in this API. Do we need to provide a separate error code conversion function?
> 
It's not that difficult a conversion to do, and so long as we have the
regular "completed" function which doesn't do all the error manipulation we
should be fine. Performance in the case of errors is not expected to be as
good, since errors should be very rare.

> 7.
> COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > +                                struct rte_dmadev_info *dev_info);
> Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> as this header is exported.
> 
> REPLY: I prefer not to add the 'rte_' prefix; it makes the defines too long.
> 
I disagree on this, they need the rte_ prefix, despite the fact it makes
them longer. If length is a concern, these can be changed from "dmadev_" to
"rte_dma_", which is only one character longer.
In fact, I believe Morten already suggested we use "rte_dma" rather than
"rte_dmadev" as a function prefix across the library.

> 8.
> COMMENT: > + *        - rte_dmadev_completed_fails()
> > + *            - return the number of operation requests failed to complete.
> Please rename this to "completed_status" to allow the return of information
> other than just errors. As I suggested before, I think this should also be
> usable as a slower version of "completed" even in the case where there are
> no errors, in that it returns status information for each and every job
> rather than just returning as soon as it hits a failure.
> 
> REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
> and we can build the slow-path function on top of the two APIs.
> 
I still disagree on this too. We have a "completed" op where we get
informed of what has completed and minimal error indication, and a
"completed_status" operation which provides status information for each
operation completed, at the cost of speed.
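
As prototypes, the pairing I have in mind is roughly (a sketch; the exact
completed_status signature is my suggestion, not what this version of the
patch defines):

	/* Fast path: which jobs completed, plus a minimal error flag. */
	uint16_t rte_dmadev_completed(uint16_t dev_id, uint16_t vchan,
			uint16_t nb_cpls, uint16_t *last_idx, bool *has_error);

	/* Slow path: a status code for each completed job. */
	uint16_t rte_dmadev_completed_status(uint16_t dev_id, uint16_t vchan,
			uint16_t nb_cpls, uint16_t *last_idx,
			enum rte_dma_status_code *status);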

> 9.
> COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> > +/**< DMA device support mem-to-mem transfer.
> Do we need this? Can we assume that any device appearing as a dmadev can
> do mem-to-mem copies, and drop the capability for mem-to-mem and the
> capability for copying?
> also for RTE_DMA_DEV_CAPA_OPS_COPY
> 
> REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> The ioat driver just needs to state the capability.
> 

Ok. It seems a wasted bit to me, but I don't see us running out of them
soon.

> 10.
> COMMENT: > +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > +};
> Let's add rte_dmadev_conf struct into this to return the configuration
> settings.
> 
> REPLY: If we add rte_dmadev_conf in, it may break ABI whenever rte_dmadev_conf adds fields.
> 
Yes, that is true, but I fail to see why that is a major problem. It just
means that if the conf structure changes we have two functions to version
instead of one. The information is still useful.

If you don't want the actual conf structure explicitly put into the info
struct, we can instead put the fields in directly. I really think that the
info_get function should provide back to the user the details of what way
the device was configured previously.

regards,
/Bruce
  
Bruce Richardson July 13, 2021, 4:02 p.m. UTC | #3
On Tue, Jul 13, 2021 at 08:27:43PM +0800, Chengwen Feng wrote:
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
> 
> The APIs of the dmadev library expose generic operations that enable
> configuration of and I/O with DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> ---
> v3:
> * rm reset and fill_sg ops.
> * rm MT-safe capabilities.
> * add submit flag.
> * redefine rte_dma_sg to implement asymmetric copy.
> * delete some fields reserved for future use.
> * rearrange the rte_dmadev/rte_dmadev_data structs.
> * refresh rte_dmadev.h copyright.
> * update vchan setup parameters.
> * modify some inappropriate descriptions.
> * arrange version.map alphabetically.
> * other minor modifications from review comments.
> ---

Thanks, some further comments inline below on the .c file initially.

/Bruce

>  MAINTAINERS                  |   4 +
>  config/rte_config.h          |   3 +
>  lib/dmadev/meson.build       |   7 +
>  lib/dmadev/rte_dmadev.c      | 561 +++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev.h      | 968 +++++++++++++++++++++++++++++++++++++++++++
>  lib/dmadev/rte_dmadev_core.h | 161 +++++++
>  lib/dmadev/rte_dmadev_pmd.h  |  72 ++++
>  lib/dmadev/version.map       |  37 ++
>  lib/meson.build              |   1 +
>  9 files changed, 1814 insertions(+)
>  create mode 100644 lib/dmadev/meson.build
>  create mode 100644 lib/dmadev/rte_dmadev.c
>  create mode 100644 lib/dmadev/rte_dmadev.h
>  create mode 100644 lib/dmadev/rte_dmadev_core.h
>  create mode 100644 lib/dmadev/rte_dmadev_pmd.h
>  create mode 100644 lib/dmadev/version.map
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index af2a91d..e01a07f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -495,6 +495,10 @@ F: drivers/raw/skeleton/
>  F: app/test/test_rawdev.c
>  F: doc/guides/prog_guide/rawdev.rst
>  
> +DMA device API - EXPERIMENTAL
> +M: Chengwen Feng <fengchengwen@huawei.com>
> +F: lib/dmadev/
> +
>  
>  Memory Pool Drivers
>  -------------------
> diff --git a/config/rte_config.h b/config/rte_config.h
> index 590903c..331a431 100644
> --- a/config/rte_config.h
> +++ b/config/rte_config.h
> @@ -81,6 +81,9 @@
>  /* rawdev defines */
>  #define RTE_RAWDEV_MAX_DEVS 64
>  
> +/* dmadev defines */
> +#define RTE_DMADEV_MAX_DEVS 64
> +
>  /* ip_fragmentation defines */
>  #define RTE_LIBRTE_IP_FRAG_MAX_FRAG 4
>  #undef RTE_LIBRTE_IP_FRAG_TBL_STAT
> diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
> new file mode 100644
> index 0000000..d2fc85e
> --- /dev/null
> +++ b/lib/dmadev/meson.build
> @@ -0,0 +1,7 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2021 HiSilicon Limited.
> +
> +sources = files('rte_dmadev.c')
> +headers = files('rte_dmadev.h')
> +indirect_headers += files('rte_dmadev_core.h')
> +driver_sdk_headers += files('rte_dmadev_pmd.h')
> diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
> new file mode 100644
> index 0000000..1bca463
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.c
> @@ -0,0 +1,561 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#include <ctype.h>
> +#include <inttypes.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +
> +#include <rte_debug.h>
> +#include <rte_dev.h>
> +#include <rte_eal.h>
> +#include <rte_errno.h>
> +#include <rte_lcore.h>
> +#include <rte_log.h>
> +#include <rte_memory.h>
> +#include <rte_memzone.h>
> +#include <rte_malloc.h>
> +#include <rte_string_fns.h>
> +
> +#include "rte_dmadev.h"
> +#include "rte_dmadev_pmd.h"
> +
> +struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
> +
> +static const char *MZ_RTE_DMADEV_DATA = "rte_dmadev_data";
> +/* Shared memory between primary and secondary processes. */
> +static struct {
> +	struct rte_dmadev_data data[RTE_DMADEV_MAX_DEVS];
> +} *dmadev_shared_data;
> +
> +RTE_LOG_REGISTER(rte_dmadev_logtype, lib.dmadev, INFO);

There is an RTE_LOG_REGISTER_DEFAULT macro which can be used here instead.
Also, since the logtype is not exposed outside this file, we can drop the
prefix on it to shorten it:

"RTE_LOG_REGISTER_DEFAULT(logtype, INFO);"

> +#define RTE_DMADEV_LOG(level, ...) \
> +	rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
> +
> +/* Macros to check for valid device id */
> +#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return retval; \
> +	} \
> +} while (0)
> +
> +#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
> +	if (!rte_dmadev_is_valid_dev(dev_id)) { \
> +		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
> +		return; \
> +	} \
> +} while (0)
> +
Looking through the code, this macro appears unused, since all functions
return values.
The former "RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET" can also be shorted to
remove prefixes, because it's again local to the file. Suggest:
"VALID_DEV_ID_OR_ERR"

> +/* Macro to check for invalid pointers */
> +#define RTE_DMADEV_PTR_OR_ERR_RET(ptr, retval) do { \
> +	if ((ptr) == NULL) \
> +		return retval; \
> +} while (0)
> +
This is a very short macro, so in practice it's only saving one line of
code. Also, with current use, the "retval" is always -EINVAL. I'd tend
towards dropping the macro, but if we want one, I'd suggest a short
one-line one:

"#define CHECK_PTR_PARAM(ptr) if ((ptr) == NULL) return -EINVAL"

However, overall I don't think it's worth it - case in point, see the check
for "name" below which skips using the macro anyway.

> +static int
> +dmadev_check_name(const char *name)
> +{
> +	size_t name_len;
> +
> +	if (name == NULL) {
> +		RTE_DMADEV_LOG(ERR, "Name can't be NULL\n");
> +		return -EINVAL;
> +	}
> +
> +	name_len = strnlen(name, RTE_DMADEV_NAME_MAX_LEN);
> +	if (name_len == 0) {
> +		RTE_DMADEV_LOG(ERR, "Zero length DMA device name\n");
> +		return -EINVAL;
> +	}
> +	if (name_len >= RTE_DMADEV_NAME_MAX_LEN) {
> +		RTE_DMADEV_LOG(ERR, "DMA device name is too long\n");
> +		return -EINVAL;
> +	}
> +
> +	return 0;
> +}
> +
> +static uint16_t
> +dmadev_find_free_dev(void)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (dmadev_shared_data->data[i].dev_name[0] == '\0') {
> +			RTE_ASSERT(rte_dmadevices[i].state ==
> +				   RTE_DMADEV_UNUSED);
> +			return i;
> +		}
> +	}
> +
> +	return RTE_DMADEV_MAX_DEVS;
> +}
> +
> +static struct rte_dmadev*
> +dmadev_find(const char *name)
> +{
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if ((rte_dmadevices[i].state == RTE_DMADEV_ATTACHED) &&
> +		    (!strcmp(name, rte_dmadevices[i].data->dev_name)))
> +			return &rte_dmadevices[i];
> +	}
> +
> +	return NULL;
> +}
> +
> +static int
> +dmadev_shared_data_prepare(void)
> +{
> +	const struct rte_memzone *mz;
> +
> +	if (dmadev_shared_data == NULL) {
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +			/* Allocate port data and ownership shared memory. */
> +			mz = rte_memzone_reserve(MZ_RTE_DMADEV_DATA,
> +					 sizeof(*dmadev_shared_data),
> +					 rte_socket_id(), 0);
> +		} else {
> +			mz = rte_memzone_lookup(MZ_RTE_DMADEV_DATA);
> +		}

Minor nit, our coding style for DPDK says to omit the braces around
single-statement legs like this.

> +		if (mz == NULL)
> +			return -ENOMEM;
> +
> +		dmadev_shared_data = mz->addr;
> +		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +			memset(dmadev_shared_data->data, 0,
> +			       sizeof(dmadev_shared_data->data));

I believe all memzones are zeroed on allocation anyway, so this memset is
unnecessary and can be dropped.

> +	}
> +
> +	return 0;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t dev_id;
> +
> +	dev = dmadev_find(name);
> +	if (dev != NULL) {
> +		RTE_DMADEV_LOG(ERR, "DMA device already allocated\n");
> +		return NULL;
> +	}
> +
> +	dev_id = dmadev_find_free_dev();
> +	if (dev_id == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR, "Reached maximum number of DMA devices\n");
> +		return NULL;
> +	}
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +	dev->data = &dmadev_shared_data->data[dev_id];
> +	dev->data->dev_id = dev_id;
> +	strlcpy(dev->data->dev_name, name, sizeof(dev->data->dev_name));
> +
> +	return dev;
> +}
> +
> +static struct rte_dmadev *
> +dmadev_attach_secondary(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +	uint16_t i;
> +
> +	if (dmadev_shared_data_prepare() != 0) {
> +		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
> +		return NULL;
> +	}
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (!strcmp(dmadev_shared_data->data[i].dev_name, name))
> +			break;
> +	}
> +	if (i == RTE_DMADEV_MAX_DEVS) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %s is not driven by the primary process\n",
> +			name);
> +		return NULL;
> +	}
> +
> +	dev = &rte_dmadevices[i];
> +	dev->data = &dmadev_shared_data->data[i];
> +	RTE_ASSERT(dev->data->dev_id == i);
> +
> +	return dev;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name)
> +{
> +	struct rte_dmadev *dev;
> +
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> +		dev = dmadev_allocate(name);
> +	else
> +		dev = dmadev_attach_secondary(name);
> +
> +	if (dev == NULL)
> +		return NULL;
> +	dev->state = RTE_DMADEV_ATTACHED;
> +
> +	return dev;
> +}
> +
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev)
> +{
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	if (dev->state == RTE_DMADEV_UNUSED)
> +		return 0;
> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		rte_free(dev->data->dev_private);

There seems to be an imbalance here. If we "free" on release, we should similarly
"malloc" on allocate, otherwise we run the risk of dev_private being
allocated using regular malloc in a driver, for example. I think some other
allocation APIs take as parameter the private data size to reserve, and we
can follow that model.
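
A sketch of that model for the primary-process path (the private_data_size
parameter and the rte_zmalloc_socket() placement are my suggestion, not part
of this patch):

	struct rte_dmadev *
	rte_dmadev_pmd_allocate(const char *name, size_t private_data_size)
	{
		struct rte_dmadev *dev = dmadev_allocate(name);

		if (dev == NULL)
			return NULL;
		if (private_data_size > 0) {
			/* Library-owned allocation, so release can free it. */
			dev->data->dev_private = rte_zmalloc_socket(name,
					private_data_size, RTE_CACHE_LINE_SIZE,
					rte_socket_id());
			if (dev->data->dev_private == NULL)
				return NULL;
		}
		dev->state = RTE_DMADEV_ATTACHED;
		return dev;
	}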

> +		memset(dev->data, 0, sizeof(struct rte_dmadev_data));
> +	}
> +
> +	memset(dev, 0, sizeof(struct rte_dmadev));
> +	dev->state = RTE_DMADEV_UNUSED;
> +
> +	return 0;
> +}
> +
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name)
> +{
> +	if (dmadev_check_name(name) != 0)
> +		return NULL;
> +	return dmadev_find(name);
> +}
> +
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id)
> +{
> +	if (dev_id >= RTE_DMADEV_MAX_DEVS ||
> +	    rte_dmadevices[dev_id].state != RTE_DMADEV_ATTACHED)
> +		return false;
> +	return true;
> +}

Can be a one-line function:
"return (dev_id < RTE_DMADEV_MAX_DEVS && 
		rte_dmadevices[dev_id].state == RTE_DMADEV_ATTACHED);"

> +
> +uint16_t
> +rte_dmadev_count(void)
> +{
> +	uint16_t count = 0;
> +	uint16_t i;
> +
> +	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
> +		if (rte_dmadevices[i].state == RTE_DMADEV_ATTACHED)
> +			count++;
> +	}
> +
> +	return count;
> +}
> +
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
> +{
> +	const struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_DMADEV_PTR_OR_ERR_RET(dev_info, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];

This line can be merged into the definition of dev, since it's just
assigning an address and never referencing it.

> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -ENOTSUP);
> +	memset(dev_info, 0, sizeof(struct rte_dmadev_info));
> +	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info,
> +					    sizeof(struct rte_dmadev_info));
> +	if (ret != 0)
> +		return ret;
> +
> +	dev_info->device = dev->device;
> +	dev_info->nb_vchans = dev->data->dev_conf.max_vchans;
> +
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_DMADEV_PTR_OR_ERR_RET(dev_conf, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];

As above, merge into definition line of dev as:
"struct rte_dmadev *dev = &rte_dmadevices[dev_id];"

> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (dev_conf->max_vchans > info.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u configure too many vchans\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped to allow configuration\n",
> +			dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP);

Rather than putting in all these checks and returning -ENOTSUP, I'd like to
propose that we instead have the ops structure assigned as part of
"rte_dmadev_pmd_allocate()" function. That then allows us to enforce that
each device supports the minimum set of functions, i.e. info_get,
configure, etc. etc.
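
For example, a minimal check at allocation time could look like this (a
sketch; which ops are mandatory is open for discussion):

	static int
	dmadev_check_ops(const struct rte_dmadev_ops *ops)
	{
		/* Every driver must support at least these control-plane ops. */
		if (ops == NULL || ops->dev_info_get == NULL ||
		    ops->dev_configure == NULL || ops->dev_close == NULL)
			return -EINVAL;
		return 0;
	}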

> +	ret = (*dev->dev_ops->dev_configure)(dev, dev_conf);
> +	if (ret == 0)
> +		memcpy(&dev->data->dev_conf, dev_conf, sizeof(*dev_conf));
> +
> +	return ret;
> +}
> +
> +int
> +rte_dmadev_start(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started != 0) {
> +		RTE_DMADEV_LOG(WARNING, "Device %u already started\n", dev_id);
> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_start == NULL)
> +		goto mark_started;
> +
> +	ret = (*dev->dev_ops->dev_start)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_started:
> +	dev->data->dev_started = 1;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_stop(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (dev->data->dev_started == 0) {
> +		RTE_DMADEV_LOG(WARNING, "Device %u already stopped\n", dev_id);
> +		return 0;
> +	}
> +
> +	if (dev->dev_ops->dev_stop == NULL)
> +		goto mark_stopped;
> +
> +	ret = (*dev->dev_ops->dev_stop)(dev);
> +	if (ret != 0)
> +		return ret;
> +
> +mark_stopped:
> +	dev->data->dev_started = 0;
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_close(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	/* Device must be stopped before it can be closed */
> +	if (dev->data->dev_started == 1) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u must be stopped before closing\n", dev_id);
> +		return -EBUSY;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_close, -ENOTSUP);
> +	return (*dev->dev_ops->dev_close)(dev);
> +}
> +
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +		       const struct rte_dmadev_vchan_conf *conf)
> +{
> +	struct rte_dmadev_info info;
> +	struct rte_dmadev *dev;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_DMADEV_PTR_OR_ERR_RET(conf, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction == 0 ||
> +	    conf->direction & ~RTE_DMA_TRANSFER_DIR_ALL) {
> +		RTE_DMADEV_LOG(ERR, "Device %u direction invalid!\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_MEM_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_MEM_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support mem2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_MEM &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_MEM)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2mem transfer\n", dev_id);
> +		return -EINVAL;
> +	}
> +	if (conf->direction & RTE_DMA_DEV_TO_DEV &&
> +	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_DEV)) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u don't support dev2dev transfer\n", dev_id);
> +		return -EINVAL;
> +	}

Rather than checking each one of these individually, can we just merge
these checks into one?
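
For example (a sketch; it works because the direction bits 0-3 mirror the
corresponding capability bits 0-3 in this patch):

	if (conf->direction & RTE_DMA_TRANSFER_DIR_ALL & ~info.dev_capa) {
		RTE_DMADEV_LOG(ERR,
			"Device %u doesn't support a requested direction\n",
			dev_id);
		return -EINVAL;
	}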

> +	if (conf->nb_desc < info.min_desc || conf->nb_desc > info.max_desc) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u number of descriptors invalid\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_setup, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_setup)(dev, conf);
> +}
> +
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan >= dev->data->dev_conf.max_vchans) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_release, -ENOTSUP);
> +	return (*dev->dev_ops->vchan_release)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
> +		     struct rte_dmadev_stats *stats)
> +{
> +	const struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_DMADEV_PTR_OR_ERR_RET(stats, -EINVAL);
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan >= dev->data->dev_conf.max_vchans &&
> +	    vchan != RTE_DMADEV_ALL_VCHAN) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
> +	return (*dev->dev_ops->stats_get)(dev, vchan, stats,
> +					  sizeof(struct rte_dmadev_stats));
> +}
> +
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	if (vchan >= dev->data->dev_conf.max_vchans &&
> +	    vchan != RTE_DMADEV_ALL_VCHAN) {
> +		RTE_DMADEV_LOG(ERR,
> +			"Device %u vchan %u out of range\n", dev_id, vchan);
> +		return -EINVAL;
> +	}
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_reset, -ENOTSUP);
> +	return (*dev->dev_ops->stats_reset)(dev, vchan);
> +}
> +
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f)
> +{
> +	const struct rte_dmadev *dev;
> +	struct rte_dmadev_info info;
> +	int ret;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	RTE_DMADEV_PTR_OR_ERR_RET(f, -EINVAL);
> +
> +	ret = rte_dmadev_info_get(dev_id, &info);
> +	if (ret != 0) {
> +		RTE_DMADEV_LOG(ERR, "Device %u get device info fail\n", dev_id);
> +		return -EINVAL;
> +	}
> +
> +	dev = &rte_dmadevices[dev_id];
> +
> +	fprintf(f, "DMA Dev %u, '%s' [%s]\n",
> +		dev->data->dev_id,
> +		dev->data->dev_name,
> +		dev->data->dev_started ? "started" : "stopped");
> +	fprintf(f, "  dev_capa: 0x%" PRIx64 "\n", info.dev_capa);
> +	fprintf(f, "  max_vchans_supported: %u\n", info.max_vchans);
> +	fprintf(f, "  max_vchans_configured: %u\n", info.nb_vchans);
> +
> +	if (dev->dev_ops->dev_dump != NULL)
> +		return (*dev->dev_ops->dev_dump)(dev, f);
> +
> +	return 0;
> +}
> +
> +int
> +rte_dmadev_selftest(uint16_t dev_id)
> +{
> +	struct rte_dmadev *dev;
> +
> +	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
> +	dev = &rte_dmadevices[dev_id];
> +
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_selftest, -ENOTSUP);
> +	return (*dev->dev_ops->dev_selftest)(dev_id);
> +}
  
Nipun Gupta July 14, 2021, 12:22 p.m. UTC | #4
<snip>

> +/**
> + * A structure used to configure a virtual DMA channel.
> + */
> +struct rte_dmadev_vchan_conf {
> +	uint8_t direction;
> +	/**< Set of supported transfer directions
> +	 * @see RTE_DMA_MEM_TO_MEM
> +	 * @see RTE_DMA_MEM_TO_DEV
> +	 * @see RTE_DMA_DEV_TO_MEM
> +	 * @see RTE_DMA_DEV_TO_DEV
> +	 */
> +	/** Number of descriptor for the virtual DMA channel */
> +	uint16_t nb_desc;
> +	/** 1) Used to describes the port parameter in the device-to-memory
> +	 * transfer scenario.
> +	 * 2) Used to describes the source port parameter in the
> +	 * device-to-device transfer scenario.
> +	 * @see struct rte_dmadev_port_parameters
> +	 */

There should also be a configuration to support no response (per virtual channel);
if that is enabled, the user will not be required to call the 'rte_dmadev_completed' API.
This should also be part of the capabilities.

> +	struct rte_dmadev_port_parameters src_port;
> +	/** 1) Used to describes the port parameter in the memory-to-device-to
> +	 * transfer scenario.
> +	 * 2) Used to describes the destination port parameter in the
> +	 * device-to-device transfer scenario.
> +	 * @see struct rte_dmadev_port_parameters
> +	 */
> +	struct rte_dmadev_port_parameters dst_port;
> +};
> +

<snip>

> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list copy operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.

This would need an update for the submit flag.
Should the statement be true only when the flag is set?
A similar comment applies to the 'rte_dmadev_copy_sg' and 'rte_dma_fill' APIs.

> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param flags
> + *   An flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy scatterlist job.
> + *   - <0: Error code returned by the driver copy scatterlist function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg
> *sg,
> +		   uint64_t flags)
> +{
> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	if (!rte_dmadev_is_valid_dev(dev_id) ||
> +	    vchan >= dev->data->dev_conf.max_vchans ||
> +	    sg == NULL)
> +		return -EINVAL;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> +#endif
> +	return (*dev->copy_sg)(dev, vchan, sg, flags);
> +}
> +
  
Bruce Richardson July 14, 2021, 4:05 p.m. UTC | #5
On Tue, Jul 13, 2021 at 08:27:43PM +0800, Chengwen Feng wrote:
> This patch introduces 'dmadevice', which is a generic type of DMA
> device.
> 
> The APIs of the dmadev library expose generic operations that enable
> configuration of and I/O with DMA devices.
> 
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

More review comments - mostly stylistic - inline below.

/Bruce

> ---
> v3:
> * rm reset and fill_sg ops.
> * rm MT-safe capabilities.
> * add submit flag.
> * redefine rte_dma_sg to implement asymmetric copy.
> * delete some fields reserved for future use.
> * rearrange the rte_dmadev/rte_dmadev_data structs.
> * refresh rte_dmadev.h copyright.
> * update vchan setup parameters.
> * modify some inappropriate descriptions.
> * arrange version.map alphabetically.
> * other minor modifications from review comments.
> ---
<snip>

> +
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +#ifdef RTE_DMADEV_DEBUG
> +#include <rte_dev.h>
> +#endif

I don't see the value in conditionally including this. I'd simplify by just
always including it.

> +#include <rte_errno.h>
> +#include <rte_memory.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @param dev_id
> + *   DMA device index.
> + *
> + * @return
> + *   - If the device index is valid (true) or not (false).
> + */
> +__rte_experimental
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the total number of DMA devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable DMA devices.
> + */
> +__rte_experimental
> +uint16_t
> +rte_dmadev_count(void);
> +
> +/**
> + * The capabilities of a DMA device
> + */
This should be a non-doxygen comment, as it doesn't apply to a code
element.

> +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA device support memory-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
These comments should come before the items they refer to, not after.

> +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA device support memory-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM	(1ull << 2)
> +/**< DMA device support device-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV	(1ull << 3)
> +/**< DMA device support device-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_COPY	(1ull << 4)

Do we want to leave gaps in the flags so that they are grouped by op type?
Is it possible that we might have more RTE_DMA_DEV_X_TO_Y flags in future,
because if so, we should move this out to bit 8, for example.

> +/**< DMA device support copy ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_FILL	(1ull << 5)
> +/**< DMA device support fill ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_SG		(1ull << 6)
> +/**< DMA device support scatter-list ops.
> + * If device support ops_copy and ops_sg, it means supporting copy_sg ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */

Rather than a general SG flag, this should probably be for SG_COPY, since
we aren't offering an SG_FILL option.

> +#define RTE_DMA_DEV_CAPA_FENCE		(1ull << 7)
> +/**< DMA device support fence.
> + * If device support fence, then application could set a fence flags when
> + * enqueue operation by rte_dma_copy/copy_sg/fill/fill_sg.
> + * If a operation has a fence flags, it means the operation must be processed
> + * only after all previous operations are completed.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */

Drop this flag as unnecessary. All devices either always provide an ordering
guarantee - in which case it's a no-op - or else support the flag.

> +#define RTE_DMA_DEV_CAPA_SVA		(1ull << 8)

Again, if we are ok to leave gaps, I'd suggest moving this one well down,
e.g. to bit 32.

> +/**< DMA device support SVA which could use VA as DMA address.
> + * If device support SVA then application could pass any VA address like memory
> + * from rte_malloc(), rte_memzone(), malloc, stack memory.
> + * If device don't support SVA, then application should pass IOVA address which
> + * from rte_malloc(), rte_memzone().
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * an DMA device
> + */
> +struct rte_dmadev_info {
> +	struct rte_device *device; /**< Generic Device information */
> +	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_*) */
> +	/** Maximum number of virtual DMA channels supported */
> +	uint16_t max_vchans;
> +	/** Maximum allowed number of virtual DMA channel descriptors */
> +	uint16_t max_desc;
> +	/** Minimum allowed number of virtual DMA channel descriptors */
> +	uint16_t min_desc;
> +	uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> +};

Minor nit - I suggest standardizing the comment format here and have them
all either before, or all afterwards. Since they won't all fit in your
80-column limit, make all comments appear before the item.

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the contextual information of a DMA device.

Suggest shortening to "Retrieve information about a DMA device". There is
no context info provided here.

> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> + *   contextual information of the device.

I'd drop the word "contextual" here too.

> + *
> + * @return
> + *   - =0: Success, driver updates the contextual information of the DMA device
> + *   - <0: Error code returned by the driver info get function.
> + *
> + */
> +__rte_experimental
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
<snip>
> +
> +/**
> + * DMA transfer direction defines.
> + */
> +#define RTE_DMA_MEM_TO_MEM	(1ull << 0)
> +/**< DMA transfer direction - from memory to memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */

As with other bit flags, please put the comments on top.

> +#define RTE_DMA_MEM_TO_DEV	(1ull << 1)
> +/**< DMA transfer direction - from memory to device.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> + * through the PCIE interface. In this case, the ARM SoCs works in EP(endpoint)
> + * mode, it could initiate a DMA move request from memory (which is ARM memory)
> + * to device (which is x86 host memory).
<snip>
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/fill.
> + */
> +#define RTE_DMA_OP_FLAG_FENCE	(1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + */
> +#define RTE_DMA_OP_FLAG_SUBMIT	(1ull << 1)
> +/**< DMA submit flag
> + * It means the operation with this flag must issue doorbell to hardware after
> + * enqueued jobs.
> + */

Comments before define.

> +
> +/**
<snip>
> +/**
> + * DMA transfer status code defines
> + */
> +enum rte_dma_status_code {
> +	/** The operation completed successfully */
> +	RTE_DMA_STATUS_SUCCESSFUL = 0,
> +	/** The operation failed to complete due active drop
> +	 * This is mainly used when processing dev_stop, allow outstanding
> +	 * requests to be completed as much as possible.
> +	 */
> +	RTE_DMA_STATUS_ACTIVE_DROP,

Is this saying that the operation is aborted? I'm not familiar with the
phrase "active drop".

> +	/** The operation failed to complete due invalid source address */
> +	RTE_DMA_STATUS_INVALID_SRC_ADDR,
> +	/** The operation failed to complete due invalid destination address */
> +	RTE_DMA_STATUS_INVALID_DST_ADDR,
> +	/** The operation failed to complete due invalid length */
> +	RTE_DMA_STATUS_INVALID_LENGTH,
> +	/** The operation failed to complete due invalid opcode
> +	 * The DMA descriptor could have multiple format, which are
> +	 * distinguished by the opcode field.
> +	 */
> +	RTE_DMA_STATUS_INVALID_OPCODE,
> +	/** The operation failed to complete due bus err */
> +	RTE_DMA_STATUS_BUS_ERROR,
> +	/** The operation failed to complete due data poison */
> +	RTE_DMA_STATUS_DATA_POISION,
> +	/** The operation failed to complete due descriptor read error */
> +	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> +	/** The operation failed to complete due device link error
> +	 * Used to indicates that the link error in the mem-to-dev/dev-to-mem/
> +	 * dev-to-dev transfer scenario.
> +	 */
> +	RTE_DMA_STATUS_DEV_LINK_ERROR,
> +	/** The operation failed to complete due unknown reason */
> +	RTE_DMA_STATUS_UNKNOWN,
> +	/** Driver specific status code offset
> +	 * Start status code for the driver to define its own error code.
> +	 */
> +	RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> +};

I think we need a status error code for "not attempted", where jobs in a
particular batch are not attempted because they appeared after a fence
where a previous job failed. In our HW implementation it's possible that
jobs from later batches are completed, though, so we need to report the
status of the not-attempted jobs before reporting those newer completed
jobs.
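
Something like the following entry could express it (the name is
hypothetical):

	/** The operation was not attempted, e.g. because an earlier operation
	 * in the same batch failed ahead of a fence.
	 */
	RTE_DMA_STATUS_NOT_ATTEMPTED,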

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API was used when rte_dmadev_completed has_error was set.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_status
> + *   Indicates the size of status array.
> + * @param[out] status
> + *   The error code of operations that failed to complete.
> + *   Some standard error code are described in 'enum rte_dma_status_code'
> + *   @see rte_dma_status_code
> + * @param[out] last_idx
> + *   The last failed completed operation's index.
> + *
> + * @return
> + *   The number of operations that failed to complete.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +			   const uint16_t nb_status, uint32_t *status,
> +			   uint16_t *last_idx)
> +{

Switch the final two parameters around, so that the prototype matches that
of the previous completed() function, i.e. all start with dev_id, vchan,
"count", last_idx, and then only differ in the final parameter.

> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +	if (!rte_dmadev_is_valid_dev(dev_id) ||
> +	    vchan >= dev->data->dev_conf.max_vchans ||
> +	    nb_status == 0 ||
> +	    status == NULL ||
> +	    last_idx == NULL)
> +		return -EINVAL;
> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> +#endif

Unlike "completed" there is no fallback assigning to non-null parameters.
If we want to make the final two parameters mandatory, we should document
this.

> +	return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_H_ */
> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..b0b6494
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,161 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types, that are used by the DMA devices
> + * in order to expose their ops to the class.
> + *
> + * Applications should not use these API directly.
> + *
> + */
> +
> +struct rte_dmadev;
> +
> +/** @internal Used to get device information of a device. */
> +typedef int (*dmadev_info_get_t)(const struct rte_dmadev *dev,
> +				 struct rte_dmadev_info *dev_info,
> +				 uint32_t info_sz);
> +

Since rte_dmadev_core.h is included in rte_dmadev.h, these will be in the
public namespace for all apps using dmadev, so they do need the "rte_"
prefix.

> +/** @internal Used to configure a device. */
> +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> +				  const struct rte_dmadev_conf *dev_conf);
<snip>
> +/**
> + * @internal
> + * The generic data structure associated with each DMA device.
> + *
> + * The dataplane APIs are located at the beginning of the structure, along
> + * with the pointer to where all the data elements for the particular device
> + * are stored in shared memory. This split scheme allows the function pointer
> + * and driver data to be per-process, while the actual configuration data for
> + * the device is shared.
> + */
> +struct rte_dmadev {
> +	dmadev_copy_t copy;
> +	dmadev_copy_sg_t copy_sg;
> +	dmadev_fill_t fill;
> +	dmadev_submit_t submit;
> +	dmadev_completed_t completed;
> +	dmadev_completed_fails_t completed_fails;
> +	void *reserved_ptr; /**< Reserved for future IO function */
> +	struct rte_dmadev_data *data; /**< Pointer to device data. */
> +

I think we will get better performance if we move this back down the
struct, and instead put a copy of the data->dev_private pointer in its
place. Driver implementations tend to use the private data much more than
the generic public data struct.
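
A sketch of the layout I have in mind (same fields as the patch, with a
dev_private copy added in the hot first part and 'data' moved down):

	struct rte_dmadev {
		dmadev_copy_t copy;
		dmadev_copy_sg_t copy_sg;
		dmadev_fill_t fill;
		dmadev_submit_t submit;
		dmadev_completed_t completed;
		dmadev_completed_fails_t completed_fails;
		void *reserved_ptr; /**< Reserved for future IO function */
		void *dev_private; /**< Per-process copy of data->dev_private */

		struct rte_dmadev_data *data; /**< Pointer to device data. */
		const struct rte_dmadev_ops *dev_ops;
		struct rte_device *device;
		enum rte_dmadev_state state;
		uint64_t reserved[2];
	} __rte_cache_aligned;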

> +	const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> +	/** Device info which supplied during device initialization. */
> +	struct rte_device *device;
> +	enum rte_dmadev_state state; /**< Flag indicating the device state */
> +	uint64_t reserved[2]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +
> +#endif /* _RTE_DMADEV_CORE_H_ */
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h

<snip>
  
Jerin Jacob July 15, 2021, 6:44 a.m. UTC | #6
On Tue, Jul 13, 2021 at 7:08 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> > Thank you for your valuable comments; I think we've taken a big step forward.
> >
> > @andrew Could you provide the copyright line so that I can add it to the relevant files?
> >
> > @bruce, @jerin Some review comments that were not incorporated are answered here:
>
> Thanks. Some further comments inline below. Most points you make I'm ok
> with, but I do disagree on a number of others.
>
> /Bruce
>
> >
> > 1.
> > COMMENT: We allow up to 100 characters per line for DPDK code, so these don't need
> > to be wrapped so aggressively.
> >
> > REPLY: Our CI still has an 80-character limit, and from my review most frameworks still comply.
> >
> Ok.
>
> > 2.
> > COMMENT: > +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)
> > RTE_DMA_DIRECTION_...
> >
> > REPLY: adding 'DIRECTION' may make the macro too long; I prefer to keep it simple.
> >
> DIRECTION could be shortened to DIR, but I think this is probably ok as is
> too.
>

I prefer to keep DIR so that it is easy to reference in documentation, like
@see RTE_DMA_DIR_*


> > 3.
> > COMMENT: > +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
> > We are not making release a public API in other device classes. See ethdev spec.
> > bbdev/eventdev/rawdev
> >
> > REPLY: because ethdev's queues are hardware queues, while here we have software-defined
> > channels, I think release is OK. BTW: bbdev/eventdev also have release ops.

I don't see any API like rte_event_queue_release() in eventdev; it
only has setup.

The typical flow is:
1) configure() the N vchans
2) for i..N: setup() each vchan
3) start()
4) stop()
5) configure() again with M vchans
6) for i..M: setup() each vchan
7) start()

The above is documented at the beginning of the rte_dmadev.h header file.
I think the above sequence makes it easy for drivers (a sketch of the
application side follows below). Just like in the other device classes,
_release can be a PMD hook which is handled in the configure() common code.
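
From the application's point of view, the flow above is roughly (a sketch;
error handling omitted, and 'dev_id' and 'n' are examples):

	struct rte_dmadev_conf dev_conf = { .max_vchans = n };
	struct rte_dmadev_vchan_conf vchan_conf = {
		.direction = RTE_DMA_MEM_TO_MEM,
		.nb_desc = 1024,
	};
	uint16_t i;

	rte_dmadev_configure(dev_id, &dev_conf);
	for (i = 0; i < n; i++)
		rte_dmadev_vchan_setup(dev_id, &vchan_conf);
	rte_dmadev_start(dev_id);
	/* ... datapath ... */
	rte_dmadev_stop(dev_id);
	/* reconfigure with M vchans, setup again, then start again */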



> >
> Ok


> > 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> > fields */
> > > +};
> > Please add the capability for each counter in info structure as one
> > device may support all the counters.
> >
> > REPLY: This is a statistics function. If it is not supported, the driver
> > simply does not need to implement the stats ops; it could also set the
> > unimplemented counters to zero.
> >
> +1
> The stats functions should be a minimum set that is supported by all
> drivers. Each of these stats can be easily tracked by software if HW
> support for it is not available, so I agree that we should not have each
> stat as a capability.

In our current HW, submitted_count and completed_count are offloaded to HW.
In addition to that, we have a provision for getting stats for bytes
copied. (We can make it an xstat if other drivers won't support it.)

Our plan is to track enqueued_count and completed_fail_count in SW under
conditional compilation flags or another scheme, as they are in the fast path.

If we are not planning to add a capability, IMO we need to update the
documentation to say that unimplemented counters will return zero. But
there is the question of how to differentiate an unimplemented counter
from a genuine zero value. IMO, we can update the doc for this case as
well, or add the capability.


>
> > 5.
> > COMMENT: > +#endif
> > > +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> > Instead of every driver setting the NOP function, if the CAPA is not set,
> > the common code can set a NOP function for this with a <0 return value.
> >
> > REPLY: I don't think it's a good idea to add such checks in the I/O path; it's the application's
> > duty to ensure it doesn't call APIs the driver doesn't support (which it can learn from the capabilities).
> >
> For datapath functions, +1.

OK. Probably add a NOP function (which returns an error) in pmd.h so
that all drivers can reuse it.
No strong opinion.

>
> > 6.
> > COMMENT: > +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> > > +                          const uint16_t nb_status, uint32_t *status,
> > uint32_t -> enum rte_dma_status_code
> >
> > REPLY: I'm still evaluating this. It takes the driver a long time to perform error code
> > conversion in this API. Do we need to provide a separate error code conversion function?
> >
> It's not that difficult a conversion to do, and so long as we have the
> regular "completed" function which doesn't do all the error manipulation we
> should be fine. Performance in the case of errors is not expected to be as
> good, since errors should be very rare.

+1

>
> > 7.
> > COMMENT: > +typedef int (*dmadev_info_get_t)(struct rte_dmadev *dev,
> > > +                                struct rte_dmadev_info *dev_info);
> > Please change to rte_dmadev_info_get_t to avoid conflict due to namespace issue
> > as this header is exported.
> >
> > REPLY: I prefer not to add the 'rte_' prefix; it makes the defines too long.
> >
> I disagree on this, they need the rte_ prefix, despite the fact it makes
> them longer. If length is a concern, these can be changed from "dmadev_" to
> "rte_dma_", which is only one character longer.
> In fact, I believe Morten already suggested we use "rte_dma" rather than
> "rte_dmadev" as a function prefix across the library.

+1

>
> > 8.
> > COMMENT: > + *        - rte_dmadev_completed_fails()
> > > + *            - return the number of operation requests failed to complete.
> > Please rename this to "completed_status" to allow the return of information
> > other than just errors. As I suggested before, I think this should also be
> > usable as a slower version of "completed" even in the case where there are
> > no errors, in that it returns status information for each and every job
> > rather than just returning as soon as it hits a failure.
> >
> > REPLY: well, I think it may be confusing (the current OK/FAIL API is easy to understand),
> > and we can build the slow-path function on top of the two APIs.
> >
> I still disagree on this too. We have a "completed" op where we get
> informed of what has completed and minimal error indication, and a
> "completed_status" operation which provides status information for each
> operation completed, at the cost of speed.

+1

>
> > 9.
> > COMMENT: > +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM       (1ull << 0)
> > > +/**< DMA device support mem-to-mem transfer.
> > Do we need this? Can we assume that any device appearing as a dmadev can
> > do mem-to-mem copies, and drop the capability for mem-to-mem and the
> > capability for copying?
> > also for RTE_DMA_DEV_CAPA_OPS_COPY
> >
> > REPLY: yes, I insist on adding this for the sake of conceptual integrity.
> > The ioat driver just needs to state the capability.
> >
>
> Ok. It seems a wasted bit to me, but I don't see us running out of them
> soon.
>
> > 10.
> > COMMENT: > +  uint16_t nb_vchans; /**< Number of virtual DMA channel configured */
> > > +};
> > Let's add rte_dmadev_conf struct into this to return the configuration
> > settings.
> >
> > REPLY: If we add rte_dmadev_conf in, it may break the ABI when rte_dmadev_conf gains new fields.
> >
> Yes, that is true, but I fail to see why that is a major problem. It just
> means that if the conf structure changes we have two functions to version
> instead of one. The information is still useful.
>
> If you don't want the actual conf structure explicitly put into the info
> struct, we can instead put the fields in directly. I really think that the
> info_get function should provide back to the user the details of how the
> device was previously configured.
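
For illustration, the embedded-conf option might look like this (a
sketch, not the v3 definition):

    /* info_get() would then report back the last applied configuration,
     * at the cost of versioning two functions if the conf ever changes. */
    struct rte_dmadev_info {
        struct rte_device *device;
        uint64_t dev_capa;
        uint16_t max_vchans;             /* hardware limit */
        uint16_t max_desc;
        uint16_t min_desc;
        struct rte_dmadev_conf dev_conf; /* last applied configuration */
    };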
>
> regards,
> /Bruce
  
Jerin Jacob July 15, 2021, 7:10 a.m. UTC | #7

On Tue, Jul 13, 2021 at 6:01 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
>
> This patch introduce 'dmadevice' which is a generic type of DMA
> device.
>
> The APIs of dmadev library exposes some generic operations which can
> enable configuration and I/O with the DMA devices.
>
> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>

Thanks for v3. Seems like all major items are covered. Some more
comments below inline.

I would suggest v4 to split the patch like (so that we can review and
ack each patch)
1) Only the public header file with Doxygen inclusion (there are a lot of
Doxygen syntax issues in the patch)
2) 1 or more patches for implementation.


> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> new file mode 100644
> index 0000000..f6cc4e5
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -0,0 +1,968 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + * Copyright(c) 2021 Marvell International Ltd.
> + * Copyright(c) 2021 SmartShare Systems.
> + */
> +
> +#ifndef _RTE_DMADEV_H_
> +#define _RTE_DMADEV_H_
> +
> +/**
> + * @file rte_dmadev.h
> + *
> + * RTE DMA (Direct Memory Access) device APIs.
> + *
> + * The DMA framework is built on the following model:
> + *
> + *     ---------------   ---------------       ---------------
> + *     | virtual DMA |   | virtual DMA |       | virtual DMA |
> + *     | channel     |   | channel     |       | channel     |
> + *     ---------------   ---------------       ---------------
> + *            |                |                      |
> + *            ------------------                      |
> + *                     |                              |
> + *               ------------                    ------------
> + *               |  dmadev  |                    |  dmadev  |
> + *               ------------                    ------------
> + *                     |                              |
> + *            ------------------               ------------------
> + *            | HW-DMA-channel |               | HW-DMA-channel |
> + *            ------------------               ------------------
> + *                     |                              |
> + *                     --------------------------------
> + *                                     |
> + *                           ---------------------
> + *                           | HW-DMA-Controller |
> + *                           ---------------------
> + *
> + * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
> + * and each HW-DMA-channel should be represented by a dmadev.
> + *
> + * The dmadev could create multiple virtual DMA channels; each virtual DMA
> + * channel represents a different transfer context. The DMA operation request
> + * must be submitted to the virtual DMA channel.
> + * E.G. Application could create virtual DMA channel 0 for mem-to-mem transfer
> + *      scenario, and create virtual DMA channel 1 for mem-to-dev transfer
> + *      scenario.
> + *
> + * The dmadevs are dynamically allocated by rte_dmadev_pmd_allocate() during
> + * the PCI/SoC device probing phase performed at EAL initialization time, and
> + * could be released by rte_dmadev_pmd_release() during the PCI/SoC device
> + * removal phase.
> + *
> + * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
> + * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
> + *
> + * The functions exported by the dmadev API to setup a device designated by its
> + * device identifier must be invoked in the following order:
> + *     - rte_dmadev_configure()
> + *     - rte_dmadev_vchan_setup()
> + *     - rte_dmadev_start()
> + *
> + * Then, the application can invoke dataplane APIs to process jobs.
> + *
> + * If the application wants to change the configuration (i.e. call
> + * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
> + * device and then do the reconfiguration before calling rte_dmadev_start()
> + * again. The dataplane APIs should not be invoked when the device is stopped.
> + *
> + * Finally, an application can close a dmadev by invoking the
> + * rte_dmadev_close() function.
> + *
> + * The dataplane APIs include two parts:
> + *   a) The first part is the submission of operation requests:
> + *        - rte_dmadev_copy()
> + *        - rte_dmadev_copy_sg() - scatter-gather form of copy
> + *        - rte_dmadev_fill()
> + *        - rte_dmadev_fill_sg() - scatter-gather form of fill

rte_dmadev_fill_sg already removed.


> + *        - rte_dmadev_perform() - issue doorbell to hardware
> + *      These APIs could work with different virtual DMA channels which have
> + *      different contexts.

Please also describe the SUBMIT flag option.
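
For example, something along these lines could go in the doc (a sketch
using only v3 names; src/dst/len are application-provided):

    /* Enqueue a small batch and ring the doorbell once, via the SUBMIT
     * flag on the final job instead of an explicit rte_dmadev_submit(). */
    static int
    enqueue_batch(uint16_t dev_id, uint16_t vchan, const rte_iova_t *src,
                  const rte_iova_t *dst, const uint32_t *len, uint16_t n)
    {
        uint16_t i;

        for (i = 0; i < n; i++) {
            uint64_t flags = (i == n - 1) ? RTE_DMA_OP_FLAG_SUBMIT : 0;

            if (rte_dmadev_copy(dev_id, vchan, src[i], dst[i],
                                len[i], flags) < 0)
                return -1; /* e.g. ring full; caller may submit and retry */
        }
        return 0;
    }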


> + *      The first four APIs are used to submit the operation request to the
> + *      virtual DMA channel, if the submission is successful, a uint16_t
> + *      ring_idx is returned, otherwise a negative number is returned.
> + *   b) The second part is to obtain the result of requests:
> + *        - rte_dmadev_completed()
> + *            - return the number of operation requests completed successfully.
> + *        - rte_dmadev_completed_fails()
> + *            - return the number of operation requests failed to complete.
> + *
> + * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg() returned,
> + * the rules are as follows:
> + *   a) ring_idx for each virtual DMA channel are independent.
> + *   b) For a virtual DMA channel, the ring_idx is monotonically incremented;
> + *      when it reaches UINT16_MAX, it wraps back to zero.
> + *   c) This ring_idx can be used by applications to track per-operation
> + *      metadata in an application-defined circular ring.
> + *   d) The initial ring_idx of a virtual DMA channel is zero, after the device
> + *      is stopped, the ring_idx needs to be reset to zero.
> + *   Example:
> + *      step-1: start one dmadev
> + *      step-2: enqueue a copy operation, the ring_idx return is 0
> + *      step-3: enqueue a copy operation again, the ring_idx return is 1
> + *      ...
> + *      step-101: stop the dmadev
> + *      step-102: start the dmadev
> + *      step-103: enqueue a copy operation, the ring_idx return is 0
> + *      ...
> + *      step-x+0: enqueue a fill operation, the ring_idx return is 65535
> + *      step-x+1: enqueue a copy operation, the ring_idx return is 0
> + *      ...
> + *
> + * By default, all the functions of the dmadev API exported by a PMD are
> + * lock-free functions which are assumed not to be invoked in parallel on
> + * different logical cores to work on the same target object.
> + *
> + */
> +
> +#include <rte_common.h>
> +#include <rte_compat.h>
> +#ifdef RTE_DMADEV_DEBUG
> +#include <rte_dev.h>
> +#endif
> +#include <rte_errno.h>
> +#include <rte_memory.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#define RTE_DMADEV_NAME_MAX_LEN        RTE_DEV_NAME_MAX_LEN
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * @param dev_id
> + *   DMA device index.
> + *
> + * @return
> + *   - If the device index is valid (true) or not (false).
> + */
> +__rte_experimental
> +bool
> +rte_dmadev_is_valid_dev(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Get the total number of DMA devices that have been successfully
> + * initialised.
> + *
> + * @return
> + *   The total number of usable DMA devices.
> + */
> +__rte_experimental
> +uint16_t
> +rte_dmadev_count(void);
> +
> +/**
> + * The capabilities of a DMA device
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_MEM    (1ull << 0)
> +/**< DMA device supports memory-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_MEM_TO_DEV    (1ull << 1)
> +/**< DMA device supports memory-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_MEM    (1ull << 2)
> +/**< DMA device supports device-to-memory transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_DEV_TO_DEV    (1ull << 3)
> +/**< DMA device supports device-to-device transfer.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_COPY      (1ull << 4)
> +/**< DMA device supports copy ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_FILL      (1ull << 5)
> +/**< DMA device supports fill ops.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_OPS_SG                (1ull << 6)
> +/**< DMA device supports scatter-list ops.
> + * If the device supports ops_copy and ops_sg, it means copy_sg ops are supported.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_FENCE         (1ull << 7)
> +/**< DMA device supports fence.
> + * If the device supports fence, the application could set a fence flag when
> + * enqueueing an operation via rte_dma_copy/copy_sg/fill/fill_sg.
> + * If an operation has the fence flag, it means the operation must be processed
> + * only after all previous operations are completed.
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +#define RTE_DMA_DEV_CAPA_SVA           (1ull << 8)
> +/**< DMA device supports SVA, which allows using a VA as the DMA address.
> + * If the device supports SVA, the application could pass any VA address, like
> + * memory from rte_malloc(), rte_memzone(), malloc, or stack memory.
> + * If the device doesn't support SVA, the application should pass an IOVA
> + * address obtained from rte_malloc() or rte_memzone().
> + *
> + * @see struct rte_dmadev_info::dev_capa
> + */
> +
> +/**
> + * A structure used to retrieve the contextual information of
> + * a DMA device
> + */
> +struct rte_dmadev_info {
> +       struct rte_device *device; /**< Generic Device information */
> +       uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_*) */
> +       /** Maximum number of virtual DMA channels supported */
> +       uint16_t max_vchans;
> +       /** Maximum allowed number of virtual DMA channel descriptors */
> +       uint16_t max_desc;
> +       /** Minimum allowed number of virtual DMA channel descriptors */
> +       uint16_t min_desc;
> +       uint16_t nb_vchans; /**< Number of virtual DMA channels configured */
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve the contextual information of a DMA device.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param[out] dev_info
> + *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
> + *   contextual information of the device.
> + *
> + * @return
> + *   - =0: Success, driver updates the contextual information of the DMA device
> + *   - <0: Error code returned by the driver info get function.
> + *
> + */
> +__rte_experimental
> +int
> +rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
> +
> +/**
> + * A structure used to configure a DMA device.
> + */
> +struct rte_dmadev_conf {
> +       /** Maximum number of virtual DMA channels to use.
> +        * This value cannot be greater than the field 'max_vchans' of struct
> +        * rte_dmadev_info which get from rte_dmadev_info_get().
> +        */
> +       uint16_t max_vchans;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Configure a DMA device.
> + *
> + * This function must be invoked before any other function in the
> + * API. This function can also be re-invoked when a device is in the
> + * stopped state.
> + *
> + * @param dev_id
> + *   The identifier of the device to configure.
> + * @param dev_conf
> + *   The DMA device configuration structure encapsulated into rte_dmadev_conf
> + *   object.
> + *
> + * @return
> + *   - =0: Success, device configured.
> + *   - <0: Error code returned by the driver configuration function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Start a DMA device.
> + *
> + * The device start step is the last one and consists of setting the DMA
> + * to start accepting jobs.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device started.
> + *   - <0: Error code returned by the driver start function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_start(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Stop a DMA device.
> + *
> + * The device can be restarted with a call to rte_dmadev_start()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - =0: Success, device stopped.
> + *   - <0: Error code returned by the driver stop function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stop(uint16_t dev_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Close a DMA device.
> + *
> + * The device cannot be restarted after this call.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *  - =0: Successfully close device
> + *  - <0: Failure to close device
> + */
> +__rte_experimental
> +int
> +rte_dmadev_close(uint16_t dev_id);
> +
> +/**
> + * DMA transfer direction defines.
> + */
> +#define RTE_DMA_MEM_TO_MEM     (1ull << 0)

RTE_DMA_DIR_MEM_TO_MEM


> +/**< DMA transfer direction - from memory to memory.
> + *
> + * @see struct rte_dmadev_vchan_conf::direction
> + */
> +#define RTE_DMA_MEM_TO_DEV     (1ull << 1)
> +/**< DMA transfer direction - from memory to device.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> + * through the PCIE interface. In this case, the ARM SoC works in EP (endpoint)
> + * mode; it could initiate a DMA move request from memory (which is ARM memory)
> + * to device (which is x86 host memory).
> + *
> + * @see struct rte_dmadev_vchan_conf::direction

Also point rte_dmadev_port_parameters::port_type

> + */
> +#define RTE_DMA_DEV_TO_MEM     (1ull << 2)
> +/**< DMA transfer direction - from device to memory.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> + * through the PCIE interface. In this case, the ARM SoC works in EP (endpoint)
> + * mode; it could initiate a DMA move request from device (which is x86 host
> + * memory) to memory (which is ARM memory).
> + *
> + * @see struct rte_dmadev_vchan_conf::direction

Also point rte_dmadev_port_parameters::port_type

> + */
> +#define RTE_DMA_DEV_TO_DEV     (1ull << 3)
> +/**< DMA transfer direction - from device to device.
> + * In a typical scenario, ARM SoCs are installed on x86 servers as iNICs
> + * through the PCIE interface. In this case, the ARM SoC works in EP (endpoint)
> + * mode; it could initiate a DMA move request from device (which is x86 host
> + * memory) to device (which is another x86 host memory).
> + *
> + * @see struct rte_dmadev_vchan_conf::direction

Also point rte_dmadev_port_parameters::port_type

> + */
> +#define RTE_DMA_TRANSFER_DIR_ALL       (RTE_DMA_MEM_TO_MEM | \
> +                                        RTE_DMA_MEM_TO_DEV | \
> +                                        RTE_DMA_DEV_TO_MEM | \
> +                                        RTE_DMA_DEV_TO_DEV)

RTE_DMA_DIR_ALL ??

> +
> +/**
> + * enum rte_dmadev_port_type - DMA port type defines.
> + */
> +enum rte_dmadev_port_type {
> +       /** The device port type is PCIE. */
> +       RTE_DMADEV_PORT_OF_PCIE = 1,

Is OF required? RTE_DMADEV_PORT_PCIE

> +};
> +
> +/**
> + * A structure used to describe DMA port parameters.
> + */
> +struct rte_dmadev_port_parameters {

Please make this 'param' or 'params', e.g. rte_dmadev_port_param.


> +       enum rte_dmadev_port_type port_type;
Missing doxygen comment for this.
> +       union {
> +               /** For PCIE port
> +                *
> > +                * The following model shows the SoC's PCIE module connecting
> > +                * to multiple PCIE hosts and multiple endpoints. The PCIE
> > +                * module has an integrated DMA controller.
> +                * If the DMA wants to access the memory of host A, it can be
> +                * initiated by PF1 in core0, or by VF0 of PF0 in core0.
> +                *
> +                * System Bus
> +                *    |     ----------PCIE module----------
> +                *    |     Bus
> +                *    |     Interface
> +                *    |     -----        ------------------
> +                *    |     |   |        | PCIE Core0     |
> +                *    |     |   |        |                |        -----------
> +                *    |     |   |        |   PF-0 -- VF-0 |        | Host A  |
> +                *    |     |   |--------|        |- VF-1 |--------| Root    |
> +                *    |     |   |        |   PF-1         |        | Complex |
> +                *    |     |   |        |   PF-2         |        -----------
> +                *    |     |   |        ------------------
> +                *    |     |   |
> +                *    |     |   |        ------------------
> +                *    |     |   |        | PCIE Core1     |
> +                *    |     |   |        |                |        -----------
> +                *    |     |   |        |   PF-0 -- VF-0 |        | Host B  |
> +                *    |-----|   |--------|   PF-1 -- VF-0 |--------| Root    |
> +                *    |     |   |        |        |- VF-1 |        | Complex |
> +                *    |     |   |        |   PF-2         |        -----------
> +                *    |     |   |        ------------------
> +                *    |     |   |
> +                *    |     |   |        ------------------
> +                *    |     |DMA|        |                |        ------
> +                *    |     |   |        |                |--------| EP |
> +                *    |     |   |--------| PCIE Core2     |        ------
> +                *    |     |   |        |                |        ------
> +                *    |     |   |        |                |--------| EP |
> +                *    |     |   |        |                |        ------
> +                *    |     -----        ------------------
> +                *
> +                * The following structure is used to describe the above access
> +                * port.
> +                */
> +               struct {
> +                       uint64_t coreid : 3; /**< PCIE core id used */
> +                       uint64_t pfid : 6; /**< PF id used */
> +                       uint64_t vfen : 1; /**< VF enable bit */
> +                       uint64_t vfid : 8; /**< VF id used */

We support up to 12 bits. So please make this 12 bits.
Also, this is in the slow path, so we may not need bitfields here.
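
For instance, the same PCIe fields could be carried without bitfields
(a sketch; names and widths are illustrative, with vfid widened to 12 bits):

    struct {
        uint8_t  coreid;  /* PCIE core id used */
        uint8_t  pfid;    /* PF id used */
        uint8_t  vfen;    /* VF enable bit */
        uint16_t vfid;    /* VF id used, up to 12 bits */
        uint32_t pasid;   /* pasid field in the TLP packet, 20 bits */
        uint8_t  attr;    /* attributes field in the TLP packet */
        uint8_t  ph;      /* processing hint field in the TLP packet */
        uint16_t st;      /* steering tag field in the TLP packet */
    } pcie;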

> +                       /** The pasid field in TLP packet */
> +                       uint64_t pasid : 20;
> +                       /** The attributes field in TLP packet */
> +                       uint64_t attr : 3;
> +                       /** The processing hint field in TLP packet */
> +                       uint64_t ph : 2;
> +                       /** The steering tag field in TLP packet */
> +                       uint64_t st : 16;

We don't support a few attributes like pasid, ph, st. Do we need
a capability for these, or should we ignore them? In either case, please update the doc.

We also support an additional flag for LLC allocation.
This is a hint to the DMA engine that the cache blocks should be allocated
in the LLC (if they were not already).
When the MEM pointer is a destination in a DMA operation, the referenced
cache blocks are allocated into the cache as part of completing the
DMA (when not already present in the LLC).
This is helpful if software has to access the data right after the DMA completes.

Could you add bit or flag for the same?
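
Something along these lines, perhaps (the name and bit position are
illustrative only):

    #define RTE_DMA_OP_FLAG_LLC (1ull << 2)
    /**< Hint that destination cache blocks be allocated into the LLC on
     * completion, for software that reads the data right after the DMA
     * finishes. */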



> +               } pcie;
> +       };
> +       uint64_t reserved[2]; /**< Reserved for future fields */
> +};
> +
> +/**
> + * A structure used to configure a virtual DMA channel.
> + */
> +struct rte_dmadev_vchan_conf {
> +       uint8_t direction;
> +       /**< Set of supported transfer directions
> +        * @see RTE_DMA_MEM_TO_MEM
> +        * @see RTE_DMA_MEM_TO_DEV
> +        * @see RTE_DMA_DEV_TO_MEM
> +        * @see RTE_DMA_DEV_TO_DEV

Since we can set only one direction per vchan, should we make
it an enum to
make that clear?
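
Something like this, for instance (a sketch; names are illustrative):

    enum rte_dma_direction {
        RTE_DMA_DIR_MEM_TO_MEM,
        RTE_DMA_DIR_MEM_TO_DEV,
        RTE_DMA_DIR_DEV_TO_MEM,
        RTE_DMA_DIR_DEV_TO_DEV,
    };
    /* struct rte_dmadev_vchan_conf would then carry an
     * 'enum rte_dma_direction direction' instead of a uint8_t mask. */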

> +        */
> +       /** Number of descriptor for the virtual DMA channel */
> +       uint16_t nb_desc;
> +       /** 1) Used to describe the port parameter in the device-to-memory
> +        * transfer scenario.
> +        * 2) Used to describe the source port parameter in the
> +        * device-to-device transfer scenario.
> +        * @see struct rte_dmadev_port_parameters
> +        */
> +       struct rte_dmadev_port_parameters src_port;
> +       /** 1) Used to describe the port parameter in the memory-to-device
> +        * transfer scenario.
> +        * 2) Used to describe the destination port parameter in the
> +        * device-to-device transfer scenario.
> +        * @see struct rte_dmadev_port_parameters
> +        */
> +       struct rte_dmadev_port_parameters dst_port;
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Allocate and set up a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param conf
> + *   The virtual DMA channel configuration structure encapsulated into
> + *   rte_dmadev_vchan_conf object.
> + *
> + * @return
> + *   - >=0: Allocation succeeded; it is the virtual DMA channel id. This value must
> + *          be less than the field 'max_vchans' of struct rte_dmadev_conf
> + *          which configured by rte_dmadev_configure().
> + *   - <0: Error code returned by the driver virtual channel setup function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_setup(uint16_t dev_id,
> +                      const struct rte_dmadev_vchan_conf *conf);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Release a virtual DMA channel.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of the virtual DMA channel which was returned by vchan setup.
> + *
> + * @return
> + *   - =0: Successfully release the virtual DMA channel.
> + *   - <0: Error code returned by the driver virtual channel release function.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);

I would like to remove this to align with the other device classes in DPDK, and
use configure and start again if there is a change in vchannel setup.
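
That is, the reconfiguration flow would be (a sketch using v3 names;
conf/vconf are application-provided):

    int vchan;

    rte_dmadev_stop(dev_id);
    rte_dmadev_configure(dev_id, &conf);            /* e.g. new max_vchans */
    vchan = rte_dmadev_vchan_setup(dev_id, &vconf); /* re-create channels */
    rte_dmadev_start(dev_id);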

> +
> +/**
> + * rte_dmadev_stats - running statistics.
> + */
> +struct rte_dmadev_stats {
> +       /** Count of operations which were successfully enqueued */
> +       uint64_t enqueued_count;
> +       /** Count of operations which were submitted to hardware */
> +       uint64_t submitted_count;
> +       /** Count of operations which failed to complete */
> +       uint64_t completed_fail_count;
> +       /** Count of operations which successfully complete */
> +       uint64_t completed_count;
> +};

Provided comment on stats, in another thread.

> +
> +#define RTE_DMADEV_ALL_VCHAN   0xFFFFu

RTE_DMADEV_VCHAN_ALL ??

> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Retrieve basic statistics of one or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *   If equal to RTE_DMADEV_ALL_VCHAN, it means all channels.
> + * @param[out] stats
> + *   The basic statistics structure encapsulated into rte_dmadev_stats
> + *   object.
> + *
> + * @return
> + *   - =0: Successfully retrieve stats.
> + *   - <0: Failure to retrieve stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
> +                    struct rte_dmadev_stats *stats);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Reset basic statistics of one or all virtual DMA channel(s).
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *   If equal to RTE_DMADEV_ALL_VCHAN, it means all channels.
> + *
> + * @return
> + *   - =0: Successfully reset stats.
> + *   - <0: Failure to reset stats.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Dump DMA device info.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param f
> + *   The file to write the output to.
> + *
> + * @return
> + *   0 on success. Non-zero otherwise.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_dump(uint16_t dev_id, FILE *f);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger the dmadev self test.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + *
> + * @return
> + *   - 0: Selftest successful.
> + *   - -ENOTSUP if the device doesn't support selftest
> + *   - other values < 0 on failure.
> + */
> +__rte_experimental
> +int
> +rte_dmadev_selftest(uint16_t dev_id);
> +
> +/**
> + * rte_dma_sge - holds a scatter DMA operation request entry
> + */
> +struct rte_dma_sge {
> +       rte_iova_t addr;
> +       uint32_t length;
> +};
> +
> +/**
> + * rte_dma_sg - holds a scatter DMA operation request
> + */
> +struct rte_dma_sg {
> +       struct rte_dma_sge *src;
> +       struct rte_dma_sge *dst;
> +       uint16_t nb_src; /**< The number of src entry */
> +       uint16_t nb_dst; /**< The number of dst entry */
> +};
> +
> +#include "rte_dmadev_core.h"
> +
> +/**
> + *  DMA flags to augment operation preparation.
> + *  Used as the 'flags' parameter of rte_dmadev_copy/fill.
> + */
> +#define RTE_DMA_OP_FLAG_FENCE  (1ull << 0)
> +/**< DMA fence flag
> + * It means the operation with this flag must be processed only after all
> + * previous operations are completed.
> + *
> + * @see rte_dmadev_copy()
> + * @see rte_dmadev_copy_sg()
> + * @see rte_dmadev_fill()
> + */
> +#define RTE_DMA_OP_FLAG_SUBMIT (1ull << 1)
> +/**< DMA submit flag
> + * It means the operation with this flag must issue doorbell to hardware after
> + * enqueued jobs.
> + */
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a copy operation onto the virtual DMA channel.
> + *
> + * This queues up a copy operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param src
> + *   The address of the source buffer.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the data to be copied.
> + * @param flags
> + *   Flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy job.
> + *   - <0: Error code returned by the driver copy function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
> +               uint32_t length, uint64_t flags)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
> +#endif
> +       return (*dev->copy)(dev, vchan, src, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> + *
> + * This queues up a scatter list copy operation to be performed by hardware,
> + * but does not trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param sg
> + *   The pointer of scatterlist.
> + * @param flags
> + *   Flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued copy scatterlist job.
> + *   - <0: Error code returned by the driver copy scatterlist function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
> +                  uint64_t flags)

In order to avoid populating rte_dma_sg on the stack (as this is the
fastpath), I would like
to change the API to:

rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, struct rte_dma_sge *src,
                   struct rte_dma_sge *dst, uint16_t nb_src, uint16_t nb_dst,
                   uint64_t flags)
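
A call site under that proposed signature would then look like this
(sketch only; dev_id and vchan are assumed to be set up already):

    struct rte_dma_sge src[2], dst[1];
    int ret;

    /* ... fill in addr/length for each sge entry ... */
    ret = rte_dmadev_copy_sg(dev_id, vchan, src, dst, 2, 1,
                             RTE_DMA_OP_FLAG_SUBMIT);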


> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           sg == NULL)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
> +#endif
> +       return (*dev->copy_sg)(dev, vchan, sg, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Enqueue a fill operation onto the virtual DMA channel.
> + *
> + * This queues up a fill operation to be performed by hardware, but does not
> + * trigger hardware to begin that operation.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param pattern
> + *   The pattern to populate the destination buffer with.
> + * @param dst
> + *   The address of the destination buffer.
> + * @param length
> + *   The length of the destination buffer.
> + * @param flags
> + *   Flags for this operation.
> + *   @see RTE_DMA_OP_FLAG_*
> + *
> + * @return
> + *   - 0..UINT16_MAX: index of enqueued fill job.
> + *   - <0: Error code returned by the driver fill function.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
> +               rte_iova_t dst, uint32_t length, uint64_t flags)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
> +#endif
> +       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Trigger hardware to begin performing enqueued operations.
> + *
> + * This API is used to write the "doorbell" to the hardware to trigger it
> + * to begin the operations previously enqueued by rte_dmadev_copy/fill()
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + *
> + * @return
> + *   - =0: Successfully trigger hardware.
> + *   - <0: Failure to trigger hardware.
> + */
> +__rte_experimental
> +static inline int
> +rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
> +#endif
> +       return (*dev->submit)(dev, vchan);
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that have been successfully completed.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_cpls
> + *   The maximum number of completed operations that can be processed.
> + * @param[out] last_idx
> + *   The last completed operation's index.
> + *   If not required, NULL can be passed in.
> + * @param[out] has_error
> + *   Indicates if there are transfer error.
> + *   If not required, NULL can be passed in.
> + *
> + * @return
> + *   The number of operations that successfully completed.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
> +                    uint16_t *last_idx, bool *has_error)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +       uint16_t idx;
> +       bool err;
> +
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           nb_cpls == 0)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
> +#endif
> +
> +       /* Ensure the pointer values are non-null to simplify drivers.
> +        * In most cases these should be compile time evaluated, since this is
> +        * an inline function.
> +        * - If NULL is explicitly passed as parameter, then compiler knows the
> +        *   value is NULL
> +        * - If address of local variable is passed as parameter, then compiler
> +        *   can know it's non-NULL.
> +        */
> +       if (last_idx == NULL)
> +               last_idx = &idx;
> +       if (has_error == NULL)
> +               has_error = &err;
> +
> +       *has_error = false;
> +       return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
> +}
> +
> +/**
> + * DMA transfer status code defines
> + */
> +enum rte_dma_status_code {
> +       /** The operation completed successfully */
> +       RTE_DMA_STATUS_SUCCESSFUL = 0,
> +       /** The operation failed to complete due to an active drop.
> +        * This is mainly used when processing dev_stop, allowing outstanding
> +        * requests to be completed as much as possible.
> +        */
> +       RTE_DMA_STATUS_ACTIVE_DROP,
> +       /** The operation failed to complete due to an invalid source address */
> +       RTE_DMA_STATUS_INVALID_SRC_ADDR,
> +       /** The operation failed to complete due to an invalid destination address */
> +       RTE_DMA_STATUS_INVALID_DST_ADDR,
> +       /** The operation failed to complete due to an invalid length */
> +       RTE_DMA_STATUS_INVALID_LENGTH,
> +       /** The operation failed to complete due to an invalid opcode.
> +        * The DMA descriptor could have multiple formats, which are
> +        * distinguished by the opcode field.
> +        */
> +       RTE_DMA_STATUS_INVALID_OPCODE,
> +       /** The operation failed to complete due to a bus error */
> +       RTE_DMA_STATUS_BUS_ERROR,
> +       /** The operation failed to complete due to data poisoning */
> +       RTE_DMA_STATUS_DATA_POISION,
> +       /** The operation failed to complete due to a descriptor read error */
> +       RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
> +       /** The operation failed to complete due to a device link error.
> +        * Used to indicate a link error in the mem-to-dev/dev-to-mem/
> +        * dev-to-dev transfer scenario.
> +        */
> +       RTE_DMA_STATUS_DEV_LINK_ERROR,
> +       /** The operation failed to complete due to an unknown reason */
> +       RTE_DMA_STATUS_UNKNOWN,
> +       /** Driver specific status code offset
> +        * Start status code for the driver to define its own error code.
> +        */
> +       RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Returns the number of operations that failed to complete.
> + * NOTE: This API should be used when rte_dmadev_completed() has_error is set.
> + *
> + * @param dev_id
> + *   The identifier of the device.
> + * @param vchan
> + *   The identifier of virtual DMA channel.
> + * @param nb_status
> + *   Indicates the size of status array.
> + * @param[out] status
> + *   The error code of operations that failed to complete.
> + *   Some standard error code are described in 'enum rte_dma_status_code'
> + *   @see rte_dma_status_code
> + * @param[out] last_idx
> + *   The last failed completed operation's index.
> + *
> + * @return
> + *   The number of operations that failed to complete.
> + */
> +__rte_experimental
> +static inline uint16_t
> +rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
> +                          const uint16_t nb_status, uint32_t *status,
> +                          uint16_t *last_idx)
> +{
> +       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
> +#ifdef RTE_DMADEV_DEBUG
> +       if (!rte_dmadev_is_valid_dev(dev_id) ||
> +           vchan >= dev->data->dev_conf.max_vchans ||
> +           nb_status == 0 ||
> +           status == NULL ||
> +           last_idx == NULL)
> +               return -EINVAL;
> +       RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
> +#endif
> +       return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_H_ */
> diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
> new file mode 100644
> index 0000000..b0b6494
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_core.h
> @@ -0,0 +1,161 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + * Copyright(c) 2021 Intel Corporation.
> + */
> +
> +#ifndef _RTE_DMADEV_CORE_H_
> +#define _RTE_DMADEV_CORE_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device internal header.
> + *
> + * This header contains internal data types, that are used by the DMA devices
> + * in order to expose their ops to the class.
> + *
> + * Applications should not use these APIs directly.
> + *
> + */
> +
> +struct rte_dmadev;
> +
> +/** @internal Used to get device information of a device. */
> +typedef int (*dmadev_info_get_t)(const struct rte_dmadev *dev,
> +                                struct rte_dmadev_info *dev_info,
> +                                uint32_t info_sz);
> +
> +/** @internal Used to configure a device. */
> +typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
> +                                 const struct rte_dmadev_conf *dev_conf);
> +
> +/** @internal Used to start a configured device. */
> +typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to stop a configured device. */
> +typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to close a configured device. */
> +typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
> +
> +/** @internal Used to allocate and set up a virtual DMA channel. */
> +typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
> +                                   const struct rte_dmadev_vchan_conf *conf);
> +
> +/** @internal Used to release a virtual DMA channel. */
> +typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to retrieve basic statistics. */
> +typedef int (*dmadev_stats_get_t)(const struct rte_dmadev *dev, uint16_t vchan,
> +                                 struct rte_dmadev_stats *stats,
> +                                 uint32_t stats_sz);
> +
> +/** @internal Used to reset basic statistics. */
> +typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to dump internal information. */
> +typedef int (*dmadev_dump_t)(const struct rte_dmadev *dev, FILE *f);
> +
> +/** @internal Used to start dmadev selftest. */
> +typedef int (*dmadev_selftest_t)(uint16_t dev_id);
> +
> +/** @internal Used to enqueue a copy operation. */
> +typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
> +                            rte_iova_t src, rte_iova_t dst,
> +                            uint32_t length, uint64_t flags);
> +
> +/** @internal Used to enqueue a scatter list copy operation. */
> +typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
> +                               const struct rte_dma_sg *sg, uint64_t flags);
> +
> +/** @internal Used to enqueue a fill operation. */
> +typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
> +                            uint64_t pattern, rte_iova_t dst,
> +                            uint32_t length, uint64_t flags);
> +
> +/** @internal Used to trigger hardware to begin working. */
> +typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
> +
> +/** @internal Used to return number of successful completed operations. */
> +typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
> +                                      const uint16_t nb_cpls,
> +                                      uint16_t *last_idx, bool *has_error);
> +
> +/** @internal Used to return number of failed completed operations. */
> +typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
> +                       uint16_t vchan, const uint16_t nb_status,
> +                       uint32_t *status, uint16_t *last_idx);
> +
> +/**
> + * Possible states of a DMA device.
> + */
> +enum rte_dmadev_state {
> +       /** Device is unused before being probed. */
> +       RTE_DMADEV_UNUSED = 0,
> +       /** Device is attached when allocated in probing. */
> +       RTE_DMADEV_ATTACHED,
> +};
> +
> +/**
> + * DMA device operations function pointer table
> + */
> +struct rte_dmadev_ops {
> +       dmadev_info_get_t dev_info_get;
> +       dmadev_configure_t dev_configure;
> +       dmadev_start_t dev_start;
> +       dmadev_stop_t dev_stop;
> +       dmadev_close_t dev_close;
> +       dmadev_vchan_setup_t vchan_setup;
> +       dmadev_vchan_release_t vchan_release;
> +       dmadev_stats_get_t stats_get;
> +       dmadev_stats_reset_t stats_reset;
> +       dmadev_dump_t dev_dump;
> +       dmadev_selftest_t dev_selftest;
> +};
> +
> +/**
> + * @internal
> + * The data part, with no function pointers, associated with each DMA device.
> + *
> + * This structure is safe to place in shared memory to be common among different
> + * processes in a multi-process configuration.
> + */
> +struct rte_dmadev_data {
> +       void *dev_private; /**< PMD-specific private data. */
> +       uint16_t dev_id; /**< Device [external] identifier. */
> +       char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
> +       struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
> +       uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
> +       uint64_t reserved[2]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +
> +/**
> + * @internal
> + * The generic data structure associated with each DMA device.
> + *
> + * The dataplane APIs are located at the beginning of the structure, along
> + * with the pointer to where all the data elements for the particular device
> + * are stored in shared memory. This split scheme allows the function pointer
> + * and driver data to be per-process, while the actual configuration data for
> + * the device is shared.
> + */
> +struct rte_dmadev {
> +       dmadev_copy_t copy;
> +       dmadev_copy_sg_t copy_sg;
> +       dmadev_fill_t fill;
> +       dmadev_submit_t submit;
> +       dmadev_completed_t completed;
> +       dmadev_completed_fails_t completed_fails;
> +       void *reserved_ptr; /**< Reserved for future IO function */
> +       struct rte_dmadev_data *data; /**< Pointer to device data. */
> +
> +       const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
> +       /** Device info which supplied during device initialization. */
> +       struct rte_device *device;
> +       enum rte_dmadev_state state; /**< Flag indicating the device state */
> +       uint64_t reserved[2]; /**< Reserved for future fields */
> +} __rte_cache_aligned;
> +
> +extern struct rte_dmadev rte_dmadevices[];
> +
> +#endif /* _RTE_DMADEV_CORE_H_ */
> diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
> new file mode 100644
> index 0000000..45141f9
> --- /dev/null
> +++ b/lib/dmadev/rte_dmadev_pmd.h
> @@ -0,0 +1,72 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2021 HiSilicon Limited.
> + */
> +
> +#ifndef _RTE_DMADEV_PMD_H_
> +#define _RTE_DMADEV_PMD_H_
> +
> +/**
> + * @file
> + *
> + * RTE DMA Device PMD APIs
> + *
> + * Driver facing APIs for a DMA device. These are not to be called directly by
> + * any application.
> + */
> +
> +#include "rte_dmadev.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * @internal
> + * Allocates a new dmadev slot for a DMA device and returns the pointer
> + * to that slot for the driver to use.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot in case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_pmd_allocate(const char *name);
> +
> +/**
> + * @internal
> + * Release the specified dmadev.
> + *
> + * @param dev
> + *   Device to be released.
> + *
> + * @return
> + *   - 0 on success, negative on error
> + */
> +__rte_internal
> +int
> +rte_dmadev_pmd_release(struct rte_dmadev *dev);
> +
> +/**
> + * @internal
> + * Return the DMA device based on the device name.
> + *
> + * @param name
> + *   DMA device name.
> + *
> + * @return
> + *   A pointer to the DMA device slot in case of success,
> + *   NULL otherwise.
> + */
> +__rte_internal
> +struct rte_dmadev *
> +rte_dmadev_get_device_by_name(const char *name);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_DMADEV_PMD_H_ */
> diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
> new file mode 100644
> index 0000000..2af78e4
> --- /dev/null
> +++ b/lib/dmadev/version.map
> @@ -0,0 +1,37 @@
> +EXPERIMENTAL {
> +       global:
> +
> +       rte_dmadev_close;
> +       rte_dmadev_completed;
> +       rte_dmadev_completed_fails;
> +       rte_dmadev_configure;
> +       rte_dmadev_copy;
> +       rte_dmadev_copy_sg;
> +       rte_dmadev_count;
> +       rte_dmadev_dump;
> +       rte_dmadev_fill;
> +       rte_dmadev_info_get;
> +       rte_dmadev_is_valid_dev;
> +       rte_dmadev_selftest;
> +       rte_dmadev_start;
> +       rte_dmadev_stats_get;
> +       rte_dmadev_stats_reset;
> +       rte_dmadev_stop;
> +       rte_dmadev_submit;
> +       rte_dmadev_vchan_release;
> +       rte_dmadev_vchan_setup;
> +
> +       local: *;
> +};
> +
> +INTERNAL {
> +        global:
> +
> +       rte_dmadevices;
> +       rte_dmadev_get_device_by_name;
> +       rte_dmadev_pmd_allocate;
> +       rte_dmadev_pmd_release;
> +
> +       local: *;
> +};
> +
> diff --git a/lib/meson.build b/lib/meson.build
> index 1673ca4..68d239f 100644
> --- a/lib/meson.build
> +++ b/lib/meson.build
> @@ -60,6 +60,7 @@ libraries = [
>          'bpf',
>          'graph',
>          'node',
> +        'dmadev',
>  ]
>
>  if is_windows
> --
> 2.8.1
>
  
Bruce Richardson July 15, 2021, 8:25 a.m. UTC | #8
On Thu, Jul 15, 2021 at 12:14:05PM +0530, Jerin Jacob wrote:
> On Tue, Jul 13, 2021 at 7:08 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> 
> > > 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> > > fields */
> > > > +};
> > > Please add the capability for each counter in info structure as one
> > > device may support all the counters.
> > >
> > > REPLY: This is a statistics function. If this function is not supported,
> > > then there is no need to implement the stats ops function. It could also
> > > set the unimplemented ones to zero.
> > >
> > +1
> > The stats functions should be a minimum set that is supported by all
> > drivers. Each of these stats can be easily tracked by software if HW
> > support for it is not available, so I agree that we should not have each
> > stat as a capability.
> 
> In our current HW, submitted_count and completed_count are offloaded to HW.
> In addition to that, we have a provision for getting stats for bytes
> copied. (We can make it an xstat if other drivers won't support it.)
> 
> Our plan is to use enqueued_count and completed_fail_count in SW under
> conditional compilation flags or another scheme, as they are in the fastpath.
> 
> If we are not planning to add a capability, IMO, we need to update the
> documentation,
> e.g. that unimplemented counters will return zero. But there is the
> question of how to differentiate between
> unimplemented vs genuine zero value. IMO, we can update the doc for
> this case as well or
> add capability.
> 

While we could add capabilities for stats, I'd really rather not. Let's
just get an agreed upon minimum set. Seems like submitted and completed are
fine for all, which just leaves two to discuss for an in/out decision.

Jerin, can fail count be kept without conditional compilation, perhaps,
because it should not be touched in the fastpath but just on error legs?

For enqueued_count, in our driver I was just going to track the difference
between last doorbell and this one - which we would be tracking anyway, or
could compute very easily by saving last doorbell counter -  and add that to
the submitted count when stats are requested. That would again ensure no
fastpath impact bar perhaps storing one additional variable (old DB) per
burst. If that is felt too cumbersome, I think we can drop it, but let's at
least keep error count.
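
A sketch of that doorbell-delta accounting (all names illustrative): the
fast path stores only the ring index at each doorbell, and the delta is
folded into the counter there rather than per enqueue.

    struct drv_vchan {
        uint16_t enq_idx;   /* next ring index to hand out */
        uint16_t db_idx;    /* ring index at the last doorbell */
        uint64_t submitted; /* updated only when the doorbell rings */
    };

    static void
    drv_ring_doorbell(struct drv_vchan *vc)
    {
        vc->submitted += (uint16_t)(vc->enq_idx - vc->db_idx);
        vc->db_idx = vc->enq_idx;
        /* ... then write the hardware doorbell register ... */
    }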

Thanks,
/Bruce
  
fengchengwen July 15, 2021, 8:29 a.m. UTC | #9
On 2021/7/14 20:22, Nipun Gupta wrote:
> <snip>
> 
>> +/**
>> + * A structure used to configure a virtual DMA channel.
>> + */
>> +struct rte_dmadev_vchan_conf {
>> +	uint8_t direction;
>> +	/**< Set of supported transfer directions
>> +	 * @see RTE_DMA_MEM_TO_MEM
>> +	 * @see RTE_DMA_MEM_TO_DEV
>> +	 * @see RTE_DMA_DEV_TO_MEM
>> +	 * @see RTE_DMA_DEV_TO_DEV
>> +	 */
>> +	/** Number of descriptors for the virtual DMA channel */
>> +	uint16_t nb_desc;
>> +	/** 1) Used to describe the port parameter in the device-to-memory
>> +	 * transfer scenario.
>> +	 * 2) Used to describe the source port parameter in the
>> +	 * device-to-device transfer scenario.
>> +	 * @see struct rte_dmadev_port_parameters
>> +	 */
> 
> There should also be a configuration to support no response (per virtual channel),
> and if that is enabled, the user will not be required to call the 'rte_dmadev_completed' API.
> This should also be part of the capability.

Do you mean some silent mode? The application only needs to submit requests to the
hardware.

Could you briefly describe the working principles and application scenarios of the
corresponding device?

> 
>> +	struct rte_dmadev_port_parameters src_port;
>> +	/** 1) Used to describe the port parameter in the memory-to-device
>> +	 * transfer scenario.
>> +	 * 2) Used to describe the destination port parameter in the
>> +	 * device-to-device transfer scenario.
>> +	 * @see struct rte_dmadev_port_parameters
>> +	 */
>> +	struct rte_dmadev_port_parameters dst_port;
>> +};
>> +
> 
> <snip>
> 
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
>> + *
>> + * This queues up a scatter list copy operation to be performed by hardware,
>> + * but does not trigger hardware to begin that operation.
> 
> This would need an update for the submit flag.
> The statement should be true only when the flag is set?
> I see a similar comment on the 'rte_dmadev_copy_sg' and 'rte_dma_fill' APIs.

OK, will fix in V4

> 
>> + *
>> + * @param dev_id
>> + *   The identifier of the device.
>> + * @param vchan
>> + *   The identifier of virtual DMA channel.
>> + * @param sg
>> + *   The pointer of scatterlist.
>> + * @param flags
>> + *   Flags for this operation.
>> + *   @see RTE_DMA_OP_FLAG_*
>> + *
>> + * @return
>> + *   - 0..UINT16_MAX: index of enqueued copy scatterlist job.
>> + *   - <0: Error code returned by the driver copy scatterlist function.
>> + */
>> +__rte_experimental
>> +static inline int
>> +rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg
>> *sg,
>> +		   uint64_t flags)
>> +{
>> +	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
>> +#ifdef RTE_DMADEV_DEBUG
>> +	if (!rte_dmadev_is_valid_dev(dev_id) ||
>> +	    vchan >= dev->data->dev_conf.max_vchans ||
>> +	    sg == NULL)
>> +		return -EINVAL;
>> +	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
>> +#endif
>> +	return (*dev->copy_sg)(dev, vchan, sg, flags);
>> +}
>> +
> 
> .
>
  
Bruce Richardson July 15, 2021, 9:03 a.m. UTC | #10
On Thu, Jul 15, 2021 at 12:40:01PM +0530, Jerin Jacob wrote:
> 
> On Tue, Jul 13, 2021 at 6:01 PM Chengwen Feng <fengchengwen@huawei.com> wrote:
> >
> > This patch introduce 'dmadevice' which is a generic type of DMA
> > device.
> >
> > The APIs of dmadev library exposes some generic operations which can
> > enable configuration and I/O with the DMA devices.
> >
> > Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
> 
> Thanks for v3. Seems like all major items are covered. Some more
> comments below inline.
> 
> I would suggest v4 to split the patch like (so that we can review and
> ack each patch)
> 1) Only the public header file with Doxygen inclusion (there are a lot of
> Doxygen syntax issues in the patch)
> 2) 1 or more patches for implementation.
> 

One additional follow-up comment on flags below.

/Bruce

> 
> > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > new file mode 100644
> > index 0000000..f6cc4e5
<snip>
> > +       enum rte_dmadev_port_type port_type;
> Missing doxygen comment for this.
> > +       union {
> > +               /** For PCIE port
> > +                *
> > +                * The following model show SoC's PCIE module connects to
> > +                * multiple PCIE hosts and multiple endpoints. The PCIE module
> > +                * has an integrate DMA controller.
> > +                * If the DMA wants to access the memory of host A, it can be
> > +                * initiated by PF1 in core0, or by VF0 of PF0 in core0.
> > +                *
<snip>
> +                       /** The pasid field in TLP packet */
> > +                       uint64_t pasid : 20;
> > +                       /** The attributes field in TLP packet */
> > +                       uint64_t attr : 3;
> > +                       /** The processing hint field in TLP packet */
> > +                       uint64_t ph : 2;
> > +                       /** The steering tag field in TLP packet */
> > +                       uint64_t st : 16;
> 
> We don't support a few attributes like pasid, ph, st. Do we need
> a capability for these, or should we ignore them? In either case, please update the doc.
> 
> We also support an additional flag for LLC allocation.
> This is a hint to the DMA engine that the cache blocks should be allocated
> in the LLC (if they were not already).
> When the MEM pointer is a destination in a DMA operation, the referenced
> cache blocks are allocated into the cache as part of completing the
> DMA (when not already present in the LLC).
> This is helpful if software has to access the data right after the DMA completes.
> 
> Could you add bit or flag for the same?
> 

I wonder if this is the best location for such a flag for LLC vs memory
writes. It would also apply to memory-to-memory transactions, not just for
those done to PCI devices. As well as that, I think any flag should default
to "on" rather than "off" since writing to cache rather than DRAM is
generally the desired behaviour, I would think. Should it be a
per-operation flag, rather than per context?

<snip>
  
Jerin Jacob July 15, 2021, 9:30 a.m. UTC | #11
On Thu, Jul 15, 2021 at 2:33 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Jul 15, 2021 at 12:40:01PM +0530, Jerin Jacob wrote:
> > <snip>
>
> One additional follow-up comment on flags below.
>
> /Bruce
>
> >
> > > diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> > > new file mode 100644
> > > index 0000000..f6cc4e5
> <snip>
> > > +       enum rte_dmadev_port_type port_type;
> > missing doxygen comment for this.
> > > +       union {
> > > +               /** For PCIE port
> > > +                *
> > > +                * The following model show SoC's PCIE module connects to
> > > +                * multiple PCIE hosts and multiple endpoints. The PCIE module
> > > +                * has an integrate DMA controller.
> > > +                * If the DMA wants to access the memory of host A, it can be
> > > +                * initiated by PF1 in core0, or by VF0 of PF0 in core0.
> > > +                *
> <snip>
> > +                       /** The pasid filed in TLP packet */
> > > +                       uint64_t pasid : 20;
> > > +                       /** The attributes filed in TLP packet */
> > > +                       uint64_t attr : 3;
> > > +                       /** The processing hint filed in TLP packet */
> > > +                       uint64_t ph : 2;
> > > +                       /** The steering tag filed in TLP packet */
> > > +                       uint64_t st : 16;
> >
> > We don't support a few attributes like pasid, ph, st. Do we need
> > a capability for these, or ignore them? In either case, please update the doc.
> >
> > We also support an additional flag for LLC allocation.
> > This is a hint to the DMA engine that the cache blocks should be allocated
> > in the LLC (if they were not already).
> > When the MEM pointer is a destination in a DMA operation, the referenced
> > cache blocks are allocated into the cache as part of completing the
> > DMA (when not already present in the LLC);
> > this is helpful if software has to access the data right after the DMA is
> > completed.
> > 
> > Could you add a bit or flag for the same?
> >
>
> I wonder if this is the best location for such a flag for LLC vs memory
> writes. It would also apply to memory-to-memory transactions, not just for
> those done to PCI devices.

Ack. it can be used for MEM to MEM

>  As well as that, I think any flag should default
> to "on" rather than "off" since writing to cache rather than DRAM is
> generally the desired behaviour, I would think.

I think keeping "allocate in LLC" on for all transfers will not be good.
Large transfers pollute the LLC, and the dataplane may not touch the
complete data, only the header. Also, for device copy, allocating into the
LLC has an additional cost, unlike MEM-MEM.

So IMO, it is better to add the flag to allow allocating into the LLC as a
HINT.

> Should it be a per-operation flag, rather than per context?

Yes. better it be per-operation as it is the hint.


>
> <snip>
  
Jerin Jacob July 15, 2021, 9:49 a.m. UTC | #12
On Thu, Jul 15, 2021 at 1:55 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Jul 15, 2021 at 12:14:05PM +0530, Jerin Jacob wrote:
> > On Tue, Jul 13, 2021 at 7:08 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > On Tue, Jul 13, 2021 at 09:06:39PM +0800, fengchengwen wrote:
> >
> > > > 4.  COMMENT:> +       uint64_t reserved[4]; /**< Reserved for future
> > > > fields */
> > > > > +};
> > > > Please add the capability for each counter in info structure as one
> > > > device may support all the counters.
> > > >
> > > > REPLY: This is a statistics function. If this function is not supported,
> > > > then do not need to implement the stats ops function. Also could to set
> > > > the unimplemented ones to zero.
> > > >
> > > +1
> > > The stats functions should be a minimum set that is supported by all
> > > drivers. Each of these stats can be easily tracked by software if HW
> > > support for it is not available, so I agree that we should not have each
> > > stat as a capability.
> >
> > In our current HW, submitted_count and completed_count offloaded to HW.
> > In addition to that, we have a provision for getting stats for bytes
> > copied.( We can make it as xstat, if other drivers won't support)
> >
> > our plan is to use enqueued_count and completed_fail_count in SW under
> > condition compilation flags or another scheme as it is in fastpath.
> >
> > If we are not planning to add capability, IMO, we need to update the
> > documentation,
> > like unimplemented counters will return zero. But there is the
> > question of how to differentiate between
> > unimplemented vs genuine zero value. IMO, we can update the doc for
> > this case as well or
> > add capability.
> >
>
> While we could add capabilities for stats, I'd really rather not. Let's
> just get an agreed upon minimum set. Seems like submitted and completed are
> fine for all, which just leaves two to discuss for an in/out decision.
>
> Jerin, can fail count be kept without conditional compilation, perhaps,
> because it should not be touched in the fastpath but just on error legs?

Agree.

>
> For enqueued_count, in our driver I was just going to track the difference
> between last doorbell and this one - which we would be tracking anyway, or
> could compute very easily by saving last doorbell counter -  and add that to
> the submitted count when stats are requested. That would again ensure no
> fastpath impact bar perhaps storing one additional variable (old DB) per
> burst. If that is felt too cumbersome, I think we can drop it, but let's at
> least keep the error count.

+1 to keep submitted_count, completed_count, and fail count.

The enqueue count can be moved to xstats if it is supported by drivers.
Also, since drivers return a 0-2^16 monotonically incrementing counter,
applications can track the enqueue count themselves if needed, without
driver support (see the sketch after this message).



>
> Thanks,
> /Bruce
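
As a concrete illustration of the application-side tracking mentioned just
above (a sketch only: the struct and helper names here are invented, and it
assumes nothing beyond the 0..UINT16_MAX monotonically incrementing ring_idx
that the v3 header documents):

#include <stdint.h>

struct app_track {
	uint64_t enqueued;          /* application-side enqueue counter */
	void *meta[UINT16_MAX + 1]; /* per-operation metadata ring */
};

static inline void
app_track_enqueue(struct app_track *t, int ring_idx, void *meta)
{
	if (ring_idx < 0)
		return;             /* enqueue failed, nothing to track */
	t->meta[(uint16_t)ring_idx] = meta; /* slot reused after 2^16 ops */
	t->enqueued++;
}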
  
Bruce Richardson July 15, 2021, 10 a.m. UTC | #13
On Thu, Jul 15, 2021 at 03:19:55PM +0530, Jerin Jacob wrote:
> On Thu, Jul 15, 2021 at 1:55 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > <snip>
> >
> > While we could add capabilities for stats, I'd really rather not. Let's
> > just get an agreed upon minimum set. Seems like submitted and completed are
> > fine for all, which just leaves two to discuss for an in/out decision.
> >
> > Jerin, can fail count be kept without conditional compilation, perhaps,
> > because it should not be touched in the fastpath but just on error legs?
> 
> Agree.
> 
> >
> > For enqueued_count, in our driver I was just going to track the difference
> > between last doorbell and this one - which we would be tracking anyway, or
> > could compute very easily by saving last doorbell counter -  and add that to
> > the submitted count when stats are requested. That would again ensure no
> > fastpath impact bar perhaps storing one additional variable (old DB) per
> > burst. If that is felt too cumbersome, I think we can drop it, but let's at
> > least keep the error count.
> 
> +1 to keep submitted_count, completed_count, and fail count.
> 
> The enqueue count can be moved to xstats if it is supported by drivers.
> Also, since drivers return a 0-2^16 monotonically incrementing counter,
> applications can track the enqueue count themselves if needed, without
> driver support.
>
Agreed. Let's just stick to 3 basic stats.

/Bruce
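
As a sketch, the agreed minimum set could look like the following; the field
names are illustrative only, not the final v4 definition:

struct rte_dmadev_stats {
	uint64_t submitted_count;      /* ops submitted to the device */
	uint64_t completed_count;      /* ops successfully completed */
	uint64_t completed_fail_count; /* ops that completed in error */
};

Counters a device cannot offload can be kept in software: submitted on the
doorbell path, failures only on the error leg, so the fastpath stays clean.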
  
Bruce Richardson July 15, 2021, 10:03 a.m. UTC | #14
On Thu, Jul 15, 2021 at 03:00:01PM +0530, Jerin Jacob wrote:
> On Thu, Jul 15, 2021 at 2:33 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> <snip>
> > I wonder if this is the best location for such a flag for LLC vs memory
> > writes. It would also apply to memory-to-memory transactions, not just for
> > those done to PCI devices.
> 
> Ack. it can be used for MEM to MEM
> 
> >  As well as that, I think any flag should default
> > to "on" rather than "off" since writing to cache rather than DRAM is
> > generally the desired behaviour, I would think.
> 
> I think keeping "allocate in LLC" on for all transfers will not be good.
> Large transfers pollute the LLC, and the dataplane may not touch the
> complete data, only the header. Also, for device copy, allocating into the
> LLC has an additional cost, unlike MEM-MEM.
> 
> So IMO, it is better to add the flag to allow allocating into the LLC as a
> HINT.
> 
> > Should it be a per-operation flag, rather than per context?
> 
> Yes. better it be per-operation as it is the hint.
> 
Ok. Let's define a new per-op flag for LLC allocation, and keep default
(without flag) as no-alloc.
  
Bruce Richardson July 15, 2021, 10:05 a.m. UTC | #15
On Thu, Jul 15, 2021 at 11:03:08AM +0100, Bruce Richardson wrote:
> On Thu, Jul 15, 2021 at 03:00:01PM +0530, Jerin Jacob wrote:
> > On Thu, Jul 15, 2021 at 2:33 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > <snip>
> > 
> > I think keeping "allocate in LLC" on for all transfers will not be good.
> > Large transfers pollute the LLC, and the dataplane may not touch the
> > complete data, only the header. Also, for device copy, allocating into
> > the LLC has an additional cost, unlike MEM-MEM.
> > 
> > So IMO, it is better to add the flag to allow allocating into the LLC as
> > a HINT.
> > 
> > > Should it be a per-operation flag, rather than per context?
> > 
> > Yes. better it be per-operation as it is the hint.
> > 
> Ok. Let's define a new per-op flag for LLC allocation, and keep default
> (without flag) as no-alloc.

[Apologies for self-reply]

Let's also be clear in the documentation for the flag that this is a HINT,
and that drivers may not follow this. That way we don't need to add a
capability flag for it, or to return error from a function which doesn't
support it, etc. etc.
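
For illustration, such a per-op hint could look as follows; the flag name and
bit position are placeholders (nothing here is defined by this v3 patch), and
the copy prototype is assumed from elsewhere in this thread:

/* Hint: allocate the destination cache blocks in the LLC. Drivers that
 * cannot honour the hint simply ignore it, so no capability bit and no
 * error return are needed.
 */
#define RTE_DMA_OP_FLAG_LLC	(1ull << 2)	/* placeholder value */

ret = rte_dmadev_copy(dev_id, vchan, src, dst, length, RTE_DMA_OP_FLAG_LLC);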
  
Nipun Gupta July 15, 2021, 11:16 a.m. UTC | #16
> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Thursday, July 15, 2021 1:59 PM
> To: Nipun Gupta <nipun.gupta@nxp.com>; thomas@monjalon.net;
> ferruh.yigit@intel.com; bruce.richardson@intel.com; jerinj@marvell.com;
> jerinjacobk@gmail.com; andrew.rybchenko@oktetlabs.ru
> Cc: dev@dpdk.org; mb@smartsharesystems.com; Hemant Agrawal
> <hemant.agrawal@nxp.com>; maxime.coquelin@redhat.com;
> honnappa.nagarahalli@arm.com; david.marchand@redhat.com;
> sburla@marvell.com; pkapoor@marvell.com; konstantin.ananyev@intel.com;
> Gagandeep Singh <G.Singh@nxp.com>
> Subject: Re: [PATCH v3] dmadev: introduce DMA device library
> 
> On 2021/7/14 20:22, Nipun Gupta wrote:
> > <snip>
> >
> >> +/**
> >> + * A structure used to configure a virtual DMA channel.
> >> + */
> >> +struct rte_dmadev_vchan_conf {
> >> +	uint8_t direction;
> >> +	/**< Set of supported transfer directions
> >> +	 * @see RTE_DMA_MEM_TO_MEM
> >> +	 * @see RTE_DMA_MEM_TO_DEV
> >> +	 * @see RTE_DMA_DEV_TO_MEM
> >> +	 * @see RTE_DMA_DEV_TO_DEV
> >> +	 */
> >> +	/** Number of descriptors for the virtual DMA channel */
> >> +	uint16_t nb_desc;
> >> +	/** 1) Used to describe the port parameter in the device-to-memory
> >> +	 * transfer scenario.
> >> +	 * 2) Used to describe the source port parameter in the
> >> +	 * device-to-device transfer scenario.
> >> +	 * @see struct rte_dmadev_port_parameters
> >> +	 */
> >
> > There should also be a configuration to support no response (per Virtual
> > Channel), and if that is enabled, the user will not be required to call
> > the 'rte_dmadev_completed' API.
> > This shall also be part of the capability.
> 
> Do you mean some silent mode? The application only needs to submit
> requests to the hardware.
> 
> Could you briefly describe the working principles and application
> scenarios of the corresponding device?

It is kind of a silent mode w.r.t. the command completion from QDMA.

There could be a level of synchronization in the applications at a higher
level, due to which QDMA status dequeue would not be necessary and would be
an overhead. In this mode, extra data/bytes could be passed with the DMA
which would indirectly indicate whether the DMA is complete or not.

> 
> >
> >> +	struct rte_dmadev_port_parameters src_port;
> >> +	/** 1) Used to describe the port parameter in the memory-to-device
> >> +	 * transfer scenario.
> >> +	 * 2) Used to describe the destination port parameter in the
> >> +	 * device-to-device transfer scenario.
> >> +	 * @see struct rte_dmadev_port_parameters
> >> +	 */
> >> +	struct rte_dmadev_port_parameters dst_port;
> >> +};
> >> +
> >
> > <snip>
> >
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice.
> >> + *
> >> + * Enqueue a scatter list copy operation onto the virtual DMA channel.
> >> + *
> >> + * This queues up a scatter list copy operation to be performed by hardware,
> >> + * but does not trigger hardware to begin that operation.
> >
> > This would need an update with the submit flag.
> > The statement should be true only when the flag is set?
> > Similar comment I see on the 'rte_dmadev_copy_sg' and 'rte_dma_fill' APIs.
> 
> OK, will fix in V4
> 
> <snip>
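
To make the submit-flag comment above concrete, a sketch of the intended
batching model; it assumes the copy prototype used elsewhere in this thread
and the doorbell function named rte_dmadev_perform() in the v3 header's
overview (the exact names may change in v4):

static uint16_t
enqueue_batch(uint16_t dev_id, uint16_t vchan, const rte_iova_t *src,
	      const rte_iova_t *dst, const uint32_t *len, uint16_t n)
{
	uint16_t i;

	/* Enqueue without ringing the doorbell per op (no submit flag). */
	for (i = 0; i < n; i++)
		if (rte_dmadev_copy(dev_id, vchan, src[i], dst[i],
				    len[i], 0) < 0)
			break;

	/* One doorbell for the whole batch. */
	rte_dmadev_perform(dev_id, vchan);
	return i;
}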
  
Bruce Richardson July 15, 2021, 12:11 p.m. UTC | #17
On Thu, Jul 15, 2021 at 11:16:54AM +0000, Nipun Gupta wrote:
> <snip>
> 
> It is kind of a silent mode w.r.t. the command completion from QDMA.
> 
> There could be a level of synchronization in the applications at a higher
> level, due to which QDMA status dequeue would not be necessary and would
> be an overhead. In this mode, extra data/bytes could be passed with the
> DMA which would indirectly indicate whether the DMA is complete or not.
> 
I'm wondering if such a setting could be per-device (i.e. per HW queue)
rather than per virtual channel? Something like this would be easier to
support in that way, because we could use different function pointers for
the fastpath operations depending on whether completions are to be tracked
or not. For example: only occasional descriptors will need completion
addresses specified in the "enqueue" calls, and the "submit" function would
also do any ring cleanup that would otherwise be done by "completed" call.
Having separate function calls would reduce the number of branches that
need to be evaluated in this mode, as well as simplifying the code.

/Bruce
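
A sketch of the per-device variant suggested here; the field name is
invented for illustration, and nothing like it exists in the v3 patch:

#include <stdbool.h>

/* Hypothetical extension of the device-level configuration. */
struct rte_dmadev_conf {
	uint16_t max_vchans;
	/* When true, the device reports no completions and the application
	 * never calls rte_dmadev_completed(); synchronization happens at a
	 * higher level, e.g. via data passed with the DMA itself.
	 */
	bool enable_silent;
};

At configure time the PMD could then install an alternative set of fastpath
function pointers, e.g. a submit() that also recycles ring slots, keeping the
completion-tracking branches out of the common path entirely.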
  
Jerin Jacob July 15, 2021, 12:31 p.m. UTC | #18
On Thu, Jul 15, 2021 at 5:41 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Thu, Jul 15, 2021 at 11:16:54AM +0000, Nipun Gupta wrote:
> <snip>
> > It is kind of a silent mode w.r.t. the command completion from QDMA.
> >
> > There could be a level of synchronization in the applications at a
> > higher level, due to which QDMA status dequeue would not be necessary
> > and would be an overhead. In this mode, extra data/bytes could be passed
> > with the DMA which would indirectly indicate whether the DMA is complete
> > or not.
> >
> I'm wondering if such a setting could be per-device (i.e. per HW queue)
> rather than per virtual channel? Something like this would be easier to
> support in that way, because we could use different function pointers for
> the fastpath operations depending on whether completions are to be tracked
> or not. For example: only occasional descriptors will need completion
> addresses specified in the "enqueue" calls, and the "submit" function would
> also do any ring cleanup that would otherwise be done by "completed" call.
> Having separate function calls would reduce the number of branches that
> need to be evaluated in this mode, as well as simplifying the code.


+1 to add it in the config param, i.e. for the device.

>
> /Bruce
  
Nipun Gupta July 15, 2021, 12:34 p.m. UTC | #19
> -----Original Message-----
> From: Jerin Jacob <jerinjacobk@gmail.com>
> Sent: Thursday, July 15, 2021 6:02 PM
> To: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Nipun Gupta <nipun.gupta@nxp.com>; fengchengwen
> <fengchengwen@huawei.com>; thomas@monjalon.net; ferruh.yigit@intel.com;
> jerinj@marvell.com; andrew.rybchenko@oktetlabs.ru; dev@dpdk.org;
> mb@smartsharesystems.com; Hemant Agrawal <hemant.agrawal@nxp.com>;
> maxime.coquelin@redhat.com; honnappa.nagarahalli@arm.com;
> david.marchand@redhat.com; sburla@marvell.com; pkapoor@marvell.com;
> konstantin.ananyev@intel.com; Gagandeep Singh <G.Singh@nxp.com>
> Subject: Re: [PATCH v3] dmadev: introduce DMA device library
> 
> On Thu, Jul 15, 2021 at 5:41 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> <snip>
> > I'm wondering if such a setting could be per-device (i.e. per HW queue)
> > rather than per virtual channel? Something like this would be easier to
> > support in that way, because we could use different function pointers for
> > the fastpath operations depending on whether completions are to be tracked
> > or not. For example: only occasional descriptors will need completion
> > addresses specified in the "enqueue" calls, and the "submit" function would
> > also do any ring cleanup that would otherwise be done by "completed" call.
> > Having separate function calls would reduce the number of branches that
> > need to be evaluated in this mode, as well as simplifying the code.

Agree, adding config for the device makes sense.

> 
> 
> +1 to add it in the config param, i.e. for the device.
> 
> >
> > /Bruce
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index af2a91d..e01a07f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -495,6 +495,10 @@  F: drivers/raw/skeleton/
 F: app/test/test_rawdev.c
 F: doc/guides/prog_guide/rawdev.rst
 
+DMA device API - EXPERIMENTAL
+M: Chengwen Feng <fengchengwen@huawei.com>
+F: lib/dmadev/
+
 
 Memory Pool Drivers
 -------------------
diff --git a/config/rte_config.h b/config/rte_config.h
index 590903c..331a431 100644
--- a/config/rte_config.h
+++ b/config/rte_config.h
@@ -81,6 +81,9 @@ 
 /* rawdev defines */
 #define RTE_RAWDEV_MAX_DEVS 64
 
+/* dmadev defines */
+#define RTE_DMADEV_MAX_DEVS 64
+
 /* ip_fragmentation defines */
 #define RTE_LIBRTE_IP_FRAG_MAX_FRAG 4
 #undef RTE_LIBRTE_IP_FRAG_TBL_STAT
diff --git a/lib/dmadev/meson.build b/lib/dmadev/meson.build
new file mode 100644
index 0000000..d2fc85e
--- /dev/null
+++ b/lib/dmadev/meson.build
@@ -0,0 +1,7 @@ 
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2021 HiSilicon Limited.
+
+sources = files('rte_dmadev.c')
+headers = files('rte_dmadev.h')
+indirect_headers += files('rte_dmadev_core.h')
+driver_sdk_headers += files('rte_dmadev_pmd.h')
diff --git a/lib/dmadev/rte_dmadev.c b/lib/dmadev/rte_dmadev.c
new file mode 100644
index 0000000..1bca463
--- /dev/null
+++ b/lib/dmadev/rte_dmadev.c
@@ -0,0 +1,561 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited.
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#include <ctype.h>
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <rte_debug.h>
+#include <rte_dev.h>
+#include <rte_eal.h>
+#include <rte_errno.h>
+#include <rte_lcore.h>
+#include <rte_log.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_malloc.h>
+#include <rte_string_fns.h>
+
+#include "rte_dmadev.h"
+#include "rte_dmadev_pmd.h"
+
+struct rte_dmadev rte_dmadevices[RTE_DMADEV_MAX_DEVS];
+
+static const char *MZ_RTE_DMADEV_DATA = "rte_dmadev_data";
+/* Shared memory between primary and secondary processes. */
+static struct {
+	struct rte_dmadev_data data[RTE_DMADEV_MAX_DEVS];
+} *dmadev_shared_data;
+
+RTE_LOG_REGISTER(rte_dmadev_logtype, lib.dmadev, INFO);
+#define RTE_DMADEV_LOG(level, ...) \
+	rte_log(RTE_LOG_ ## level, rte_dmadev_logtype, "" __VA_ARGS__)
+
+/* Macros to check for valid device id */
+#define RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, retval) do { \
+	if (!rte_dmadev_is_valid_dev(dev_id)) { \
+		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
+		return retval; \
+	} \
+} while (0)
+
+#define RTE_DMADEV_VALID_DEV_ID_OR_RET(dev_id) do { \
+	if (!rte_dmadev_is_valid_dev(dev_id)) { \
+		RTE_DMADEV_LOG(ERR, "Invalid dev_id=%u\n", dev_id); \
+		return; \
+	} \
+} while (0)
+
+/* Macro to check for invalid pointers */
+#define RTE_DMADEV_PTR_OR_ERR_RET(ptr, retval) do { \
+	if ((ptr) == NULL) \
+		return retval; \
+} while (0)
+
+static int
+dmadev_check_name(const char *name)
+{
+	size_t name_len;
+
+	if (name == NULL) {
+		RTE_DMADEV_LOG(ERR, "Name can't be NULL\n");
+		return -EINVAL;
+	}
+
+	name_len = strnlen(name, RTE_DMADEV_NAME_MAX_LEN);
+	if (name_len == 0) {
+		RTE_DMADEV_LOG(ERR, "Zero length DMA device name\n");
+		return -EINVAL;
+	}
+	if (name_len >= RTE_DMADEV_NAME_MAX_LEN) {
+		RTE_DMADEV_LOG(ERR, "DMA device name is too long\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static uint16_t
+dmadev_find_free_dev(void)
+{
+	uint16_t i;
+
+	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
+		if (dmadev_shared_data->data[i].dev_name[0] == '\0') {
+			RTE_ASSERT(rte_dmadevices[i].state ==
+				   RTE_DMADEV_UNUSED);
+			return i;
+		}
+	}
+
+	return RTE_DMADEV_MAX_DEVS;
+}
+
+static struct rte_dmadev*
+dmadev_find(const char *name)
+{
+	uint16_t i;
+
+	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
+		if ((rte_dmadevices[i].state == RTE_DMADEV_ATTACHED) &&
+		    (!strcmp(name, rte_dmadevices[i].data->dev_name)))
+			return &rte_dmadevices[i];
+	}
+
+	return NULL;
+}
+
+static int
+dmadev_shared_data_prepare(void)
+{
+	const struct rte_memzone *mz;
+
+	if (dmadev_shared_data == NULL) {
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+			/* Allocate port data and ownership shared memory. */
+			mz = rte_memzone_reserve(MZ_RTE_DMADEV_DATA,
+					 sizeof(*dmadev_shared_data),
+					 rte_socket_id(), 0);
+		} else {
+			mz = rte_memzone_lookup(MZ_RTE_DMADEV_DATA);
+		}
+		if (mz == NULL)
+			return -ENOMEM;
+
+		dmadev_shared_data = mz->addr;
+		if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+			memset(dmadev_shared_data->data, 0,
+			       sizeof(dmadev_shared_data->data));
+	}
+
+	return 0;
+}
+
+static struct rte_dmadev *
+dmadev_allocate(const char *name)
+{
+	struct rte_dmadev *dev;
+	uint16_t dev_id;
+
+	dev = dmadev_find(name);
+	if (dev != NULL) {
+		RTE_DMADEV_LOG(ERR, "DMA device already allocated\n");
+		return NULL;
+	}
+
+	dev_id = dmadev_find_free_dev();
+	if (dev_id == RTE_DMADEV_MAX_DEVS) {
+		RTE_DMADEV_LOG(ERR, "Reached maximum number of DMA devices\n");
+		return NULL;
+	}
+
+	if (dmadev_shared_data_prepare() != 0) {
+		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
+		return NULL;
+	}
+
+	dev = &rte_dmadevices[dev_id];
+	dev->data = &dmadev_shared_data->data[dev_id];
+	dev->data->dev_id = dev_id;
+	strlcpy(dev->data->dev_name, name, sizeof(dev->data->dev_name));
+
+	return dev;
+}
+
+static struct rte_dmadev *
+dmadev_attach_secondary(const char *name)
+{
+	struct rte_dmadev *dev;
+	uint16_t i;
+
+	if (dmadev_shared_data_prepare() != 0) {
+		RTE_DMADEV_LOG(ERR, "Cannot allocate DMA shared data\n");
+		return NULL;
+	}
+
+	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
+		if (!strcmp(dmadev_shared_data->data[i].dev_name, name))
+			break;
+	}
+	if (i == RTE_DMADEV_MAX_DEVS) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %s is not driven by the primary process\n",
+			name);
+		return NULL;
+	}
+
+	dev = &rte_dmadevices[i];
+	dev->data = &dmadev_shared_data->data[i];
+	RTE_ASSERT(dev->data->dev_id == i);
+
+	return dev;
+}
+
+struct rte_dmadev *
+rte_dmadev_pmd_allocate(const char *name)
+{
+	struct rte_dmadev *dev;
+
+	if (dmadev_check_name(name) != 0)
+		return NULL;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		dev = dmadev_allocate(name);
+	else
+		dev = dmadev_attach_secondary(name);
+
+	if (dev == NULL)
+		return NULL;
+	dev->state = RTE_DMADEV_ATTACHED;
+
+	return dev;
+}
+
+int
+rte_dmadev_pmd_release(struct rte_dmadev *dev)
+{
+	if (dev == NULL)
+		return -EINVAL;
+
+	if (dev->state == RTE_DMADEV_UNUSED)
+		return 0;
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		rte_free(dev->data->dev_private);
+		memset(dev->data, 0, sizeof(struct rte_dmadev_data));
+	}
+
+	memset(dev, 0, sizeof(struct rte_dmadev));
+	dev->state = RTE_DMADEV_UNUSED;
+
+	return 0;
+}
+
+struct rte_dmadev *
+rte_dmadev_get_device_by_name(const char *name)
+{
+	if (dmadev_check_name(name) != 0)
+		return NULL;
+	return dmadev_find(name);
+}
+
+bool
+rte_dmadev_is_valid_dev(uint16_t dev_id)
+{
+	if (dev_id >= RTE_DMADEV_MAX_DEVS ||
+	    rte_dmadevices[dev_id].state != RTE_DMADEV_ATTACHED)
+		return false;
+	return true;
+}
+
+uint16_t
+rte_dmadev_count(void)
+{
+	uint16_t count = 0;
+	uint16_t i;
+
+	for (i = 0; i < RTE_DMADEV_MAX_DEVS; i++) {
+		if (rte_dmadevices[i].state == RTE_DMADEV_ATTACHED)
+			count++;
+	}
+
+	return count;
+}
+
+int
+rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info)
+{
+	const struct rte_dmadev *dev;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	RTE_DMADEV_PTR_OR_ERR_RET(dev_info, -EINVAL);
+
+	dev = &rte_dmadevices[dev_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_info_get, -ENOTSUP);
+	memset(dev_info, 0, sizeof(struct rte_dmadev_info));
+	ret = (*dev->dev_ops->dev_info_get)(dev, dev_info,
+					    sizeof(struct rte_dmadev_info));
+	if (ret != 0)
+		return ret;
+
+	dev_info->device = dev->device;
+	dev_info->nb_vchans = dev->data->dev_conf.max_vchans;
+
+	return 0;
+}
+
+int
+rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf)
+{
+	struct rte_dmadev_info info;
+	struct rte_dmadev *dev;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	RTE_DMADEV_PTR_OR_ERR_RET(dev_conf, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	ret = rte_dmadev_info_get(dev_id, &info);
+	if (ret != 0) {
+		RTE_DMADEV_LOG(ERR, "Device %u get device info failed\n", dev_id);
+		return -EINVAL;
+	}
+	if (dev_conf->max_vchans > info.max_vchans) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u configured with too many vchans\n", dev_id);
+		return -EINVAL;
+	}
+
+	if (dev->data->dev_started != 0) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u must be stopped to allow configuration\n",
+			dev_id);
+		return -EBUSY;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_configure, -ENOTSUP);
+	ret = (*dev->dev_ops->dev_configure)(dev, dev_conf);
+	if (ret == 0)
+		memcpy(&dev->data->dev_conf, dev_conf, sizeof(*dev_conf));
+
+	return ret;
+}
+
+int
+rte_dmadev_start(uint16_t dev_id)
+{
+	struct rte_dmadev *dev;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	if (dev->data->dev_started != 0) {
+		RTE_DMADEV_LOG(WARNING, "Device %u already started\n", dev_id);
+		return 0;
+	}
+
+	if (dev->dev_ops->dev_start == NULL)
+		goto mark_started;
+
+	ret = (*dev->dev_ops->dev_start)(dev);
+	if (ret != 0)
+		return ret;
+
+mark_started:
+	dev->data->dev_started = 1;
+	return 0;
+}
+
+int
+rte_dmadev_stop(uint16_t dev_id)
+{
+	struct rte_dmadev *dev;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	if (dev->data->dev_started == 0) {
+		RTE_DMADEV_LOG(WARNING, "Device %u already stopped\n", dev_id);
+		return 0;
+	}
+
+	if (dev->dev_ops->dev_stop == NULL)
+		goto mark_stopped;
+
+	ret = (*dev->dev_ops->dev_stop)(dev);
+	if (ret != 0)
+		return ret;
+
+mark_stopped:
+	dev->data->dev_started = 0;
+	return 0;
+}
+
+int
+rte_dmadev_close(uint16_t dev_id)
+{
+	struct rte_dmadev *dev;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	/* Device must be stopped before it can be closed */
+	if (dev->data->dev_started == 1) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u must be stopped before closing\n", dev_id);
+		return -EBUSY;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_close, -ENOTSUP);
+	return (*dev->dev_ops->dev_close)(dev);
+}
+
+int
+rte_dmadev_vchan_setup(uint16_t dev_id,
+		       const struct rte_dmadev_vchan_conf *conf)
+{
+	struct rte_dmadev_info info;
+	struct rte_dmadev *dev;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	RTE_DMADEV_PTR_OR_ERR_RET(conf, -EINVAL);
+
+	dev = &rte_dmadevices[dev_id];
+
+	ret = rte_dmadev_info_get(dev_id, &info);
+	if (ret != 0) {
+		RTE_DMADEV_LOG(ERR, "Device %u get device info failed\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->direction == 0 ||
+	    conf->direction & ~RTE_DMA_TRANSFER_DIR_ALL) {
+		RTE_DMADEV_LOG(ERR, "Device %u direction invalid!\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->direction & RTE_DMA_MEM_TO_MEM &&
+	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_MEM)) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u doesn't support mem2mem transfer\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->direction & RTE_DMA_MEM_TO_DEV &&
+	    !(info.dev_capa & RTE_DMA_DEV_CAPA_MEM_TO_DEV)) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u doesn't support mem2dev transfer\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->direction & RTE_DMA_DEV_TO_MEM &&
+	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_MEM)) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u doesn't support dev2mem transfer\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->direction & RTE_DMA_DEV_TO_DEV &&
+	    !(info.dev_capa & RTE_DMA_DEV_CAPA_DEV_TO_DEV)) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u doesn't support dev2dev transfer\n", dev_id);
+		return -EINVAL;
+	}
+	if (conf->nb_desc < info.min_desc || conf->nb_desc > info.max_desc) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u number of descriptors invalid\n", dev_id);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_setup, -ENOTSUP);
+	return (*dev->dev_ops->vchan_setup)(dev, conf);
+}
+
+int
+rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan)
+{
+	struct rte_dmadev *dev;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	if (vchan >= dev->data->dev_conf.max_vchans) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u vchan %u out of range\n", dev_id, vchan);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->vchan_release, -ENOTSUP);
+	return (*dev->dev_ops->vchan_release)(dev, vchan);
+}
+
+int
+rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
+		     struct rte_dmadev_stats *stats)
+{
+	const struct rte_dmadev *dev;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	RTE_DMADEV_PTR_OR_ERR_RET(stats, -EINVAL);
+
+	dev = &rte_dmadevices[dev_id];
+
+	if (vchan >= dev->data->dev_conf.max_vchans &&
+	    vchan != RTE_DMADEV_ALL_VCHAN) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u vchan %u out of range\n", dev_id, vchan);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_get, -ENOTSUP);
+	return (*dev->dev_ops->stats_get)(dev, vchan, stats,
+					  sizeof(struct rte_dmadev_stats));
+}
+
+int
+rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan)
+{
+	struct rte_dmadev *dev;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	if (vchan >= dev->data->dev_conf.max_vchans &&
+	    vchan != RTE_DMADEV_ALL_VCHAN) {
+		RTE_DMADEV_LOG(ERR,
+			"Device %u vchan %u out of range\n", dev_id, vchan);
+		return -EINVAL;
+	}
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->stats_reset, -ENOTSUP);
+	return (*dev->dev_ops->stats_reset)(dev, vchan);
+}
+
+int
+rte_dmadev_dump(uint16_t dev_id, FILE *f)
+{
+	const struct rte_dmadev *dev;
+	struct rte_dmadev_info info;
+	int ret;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	RTE_DMADEV_PTR_OR_ERR_RET(f, -EINVAL);
+
+	ret = rte_dmadev_info_get(dev_id, &info);
+	if (ret != 0) {
+		RTE_DMADEV_LOG(ERR, "Device %u get device info failed\n", dev_id);
+		return -EINVAL;
+	}
+
+	dev = &rte_dmadevices[dev_id];
+
+	fprintf(f, "DMA Dev %u, '%s' [%s]\n",
+		dev->data->dev_id,
+		dev->data->dev_name,
+		dev->data->dev_started ? "started" : "stopped");
+	fprintf(f, "  dev_capa: 0x%" PRIx64 "\n", info.dev_capa);
+	fprintf(f, "  max_vchans_supported: %u\n", info.max_vchans);
+	fprintf(f, "  max_vchans_configured: %u\n", info.nb_vchans);
+
+	if (dev->dev_ops->dev_dump != NULL)
+		return (*dev->dev_ops->dev_dump)(dev, f);
+
+	return 0;
+}
+
+int
+rte_dmadev_selftest(uint16_t dev_id)
+{
+	struct rte_dmadev *dev;
+
+	RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+	dev = &rte_dmadevices[dev_id];
+
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->dev_ops->dev_selftest, -ENOTSUP);
+	return (*dev->dev_ops->dev_selftest)(dev_id);
+}
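
For reference, the control path implemented above would be used roughly as
follows; a sketch only, with the vchan_conf initializers assumed from the
structure quoted earlier in this thread and error handling trimmed:

#include <rte_dmadev.h>

static int
setup_one_dmadev(uint16_t dev_id)
{
	struct rte_dmadev_conf conf = { .max_vchans = 1 };
	struct rte_dmadev_vchan_conf vconf = {
		.direction = RTE_DMA_MEM_TO_MEM,
		.nb_desc = 1024,	/* must be within [min_desc, max_desc] */
	};

	if (rte_dmadev_configure(dev_id, &conf) < 0)
		return -1;
	if (rte_dmadev_vchan_setup(dev_id, &vconf) < 0)
		return -1;
	return rte_dmadev_start(dev_id);
}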
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
new file mode 100644
index 0000000..f6cc4e5
--- /dev/null
+++ b/lib/dmadev/rte_dmadev.h
@@ -0,0 +1,968 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited.
+ * Copyright(c) 2021 Intel Corporation.
+ * Copyright(c) 2021 Marvell International Ltd.
+ * Copyright(c) 2021 SmartShare Systems.
+ */
+
+#ifndef _RTE_DMADEV_H_
+#define _RTE_DMADEV_H_
+
+/**
+ * @file rte_dmadev.h
+ *
+ * RTE DMA (Direct Memory Access) device APIs.
+ *
+ * The DMA framework is built on the following model:
+ *
+ *     ---------------   ---------------       ---------------
+ *     | virtual DMA |   | virtual DMA |       | virtual DMA |
+ *     | channel     |   | channel     |       | channel     |
+ *     ---------------   ---------------       ---------------
+ *            |                |                      |
+ *            ------------------                      |
+ *                     |                              |
+ *               ------------                    ------------
+ *               |  dmadev  |                    |  dmadev  |
+ *               ------------                    ------------
+ *                     |                              |
+ *            ------------------               ------------------
+ *            | HW-DMA-channel |               | HW-DMA-channel |
+ *            ------------------               ------------------
+ *                     |                              |
+ *                     --------------------------------
+ *                                     |
+ *                           ---------------------
+ *                           | HW-DMA-Controller |
+ *                           ---------------------
+ *
+ * The DMA controller could have multiple HW-DMA-channels (aka. HW-DMA-queues),
+ * and each HW-DMA-channel should be represented by a dmadev.
+ *
+ * The dmadev could create multiple virtual DMA channels; each virtual DMA
+ * channel represents a different transfer context. DMA operation requests
+ * must be submitted to a virtual DMA channel.
+ * E.g. an application could create virtual DMA channel 0 for the mem-to-mem
+ *      transfer scenario, and virtual DMA channel 1 for the mem-to-dev
+ *      transfer scenario.
+ *
+ * A dmadev is dynamically allocated by rte_dmadev_pmd_allocate() during the
+ * PCI/SoC device probing phase performed at EAL initialization time, and can
+ * be released by rte_dmadev_pmd_release() during the PCI/SoC device removal
+ * phase.
+ *
+ * This framework uses 'uint16_t dev_id' as the device identifier of a dmadev,
+ * and 'uint16_t vchan' as the virtual DMA channel identifier in one dmadev.
+ *
+ * The functions exported by the dmadev API to setup a device designated by its
+ * device identifier must be invoked in the following order:
+ *     - rte_dmadev_configure()
+ *     - rte_dmadev_vchan_setup()
+ *     - rte_dmadev_start()
+ *
+ * Then, the application can invoke dataplane APIs to process jobs.
+ *
+ * If the application wants to change the configuration (i.e. call
+ * rte_dmadev_configure()), it must call rte_dmadev_stop() first to stop the
+ * device and then do the reconfiguration before calling rte_dmadev_start()
+ * again. The dataplane APIs should not be invoked when the device is stopped.
+ *
+ * Finally, an application can close a dmadev by invoking the
+ * rte_dmadev_close() function.
+ *
+ * The dataplane APIs include two parts:
+ *   a) The first part is the submission of operation requests:
+ *        - rte_dmadev_copy()
+ *        - rte_dmadev_copy_sg() - scatter-gather form of copy
+ *        - rte_dmadev_fill()
+ *        - rte_dmadev_fill_sg() - scatter-gather form of fill
+ *        - rte_dmadev_perform() - issue doorbell to hardware
+ *      These APIs could work with different virtual DMA channels which have
+ *      different contexts.
+ *      The first four APIs are used to submit the operation request to the
+ *      virtual DMA channel; if the submission is successful, a uint16_t
+ *      ring_idx is returned, otherwise a negative number is returned.
+ *   b) The second part is to obtain the result of requests:
+ *        - rte_dmadev_completed()
+ *            - return the number of operation requests completed successfully.
+ *        - rte_dmadev_completed_fails()
+ *            - return the number of operation requests that failed to complete.
+ *
+ * About the ring_idx which rte_dmadev_copy/copy_sg/fill/fill_sg() returned,
+ * the rules are as follows:
+ *   a) ring_idx for each virtual DMA channel are independent.
+ *   b) For a virtual DMA channel, the ring_idx is monotonically incremented;
+ *      when it reaches UINT16_MAX, it wraps back to zero.
+ *   c) This ring_idx can be used by applications to track per-operation
+ *      metadata in an application-defined circular ring.
+ *   d) The initial ring_idx of a virtual DMA channel is zero; after the
+ *      device is stopped, the ring_idx needs to be reset to zero.
+ *   Example:
+ *      step-1: start one dmadev
+ *      step-2: enqueue a copy operation, the ring_idx returned is 0
+ *      step-3: enqueue a copy operation again, the ring_idx returned is 1
+ *      ...
+ *      step-101: stop the dmadev
+ *      step-102: start the dmadev
+ *      step-103: enqueue a copy operation, the ring_idx returned is 0
+ *      ...
+ *      step-x+0: enqueue a fill operation, the ring_idx returned is 65535
+ *      step-x+1: enqueue a copy operation, the ring_idx returned is 0
+ *      ...
+ *
+ * By default, all the functions of the dmadev API exported by a PMD are
+ * lock-free functions, which are assumed not to be invoked in parallel on
+ * different logical cores to work on the same target object.
+ *
+ */
+
+#include <rte_common.h>
+#include <rte_compat.h>
+#ifdef RTE_DMADEV_DEBUG
+#include <rte_dev.h>
+#endif
+#include <rte_errno.h>
+#include <rte_memory.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DMADEV_NAME_MAX_LEN	RTE_DEV_NAME_MAX_LEN
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * @param dev_id
+ *   DMA device index.
+ *
+ * @return
+ *   - If the device index is valid (true) or not (false).
+ */
+__rte_experimental
+bool
+rte_dmadev_is_valid_dev(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Get the total number of DMA devices that have been successfully
+ * initialised.
+ *
+ * @return
+ *   The total number of usable DMA devices.
+ */
+__rte_experimental
+uint16_t
+rte_dmadev_count(void);
+
+/**
+ * The capabilities of a DMA device
+ */
+#define RTE_DMA_DEV_CAPA_MEM_TO_MEM	(1ull << 0)
+/**< DMA device support memory-to-memory transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_MEM_TO_DEV	(1ull << 1)
+/**< DMA device support memory-to-device transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_DEV_TO_MEM	(1ull << 2)
+/**< DMA device support device-to-memory transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_DEV_TO_DEV	(1ull << 3)
+/**< DMA device support device-to-device transfer.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_OPS_COPY	(1ull << 4)
+/**< DMA device support copy ops.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_OPS_FILL	(1ull << 5)
+/**< DMA device support fill ops.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_OPS_SG		(1ull << 6)
+/**< DMA device support scatter-list ops.
+ * If device support ops_copy and ops_sg, it means supporting copy_sg ops.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_FENCE		(1ull << 7)
+/**< DMA device supports fence.
+ * If a device supports fence, the application can set a fence flag when
+ * enqueuing an operation via rte_dmadev_copy/copy_sg/fill().
+ * If an operation has the fence flag set, it must be processed
+ * only after all previous operations are completed.
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+#define RTE_DMA_DEV_CAPA_SVA		(1ull << 8)
+/**< DMA device supports SVA, which allows a VA to be used as the DMA address.
+ * If a device supports SVA, the application can pass any VA address, e.g.
+ * memory from rte_malloc(), rte_memzone(), malloc or the stack.
+ * If a device doesn't support SVA, the application must pass an IOVA address
+ * obtained from rte_malloc() or rte_memzone().
+ *
+ * @see struct rte_dmadev_info::dev_capa
+ */
+
+/**
+ * A structure used to retrieve the contextual information of
+ * a DMA device.
+ */
+struct rte_dmadev_info {
+	struct rte_device *device; /**< Generic Device information */
+	uint64_t dev_capa; /**< Device capabilities (RTE_DMA_DEV_CAPA_*) */
+	/** Maximum number of virtual DMA channels supported */
+	uint16_t max_vchans;
+	/** Maximum allowed number of virtual DMA channel descriptors */
+	uint16_t max_desc;
+	/** Minimum allowed number of virtual DMA channel descriptors */
+	uint16_t min_desc;
+	uint16_t nb_vchans; /**< Number of virtual DMA channels configured */
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve the contextual information of a DMA device.
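+ *
+ * A short sketch of querying capabilities (assuming device 0 has been
+ * probed; error handling omitted):
+ * @code
+ *    struct rte_dmadev_info info;
+ *
+ *    rte_dmadev_info_get(0, &info);
+ *    if (info.dev_capa & RTE_DMA_DEV_CAPA_OPS_COPY)
+ *        printf("device 0 supports copy ops\n");
+ * @endcode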
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param[out] dev_info
+ *   A pointer to a structure of type *rte_dmadev_info* to be filled with the
+ *   contextual information of the device.
+ *
+ * @return
+ *   - =0: Success, driver updates the contextual information of the DMA device
+ *   - <0: Error code returned by the driver info get function.
+ *
+ */
+__rte_experimental
+int
+rte_dmadev_info_get(uint16_t dev_id, struct rte_dmadev_info *dev_info);
+
+/**
+ * A structure used to configure a DMA device.
+ */
+struct rte_dmadev_conf {
+	/** Maximum number of virtual DMA channels to use.
+	 * This value cannot be greater than the 'max_vchans' field of struct
+	 * rte_dmadev_info obtained from rte_dmadev_info_get().
+	 */
+	uint16_t max_vchans;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Configure a DMA device.
+ *
+ * This function must be invoked before any other function in the API.
+ * It can also be re-invoked when a device is in the stopped state.
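+ *
+ * A minimal configuration sketch (assuming device 0 has been probed; error
+ * handling omitted):
+ * @code
+ *    struct rte_dmadev_info info;
+ *    struct rte_dmadev_conf conf;
+ *
+ *    rte_dmadev_info_get(0, &info);
+ *    conf.max_vchans = 1; // must not exceed info.max_vchans
+ *    rte_dmadev_configure(0, &conf);
+ * @endcode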
+ *
+ * @param dev_id
+ *   The identifier of the device to configure.
+ * @param dev_conf
+ *   The DMA device configuration structure encapsulated into rte_dmadev_conf
+ *   object.
+ *
+ * @return
+ *   - =0: Success, device configured.
+ *   - <0: Error code returned by the driver configuration function.
+ */
+__rte_experimental
+int
+rte_dmadev_configure(uint16_t dev_id, const struct rte_dmadev_conf *dev_conf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Start a DMA device.
+ *
+ * The device start step is the last one and consists of setting the DMA
+ * device to start accepting jobs.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   - =0: Success, device started.
+ *   - <0: Error code returned by the driver start function.
+ */
+__rte_experimental
+int
+rte_dmadev_start(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Stop a DMA device.
+ *
+ * The device can be restarted with a call to rte_dmadev_start()
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   - =0: Success, device stopped.
+ *   - <0: Error code returned by the driver stop function.
+ */
+__rte_experimental
+int
+rte_dmadev_stop(uint16_t dev_id);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Close a DMA device.
+ *
+ * The device cannot be restarted after this call.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *  - =0: Successfully close device
+ *  - <0: Failure to close device
+ */
+__rte_experimental
+int
+rte_dmadev_close(uint16_t dev_id);
+
+/**
+ * DMA transfer direction defines.
+ */
+#define RTE_DMA_MEM_TO_MEM	(1ull << 0)
+/**< DMA transfer direction - from memory to memory.
+ *
+ * @see struct rte_dmadev_vchan_conf::direction
+ */
+#define RTE_DMA_MEM_TO_DEV	(1ull << 1)
+/**< DMA transfer direction - from memory to device.
+ * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC
+ * through the PCIE interface. In this case, the ARM SoC works in EP
+ * (endpoint) mode and can initiate a DMA move request from memory (the ARM
+ * memory) to device (the x86 host memory).
+ *
+ * @see struct rte_dmadev_vchan_conf::direction
+ */
+#define RTE_DMA_DEV_TO_MEM	(1ull << 2)
+/**< DMA transfer direction - from device to memory.
+ * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC
+ * through the PCIE interface. In this case, the ARM SoC works in EP
+ * (endpoint) mode and can initiate a DMA move request from device (the x86
+ * host memory) to memory (the ARM memory).
+ *
+ * @see struct rte_dmadev_vchan_conf::direction
+ */
+#define RTE_DMA_DEV_TO_DEV	(1ull << 3)
+/**< DMA transfer direction - from device to device.
+ * In a typical scenario, an ARM SoC is installed on an x86 server as an iNIC
+ * through the PCIE interface. In this case, the ARM SoC works in EP
+ * (endpoint) mode and can initiate a DMA move request from device (one x86
+ * host memory) to device (another x86 host memory).
+ *
+ * @see struct rte_dmadev_vchan_conf::direction
+ */
+#define RTE_DMA_TRANSFER_DIR_ALL	(RTE_DMA_MEM_TO_MEM | \
+					 RTE_DMA_MEM_TO_DEV | \
+					 RTE_DMA_DEV_TO_MEM | \
+					 RTE_DMA_DEV_TO_DEV)
+
+/**
+ * enum rte_dmadev_port_type - DMA access port type defines
+ */
+enum rte_dmadev_port_type {
+	/** The device port type is PCIE. */
+	RTE_DMADEV_PORT_OF_PCIE = 1,
+};
+
+/**
+ * A structure used to describe DMA port parameters.
+ */
+struct rte_dmadev_port_parameters {
+	enum rte_dmadev_port_type port_type;
+	union {
+		/** For PCIE port
+		 *
+		 * The following diagram shows how an SoC's PCIE module
+		 * connects to multiple PCIE hosts and multiple endpoints.
+		 * The PCIE module has an integrated DMA controller.
+		 * If the DMA controller wants to access the memory of host A,
+		 * the access can be initiated by PF-1 of PCIE Core0, or by
+		 * VF-0 of PF-0 of PCIE Core0.
+		 *
+		 * System Bus
+		 *    |     ----------PCIE module----------
+		 *    |     Bus
+		 *    |     Interface
+		 *    |     -----        ------------------
+		 *    |     |   |        | PCIE Core0     |
+		 *    |     |   |        |                |        -----------
+		 *    |     |   |        |   PF-0 -- VF-0 |        | Host A  |
+		 *    |     |   |--------|        |- VF-1 |--------| Root    |
+		 *    |     |   |        |   PF-1         |        | Complex |
+		 *    |     |   |        |   PF-2         |        -----------
+		 *    |     |   |        ------------------
+		 *    |     |   |
+		 *    |     |   |        ------------------
+		 *    |     |   |        | PCIE Core1     |
+		 *    |     |   |        |                |        -----------
+		 *    |     |   |        |   PF-0 -- VF-0 |        | Host B  |
+		 *    |-----|   |--------|   PF-1 -- VF-0 |--------| Root    |
+		 *    |     |   |        |        |- VF-1 |        | Complex |
+		 *    |     |   |        |   PF-2         |        -----------
+		 *    |     |   |        ------------------
+		 *    |     |   |
+		 *    |     |   |        ------------------
+		 *    |     |DMA|        |                |        ------
+		 *    |     |   |        |                |--------| EP |
+		 *    |     |   |--------| PCIE Core2     |        ------
+		 *    |     |   |        |                |        ------
+		 *    |     |   |        |                |--------| EP |
+		 *    |     |   |        |                |        ------
+		 *    |     -----        ------------------
+		 *
+		 * The following structure is used to describe the above access
+		 * port.
+		 */
+		struct {
+			uint64_t coreid : 3; /**< PCIE core id used */
+			uint64_t pfid : 6; /**< PF id used */
+			uint64_t vfen : 1; /**< VF enable bit */
+			uint64_t vfid : 8; /**< VF id used */
+			/** The pasid field in TLP packet */
+			uint64_t pasid : 20;
+			/** The attributes field in TLP packet */
+			uint64_t attr : 3;
+			/** The processing hint field in TLP packet */
+			uint64_t ph : 2;
+			/** The steering tag field in TLP packet */
+			uint64_t st : 16;
+		} pcie;
+	};
+	uint64_t reserved[2]; /**< Reserved for future fields */
+};
+
+/**
+ * A structure used to configure a virtual DMA channel.
+ */
+struct rte_dmadev_vchan_conf {
+	/** Set of supported transfer directions
+	 * @see RTE_DMA_MEM_TO_MEM
+	 * @see RTE_DMA_MEM_TO_DEV
+	 * @see RTE_DMA_DEV_TO_MEM
+	 * @see RTE_DMA_DEV_TO_DEV
+	 */
+	uint8_t direction;
+	/** Number of descriptor for the virtual DMA channel */
+	uint16_t nb_desc;
+	/** 1) Used to describe the port parameter in the device-to-memory
+	 * transfer scenario.
+	 * 2) Used to describe the source port parameter in the
+	 * device-to-device transfer scenario.
+	 * @see struct rte_dmadev_port_parameters
+	 */
+	struct rte_dmadev_port_parameters src_port;
+	/** 1) Used to describe the port parameter in the memory-to-device
+	 * transfer scenario.
+	 * 2) Used to describe the destination port parameter in the
+	 * device-to-device transfer scenario.
+	 * @see struct rte_dmadev_port_parameters
+	 */
+	struct rte_dmadev_port_parameters dst_port;
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Allocate and set up a virtual DMA channel.
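+ *
+ * A minimal sketch for a memory-to-memory channel (the nb_desc value is
+ * illustrative; it must lie within [min_desc, max_desc] reported by
+ * rte_dmadev_info_get()):
+ * @code
+ *    struct rte_dmadev_vchan_conf conf = {
+ *        .direction = RTE_DMA_MEM_TO_MEM,
+ *        .nb_desc = 128,
+ *    };
+ *    int vchan = rte_dmadev_vchan_setup(dev_id, &conf);
+ * @endcode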
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param conf
+ *   The virtual DMA channel configuration structure encapsulated into
+ *   rte_dmadev_vchan_conf object.
+ *
+ * @return
+ *   - >=0: Allocation succeeded; the value is the virtual DMA channel id,
+ *          which is less than the 'max_vchans' field of struct rte_dmadev_conf
+ *          configured by rte_dmadev_configure().
+ *   - <0: Error code returned by the driver virtual channel setup function.
+ */
+__rte_experimental
+int
+rte_dmadev_vchan_setup(uint16_t dev_id,
+		       const struct rte_dmadev_vchan_conf *conf);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Release a virtual DMA channel.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of the virtual DMA channel returned by
+ *   rte_dmadev_vchan_setup().
+ *
+ * @return
+ *   - =0: Successfully release the virtual DMA channel.
+ *   - <0: Error code returned by the driver virtual channel release function.
+ */
+__rte_experimental
+int
+rte_dmadev_vchan_release(uint16_t dev_id, uint16_t vchan);
+
+/**
+ * rte_dmadev_stats - running statistics.
+ */
+struct rte_dmadev_stats {
+	/** Count of operations which were successfully enqueued */
+	uint64_t enqueued_count;
+	/** Count of operations which were submitted to hardware */
+	uint64_t submitted_count;
+	/** Count of operations which failed to complete */
+	uint64_t completed_fail_count;
+	/** Count of operations which completed successfully */
+	uint64_t completed_count;
+};
+
+#define RTE_DMADEV_ALL_VCHAN	0xFFFFu
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Retrieve basic statistics of one or all virtual DMA channels.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of the virtual DMA channel.
+ *   Pass RTE_DMADEV_ALL_VCHAN to select all channels.
+ * @param[out] stats
+ *   The basic statistics structure encapsulated into rte_dmadev_stats
+ *   object.
+ *
+ * @return
+ *   - =0: Successfully retrieve stats.
+ *   - <0: Failure to retrieve stats.
+ */
+__rte_experimental
+int
+rte_dmadev_stats_get(uint16_t dev_id, uint16_t vchan,
+		     struct rte_dmadev_stats *stats);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Reset basic statistics of one or all virtual DMA channels.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of the virtual DMA channel.
+ *   Pass RTE_DMADEV_ALL_VCHAN to select all channels.
+ *
+ * @return
+ *   - =0: Successfully reset stats.
+ *   - <0: Failure to reset stats.
+ */
+__rte_experimental
+int
+rte_dmadev_stats_reset(uint16_t dev_id, uint16_t vchan);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Dump DMA device info.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param f
+ *   The file to write the output to.
+ *
+ * @return
+ *   0 on success. Non-zero otherwise.
+ */
+__rte_experimental
+int
+rte_dmadev_dump(uint16_t dev_id, FILE *f);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Trigger the dmadev self test.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ *
+ * @return
+ *   - 0: Selftest successful.
+ *   - -ENOTSUP if the device doesn't support selftest
+ *   - other values < 0 on failure.
+ */
+__rte_experimental
+int
+rte_dmadev_selftest(uint16_t dev_id);
+
+/**
+ * rte_dma_sge - holds one entry of a scatter-gather DMA operation request
+ */
+struct rte_dma_sge {
+	rte_iova_t addr; /**< The DMA address of this entry */
+	uint32_t length; /**< The data length of this entry */
+};
+
+/**
+ * rte_dma_sg - holds a scatter-gather DMA operation request
+ */
+struct rte_dma_sg {
+	struct rte_dma_sge *src; /**< The source entry array */
+	struct rte_dma_sge *dst; /**< The destination entry array */
+	uint16_t nb_src; /**< The number of src entries */
+	uint16_t nb_dst; /**< The number of dst entries */
+};
+
+#include "rte_dmadev_core.h"
+
+/**
+ *  DMA flags to augment operation preparation.
+ *  Used as the 'flags' parameter of rte_dmadev_copy/copy_sg/fill().
+ */
+#define RTE_DMA_OP_FLAG_FENCE	(1ull << 0)
+/**< DMA fence flag
+ * It means the operation with this flag must be processed only after all
+ * previous operations are completed.
+ *
+ * @see rte_dmadev_copy()
+ * @see rte_dmadev_copy_sg()
+ * @see rte_dmadev_fill()
+ */
+#define RTE_DMA_OP_FLAG_SUBMIT	(1ull << 1)
+/**< DMA submit flag
+ * It means the doorbell must be issued to hardware right after this
+ * operation is enqueued, without a separate rte_dmadev_submit() call.
+ */
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a copy operation onto the virtual DMA channel.
+ *
+ * This queues up a copy operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
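+ *
+ * A short sketch (assuming a started device and a configured virtual
+ * channel; the iova variables are illustrative):
+ * @code
+ *    // enqueue one copy and ring the doorbell in the same call
+ *    int idx = rte_dmadev_copy(dev_id, vchan, src_iova, dst_iova, length,
+ *                              RTE_DMA_OP_FLAG_SUBMIT);
+ * @endcode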
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param src
+ *   The address of the source buffer.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the data to be copied.
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued copy job.
+ *   - <0: Error code returned by the driver copy function.
+ */
+__rte_experimental
+static inline int
+rte_dmadev_copy(uint16_t dev_id, uint16_t vchan, rte_iova_t src, rte_iova_t dst,
+		uint32_t length, uint64_t flags)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy, -ENOTSUP);
+#endif
+	return (*dev->copy)(dev, vchan, src, dst, length, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a scatter list copy operation onto the virtual DMA channel.
+ *
+ * This queues up a scatter list copy operation to be performed by hardware,
+ * but does not trigger hardware to begin that operation.
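+ *
+ * A minimal sketch of building a one-entry scatter list (the iova and
+ * length variables are illustrative):
+ * @code
+ *    struct rte_dma_sge src = { .addr = src_iova, .length = length };
+ *    struct rte_dma_sge dst = { .addr = dst_iova, .length = length };
+ *    struct rte_dma_sg sg = {
+ *        .src = &src, .dst = &dst, .nb_src = 1, .nb_dst = 1,
+ *    };
+ *    int idx = rte_dmadev_copy_sg(dev_id, vchan, &sg,
+ *                                 RTE_DMA_OP_FLAG_SUBMIT);
+ * @endcode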
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param sg
+ *   A pointer to the scatter-gather list.
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued copy scatterlist job.
+ *   - <0: Error code returned by the driver copy scatterlist function.
+ */
+__rte_experimental
+static inline int
+rte_dmadev_copy_sg(uint16_t dev_id, uint16_t vchan, const struct rte_dma_sg *sg,
+		   uint64_t flags)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans ||
+	    sg == NULL)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->copy_sg, -ENOTSUP);
+#endif
+	return (*dev->copy_sg)(dev, vchan, sg, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a fill operation onto the virtual DMA channel.
+ *
+ * This queues up a fill operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
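+ *
+ * A short sketch zero-filling a buffer (dst_iova and length are
+ * illustrative):
+ * @code
+ *    int idx = rte_dmadev_fill(dev_id, vchan, 0, dst_iova, length,
+ *                              RTE_DMA_OP_FLAG_SUBMIT);
+ * @endcode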
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param flags
+ *   Flags for this operation.
+ *   @see RTE_DMA_OP_FLAG_*
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued fill job.
+ *   - <0: Error code returned by the driver fill function.
+ */
+__rte_experimental
+static inline int
+rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
+		rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
+#endif
+	return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Trigger hardware to begin performing enqueued operations.
+ *
+ * This API is used to write the "doorbell" to the hardware to trigger it
+ * to begin the operations previously enqueued by
+ * rte_dmadev_copy/copy_sg/fill().
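+ *
+ * A short sketch batching several enqueues behind a single doorbell (the
+ * loop bounds and arrays are illustrative):
+ * @code
+ *    uint16_t i;
+ *
+ *    for (i = 0; i < nb_jobs; i++)
+ *        rte_dmadev_copy(dev_id, vchan, src[i], dst[i], length[i], 0);
+ *    rte_dmadev_submit(dev_id, vchan);
+ * @endcode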
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ *
+ * @return
+ *   - =0: Successfully trigger hardware.
+ *   - <0: Failure to trigger hardware.
+ */
+__rte_experimental
+static inline int
+rte_dmadev_submit(uint16_t dev_id, uint16_t vchan)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans)
+		return -EINVAL;
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->submit, -ENOTSUP);
+#endif
+	return (*dev->submit)(dev, vchan);
+}
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of operations that have been successfully completed.
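+ *
+ * A minimal polling sketch (the batch size of 32 and the error handler are
+ * illustrative):
+ * @code
+ *    uint16_t last_idx, n;
+ *    bool has_error;
+ *
+ *    n = rte_dmadev_completed(dev_id, vchan, 32, &last_idx, &has_error);
+ *    if (has_error)
+ *        app_handle_errors(dev_id, vchan); // e.g. via completed_fails
+ * @endcode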
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   The maximum number of completed operations that can be processed.
+ * @param[out] last_idx
+ *   The last completed operation's index.
+ *   If not required, NULL can be passed in.
+ * @param[out] has_error
+ *   Indicates if any transfer error occurred.
+ *   If not required, NULL can be passed in.
+ *
+ * @return
+ *   The number of operations that successfully completed.
+ */
+__rte_experimental
+static inline uint16_t
+rte_dmadev_completed(uint16_t dev_id, uint16_t vchan, const uint16_t nb_cpls,
+		     uint16_t *last_idx, bool *has_error)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+	uint16_t idx;
+	bool err;
+
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans ||
+	    nb_cpls == 0)
+		return 0; /* uint16_t return cannot carry an error code */
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed, -ENOTSUP);
+#endif
+
+	/* Ensure the pointer values are non-null to simplify drivers.
+	 * In most cases these should be compile time evaluated, since this is
+	 * an inline function.
+	 * - If NULL is explicitly passed as parameter, then compiler knows the
+	 *   value is NULL
+	 * - If address of local variable is passed as parameter, then compiler
+	 *   can know it's non-NULL.
+	 */
+	if (last_idx == NULL)
+		last_idx = &idx;
+	if (has_error == NULL)
+		has_error = &err;
+
+	*has_error = false;
+	return (*dev->completed)(dev, vchan, nb_cpls, last_idx, has_error);
+}
+
+/**
+ * DMA transfer status code defines
+ */
+enum rte_dma_status_code {
+	/** The operation completed successfully */
+	RTE_DMA_STATUS_SUCCESSFUL = 0,
+	/** The operation failed to complete due to an active drop.
+	 * This is mainly used when processing dev_stop, allowing outstanding
+	 * requests to be completed as much as possible.
+	 */
+	RTE_DMA_STATUS_ACTIVE_DROP,
+	/** The operation failed to complete due to an invalid source address */
+	RTE_DMA_STATUS_INVALID_SRC_ADDR,
+	/** The operation failed to complete due to an invalid destination
+	 * address.
+	 */
+	RTE_DMA_STATUS_INVALID_DST_ADDR,
+	/** The operation failed to complete due to an invalid length */
+	RTE_DMA_STATUS_INVALID_LENGTH,
+	/** The operation failed to complete due to an invalid opcode.
+	 * The DMA descriptor could have multiple formats, which are
+	 * distinguished by the opcode field.
+	 */
+	RTE_DMA_STATUS_INVALID_OPCODE,
+	/** The operation failed to complete due to a bus error */
+	RTE_DMA_STATUS_BUS_ERROR,
+	/** The operation failed to complete due to data poison */
+	RTE_DMA_STATUS_DATA_POISON,
+	/** The operation failed to complete due to a descriptor read error */
+	RTE_DMA_STATUS_DESCRIPTOR_READ_ERROR,
+	/** The operation failed to complete due to a device link error.
+	 * Used to indicate a link error in the mem-to-dev/dev-to-mem/
+	 * dev-to-dev transfer scenarios.
+	 */
+	RTE_DMA_STATUS_DEV_LINK_ERROR,
+	/** The operation failed to complete due to an unknown reason */
+	RTE_DMA_STATUS_UNKNOWN,
+	/** Driver-specific status code offset.
+	 * Start status code for the driver to define its own error code.
+	 */
+	RTE_DMA_STATUS_DRV_SPECIFIC_OFFSET = 0x10000,
+};
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of operations that failed to complete.
+ * NOTE: This API should be used when rte_dmadev_completed() sets has_error.
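+ *
+ * A short sketch of draining error status after has_error was set (the
+ * array size is illustrative):
+ * @code
+ *    uint32_t status[32];
+ *    uint16_t last_idx;
+ *    uint16_t n = rte_dmadev_completed_fails(dev_id, vchan, 32, status,
+ *                                            &last_idx);
+ * @endcode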
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_status
+ *   Indicates the size of status array.
+ * @param[out] status
+ *   The error code of operations that failed to complete.
+ *   Some standard error code are described in 'enum rte_dma_status_code'
+ *   @see rte_dma_status_code
+ * @param[out] last_idx
+ *   The index of the last failed operation.
+ *
+ * @return
+ *   The number of operations that failed to complete.
+ */
+__rte_experimental
+static inline uint16_t
+rte_dmadev_completed_fails(uint16_t dev_id, uint16_t vchan,
+			   const uint16_t nb_status, uint32_t *status,
+			   uint16_t *last_idx)
+{
+	struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+	if (!rte_dmadev_is_valid_dev(dev_id) ||
+	    vchan >= dev->data->dev_conf.max_vchans ||
+	    nb_status == 0 ||
+	    status == NULL ||
+	    last_idx == NULL)
+		return 0; /* uint16_t return cannot carry an error code */
+	RTE_FUNC_PTR_OR_ERR_RET(*dev->completed_fails, -ENOTSUP);
+#endif
+	return (*dev->completed_fails)(dev, vchan, nb_status, status, last_idx);
+}
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DMADEV_H_ */
diff --git a/lib/dmadev/rte_dmadev_core.h b/lib/dmadev/rte_dmadev_core.h
new file mode 100644
index 0000000..b0b6494
--- /dev/null
+++ b/lib/dmadev/rte_dmadev_core.h
@@ -0,0 +1,161 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited.
+ * Copyright(c) 2021 Intel Corporation.
+ */
+
+#ifndef _RTE_DMADEV_CORE_H_
+#define _RTE_DMADEV_CORE_H_
+
+/**
+ * @file
+ *
+ * RTE DMA Device internal header.
+ *
+ * This header contains internal data types that are used by DMA devices
+ * to expose their ops to the class.
+ *
+ * Applications should not use these APIs directly.
+ *
+ */
+
+struct rte_dmadev;
+
+/** @internal Used to get device information of a device. */
+typedef int (*dmadev_info_get_t)(const struct rte_dmadev *dev,
+				 struct rte_dmadev_info *dev_info,
+				 uint32_t info_sz);
+
+/** @internal Used to configure a device. */
+typedef int (*dmadev_configure_t)(struct rte_dmadev *dev,
+				  const struct rte_dmadev_conf *dev_conf);
+
+/** @internal Used to start a configured device. */
+typedef int (*dmadev_start_t)(struct rte_dmadev *dev);
+
+/** @internal Used to stop a configured device. */
+typedef int (*dmadev_stop_t)(struct rte_dmadev *dev);
+
+/** @internal Used to close a configured device. */
+typedef int (*dmadev_close_t)(struct rte_dmadev *dev);
+
+/** @internal Used to allocate and set up a virtual DMA channel. */
+typedef int (*dmadev_vchan_setup_t)(struct rte_dmadev *dev,
+				    const struct rte_dmadev_vchan_conf *conf);
+
+/** @internal Used to release a virtual DMA channel. */
+typedef int (*dmadev_vchan_release_t)(struct rte_dmadev *dev, uint16_t vchan);
+
+/** @internal Used to retrieve basic statistics. */
+typedef int (*dmadev_stats_get_t)(const struct rte_dmadev *dev, uint16_t vchan,
+				  struct rte_dmadev_stats *stats,
+				  uint32_t stats_sz);
+
+/** @internal Used to reset basic statistics. */
+typedef int (*dmadev_stats_reset_t)(struct rte_dmadev *dev, uint16_t vchan);
+
+/** @internal Used to dump internal information. */
+typedef int (*dmadev_dump_t)(const struct rte_dmadev *dev, FILE *f);
+
+/** @internal Used to start dmadev selftest. */
+typedef int (*dmadev_selftest_t)(uint16_t dev_id);
+
+/** @internal Used to enqueue a copy operation. */
+typedef int (*dmadev_copy_t)(struct rte_dmadev *dev, uint16_t vchan,
+			     rte_iova_t src, rte_iova_t dst,
+			     uint32_t length, uint64_t flags);
+
+/** @internal Used to enqueue a scatter list copy operation. */
+typedef int (*dmadev_copy_sg_t)(struct rte_dmadev *dev, uint16_t vchan,
+				const struct rte_dma_sg *sg, uint64_t flags);
+
+/** @internal Used to enqueue a fill operation. */
+typedef int (*dmadev_fill_t)(struct rte_dmadev *dev, uint16_t vchan,
+			     uint64_t pattern, rte_iova_t dst,
+			     uint32_t length, uint64_t flags);
+
+/** @internal Used to trigger hardware to begin working. */
+typedef int (*dmadev_submit_t)(struct rte_dmadev *dev, uint16_t vchan);
+
+/** @internal Used to return number of successfully completed operations. */
+typedef uint16_t (*dmadev_completed_t)(struct rte_dmadev *dev, uint16_t vchan,
+				       const uint16_t nb_cpls,
+				       uint16_t *last_idx, bool *has_error);
+
+/** @internal Used to return number of operations that failed to complete. */
+typedef uint16_t (*dmadev_completed_fails_t)(struct rte_dmadev *dev,
+			uint16_t vchan, const uint16_t nb_status,
+			uint32_t *status, uint16_t *last_idx);
+
+/**
+ * Possible states of a DMA device.
+ */
+enum rte_dmadev_state {
+	/** Device is unused before being probed. */
+	RTE_DMADEV_UNUSED = 0,
+	/** Device is attached when allocated in probing. */
+	RTE_DMADEV_ATTACHED,
+};
+
+/**
+ * DMA device operations function pointer table
+ */
+struct rte_dmadev_ops {
+	dmadev_info_get_t dev_info_get;
+	dmadev_configure_t dev_configure;
+	dmadev_start_t dev_start;
+	dmadev_stop_t dev_stop;
+	dmadev_close_t dev_close;
+	dmadev_vchan_setup_t vchan_setup;
+	dmadev_vchan_release_t vchan_release;
+	dmadev_stats_get_t stats_get;
+	dmadev_stats_reset_t stats_reset;
+	dmadev_dump_t dev_dump;
+	dmadev_selftest_t dev_selftest;
+};
+
+/**
+ * @internal
+ * The data part, with no function pointers, associated with each DMA device.
+ *
+ * This structure is safe to place in shared memory to be common among different
+ * processes in a multi-process configuration.
+ */
+struct rte_dmadev_data {
+	void *dev_private; /**< PMD-specific private data. */
+	uint16_t dev_id; /**< Device [external] identifier. */
+	char dev_name[RTE_DMADEV_NAME_MAX_LEN]; /**< Unique identifier name */
+	struct rte_dmadev_conf dev_conf; /**< DMA device configuration. */
+	uint8_t dev_started : 1; /**< Device state: STARTED(1)/STOPPED(0). */
+	uint64_t reserved[2]; /**< Reserved for future fields */
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ * The generic data structure associated with each DMA device.
+ *
+ * The dataplane APIs are located at the beginning of the structure, along
+ * with the pointer to where all the data elements for the particular device
+ * are stored in shared memory. This split scheme allows the function pointer
+ * and driver data to be per-process, while the actual configuration data for
+ * the device is shared.
+ */
+struct rte_dmadev {
+	dmadev_copy_t copy;
+	dmadev_copy_sg_t copy_sg;
+	dmadev_fill_t fill;
+	dmadev_submit_t submit;
+	dmadev_completed_t completed;
+	dmadev_completed_fails_t completed_fails;
+	void *reserved_ptr; /**< Reserved for future IO function */
+	struct rte_dmadev_data *data; /**< Pointer to device data. */
+
+	const struct rte_dmadev_ops *dev_ops; /**< Functions exported by PMD. */
+	/** Device info which supplied during device initialization. */
+	struct rte_device *device;
+	enum rte_dmadev_state state; /**< Flag indicating the device state */
+	uint64_t reserved[2]; /**< Reserved for future fields */
+} __rte_cache_aligned;
+
+extern struct rte_dmadev rte_dmadevices[];
+
+#endif /* _RTE_DMADEV_CORE_H_ */
diff --git a/lib/dmadev/rte_dmadev_pmd.h b/lib/dmadev/rte_dmadev_pmd.h
new file mode 100644
index 0000000..45141f9
--- /dev/null
+++ b/lib/dmadev/rte_dmadev_pmd.h
@@ -0,0 +1,72 @@ 
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2021 HiSilicon Limited.
+ */
+
+#ifndef _RTE_DMADEV_PMD_H_
+#define _RTE_DMADEV_PMD_H_
+
+/**
+ * @file
+ *
+ * RTE DMA Device PMD APIs
+ *
+ * Driver facing APIs for a DMA device. These are not to be called directly by
+ * any application.
+ */
+
+#include "rte_dmadev.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ * @internal
+ * Allocates a new dmadev slot for a DMA device and returns the pointer
+ * to that slot for the driver to use.
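+ *
+ * A minimal sketch of use from a PMD probe routine (my_dmadev_ops and the
+ * device name are illustrative; error handling omitted):
+ * @code
+ *    struct rte_dmadev *dev = rte_dmadev_pmd_allocate("my_dma0");
+ *
+ *    dev->dev_ops = &my_dmadev_ops;
+ *    dev->device = rte_dev; // the underlying struct rte_device
+ * @endcode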
+ *
+ * @param name
+ *   DMA device name.
+ *
+ * @return
+ *   A pointer to the DMA device slot in case of success,
+ *   NULL otherwise.
+ */
+__rte_internal
+struct rte_dmadev *
+rte_dmadev_pmd_allocate(const char *name);
+
+/**
+ * @internal
+ * Release the specified dmadev.
+ *
+ * @param dev
+ *   Device to be released.
+ *
+ * @return
+ *   - 0 on success, negative on error
+ */
+__rte_internal
+int
+rte_dmadev_pmd_release(struct rte_dmadev *dev);
+
+/**
+ * @internal
+ * Return the DMA device based on the device name.
+ *
+ * @param name
+ *   DMA device name.
+ *
+ * @return
+ *   A pointer to the DMA device slot in case of success,
+ *   NULL otherwise.
+ */
+__rte_internal
+struct rte_dmadev *
+rte_dmadev_get_device_by_name(const char *name);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_DMADEV_PMD_H_ */
diff --git a/lib/dmadev/version.map b/lib/dmadev/version.map
new file mode 100644
index 0000000..2af78e4
--- /dev/null
+++ b/lib/dmadev/version.map
@@ -0,0 +1,37 @@ 
+EXPERIMENTAL {
+	global:
+
+	rte_dmadev_close;
+	rte_dmadev_completed;
+	rte_dmadev_completed_fails;
+	rte_dmadev_configure;
+	rte_dmadev_copy;
+	rte_dmadev_copy_sg;
+	rte_dmadev_count;
+	rte_dmadev_dump;
+	rte_dmadev_fill;
+	rte_dmadev_info_get;
+	rte_dmadev_is_valid_dev;
+	rte_dmadev_selftest;
+	rte_dmadev_start;
+	rte_dmadev_stats_get;
+	rte_dmadev_stats_reset;
+	rte_dmadev_stop;
+	rte_dmadev_submit;
+	rte_dmadev_vchan_release;
+	rte_dmadev_vchan_setup;
+
+	local: *;
+};
+
+INTERNAL {
	global:
+
+	rte_dmadevices;
+	rte_dmadev_get_device_by_name;
+	rte_dmadev_pmd_allocate;
+	rte_dmadev_pmd_release;
+
+	local: *;
+};
+
diff --git a/lib/meson.build b/lib/meson.build
index 1673ca4..68d239f 100644
--- a/lib/meson.build
+++ b/lib/meson.build
@@ -60,6 +60,7 @@  libraries = [
         'bpf',
         'graph',
         'node',
+        'dmadev',
 ]
 
 if is_windows