[EXT] [PATCH v5] app/dma-perf: introduce dma-perf application

Jiang, Cheng1 cheng1.jiang at intel.com
Mon Jun 12 09:40:17 CEST 2023
Previous message (by thread): [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
Next message (by thread): [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

Thanks for your comments, the replies are inline.

Thanks,
Cheng

> -----Original Message-----
> From: Anoob Joseph <anoobj at marvell.com>
> Sent: Friday, June 9, 2023 7:44 PM
> To: Jiang, Cheng1 <cheng1.jiang at intel.com>; thomas at monjalon.net;
> Richardson, Bruce <bruce.richardson at intel.com>;
> mb at smartsharesystems.com; Xia, Chenbo <chenbo.xia at intel.com>
> Cc: dev at dpdk.org; Hu, Jiayu <jiayu.hu at intel.com>; Ding, Xuan
> <xuan.ding at intel.com>; Ma, WenwuX <wenwux.ma at intel.com>; Wang,
> YuanX <yuanx.wang at intel.com>; He, Xingguang <xingguang.he at intel.com>;
> Jerin Jacob Kollanukkaran <jerinj at marvell.com>; Vamsi Krishna Attunuru
> <vattunuru at marvell.com>; Amit Prakash Shukla
> <amitprakashs at marvell.com>; Satha Koteswara Rao Kottidi
> <skoteshwar at marvell.com>; Gowrishankar Muthukrishnan
> <gmuthukrishn at marvell.com>; Vidya Sagar Velumuri
> <vvelumuri at marvell.com>
> Subject: RE: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
> 
> Hi,
> 
> Thanks for adding the app. Few comments inline. Please check.
> 
> Thanks,
> Anoob
> 
> > -----Original Message-----
> > From: Cheng Jiang <cheng1.jiang at intel.com>
> > Sent: Thursday, June 8, 2023 2:14 PM
> > To: thomas at monjalon.net; bruce.richardson at intel.com;
> > mb at smartsharesystems.com; chenbo.xia at intel.com
> > Cc: dev at dpdk.org; jiayu.hu at intel.com; xuan.ding at intel.com;
> > wenwux.ma at intel.com; yuanx.wang at intel.com; xingguang.he at intel.com;
> > Cheng Jiang <cheng1.jiang at intel.com>
> > Subject: [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > There are many high-performance DMA devices supported in DPDK now,
> and
> > these DMA devices can also be integrated into other modules of DPDK as
> > accelerators, such as Vhost. Before integrating DMA into applications,
> > developers need to know the performance of these DMA devices in
> > various scenarios and the performance of CPUs in the same scenario,
> > such as different buffer lengths. Only in this way can we know the
> > target performance of the application accelerated by using them. This
> > patch introduces a high-performance testing tool, which supports
> > comparing the performance of CPU and DMA in different scenarios
> > automatically with a pre- set config file. Memory Copy performance test
> are supported for now.
> >
> > Signed-off-by: Cheng Jiang <cheng1.jiang at intel.com>
> > Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> > Signed-off-by: Yuan Wang <yuanx.wang at intel.com>
> > Acked-by: Morten Brørup <mb at smartsharesystems.com>
> > Acked-by: Chenbo Xia <chenbo.xia at intel.com>
> > ---
> > v5:
> >   fixed some LONG_LINE warnings;
> > v4:
> >   fixed inaccuracy of the memory footprint display;
> > v3:
> >   fixed some typos;
> > v2:
> >   added lcore/dmadev designation;
> >   added error case process;
> >   removed worker_threads parameter from config.ini;
> >   improved the logs;
> >   improved config file;
> >
> >  app/meson.build               |   1 +
> >  app/test-dma-perf/benchmark.c | 472 ++++++++++++++++++++++++++++
> > app/test-dma-perf/config.ini  |  59 ++++
> >  app/test-dma-perf/main.c      | 569
> > ++++++++++++++++++++++++++++++++++
> >  app/test-dma-perf/main.h      |  69 +++++
> >  app/test-dma-perf/meson.build |  17 +
> >  6 files changed, 1187 insertions(+)
> >  create mode 100644 app/test-dma-perf/benchmark.c  create mode
> 100644
> > app/test-dma-perf/config.ini  create mode 100644 app/test-dma-
> > perf/main.c  create mode 100644 app/test-dma-perf/main.h  create mode
> > 100644 app/test-dma-perf/meson.build
> >
> 
> <snip>
> 
> > +
> > +/* Configuration of device. */
> > +static void
> > +configure_dmadev_queue(uint32_t dev_id, uint32_t ring_size) {
> > +	uint16_t vchan = 0;
> > +	struct rte_dma_info info;
> > +	struct rte_dma_conf dev_config = { .nb_vchans = 1 };
> 
> [Anoob] Is it possible to use more vchans? The code launches as many
> threads as the number of dma devices. Instead it should be total number of
> vchans.

[Cheng] Really good suggestion. This is feasible, but in the initial stage, we want to keep things simple. Perhaps in the future, we can add a parameter to configure the number of vchans for each device and then launch the corresponding number of threads for each vchan.

> 
> > +	struct rte_dma_vchan_conf qconf = {
> > +		.direction = RTE_DMA_DIR_MEM_TO_MEM,
> > +		.nb_desc = ring_size
> > +	};
> > +
> > +	if (rte_dma_configure(dev_id, &dev_config) != 0)
> > +		rte_exit(EXIT_FAILURE, "Error with dma configure.\n");
> > +
> > +	if (rte_dma_vchan_setup(dev_id, vchan, &qconf) != 0)
> > +		rte_exit(EXIT_FAILURE, "Error with queue configuration.\n");
> > +
> > +	rte_dma_info_get(dev_id, &info);
> > +	if (info.nb_vchans != 1)
> > +		rte_exit(EXIT_FAILURE, "Error, no configured queues
> > reported on device id. %u\n",
> > +				dev_id);
> > +
> > +	if (rte_dma_start(dev_id) != 0)
> > +		rte_exit(EXIT_FAILURE, "Error with dma start.\n"); }
> > +
> > +
> 
> <snip>
> 
> > +static inline int
> > +do_dma_mem_copy(void *p)
> > +{
> > +	uint16_t *para_idx = (uint16_t *)p;
> > +	volatile struct lcore_params *para = worker_params[*para_idx];
> > +	volatile struct worker_info *worker_info = &(para->worker_info);
> > +	uint16_t dev_id = para->dev_id;
> > +	uint32_t nr_buf = para->nr_buf;
> > +	uint16_t kick_batch = para->kick_batch;
> 
> [Anoob] Some of these variables can be made const. Since this is fast path,
> might be beneficial doing that way.

[Cheng] Good idea, I'll improve it in the next version.

> 
> > +	uint32_t buf_size = para->buf_size;
> > +	struct rte_mbuf **srcs = para->srcs;
> > +	struct rte_mbuf **dsts = para->dsts;
> > +	int64_t async_cnt = 0;
> > +	int nr_cpl = 0;
> > +	uint32_t i;
> > +	uint32_t poll_cnt = 0;
> > +
> > +	worker_info->stop_flag = false;
> > +	worker_info->ready_flag = true;
> > +
> > +	while (!worker_info->start_flag)
> > +		;
> > +
> > +	while (1) {
> > +		for (i = 0; i < nr_buf; i++) {
> > +			if (unlikely(rte_dma_copy(dev_id,
> > +						0,
> > +						rte_pktmbuf_iova(srcs[i]),
> > +						rte_pktmbuf_iova(dsts[i]),
> > +						buf_size,
> > +						0) < 0)) {
> > +				rte_dma_submit(dev_id, 0);
> > +				while (rte_dma_burst_capacity(dev_id, 0) ==
> > 0) {
> > +					nr_cpl = rte_dma_completed(dev_id,
> > 0, MAX_DMA_CPL_NB,
> > +								NULL, NULL);
> > +					async_cnt -= nr_cpl;
> > +					worker_info->total_cpl += nr_cpl;
> > +				}
> > +				if (rte_dma_copy(dev_id,
> > +						0,
> > +						rte_pktmbuf_iova(srcs[i]),
> > +						rte_pktmbuf_iova(dsts[i]),
> > +						buf_size,
> > +						0) < 0) {
> > +					printf("enqueue fail again at %u\n",
> > i);
> > +					printf("space:%d\n",
> > rte_dma_burst_capacity(dev_id, 0));
> > +					rte_exit(EXIT_FAILURE, "DMA
> > enqueue failed\n");
> > +				}
> 
> [Anoob] Only if the API returns -ENOSPC we should retry submitting, right?
> Other errors should be treated as fatal errors.
> 
> Do we need to use rte_dma_burst_capacity() API?
> 
> Can't we try something like,
> 
> dma_copy:
> 		ret =  rte_dma_copy(dev_id, 0, rte_pktmbuf_iova(srcs[i]),
> rte_pktmbuf_iova(dsts[i]), buf_size, 0);
> 		if (unlikely (ret < 0) {
> 			if (ret == -ENOSPC) {
> 				rte_dma_submit(dev_id, 0);
> 				/* DMA completed & other handling */
> 				goto dma_copy;
> 			} else {
> 				/* Error exit */
> 			}
> 		}
> 
> 

[Cheng] Good idea, we don't have to check the capacity explicitly. I think your implementation is more clear, thanks. I will fix it in the next version.

> > +			}
> > +			async_cnt++;
> > +
> > +			if ((async_cnt % kick_batch) == 0) {
> > +				rte_dma_submit(dev_id, 0);
> > +				/* add a poll to avoid ring full */
> > +				nr_cpl = rte_dma_completed(dev_id, 0,
> > MAX_DMA_CPL_NB, NULL, NULL);
> > +				async_cnt -= nr_cpl;
> > +				worker_info->total_cpl += nr_cpl;
> 
> [Anoob] Above code can be made as a static inline function so that in cases
> rte_dma_copy returns -ENOSPC, same static inline can be called.
> 

[Cheng] sure, got it. Thanks!

> <snip>
Previous message (by thread): [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
Next message (by thread): [EXT] [PATCH v5] app/dma-perf: introduce dma-perf application
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the dev mailing list