[dpdk-dev,v4,10/13] app/eventdev: add pipeline queue worker functions

Message ID 20180112164416.21374-10-pbhagavatula@caviumnetworks.com (mailing list archive)
State Superseded, archived
Delegated to: Jerin Jacob

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation fail apply patch file failure

Commit Message

Pavan Nikhilesh Jan. 12, 2018, 4:44 p.m. UTC
  Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
---
 app/test-eventdev/test_pipeline_common.h |  80 +++++++++
 app/test-eventdev/test_pipeline_queue.c  | 289 ++++++++++++++++++++++++++++++-
 2 files changed, 368 insertions(+), 1 deletion(-)
  

Comments

Van Haaren, Harry Jan. 16, 2018, 12:05 p.m. UTC | #1
> From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> Sent: Friday, January 12, 2018 4:44 PM
> To: jerin.jacob@caviumnetworks.com; santosh.shukla@caviumnetworks.com; Van
> Haaren, Harry <harry.van.haaren@intel.com>; Eads, Gage
> <gage.eads@intel.com>; hemant.agrawal@nxp.com; nipun.gupta@nxp.com; Ma,
> Liang J <liang.j.ma@intel.com>
> Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
> Subject: [dpdk-dev] [PATCH v4 10/13] app/eventdev: add pipeline queue worker
> functions
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>

Some of the code feels like duplication - but there are differences in each implementation as far as I can see. The common initialization parts are "macro-ed" to ensure minimal duplication. In short, I'm not aware of a better solution/implementation.

Please see note on branch-optimization in the code below.

Acked-by: Harry van Haaren <harry.van.haaren@intel.com>


>  static int
>  worker_wrapper(void *arg)
>  {
> -	RTE_SET_USED(arg);
> +	struct worker_data *w  = arg;
> +	struct evt_options *opt = w->t->opt;
> +	const bool burst = evt_has_burst_mode(w->dev_id);
> +	const bool mt_safe = !w->t->mt_unsafe;
> +	const uint8_t nb_stages = opt->nb_stages;
> +	RTE_SET_USED(opt);
> +
> +	/* allow compiler to optimize */
> +	if (nb_stages == 1) {
> +		if (!burst && mt_safe)
> +			return pipeline_queue_worker_single_stage_tx(arg);
> +		else if (!burst && !mt_safe)
> +			return pipeline_queue_worker_single_stage_fwd(arg);
> +		else if (burst && mt_safe)
> +			return pipeline_queue_worker_single_stage_burst_tx(arg);
> +		else if (burst && !mt_safe)
> +			return pipeline_queue_worker_single_stage_burst_fwd(
> +					arg);
> +	} else {
> +		if (!burst && mt_safe)
> +			return pipeline_queue_worker_multi_stage_tx(arg);
> +		else if (!burst && !mt_safe)
> +			return pipeline_queue_worker_multi_stage_fwd(arg);
> +		else if (burst && mt_safe)
> +			return pipeline_queue_worker_multi_stage_burst_tx(arg);
> +		else if (burst && !mt_safe)
> +			return pipeline_queue_worker_multi_stage_burst_fwd(arg);
> +
> +	}


I don't think the compiler will optimize the branches above, as the nb_stages value comes from opt, which is a uint8_t originating from a pointer-dereference...

As far as I know, an optimizing compiler will do "constant propagation" through function calls, but I would be amazed if it found a pointer initialized to a specific value, and optimized the branches away here based on the contents of that pointer deref.

So I see 2 options:
1) accept as is - I'll ack this patch here
2) Rework so that the worker_wrapper() accepts nb_stages, burst and mt_safe as function arguments, allowing constant propagation for optimizing away the branches.

@Pavan, I'll leave that choice up to you!
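
For reference, a rough and untested sketch of what option 2 could look like: dispatch once on the runtime values, then call an always-inline body with literal constant arguments, so each inlined copy compiles with the dead branches folded away. Note queue_worker_body() is an illustrative name, not something in this patch, and nb_stages stays a runtime value here since it has no compile-time-constant call-site:

static __rte_always_inline int
queue_worker_body(void *arg, const bool burst, const bool mt_safe)
{
	struct worker_data *w = arg;
	const uint8_t nb_stages = w->t->opt->nb_stages;

	/* burst and mt_safe are literal constants in each inlined copy,
	 * so the compiler can resolve these branches at build time.
	 */
	if (nb_stages == 1) {
		if (burst)
			return mt_safe ?
				pipeline_queue_worker_single_stage_burst_tx(arg) :
				pipeline_queue_worker_single_stage_burst_fwd(arg);
		return mt_safe ?
			pipeline_queue_worker_single_stage_tx(arg) :
			pipeline_queue_worker_single_stage_fwd(arg);
	}
	if (burst)
		return mt_safe ?
			pipeline_queue_worker_multi_stage_burst_tx(arg) :
			pipeline_queue_worker_multi_stage_burst_fwd(arg);
	return mt_safe ?
		pipeline_queue_worker_multi_stage_tx(arg) :
		pipeline_queue_worker_multi_stage_fwd(arg);
}

static int
worker_wrapper(void *arg)
{
	struct worker_data *w = arg;
	const bool mt_safe = !w->t->mt_unsafe;

	/* the runtime values become literal constants at each call-site */
	if (evt_has_burst_mode(w->dev_id))
		return mt_safe ? queue_worker_body(arg, true, true) :
				 queue_worker_body(arg, true, false);
	return mt_safe ? queue_worker_body(arg, false, true) :
			 queue_worker_body(arg, false, false);
}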
  
Pavan Nikhilesh Jan. 16, 2018, 12:31 p.m. UTC | #2
Hi Harry,

Thanks for the review.

On Tue, Jan 16, 2018 at 12:05:48PM +0000, Van Haaren, Harry wrote:
> > From: Pavan Nikhilesh [mailto:pbhagavatula@caviumnetworks.com]
> > Sent: Friday, January 12, 2018 4:44 PM
> > To: jerin.jacob@caviumnetworks.com; santosh.shukla@caviumnetworks.com; Van
> > Haaren, Harry <harry.van.haaren@intel.com>; Eads, Gage
> > <gage.eads@intel.com>; hemant.agrawal@nxp.com; nipun.gupta@nxp.com; Ma,
> > Liang J <liang.j.ma@intel.com>
> > Cc: dev@dpdk.org; Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
> > Subject: [dpdk-dev] [PATCH v4 10/13] app/eventdev: add pipeline queue worker
> > functions
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@caviumnetworks.com>
>
> Some of the code feels like duplication - but there are differences in each implementation as far as I can see. The common initialization parts are "macro-ed" to ensure minimal duplication. In short, I'm not aware of a better solution/implementation.
>
> Please see note on branch-optimization in the code below.
>
> Acked-by: Harry van Haaren <harry.van.haaren@intel.com>
>
>
> >  static int
> >  worker_wrapper(void *arg)
> >  {
> > -	RTE_SET_USED(arg);
> > +	struct worker_data *w  = arg;
> > +	struct evt_options *opt = w->t->opt;
> > +	const bool burst = evt_has_burst_mode(w->dev_id);
> > +	const bool mt_safe = !w->t->mt_unsafe;
> > +	const uint8_t nb_stages = opt->nb_stages;
> > +	RTE_SET_USED(opt);
> > +
> > +	/* allow compiler to optimize */
> > +	if (nb_stages == 1) {
> > +		if (!burst && mt_safe)
> > +			return pipeline_queue_worker_single_stage_tx(arg);
> > +		else if (!burst && !mt_safe)
> > +			return pipeline_queue_worker_single_stage_fwd(arg);
> > +		else if (burst && mt_safe)
> > +			return pipeline_queue_worker_single_stage_burst_tx(arg);
> > +		else if (burst && !mt_safe)
> > +			return pipeline_queue_worker_single_stage_burst_fwd(
> > +					arg);
> > +	} else {
> > +		if (!burst && mt_safe)
> > +			return pipeline_queue_worker_multi_stage_tx(arg);
> > +		else if (!burst && !mt_safe)
> > +			return pipeline_queue_worker_multi_stage_fwd(arg);
> > +		else if (burst && mt_safe)
> > +			return pipeline_queue_worker_multi_stage_burst_tx(arg);
> > +		else if (burst && !mt_safe)
> > +			return pipeline_queue_worker_multi_stage_burst_fwd(arg);
> > +
> > +	}
>
>
> I don't think the compiler will optimize the branches above, as the nb_stages value comes from opt, which is a uint8_t originating from a pointer-dereference...
>
> As far as I know, an optimizing compiler will do "constant propagation" through function calls, but I would be amazed if it found a pointer initialized to a specific value, and optimized the branches away here based on the contents of that pointer deref.
>
> So I see 2 options:
> 1) accept as is - I'll ack this patch here
> 2) Rework so that the worker_wrapper() accepts nb_stages, burst and mt_safe as function arguments, allowing constant propagation for optimizing away the branches.
>
> @Pavan, I'll leave that choice up to you!

As the worker functions themselves are not affected, I will keep the conditions as they are and remove the comment for now.

Cheers,
Pavan.
>
  

Patch

diff --git a/app/test-eventdev/test_pipeline_common.h b/app/test-eventdev/test_pipeline_common.h
index db2517baf..5fb91607d 100644
--- a/app/test-eventdev/test_pipeline_common.h
+++ b/app/test-eventdev/test_pipeline_common.h
@@ -63,6 +63,86 @@  struct test_pipeline {
 	uint8_t sched_type_list[EVT_MAX_STAGES] __rte_cache_aligned;
 } __rte_cache_aligned;
 
+#define BURST_SIZE 16
+
+#define PIPELINE_WORKER_SINGLE_STAGE_INIT \
+	struct worker_data *w  = arg;     \
+	struct test_pipeline *t = w->t;   \
+	const uint8_t dev = w->dev_id;    \
+	const uint8_t port = w->port_id;  \
+	struct rte_event ev
+
+#define PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT \
+	int i;                                  \
+	struct worker_data *w  = arg;           \
+	struct test_pipeline *t = w->t;         \
+	const uint8_t dev = w->dev_id;          \
+	const uint8_t port = w->port_id;        \
+	struct rte_event ev[BURST_SIZE + 1]
+
+#define PIPELINE_WORKER_MULTI_STAGE_INIT                         \
+	struct worker_data *w  = arg;                            \
+	struct test_pipeline *t = w->t;                          \
+	uint8_t cq_id;                                           \
+	const uint8_t dev = w->dev_id;                           \
+	const uint8_t port = w->port_id;                         \
+	const uint8_t last_queue = t->opt->nb_stages - 1;        \
+	uint8_t *const sched_type_list = &t->sched_type_list[0]; \
+	struct rte_event ev
+
+#define PIPELINE_WORKER_MULTI_STAGE_BURST_INIT                   \
+	int i;                                                   \
+	struct worker_data *w  = arg;                            \
+	struct test_pipeline *t = w->t;                          \
+	uint8_t cq_id;                                           \
+	const uint8_t dev = w->dev_id;                           \
+	const uint8_t port = w->port_id;                         \
+	const uint8_t last_queue = t->opt->nb_stages - 1;        \
+	uint8_t *const sched_type_list = &t->sched_type_list[0]; \
+	struct rte_event ev[BURST_SIZE + 1]
+
+static __rte_always_inline void
+pipeline_fwd_event(struct rte_event *ev, uint8_t sched)
+{
+	ev->event_type = RTE_EVENT_TYPE_CPU;
+	ev->op = RTE_EVENT_OP_FORWARD;
+	ev->sched_type = sched;
+}
+
+static __rte_always_inline void
+pipeline_event_enqueue(const uint8_t dev, const uint8_t port,
+		struct rte_event *ev)
+{
+	while (rte_event_enqueue_burst(dev, port, ev, 1) != 1)
+		rte_pause();
+}
+
+static __rte_always_inline void
+pipeline_event_enqueue_burst(const uint8_t dev, const uint8_t port,
+		struct rte_event *ev, const uint16_t nb_rx)
+{
+	uint16_t enq;
+
+	enq = rte_event_enqueue_burst(dev, port, ev, nb_rx);
+	while (enq < nb_rx) {
+		enq += rte_event_enqueue_burst(dev, port,
+						ev + enq, nb_rx - enq);
+	}
+}
+
+static __rte_always_inline void
+pipeline_tx_pkt(struct rte_mbuf *mbuf)
+{
+	while (rte_eth_tx_burst(mbuf->port, 0, &mbuf, 1) != 1)
+		rte_pause();
+}
+
+static inline int
+pipeline_nb_event_ports(struct evt_options *opt)
+{
+	return evt_nr_active_lcores(opt->wlcores);
+}
+
 int pipeline_test_result(struct evt_test *test, struct evt_options *opt);
 int pipeline_opt_check(struct evt_options *opt, uint64_t nb_queues);
 int pipeline_test_setup(struct evt_test *test, struct evt_options *opt);
diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c
index 773c3ecaa..ea784d2c6 100644
--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c
@@ -15,10 +15,297 @@  pipeline_queue_nb_event_queues(struct evt_options *opt)
 	return (eth_count * opt->nb_stages) + eth_count;
 }
 
+static int
+pipeline_queue_worker_single_stage_tx(void *arg)
+{
+	PIPELINE_WORKER_SINGLE_STAGE_INIT;
+
+	while (t->done == false) {
+		uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
+
+		if (!event) {
+			rte_pause();
+			continue;
+		}
+
+		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
+			pipeline_tx_pkt(ev.mbuf);
+			w->processed_pkts++;
+		} else {
+			ev.queue_id++;
+			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+			pipeline_event_enqueue(dev, port, &ev);
+		}
+	}
+
+	return 0;
+}
+
+static int
+pipeline_queue_worker_single_stage_fwd(void *arg)
+{
+	PIPELINE_WORKER_SINGLE_STAGE_INIT;
+	const uint8_t tx_queue = t->tx_service.queue_id;
+
+	while (t->done == false) {
+		uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
+
+		if (!event) {
+			rte_pause();
+			continue;
+		}
+
+		ev.queue_id = tx_queue;
+		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+		pipeline_event_enqueue(dev, port, &ev);
+		w->processed_pkts++;
+	}
+
+	return 0;
+}
+
+static int
+pipeline_queue_worker_single_stage_burst_tx(void *arg)
+{
+	PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT;
+
+	while (t->done == false) {
+		uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev,
+				BURST_SIZE, 0);
+
+		if (!nb_rx) {
+			rte_pause();
+			continue;
+		}
+
+		for (i = 0; i < nb_rx; i++) {
+			rte_prefetch0(ev[i + 1].mbuf);
+			if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
+
+				pipeline_tx_pkt(ev[i].mbuf);
+				ev[i].op = RTE_EVENT_OP_RELEASE;
+				w->processed_pkts++;
+			} else {
+				ev[i].queue_id++;
+				pipeline_fwd_event(&ev[i],
+						RTE_SCHED_TYPE_ATOMIC);
+			}
+		}
+
+		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
+	}
+
+	return 0;
+}
+
+static int
+pipeline_queue_worker_single_stage_burst_fwd(void *arg)
+{
+	PIPELINE_WORKER_SINGLE_STAGE_BURST_INIT;
+	const uint8_t tx_queue = t->tx_service.queue_id;
+
+	while (t->done == false) {
+		uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev,
+				BURST_SIZE, 0);
+
+		if (!nb_rx) {
+			rte_pause();
+			continue;
+		}
+
+		for (i = 0; i < nb_rx; i++) {
+			rte_prefetch0(ev[i + 1].mbuf);
+			ev[i].queue_id = tx_queue;
+			pipeline_fwd_event(&ev[i], RTE_SCHED_TYPE_ATOMIC);
+			w->processed_pkts++;
+		}
+
+		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
+	}
+
+	return 0;
+}
+
+
+static int
+pipeline_queue_worker_multi_stage_tx(void *arg)
+{
+	PIPELINE_WORKER_MULTI_STAGE_INIT;
+	const uint8_t nb_stages = t->opt->nb_stages + 1;
+
+	while (t->done == false) {
+		uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
+
+		if (!event) {
+			rte_pause();
+			continue;
+		}
+
+		cq_id = ev.queue_id % nb_stages;
+
+		if (cq_id >= last_queue) {
+			if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
+
+				pipeline_tx_pkt(ev.mbuf);
+				w->processed_pkts++;
+				continue;
+			}
+			ev.queue_id += (cq_id == last_queue) ? 1 : 0;
+			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+		} else {
+			ev.queue_id++;
+			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
+		}
+
+		pipeline_event_enqueue(dev, port, &ev);
+	}
+	return 0;
+}
+
+static int
+pipeline_queue_worker_multi_stage_fwd(void *arg)
+{
+	PIPELINE_WORKER_MULTI_STAGE_INIT;
+	const uint8_t nb_stages = t->opt->nb_stages + 1;
+	const uint8_t tx_queue = t->tx_service.queue_id;
+
+	while (t->done == false) {
+		uint16_t event = rte_event_dequeue_burst(dev, port, &ev, 1, 0);
+
+		if (!event) {
+			rte_pause();
+			continue;
+		}
+
+		cq_id = ev.queue_id % nb_stages;
+
+		if (cq_id == last_queue) {
+			ev.queue_id = tx_queue;
+			pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+			w->processed_pkts++;
+		} else {
+			ev.queue_id++;
+			pipeline_fwd_event(&ev, sched_type_list[cq_id]);
+		}
+
+		pipeline_event_enqueue(dev, port, &ev);
+	}
+	return 0;
+}
+
+static int
+pipeline_queue_worker_multi_stage_burst_tx(void *arg)
+{
+	PIPELINE_WORKER_MULTI_STAGE_BURST_INIT;
+	const uint8_t nb_stages = t->opt->nb_stages + 1;
+
+	while (t->done == false) {
+		uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev,
+				BURST_SIZE, 0);
+
+		if (!nb_rx) {
+			rte_pause();
+			continue;
+		}
+
+		for (i = 0; i < nb_rx; i++) {
+			rte_prefetch0(ev[i + 1].mbuf);
+			cq_id = ev[i].queue_id % nb_stages;
+
+			if (cq_id >= last_queue) {
+				if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
+
+					pipeline_tx_pkt(ev[i].mbuf);
+					ev[i].op = RTE_EVENT_OP_RELEASE;
+					w->processed_pkts++;
+					continue;
+				}
+
+				ev[i].queue_id += (cq_id == last_queue) ? 1 : 0;
+				pipeline_fwd_event(&ev[i],
+						RTE_SCHED_TYPE_ATOMIC);
+			} else {
+				ev[i].queue_id++;
+				pipeline_fwd_event(&ev[i],
+						sched_type_list[cq_id]);
+			}
+
+		}
+
+		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
+	}
+	return 0;
+}
+
+static int
+pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
+{
+	PIPELINE_WORKER_MULTI_STAGE_BURST_INIT;
+	const uint8_t nb_stages = t->opt->nb_stages + 1;
+	const uint8_t tx_queue = t->tx_service.queue_id;
+
+	while (t->done == false) {
+		uint16_t nb_rx = rte_event_dequeue_burst(dev, port, ev,
+				BURST_SIZE, 0);
+
+		if (!nb_rx) {
+			rte_pause();
+			continue;
+		}
+
+		for (i = 0; i < nb_rx; i++) {
+			rte_prefetch0(ev[i + 1].mbuf);
+			cq_id = ev[i].queue_id % nb_stages;
+
+			if (cq_id == last_queue) {
+				ev[i].queue_id = tx_queue;
+				pipeline_fwd_event(&ev[i],
+						RTE_SCHED_TYPE_ATOMIC);
+				w->processed_pkts++;
+			} else {
+				ev[i].queue_id++;
+				pipeline_fwd_event(&ev[i],
+						sched_type_list[cq_id]);
+			}
+		}
+
+		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
+	}
+	return 0;
+}
+
 static int
 worker_wrapper(void *arg)
 {
-	RTE_SET_USED(arg);
+	struct worker_data *w  = arg;
+	struct evt_options *opt = w->t->opt;
+	const bool burst = evt_has_burst_mode(w->dev_id);
+	const bool mt_safe = !w->t->mt_unsafe;
+	const uint8_t nb_stages = opt->nb_stages;
+	RTE_SET_USED(opt);
+
+	/* allow compiler to optimize */
+	if (nb_stages == 1) {
+		if (!burst && mt_safe)
+			return pipeline_queue_worker_single_stage_tx(arg);
+		else if (!burst && !mt_safe)
+			return pipeline_queue_worker_single_stage_fwd(arg);
+		else if (burst && mt_safe)
+			return pipeline_queue_worker_single_stage_burst_tx(arg);
+		else if (burst && !mt_safe)
+			return pipeline_queue_worker_single_stage_burst_fwd(
+					arg);
+	} else {
+		if (!burst && mt_safe)
+			return pipeline_queue_worker_multi_stage_tx(arg);
+		else if (!burst && !mt_safe)
+			return pipeline_queue_worker_multi_stage_fwd(arg);
+		else if (burst && mt_safe)
+			return pipeline_queue_worker_multi_stage_burst_tx(arg);
+		else if (burst && !mt_safe)
+			return pipeline_queue_worker_multi_stage_burst_fwd(arg);
+
+	}
 	rte_panic("invalid worker\n");
 }