[dpdk-dev,v5,1/3] examples/eventdev_pipeline: added sample app

Message ID 20170705052853.GA8031@jerin (mailing list archive)
State Not Applicable, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation success Compilation OK

Commit Message

Jerin Jacob July 5, 2017, 5:30 a.m. UTC
  -----Original Message-----
> Date: Tue, 4 Jul 2017 08:55:25 +0100
> From: "Hunt, David" <david.hunt@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>  <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>  Thunderbird/45.8.0
> 
> Hi Jerin,

Hi David,

I have checked the v6. Some comments below.

> 
> 
> On 3/7/2017 4:57 AM, Jerin Jacob wrote:
> > -----Original Message-----
> > > From: Harry van Haaren<harry.van.haaren@intel.com>
> > > 
> > > This commit adds a sample app for the eventdev library.
> > > The app has been tested with DPDK 17.05-rc2, hence this
> > > release (or later) is recommended.
> > > 
> > > The sample app showcases a pipeline processing use-case,
> > > with event scheduling and processing defined per stage.
> > > The application receives traffic as normal, with each
> > > packet traversing the pipeline. Once the packet has
> > > been processed by each of the pipeline stages, it is
> > > transmitted again.
> > > 
> > > The app provides a framework to utilize cores for a single
> > > role or multiple roles. Examples of roles are the RX core,
> > > TX core, Scheduling core (in the case of the event/sw PMD),
> > > and worker cores.
> > > 
> > > Various flags are available to configure numbers of stages,
> > > cycles of work at each stage, type of scheduling, number of
> > > worker cores, queue depths etc. For a full explaination,
> > > please refer to the documentation.
> > A few comments on bugs and "to avoid the future rework on base code when
> > HW PMD is introduced". As we agreed, We will keep the functionality intact to
> > provide an application to test ethdev + eventdev with _SW PMD_ for 17.08
> > 
> 
> Sure OK. I will Address.
> 
> > > ---
> > >   examples/Makefile                   |   2 +
> > >   examples/eventdev_pipeline/Makefile |  49 ++
> > >   examples/eventdev_pipeline/main.c   | 999 ++++++++++++++++++++++++++++++++++++
> > >   3 files changed, 1050 insertions(+)
> > >   create mode 100644 examples/eventdev_pipeline/Makefile
> > >   create mode 100644 examples/eventdev_pipeline/main.c
> > Do we need to update the MAINTAINERS file?
> 
> Updated
> > > diff --git a/examples/Makefile b/examples/Makefile
> > > index 6298626..a6dcc2b 100644
> > > --- a/examples/Makefile
> > > +++ b/examples/Makefile
> > > @@ -100,4 +100,6 @@ $(info vm_power_manager requires libvirt >= 0.9.3)
> > >   endif
> > >   endif
> > > +DIRS-y += eventdev_pipeline
> > Can you change to eventdev_pipeline_sw_pmd to emphasis on the scope.
> > We will rename to eventdev_pipeline once it working effectively on both SW and HW
> > PMD with ethdev.
> 
> OK, I've updated the directory, app name and relevant docs across the board
> so they're all
> eventdev_pipeline_sw_pmd. This should make it clear to anyone using it that
> it's for the
> sw_pmd only, and an updated version will be provided later.
> 
> 
> > > +
> > >   include $(RTE_SDK)/mk/rte.extsubdir.mk
> > > diff --git a/examples/eventdev_pipeline/Makefile b/examples/eventdev_pipeline/Makefile
> > > new file mode 100644
> > > index 0000000..4c26e15
> > > --- /dev/null
> > > +++ b/examples/eventdev_pipeline/Makefile
> > > @@ -0,0 +1,49 @@
> > > +#   BSD LICENSE
> > > +#
> > > +#   Copyright(c) 2016 Intel Corporation. All rights reserved.
> > 2016-2017
> 
> Done.
> 
> > > +#
> > > +#   Redistribution and use in source and binary forms, with or without
> > > +#   modification, are permitted provided that the following conditions
> > > +#   are met:
> > > +#
> > > +
> > > +static unsigned int active_cores;
> > > +static unsigned int num_workers;
> > > +static long num_packets = (1L << 25); /* do ~32M packets */
> > > +static unsigned int num_fids = 512;
> > > +static unsigned int num_stages = 1;
> > > +static unsigned int worker_cq_depth = 16;

Any reason not move this to struct config_data?

> > > +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
> > > +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
> > > +static int16_t qid[MAX_NUM_STAGES] = {-1};
> > > +static int worker_cycles;

used in fastpath move to struct fastpath_data.

> > > +static int enable_queue_priorities;

struct config_data ?
> > > +
> > > +struct prod_data {
> > > +	uint8_t dev_id;
> > > +	uint8_t port_id;
> > > +	int32_t qid;
> > > +	unsigned int num_nic_ports;
> > > +} __rte_cache_aligned;
> > > +
> > > +struct cons_data {
> > > +	uint8_t dev_id;
> > > +	uint8_t port_id;
> > > +} __rte_cache_aligned;
> > > +
> > > +static struct prod_data prod_data;
> > > +static struct cons_data cons_data;
> > > +
> > > +struct worker_data {
> > > +	uint8_t dev_id;
> > > +	uint8_t port_id;
> > > +} __rte_cache_aligned;
> > > +
> > > +static unsigned int *enqueue_cnt;
> > > +static unsigned int *dequeue_cnt;
> > Not been consumed. Remove it.
> 
> Done.
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +
> > > +static inline void
> > > +work(struct rte_mbuf *m)
> > > +{
> > > +	struct ether_hdr *eth;
> > > +	struct ether_addr addr;
> > > +
> > > +	/* change mac addresses on packet (to use mbuf data) */
> > > +	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
> > > +	ether_addr_copy(&eth->d_addr, &addr);
> > > +	ether_addr_copy(&eth->s_addr, &eth->d_addr);
> > > +	ether_addr_copy(&addr, &eth->s_addr);
> > If it is even number of stages(say 2), Will mac swap be negated? as we are
> > swapping on each stage NOT in consumer?
> 
> The mac swap is just to touch the mbuf. It does not matter if it is negated.

Are you sure or I am missing something here? for example,If I add following piece
of code, irrespective number of stages it should send the same packet if
source packet is same. Right ?

But not happening as expected(Check the src and dest mac address)

stage == 1
00000000: 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |....'+....',..E.
00000010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 |................
00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr

stage == 2
00000000: 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |....',....'+..E.
00000010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 |................
00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr


Either fix the mac swap properly or remove it.


> > > +
> > > +	if (!quiet) {
> > > +		printf("\nPort Workload distribution:\n");
> > > +		uint32_t i;
> > > +		uint64_t tot_pkts = 0;
> > > +		uint64_t pkts_per_wkr[RTE_MAX_LCORE] = {0};
> > > +		for (i = 0; i < num_workers; i++) {
> > > +			char statname[64];
> > > +			snprintf(statname, sizeof(statname), "port_%u_rx",
> > > +					worker_data[i].port_id);
> > > +			pkts_per_wkr[i] = rte_event_dev_xstats_by_name_get(
> > > +					dev_id, statname, NULL);
> > As discussed, Check the the given xstat supported on the PMD first.
> 
> Checking has now been implemented. It'd done by calling
> rte_event_dev_xstats_by_name_get()
> and seeing if the result is -ENOTSUP. However there is a bug in the function
> in that it is declared
> as a uint64_t, but then returns a -ENOTSUP, so I have to cast the -ENOTSUP
> as a uint64_t for
> comparison. This will need to be fixed when the function is patched.
> 
>                         retval = rte_event_dev_xstats_by_name_get(
>                                         dev_id, statname, NULL);
>                         if (retval != (uint64_t)-ENOTSUP) {
>                                 pkts_per_wkr[i] =  retval;
>                                 tot_pkts += pkts_per_wkr[i];
>                         }
> 
> 
> 
> > > +			tot_pkts += pkts_per_wkr[i];
> > > +		}
> > > +		for (i = 0; i < num_workers; i++) {
> > > +			float pc = pkts_per_wkr[i]  * 100 /
> > > +				((float)tot_pkts);

This will print NAN.

How about, move the specific xstat as static inline function.
port_stat(...)
{
		char statname[64];
		uint64_t retval;
		snprintf(statname, sizeof(statname),"port_%u_rx",      
				worker_data[i].port_id);                
		retval = rte_event_dev_xstats_by_name_get(              
				dev_id, statname, NULL); 
}
and add check in the beginning of the "if" condition.
ie.

	if (!cdata.quiet && port_stat(,,) != (uint64_t)-ENOTSUP) {
		printf("\nPort Workload distribution:\n");
		...
	}


> > > +			printf("worker %i :\t%.1f %% (%"PRIu64" pkts)\n",
> > > +					i, pc, pkts_per_wkr[i]);
> > > +		}
> > > +
> > > +	}
> > > +
> > > +	return 0;
> > > +}
  

Comments

Hunt, David July 5, 2017, 11:15 a.m. UTC | #1
Hi Jerin,


On 5/7/2017 6:30 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Tue, 4 Jul 2017 08:55:25 +0100
>> From: "Hunt, David" <david.hunt@intel.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>>   <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
>> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
>> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>>   Thunderbird/45.8.0
>>
>> Hi Jerin,
> Hi David,
>
> I have checked the v6. Some comments below.
>
>>
>> On 3/7/2017 4:57 AM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>
--snip--
>>>> +#
>>>> +#   Redistribution and use in source and binary forms, with or without
>>>> +#   modification, are permitted provided that the following conditions
>>>> +#   are met:
>>>> +#
>>>> +
>>>> +static unsigned int active_cores;
>>>> +static unsigned int num_workers;
>>>> +static long num_packets = (1L << 25); /* do ~32M packets */
>>>> +static unsigned int num_fids = 512;
>>>> +static unsigned int num_stages = 1;
>>>> +static unsigned int worker_cq_depth = 16;
> Any reason not move this to struct config_data?

Sure. All vars now moved to either config_data or fastpath_data.

>>>> +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
>>>> +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
>>>> +static int16_t qid[MAX_NUM_STAGES] = {-1};
>>>> +static int worker_cycles;
> used in fastpath move to struct fastpath_data.

Performance drops by ~20% when I put these in fastpath_data. No 
performace drop when in config_data.
I think we need to leaving in config_data for now.

>>>> +static int enable_queue_priorities;
> struct config_data ?

Done.

>>>> +
>>>> +struct prod_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +	int32_t qid;
>>>> +	unsigned int num_nic_ports;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +struct cons_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +static struct prod_data prod_data;
>>>> +static struct cons_data cons_data;
>>>> +
>>>> +struct worker_data {
>>>> +	uint8_t dev_id;
>>>> +	uint8_t port_id;
>>>> +} __rte_cache_aligned;
>>>> +
>>>> +static unsigned int *enqueue_cnt;
>>>> +static unsigned int *dequeue_cnt;
>>> Not been consumed. Remove it.
>> Done.
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +
>>>> +static inline void
>>>> +work(struct rte_mbuf *m)
>>>> +{
>>>> +	struct ether_hdr *eth;
>>>> +	struct ether_addr addr;
>>>> +
>>>> +	/* change mac addresses on packet (to use mbuf data) */
>>>> +	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>> +	ether_addr_copy(&eth->d_addr, &addr);
>>>> +	ether_addr_copy(&eth->s_addr, &eth->d_addr);
>>>> +	ether_addr_copy(&addr, &eth->s_addr);
>>> If it is even number of stages(say 2), Will mac swap be negated? as we are
>>> swapping on each stage NOT in consumer?
>> The mac swap is just to touch the mbuf. It does not matter if it is negated.
> Are you sure or I am missing something here? for example,If I add following piece
> of code, irrespective number of stages it should send the same packet if
> source packet is same. Right ?
> But not happening as expected(Check the src and dest mac address)
>
> stage == 1
> 00000000: 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |....'+....',..E.
> 00000010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 |................
> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>
> stage == 2
> 00000000: 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |....',....'+..E.
> 00000010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 |................
> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>
>
> diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c
> index c62cba2..a7aaf37 100644
> --- a/examples/eventdev_pipeline_sw_pmd/main.c
> +++ b/examples/eventdev_pipeline_sw_pmd/main.c
> @@ -147,8 +147,9 @@ consumer(void)
>          if (start_time == 0)
> @@ -157,6 +158,7 @@ consumer(void)
>          received += n;
>          for (i = 0; i < n; i++) {
>                  uint8_t outport = packets[i].mbuf->port;
> +               rte_pktmbuf_dump(stdout, packets[i].mbuf, 64);
>                  rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport],
>                                  packets[i].mbuf);
>
> Either fix the mac swap properly or remove it.

Temporary fix added. Now reading in addr and writing it back without 
swapping. Not ideal,
will probably need more work in the future. Added a FIXME in the code 
with agreement from Jerin.

>
>>>> +
>>>> +	if (!quiet) {
>>>> +		printf("\nPort Workload distribution:\n");
>>>> +		uint32_t i;
>>>> +		uint64_t tot_pkts = 0;
>>>> +		uint64_t pkts_per_wkr[RTE_MAX_LCORE] = {0};
>>>> +		for (i = 0; i < num_workers; i++) {
>>>> +			char statname[64];
>>>> +			snprintf(statname, sizeof(statname), "port_%u_rx",
>>>> +					worker_data[i].port_id);
>>>> +			pkts_per_wkr[i] = rte_event_dev_xstats_by_name_get(
>>>> +					dev_id, statname, NULL);
>>> As discussed, Check the the given xstat supported on the PMD first.
>> Checking has now been implemented. It'd done by calling
>> rte_event_dev_xstats_by_name_get()
>> and seeing if the result is -ENOTSUP. However there is a bug in the function
>> in that it is declared
>> as a uint64_t, but then returns a -ENOTSUP, so I have to cast the -ENOTSUP
>> as a uint64_t for
>> comparison. This will need to be fixed when the function is patched.
>>
>>                          retval = rte_event_dev_xstats_by_name_get(
>>                                          dev_id, statname, NULL);
>>                          if (retval != (uint64_t)-ENOTSUP) {
>>                                  pkts_per_wkr[i] =  retval;
>>                                  tot_pkts += pkts_per_wkr[i];
>>                          }
>>
>>
>>
>>>> +			tot_pkts += pkts_per_wkr[i];
>>>> +		}
>>>> +		for (i = 0; i < num_workers; i++) {
>>>> +			float pc = pkts_per_wkr[i]  * 100 /
>>>> +				((float)tot_pkts);
> This will print NAN.
>
> How about, move the specific xstat as static inline function.
> port_stat(...)
> {
> 		char statname[64];
> 		uint64_t retval;
> 		snprintf(statname, sizeof(statname),"port_%u_rx",
> 				worker_data[i].port_id);
> 		retval = rte_event_dev_xstats_by_name_get(
> 				dev_id, statname, NULL);
> }
> and add check in the beginning of the "if" condition.
> ie.
>
> 	if (!cdata.quiet && port_stat(,,) != (uint64_t)-ENOTSUP) {
> 		printf("\nPort Workload distribution:\n");
> 		...
> 	}
>

Sure. Adding the port_stat simplifies a lot. Will do.

>>>> +			printf("worker %i :\t%.1f %% (%"PRIu64" pkts)\n",
>>>> +					i, pc, pkts_per_wkr[i]);
>>>> +		}
>>>> +
>>>> +	}
>>>> +
>>>> +	return 0;
>>>> +}
  
Jerin Jacob July 6, 2017, 3:31 a.m. UTC | #2
-----Original Message-----
> Date: Wed, 5 Jul 2017 12:15:51 +0100
> From: "Hunt, David" <david.hunt@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>  <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>  Thunderbird/45.8.0
> 
> Hi Jerin,
> 
> 
> On 5/7/2017 6:30 AM, Jerin Jacob wrote:
> > -----Original Message-----
> > > Date: Tue, 4 Jul 2017 08:55:25 +0100
> > > From: "Hunt, David" <david.hunt@intel.com>
> > > To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> > > CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
> > >   <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
> > > Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
> > > User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
> > >   Thunderbird/45.8.0
> > > 
> > > Hi Jerin,
> > Hi David,
> > 
> > I have checked the v6. Some comments below.
> > 
> > > 
> > > On 3/7/2017 4:57 AM, Jerin Jacob wrote:
> > > > -----Original Message-----
> > > > 
> --snip--
> > > > > +#
> > > > > +#   Redistribution and use in source and binary forms, with or without
> > > > > +#   modification, are permitted provided that the following conditions
> > > > > +#   are met:
> > > > > +#
> > > > > +
> > > > > +static unsigned int active_cores;
> > > > > +static unsigned int num_workers;
> > > > > +static long num_packets = (1L << 25); /* do ~32M packets */
> > > > > +static unsigned int num_fids = 512;
> > > > > +static unsigned int num_stages = 1;
> > > > > +static unsigned int worker_cq_depth = 16;
> > Any reason not move this to struct config_data?
> 
> Sure. All vars now moved to either config_data or fastpath_data.
> 
> > > > > +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
> > > > > +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
> > > > > +static int16_t qid[MAX_NUM_STAGES] = {-1};
> > > > > +static int worker_cycles;
> > used in fastpath move to struct fastpath_data.
> 
> Performance drops by ~20% when I put these in fastpath_data. No performace
> drop when in config_data.
> I think we need to leaving in config_data for now.

I checked v7 it looks to OK to merge. Can you fix following minor issue with
check patch and check-git-log.sh

check-git-log.sh
-----------------
Wrong headline lowercase:
	doc: add sw eventdev pipeline to sample app ug

### examples/eventdev_pipeline: added sample app

Note:
Change application to new name.

checkpatch.sh
-----------------

WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
using 'consumer', this function's name, in a string
#294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178:
+		printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, "

WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
using 'worker', this function's name, in a string
#453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337:
+		printf("  worker %u thread done. RX=%zu TX=%zu\n",

total: 0 errors, 2 warnings, 1078 lines checked

I will give pull request Thomas on Friday morning. I will include this change set
in the pull request.

Regarding the performance drop, Can you add __rte_cache_aligned on those
variable which creates regression in moving to rte_malloc area. The
cache line could be shared? If not fixing then its fine we will look into that latter.

and

Can you share your command to test for this regression, I will also try
on x86 box if I get time?


> 
> > > > > +static int enable_queue_priorities;
> > struct config_data ?
> 
> Done.
> 
> > > > > +
> > > > > +struct prod_data {
> > > > > +	uint8_t dev_id;
> > > > > +	uint8_t port_id;
> > > > > +	int32_t qid;
> > > > > +	unsigned int num_nic_ports;
> > > > > +} __rte_cache_aligned;
> > > > > +
> > > > > +struct cons_data {
> > > > > +	uint8_t dev_id;
> > > > > +	uint8_t port_id;
> > > > > +} __rte_cache_aligned;
> > > > > +
> > > > > +static struct prod_data prod_data;
> > > > > +static struct cons_data cons_data;
> > > > > +
> > > > > +struct worker_data {
> > > > > +	uint8_t dev_id;
> > > > > +	uint8_t port_id;
> > > > > +} __rte_cache_aligned;
> > > > > +
> > > > > +static unsigned int *enqueue_cnt;
> > > > > +static unsigned int *dequeue_cnt;
> > > > Not been consumed. Remove it.
> > > Done.
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > > +
> > > > > +static inline void
> > > > > +work(struct rte_mbuf *m)
> > > > > +{
> > > > > +	struct ether_hdr *eth;
> > > > > +	struct ether_addr addr;
> > > > > +
> > > > > +	/* change mac addresses on packet (to use mbuf data) */
> > > > > +	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
> > > > > +	ether_addr_copy(&eth->d_addr, &addr);
> > > > > +	ether_addr_copy(&eth->s_addr, &eth->d_addr);
> > > > > +	ether_addr_copy(&addr, &eth->s_addr);
> > > > If it is even number of stages(say 2), Will mac swap be negated? as we are
> > > > swapping on each stage NOT in consumer?
> > > The mac swap is just to touch the mbuf. It does not matter if it is negated.
> > Are you sure or I am missing something here? for example,If I add following piece
> > of code, irrespective number of stages it should send the same packet if
> > source packet is same. Right ?
> > But not happening as expected(Check the src and dest mac address)
> > 
> > stage == 1
> > 00000000: 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |....'+....',..E.
> > 00000010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 |................
> > 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> > 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
> > 
> > stage == 2
> > 00000000: 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |....',....'+..E.
> > 00000010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 |................
> > 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
> > 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
> > 
> > 
> > diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c
> > index c62cba2..a7aaf37 100644
> > --- a/examples/eventdev_pipeline_sw_pmd/main.c
> > +++ b/examples/eventdev_pipeline_sw_pmd/main.c
> > @@ -147,8 +147,9 @@ consumer(void)
> >          if (start_time == 0)
> > @@ -157,6 +158,7 @@ consumer(void)
> >          received += n;
> >          for (i = 0; i < n; i++) {
> >                  uint8_t outport = packets[i].mbuf->port;
> > +               rte_pktmbuf_dump(stdout, packets[i].mbuf, 64);
> >                  rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport],
> >                                  packets[i].mbuf);
> > 
> > Either fix the mac swap properly or remove it.
> 
> Temporary fix added. Now reading in addr and writing it back without
> swapping. Not ideal,
> will probably need more work in the future. Added a FIXME in the code with
> agreement from Jerin.

On a agreement that, Before moving to generic application it has to be
fixed in SW PMD driver or if its specific behavior of SW PMD then it has
to abstracted in proper way in fastpath.
  
Hunt, David July 6, 2017, 10:04 a.m. UTC | #3
On 6/7/2017 4:31 AM, Jerin Jacob wrote:
> -----Original Message-----
>> Date: Wed, 5 Jul 2017 12:15:51 +0100
>> From: "Hunt, David" <david.hunt@intel.com>
>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>>   <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
>> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
>> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>>   Thunderbird/45.8.0
>>
>> Hi Jerin,
>>
>>
>> On 5/7/2017 6:30 AM, Jerin Jacob wrote:
>>> -----Original Message-----
>>>> Date: Tue, 4 Jul 2017 08:55:25 +0100
>>>> From: "Hunt, David" <david.hunt@intel.com>
>>>> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>>>>    <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
>>>> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
>>>> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>>>>    Thunderbird/45.8.0
>>>>
>>>> Hi Jerin,
>>> Hi David,
>>>
>>> I have checked the v6. Some comments below.
>>>
>>>> On 3/7/2017 4:57 AM, Jerin Jacob wrote:
>>>>> -----Original Message-----
>>>>>
>> --snip--
>>>>>> +#
>>>>>> +#   Redistribution and use in source and binary forms, with or without
>>>>>> +#   modification, are permitted provided that the following conditions
>>>>>> +#   are met:
>>>>>> +#
>>>>>> +
>>>>>> +static unsigned int active_cores;
>>>>>> +static unsigned int num_workers;
>>>>>> +static long num_packets = (1L << 25); /* do ~32M packets */
>>>>>> +static unsigned int num_fids = 512;
>>>>>> +static unsigned int num_stages = 1;
>>>>>> +static unsigned int worker_cq_depth = 16;
>>> Any reason not move this to struct config_data?
>> Sure. All vars now moved to either config_data or fastpath_data.
>>
>>>>>> +static int queue_type = RTE_EVENT_QUEUE_CFG_ATOMIC_ONLY;
>>>>>> +static int16_t next_qid[MAX_NUM_STAGES+1] = {-1};
>>>>>> +static int16_t qid[MAX_NUM_STAGES] = {-1};
>>>>>> +static int worker_cycles;
>>> used in fastpath move to struct fastpath_data.
>> Performance drops by ~20% when I put these in fastpath_data. No performace
>> drop when in config_data.
>> I think we need to leaving in config_data for now.
> I checked v7 it looks to OK to merge. Can you fix following minor issue with
> check patch and check-git-log.sh
>
> check-git-log.sh
> -----------------
> Wrong headline lowercase:
> 	doc: add sw eventdev pipeline to sample app ug

Will Do. Change sw to SW

> ### examples/eventdev_pipeline: added sample app

Will Do. Add _sw_pmd

> Note:
> Change application to new name.
>
> checkpatch.sh
> -----------------
>
> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
> using 'consumer', this function's name, in a string
> #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178:
> +		printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, "
>
> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
> using 'worker', this function's name, in a string
> #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337:
> +		printf("  worker %u thread done. RX=%zu TX=%zu\n",
>
> total: 0 errors, 2 warnings, 1078 lines checked

These are false positives. The text in the messages are not meant to be 
the function name.
If anything, I would prefer to change the function names to have " _thread"?

> I will give pull request Thomas on Friday morning. I will include this change set
> in the pull request.
>
> Regarding the performance drop, Can you add __rte_cache_aligned on those
> variable which creates regression in moving to rte_malloc area. The
> cache line could be shared? If not fixing then its fine we will look into that latter.

I will investigate and post a new patch in a few hours.

> and
>
> Can you share your command to test for this regression, I will also try
> on x86 box if I get time?

Sure. I'm using this:
./examples/eventdev_pipeline_sw_pmd/build/app/eventdev_pipeline_sw_pmd 
-c ff0  -w 0000:18:00.0 -w 0000:18:00.1 -w 0000:18:00.2 -w 0000:18:00.3 
--vdev=event_sw0 -- -s 4 -r 10 -t 10 -e 20 -w f00



>
>>>>>> +static int enable_queue_priorities;
>>> struct config_data ?
>> Done.
>>
>>>>>> +
>>>>>> +struct prod_data {
>>>>>> +	uint8_t dev_id;
>>>>>> +	uint8_t port_id;
>>>>>> +	int32_t qid;
>>>>>> +	unsigned int num_nic_ports;
>>>>>> +} __rte_cache_aligned;
>>>>>> +
>>>>>> +struct cons_data {
>>>>>> +	uint8_t dev_id;
>>>>>> +	uint8_t port_id;
>>>>>> +} __rte_cache_aligned;
>>>>>> +
>>>>>> +static struct prod_data prod_data;
>>>>>> +static struct cons_data cons_data;
>>>>>> +
>>>>>> +struct worker_data {
>>>>>> +	uint8_t dev_id;
>>>>>> +	uint8_t port_id;
>>>>>> +} __rte_cache_aligned;
>>>>>> +
>>>>>> +static unsigned int *enqueue_cnt;
>>>>>> +static unsigned int *dequeue_cnt;
>>>>> Not been consumed. Remove it.
>>>> Done.
>>>>>> +
>>>>>> +	return 0;
>>>>>> +}
>>>>>> +
>>>>>> +
>>>>>> +static inline void
>>>>>> +work(struct rte_mbuf *m)
>>>>>> +{
>>>>>> +	struct ether_hdr *eth;
>>>>>> +	struct ether_addr addr;
>>>>>> +
>>>>>> +	/* change mac addresses on packet (to use mbuf data) */
>>>>>> +	eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
>>>>>> +	ether_addr_copy(&eth->d_addr, &addr);
>>>>>> +	ether_addr_copy(&eth->s_addr, &eth->d_addr);
>>>>>> +	ether_addr_copy(&addr, &eth->s_addr);
>>>>> If it is even number of stages(say 2), Will mac swap be negated? as we are
>>>>> swapping on each stage NOT in consumer?
>>>> The mac swap is just to touch the mbuf. It does not matter if it is negated.
>>> Are you sure or I am missing something here? for example,If I add following piece
>>> of code, irrespective number of stages it should send the same packet if
>>> source packet is same. Right ?
>>> But not happening as expected(Check the src and dest mac address)
>>>
>>> stage == 1
>>> 00000000: 00 0F B7 11 27 2B 00 0F B7 11 27 2C 08 00 45 00 |....'+....',..E.
>>> 00000010: 00 2E 00 00 00 00 04 11 D5 B1 C0 A8 12 02 0E 01 |................
>>> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
>>> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>>>
>>> stage == 2
>>> 00000000: 00 0F B7 11 27 2C 00 0F B7 11 27 2B 08 00 45 00 |....',....'+..E.
>>> 00000010: 00 2E 00 00 00 00 04 11 D5 B0 C0 A8 12 03 0E 01 |................
>>> 00000020: 00 63 10 00 10 01 00 1A 00 00 61 62 63 64 65 66 |.c........abcdef
>>> 00000030: 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 |  |  |  |  | ghijklmnopqr
>>>
>>>
>>> diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c
>>> index c62cba2..a7aaf37 100644
>>> --- a/examples/eventdev_pipeline_sw_pmd/main.c
>>> +++ b/examples/eventdev_pipeline_sw_pmd/main.c
>>> @@ -147,8 +147,9 @@ consumer(void)
>>>           if (start_time == 0)
>>> @@ -157,6 +158,7 @@ consumer(void)
>>>           received += n;
>>>           for (i = 0; i < n; i++) {
>>>                   uint8_t outport = packets[i].mbuf->port;
>>> +               rte_pktmbuf_dump(stdout, packets[i].mbuf, 64);
>>>                   rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport],
>>>                                   packets[i].mbuf);
>>>
>>> Either fix the mac swap properly or remove it.
>> Temporary fix added. Now reading in addr and writing it back without
>> swapping. Not ideal,
>> will probably need more work in the future. Added a FIXME in the code with
>> agreement from Jerin.
> On a agreement that, Before moving to generic application it has to be
> fixed in SW PMD driver or if its specific behavior of SW PMD then it has
> to abstracted in proper way in fastpath.
>
  
Hunt, David July 6, 2017, 10:39 a.m. UTC | #4
On 6/7/2017 11:04 AM, Hunt, David wrote:
>
>
> On 6/7/2017 4:31 AM, Jerin Jacob wrote:
>
>> Note:
>> Change application to new name.
>>
>> checkpatch.sh
>> -----------------
>>
>> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
>> using 'consumer', this function's name, in a string
>> #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178:
>> +        printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, "
>>
>> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
>> using 'worker', this function's name, in a string
>> #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337:
>> +        printf("  worker %u thread done. RX=%zu TX=%zu\n",
>>
>> total: 0 errors, 2 warnings, 1078 lines checked
>
> These are false positives. The text in the messages are not meant to 
> be the function name.
> If anything, I would prefer to change the function names to have " 
> _thread"?
>


Or perhaps, better still, change the function names to verbs, i.e. 
produce() consume(), do_work().

Regards,
Dave.
  
Hunt, David July 6, 2017, 1:26 p.m. UTC | #5
Hi Jerin,

On 6/7/2017 11:04 AM, Hunt, David wrote:
>
> On 6/7/2017 4:31 AM, Jerin Jacob wrote:
>> -----Original Message-----
>>>
--snip--
>> I checked v7 it looks to OK to merge. Can you fix following minor 
>> issue with
>> check patch and check-git-log.sh
>>
>> check-git-log.sh
>> -----------------
>> Wrong headline lowercase:
>>     doc: add sw eventdev pipeline to sample app ug
>
> Will Do. Change sw to SW
>
>> ### examples/eventdev_pipeline: added sample app
>
> Will Do. Add _sw_pmd
>

Both of these will be in next patch.

>> Note:
>> Change application to new name.
>>
>> checkpatch.sh
>> -----------------
>>
>> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
>> using 'consumer', this function's name, in a string
>> #294: FILE: examples/eventdev_pipeline_sw_pmd/main.c:178:
>> +        printf("# consumer RX=%"PRIu64", time %"PRIu64 "ms, "
>>
>> WARNING:EMBEDDED_FUNCTION_NAME: Prefer using '"%s...", __func__' to
>> using 'worker', this function's name, in a string
>> #453: FILE: examples/eventdev_pipeline_sw_pmd/main.c:337:
>> +        printf("  worker %u thread done. RX=%zu TX=%zu\n",
>>
>> total: 0 errors, 2 warnings, 1078 lines checked
>
> These are false positives. The text in the messages are not meant to 
> be the function name.
> If anything, I would prefer to change the function names to have " 
> _thread"?
>

Having looked at this a bit more, and unable to reproduce with my 
original kernel version checkpatch, and the patchwork version does not 
show, and the 4.11.9 stable kernel version does not show, I suggest we 
mark these down as false positives, as the string is not intended to 
show the function name.


>> I will give pull request Thomas on Friday morning. I will include 
>> this change set
>> in the pull request.
>>
>> Regarding the performance drop, Can you add __rte_cache_aligned on those
>> variable which creates regression in moving to rte_malloc area. The
>> cache line could be shared? If not fixing then its fine we will look 
>> into that latter.
>
> I will investigate and post a new patch in a few hours.
>

Of the 4 variables I am attempting to move into fastpath structure, no 
matter whether I move them one at a time or all at once, with 
__rte_cache_align or not, I still see a significant performance 
degradation. I suggest looking into this later.

I will push a patch in the next couple of hours with the first two 
changes mentioned above.  OK with you?

Regards,
Dave.
  
Jerin Jacob July 6, 2017, 1:38 p.m. UTC | #6
-----Original Message-----
> Date: Thu, 6 Jul 2017 14:26:47 +0100
> From: "Hunt, David" <david.hunt@intel.com>
> To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> CC: dev@dpdk.org, harry.van.haaren@intel.com, Gage Eads
>  <gage.eads@intel.com>, Bruce Richardson <bruce.richardson@intel.com>
> Subject: Re: [PATCH v5 1/3] examples/eventdev_pipeline: added sample app
> User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:45.0) Gecko/20100101
>  Thunderbird/45.8.0
> 
> Hi Jerin,
> 
> On 6/7/2017 11:04 AM, Hunt, David wrote:
> > 
> > On 6/7/2017 4:31 AM, Jerin Jacob wrote:
> 
> Having looked at this a bit more, and unable to reproduce with my original
> kernel version checkpatch, and the patchwork version does not show, and the
> 4.11.9 stable kernel version does not show, I suggest we mark these down as
> false positives, as the string is not intended to show the function name.
> 
> 
> > > I will give pull request Thomas on Friday morning. I will include
> > > this change set
> > > in the pull request.
> > > 
> > > Regarding the performance drop, Can you add __rte_cache_aligned on those
> > > variable which creates regression in moving to rte_malloc area. The
> > > cache line could be shared? If not fixing then its fine we will look
> > > into that latter.
> > 
> > I will investigate and post a new patch in a few hours.
> > 
> 
> Of the 4 variables I am attempting to move into fastpath structure, no
> matter whether I move them one at a time or all at once, with
> __rte_cache_align or not, I still see a significant performance degradation.
> I suggest looking into this later.
> 
> I will push a patch in the next couple of hours with the first two changes
> mentioned above.  OK with you?


OK.


> 
> Regards,
> Dave.
> 
> 
> 
>
  

Patch

diff --git a/examples/eventdev_pipeline_sw_pmd/main.c b/examples/eventdev_pipeline_sw_pmd/main.c
index c62cba2..a7aaf37 100644
--- a/examples/eventdev_pipeline_sw_pmd/main.c
+++ b/examples/eventdev_pipeline_sw_pmd/main.c
@@ -147,8 +147,9 @@  consumer(void)
        if (start_time == 0)
@@ -157,6 +158,7 @@  consumer(void)
        received += n;
        for (i = 0; i < n; i++) {
                uint8_t outport = packets[i].mbuf->port;
+               rte_pktmbuf_dump(stdout, packets[i].mbuf, 64);
                rte_eth_tx_buffer(outport, 0, fdata->tx_buf[outport],
                                packets[i].mbuf);