[dpdk-dev,v7,5/6] lib: added new library for latency stats

Message ID 1484583573-30163-6-git-send-email-remy.horton@intel.com (mailing list archive)
State Superseded, archived
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel compilation success Compilation OK

Commit Message

Remy Horton Jan. 16, 2017, 4:19 p.m. UTC
  From: Reshma Pattan <reshma.pattan@intel.com>

Add a library designed to calculate latency statistics and report them
to the application when queried. The library measures minimum, average and
maximum latencies, and jitter in nano seconds. The current implementation
supports global latency stats, i.e. per application stats.

Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
 MAINTAINERS                                        |   4 +
 config/common_base                                 |   5 +
 doc/api/doxy-api-index.md                          |   1 +
 doc/api/doxy-api.conf                              |   1 +
 doc/guides/rel_notes/release_17_02.rst             |   5 +
 lib/Makefile                                       |   1 +
 lib/librte_latencystats/Makefile                   |  57 +++
 lib/librte_latencystats/rte_latencystats.c         | 389 +++++++++++++++++++++
 lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
 .../rte_latencystats_version.map                   |  10 +
 lib/librte_mbuf/rte_mbuf.h                         |   3 +
 mk/rte.app.mk                                      |   2 +-
 12 files changed, 623 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_latencystats/Makefile
 create mode 100644 lib/librte_latencystats/rte_latencystats.c
 create mode 100644 lib/librte_latencystats/rte_latencystats.h
 create mode 100644 lib/librte_latencystats/rte_latencystats_version.map
  

Comments

Jerin Jacob Jan. 17, 2017, 4:29 a.m. UTC | #1
On Mon, Jan 16, 2017 at 04:19:32PM +0000, Remy Horton wrote:
> From: Reshma Pattan <reshma.pattan@intel.com>
> 
> Add a library designed to calculate latency statistics and report them
> to the application when queried. The library measures minimum, average and
> maximum latencies, and jitter in nano seconds. The current implementation
> supports global latency stats, i.e. per application stats.
> 
> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> ---
>  MAINTAINERS                                        |   4 +
>  config/common_base                                 |   5 +
>  doc/api/doxy-api-index.md                          |   1 +
>  doc/api/doxy-api.conf                              |   1 +
>  doc/guides/rel_notes/release_17_02.rst             |   5 +
>  lib/Makefile                                       |   1 +
>  lib/librte_latencystats/Makefile                   |  57 +++
>  lib/librte_latencystats/rte_latencystats.c         | 389 +++++++++++++++++++++
>  lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
>  .../rte_latencystats_version.map                   |  10 +
>  lib/librte_mbuf/rte_mbuf.h                         |   3 +

It is a value added feature for DPDK. But what is the plan for incorporating
the mbuf change? I have 8 month old mbuf change for ARM for natural
alignment. If we are accepting any mbuf change then we need to include
outstanding mbuf changes to avoid future ABI breakage.

http://dpdk.org/dev/patchwork/patch/12878/

Jerin
  
Remy Horton Jan. 17, 2017, 6:48 a.m. UTC | #2
On 17/01/2017 04:29, Jerin Jacob wrote:
[..]
> It is a value added feature for DPDK. But what is the plan for incorporating
> the mbuf change? I have 8 month old mbuf change for ARM for natural
> alignment. If we are accepting any mbuf change then we need to include
> outstanding mbuf changes to avoid future ABI breakage.
>
> http://dpdk.org/dev/patchwork/patch/12878/

I know there's some discussion going on in the background regarding 
this. I've yet to hear a definite answer myself..

..Remy
  
Jerin Jacob Jan. 17, 2017, 7:35 a.m. UTC | #3
On Tue, Jan 17, 2017 at 06:48:30AM +0000, Remy Horton wrote:
> 
> On 17/01/2017 04:29, Jerin Jacob wrote:
> [..]
> > It is a value added feature for DPDK. But what is the plan for incorporating
> > the mbuf change? I have 8 month old mbuf change for ARM for natural
> > alignment. If we are accepting any mbuf change then we need to include
> > outstanding mbuf changes to avoid future ABI breakage.
> > 
> > http://dpdk.org/dev/patchwork/patch/12878/
> 
> I know there's some discussion going on in the background regarding this.
> I've yet to hear a definite answer myself..

This was the last thread on this topic
http://dpdk.org/ml/archives/dev/2016-July/043222.html

Where Oliver want to group a few of the mbuf changes together.
Thats is good. But, looking at the history(holding a ARM specific patch for 8
months), I don't believe, we will get consensus on _all_ the items on
mbuf change like ports, m->next etc.
I think we had consensus on my change(making mbuf natural
aligned), but it was queued for grouping with other mbuf changes.

Jerin

> 
> ..Remy
  
John McNamara Jan. 17, 2017, 11:19 a.m. UTC | #4
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> Sent: Tuesday, January 17, 2017 4:30 AM
> To: Horton, Remy <remy.horton@intel.com>
> Cc: dev@dpdk.org; Pattan, Reshma <reshma.pattan@intel.com>; Thomas
> Monjalon <thomas.monjalon@6wind.com>
> Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency
> stats
> 
> On Mon, Jan 16, 2017 at 04:19:32PM +0000, Remy Horton wrote:
> > From: Reshma Pattan <reshma.pattan@intel.com>
> >
> > Add a library designed to calculate latency statistics and report them
> > to the application when queried. The library measures minimum, average
> > and maximum latencies, and jitter in nano seconds. The current
> > implementation supports global latency stats, i.e. per application
> stats.
> >
> > Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> > Signed-off-by: Remy Horton <remy.horton@intel.com>
> > ---
> >  MAINTAINERS                                        |   4 +
> >  config/common_base                                 |   5 +
> >  doc/api/doxy-api-index.md                          |   1 +
> >  doc/api/doxy-api.conf                              |   1 +
> >  doc/guides/rel_notes/release_17_02.rst             |   5 +
> >  lib/Makefile                                       |   1 +
> >  lib/librte_latencystats/Makefile                   |  57 +++
> >  lib/librte_latencystats/rte_latencystats.c         | 389
> +++++++++++++++++++++
> >  lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
> >  .../rte_latencystats_version.map                   |  10 +
> >  lib/librte_mbuf/rte_mbuf.h                         |   3 +
> 
> It is a value added feature for DPDK. But what is the plan for
> incorporating the mbuf change? I have 8 month old mbuf change for ARM for
> natural alignment. If we are accepting any mbuf change then we need to
> include outstanding mbuf changes to avoid future ABI breakage.
> 
> http://dpdk.org/dev/patchwork/patch/12878/
> 

Hi Jerin,

As far as I know the plan was to reach some sort of consensus on the mbuf
structure at the DPDK Userspace 2016, during and after Olivier's
presentation and then to make those changes during 17.02.

However, I believe Olivier had other work commitments in this release and
wasn't able to work on the mbuf changes.

The above mbuf change (and addition at the end of the struct) should
have gone into that mbuf rework, along with your changes.

However, since the mbuf rework didn't happen we need to add the field in
this release.

I guess the difference between the above change and your change is that
the latter is more complex and potentially affect performance, and as such
makes more sense as part of a rework.

Perhaps we, as a community, should commit to the mbuf rework in 17.05
and make sure it gets done.

John
  
Van Haaren, Harry Jan. 17, 2017, 11:41 a.m. UTC | #5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Remy Horton
> Sent: Monday, January 16, 2017 4:20 PM
> To: dev@dpdk.org
> Cc: Pattan, Reshma <reshma.pattan@intel.com>; Thomas Monjalon <thomas.monjalon@6wind.com>
> Subject: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency stats
> 
> From: Reshma Pattan <reshma.pattan@intel.com>
> 
> Add a library designed to calculate latency statistics and report them
> to the application when queried. The library measures minimum, average and
> maximum latencies, and jitter in nano seconds. The current implementation
> supports global latency stats, i.e. per application stats.
> 
> Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> ---

This patch addresses a few changes - and should probably be split out a bit more than the v6 -> v7 changes done. In particular, the mbuf change should be its own commit to make it visible in the git log.

Detailed comments inline.


> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6cd9896..0a41fe5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -714,3 +714,7 @@ F: examples/tep_termination/
>  F: examples/vmdq/
>  F: examples/vmdq_dcb/
>  F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
> +
> +Latency Stats
> +M: Reshma Pattan <reshma.pattan@intel.com>
> +F: lib/librte_latencystats/

Wrong section? This should be added to the "other libraries" I think.


> +++ b/lib/librte_latencystats/Makefile
> @@ -0,0 +1,57 @@
> +#   BSD LICENSE
> +#
> +#   Copyright(c) 2016 Intel Corporation. All rights reserved.
> +#   All rights reserved.

-2017

> +++ b/lib/librte_latencystats/rte_latencystats.c
> @@ -0,0 +1,389 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.

-2017

> +
> +/** Nano seconds per second */
> +#define NS_PER_SEC 1E9
> +
> +/** Clock cycles per nano second */
> +#define CYCLES_PER_NS (rte_get_timer_hz() / NS_PER_SEC)

Concidering this calls a function, it should probably be a
static inline function instead of a #define, so it is obvious
this cannot be used for static initialization.

> +
> +/* Macros for printing using RTE_LOG */
> +#define RTE_LOGTYPE_LATENCY_STATS RTE_LOGTYPE_USER1
> +
> +static pthread_t latency_stats_thread;

Concerned about the pthread - see below.

> +static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
> +static int latency_stats_index;
> +static uint64_t samp_intvl;
> +static uint64_t timer_tsc;
> +static uint64_t prev_tsc;
> +
> +static struct rte_latency_stats {
> +	float min_latency; /**< Minimum latency in nano seconds */
> +	float avg_latency; /**< Average latency in nano seconds */
> +	float max_latency; /**< Maximum latency in nano seconds */
> +	float jitter; /** Latency variation */
> +} *glob_stats;

Style guide advises against this type of initialization IIRC;
http://dpdk.org/doc/guides/contributing/coding_style.html#structure-declarations

Its not very clear, but I do think a separate line makes the variable declaration on its own line more obvious:

static struct rte_latency_stats *glob_stats;

> +
> +static struct rxtx_cbs {
> +	struct rte_eth_rxtx_callback *cb;
> +} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
> +	tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];

Same as previous comment - this struct looks very confusing to me :)
Please declare explicitly on own lines:

static struct rxtx_cbs {
    struct rte_eth_rxtx_callback *cb;
};

static struct rxtx_cbs rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
static struct rxtx_cbs tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];

> +
> +struct latency_stats_nameoff {
> +	char name[RTE_ETH_XSTATS_NAME_SIZE];
> +	unsigned int offset;
> +};
> +
> +static const struct latency_stats_nameoff lat_stats_strings[] = {
> +	{"min_latency_ns", offsetof(struct rte_latency_stats, min_latency)},
> +	{"avg_latency_ns", offsetof(struct rte_latency_stats, avg_latency)},
> +	{"max_latency_ns", offsetof(struct rte_latency_stats, max_latency)},
> +	{"jitter_ns", offsetof(struct rte_latency_stats, jitter)},
> +};
> +
> +#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \
> +				sizeof(lat_stats_strings[0]))
> +
> +static __attribute__((noreturn)) void *
> +report_latency_stats(__rte_unused void *arg)
> +{
> +	for (;;) {
> +		unsigned int i;
> +		float *stats_ptr = NULL;
> +		uint64_t values[NUM_LATENCY_STATS] = {0};
> +		int ret;
> +
> +		for (i = 0; i < NUM_LATENCY_STATS; i++) {
> +			stats_ptr = RTE_PTR_ADD(glob_stats,
> +					lat_stats_strings[i].offset);
> +			values[i] = (uint64_t)floor((*stats_ptr)/
> +					CYCLES_PER_NS);
> +		}
> +
> +		ret = rte_metrics_update_metrics(RTE_METRICS_GLOBAL,
> +						latency_stats_index,
> +						values, NUM_LATENCY_STATS);
> +		if (ret < 0)
> +			RTE_LOG(INFO, LATENCY_STATS,
> +				"Failed to push the stats\n");
> +	}
> +}

This function is strange - the library creates a pthread and then calls this function internally?
This does not look like a good design from my point-of-view.

The Timer library may be of use for inspiration, it requires the application polls a function to handle expired timers:
void rte_timer_manage().


> +static void
> +rte_latencystats_fill_values(struct rte_metric_value *values)
> +{
> +	unsigned int i;
> +	float *stats_ptr = NULL;
> +
> +	for (i = 0; i < NUM_LATENCY_STATS; i++) {
> +		stats_ptr = RTE_PTR_ADD(glob_stats,
> +				lat_stats_strings[i].offset);
> +		values[i].key = i;
> +		values[i].value = (uint64_t)floor((*stats_ptr)/
> +						CYCLES_PER_NS);
> +	}
> +}
> +
> +static uint16_t
> +add_time_stamps(uint8_t pid __rte_unused,
> +		uint16_t qid __rte_unused,
> +		struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		uint16_t max_pkts __rte_unused,
> +		void *user_cb __rte_unused)
> +{
> +	unsigned int i;
> +	uint64_t diff_tsc, now;
> +
> +	/*
> +	 * For every sample interval,
> +	 * time stamp is marked on one received packet.
> +	 */
> +	now = rte_rdtsc();
> +	for (i = 0; i < nb_pkts; i++) {
> +		diff_tsc = now - prev_tsc;
> +		timer_tsc += diff_tsc;
> +		if (timer_tsc >= samp_intvl) {
> +			/*
> +			 * TBD: Mark the timestamp only
> +			 * if not already marked by the
> +			 * hardware or the PMD.
> +			 */
> +			pkts[i]->timestamp = now;
> +			timer_tsc = 0;
> +		}
> +		prev_tsc = now;
> +		now = rte_rdtsc();
> +	}
> +
> +	return nb_pkts;
> +}
> +
> +static uint16_t
> +calc_latency(uint8_t pid __rte_unused,
> +		uint16_t qid __rte_unused,
> +		struct rte_mbuf **pkts,
> +		uint16_t nb_pkts,
> +		void *_ __rte_unused)
> +{
> +	unsigned int i, cnt = 0;
> +	uint64_t now;
> +	float latency[nb_pkts];
> +	static float prev_latency;
> +	/*
> +	 * Alpha represents degree of weighting decrease in EWMA,
> +	 * a constant smoothing factor between 0 and 1. The value
> +	 * is used below for measuring average latency.
> +	 */
> +	const float alpha = 0.2;
> +
> +	now = rte_rdtsc();
> +	for (i = 0; i < nb_pkts; i++) {
> +		if (pkts[i]->timestamp)
> +			latency[cnt++] = now - pkts[i]->timestamp;
> +	}
> +
> +	for (i = 0; i < cnt; i++) {
> +		/*
> +		 * The jitter is calculated as statistical mean of interpacket
> +		 * delay variation. The "jitter estimate" is computed by taking
> +		 * the absolute values of the ipdv sequence and applying an
> +		 * exponential filter with parameter 1/16 to generate the
> +		 * estimate. i.e J=J+(|D(i-1,i)|-J)/16. Where J is jitter,
> +		 * D(i-1,i) is difference in latency of two consecutive packets
> +		 * i-1 and i.
> +		 * Reference: Calculated as per RFC 5481, sec 4.1,
> +		 * RFC 3393 sec 4.5, RFC 1889 sec.
> +		 */
> +		glob_stats->jitter +=  (abs(prev_latency - latency[i])
> +					- glob_stats->jitter)/16;
> +		if (glob_stats->min_latency == 0)
> +			glob_stats->min_latency = latency[i];
> +		else if (latency[i] < glob_stats->min_latency)
> +			glob_stats->min_latency = latency[i];
> +		else if (latency[i] > glob_stats->max_latency)
> +			glob_stats->max_latency = latency[i];
> +		/*
> +		 * The average latency is measured using exponential moving
> +		 * average, i.e. using EWMA
> +		 * https://en.wikipedia.org/wiki/Moving_average
> +		 */
> +		glob_stats->avg_latency +=
> +			alpha * (latency[i] - glob_stats->avg_latency);
> +		prev_latency = latency[i];
> +	}
> +
> +	return nb_pkts;
> +}
> +
> +int
> +rte_latencystats_init(uint64_t samp_intvl,
> +		rte_latency_stats_flow_type_fn user_cb)
> +{
> +	unsigned int i;
> +	uint8_t pid;
> +	uint16_t qid;
> +	struct rxtx_cbs *cbs = NULL;
> +	const uint8_t nb_ports = rte_eth_dev_count();
> +	const char *ptr_strings[NUM_LATENCY_STATS] = {0};
> +	const struct rte_memzone *mz = NULL;
> +	const unsigned int flags = 0;
> +
> +	/** Allocate stats in shared memory fo muliti process support */
> +	mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats),
> +					rte_socket_id(), flags);
> +	if (mz == NULL) {
> +		RTE_LOG(ERR, LATENCY_STATS, "Cannot reserve memory: %s:%d\n",
> +			__func__, __LINE__);
> +		return -ENOMEM;
> +	}

There is no check for double-initialization of the library - although it is an application error to initialize twice, the library should handle it.

> +
> +	glob_stats = mz->addr;
> +	samp_intvl *= CYCLES_PER_NS;
> +
> +	/** Register latency stats with stats library */
> +	for (i = 0; i < NUM_LATENCY_STATS; i++)
> +		ptr_strings[i] = lat_stats_strings[i].name;
> +
> +	latency_stats_index = rte_metrics_reg_metrics(ptr_strings,
> +							NUM_LATENCY_STATS);
> +	if (latency_stats_index < 0) {
> +		RTE_LOG(DEBUG, LATENCY_STATS,
> +			"Failed to register latency stats names\n");
> +		return -1;
> +	}
> +
> +	/** Register Rx/Tx callbacks */
> +	for (pid = 0; pid < nb_ports; pid++) {
> +		struct rte_eth_dev_info dev_info;
> +		rte_eth_dev_info_get(pid, &dev_info);
> +		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
> +			cbs = &rx_cbs[pid][qid];
> +			cbs->cb = rte_eth_add_first_rx_callback(pid, qid,
> +					add_time_stamps, user_cb);
> +			if (!cbs->cb)
> +				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
> +					"register Rx callback for pid=%d, "
> +					"qid=%d\n", pid, qid);
> +		}
> +		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
> +			cbs = &tx_cbs[pid][qid];
> +			cbs->cb =  rte_eth_add_tx_callback(pid, qid,
> +					calc_latency, user_cb);
> +			if (!cbs->cb)
> +				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
> +					"register Tx callback for pid=%d, "
> +					"qid=%d\n", pid, qid);
> +		}
> +	}
> +
> +	int ret = 0;
> +	char thread_name[RTE_MAX_THREAD_NAME_LEN];
> +
> +	/** Create the host thread to update latency stats to stats library */
> +	ret = pthread_create(&latency_stats_thread, NULL, report_latency_stats,
> +				NULL);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, LATENCY_STATS,
> +			"Failed to create the latency stats thread:%s, %s:%d\n",
> +			strerror(errno), __func__, __LINE__);
> +		return -1;
> +	}

This pthread_create() raises some flags, spawning a thread "behind the applications back" isn't a good plan. We should require a thread poll the latency stats, not create a thread ourselves. See note above on using an application thread.


> +	/** Set thread_name for aid in debugging */
> +	snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "latency-stats-thread");
> +	ret = rte_thread_setname(latency_stats_thread, thread_name);
> +	if (ret != 0)
> +		RTE_LOG(DEBUG, LATENCY_STATS,
> +			"Failed to set thread name for latency stats handling\n");
> +
> +	return 0;
> +}
> +
> +int
> +rte_latencystats_uninit(void)
> +{
> +	uint8_t pid;
> +	uint16_t qid;
> +	int ret = 0;
> +	struct rxtx_cbs *cbs = NULL;
> +	const uint8_t nb_ports = rte_eth_dev_count();
> +
> +	/** De register Rx/Tx callbacks */
> +	for (pid = 0; pid < nb_ports; pid++) {
> +		struct rte_eth_dev_info dev_info;
> +		rte_eth_dev_info_get(pid, &dev_info);
> +		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
> +			cbs = &rx_cbs[pid][qid];
> +			ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb);
> +			if (ret)
> +				RTE_LOG(INFO, LATENCY_STATS, "failed to "
> +					"remove Rx callback for pid=%d, "
> +					"qid=%d\n", pid, qid);
> +		}
> +		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
> +			cbs = &tx_cbs[pid][qid];
> +			ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb);
> +			if (ret)
> +				RTE_LOG(INFO, LATENCY_STATS, "failed to "
> +					"remove Tx callback for pid=%d, "
> +					"qid=%d\n", pid, qid);
> +		}
> +	}
> +
> +	/** Cancel the thread */
> +	ret = pthread_cancel(latency_stats_thread);
> +	if (ret != 0) {
> +		RTE_LOG(ERR, LATENCY_STATS,
> +			"Failed to cancel latency stats update thread:"
> +			"%s,%s:%d\n",
> +			strerror(errno), __func__, __LINE__);
> +		return -1;
> +	}
> +
> +	return 0;
> +}
> +
> +int
> +rte_latencystats_get_names(struct rte_metric_name *names, uint16_t size)
> +{
> +	unsigned int i;
> +
> +	if (names == NULL || size < NUM_LATENCY_STATS)
> +		return NUM_LATENCY_STATS;
> +
> +	for (i = 0; i < NUM_LATENCY_STATS; i++)
> +		snprintf(names[i].name, sizeof(names[i].name),
> +				"%s", lat_stats_strings[i].name);
> +
> +	return NUM_LATENCY_STATS;
> +}
> +
> +int
> +rte_latencystats_get(struct rte_metric_value *values, uint16_t size)
> +{
> +	if (size < NUM_LATENCY_STATS || values == NULL)
> +		return NUM_LATENCY_STATS;
> +
> +	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
> +		const struct rte_memzone *mz;
> +		mz = rte_memzone_lookup(MZ_RTE_LATENCY_STATS);
> +		if (mz == NULL) {
> +			RTE_LOG(ERR, LATENCY_STATS,
> +				"Latency stats memzone not found\n");
> +			return -ENOMEM;
> +		}
> +		glob_stats =  mz->addr;
> +	}
> +
> +	/* Retrieve latency stats */
> +	rte_latencystats_fill_values(values);
> +
> +	return NUM_LATENCY_STATS;
> +}
> diff --git a/lib/librte_latencystats/rte_latencystats.h
> b/lib/librte_latencystats/rte_latencystats.h
> new file mode 100644
> index 0000000..405b878
> --- /dev/null
> +++ b/lib/librte_latencystats/rte_latencystats.h
> @@ -0,0 +1,146 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> + *   All rights reserved.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_LATENCYSTATS_H_
> +#define _RTE_LATENCYSTATS_H_
> +
> +/**
> + * @file
> + * RTE latency stats
> + *
> + * library to provide application and flow based latency stats.
> + */
> +
> +#include <rte_metrics.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + *  Note: This function pointer is for future flow based latency stats
> + *  implementation.
> + *
> + * Function type used for identifting flow types of a Rx packet.
> + *
> + * The callback function is called on Rx for each packet.
> + * This function is used for flow based latency calculations.
> + *
> + * @param pkt
> + *   Packet that has to be identified with its flow types.
> + * @param user_param
> + *   The arbitrary user parameter passed in by the application when
> + *   the callback was originally configured.
> + * @return
> + *   The flow_mask, representing the multiple flow types of a packet.
> + */
> +typedef uint16_t (*rte_latency_stats_flow_type_fn)(struct rte_mbuf *pkt,
> +							void *user_param);
> +
> +/**
> + *  Registers Rx/Tx callbacks for each active port, queue.
> + *
> + * @param samp_intvl
> + *  Sampling time period in nano seconds, at which packet
> + *  should be marked with time stamp.
> + * @param user_cb
> + *  Note: This param is for future flow based latency stats
> + *  implementation.
> + *  User callback to be called to get flow types of a packet.
> + *  Used for flow based latency calculation.
> + *  If the value is NULL, global stats will be calculated,
> + *  else flow based latency stats will be calculated.
> + *  For now just pass on the NULL value to this param.
> + *  @return
> + *   -1     : On error
> + *   -ENOMEM: On error
> + *    0     : On success
> + */
> +int rte_latencystats_init(uint64_t samp_intvl,
> +			rte_latency_stats_flow_type_fn user_cb);
> +
> +/**
> + *  Removes registered Rx/Tx callbacks for each active port, queue.
> + *
> + *  @return
> + *   -1: On error
> + *    0: On success
> + */
> +int rte_latencystats_uninit(void);
> +
> +/**
> + * Retrieve names of latency statistics
> + *
> + * @param names
> + *  Block of memory to insert names into. Must be at least size in capacity.
> + *  If set to NULL, function returns required capacity.
> + * @param size
> + *  Capacity of latency stats names (number of names).
> + * @return
> + *   - positive value lower or equal to size: success. The return value
> + *     is the number of entries filled in the stats table.
> + *   - positive value higher than size: error, the given statistics table
> + *     is too small. The return value corresponds to the size that should
> + *     be given to succeed. The entries in the table are not valid and
> + *     shall not be used by the caller.
> + */
> +int rte_latencystats_get_names(struct rte_metric_name *names,
> +				uint16_t size);
> +
> +/**
> + * Retrieve latency statistics.
> + *
> + * @param values
> + *   A pointer to a table of structure of type *rte_metric_value*
> + *   to be filled with latency statistics ids and values.
> + *   This parameter can be set to NULL if size is 0.
> + * @param size
> + *   The size of the stats table, which should be large enough to store
> + *   all the latency stats.
> + * @return
> + *   - positive value lower or equal to size: success. The return value
> + *     is the number of entries filled in the stats table.
> + *   - positive value higher than size: error, the given statistics table
> + *     is too small. The return value corresponds to the size that should
> + *     be given to succeed. The entries in the table are not valid and
> + *     shall not be used by the caller.
> + *   -ENOMEM: On failure.
> + */
> +int rte_latencystats_get(struct rte_metric_value *values,
> +			uint16_t size);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_LATENCYSTATS_H_ */
> diff --git a/lib/librte_latencystats/rte_latencystats_version.map
> b/lib/librte_latencystats/rte_latencystats_version.map
> new file mode 100644
> index 0000000..502018e
> --- /dev/null
> +++ b/lib/librte_latencystats/rte_latencystats_version.map
> @@ -0,0 +1,10 @@
> +DPDK_17.02 {
> +	global:
> +
> +	rte_latencystats_get;
> +	rte_latencystats_get_names;
> +	rte_latencystats_init;
> +	rte_latencystats_uninit;
> +
> +	local: *;
> +};
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index bfce9f4..c35ba0a 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -512,6 +512,9 @@ struct rte_mbuf {
> 
>  	/** Timesync flags for use with IEEE1588. */
>  	uint16_t timesync;
> +
> +	/** Timestamp for measuring latency. */
> +	uint64_t timestamp;
>  } __rte_cache_aligned;
> 

This change needs its own patch, so it can be reviewed independently and the position in the struct for timestamp can be analysed easily.


>  /**
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index 6aac5ac..1d36fad 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -100,7 +100,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
> -
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)  += -lrte_latencystats
> 
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore
> --
> 2.5.5
  
Jerin Jacob Jan. 17, 2017, 12:34 p.m. UTC | #6
On Tue, Jan 17, 2017 at 11:19:24AM +0000, Mcnamara, John wrote:
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > Sent: Tuesday, January 17, 2017 4:30 AM
> > To: Horton, Remy <remy.horton@intel.com>
> > Cc: dev@dpdk.org; Pattan, Reshma <reshma.pattan@intel.com>; Thomas
> > Monjalon <thomas.monjalon@6wind.com>
> > Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency
> > stats
> > 
> > On Mon, Jan 16, 2017 at 04:19:32PM +0000, Remy Horton wrote:
> > > From: Reshma Pattan <reshma.pattan@intel.com>
> > >
> > > Add a library designed to calculate latency statistics and report them
> > > to the application when queried. The library measures minimum, average
> > > and maximum latencies, and jitter in nano seconds. The current
> > > implementation supports global latency stats, i.e. per application
> > stats.
> > >
> > > Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> > > Signed-off-by: Remy Horton <remy.horton@intel.com>
> > > ---
> > >  MAINTAINERS                                        |   4 +
> > >  config/common_base                                 |   5 +
> > >  doc/api/doxy-api-index.md                          |   1 +
> > >  doc/api/doxy-api.conf                              |   1 +
> > >  doc/guides/rel_notes/release_17_02.rst             |   5 +
> > >  lib/Makefile                                       |   1 +
> > >  lib/librte_latencystats/Makefile                   |  57 +++
> > >  lib/librte_latencystats/rte_latencystats.c         | 389
> > +++++++++++++++++++++
> > >  lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
> > >  .../rte_latencystats_version.map                   |  10 +
> > >  lib/librte_mbuf/rte_mbuf.h                         |   3 +
> > 
> > It is a value added feature for DPDK. But what is the plan for
> > incorporating the mbuf change? I have 8 month old mbuf change for ARM for
> > natural alignment. If we are accepting any mbuf change then we need to
> > include outstanding mbuf changes to avoid future ABI breakage.
> > 
> > http://dpdk.org/dev/patchwork/patch/12878/
> > 
> 
> Hi Jerin,

Hi John,

> 
> As far as I know the plan was to reach some sort of consensus on the mbuf
> structure at the DPDK Userspace 2016, during and after Olivier's
> presentation and then to make those changes during 17.02.
> 
> However, I believe Olivier had other work commitments in this release and
> wasn't able to work on the mbuf changes.
> 
> The above mbuf change (and addition at the end of the struct) should
> have gone into that mbuf rework, along with your changes.
> 
> However, since the mbuf rework didn't happen we need to add the field in
> this release.

So we don't care the mbuf ABI breakage in the next release. This wasn't
the message I got earlier for ARM's mbuf change.

http://dpdk.org/dev/patchwork/patch/12878/

> 
> I guess the difference between the above change and your change is that
> the latter is more complex and potentially affect performance, and as such
> makes more sense as part of a rework.

The mbuf natural alignment is a not complex change, it just moving the field and
it does not have any performance impact on IA nor nobody reported any
performance regression on IA.

There is nothing against you or this feature. The only part concerns me
that some set of patches can always override any rule and include in the release
(even as marking as EXPERIMENTAL) because of its important for some set of consumers.
Another set has to wait in the queue because its not important for some people.
For me, it is not a sign of vendor neutral open source project.

Jerin

> 
> Perhaps we, as a community, should commit to the mbuf rework in 17.05
> and make sure it gets done.
> 
> John
> 
> 
>  
> 
> 
>
  
John McNamara Jan. 17, 2017, 2:53 p.m. UTC | #7
> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Tuesday, January 17, 2017 12:34 PM
> To: Mcnamara, John <john.mcnamara@intel.com>
> Cc: Horton, Remy <remy.horton@intel.com>; dev@dpdk.org; Pattan, Reshma
> <reshma.pattan@intel.com>; Thomas Monjalon <thomas.monjalon@6wind.com>;
> olivier.matz@6wind.com
> Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency
> stats
> 
> On Tue, Jan 17, 2017 at 11:19:24AM +0000, Mcnamara, John wrote:
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > Sent: Tuesday, January 17, 2017 4:30 AM
> > > To: Horton, Remy <remy.horton@intel.com>
> > > Cc: dev@dpdk.org; Pattan, Reshma <reshma.pattan@intel.com>; Thomas
> > > Monjalon <thomas.monjalon@6wind.com>
> > > Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for
> > > latency stats
> > >
> > > On Mon, Jan 16, 2017 at 04:19:32PM +0000, Remy Horton wrote:
> > > > From: Reshma Pattan <reshma.pattan@intel.com>
> > > >
> > > > Add a library designed to calculate latency statistics and report
> > > > them to the application when queried. The library measures
> > > > minimum, average and maximum latencies, and jitter in nano
> > > > seconds. The current implementation supports global latency stats,
> > > > i.e. per application
> > > stats.
> > > >
> > > > Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> > > > Signed-off-by: Remy Horton <remy.horton@intel.com>
> > > > ---
> > > >  MAINTAINERS                                        |   4 +
> > > >  config/common_base                                 |   5 +
> > > >  doc/api/doxy-api-index.md                          |   1 +
> > > >  doc/api/doxy-api.conf                              |   1 +
> > > >  doc/guides/rel_notes/release_17_02.rst             |   5 +
> > > >  lib/Makefile                                       |   1 +
> > > >  lib/librte_latencystats/Makefile                   |  57 +++
> > > >  lib/librte_latencystats/rte_latencystats.c         | 389
> > > +++++++++++++++++++++
> > > >  lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
> > > >  .../rte_latencystats_version.map                   |  10 +
> > > >  lib/librte_mbuf/rte_mbuf.h                         |   3 +
> > >
> > > It is a value added feature for DPDK. But what is the plan for
> > > incorporating the mbuf change? I have 8 month old mbuf change for
> > > ARM for natural alignment. If we are accepting any mbuf change then
> > > we need to include outstanding mbuf changes to avoid future ABI
> breakage.
> > >
> > > http://dpdk.org/dev/patchwork/patch/12878/
> > >
> >
> > Hi Jerin,
> 
> Hi John,
> 
> >
> > As far as I know the plan was to reach some sort of consensus on the
> > mbuf structure at the DPDK Userspace 2016, during and after Olivier's
> > presentation and then to make those changes during 17.02.
> >
> > However, I believe Olivier had other work commitments in this release
> > and wasn't able to work on the mbuf changes.
> >
> > The above mbuf change (and addition at the end of the struct) should
> > have gone into that mbuf rework, along with your changes.
> >
> > However, since the mbuf rework didn't happen we need to add the field
> > in this release.
> 
> So we don't care the mbuf ABI breakage in the next release. This wasn't
> the message I got earlier for ARM's mbuf change.
> 
> http://dpdk.org/dev/patchwork/patch/12878/


Hi Jerin,

We do care about ABI breakage but I was under the impression that the
timestamp change wasn't breaking the ABI since it was at the end of the
struct. I also ran the ABI validator against the change and it didn't show any
breakage.

http://dpdk.org/doc/guides/contributing/versioning.html#running-the-abi-validator

The rearm_data alignment patch, on the other hand, does break ABI. I think
that is the main difference between the two patches.

If the timestamp change does break ABI then it should also wait until the mbuf
restructuring.


> ...
> 
> There is nothing against you or this feature. The only part concerns me
> that some set of patches can always override any rule and include in the
> release (even as marking as EXPERIMENTAL) because of its important for
> some set of consumers.
> Another set has to wait in the queue because its not important for some
> people.
> For me, it is not a sign of vendor neutral open source project.

To be fair I don't think we are trying to override any rule here. 

Also, we aren't the only vendor looking for a timestamp in the mbuf.
Mellanox also submitted a patch:

    http://dpdk.org/ml/archives/dev/2016-October/048809.html

However, it is also fair to acknowledge that the rearm_data alignment patch
shouldn't have had to wait so long. I can't really answer for that directly.
My feeling is that it was targeted for the mbuf rework but got forgotten
when that work slipped.

John
  
Jerin Jacob Jan. 17, 2017, 4:25 p.m. UTC | #8
On Tue, Jan 17, 2017 at 02:53:55PM +0000, Mcnamara, John wrote:
> 
> 
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Tuesday, January 17, 2017 12:34 PM
> > To: Mcnamara, John <john.mcnamara@intel.com>
> > Cc: Horton, Remy <remy.horton@intel.com>; dev@dpdk.org; Pattan, Reshma
> > <reshma.pattan@intel.com>; Thomas Monjalon <thomas.monjalon@6wind.com>;
> > olivier.matz@6wind.com
> > Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for latency
> > stats
> > 
> > On Tue, Jan 17, 2017 at 11:19:24AM +0000, Mcnamara, John wrote:
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob
> > > > Sent: Tuesday, January 17, 2017 4:30 AM
> > > > To: Horton, Remy <remy.horton@intel.com>
> > > > Cc: dev@dpdk.org; Pattan, Reshma <reshma.pattan@intel.com>; Thomas
> > > > Monjalon <thomas.monjalon@6wind.com>
> > > > Subject: Re: [dpdk-dev] [PATCH v7 5/6] lib: added new library for
> > > > latency stats
> > > >
> > > > On Mon, Jan 16, 2017 at 04:19:32PM +0000, Remy Horton wrote:
> > > > > From: Reshma Pattan <reshma.pattan@intel.com>
> > > > >
> > > > > Add a library designed to calculate latency statistics and report
> > > > > them to the application when queried. The library measures
> > > > > minimum, average and maximum latencies, and jitter in nano
> > > > > seconds. The current implementation supports global latency stats,
> > > > > i.e. per application
> > > > stats.
> > > > >
> > > > > Signed-off-by: Reshma Pattan <reshma.pattan@intel.com>
> > > > > Signed-off-by: Remy Horton <remy.horton@intel.com>
> > > > > ---
> > > > >  MAINTAINERS                                        |   4 +
> > > > >  config/common_base                                 |   5 +
> > > > >  doc/api/doxy-api-index.md                          |   1 +
> > > > >  doc/api/doxy-api.conf                              |   1 +
> > > > >  doc/guides/rel_notes/release_17_02.rst             |   5 +
> > > > >  lib/Makefile                                       |   1 +
> > > > >  lib/librte_latencystats/Makefile                   |  57 +++
> > > > >  lib/librte_latencystats/rte_latencystats.c         | 389
> > > > +++++++++++++++++++++
> > > > >  lib/librte_latencystats/rte_latencystats.h         | 146 ++++++++
> > > > >  .../rte_latencystats_version.map                   |  10 +
> > > > >  lib/librte_mbuf/rte_mbuf.h                         |   3 +
> > > >
> > > > It is a value added feature for DPDK. But what is the plan for
> > > > incorporating the mbuf change? I have 8 month old mbuf change for
> > > > ARM for natural alignment. If we are accepting any mbuf change then
> > > > we need to include outstanding mbuf changes to avoid future ABI
> > breakage.
> > > >
> > > > http://dpdk.org/dev/patchwork/patch/12878/
> > > >
> > >
> > > Hi Jerin,
> > 
> > Hi John,
> > 
> > >
> > > As far as I know the plan was to reach some sort of consensus on the
> > > mbuf structure at the DPDK Userspace 2016, during and after Olivier's
> > > presentation and then to make those changes during 17.02.
> > >
> > > However, I believe Olivier had other work commitments in this release
> > > and wasn't able to work on the mbuf changes.
> > >
> > > The above mbuf change (and addition at the end of the struct) should
> > > have gone into that mbuf rework, along with your changes.
> > >
> > > However, since the mbuf rework didn't happen we need to add the field
> > > in this release.
> > 
> > So we don't care the mbuf ABI breakage in the next release. This wasn't
> > the message I got earlier for ARM's mbuf change.
> > 
> > http://dpdk.org/dev/patchwork/patch/12878/
> 
> 
> Hi Jerin,
> 
> We do care about ABI breakage but I was under the impression that the
> timestamp change wasn't breaking the ABI since it was at the end of the
> struct. I also ran the ABI validator against the change and it didn't show any
> breakage.
> 
> http://dpdk.org/doc/guides/contributing/versioning.html#running-the-abi-validator
> 
> The rearm_data alignment patch, on the other hand, does break ABI. I think
> that is the main difference between the two patches.
> 
> If the timestamp change does break ABI then it should also wait until the mbuf
> restructuring.

I agree on ABI part.

If understand it correctly, Oliver would like to group all the mbuf modification
in one version and postponed the rearm_data change.

Here is the email
---------------------------------------------------------
Changing the mbuf topology is something that should happen as rarely as
possible, so I think we should group all mbuf modifications in one
version.

Your issue (mbuf->rearm alignment), the removing of uneeded fields (port
id, maybe nb_segs), and possibly other things should be addressed for
next version (16.11). I'll send a deprecation notice before the 16.07 is
out if there is no opposition.
---------------------------------------------------------

> 
> 
> > ...
> > 
> > There is nothing against you or this feature. The only part concerns me
> > that some set of patches can always override any rule and include in the
> > release (even as marking as EXPERIMENTAL) because of its important for
> > some set of consumers.
> > Another set has to wait in the queue because its not important for some
> > people.
> > For me, it is not a sign of vendor neutral open source project.
> 
> To be fair I don't think we are trying to override any rule here. 
> 
> Also, we aren't the only vendor looking for a timestamp in the mbuf.
> Mellanox also submitted a patch:
> 
>     http://dpdk.org/ml/archives/dev/2016-October/048809.html

We don't have any issue in adding timestamp in mbuf either.
The point, I was trying to make some changes like rearm_data only need for
ARM architecture.In those cases, postponing the changes due to some other
non direct dependency change is not good a specific architecture/vendor.

> 
> However, it is also fair to acknowledge that the rearm_data alignment patch
> shouldn't have had to wait so long. I can't really answer for that directly.
> My feeling is that it was targeted for the mbuf rework but got forgotten
> when that work slipped.

Oliver,

Could you please suggest how to proceed further?

> 
> John
> 
>
  
Olivier Matz Jan. 18, 2017, 8:11 p.m. UTC | #9
Hi guys,

On Tue, 17 Jan 2017 21:55:16 +0530, Jerin Jacob
> Oliver,
> 
> Could you please suggest how to proceed further?
> 

Sorry for the lack of response. I know people are waiting for
me, but these days I have too many things to do at the same time, and
it's difficult to find time.

In few words (I'll provide more detailed answers to the thread by
friday): I expected to post the mbuf rework patchset for this release,
which includes the structure changes (Jerin's patch for arm access,
timestamp, port, nb_segs, refcnt changes). But the patchset is clearly
not ready yet, it needs a rebase, and it lacks test.

Jerin, I know that you submitted your patch a long time ago, and I'm
the only one to blame, please do not see any vendor preference in it.

I'll check friday what's the effective state of the patchset in my
workspace. If I can extract a minimal patch that only change the
structure, I'll send it for discussion. But from what I remember, the
mbuf structure rework depends on changing the way we access the refcnt,
so it can be moved to the 2nd cache line.

If that's not possible, I'll try propose some alternatives.

Regards,
Olivier
  
Olivier Matz Jan. 24, 2017, 3:24 p.m. UTC | #10
On Wed, 18 Jan 2017 21:11:28 +0100, Olivier Matz
<olivier.matz@6wind.com> wrote:
> Hi guys,
> 
> On Tue, 17 Jan 2017 21:55:16 +0530, Jerin Jacob
> > Oliver,
> > 
> > Could you please suggest how to proceed further?
> >   
> 
> Sorry for the lack of response. I know people are waiting for
> me, but these days I have too many things to do at the same time, and
> it's difficult to find time.
> 
> In few words (I'll provide more detailed answers to the thread by
> friday): I expected to post the mbuf rework patchset for this release,
> which includes the structure changes (Jerin's patch for arm access,
> timestamp, port, nb_segs, refcnt changes). But the patchset is clearly
> not ready yet, it needs a rebase, and it lacks test.
> 
> Jerin, I know that you submitted your patch a long time ago, and I'm
> the only one to blame, please do not see any vendor preference in it.
> 
> I'll check friday what's the effective state of the patchset in my
> workspace. If I can extract a minimal patch that only change the
> structure, I'll send it for discussion. But from what I remember, the
> mbuf structure rework depends on changing the way we access the
> refcnt, so it can be moved to the 2nd cache line.
> 
> If that's not possible, I'll try propose some alternatives.

I just posted a mbuf RFC patchset [1]. I think it contains most
things that were mentioned on the ML. As checked with Thomas, it's too
late to have it included in 17.02.

I'll tend to agree with John that having the timestamp in the mbuf for
latency is not an ABI break, since it is added at the end of the
structure. So I won't oppose to add this field in the mbuf structure
for the release.

The mbuf rearm patch was not forgotten, but it took clearly too long to
be integrated. With the benefit of hindsight, it should have been
pushed without waiting the mbuf rework. Again, apologies for that, I
understand it's quite frustrating.

Anyway, tests or comments on my RFC patchset are welcome, so we can
integrate it at the beginning of the 17.05 cycle.

Regards,
Olivier

[1] http://dpdk.org/ml/archives/dev/2017-January/056187.html
  

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 6cd9896..0a41fe5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -714,3 +714,7 @@  F: examples/tep_termination/
 F: examples/vmdq/
 F: examples/vmdq_dcb/
 F: doc/guides/sample_app_ug/vmdq_dcb_forwarding.rst
+
+Latency Stats
+M: Reshma Pattan <reshma.pattan@intel.com>
+F: lib/librte_latencystats/
diff --git a/config/common_base b/config/common_base
index decebe5..762a54c 100644
--- a/config/common_base
+++ b/config/common_base
@@ -603,3 +603,8 @@  CONFIG_RTE_LIBRTE_METRICS=y
 # Compile the bitrate statistics library
 #
 CONFIG_RTE_LIBRTE_BITRATE=y
+
+#
+# Compile the latency statistics library
+#
+CONFIG_RTE_LIBRTE_LATENCY_STATS=y
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 5e194b0..1cacacc 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -152,4 +152,5 @@  There are many libraries, so their headers may be grouped by topics:
   [keepalive]          (@ref rte_keepalive.h),
   [Device Metrics]     (@ref rte_metrics.h),
   [Bitrate Statistics] (@ref rte_bitrate.h),
+  [Latency stats]      (@ref rte_latencystats.h),
   [version]            (@ref rte_version.h)
diff --git a/doc/api/doxy-api.conf b/doc/api/doxy-api.conf
index 6e6ab5c..0effece 100644
--- a/doc/api/doxy-api.conf
+++ b/doc/api/doxy-api.conf
@@ -47,6 +47,7 @@  INPUT                   = doc/api/doxy-api-index.md \
                           lib/librte_jobstats \
                           lib/librte_kni \
                           lib/librte_kvargs \
+                          lib/librte_latencystats \
                           lib/librte_lpm \
                           lib/librte_mbuf \
                           lib/librte_mempool \
diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
index 44012c8..c09e60b 100644
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -68,6 +68,10 @@  New Features
   Six new APIs have been added to the ixgbe PMD for MACsec offload support.
   The declarations for the APIs can be found in ``rte_pmd_ixgbe.h``.
 
+* **Added latency stats library.**
+  A library that facilitates latency stats measurment of the dpdk based
+  applications. The library measures minimum, average, maximum latencies
+  and also jitter in nano seconds.
 
 Resolved Issues
 ---------------
@@ -184,6 +188,7 @@  The libraries prepended with a plus sign were incremented in this version.
      librte_jobstats.so.1
      librte_kni.so.2
      librte_kvargs.so.1
+   + librte_latencystats.so.1
      librte_lpm.so.2
      librte_mbuf.so.2
      librte_mempool.so.2
diff --git a/lib/Makefile b/lib/Makefile
index e211bc0..6fffa88 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -60,6 +60,7 @@  DIRS-$(CONFIG_RTE_LIBRTE_REORDER) += librte_reorder
 DIRS-$(CONFIG_RTE_LIBRTE_PDUMP) += librte_pdump
 DIRS-$(CONFIG_RTE_LIBRTE_METRICS) += librte_metrics
 DIRS-$(CONFIG_RTE_LIBRTE_BITRATE) += librte_bitratestats
+DIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += librte_latencystats
 
 ifeq ($(CONFIG_RTE_EXEC_ENV_LINUXAPP),y)
 DIRS-$(CONFIG_RTE_LIBRTE_KNI) += librte_kni
diff --git a/lib/librte_latencystats/Makefile b/lib/librte_latencystats/Makefile
new file mode 100644
index 0000000..f744da6
--- /dev/null
+++ b/lib/librte_latencystats/Makefile
@@ -0,0 +1,57 @@ 
+#   BSD LICENSE
+#
+#   Copyright(c) 2016 Intel Corporation. All rights reserved.
+#   All rights reserved.
+#
+#   Redistribution and use in source and binary forms, with or without
+#   modification, are permitted provided that the following conditions
+#   are met:
+#
+#     * Redistributions of source code must retain the above copyright
+#       notice, this list of conditions and the following disclaimer.
+#     * Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in
+#       the documentation and/or other materials provided with the
+#       distribution.
+#     * Neither the name of Intel Corporation nor the names of its
+#       contributors may be used to endorse or promote products derived
+#       from this software without specific prior written permission.
+#
+#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+include $(RTE_SDK)/mk/rte.vars.mk
+
+# library name
+LIB = librte_latencystats.a
+
+CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR) -O3
+LDLIBS += -lm
+LDLIBS += -lpthread
+
+EXPORT_MAP := rte_latencystats_version.map
+
+LIBABIVER := 1
+
+# all source are stored in SRCS-y
+SRCS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) := rte_latencystats.c
+
+# install this header file
+SYMLINK-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)-include := rte_latencystats.h
+
+# this lib depends upon:
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_mbuf
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_eal
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_ether
+DEPDIRS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS) += lib/librte_metrics
+
+include $(RTE_SDK)/mk/rte.lib.mk
diff --git a/lib/librte_latencystats/rte_latencystats.c b/lib/librte_latencystats/rte_latencystats.c
new file mode 100644
index 0000000..1a68469
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.c
@@ -0,0 +1,389 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <unistd.h>
+#include <sys/types.h>
+#include <stdbool.h>
+#include <math.h>
+#include <pthread.h>
+
+#include <rte_mbuf.h>
+#include <rte_log.h>
+#include <rte_cycles.h>
+#include <rte_ethdev.h>
+#include <rte_metrics.h>
+#include <rte_memzone.h>
+#include <rte_lcore.h>
+#include <rte_timer.h>
+
+#include "rte_latencystats.h"
+
+/** Nano seconds per second */
+#define NS_PER_SEC 1E9
+
+/** Clock cycles per nano second */
+#define CYCLES_PER_NS (rte_get_timer_hz() / NS_PER_SEC)
+
+/* Macros for printing using RTE_LOG */
+#define RTE_LOGTYPE_LATENCY_STATS RTE_LOGTYPE_USER1
+
+static pthread_t latency_stats_thread;
+static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats";
+static int latency_stats_index;
+static uint64_t samp_intvl;
+static uint64_t timer_tsc;
+static uint64_t prev_tsc;
+
+static struct rte_latency_stats {
+	float min_latency; /**< Minimum latency in nano seconds */
+	float avg_latency; /**< Average latency in nano seconds */
+	float max_latency; /**< Maximum latency in nano seconds */
+	float jitter; /** Latency variation */
+} *glob_stats;
+
+static struct rxtx_cbs {
+	struct rte_eth_rxtx_callback *cb;
+} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT],
+	tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
+
+struct latency_stats_nameoff {
+	char name[RTE_ETH_XSTATS_NAME_SIZE];
+	unsigned int offset;
+};
+
+static const struct latency_stats_nameoff lat_stats_strings[] = {
+	{"min_latency_ns", offsetof(struct rte_latency_stats, min_latency)},
+	{"avg_latency_ns", offsetof(struct rte_latency_stats, avg_latency)},
+	{"max_latency_ns", offsetof(struct rte_latency_stats, max_latency)},
+	{"jitter_ns", offsetof(struct rte_latency_stats, jitter)},
+};
+
+#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \
+				sizeof(lat_stats_strings[0]))
+
+static __attribute__((noreturn)) void *
+report_latency_stats(__rte_unused void *arg)
+{
+	for (;;) {
+		unsigned int i;
+		float *stats_ptr = NULL;
+		uint64_t values[NUM_LATENCY_STATS] = {0};
+		int ret;
+
+		for (i = 0; i < NUM_LATENCY_STATS; i++) {
+			stats_ptr = RTE_PTR_ADD(glob_stats,
+					lat_stats_strings[i].offset);
+			values[i] = (uint64_t)floor((*stats_ptr)/
+					CYCLES_PER_NS);
+		}
+
+		ret = rte_metrics_update_metrics(RTE_METRICS_GLOBAL,
+						latency_stats_index,
+						values, NUM_LATENCY_STATS);
+		if (ret < 0)
+			RTE_LOG(INFO, LATENCY_STATS,
+				"Failed to push the stats\n");
+	}
+}
+
+static void
+rte_latencystats_fill_values(struct rte_metric_value *values)
+{
+	unsigned int i;
+	float *stats_ptr = NULL;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++) {
+		stats_ptr = RTE_PTR_ADD(glob_stats,
+				lat_stats_strings[i].offset);
+		values[i].key = i;
+		values[i].value = (uint64_t)floor((*stats_ptr)/
+						CYCLES_PER_NS);
+	}
+}
+
+static uint16_t
+add_time_stamps(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		uint16_t max_pkts __rte_unused,
+		void *user_cb __rte_unused)
+{
+	unsigned int i;
+	uint64_t diff_tsc, now;
+
+	/*
+	 * For every sample interval,
+	 * time stamp is marked on one received packet.
+	 */
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		diff_tsc = now - prev_tsc;
+		timer_tsc += diff_tsc;
+		if (timer_tsc >= samp_intvl) {
+			/*
+			 * TBD: Mark the timestamp only
+			 * if not already marked by the
+			 * hardware or the PMD.
+			 */
+			pkts[i]->timestamp = now;
+			timer_tsc = 0;
+		}
+		prev_tsc = now;
+		now = rte_rdtsc();
+	}
+
+	return nb_pkts;
+}
+
+static uint16_t
+calc_latency(uint8_t pid __rte_unused,
+		uint16_t qid __rte_unused,
+		struct rte_mbuf **pkts,
+		uint16_t nb_pkts,
+		void *_ __rte_unused)
+{
+	unsigned int i, cnt = 0;
+	uint64_t now;
+	float latency[nb_pkts];
+	static float prev_latency;
+	/*
+	 * Alpha represents degree of weighting decrease in EWMA,
+	 * a constant smoothing factor between 0 and 1. The value
+	 * is used below for measuring average latency.
+	 */
+	const float alpha = 0.2;
+
+	now = rte_rdtsc();
+	for (i = 0; i < nb_pkts; i++) {
+		if (pkts[i]->timestamp)
+			latency[cnt++] = now - pkts[i]->timestamp;
+	}
+
+	for (i = 0; i < cnt; i++) {
+		/*
+		 * The jitter is calculated as statistical mean of interpacket
+		 * delay variation. The "jitter estimate" is computed by taking
+		 * the absolute values of the ipdv sequence and applying an
+		 * exponential filter with parameter 1/16 to generate the
+		 * estimate. i.e J=J+(|D(i-1,i)|-J)/16. Where J is jitter,
+		 * D(i-1,i) is difference in latency of two consecutive packets
+		 * i-1 and i.
+		 * Reference: Calculated as per RFC 5481, sec 4.1,
+		 * RFC 3393 sec 4.5, RFC 1889 sec.
+		 */
+		glob_stats->jitter +=  (abs(prev_latency - latency[i])
+					- glob_stats->jitter)/16;
+		if (glob_stats->min_latency == 0)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] < glob_stats->min_latency)
+			glob_stats->min_latency = latency[i];
+		else if (latency[i] > glob_stats->max_latency)
+			glob_stats->max_latency = latency[i];
+		/*
+		 * The average latency is measured using exponential moving
+		 * average, i.e. using EWMA
+		 * https://en.wikipedia.org/wiki/Moving_average
+		 */
+		glob_stats->avg_latency +=
+			alpha * (latency[i] - glob_stats->avg_latency);
+		prev_latency = latency[i];
+	}
+
+	return nb_pkts;
+}
+
+int
+rte_latencystats_init(uint64_t samp_intvl,
+		rte_latency_stats_flow_type_fn user_cb)
+{
+	unsigned int i;
+	uint8_t pid;
+	uint16_t qid;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+	const char *ptr_strings[NUM_LATENCY_STATS] = {0};
+	const struct rte_memzone *mz = NULL;
+	const unsigned int flags = 0;
+
+	/** Allocate stats in shared memory fo muliti process support */
+	mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats),
+					rte_socket_id(), flags);
+	if (mz == NULL) {
+		RTE_LOG(ERR, LATENCY_STATS, "Cannot reserve memory: %s:%d\n",
+			__func__, __LINE__);
+		return -ENOMEM;
+	}
+
+	glob_stats = mz->addr;
+	samp_intvl *= CYCLES_PER_NS;
+
+	/** Register latency stats with stats library */
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		ptr_strings[i] = lat_stats_strings[i].name;
+
+	latency_stats_index = rte_metrics_reg_metrics(ptr_strings,
+							NUM_LATENCY_STATS);
+	if (latency_stats_index < 0) {
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to register latency stats names\n");
+		return -1;
+	}
+
+	/** Register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			cbs->cb = rte_eth_add_first_rx_callback(pid, qid,
+					add_time_stamps, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			cbs->cb =  rte_eth_add_tx_callback(pid, qid,
+					calc_latency, user_cb);
+			if (!cbs->cb)
+				RTE_LOG(INFO, LATENCY_STATS, "Failed to "
+					"register Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	int ret = 0;
+	char thread_name[RTE_MAX_THREAD_NAME_LEN];
+
+	/** Create the host thread to update latency stats to stats library */
+	ret = pthread_create(&latency_stats_thread, NULL, report_latency_stats,
+				NULL);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to create the latency stats thread:%s, %s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+	/** Set thread_name for aid in debugging */
+	snprintf(thread_name, RTE_MAX_THREAD_NAME_LEN, "latency-stats-thread");
+	ret = rte_thread_setname(latency_stats_thread, thread_name);
+	if (ret != 0)
+		RTE_LOG(DEBUG, LATENCY_STATS,
+			"Failed to set thread name for latency stats handling\n");
+
+	return 0;
+}
+
+int
+rte_latencystats_uninit(void)
+{
+	uint8_t pid;
+	uint16_t qid;
+	int ret = 0;
+	struct rxtx_cbs *cbs = NULL;
+	const uint8_t nb_ports = rte_eth_dev_count();
+
+	/** De register Rx/Tx callbacks */
+	for (pid = 0; pid < nb_ports; pid++) {
+		struct rte_eth_dev_info dev_info;
+		rte_eth_dev_info_get(pid, &dev_info);
+		for (qid = 0; qid < dev_info.nb_rx_queues; qid++) {
+			cbs = &rx_cbs[pid][qid];
+			ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Rx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+		for (qid = 0; qid < dev_info.nb_tx_queues; qid++) {
+			cbs = &tx_cbs[pid][qid];
+			ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb);
+			if (ret)
+				RTE_LOG(INFO, LATENCY_STATS, "failed to "
+					"remove Tx callback for pid=%d, "
+					"qid=%d\n", pid, qid);
+		}
+	}
+
+	/** Cancel the thread */
+	ret = pthread_cancel(latency_stats_thread);
+	if (ret != 0) {
+		RTE_LOG(ERR, LATENCY_STATS,
+			"Failed to cancel latency stats update thread:"
+			"%s,%s:%d\n",
+			strerror(errno), __func__, __LINE__);
+		return -1;
+	}
+
+	return 0;
+}
+
+int
+rte_latencystats_get_names(struct rte_metric_name *names, uint16_t size)
+{
+	unsigned int i;
+
+	if (names == NULL || size < NUM_LATENCY_STATS)
+		return NUM_LATENCY_STATS;
+
+	for (i = 0; i < NUM_LATENCY_STATS; i++)
+		snprintf(names[i].name, sizeof(names[i].name),
+				"%s", lat_stats_strings[i].name);
+
+	return NUM_LATENCY_STATS;
+}
+
+int
+rte_latencystats_get(struct rte_metric_value *values, uint16_t size)
+{
+	if (size < NUM_LATENCY_STATS || values == NULL)
+		return NUM_LATENCY_STATS;
+
+	if (rte_eal_process_type() == RTE_PROC_SECONDARY) {
+		const struct rte_memzone *mz;
+		mz = rte_memzone_lookup(MZ_RTE_LATENCY_STATS);
+		if (mz == NULL) {
+			RTE_LOG(ERR, LATENCY_STATS,
+				"Latency stats memzone not found\n");
+			return -ENOMEM;
+		}
+		glob_stats =  mz->addr;
+	}
+
+	/* Retrieve latency stats */
+	rte_latencystats_fill_values(values);
+
+	return NUM_LATENCY_STATS;
+}
diff --git a/lib/librte_latencystats/rte_latencystats.h b/lib/librte_latencystats/rte_latencystats.h
new file mode 100644
index 0000000..405b878
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats.h
@@ -0,0 +1,146 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_LATENCYSTATS_H_
+#define _RTE_LATENCYSTATS_H_
+
+/**
+ * @file
+ * RTE latency stats
+ *
+ * library to provide application and flow based latency stats.
+ */
+
+#include <rte_metrics.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/**
+ *  Note: This function pointer is for future flow based latency stats
+ *  implementation.
+ *
+ * Function type used for identifting flow types of a Rx packet.
+ *
+ * The callback function is called on Rx for each packet.
+ * This function is used for flow based latency calculations.
+ *
+ * @param pkt
+ *   Packet that has to be identified with its flow types.
+ * @param user_param
+ *   The arbitrary user parameter passed in by the application when
+ *   the callback was originally configured.
+ * @return
+ *   The flow_mask, representing the multiple flow types of a packet.
+ */
+typedef uint16_t (*rte_latency_stats_flow_type_fn)(struct rte_mbuf *pkt,
+							void *user_param);
+
+/**
+ *  Registers Rx/Tx callbacks for each active port, queue.
+ *
+ * @param samp_intvl
+ *  Sampling time period in nano seconds, at which packet
+ *  should be marked with time stamp.
+ * @param user_cb
+ *  Note: This param is for future flow based latency stats
+ *  implementation.
+ *  User callback to be called to get flow types of a packet.
+ *  Used for flow based latency calculation.
+ *  If the value is NULL, global stats will be calculated,
+ *  else flow based latency stats will be calculated.
+ *  For now just pass on the NULL value to this param.
+ *  @return
+ *   -1     : On error
+ *   -ENOMEM: On error
+ *    0     : On success
+ */
+int rte_latencystats_init(uint64_t samp_intvl,
+			rte_latency_stats_flow_type_fn user_cb);
+
+/**
+ *  Removes registered Rx/Tx callbacks for each active port, queue.
+ *
+ *  @return
+ *   -1: On error
+ *    0: On success
+ */
+int rte_latencystats_uninit(void);
+
+/**
+ * Retrieve names of latency statistics
+ *
+ * @param names
+ *  Block of memory to insert names into. Must be at least size in capacity.
+ *  If set to NULL, function returns required capacity.
+ * @param size
+ *  Capacity of latency stats names (number of names).
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ */
+int rte_latencystats_get_names(struct rte_metric_name *names,
+				uint16_t size);
+
+/**
+ * Retrieve latency statistics.
+ *
+ * @param values
+ *   A pointer to a table of structure of type *rte_metric_value*
+ *   to be filled with latency statistics ids and values.
+ *   This parameter can be set to NULL if size is 0.
+ * @param size
+ *   The size of the stats table, which should be large enough to store
+ *   all the latency stats.
+ * @return
+ *   - positive value lower or equal to size: success. The return value
+ *     is the number of entries filled in the stats table.
+ *   - positive value higher than size: error, the given statistics table
+ *     is too small. The return value corresponds to the size that should
+ *     be given to succeed. The entries in the table are not valid and
+ *     shall not be used by the caller.
+ *   -ENOMEM: On failure.
+ */
+int rte_latencystats_get(struct rte_metric_value *values,
+			uint16_t size);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _RTE_LATENCYSTATS_H_ */
diff --git a/lib/librte_latencystats/rte_latencystats_version.map b/lib/librte_latencystats/rte_latencystats_version.map
new file mode 100644
index 0000000..502018e
--- /dev/null
+++ b/lib/librte_latencystats/rte_latencystats_version.map
@@ -0,0 +1,10 @@ 
+DPDK_17.02 {
+	global:
+
+	rte_latencystats_get;
+	rte_latencystats_get_names;
+	rte_latencystats_init;
+	rte_latencystats_uninit;
+
+	local: *;
+};
diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index bfce9f4..c35ba0a 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -512,6 +512,9 @@  struct rte_mbuf {
 
 	/** Timesync flags for use with IEEE1588. */
 	uint16_t timesync;
+
+	/** Timestamp for measuring latency. */
+	uint64_t timestamp;
 } __rte_cache_aligned;
 
 /**
diff --git a/mk/rte.app.mk b/mk/rte.app.mk
index 6aac5ac..1d36fad 100644
--- a/mk/rte.app.mk
+++ b/mk/rte.app.mk
@@ -100,7 +100,7 @@  _LDLIBS-$(CONFIG_RTE_LIBRTE_CMDLINE)        += -lrte_cmdline
 _LDLIBS-$(CONFIG_RTE_LIBRTE_CFGFILE)        += -lrte_cfgfile
 _LDLIBS-$(CONFIG_RTE_LIBRTE_METRICS)        += -lrte_metrics
 _LDLIBS-$(CONFIG_RTE_LIBRTE_BITRATE)        += -lrte_bitratestats
-
+_LDLIBS-$(CONFIG_RTE_LIBRTE_LATENCY_STATS)  += -lrte_latencystats
 
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_BOND)       += -lrte_pmd_bond
 _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_XENVIRT)    += -lrte_pmd_xenvirt -lxenstore