[dpdk-dev,v1,2/2] Test cases for rte_memcmp functions

Message ID 1457391644-29645-2-git-send-email-rkerur@gmail.com (mailing list archive)
State Rejected, archived
Delegated to: Thomas Monjalon

Commit Message

Ravi Kerur March 7, 2016, 11 p.m. UTC
  v1:
        This patch adds test cases for rte_memcmp functions.
        New rte_memcmp functions can be tested via 'make test'
        and 'testpmd' utility.

        Compiled and tested on Ubuntu 14.04(non-NUMA) and
        15.10(NUMA) systems.

Signed-off-by: Ravi Kerur <rkerur@gmail.com>
---
 app/test/Makefile           |  31 +++-
 app/test/autotest_data.py   |  19 +++
 app/test/test_memcmp.c      | 250 ++++++++++++++++++++++++++++
 app/test/test_memcmp_perf.c | 396 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 695 insertions(+), 1 deletion(-)
 create mode 100644 app/test/test_memcmp.c
 create mode 100644 app/test/test_memcmp_perf.c
  

Comments

Zhihong Wang May 26, 2016, 9:05 a.m. UTC | #1
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ravi Kerur
> Sent: Tuesday, March 8, 2016 7:01 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp functions
> 
> v1:
>         This patch adds test cases for rte_memcmp functions.
>         New rte_memcmp functions can be tested via 'make test'
>         and 'testpmd' utility.
> 
>         Compiled and tested on Ubuntu 14.04(non-NUMA) and
>         15.10(NUMA) systems.
[...]

> +/*******************************************************************************
> + * Memcmp function performance test configuration section. Each performance test
> + * will be performed MEMCMP_ITERATIONS times.
> + *
> + * The five arrays below control what tests are performed. Every combination
> + * from the array entries is tested.
> + */
> +#define MEMCMP_ITERATIONS (500 * 500 * 500)


Maybe fewer iterations would make the test faster without compromising precision?


> +
> +static size_t memcmp_sizes[] = {
> +	2, 5, 8, 9, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128,
> +	129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384,
> +	385, 447, 448, 449, 511, 512, 513, 767, 768, 769, 1023, 1024,
> +	1025, 1522, 1536, 1600, 2048, 2560, 3072, 3584, 4096, 4608,
> +	5632, 6144, 6656, 7168, 7680, 8192, 16834
> +};
> +
[...]
> +/*
> + * Do all performance tests.
> + */
> +static int
> +test_memcmp_perf(void)
> +{
> +	if (run_all_memcmp_eq_perf_tests() != 0)
> +		return -1;
> +
> +	if (run_all_memcmp_gt_perf_tests() != 0)
> +		return -1;
> +
> +	if (run_all_memcmp_lt_perf_tests() != 0)
> +		return -1;
> +


Perhaps unaligned test cases are needed here.
What do you think?


> +
> +	return 0;
> +}
> +
> +static struct test_command memcmp_perf_cmd = {
> +	.command = "memcmp_perf_autotest",
> +	.callback = test_memcmp_perf,
> +};
> +REGISTER_TEST_COMMAND(memcmp_perf_cmd);
> --
> 1.9.1
  
Ravi Kerur June 6, 2016, 6:31 p.m. UTC | #2
Zhihong, Thomas,

If there is enough interest within DPDK community I can work on adding
support for 'unaligned access' and 'test cases' for it. Please let me know
either way.

Thanks,
Ravi


On Thu, May 26, 2016 at 2:05 AM, Wang, Zhihong <zhihong.wang@intel.com>
wrote:

> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ravi Kerur
> > Sent: Tuesday, March 8, 2016 7:01 AM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp functions
> > [...]
  
Zhihong Wang June 7, 2016, 11:09 a.m. UTC | #3
> -----Original Message-----
> From: Ravi Kerur [mailto:rkerur@gmail.com]
> Sent: Tuesday, June 7, 2016 2:32 AM
> To: Wang, Zhihong <zhihong.wang@intel.com>; Thomas Monjalon
> <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp functions
> 
> Zhilong, Thomas,
> 
> If there is enough interest within DPDK community I can work on adding support
> for 'unaligned access' and 'test cases' for it. Please let me know either way.
> 

Hi Ravi,

This rte_memcmp has shown better performance than glibc's in aligned
cases, so I think it has good value for the DPDK lib.

Though we don't have memcmp in the critical pmd data path, it offers a
better choice for applications that do.

Thanks
Zhihong

> Thanks,
> Ravi
> [...]
  
Thomas Monjalon Jan. 2, 2017, 8:41 p.m. UTC | #4
2016-06-07 11:09, Wang, Zhihong:
> From: Ravi Kerur [mailto:rkerur@gmail.com]
> > Zhilong, Thomas,
> > 
> > If there is enough interest within DPDK community I can work on adding support
> > for 'unaligned access' and 'test cases' for it. Please let me know either way.
> 
> Hi Ravi,
> 
> This rte_memcmp is proved with better performance than glibc's in aligned
> cases, I think it has good value to DPDK lib.
> 
> Though we don't have memcmp in critical pmd data path, it offers a better
> choice for applications who do.

Re-thinking about this series, is there really value in having a rte_memcmp
implementation?
What is the value compared to the glibc one? Why not work on glibc?
  
Zhihong Wang Jan. 9, 2017, 5:29 a.m. UTC | #5
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Tuesday, January 3, 2017 4:41 AM
> To: Wang, Zhihong <zhihong.wang@intel.com>; Ravi Kerur
> <rkerur@gmail.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp
> functions
> 
> 2016-06-07 11:09, Wang, Zhihong:
> > From: Ravi Kerur [mailto:rkerur@gmail.com]
> > > Zhilong, Thomas,
> > >
> > > If there is enough interest within DPDK community I can work on adding
> support
> > > for 'unaligned access' and 'test cases' for it. Please let me know either
> way.
> >
> > Hi Ravi,
> >
> > This rte_memcmp is proved with better performance than glibc's in aligned
> > cases, I think it has good value to DPDK lib.
> >
> > Though we don't have memcmp in critical pmd data path, it offers a better
> > choice for applications who do.
> 
> Re-thinking about this series, could it be some values to have a rte_memcmp
> implementation?

I think this series (rte_memcmp included) could help:

 1. Potentially better performance in hot paths.

 2. Agile for tuning.

 3. Avoid performance complications -- unusual but possible,
    like the glibc memset issue I met while working on vhost
    enqueue.

> What is the value compared to glibc one? Why not working on glibc?

As to working on glibc, wider design consideration and test
coverage might be needed, and we'd face different release
cycles; could we keep the same agility? Also, working with old
glibc could be a problem.
  
Thomas Monjalon Jan. 9, 2017, 11:08 a.m. UTC | #6
2017-01-09 05:29, Wang, Zhihong:
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > 2016-06-07 11:09, Wang, Zhihong:
> > > From: Ravi Kerur [mailto:rkerur@gmail.com]
> > > > Zhilong, Thomas,
> > > >
> > > > If there is enough interest within DPDK community I can work on adding
> > support
> > > > for 'unaligned access' and 'test cases' for it. Please let me know either
> > way.
> > >
> > > Hi Ravi,
> > >
> > > This rte_memcmp is proved with better performance than glibc's in aligned
> > > cases, I think it has good value to DPDK lib.
> > >
> > > Though we don't have memcmp in critical pmd data path, it offers a better
> > > choice for applications who do.
> > 
> > Re-thinking about this series, could it be some values to have a rte_memcmp
> > implementation?
> 
> I think this series (rte_memcmp included) could help:
> 
>  1. Potentially better performance in hot paths.
> 
>  2. Agile for tuning.
> 
>  3. Avoid performance complications -- unusual but possible,
>     like the glibc memset issue I met while working on vhost
>     enqueue.
> 
> > What is the value compared to glibc one? Why not working on glibc?
> 
> As to working on glibc, wider design consideration and test
> coverage might be needed, and we'll face different release
> cycles, can we have the same agility? Also working with old
> glibc could be a problem.

Probably we need both: add the optimized version in DPDK while working
on a glibc optimization.
This strategy could be applicable to memcpy, memcmp and memset.
  
Zhihong Wang Jan. 11, 2017, 1:28 a.m. UTC | #7
> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Monday, January 9, 2017 7:09 PM
> To: Wang, Zhihong <zhihong.wang@intel.com>
> Cc: Ravi Kerur <rkerur@gmail.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1 2/2] Test cases for rte_memcmp
> functions
> 
> 2017-01-09 05:29, Wang, Zhihong:
> > From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> > > 2016-06-07 11:09, Wang, Zhihong:
> > > > From: Ravi Kerur [mailto:rkerur@gmail.com]
> > > > > Zhilong, Thomas,
> > > > >
> > > > > If there is enough interest within DPDK community I can work on
> adding
> > > support
> > > > > for 'unaligned access' and 'test cases' for it. Please let me know either
> > > way.
> > > >
> > > > Hi Ravi,
> > > >
> > > > This rte_memcmp is proved with better performance than glibc's in
> aligned
> > > > cases, I think it has good value to DPDK lib.
> > > >
> > > > Though we don't have memcmp in critical pmd data path, it offers a
> better
> > > > choice for applications who do.
> > >
> > > Re-thinking about this series, could it be some values to have a
> rte_memcmp
> > > implementation?
> >
> > I think this series (rte_memcmp included) could help:
> >
> >  1. Potentially better performance in hot paths.
> >
> >  2. Agile for tuning.
> >
> >  3. Avoid performance complications -- unusual but possible,
> >     like the glibc memset issue I met while working on vhost
> >     enqueue.
> >
> > > What is the value compared to glibc one? Why not working on glibc?
> >
> > As to working on glibc, wider design consideration and test
> > coverage might be needed, and we'll face different release
> > cycles, can we have the same agility? Also working with old
> > glibc could be a problem.
> 
> Probably we need both: add the optimized version in DPDK while working
> on a glibc optimization.
> This strategy could be applicable to memcpy, memcmp and memset.

This would help in the long run if it turns out to be feasible.
  

Patch

diff --git a/app/test/Makefile b/app/test/Makefile
index ec33e1a..f6ecaa9 100644
--- a/app/test/Makefile
+++ b/app/test/Makefile
@@ -82,6 +82,9 @@  SRCS-y += test_logs.c
 SRCS-y += test_memcpy.c
 SRCS-y += test_memcpy_perf.c
 
+SRCS-y += test_memcmp.c
+SRCS-y += test_memcmp_perf.c
+
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_thash.c
 SRCS-$(CONFIG_RTE_LIBRTE_HASH) += test_hash_perf.c
@@ -160,14 +163,40 @@  CFLAGS += $(WERROR_FLAGS)
 
 CFLAGS += -D_GNU_SOURCE
 
-# Disable VTA for memcpy test
+# Disable VTA for memcpy and memcmp tests
 ifeq ($(CC), gcc)
 ifeq ($(shell test $(GCC_VERSION) -ge 44 && echo 1), 1)
 CFLAGS_test_memcpy.o += -fno-var-tracking-assignments
 CFLAGS_test_memcpy_perf.o += -fno-var-tracking-assignments
+
+CFLAGS_test_memcmp.o += -fno-var-tracking-assignments
+CFLAGS_test_memcmp_perf.o += -fno-var-tracking-assignments
+
 endif
 endif
 
+CMP_AVX2_SUPPORT=$(shell $(CC) -march=core-avx2 -dM -E - </dev/null 2>&1 | \
+	grep -q AVX2 && echo 1)
+
+ifeq ($(CMP_AVX2_SUPPORT), 1)
+	ifeq ($(CC), icc)
+		CFLAGS_test_memcmp.o += -march=core-avx2
+		CFLAGS_test_memcmp_perf.o += -march=core-avx2
+	else
+		CFLAGS_test_memcmp.o += -mavx2
+		CFLAGS_test_memcmp_perf.o += -mavx2
+	endif
+else
+	ifeq ($(CC), icc)
+		CFLAGS_test_memcmp.o += -march=core-sse4.1
+		CFLAGS_test_memcmp_perf.o += -march=core-sse4.1
+	else
+		CFLAGS_test_memcmp.o += -msse4.1
+		CFLAGS_test_memcmp_perf.o += -msse4.1
+	endif
+endif
+
+
 # this application needs libraries first
 DEPDIRS-y += lib drivers
 
diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
index 6f34d6b..5113327 100644
--- a/app/test/autotest_data.py
+++ b/app/test/autotest_data.py
@@ -186,6 +186,12 @@  parallel_test_group_list = [
 		 "Report" :	None,
 		},
 		{
+		 "Name" :	"Memcmp autotest",
+		 "Command" : 	"memcmp_autotest",
+		 "Func" :	default_autotest,
+		 "Report" :	None,
+		},
+		{
 		 "Name" :	"Memzone autotest",
 		 "Command" : 	"memzone_autotest",
 		 "Func" :	default_autotest,
@@ -398,6 +404,19 @@  non_parallel_test_group_list = [
 	]
 },
 {
+	"Prefix":	"memcmp_perf",
+	"Memory" :	per_sockets(512),
+	"Tests" :
+	[
+		{
+		 "Name" :	"Memcmp performance autotest",
+		 "Command" : 	"memcmp_perf_autotest",
+		 "Func" :	default_autotest,
+		 "Report" :	None,
+		},
+	]
+},
+{
 	"Prefix":	"hash_perf",
 	"Memory" :	per_sockets(512),
 	"Tests" :
diff --git a/app/test/test_memcmp.c b/app/test/test_memcmp.c
new file mode 100644
index 0000000..e3b0bf7
--- /dev/null
+++ b/app/test/test_memcmp.c
@@ -0,0 +1,250 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <sys/queue.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_cycles.h>
+#include <rte_random.h>
+#include <rte_memory.h>
+#include <rte_eal.h>
+#include <rte_memcmp.h>
+
+#include "test.h"
+
+/*******************************************************************************
+ * Memcmp functional test configuration section.
+ * Each comparison type is verified for every buffer size listed below.
+ *
+ * The array below controls which sizes are tested. Every entry is tested
+ * for equal, less-than and greater-than comparisons.
+ */
+static size_t memcmp_sizes[] = {
+	1, 7, 8, 9, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128, 129, 255,
+	256, 257, 320, 384, 511, 512, 513, 1023, 1024, 1025, 1518, 1522, 1600,
+	2048, 3072, 4096, 5120, 6144, 7168, 8192, 16384
+};
+
+/******************************************************************************/
+
+#define RTE_MEMCMP_LENGTH_MAX 16384
+
+/*
+ * Test a memcmp equal function.
+ */
+static int run_memcmp_eq_func_test(uint32_t len)
+{
+	uint32_t i, rc;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL) {
+		printf("\nkey_1 is null\n");
+		return -1;
+	}
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		rte_free(key_1);
+		printf("\nkey_2 is null\n");
+		return -1;
+	}
+
+	for (i = 0; i < len; i++)
+		key_1[i] = 1;
+
+	for (i = 0; i < len; i++)
+		key_2[i] = 1;
+
+	rc = rte_memcmp(key_1, key_2, len);
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return rc;
+}
+
+/*
+ * Test memcmp equal functions.
+ */
+static int run_memcmp_eq_func_tests(void)
+{
+	unsigned i;
+
+	for (i = 0;
+	     i < sizeof(memcmp_sizes) / sizeof(memcmp_sizes[0]);
+	     i++) {
+		if (run_memcmp_eq_func_test(memcmp_sizes[i])) {
+			printf("Comparing equal %zu bytes failed\n", memcmp_sizes[i]);
+			return 1;
+		}
+	}
+	printf("RTE memcmp for equality successful\n");
+	return 0;
+}
+
+/*
+ * Test a memcmp less than function.
+ */
+static int run_memcmp_lt_func_test(uint32_t len)
+{
+	uint32_t i, rc;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL)
+		return -1;
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		rte_free(key_1);
+		return -1;
+	}
+
+	for (i = 0; i < len; i++)
+		key_1[i] = 1;
+
+	for (i = 0; i < len; i++)
+		key_2[i] = 2;
+
+	rc = rte_memcmp(key_1, key_2, len);
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return rc;
+}
+
+/*
+ * Test memcmp less than functions.
+ */
+static int run_memcmp_lt_func_tests(void)
+{
+	unsigned i;
+
+	for (i = 0;
+	     i < sizeof(memcmp_sizes) / sizeof(memcmp_sizes[0]);
+	     i++) {
+		if (!(run_memcmp_lt_func_test(memcmp_sizes[i]) < 0)) {
+			printf("Comparing less than for %zu bytes failed\n", memcmp_sizes[i]);
+			return 1;
+		}
+	}
+	printf("RTE memcmp for less than successful\n");
+	return 0;
+}
+
+/*
+ * Test a memcmp greater than function.
+ */
+static int run_memcmp_gt_func_test(uint32_t len)
+{
+	uint32_t i, rc;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL)
+		return -1;
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		rte_free(key_1);
+		return -1;
+	}
+
+	for (i = 0; i < len; i++)
+		key_1[i] = 2;
+
+	for (i = 0; i < len; i++)
+		key_2[i] = 1;
+
+	rc = rte_memcmp(key_1, key_2, len);
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return rc;
+}
+
+/*
+ * Test memcmp greater than functions.
+ */
+static int run_memcmp_gt_func_tests(void)
+{
+	unsigned i;
+
+	for (i = 0;
+	     i < sizeof(memcmp_sizes) / sizeof(memcmp_sizes[0]);
+	     i++) {
+		if (!(run_memcmp_gt_func_test(memcmp_sizes[i]) > 0)) {
+			printf("Comparing greater than for %zu bytes failed\n", memcmp_sizes[i]);
+			return 1;
+		}
+	}
+	printf("RTE memcmp for greater than successful\n");
+	return 0;
+}
+
+/*
+ * Do all unit and performance tests.
+ */
+static int
+test_memcmp(void)
+{
+	if (run_memcmp_eq_func_tests())
+		return -1;
+
+	if (run_memcmp_gt_func_tests())
+		return -1;
+
+	if (run_memcmp_lt_func_tests())
+		return -1;
+
+	return 0;
+}
+
+static struct test_command memcmp_cmd = {
+	.command = "memcmp_autotest",
+	.callback = test_memcmp,
+};
+REGISTER_TEST_COMMAND(memcmp_cmd);
diff --git a/app/test/test_memcmp_perf.c b/app/test/test_memcmp_perf.c
new file mode 100644
index 0000000..4c0f4d9
--- /dev/null
+++ b/app/test/test_memcmp_perf.c
@@ -0,0 +1,396 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <errno.h>
+#include <sys/queue.h>
+#include <sys/times.h>
+
+#include <rte_common.h>
+#include <rte_malloc.h>
+#include <rte_cycles.h>
+#include <rte_random.h>
+#include <rte_memory.h>
+#include <rte_memcmp.h>
+
+#include "test.h"
+
+/*******************************************************************************
+ * Memcmp function performance test configuration section. Each performance test
+ * will be performed MEMCMP_ITERATIONS times.
+ *
+ * The two arrays below control which buffer sizes are tested. Every size
+ * in the arrays is tested for each comparison type.
+ */
+#define MEMCMP_ITERATIONS (500 * 500 * 500)
+
+static size_t memcmp_sizes[] = {
+	2, 5, 8, 9, 15, 16, 17, 31, 32, 33, 63, 64, 65, 127, 128,
+	129, 191, 192, 193, 255, 256, 257, 319, 320, 321, 383, 384,
+	385, 447, 448, 449, 511, 512, 513, 767, 768, 769, 1023, 1024,
+	1025, 1522, 1536, 1600, 2048, 2560, 3072, 3584, 4096, 4608,
+	5632, 6144, 6656, 7168, 7680, 8192, 16384
+};
+
+static size_t memcmp_lt_gt_sizes[] = {
+	1, 8, 15, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384
+};
+
+/******************************************************************************/
+
+static int
+run_single_memcmp_eq_perf_test(uint32_t len, int func_type, uint64_t iterations)
+{
+	uint32_t i, j;
+
+	double begin = 0, end = 0;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+	int rc = 0;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL) {
+		printf("\nkey_1 mem alloc failure\n");
+		return -1;
+	}
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		printf("\nkey_2 mem alloc failure\n");
+		rte_free(key_1);
+		return -1;
+	}
+
+	/* Prepare inputs for the current iteration */
+	for (j = 0; j < len; j++)
+		key_1[j] = key_2[j] = j / 64;
+
+	begin = rte_rdtsc();
+
+	/* Perform operation, and measure time it takes */
+	for (i = 0; i < iterations; i++) {
+
+		switch (func_type) {
+		case 1:
+			rc += rte_memcmp(key_1, key_2, len);
+			break;
+		case 2:
+			rc += memcmp(key_1, key_2, len);
+			break;
+		default:
+			break;
+		}
+
+	}
+
+	end = rte_rdtsc() - begin;
+
+	printf(" *** %10i, %10.4f ***\n", len, (double)(end/iterations));
+
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return rc;
+}
+
+/*
+ * Run all memcmp equal performance tests.
+ */
+static int run_all_memcmp_eq_perf_tests(void)
+{
+	unsigned i;
+
+	printf(" *** RTE memcmp equal performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_sizes) / sizeof(memcmp_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_eq_perf_test(memcmp_sizes[i], 1,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+
+	printf(" *** memcmp equal performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_sizes) / sizeof(memcmp_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_eq_perf_test(memcmp_sizes[i], 2,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+	return 0;
+}
+
+static int
+run_single_memcmp_lt_perf_test(uint32_t len, int func_type,
+					uint64_t iterations)
+{
+	uint32_t i, j;
+
+	double begin = 0, end = 0;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+	int rc;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL) {
+		printf("\nKey_1 lt mem alloc failure\n");
+		return -1;
+	}
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		printf("\nKey_2 lt mem alloc failure\n");
+		rte_free(key_1);
+		return -1;
+	}
+
+	/* Prepare inputs for the current iteration */
+	for (j = 0; j < len; j++)
+		key_1[j] = 1;
+
+	for (j = 0; j < len; j++)
+		key_2[j] = 1;
+
+	/* Perform operation, and measure time it takes */
+	for (i = 0; i < iterations; i++) {
+
+		key_2[i % len] = 2;
+
+		switch (func_type) {
+		case 1:
+			begin = rte_rdtsc();
+			rc = rte_memcmp(key_1, key_2, len);
+			end += rte_rdtsc() - begin;
+			break;
+		case 2:
+			begin = rte_rdtsc();
+			rc = memcmp(key_1, key_2, len);
+			end += rte_rdtsc() - begin;
+			break;
+		default:
+			break;
+		}
+
+		key_2[i % len] = 1;
+
+		if (!(rc < 0)) {
+			printf("\nrc %d i %d\n", rc, i);
+			return -1;
+		}
+	}
+
+	printf(" *** %10i, %10.4f ***\n", len, (double)(end/iterations));
+
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return 0;
+}
+
+/*
+ * Run all memcmp less than performance tests.
+ */
+static int run_all_memcmp_lt_perf_tests(void)
+{
+	unsigned i;
+
+	printf(" *** RTE memcmp less than performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_lt_gt_sizes) / sizeof(memcmp_lt_gt_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_lt_perf_test(memcmp_lt_gt_sizes[i], 1,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+
+	printf(" *** memcmp less than performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_lt_gt_sizes) / sizeof(memcmp_lt_gt_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_lt_perf_test(memcmp_lt_gt_sizes[i], 2,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+	return 0;
+}
+
+static int
+run_single_memcmp_gt_perf_test(uint32_t len, int func_type,
+					uint64_t iterations)
+{
+	uint32_t i, j;
+
+	double begin = 0, end = 0;
+
+	uint8_t *volatile key_1 = NULL;
+	uint8_t *volatile key_2 = NULL;
+	int rc;
+
+	key_1 = rte_zmalloc("memcmp key_1", len * sizeof(uint8_t), 16);
+	if (key_1 == NULL) {
+		printf("\nkey_1 gt mem alloc failure\n");
+		return -1;
+	}
+
+	key_2 = rte_zmalloc("memcmp key_2", len * sizeof(uint8_t), 16);
+	if (key_2 == NULL) {
+		printf("\nkey_2 gt mem alloc failure\n");
+		rte_free(key_1);
+		return -1;
+	}
+
+	/* Prepare inputs for the current iteration */
+	for (j = 0; j < len; j++)
+		key_1[j] = 1;
+
+	for (j = 0; j < len; j++)
+		key_2[j] = 1;
+
+	/* Perform operation, and measure time it takes */
+	for (i = 0; i < iterations; i++) {
+		key_1[i % len] = 2;
+
+		switch (func_type) {
+		case 1:
+			begin = rte_rdtsc();
+			rc = rte_memcmp(key_1, key_2, len);
+			end += rte_rdtsc() - begin;
+			break;
+		case 2:
+			begin = rte_rdtsc();
+			rc = memcmp(key_1, key_2, len);
+			end += rte_rdtsc() - begin;
+			break;
+		default:
+			break;
+		}
+
+		key_1[i % len] = 1;
+
+		if (!(rc > 0)) {
+			printf("\nrc %d i %d\n", rc, i);
+			for (i = 0; i < len; i++)
+				printf("\nkey_1 %d key_2 %d mod %d\n", key_1[i], key_2[i], (i % len));
+			return -1;
+		}
+	}
+
+	printf(" *** %10i, %10.4f ***\n", len, (double)(end/iterations));
+
+	rte_free(key_1);
+	rte_free(key_2);
+
+	return 0;
+}
+
+/*
+ * Run all memcmp greater than performance tests.
+ */
+static int run_all_memcmp_gt_perf_tests(void)
+{
+	unsigned i;
+
+	printf(" *** RTE memcmp greater than performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_lt_gt_sizes) / sizeof(memcmp_lt_gt_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_gt_perf_test(memcmp_lt_gt_sizes[i], 1,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+
+	printf(" *** memcmp greater than performance test results ***\n");
+	printf(" *** Length (bytes), Ticks/Op. ***\n");
+
+	/* Loop through every combination of test parameters */
+	for (i = 0;
+	     i < sizeof(memcmp_lt_gt_sizes) / sizeof(memcmp_lt_gt_sizes[0]);
+	     i++) {
+		/* Perform test */
+		if (run_single_memcmp_gt_perf_test(memcmp_lt_gt_sizes[i], 2,
+						MEMCMP_ITERATIONS) != 0)
+			return -1;
+	}
+	return 0;
+}
+
+/*
+ * Do all performance tests.
+ */
+static int
+test_memcmp_perf(void)
+{
+	if (run_all_memcmp_eq_perf_tests() != 0)
+		return -1;
+
+	if (run_all_memcmp_gt_perf_tests() != 0)
+		return -1;
+
+	if (run_all_memcmp_lt_perf_tests() != 0)
+		return -1;
+
+
+	return 0;
+}
+
+static struct test_command memcmp_perf_cmd = {
+	.command = "memcmp_perf_autotest",
+	.callback = test_memcmp_perf,
+};
+REGISTER_TEST_COMMAND(memcmp_perf_cmd);