[dpdk-dev] [PATCH v2] Implement rte_memcmp with AVX/SSE instructions.

Ravi Kerur rkerur at gmail.com
Fri May 8 23:19:06 CEST 2015


Background:
After a preliminary discussion with John (Zhihong) and Tim from Intel, we
decided that it would be beneficial to use AVX/SSE instructions for memcmp,
similar to what is being implemented for memcpy. In addition, we decided to
use librte_hash as a test candidate for both functionality and performance.

Currently, memcmp in librte_hash is used for key comparisons. Key length
can vary, and the maximum key length is defined as 64 bytes. Preliminary
tests of memory comparison alone show that the AVX/SSE version takes one
third of the CPU ticks of the regular memcmp function. Furthermore,
hash_perf_autotest shows better results in all categories. Please note
that memory comparison is a small portion of the hash functionality, and
the CPU Ticks/Op figures are for whole hash operations (Add on Empty,
Add update, Lookup). Only hash lookup results are shown below. I can send
the complete results if anyone is interested.

The test was conducted on an Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
system with 16GB DDR3, running Ubuntu 14.04 (x86_64).

PS: I would like to keep "rte_memcmp" simple, with these return codes:

0 - match
1 - no-match

since its usage in DPDK is for equality or inequality checks, and I have
not seen any instance where a less-than/greater-than comparison is needed.
Hence the "if (unlikely(...))" portion of the code will probably be removed,
making the function specific to DPDK rather than generic.
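
To illustrate the 0/1 convention described above, here is a minimal sketch
of an equality-only comparison built on SSE2 intrinsics. It is not the code
from the patch: the names cmp16_eq and memcmp_eq are hypothetical, only
128-bit SSE2 loads are used (no AVX path), and an x86 target is assumed.

```c
#include <emmintrin.h>	/* SSE2 intrinsics (assumes an x86 target) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compare 16 bytes; 0 = match, 1 = no-match. */
static inline int
cmp16_eq(const void *a, const void *b)
{
	/* Unaligned loads, so callers need not align their keys. */
	__m128i va = _mm_loadu_si128((const __m128i *)a);
	__m128i vb = _mm_loadu_si128((const __m128i *)b);

	/* Each byte of eq is 0xFF where the inputs agree, 0x00 otherwise. */
	__m128i eq = _mm_cmpeq_epi8(va, vb);

	/* movemask gathers the top bit of each byte; 0xFFFF = all equal. */
	return _mm_movemask_epi8(eq) != 0xFFFF;
}

/* Hypothetical equality-only compare of n bytes; 0 = match, 1 = no-match. */
static inline int
memcmp_eq(const void *a, const void *b, size_t n)
{
	const uint8_t *pa = a;
	const uint8_t *pb = b;

	/* Compare 16 bytes at a time and stop at the first mismatch. */
	while (n >= 16) {
		if (cmp16_eq(pa, pb))
			return 1;
		pa += 16;
		pb += 16;
		n -= 16;
	}

	/* Scalar tail for lengths that are not a multiple of 16. */
	while (n--) {
		if (*pa++ != *pb++)
			return 1;
	}
	return 0;
}
```

Because the result only distinguishes match from no-match, the sketch never
has to locate the first differing byte or order the two buffers, which is
the simplification the return codes above are meant to allow.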

/*************Existing code**********************************/
 *** Hash table performance test results ***
Hash Func.  , Operation      , Key size (bytes), Entries, Entries per bucket, Errors  , Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup         , 16              , 1024   , 1                 , 10000   , 0.00               , 88.55
rte_hash_crc, Lookup         , 16              , 1024   , 2                 , 10000   , 0.00               , 99.28
rte_hash_crc, Lookup         , 16              , 1024   , 4                 , 10000   , 0.00               , 106.73
rte_hash_crc, Lookup         , 16              , 1024   , 8                 , 10000   , 0.00               , 126.99
rte_hash_crc, Lookup         , 16              , 1024   , 16                , 10000   , 0.00               , 159.80

rte_hash_crc, Lookup         , 16              , 1048576, 1                 , 51      , 0.01               , 175.23
rte_hash_crc, Lookup         , 16              , 1048576, 2                 , 2       , 0.02               , 171.24
rte_hash_crc, Lookup         , 16              , 1048576, 4                 , 0       , 0.04               , 145.48
rte_hash_crc, Lookup         , 16              , 1048576, 8                 , 0       , 0.08               , 162.35
rte_hash_crc, Lookup         , 16              , 1048576, 16                , 0       , 0.15               , 182.42

jhash       , Lookup         , 16              , 1048576, 1                 , 33      , 0.01               , 219.71
jhash       , Lookup         , 16              , 1048576, 2                 , 1       , 0.02               , 216.44
jhash       , Lookup         , 16              , 1048576, 4                 , 0       , 0.04               , 188.29
jhash       , Lookup         , 16              , 1048576, 8                 , 0       , 0.08               , 203.70
jhash       , Lookup         , 16              , 1048576, 16                , 0       , 0.15               , 229.50

/**************New AVX/SSE code******************************/
Hash Func.  , Operation      , Key size (bytes), Entries, Entries per bucket, Errors  , Avg. bucket entries, Ticks/Op.
rte_hash_crc, Lookup         , 16              , 1024   , 1                 , 10000   , 0.00               , 85.69
rte_hash_crc, Lookup         , 16              , 1024   , 2                 , 10000   , 0.00               , 93.95
rte_hash_crc, Lookup         , 16              , 1024   , 4                 , 10000   , 0.00               , 102.80
rte_hash_crc, Lookup         , 16              , 1024   , 8                 , 10000   , 0.00               , 122.60
rte_hash_crc, Lookup         , 16              , 1024   , 16                , 10000   , 0.00               , 156.58

rte_hash_crc, Lookup         , 16              , 1048576, 1                 , 41      , 0.01               , 156.84
rte_hash_crc, Lookup         , 16              , 1048576, 2                 , 0       , 0.02               , 157.90
rte_hash_crc, Lookup         , 16              , 1048576, 4                 , 0       , 0.04               , 134.92
rte_hash_crc, Lookup         , 16              , 1048576, 8                 , 0       , 0.08               , 150.99
rte_hash_crc, Lookup         , 16              , 1048576, 16                , 0       , 0.15               , 174.08

jhash       , Lookup         , 16              , 1048576, 1                 , 45      , 0.01               , 212.03
jhash       , Lookup         , 16              , 1048576, 2                 , 2       , 0.02               , 210.65
jhash       , Lookup         , 16              , 1048576, 4                 , 0       , 0.04               , 185.90
jhash       , Lookup         , 16              , 1048576, 8                 , 0       , 0.08               , 201.35
jhash       , Lookup         , 16              , 1048576, 16                , 0       , 0.15               , 223.54

Ravi Kerur (1):
  Implement memcmp using AVX/SSE instructions.

 app/test/test_hash_perf.c                          |  36 +-
 .../common/include/arch/ppc_64/rte_memcmp.h        |  62 +++
 .../common/include/arch/x86/rte_memcmp.h           | 421 +++++++++++++++++++++
 lib/librte_eal/common/include/generic/rte_memcmp.h | 131 +++++++
 lib/librte_hash/rte_hash.c                         |  59 ++-
 5 files changed, 675 insertions(+), 34 deletions(-)
 create mode 100644 lib/librte_eal/common/include/arch/ppc_64/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/arch/x86/rte_memcmp.h
 create mode 100644 lib/librte_eal/common/include/generic/rte_memcmp.h

-- 
1.9.1


