[dpdk-dev] [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash

Thomas Monjalon thomas.monjalon at 6wind.com
Mon Jul 13 00:46:56 CEST 2015


2015-07-11 01:18, Pablo de Lara:
> This patchset is to replace the existing hash library with
> a more efficient and functional approach, using the Cuckoo hash
> method to deal with collisions. This method is based on using
> two different hash functions to have two possible locations
> in the hash table where an entry can be.
> So, if a bucket is full, a new entry can push one of the items
> in that bucket to its alternative location, making space for itself.
> 
> Advantages
> ~~
> - Offers the option to store more entries when the target bucket is full
>   (unlike the previous implementation)
> - Memory efficient: for storing those entries, it is not necessary to
>   request new memory, as the entries will be stored in the same table
> - Constant worst lookup time: in worst case scenario, it always takes
>   the same time to look up an entry, as there are only two possible locations
>   where an entry can be.
> - Storing data: user can store data in the hash table, unlike the
>   previous implementation, but he can still use the old API
> 
> This implementation typically offers over 90% utilization.
> Notice that API has been extended, but old API remains.
> Check documentation included to know more about this new implementation
> (including how entries are distributed as table utilization increases).
> 
> Changes in v7:
> - Fix inaccurate documentation
> 
> Changes in v6:
> - Replace datatype for functions from uintptr_t to void *
> 
> Changes in v5:
> - Fix 32-bit compilation issues
> 
> Changes in v4:
> - Unit tests enhancements are not part of this patchset anymore.
> - rte_hash structure has been made internal in another patch,
>   so it is not part of this patchset anymore.
> - Add function to iterate through the hash table, as rte_hash
>   structure has been made private.
> - Added extra_flag parameter in rte_hash_parameter to be able
>   to add new parameters in the future without breaking the ABI
> - Remove proposed lookup_bulk_with_hash function, as it is
>   not of much use with the existing hash functions
>   (there are no vector hash functions).
> - User can store 8-byte integer or pointer as data, instead
>   of variable size data, as discussed in the mailing list.
> 
> Changes in v3:
> 
> - Now user can store variable size data, instead of 32 or 64-bit size data,
>   using the new parameter "data_len" in rte_hash_parameters
> - Add lookup_bulk_with_hash function in performance  unit tests
> - Add new functions that handle data in performance unit tests
> - Remove duplicates in performance unit tests
> - Fix rte_hash_reset, which was not resetting the last entry
> 
> Changes in v2:
> 
> - Fixed issue where table could not store maximum number of entries
> - Fixed issue where lookup burst could not be more than 32 (instead of 64)
> - Remove unnecessary macros and add other useful ones
> - Added missing library dependencies
> - Used directly rte_hash_secondary instead of rte_hash_alt
> - Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library
> - Renamed test_hash_perf.c temporarily to ease the view of the improved unit test
> - Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to
>   make them private
> - Corrected copyright dates
> - Added an optimized function to compare keys that are multiple of 16 bytes
> - Improved the way to use primary/secondary signatures. Now both are stored in
>   the bucket, so there is no need to calculate them if required.
>   Also, there is no need to use the MSB of a signature to differenciate between
>   an empty entry and signature 0, since we are storing both signatures,
>   which cannot be both 0.
> - Removed rte_hash_rehash, as it was a very expensive operation.
>   Therefore, the add function returns now -ENOSPC if key cannot be added
>   because of a loop.
> - Prefetched new slot for new key in add function to improve performance.
> - Made doxygen comments more clear.
> - Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data,
>   as we can use the lookup functions if we want to get the data before deleting.
> - Removed some unnecessary includes in rte_hash.h
> - Removed some unnecessary variables in rte_cuckoo_hash.c
> - Removed some unnecessary checks before creating a new hash table
> - Added documentation (in release notes and programmers guide)
> - Added new unit tests and replaced the performance one for hash tables
> 
> Series Acked-by: Bruce Richardson <bruce.richardson at intel.com>

Applied, thanks

Some bugs/formatting were fixed in fly.
Some remaining comments may be addressed in further patches.


More information about the dev mailing list