[dpdk-dev] [PATCH v7 0/7] Cuckoo hash - part 3 of Cuckoo hash
Thomas Monjalon
thomas.monjalon at 6wind.com
Mon Jul 13 00:46:56 CEST 2015
2015-07-11 01:18, Pablo de Lara:
> This patchset is to replace the existing hash library with
> a more efficient and functional approach, using the Cuckoo hash
> method to deal with collisions. This method is based on using
> two different hash functions to have two possible locations
> in the hash table where an entry can be.
> So, if a bucket is full, a new entry can push one of the items
> in that bucket to its alternative location, making space for itself.
>
> Advantages
> ~~
> - Offers the option to store more entries when the target bucket is full
> (unlike the previous implementation)
> - Memory efficient: for storing those entries, it is not necessary to
> request new memory, as the entries will be stored in the same table
> - Constant worst lookup time: in worst case scenario, it always takes
> the same time to look up an entry, as there are only two possible locations
> where an entry can be.
> - Storing data: user can store data in the hash table, unlike the
> previous implementation, but he can still use the old API
>
> This implementation typically offers over 90% utilization.
> Notice that API has been extended, but old API remains.
> Check documentation included to know more about this new implementation
> (including how entries are distributed as table utilization increases).
>
> Changes in v7:
> - Fix inaccurate documentation
>
> Changes in v6:
> - Replace datatype for functions from uintptr_t to void *
>
> Changes in v5:
> - Fix 32-bit compilation issues
>
> Changes in v4:
> - Unit tests enhancements are not part of this patchset anymore.
> - rte_hash structure has been made internal in another patch,
> so it is not part of this patchset anymore.
> - Add function to iterate through the hash table, as rte_hash
> structure has been made private.
> - Added extra_flag parameter in rte_hash_parameter to be able
> to add new parameters in the future without breaking the ABI
> - Remove proposed lookup_bulk_with_hash function, as it is
> not of much use with the existing hash functions
> (there are no vector hash functions).
> - User can store 8-byte integer or pointer as data, instead
> of variable size data, as discussed in the mailing list.
>
> Changes in v3:
>
> - Now user can store variable size data, instead of 32 or 64-bit size data,
> using the new parameter "data_len" in rte_hash_parameters
> - Add lookup_bulk_with_hash function in performance unit tests
> - Add new functions that handle data in performance unit tests
> - Remove duplicates in performance unit tests
> - Fix rte_hash_reset, which was not resetting the last entry
>
> Changes in v2:
>
> - Fixed issue where table could not store maximum number of entries
> - Fixed issue where lookup burst could not be more than 32 (instead of 64)
> - Remove unnecessary macros and add other useful ones
> - Added missing library dependencies
> - Used directly rte_hash_secondary instead of rte_hash_alt
> - Renamed rte_hash.c to rte_cuckoo_hash.c to ease the view of the new library
> - Renamed test_hash_perf.c temporarily to ease the view of the improved unit test
> - Moved rte_hash, rte_bucket and rte_hash_key structures to rte_cuckoo_hash.c to
> make them private
> - Corrected copyright dates
> - Added an optimized function to compare keys that are multiple of 16 bytes
> - Improved the way to use primary/secondary signatures. Now both are stored in
> the bucket, so there is no need to calculate them if required.
> Also, there is no need to use the MSB of a signature to differenciate between
> an empty entry and signature 0, since we are storing both signatures,
> which cannot be both 0.
> - Removed rte_hash_rehash, as it was a very expensive operation.
> Therefore, the add function returns now -ENOSPC if key cannot be added
> because of a loop.
> - Prefetched new slot for new key in add function to improve performance.
> - Made doxygen comments more clear.
> - Removed unnecessary rte_hash_del_key_data and rte_hash_del_key_with_data,
> as we can use the lookup functions if we want to get the data before deleting.
> - Removed some unnecessary includes in rte_hash.h
> - Removed some unnecessary variables in rte_cuckoo_hash.c
> - Removed some unnecessary checks before creating a new hash table
> - Added documentation (in release notes and programmers guide)
> - Added new unit tests and replaced the performance one for hash tables
>
> Series Acked-by: Bruce Richardson <bruce.richardson at intel.com>
Applied, thanks
Some bugs/formatting were fixed in fly.
Some remaining comments may be addressed in further patches.
More information about the dev
mailing list