[dpdk-dev] [PATCH v2] hash: fix scaling by reducing contention

Bruce Richardson bruce.richardson at intel.com
Fri Oct 30 14:52:34 CET 2015


On Fri, Oct 30, 2015 at 12:22:52PM +0000, Pablo de Lara wrote:
> From: "De Lara Guarch, Pablo" <pablo.de.lara.guarch at intel.com>
> 
> If using multiple cores on a system with hardware transactional
> memory support, thread scaling does not work, as there was a single
> point in the hash library which is a bottleneck for all threads,
> which is the "free_slots" ring, which stores all the indices of
> the free slots in the table.
> 
> This patch fixes the problem, by creating a local cache per logical core,
> which stores locally indices of free slots,
> so most times, writer threads will not interfere each other.
> 
> Fixes: 48a399119619 ("hash: replace with cuckoo hash implementation")
> 
> Signed-off-by: Pablo de Lara <pablo.de.lara.guarch at intel.com>
> ---
> 
> Changes in v2:
>  - Include patch dependency below
> 
> This patch depends on patch "hash: free internal ring when freeing hash"
> (http://www.dpdk.org/dev/patchwork/patch/7377/)
> 
>  app/test/test_hash_scaling.c         |   1 +
>  doc/guides/rel_notes/release_2_2.rst |   5 ++
>  lib/librte_hash/rte_cuckoo_hash.c    | 144 ++++++++++++++++++++++++++++++-----
>  lib/librte_hash/rte_hash.h           |   3 +
>  4 files changed, 133 insertions(+), 20 deletions(-)
> 
> diff --git a/app/test/test_hash_scaling.c b/app/test/test_hash_scaling.c
> index 39602cb..e7cb7e2 100644
> --- a/app/test/test_hash_scaling.c
> +++ b/app/test/test_hash_scaling.c
> @@ -133,6 +133,7 @@ test_hash_scaling(int locking_mode)
>  		.hash_func = rte_hash_crc,
>  		.hash_func_init_val = 0,
>  		.socket_id = rte_socket_id(),
> +		.extra_flag = 1 << RTE_HASH_TRANS_MEM_SUPPORT_FLAG,

This flag could do with having it's name adjusted, because as used here it's
not really a flag, but a shift value.

<snip>
>  
> +static inline void
> +enqueue_slot(const struct rte_hash *h, struct lcore_cache *cached_free_slots,
> +		void *slot_id)
> +{
> +	if (h->hw_trans_mem_support) {
> +		cached_free_slots->objs[cached_free_slots->len] = slot_id;
> +		cached_free_slots->len++;
> +	} else
> +		rte_ring_sp_enqueue(h->free_slots, slot_id);
> +}
> +

This function could do with a comment explaining that it's just used inside the
add function and is not a generic enqueue function as it assumes that we are just
putting back an element we have just dequeued, so there is guaranteed to be space
in the local cache for it. Otherwise the lack of support for a full cache is
a glaring omission. :-)

/Bruce


More information about the dev mailing list