[PATCH] hash: fix SSE comparison

Bruce Richardson bruce.richardson at intel.com
Mon Oct 2 12:39:54 CEST 2023


On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In the original SSE2 implementation of compare_signatures, the
> code uses _mm_movemask_epi8 to create a mask from the MSB of each 8-bit
> element, whereas we should only care about the MSB of the lower 8 bits
> of each 16-bit element.
> For example, if the comparison result is all-equal, the SSE2 path
> returns 0xFFFF while the NEON and default scalar paths return 0x5555.
> Although this bug has no negative effect, since the caller examines
> only the trailing zeros of each match mask, we recommend this fix to
> keep the behaviour consistent with the NEON and default scalar code.
> 
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang at intel.com
> Cc: stable at dpdk.org
> 
> Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>

Fix looks correct, but see comment below. I think we can convert the vector
mask to a simpler - and possibly faster - scalar one.

/Bruce

> ---
>  lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
>  	/* For match mask the first bit of every two bits indicates the match */
>  	switch (sig_cmp_fn) {
>  #if defined(__SSE2__)
> -	case RTE_HASH_COMPARE_SSE:
> +	case RTE_HASH_COMPARE_SSE: {
>  		/* Compare all signatures in the bucket */
> -		*prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i shift_mask = _mm_set1_epi16(0x0080);

Not sure that this variable name is the most descriptive, as we don't
actually shift anything using it. How about "results_mask"?

> +		__m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)prim_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));

While this will work as you describe, I think the simpler solution here
is not to apply a vector mask, but a scalar one instead. This would save
the extra vector loads too, since each movemask result could just be
masked with the compile-time constant 0x5555.

>  		/* Compare all signatures in the bucket */
> -		*sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> -				_mm_load_si128(
> +		__m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
>  					(__m128i const *)sec_bkt->sig_current),
> -				_mm_set1_epi16(sig)));
> +					_mm_set1_epi16(sig));
> +		*sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> +		}
>  		break;
>  #elif defined(__ARM_NEON)
>  	case RTE_HASH_COMPARE_NEON: {
> -- 
> 2.25.1
> 

