[PATCH] hash: fix SSE comparison
Bruce Richardson
bruce.richardson at intel.com
Mon Oct 2 12:39:54 CEST 2023
On Wed, Sep 06, 2023 at 10:31:00AM +0800, Jieqiang Wang wrote:
> _mm_cmpeq_epi16 returns 0xFFFF if the corresponding 16-bit elements are
> equal. In the original SSE2 implementation of compare_signatures,
> _mm_movemask_epi8 is used to create a mask from the MSB of each 8-bit
> element, whereas we should only care about the MSB of the lower 8-bit
> byte in each 16-bit element.
> For example, if the comparison result is all-equal, the SSE2 path returns
> 0xFFFF while the NEON and default scalar paths return 0x5555.
> Although this bug causes no negative effects, since the caller
> solely examines the trailing zeros of each match mask, we
> recommend this fix to keep the behavior consistent with the NEON and
> default scalar code.
>
> Fixes: c7d93df552c2 ("hash: use partial-key hashing")
> Cc: yipeng1.wang at intel.com
> Cc: stable at dpdk.org
>
> Signed-off-by: Feifei Wang <feifei.wang2 at arm.com>
> Signed-off-by: Jieqiang Wang <jieqiang.wang at arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang at arm.com>
Fix looks correct, but see comment below. I think we can convert the vector
mask to a simpler - and possibly faster - scalar one.
/Bruce
> ---
> lib/hash/rte_cuckoo_hash.c | 16 +++++++++-------
> 1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index d92a903bb3..acaa8b74bd 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -1862,17 +1862,19 @@ compare_signatures(uint32_t *prim_hash_matches, uint32_t *sec_hash_matches,
> /* For match mask the first bit of every two bits indicates the match */
> switch (sig_cmp_fn) {
> #if defined(__SSE2__)
> - case RTE_HASH_COMPARE_SSE:
> + case RTE_HASH_COMPARE_SSE: {
> /* Compare all signatures in the bucket */
> - *prim_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> - _mm_load_si128(
> + __m128i shift_mask = _mm_set1_epi16(0x0080);
Not sure that this variable name is the most descriptive, as we don't
actually shift anything using this. How about "results_mask"?
> + __m128i prim_cmp = _mm_cmpeq_epi16(_mm_load_si128(
> (__m128i const *)prim_bkt->sig_current),
> - _mm_set1_epi16(sig)));
> + _mm_set1_epi16(sig));
> + *prim_hash_matches = _mm_movemask_epi8(_mm_and_si128(prim_cmp, shift_mask));
While this will work as you describe, I think the simpler solution
here is not to use a vector mask, but to simply apply a scalar one.
That would also save the extra vector load, since the movemask results
could just be masked with the compile-time constant 0x5555.
> /* Compare all signatures in the bucket */
> - *sec_hash_matches = _mm_movemask_epi8(_mm_cmpeq_epi16(
> - _mm_load_si128(
> + __m128i sec_cmp = _mm_cmpeq_epi16(_mm_load_si128(
> (__m128i const *)sec_bkt->sig_current),
> - _mm_set1_epi16(sig)));
> + _mm_set1_epi16(sig));
> + *sec_hash_matches = _mm_movemask_epi8(_mm_and_si128(sec_cmp, shift_mask));
> + }
> break;
> #elif defined(__ARM_NEON)
> case RTE_HASH_COMPARE_NEON: {
> --
> 2.25.1
>