[dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS

Vladimir Medvedkin medvedkinv at gmail.com
Fri May 8 16:58:56 CEST 2015


Hi Andrey,

OK, so be it. So if you want to distribute packets (or just calculate a hash
over a non-standard tuple), use your own tuple and your own hash key; the
lengths of the tuple and the key are the programmer's responsibility. If you
want to emulate NIC RSS, use union rte_thash_tuple (it still needs to be
updated with the input tuples of new NICs) and the NIC RSS hash key.
P.S. Thanks for the reviews.
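
To make the first case (own tuple, own key) concrete, here is a minimal
sketch of a length-agnostic Toeplitz hash along the lines Andrey suggested.
It is not part of the patch: the name toeplitz_hash(), its parameters and
its error convention are hypothetical and chosen only for illustration. It
assumes the input is passed as a byte stream in network (wire) order and
that the key is at least input_len + 4 bytes long.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of rte_thash.h.
 *
 * Computes the Toeplitz hash of input[0..input_len) using key[0..key_len).
 * Both buffers are treated as bit strings, most significant bit first.
 * Returns 0 on success and stores the hash in *hash_out; returns -1 if
 * the key is too short for the given input length.
 */
static int
toeplitz_hash(const uint8_t *input, size_t input_len,
	      const uint8_t *key, size_t key_len, uint32_t *hash_out)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t i;
	int b;

	/* Every input bit needs a full 32-bit key window behind it. */
	if (key_len < 4 || key_len - 4 < input_len)
		return -1;

	/* Start with the leftmost 32 bits of the key. */
	window = (uint32_t)key[0] << 24 | (uint32_t)key[1] << 16 |
		 (uint32_t)key[2] << 8 | (uint32_t)key[3];

	for (i = 0; i < input_len; i++) {
		for (b = 7; b >= 0; b--) {
			/* XOR in the current window for every set input bit. */
			if (input[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit to the left, pulling in
			 * the next key bit from key[i + 4]. */
			window = (window << 1) | ((key[i + 4] >> b) & 1u);
		}
	}

	*hash_out = hash;
	return 0;
}
```

With the 40-byte key from Microsoft's RSS verification suite and the 12-byte
IPv4/TCP input 66.9.149.187:2794 -> 161.142.100.80:1766 (source address,
destination address, source port, destination port, all in network order),
this sketch should give the published value 0x51ccc178. Because it works bit
by bit, it is meant as a reference for checking results, not as a fast path.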

Regards,
Vladimir

2015-05-07 14:38 GMT+03:00 Chilikin, Andrey <andrey.chilikin at intel.com>:

>  Hi Vladimir,
>
>
>
> Yes, at the moment NICs support limited input sets for hash calculation,
> but why limit SW to the same sets if it can be done in a more general way
> and be easily scalable for HW updates? Using a limited input set for RSS is
> not a feature of the Toeplitz hash, but a limitation of the HW. I believe
> that a general Toeplitz function will be more appropriate – it will cover
> the input sets currently supported by HW and will also be easily scalable
> for future HW. Also, talking about different NICs – Niantic and Fortville,
> for example, have hash keys of different lengths, so the rte_softrss()
> function should take the hash key’s length into account as well.
>
>  Regards,
>
> Andrey
>
>
>
>
>
> *From:* Vladimir Medvedkin [mailto:medvedkinv at gmail.com]
> *Sent:* Thursday, May 7, 2015 11:28 AM
> *To:* Chilikin, Andrey
> *Cc:* dev at dpdk.org
> *Subject:* Re: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by
> RSS
>
>
>
> Hi Andrey,
>
> The main goal of these new functions is to calculate a hash that is
> equal to the hash computed by the NIC.
> According to the XL710 datasheet, table 7-5, the input set for SCTP
> consists of IP4-S, IP4-D, SCTP-Verification-Tag. I don't see any NIC that
> uses QinQ or a single VLAN tag, IP proto number, tunnel ID, VXLAN, etc. for
> calculating the RSS hash. If one appears, we can always update union
> rte_thash_tuple.
> I think it should be like:
>
> struct rte_ports {
>         uint16_t dport;
>         uint16_t sport;
> };
>
> union rte_thash_l4 {
>         struct          rte_ports ports;
>         uint32_t        sctp_tag;
> };
> struct rte_ipv4_tuple {
>         uint32_t        src_addr;
>         uint32_t        dst_addr;
>         union rte_thash_l4 l4;
> };
>
> If it is necessary to distribute packets according to non-standard tuples,
> I think it's more appropriate to use crc32 or jhash because of speed.
> rte_softrss_be consumes 400-500 clocks for each 4-byte input on an E3
> 1230v1 at 3.2GHz. This means that for ipv4+tcp it consumes ~1500 clocks.
>
> If you or someone else still thinks a general Toeplitz hash is needed,
> I'll add it.
>
> Regards,
>
> Vladimir
>
>
>
>
>
> 2015-05-05 19:03 GMT+03:00 Chilikin, Andrey <andrey.chilikin at intel.com>:
>
> Hi Vladimir,
>
> Why limit Toeplitz hash calculation to predefined tuples and length?
> Should it be more general, something like
> rte_softrss_be(void *input, uint32_t input_len, const uint8_t *rss_key) to
> enable hash calculation for an input of any size? It would be useful for
> distributing packets using some non-standard tuples, like hashing on QinQ
> or adding IP protocol to hash calculation to separate UDP and TCP flows or
> even some other fields from a packet, for example, tunnel ID from VXLAN
> headers. By the way, i40e already supports RSS for SCTP in addition to TCP
> and UDP and includes Verification Tag as well as SCTP source and
> destination ports for RSS hash.
>
> Regards,
> Andrey
>
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladimir
> > Medvedkin
> > Sent: Tuesday, May 5, 2015 2:20 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH v2] Add toeplitz hash algorithm used by RSS
> >
> > Software implementation of the Toeplitz hash function used by RSS.
> > Can be used either for packet distribution on single queue NIC or for
> > simulating of RSS computation on specific NIC (for example after GRE header
> > decapsulating).
> >
> > v2 changes
> > - Add ipv6 support
> > - Various style fixes
> >
> > Signed-off-by: Vladimir Medvedkin <medvedkinv at gmail.com>
> > ---
> >  lib/librte_hash/Makefile    |   1 +
> >  lib/librte_hash/rte_thash.h | 209 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 210 insertions(+)
> >  create mode 100644 lib/librte_hash/rte_thash.h
> >
> > diff --git a/lib/librte_hash/Makefile b/lib/librte_hash/Makefile
> > index 3696cb1..981230b 100644
> > --- a/lib/librte_hash/Makefile
> > +++ b/lib/librte_hash/Makefile
> > @@ -49,6 +49,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_HASH) += rte_fbk_hash.c
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include := rte_hash.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_hash_crc.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_jhash.h
> > +SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_thash.h
> >  SYMLINK-$(CONFIG_RTE_LIBRTE_HASH)-include += rte_fbk_hash.h
> >
> >  # this lib needs eal
> > diff --git a/lib/librte_hash/rte_thash.h b/lib/librte_hash/rte_thash.h
> > new file mode 100644
> > index 0000000..42c7bf6
> > --- /dev/null
> > +++ b/lib/librte_hash/rte_thash.h
> > @@ -0,0 +1,209 @@
> > +/*-
> > + *   BSD LICENSE
> > + *
> > + *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
> > + *   All rights reserved.
> > + *
> > + *   Redistribution and use in source and binary forms, with or without
> > + *   modification, are permitted provided that the following conditions
> > + *   are met:
> > + *
> > + *     * Redistributions of source code must retain the above copyright
> > + *       notice, this list of conditions and the following disclaimer.
> > + *     * Redistributions in binary form must reproduce the above copyright
> > + *       notice, this list of conditions and the following disclaimer in
> > + *       the documentation and/or other materials provided with the
> > + *       distribution.
> > + *     * Neither the name of Intel Corporation nor the names of its
> > + *       contributors may be used to endorse or promote products derived
> > + *       from this software without specific prior written permission.
> > + *
> > + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> > + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> > + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> > + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> > + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> > + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> > + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> > + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> > + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> > + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> > + */
> > +
> > +#ifndef _RTE_THASH_H
> > +#define _RTE_THASH_H
> > +
> > +/**
> > + * @file
> > + *
> > + * toeplitz hash functions.
> > + */
> > +
> > +#ifdef __cplusplus
> > +extern "C" {
> > +#endif
> > +
> > +/**
> > + * Software implementation of the Toeplitz hash function used by RSS.
> > + * Can be used either for packet distribution on single queue NIC
> > + * or for simulating of RSS computation on specific NIC (for example
> > + * after GRE header decapsulating)
> > + */
> > +
> > +#include <stdint.h>
> > +#include <rte_byteorder.h>
> > +#include <rte_vect.h>
> > +
> > +#ifdef __SSE3__
> > +static const __m128i bswap_mask = {0x0405060700010203,
> > +                                     0x0C0D0E0F08090A0B};
> > +#endif
> > +
> > +enum rte_thash_len {
> > +     RTE_THASH_V4_L3 = 2,    /*calculate hash of ipv4 header only*/
> > +     RTE_THASH_V4_L4 = 3,    /*calculate hash of ipv4 + transport headers*/
> > +     RTE_THASH_V6_L3 = 8,    /*calculate hash of ipv6 header only */
> > +     RTE_THASH_V6_L4 = 9,    /*calculate hash of ipv6 + transport headers */
> > +};
> > +
> > +/**
> > + * IPv4 tuple
> > + * addresses and ports have to be CPU byte order
> > + */
> > +struct rte_ipv4_tuple {
> > +     uint32_t        src_addr;
> > +     uint32_t        dst_addr;
> > +     uint16_t        dport;
> > +     uint16_t        sport;
> > +};
> > +
> > +/**
> > + * IPv6 tuple
> > + * Addresses have to be filled by rte_thash_load_v6_addr()
> > + * ports have to be CPU byte order
> > + */
> > +struct rte_ipv6_tuple {
> > +     uint8_t         src_addr[16];
> > +     uint8_t         dst_addr[16];
> > +     uint16_t        dport;
> > +     uint16_t        sport;
> > +};
> > +
> > +union rte_thash_tuple {
> > +     struct rte_ipv4_tuple   v4;
> > +     struct rte_ipv6_tuple   v6;
> > +} __attribute__((aligned(16)));
> > +
> > +/**
> > + * Prepare special converted key to use with rte_softrss_be()
> > + * @param orig
> > + *   pointer to original RSS key
> > + * @param targ
> > + *   pointer to target RSS key
> > + * @param len
> > + *   RSS key length
> > + */
> > +static inline void
> > +rte_convert_rss_key(const uint32_t *orig, uint32_t *targ, int len)
> > +{
> > +     int i;
> > +
> > +     for (i = 0; i < (len >> 2); i++) {
> > +             targ[i] = rte_be_to_cpu_32(orig[i]);
> > +     }
> > +}
> > +
> > +/**
> > + * Prepare and load IPv6 address
> > + * @param orig
> > + *   Pointer to ipv6 address inside ipv6_hdr
> > + * @param targ
> > + *   Pointer to ipv6 address inside rte_ipv6_tuple
> > + */
> > +static inline void
> > +rte_thash_load_v6_addr(const uint8_t *orig, uint8_t *targ)
> > +{
> > +#ifdef __SSE3__
> > +     __m128i ipv6 = _mm_loadu_si128((const __m128i *)orig);
> > +     *(__m128i *)targ = _mm_shuffle_epi8(ipv6, bswap_mask);
> > +#else
> > +     int i;
> > +
> > +     for (i = 0; i < 4; i++) {
> > +             *((uint32_t *)targ + i) =
> > +                     rte_be_to_cpu_32(*((const uint32_t *)orig + i));
> > +     }
> > +#endif
> > +}
> > +
> > +/**
> > + * Generic implementation. Can be used with original rss_key
> > + * @param input_tuple
> > + *   Pointer to rte_thash_tuple union
> > + * @param input_len
> > + *   Length of input_tuple in 4-byte chunks
> > + *   RTE_THASH_V4_L3: calculate hash of IPv4 src address and IPv4 dst address
> > + *   RTE_THASH_V4_L4: calculate hash of IPv4 addresses and TCP|UDP ports
> > + *   RTE_THASH_V6_L3: calculate hash of IPv6 src address and IPv6 dst address
> > + *   RTE_THASH_V6_L4: calculate hash of IPv6 addresses and TCP|UDP ports
> > + * @param rss_key
> > + *   Pointer to RSS hash key.
> > + * @return
> > + *   Calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_softrss(union rte_thash_tuple *input_tuple, enum rte_thash_len input_len,
> > +             const uint8_t *rss_key)
> > +{
> > +     uint32_t i, j, ret = 0;
> > +
> > +     for (j = 0; j < input_len; j++) {
> > +             for (i = 0; i < 32; i++) {
> > +                     if (((uint32_t *)input_tuple)[j] & (1 << (31 - i))) {
> > +                             ret ^= rte_cpu_to_be_32(((const uint32_t *)rss_key)[j]) << i |
> > +                                     (uint32_t)((uint64_t)(rte_cpu_to_be_32(((const uint32_t *)rss_key)[j + 1])) >> (32 - i));
> > +                     }
> > +             }
> > +     }
> > +     return ret;
> > +}
> > +
> > +/**
> > + * Optimized implementation.
> > + * If you want the calculated hash value to match the NIC RSS value,
> > + * you have to use a specially converted key.
> > + * @param input_tuple
> > + *   Pointer to rte_thash_tuple union
> > + * @param input_len
> > + *   Length of input_tuple in 4-byte chunks
> > + *   RTE_THASH_V4_L3: calculate hash of IPv4 src address and IPv4 dst address
> > + *   RTE_THASH_V4_L4: calculate hash of IPv4 addresses and TCP|UDP ports
> > + *   RTE_THASH_V6_L3: calculate hash of IPv6 src address and IPv6 dst address
> > + *   RTE_THASH_V6_L4: calculate hash of IPv6 addresses and TCP|UDP ports
> > + * @param rss_key
> > + *   Pointer to RSS hash key.
> > + * @return
> > + *   Calculated hash value.
> > + */
> > +static inline uint32_t
> > +rte_softrss_be(union rte_thash_tuple *input_tuple, enum rte_thash_len input_len,
> > +             const uint8_t *rss_key)
> > +{
> > +     uint32_t i, j, ret = 0;
> > +
> > +     for (j = 0; j < input_len; j++) {
> > +             for (i = 0; i < 32; i++) {
> > +                     if (((uint32_t *)input_tuple)[j] & (1 << (31 - i))) {
> > +                             ret ^= ((const uint32_t *)rss_key)[j] << i |
> > +                                     (uint32_t)((uint64_t)(((const uint32_t *)rss_key)[j + 1]) >> (32 - i));
> > +                     }
> > +             }
> > +     }
> > +     return ret;
> > +}
> > +
> > +#ifdef __cplusplus
> > +}
> > +#endif
> > +
> > +#endif /* _RTE_THASH_H */
> > --
> > 1.8.3.2
>
>
>

