[PATCH v2 2/3] ip_frag: improve reassembly lookup performance

Pavan Nikhilesh Bhagavatula pbhagavatula at marvell.com
Wed May 24 00:23:35 CEST 2023



> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula
> Sent: Tuesday, May 23, 2023 11:29 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli at arm.com>; Jerin Jacob
> Kollanukkaran <jerinj at marvell.com>; nd <nd at arm.com>; Konstantin
> Ananyev <konstantin.v.ananyev at yandex.ru>
> Cc: dev at dpdk.org; nd <nd at arm.com>; nd <nd at arm.com>
> Subject: RE: [PATCH v2 2/3] ip_frag: improve reassembly lookup
> performance
> 
> > > -----Original Message-----
> > > From: pbhagavatula at marvell.com <pbhagavatula at marvell.com>
> > > Sent: Tuesday, May 23, 2023 9:39 AM
> > > To: jerinj at marvell.com; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli at arm.com>; nd <nd at arm.com>; Konstantin
> > Ananyev
> > > <konstantin.v.ananyev at yandex.ru>
> > > Cc: dev at dpdk.org; Pavan Nikhilesh <pbhagavatula at marvell.com>
> > > Subject: [PATCH v2 2/3] ip_frag: improve reassembly lookup
> performance
> > >
> > > From: Pavan Nikhilesh <pbhagavatula at marvell.com>
> > >
> > > Improve reassembly lookup performance by using NEON intrinsics for key
> > > validation.
> > What is the improvement do you see with this?
> 
> On Neoverse-N2 I see around improvement of 300-600c per flow and ~200c
> per insert.
> 

Below data is incorrect due to a bug (See below), but I still see improvement with ipv6.

> Here are some test results.
> 
> Without patch:
> +=========================================================
> =================================================+
> | IPV4                            | Flow Count : 32768                                                     |
> +================+================+=============+=========
> ====+========================+===================+
> | Fragment Order | Fragments/Flow | Outstanding | Cycles/Flow |
> Cycles/Fragment insert | Cycles/Reassembly |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 0           | 1244        | 919                    | 114               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 2              | 0           | 1653        | 968                    | 128               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 3              | 0           | 1379        | 503                    | 110               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 3              | 0           | 1613        | 520                    | 139               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 0           | 2030        | 199                    | 190               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 8              | 0           | 4393        | 309                    | 402               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | RANDOM         | 0           | 1531        | 333                    | 147               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | RANDOM         | 0           | 2771        | 357                    | 213
> |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 100         | 1228        | 920                    | 102               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 500         | 1197        | 905                    | 103               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 1000        | 1183        | 904                    | 104               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 2000        | 1153        | 921                    | 105               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 3000        | 1123        | 911                    | 111               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 100         | 829         | 193                    | 690               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 500         | 830         | 195                    | 682               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 1000        | 817         | 211                    | 690               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 2000        | 819         | 195                    | 690               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 3000        | 823         | 223                    | 676               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 2              | 0           | 1765        | 1038                   | 177               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 3              | 0           | 2588        | 699                    | 190               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 8              | 0           | 5253        | 265                    | 403               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | RANDOM         | 0           | 3398        | 493                    | 301
> |
> +================+================+=============+=========
> ====+========================+===================+
> 
> +=========================================================
> =================================================+
> | IPV6                            | Flow Count : 32768                                                     |
> +================+================+=============+=========
> ====+========================+===================+
> | Fragment Order | Fragments/Flow | Outstanding | Cycles/Flow |
> Cycles/Fragment insert | Cycles/Reassembly |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 0           | 1838        | 1176                   | 136               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 2              | 0           | 1892        | 1188                   | 160               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 3              | 0           | 1986        | 628                    | 143               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 3              | 0           | 2670        | 646                    | 155               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 0           | 3152        | 261                    | 271               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 8              | 0           | 5127        | 324                    | 434               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | RANDOM         | 0           | 2169        | 427                    | 203               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | RANDOM         | 0           | 3382        | 452                    | 255
> |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 100         | 1837        | 1164                   | 124               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 500         | 1790        | 1158                   | 126               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 1000        | 1807        | 1161                   | 138               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 2000        | 1776        | 1160                   | 138               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 3000        | 1715        | 1169                   | 144               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 100         | 1488        | 256                    | 1228              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 500         | 1461        | 300                    | 1205              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 1000        | 1457        | 303                    | 1202              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 2000        | 1456        | 305                    | 1201              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 3000        | 1460        | 308                    | 1205              |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 2              | 0           | 2145        | 1330                   | 296               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 3              | 0           | 2778        | 830                    | 330               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 8              | 0           | 5715        | 324                    | 444               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | RANDOM         | 0           | 3625        | 550                    | 363
> |
> +================+================+=============+=========
> ====+========================+===================+
> 
> With patch :
> 
> +=========================================================
> =================================================+
> | IPV4                            | Flow Count : 32768                                                     |
> +================+================+=============+=========
> ====+========================+===================+
> | Fragment Order | Fragments/Flow | Outstanding | Cycles/Flow |
> Cycles/Fragment insert | Cycles/Reassembly |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 0           | 950         | 717                    | 98                |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 2              | 0           | 1013        | 706                    | 108               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 3              | 0           | 1096        | 397                    | 115               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 3              | 0           | 1150        | 412                    | 128               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 0           | 1783        | 166                    | 202               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 8              | 0           | 3933        | 284                    | 424               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | RANDOM         | 0           | 1288        | 267                    | 159               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | RANDOM         | 0           | 2393        | 302                    | 235
> |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 100         | 956         | 703                    | 110               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 500         | 937         | 693                    | 112               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 1000        | 912         | 670                    | 121               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 2000        | 908         | 688                    | 122               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 3000        | 894         | 688                    | 128               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 100         | 1019        | 179                    | 865               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 500         | 1052        | 176                    | 895               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 1000        | 1130        | 180                    | 1003              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 2000        | 1143        | 180                    | 1020              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 3000        | 1130        | 181                    | 985               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 2              | 0           | 1582        | 710                    | 168               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 3              | 0           | 2162        | 446                    | 194               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 8              | 0           | 4997        | 214                    | 426               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | RANDOM         | 0           | 2921        | 341                    | 311
> |
> +================+================+=============+=========
> ====+========================+===================+
> 
> +=========================================================
> =================================================+
> | IPV6                            | Flow Count : 32768                                                     |
> +================+================+=============+=========
> ====+========================+===================+
> | Fragment Order | Fragments/Flow | Outstanding | Cycles/Flow |
> Cycles/Fragment insert | Cycles/Reassembly |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 0           | 1275        | 687                    | 125               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 2              | 0           | 1335        | 721                    | 169               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 3              | 0           | 1388        | 415                    | 169               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 3              | 0           | 2117        | 393                    | 163               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 0           | 2811        | 172                    | 241               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | 8              | 0           | 4322        | 227                    | 401               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | RANDOM         | 0           | 1730        | 270                    | 192               |
> +================+================+=============+=========
> ====+========================+===================+
> | RANDOM         | RANDOM         | 0           | 2839        | 317                    | 264
> |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 100         | 1152        | 662                    | 126               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 500         | 1107        | 658                    | 130               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 1000        | 1190        | 647                    | 138               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 2000        | 1086        | 635                    | 141               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 2              | 3000        | 1064        | 645                    | 150               |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 100         | 1560        | 172                    | 1296              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 500         | 1536        | 226                    | 1274              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 1000        | 1543        | 228                    | 1282              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 2000        | 1548        | 228                    | 1287              |
> +================+================+=============+=========
> ====+========================+===================+
> | LINEAR         | 8              | 3000        | 1541        | 227                    | 1280              |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 2              | 0           | 1585        | 769                    | 281               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 3              | 0           | 2222        | 536                    | 327               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | 8              | 0           | 4962        | 232                    | 439               |
> +================+================+=============+=========
> ====+========================+===================+
> | INTERLEAVED    | RANDOM         | 0           | 2998        | 373                    | 360
> |
> +================+================+=============+=========
> ====+========================+===================+
> 
> >
> > >
> > > Signed-off-by: Pavan Nikhilesh <pbhagavatula at marvell.com>
> > > ---
> > >  lib/ip_frag/ip_frag_internal.c   | 224 +++++++++++++++++++++++++-----
> -
> > >  lib/ip_frag/ip_reassembly.h      |   6 +
> > >  lib/ip_frag/rte_ip_frag_common.c |  10 ++
> > >  3 files changed, 196 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/lib/ip_frag/ip_frag_internal.c b/lib/ip_frag/ip_frag_internal.c
> > index
> > > 7cbef647df..de78a0ed8f 100644
> > > --- a/lib/ip_frag/ip_frag_internal.c
> > > +++ b/lib/ip_frag/ip_frag_internal.c
> > > @@ -4,8 +4,9 @@
> > >
> > >  #include <stddef.h>
> > >
> > > -#include <rte_jhash.h>
> > >  #include <rte_hash_crc.h>
> > > +#include <rte_jhash.h>
> > > +#include <rte_vect.h>
> > >
> > >  #include "ip_frag_common.h"
> > >
> > > @@ -280,10 +281,166 @@ ip_frag_find(struct rte_ip_frag_tbl *tbl, struct
> > > rte_ip_frag_death_row *dr,
> > >  	return pkt;
> > >  }
> > >
> > > -struct ip_frag_pkt *
> > > -ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
> > > -	const struct ip_frag_key *key, uint64_t tms,
> > > -	struct ip_frag_pkt **free, struct ip_frag_pkt **stale)
> > > +static inline void
> > > +ip_frag_dbg(struct rte_ip_frag_tbl *tbl, struct ip_frag_pkt *p,
> > > +	    uint32_t list_idx, uint32_t list_cnt) {
> > > +	RTE_SET_USED(tbl);
> > > +	RTE_SET_USED(list_idx);
> > > +	RTE_SET_USED(list_cnt);
> > > +	if (p->key.key_len == IPV4_KEYLEN)
> > > +		IP_FRAG_LOG(DEBUG,
> > > +			    "%s:%d:\n"
> > > +			    "tbl: %p, max_entries: %u, use_entries: %u\n"
> > > +			    "ipv4_frag_pkt line0: %p, index: %u from %u\n"
> > > +			    "key: <%" PRIx64 ", %#x>, start: %" PRIu64 "\n",
> > > +			    __func__, __LINE__, tbl, tbl->max_entries,
> > > +			    tbl->use_entries, p, list_idx, list_cnt,
> > > +			    p->key.src_dst[0], p->key.id, p->start);
> > > +	else
> > > +		IP_FRAG_LOG(DEBUG,
> > > +			    "%s:%d:\n"
> > > +			    "tbl: %p, max_entries: %u, use_entries: %u\n"
> > > +			    "ipv6_frag_pkt line0: %p, index: %u from %u\n"
> > > +			    "key: <" IPv6_KEY_BYTES_FMT
> > > +			    ", %#x>, start: %" PRIu64 "\n",
> > > +			    __func__, __LINE__, tbl, tbl->max_entries,
> > > +			    tbl->use_entries, p, list_idx, list_cnt,
> > > +			    IPv6_KEY_BYTES(p1[i].key.src_dst), p->key.id,
> > > +			    p->start);
> > > +}
> > > +
> > > +#if defined(RTE_ARCH_ARM64)
> > > +static inline struct ip_frag_pkt *
> > > +ip_frag_lookup_neon(struct rte_ip_frag_tbl *tbl, const struct
> ip_frag_key
> > > *key, uint64_t tms,
> > > +		    struct ip_frag_pkt **free, struct ip_frag_pkt **stale) {
> > > +	struct ip_frag_pkt *empty, *old;
> > > +	struct ip_frag_pkt *p1, *p2;
> > > +	uint32_t assoc, sig1, sig2;
> > > +	uint64_t max_cycles;
> > > +
> > > +	empty = NULL;
> > > +	old = NULL;
> > > +
> > > +	max_cycles = tbl->max_cycles;
> > > +	assoc = tbl->bucket_entries;
> > > +
> > > +	if (tbl->last != NULL && ip_frag_key_cmp(key, &tbl->last->key) == 0)
> > > +		return tbl->last;
> > > +
> > > +	/* different hashing methods for IPv4 and IPv6 */
> > > +	if (key->key_len == IPV4_KEYLEN)
> > > +		ipv4_frag_hash(key, &sig1, &sig2);
> > > +	else
> > > +		ipv6_frag_hash(key, &sig1, &sig2);
> > > +
> > > +	p1 = IP_FRAG_TBL_POS(tbl, sig1);
> > > +	p2 = IP_FRAG_TBL_POS(tbl, sig2);
> > > +
> > > +	uint64x2_t key0, key1, key2, key3;
> > > +	uint64_t vmask, zmask, ts_mask;
> > > +	uint64x2_t ts0, ts1;
> > > +	uint32x4_t nz_key;
> > > +	uint8_t idx;
> > > +	/* Bucket entries are always power of 2. */
> > > +	rte_prefetch0(&p1[0].key);
> > > +	rte_prefetch0(&p1[1].key);
> > > +	rte_prefetch0(&p2[0].key);
> > > +	rte_prefetch0(&p2[1].key);
> > > +
> > > +	while (assoc > 1) {
> > > +		if (assoc > 2) {
> > > +			rte_prefetch0(&p1[2].key);
> > > +			rte_prefetch0(&p1[3].key);
> > > +			rte_prefetch0(&p2[2].key);
> > > +			rte_prefetch0(&p2[3].key);
> > > +		}
> > > +		struct ip_frag_pkt *p[] = {&p1[0], &p2[0], &p1[1], &p2[1]};
> > > +		key0 = vld1q_u64(&p[0]->key.id_key_len);
> > > +		key1 = vld1q_u64(&p[1]->key.id_key_len);
> > > +		key2 = vld1q_u64(&p[2]->key.id_key_len);
> > > +		key3 = vld1q_u64(&p[3]->key.id_key_len);
> > > +
> > > +		nz_key =
> > > vsetq_lane_u32(vgetq_lane_u32(vreinterpretq_u32_u64(key0), 1),
> > nz_key, 0);
> > > +		nz_key =
> > > vsetq_lane_u32(vgetq_lane_u32(vreinterpretq_u32_u64(key1), 1),
> > nz_key, 1);
> > > +		nz_key =
> > > vsetq_lane_u32(vgetq_lane_u32(vreinterpretq_u32_u64(key2), 1),
> > nz_key, 2);
> > > +		nz_key =
> > > vsetq_lane_u32(vgetq_lane_u32(vreinterpretq_u32_u64(key3),
> > > +1), nz_key, 3);
> > > +

I think we can compare id part too since its already in the vector register, I will rewrite this part.

> > > +		nz_key = vceqzq_u32(nz_key);
> > > +		zmask =
> > > vget_lane_u64(vreinterpret_u64_u16(vshrn_n_u32(nz_key, 16)), 0);
> > > +		vmask = ~zmask;
> > > +
> > > +		vmask &= 0x8000800080008000;
> > > +		for (; vmask > 0; vmask &= vmask - 1) {
> > > +			idx = __builtin_ctzll(vmask) >> 4;
> > > +			if (ip_frag_key_cmp(key, &p[idx]->key) == 0)
> > > +				return p[idx];
> > > +		}
> > > +
> > > +		vmask = ~zmask;
> > > +		if (zmask && empty == NULL) {
> > > +			zmask &= 0x8000800080008000;
> > > +			idx = __builtin_ctzll(zmask) >> 4;
> > > +			empty = p[idx];
> > > +		}
> > > +
> > > +		if (vmask && old == NULL) {
> > > +			const uint64x2_t max_cyc =
> > > vdupq_n_u64(max_cycles);
> > > +			const uint64x2_t cur_cyc = vdupq_n_u64(tms);
> > > +
> > > +			ts0 = vsetq_lane_u64(vgetq_lane_u64(key0, 1), ts0,
> > > 0);
> > > +			ts0 = vsetq_lane_u64(vgetq_lane_u64(key1, 1), ts0,
> > > 1);
> > > +			ts1 = vsetq_lane_u64(vgetq_lane_u64(key2, 1), ts1,
> > > 0);
> > > +			ts1 = vsetq_lane_u64(vgetq_lane_u64(key3, 1), ts1,
> > > 1);
> > > +
> > > +			ts0 = vcgtq_u64(cur_cyc, vaddq_u64(ts0, max_cyc));
> > > +			ts1 = vcgtq_u64(cur_cyc, vaddq_u64(ts1, max_cyc));
> > > +
> > > +			ts_mask =
> > > vget_lane_u64(vreinterpret_u64_u16(vshrn_n_u32(
> > > +
> > > 	vuzp1q_u32(vreinterpretq_u32_u64(ts0),
> > > +
> > > vreinterpretq_u32_u64(ts1)),
> > > +							16)),
> > > +						0);
> > > +			vmask &= 0x8000800080008000;
> > > +			ts_mask &= vmask;
> > > +			if (ts_mask) {
> > > +				idx = __builtin_ctzll(ts_mask) >> 4;
> > > +				old = p[idx];
> > > +			}
> > > +		}
> > > +		p1 += 2;
> > > +		p2 += 2;
> > > +		assoc -= 4;

Should be -=2

> > > +	}
> > > +	while (assoc) {
> > > +		if (ip_frag_key_cmp(key, &p1->key) == 0)
> > > +			return p1;
> > > +		else if (ip_frag_key_is_empty(&p1->key))
> > > +			empty = (empty == NULL) ? p1 : empty;
> > > +		else if (max_cycles + p1->start < tms)
> > > +			old = (old == NULL) ? p1 : old;
> > > +
> > > +		if (ip_frag_key_cmp(key, &p2->key) == 0)
> > > +			return p2;
> > > +		else if (ip_frag_key_is_empty(&p2->key))
> > > +			empty = (empty == NULL) ? p2 : empty;
> > > +		else if (max_cycles + p2->start < tms)
> > > +			old = (old == NULL) ? p2 : old;
> > > +		p1++;
> > > +		p2++;
> > > +		assoc--;
> > > +	}
> > > +
> > > +	*free = empty;
> > > +	*stale = old;
> > > +	return NULL;
> > > +}
> > > +#endif
> > > +
> > > +static struct ip_frag_pkt *
> > > +ip_frag_lookup_scalar(struct rte_ip_frag_tbl *tbl, const struct
> > ip_frag_key
> > > *key, uint64_t tms,
> > > +		      struct ip_frag_pkt **free, struct ip_frag_pkt **stale)
> > >  {
> > >  	struct ip_frag_pkt *p1, *p2;
> > >  	struct ip_frag_pkt *empty, *old;
> > > @@ -309,25 +466,7 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
> > >  	p2 = IP_FRAG_TBL_POS(tbl, sig2);
> > >
> > >  	for (i = 0; i != assoc; i++) {
> > > -		if (p1->key.key_len == IPV4_KEYLEN)
> > > -			IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > > -					"tbl: %p, max_entries: %u,
> > > use_entries: %u\n"
> > > -					"ipv4_frag_pkt line0: %p, index: %u
> > > from %u\n"
> > > -			"key: <%" PRIx64 ", %#x>, start: %" PRIu64 "\n",
> > > -					__func__, __LINE__,
> > > -					tbl, tbl->max_entries, tbl-
> > >use_entries,
> > > -					p1, i, assoc,
> > > -			p1[i].key.src_dst[0], p1[i].key.id, p1[i].start);
> > > -		else
> > > -			IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > > -					"tbl: %p, max_entries: %u,
> > > use_entries: %u\n"
> > > -					"ipv6_frag_pkt line0: %p, index: %u
> > > from %u\n"
> > > -			"key: <" IPv6_KEY_BYTES_FMT ", %#x>, start: %"
> > > PRIu64 "\n",
> > > -					__func__, __LINE__,
> > > -					tbl, tbl->max_entries, tbl-
> > >use_entries,
> > > -					p1, i, assoc,
> > > -			IPv6_KEY_BYTES(p1[i].key.src_dst), p1[i].key.id,
> > > p1[i].start);
> > > -
> > > +		ip_frag_dbg(tbl, &p1[i], i, assoc);
> > >  		if (ip_frag_key_cmp(key, &p1[i].key) == 0)
> > >  			return p1 + i;
> > >  		else if (ip_frag_key_is_empty(&p1[i].key))
> > > @@ -335,29 +474,11 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
> > >  		else if (max_cycles + p1[i].start < tms)
> > >  			old = (old == NULL) ? (p1 + i) : old;
> > >
> > > -		if (p2->key.key_len == IPV4_KEYLEN)
> > > -			IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > > -					"tbl: %p, max_entries: %u,
> > > use_entries: %u\n"
> > > -					"ipv4_frag_pkt line1: %p, index: %u
> > > from %u\n"
> > > -			"key: <%" PRIx64 ", %#x>, start: %" PRIu64 "\n",
> > > -					__func__, __LINE__,
> > > -					tbl, tbl->max_entries, tbl-
> > >use_entries,
> > > -					p2, i, assoc,
> > > -			p2[i].key.src_dst[0], p2[i].key.id, p2[i].start);
> > > -		else
> > > -			IP_FRAG_LOG(DEBUG, "%s:%d:\n"
> > > -					"tbl: %p, max_entries: %u,
> > > use_entries: %u\n"
> > > -					"ipv6_frag_pkt line1: %p, index: %u
> > > from %u\n"
> > > -			"key: <" IPv6_KEY_BYTES_FMT ", %#x>, start: %"
> > > PRIu64 "\n",
> > > -					__func__, __LINE__,
> > > -					tbl, tbl->max_entries, tbl-
> > >use_entries,
> > > -					p2, i, assoc,
> > > -			IPv6_KEY_BYTES(p2[i].key.src_dst), p2[i].key.id,
> > > p2[i].start);
> > > -
> > > +		ip_frag_dbg(tbl, &p2[i], i, assoc);
> > >  		if (ip_frag_key_cmp(key, &p2[i].key) == 0)
> > >  			return p2 + i;
> > >  		else if (ip_frag_key_is_empty(&p2[i].key))
> > > -			empty = (empty == NULL) ?( p2 + i) : empty;
> > > +			empty = (empty == NULL) ? (p2 + i) : empty;
> > >  		else if (max_cycles + p2[i].start < tms)
> > >  			old = (old == NULL) ? (p2 + i) : old;
> > >  	}
> > > @@ -366,3 +487,18 @@ ip_frag_lookup(struct rte_ip_frag_tbl *tbl,
> > >  	*stale = old;
> > >  	return NULL;
> > >  }
> > > +
> > > +struct ip_frag_pkt *
> > > +ip_frag_lookup(struct rte_ip_frag_tbl *tbl, const struct ip_frag_key
> *key,
> > > uint64_t tms,
> > > +	       struct ip_frag_pkt **free, struct ip_frag_pkt **stale) {
> > > +	switch (tbl->lookup_fn) {
> > > +#if defined(RTE_ARCH_ARM64)
> > > +	case REASSEMBLY_LOOKUP_NEON:
> > > +		return ip_frag_lookup_neon(tbl, key, tms, free, stale);
> > #endif
> > > +	case REASSEMBLY_LOOKUP_SCALAR:
> > > +	default:
> > > +		return ip_frag_lookup_scalar(tbl, key, tms, free, stale);
> > > +	}
> > > +}
> > > diff --git a/lib/ip_frag/ip_reassembly.h b/lib/ip_frag/ip_reassembly.h
> index
> > > ef9d8c0d75..049437ae32 100644
> > > --- a/lib/ip_frag/ip_reassembly.h
> > > +++ b/lib/ip_frag/ip_reassembly.h
> > > @@ -12,6 +12,11 @@
> > >
> > >  #include <rte_ip_frag.h>
> > >
> > > +enum ip_frag_lookup_func {
> > > +	REASSEMBLY_LOOKUP_SCALAR = 0,
> > > +	REASSEMBLY_LOOKUP_NEON,
> > > +};
> > > +
> > >  enum {
> > >  	IP_LAST_FRAG_IDX,    /* index of last fragment */
> > >  	IP_FIRST_FRAG_IDX,   /* index of first fragment */
> > > @@ -83,6 +88,7 @@ struct rte_ip_frag_tbl {
> > >  	struct ip_frag_pkt *last;     /* last used entry. */
> > >  	struct ip_pkt_list lru;       /* LRU list for table entries. */
> > >  	struct ip_frag_tbl_stat stat; /* statistics counters. */
> > > +	enum ip_frag_lookup_func lookup_fn;	/* hash table lookup
> > function.
> > > */
> > >  	__extension__ struct ip_frag_pkt pkt[]; /* hash table. */  };
> > >
> > > diff --git a/lib/ip_frag/rte_ip_frag_common.c
> > > b/lib/ip_frag/rte_ip_frag_common.c
> > > index c1de2e81b6..ef3c104e45 100644
> > > --- a/lib/ip_frag/rte_ip_frag_common.c
> > > +++ b/lib/ip_frag/rte_ip_frag_common.c
> > > @@ -5,7 +5,9 @@
> > >  #include <stddef.h>
> > >  #include <stdio.h>
> > >
> > > +#include <rte_cpuflags.h>
> > >  #include <rte_log.h>
> > > +#include <rte_vect.h>
> > >
> > >  #include "ip_frag_common.h"
> > >
> > > @@ -75,6 +77,14 @@ rte_ip_frag_table_create(uint32_t bucket_num,
> > > uint32_t bucket_entries,
> > >  	tbl->bucket_entries = bucket_entries;
> > >  	tbl->entry_mask = (tbl->nb_entries - 1) & ~(tbl->bucket_entries  - 1);
> > >
> > > +#if defined(RTE_ARCH_ARM64)
> > > +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_NEON) &&
> > > +	    rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_128)
> > > +		tbl->lookup_fn = REASSEMBLY_LOOKUP_NEON;
> > > +	else
> > > +#endif
> > > +		tbl->lookup_fn = REASSEMBLY_LOOKUP_SCALAR;
> > > +
> > >  	TAILQ_INIT(&(tbl->lru));
> > >  	return tbl;
> > >  }
> > > --
> > > 2.25.1



More information about the dev mailing list