[dpdk-dev] [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support

Hu, Jiayu jiayu.hu at intel.com
Fri Jun 30 17:40:20 CEST 2017


Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, June 30, 2017 8:07 PM
> To: Hu, Jiayu <jiayu.hu at intel.com>
> Cc: dev at dpdk.org; Tan, Jianfeng <jianfeng.tan at intel.com>;
> stephen at networkplumber.org; yliu at fridaylinux.org; Wu, Jingjing
> <jingjing.wu at intel.com>; Yao, Lei A <lei.a.yao at intel.com>; Wiles, Keith
> <keith.wiles at intel.com>; Bie, Tiwei <tiwei.bie at intel.com>
> Subject: RE: [PATCH v7 2/3] lib/gro: add TCP/IPv4 GRO support
> 
> Hi Jiayu,
> 
> > > > +
> > > > +int32_t
> > > > +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> > > > +		struct gro_tcp_tbl *tbl,
> > > > +		uint32_t max_packet_size)
> > > > +{
> > > > +	struct ether_hdr *eth_hdr;
> > > > +	struct ipv4_hdr *ipv4_hdr;
> > > > +	struct tcp_hdr *tcp_hdr;
> > > > +	uint16_t ipv4_ihl, tcp_hl, tcp_dl;
> > > > +
> > > > +	struct tcp_key key;
> > > > +	uint32_t cur_idx, prev_idx, item_idx;
> > > > +	uint32_t i, key_idx;
> > > > +
> > > > +	eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> > > > +	ipv4_hdr = (struct ipv4_hdr *)(eth_hdr + 1);
> > > > +	ipv4_ihl = IPv4_HDR_LEN(ipv4_hdr);
> > > > +
> > > > +	/* check if the packet should be processed */
> > > > +	if (ipv4_ihl < sizeof(struct ipv4_hdr))
> > > > +		goto fail;
> > > > +	tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + ipv4_ihl);
> > > > +	tcp_hl = TCP_HDR_LEN(tcp_hdr);
> > > > +	tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - ipv4_ihl
> > > > +		- tcp_hl;
> > > > +	if (tcp_dl == 0)
> > > > +		goto fail;
> > > > +
> > > > +	/* find a key and traverse all packets in its item group */
> > > > +	key.eth_saddr = eth_hdr->s_addr;
> > > > +	key.eth_daddr = eth_hdr->d_addr;
> > > > +	key.ip_src_addr[0] = rte_be_to_cpu_32(ipv4_hdr->src_addr);
> > > > +	key.ip_dst_addr[0] = rte_be_to_cpu_32(ipv4_hdr->dst_addr);
> > >
> > > Your key.ip_src_addr[1-3] still contains some junk.
> > > How memcmp below supposed to worj properly?
> >
> > When allocate an item, we already guarantee the content of its
> > memory space is 0. So memcpy won't be error.
> 
> key is allocated on the stack.
> Obviously fileds that are not initialized manually will contain undefined values,
> i.e.: ip_src-addr[1-3].
> Then below you are doing:
> memcp((&(tbl->keys[i].key), &key, sizeof(struct tcp_key));
> ...
> memcpy(&(tbl->keys[key_idx].key), &key, sizeof(struct tcp_key));
> 
> So I still think you are comparing/copying some junk here.

Oh, yes. Key is allocated in stack. Thanks, I will modify it.

> 
> >
> > > BTW why do you need 4 elems here, why just not uint32_t ip_src_addr;?
> > > Same for ip_dst_addr.
> >
> > I think tcp6 and tcp4 can share the same table structure. So I use
> > 128bit IP address here. You mean we need to use different structures
> > for tcp4 and tcp6?
> 
> That would be my preference.

Got it. I will modify it. Thanks.

> 
> >
> > >
> > > > +	key.src_port = rte_be_to_cpu_16(tcp_hdr->src_port);
> > > > +	key.dst_port = rte_be_to_cpu_16(tcp_hdr->dst_port);
> > > > +	key.recv_ack = rte_be_to_cpu_32(tcp_hdr->recv_ack);
> > > > +	key.tcp_flags = tcp_hdr->tcp_flags;
> > > > +
> > > > +	for (i = 0; i < tbl->max_key_num; i++) {
> > > > +		if (tbl->keys[i].is_valid &&
> > > > +				(memcmp(&(tbl->keys[i].key), &key,
> > > > +						sizeof(struct tcp_key))
> > > > +				 == 0)) {
> > > > +			cur_idx = tbl->keys[i].start_index;
> > > > +			prev_idx = cur_idx;
> > > > +			while (cur_idx != INVALID_ARRAY_INDEX) {
> > > > +				if (check_seq_option(tbl->items[cur_idx].pkt,
> > > > +							tcp_hdr,
> > > > +							tcp_hl) > 0) {
> > >
> > > As I remember linux gro also check ipv4 packet_id - it should be
> consecutive.
> >
> > IP fragmented packets have the same IP ID, but others are consecutive.
> 
> Yes, you assume that they are consecutive.
> But the only way to know for sure that they are - check it.

Yes, I will modify it. Thanks.

> Another thing - I think we need to perform GRO only for TCP packets with
> only ACK bit set
> (i.e. - no GRO for FIN/SYN/PUSH/URG/, etc.).

Currently, we don't process packets whose payload length is 0. So if the packets
are SYN or FIN, we won't process them, since their payload length is 0. For URG,
PSH and RST, TCP/IPv4 GRO may still process these packets.

You are right. We shouldn't process packets whose URG, PSH or RST bit is set.
Thanks, I will modify it. 

> 
> > As we  suppose GRO can merge IP fragmented packets, so I think we
> shouldn't check if
> > the IP ID is consecutive here. How do you think?
> >
> > >
> > > > +					if (merge_two_tcp4_packets(
> > > > +								tbl-
> >items[cur_idx].pkt,
> > > > +								pkt,
> > > > +
> 	max_packet_size) > 0) {
> > > > +						/* successfully merge two
> packets */
> > > > +						tbl-
> >items[cur_idx].is_groed = 1;
> > > > +						return 1;
> > > > +					}
> > >
> > > If you allow more then packet per flow to be stored in the table, then you
> should be
> > > prepared that new segment can fill a gap between 2 packets.
> > > Probably the easiest thing - don't allow more then one 'item' per flow.
> >
> > We allow the table to store same flow but out-of-order arriving packets.
> For
> > these packets, they will occupy different 'item' and we won't re-merge
> them.
> > For example, there are three same flow tcp packets: p1, p2 and p3. And p1
> arrives
> > first, then p3, and last is p2. So TCP GRO will allocate one 'item' for p1 and
> one
> > 'item' for p3, and when p2 arrives, p2 will be merged with p1. Therefore, in
> the
> > table, we will have two 'item': item1 to store merged p1 and p2, item2 to
> store p3.
> >
> > As you can see, TCP GRO can only merges sequential arriving packets. If we
> want to
> > merge all out-of-order arriving packets, we need to re-process these
> packets which
> > are already processed and have one 'item'. IMO, this procedure will be very
> complicated.
> > So we don't do that.
> >
> > Sorry, I don't understand how to allow one 'item' per-flow. Because
> packets are arriving
> > out-of-order. If we don't re-process these packets which already have one
> 'item', how to
> > guarantee it?
> 
> As I understand you'll have an input array:
> <seq=2, seq=1> - you wouldn't be able to merge it.
> So I think your merge need be prepared to merge both smaller and bigger
> sequence.

Oh yes, it's much better. I will modify it.

> About one item my thought : instead of allowing N items per key(flow) - for
> simplicity
> just allow one item per flow.
> In that case we wouldn't allow holes, but still would be able to merge
> reordered packets.
> Alternatively you can store items ordered by seq, and after merge, try to
> merge neighbor
> Items too.

Yes, when we insert a new item, we can chain it with the packets of its item
group ordered by seq. After processing all packets, we need to traverse each
item group and try to merge neighbors. But when will 'merge neighbors' happen?
When flush packets from the table? (e.g.  gro_tcp_tbl_timeout_flush)

BRs,
Jiayu
> 
> >
> > >
> > > > +					/**
> > > > +					 * fail to merge two packets since
> > > > +					 * it's beyond the max packet length.
> > > > +					 * Insert it into the item group.
> > > > +					 */
> > > > +					goto insert_to_item_group;
> > > > +				} else {
> > > > +					prev_idx = cur_idx;
> > > > +					cur_idx = tbl-
> >items[cur_idx].next_pkt_idx;
> > > > +				}
> > > > +			}
> > > > +			/**
> > > > +			 * find a corresponding item group but fails to find
> > > > +			 * one packet to merge. Insert it into this item group.
> > > > +			 */
> > > > +insert_to_item_group:
> > > > +			item_idx = find_an_empty_item(tbl);
> > > > +			/* the item number is greater than the max value */
> > > > +			if (item_idx == INVALID_ARRAY_INDEX)
> > > > +				return -1;
> > > > +			tbl->items[prev_idx].next_pkt_idx = item_idx;
> > > > +			tbl->items[item_idx].pkt = pkt;
> > > > +			tbl->items[item_idx].is_groed = 0;
> > > > +			tbl->items[item_idx].next_pkt_idx =
> INVALID_ARRAY_INDEX;
> > > > +			tbl->items[item_idx].is_valid = 1;
> > > > +			tbl->items[item_idx].start_time = rte_rdtsc();
> > > > +			tbl->item_num++;
> > > > +			return 0;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	/**
> > > > +	 * merge fail as the given packet has a new key.
> > > > +	 * So insert a new key.
> > > > +	 */
> > > > +	item_idx = find_an_empty_item(tbl);
> > > > +	key_idx = find_an_empty_key(tbl);
> > > > +	/**
> > > > +	 * if current key or item number is greater than the max
> > > > +	 * value, don't insert the packet into the table and return
> > > > +	 * immediately.
> > > > +	 */
> > > > +	if (item_idx == INVALID_ARRAY_INDEX ||
> > > > +			key_idx == INVALID_ARRAY_INDEX)
> > > > +		return -1;
> > > > +	tbl->items[item_idx].pkt = pkt;
> > > > +	tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> > > > +	tbl->items[item_idx].is_groed = 0;
> > > > +	tbl->items[item_idx].is_valid = 1;
> > > > +	tbl->items[item_idx].start_time = rte_rdtsc();
> > >


More information about the dev mailing list