[PATCH v5] gro : fix reordering of packets in GRO library

Hu, Jiayu jiayu.hu at intel.com
Tue Jun 20 09:35:49 CEST 2023


Hi Kumara,

Please see replies inline.

Thanks,
Jiayu

> -----Original Message-----
> From: Kumara Parameshwaran <kumaraparamesh92 at gmail.com>
> Sent: Tuesday, November 1, 2022 3:06 PM
> To: Hu, Jiayu <jiayu.hu at intel.com>
> Cc: dev at dpdk.org; Kumara Parameshwaran
> <kumaraparamesh92 at gmail.com>; Kumara Parameshwaran
> <kparameshwar at vmware.com>
> Subject: [PATCH v5] gro : fix reordering of packets in GRO library
> 
> From: Kumara Parameshwaran <kumaraparamesh92 at gmail.com>
> 
> When a TCP packet contains flags like PSH it is returned immediately to the
> application though there might be packets of the same flow in the GRO table.
> If PSH flag is set on a segment packets up to the segment should be delivered
> immediately. But the current implementation delivers the last arrived packet
> with PSH flag set causing re-ordering
> 
> With this patch, if a packet does not contain only ACK flag and if there are no
> previous packets for the flow the packet would be returned immediately,
> else will be merged with the previous segment and the flag on the last
> segment will be set on the entire segment.
> This is the behaviour with linux stack as well.
> 
> Signed-off-by: Kumara Parameshwaran <kumaraparamesh92 at gmail.com>
> Co-authored-by: Kumara Parameshwaran <kparameshwar at vmware.com>
> ---
> v1:
> 	If the received packet is not a pure ACK packet, we check if
> 	there are any previous packets in the flow, if present we indulge
> 	the received packet also in the coalescing logic and update the flags
> 	of the last recived packet to the entire segment which would avoid
> 	re-ordering.
> 
> 	Lets say a case where P1(PSH), P2(ACK), P3(ACK)  are received in
> burst mode,
> 	P1 contains PSH flag and since it does not contain any prior packets in
> the flow
> 	we copy it to unprocess_packets and P2(ACK) and P3(ACK) are
> merged together.
> 	In the existing case the  P2,P3 would be delivered as single segment
> first and the
> 	unprocess_packets will be copied later which will cause reordering.
> With the patch
> 	copy the unprocess packets first and then the packets from the GRO
> table.
> 
> 	Testing done
> 	The csum test-pmd was modifited to support the following
> 	GET request of 10MB from client to server via test-pmd (static arp
> entries added in client
> 	and server). Enable GRO and TSO in test-pmd where the packets
> recived from the client mac
> 	would be sent to server mac and vice versa.
> 	In above testing, without the patch the client observerd re-ordering
> of 25 packets
> 	and with the patch there were no packet re-ordering observerd.
> 
> v2:
> 	Fix warnings in commit and comment.
> 	Do not consider packet as candidate to merge if it contains SYN/RST
> flag.
> 
> v3:
> 	Fix warnings.
> 
> v4:
> 	Rebase with master.
> 
> v5:
> 	Adding co-author email
> 
>  lib/gro/gro_tcp4.c | 45 +++++++++++++++++++++++++++++++++++++--------
>  lib/gro/rte_gro.c  | 18 +++++++++---------
>  2 files changed, 46 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/gro/gro_tcp4.c b/lib/gro/gro_tcp4.c index
> 0014096e63..7363c5d540 100644
> --- a/lib/gro/gro_tcp4.c
> +++ b/lib/gro/gro_tcp4.c
> @@ -188,6 +188,19 @@ update_header(struct gro_tcp4_item *item)
>  			pkt->l2_len);
>  }
> 
> +static inline void
> +update_tcp_hdr_flags(struct rte_tcp_hdr *tcp_hdr, struct rte_mbuf *pkt)
> +{
> +	struct rte_ether_hdr *eth_hdr;
> +	struct rte_ipv4_hdr *ipv4_hdr;
> +	struct rte_tcp_hdr *merged_tcp_hdr;
> +
> +	eth_hdr = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);
> +	ipv4_hdr = (struct rte_ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> +	merged_tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt-
> >l3_len);
> +	merged_tcp_hdr->tcp_flags |= tcp_hdr->tcp_flags; }

The Linux kernel updates the TCP flag via "tcp_flag_word(th2) |= flags & (TCP_FLAG_FIN | TCP_FLAG_PSH)",
which only adds FIN and PSH at most to the merge packet.

> +
>  int32_t
>  gro_tcp4_reassemble(struct rte_mbuf *pkt,
>  		struct gro_tcp4_tbl *tbl,
> @@ -206,6 +219,7 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
>  	uint32_t i, max_flow_num, remaining_flow_num;
>  	int cmp;
>  	uint8_t find;
> +	uint32_t start_idx;
> 
>  	/*
>  	 * Don't process the packet whose TCP header length is greater @@ -
> 219,13 +233,6 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
>  	tcp_hdr = (struct rte_tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
>  	hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len;
> 
> -	/*
> -	 * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE
> -	 * or CWR set.
> -	 */
> -	if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG)
> -		return -1;
> -
>  	/* trim the tail padding bytes */
>  	ip_tlen = rte_be_to_cpu_16(ipv4_hdr->total_length);
>  	if (pkt->pkt_len > (uint32_t)(ip_tlen + pkt->l2_len)) @@ -264,12
> +271,30 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
>  		if (tbl->flows[i].start_index != INVALID_ARRAY_INDEX) {
>  			if (is_same_tcp4_flow(tbl->flows[i].key, key)) {
>  				find = 1;
> +				start_idx = tbl->flows[i].start_index;
>  				break;
>  			}
>  			remaining_flow_num--;
>  		}
>  	}
> 
> +	if (tcp_hdr->tcp_flags != RTE_TCP_ACK_FLAG) {
> +		/*
> +		 * Check and try merging the current TCP segment with the
> previous
> +		 * TCP segment if the TCP header does not contain RST and
> SYN flag
> +		 * There are cases where the last segment is sent with
> FIN|PSH|ACK
> +		 * which should also be considered for merging with previous
> segments.
> +		 */
> +		if (find && !(tcp_hdr->tcp_flags &
> (RTE_TCP_RST_FLAG|RTE_TCP_SYN_FLAG)))
> +			/*
> +			 * Since PSH flag is set, start time will be set to 0 so it
> will be flushed
> +			 * immediately.
> +			 */
> +			tbl->items[start_idx].start_time = 0;
> +		else
> +			return -1;
> +	}

The nested if-else check is not straightforward, and it's hard to read the condition-action of
different combinations of flag bits. In addition, are all flag bits considered like Linux kernel?

> +
>  	/*
>  	 * Fail to find a matched flow. Insert a new flow and store the
>  	 * packet into the flow.
> @@ -304,8 +329,12 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
>  				is_atomic);
>  		if (cmp) {
>  			if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
> -						pkt, cmp, sent_seq, ip_id, 0))
> +						pkt, cmp, sent_seq, ip_id, 0))
> {
> +				if (tbl->items[cur_idx].start_time == 0)
> +					update_tcp_hdr_flags(tcp_hdr, tbl-
> >items[cur_idx].firstseg);
>  				return 1;
> +			}
> +
>  			/*
>  			 * Fail to merge the two packets, as the packet
>  			 * length is greater than the max value. Store diff --git
> a/lib/gro/rte_gro.c b/lib/gro/rte_gro.c index e35399fd42..87c5502dce
> 100644
> --- a/lib/gro/rte_gro.c
> +++ b/lib/gro/rte_gro.c
> @@ -283,10 +283,17 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>  	if ((nb_after_gro < nb_pkts)
>  		 || (unprocess_num < nb_pkts)) {
>  		i = 0;
> +		/* Copy unprocessed packets */
> +		if (unprocess_num > 0) {
> +			memcpy(&pkts[i], unprocess_pkts,
> +					sizeof(struct rte_mbuf *) *
> +					unprocess_num);
> +			i = unprocess_num;
> +		}

Why copy unprocess pkts first? This is for avoiding out-of-order?

Thanks,
Jiayu
>  		/* Flush all packets from the tables */
>  		if (do_vxlan_tcp_gro) {
> -			i = gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl,
> -					0, pkts, nb_pkts);
> +			i +=
> gro_vxlan_tcp4_tbl_timeout_flush(&vxlan_tcp_tbl,
> +					0, &pkts[i], nb_pkts - i);
>  		}
> 
>  		if (do_vxlan_udp_gro) {
> @@ -304,13 +311,6 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts,
>  			i += gro_udp4_tbl_timeout_flush(&udp_tbl, 0,
>  					&pkts[i], nb_pkts - i);
>  		}
> -		/* Copy unprocessed packets */
> -		if (unprocess_num > 0) {
> -			memcpy(&pkts[i], unprocess_pkts,
> -					sizeof(struct rte_mbuf *) *
> -					unprocess_num);
> -		}
> -		nb_after_gro = i + unprocess_num;
>  	}
> 
>  	return nb_after_gro;
> --
> 2.25.1



More information about the dev mailing list