[dpdk-dev] [PATCH v11 2/3] lib/gro: add TCP/IPv4 GRO support
Tan, Jianfeng
jianfeng.tan at intel.com
Fri Jul 7 08:55:28 CEST 2017
On 7/5/2017 12:08 PM, Jiayu Hu wrote:
> In this patch, we introduce five APIs to support TCP/IPv4 GRO.
> - gro_tcp4_tbl_create: create a TCP/IPv4 reassembly table, which is used
> to merge packets.
> - gro_tcp4_tbl_destroy: free memory space of a TCP/IPv4 reassembly table.
> - gro_tcp4_tbl_timeout_flush: flush timeout packets from a TCP/IPv4
> reassembly table.
> - gro_tcp4_tbl_get_count: return the number of packets in a TCP/IPv4
> reassembly table.
> - gro_tcp4_reassemble: reassemble an incoming TCP/IPv4 packet.
>
> The TCP/IPv4 GRO API assumes all input packets have correct IPv4
> and TCP checksums, and it doesn't update IPv4 or TCP checksums for
> merged packets. If input packets are IP fragmented, the API assumes
> they are complete packets (i.e. with L4 headers).
>
> In TCP/IPv4 GRO, we use a table structure, called the TCP/IPv4
> reassembly table, to reassemble packets. A TCP/IPv4 reassembly table
> includes a key array and an item array, where the key array keeps the
> criteria to merge packets and the item array keeps packet information.
>
> One key in the key array points to an item group, which consists of
> packets which have the same criteria value. If two packets are able to
> merge, they must be in the same item group. Each key in the key array
> includes two parts:
> - criteria: the criteria of merging packets. If two packets can be
> merged, they must have the same criteria value.
> - start_index: the index of the first incoming packet of the item group.
>
> Each element in the item array keeps the information of one packet. It
> mainly includes three parts:
> - firstseg: the address of the first segment of the packet
> - lastseg: the address of the last segment of the packet
> - next_pkt_index: the index of the next packet in the same item group.
> All packets in the same item group are chained by next_pkt_index.
> With next_pkt_index, we can locate all packets in the same item
> group one by one.
>
> Processing an incoming packet takes three steps:
> a. check if the packet should be processed. Packets with one of the
> following properties won't be processed:
> - FIN, SYN, RST, URG, PSH, ECE or CWR bit is set;
> - packet payload length is 0.
> b. traverse the key array to find a key which has the same criteria
> value as the incoming packet. If one is found, go to step c. Otherwise,
> insert a new key and insert the packet into the item array.
> c. locate the first packet in the item group via the start_index of
> the key, then traverse all packets in the item group via
> next_pkt_index. If one of them can be merged with the incoming packet,
> merge them together. Otherwise, insert the packet into this item group.
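The table walk in steps b and c can be sketched with simplified stand-in types (these are not the DPDK structs; all names below are made up for illustration):

```c
/* Minimal sketch of the key-array lookup and item-group traversal.
   Toy types only, not the structs from the patch. */
#include <stdint.h>
#include <string.h>

#define SKETCH_INVALID_IDX 0xffffffffUL
#define SKETCH_MAX 8

struct sketch_item {
	int valid;             /* stands in for firstseg != NULL */
	uint32_t next_pkt_idx; /* chains items of the same item group */
};

struct sketch_key {
	int is_valid;
	uint32_t criteria;     /* stands in for the real merge criteria */
	uint32_t start_index;  /* first item of the item group */
};

struct sketch_tbl {
	struct sketch_key keys[SKETCH_MAX];
	struct sketch_item items[SKETCH_MAX];
};

/* step b: linear search for a key with the same criteria value */
static uint32_t
sketch_find_key(struct sketch_tbl *t, uint32_t criteria)
{
	uint32_t i;

	for (i = 0; i < SKETCH_MAX; i++)
		if (t->keys[i].is_valid && t->keys[i].criteria == criteria)
			return i;
	return SKETCH_INVALID_IDX;
}

/* step c: walk the item group via next_pkt_idx, counting its packets */
static uint32_t
sketch_group_len(struct sketch_tbl *t, uint32_t key_idx)
{
	uint32_t n = 0, j = t->keys[key_idx].start_index;

	while (j != SKETCH_INVALID_IDX) {
		n++;
		j = t->items[j].next_pkt_idx;
	}
	return n;
}
```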
>
> Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> ---
> doc/guides/rel_notes/release_17_08.rst | 7 +
> lib/librte_gro/Makefile | 1 +
> lib/librte_gro/gro_tcp4.c | 493 +++++++++++++++++++++++++++++++++
> lib/librte_gro/gro_tcp4.h | 206 ++++++++++++++
> lib/librte_gro/rte_gro.c | 121 +++++++-
> lib/librte_gro/rte_gro.h | 5 +-
> 6 files changed, 819 insertions(+), 14 deletions(-)
> create mode 100644 lib/librte_gro/gro_tcp4.c
> create mode 100644 lib/librte_gro/gro_tcp4.h
>
> diff --git a/doc/guides/rel_notes/release_17_08.rst b/doc/guides/rel_notes/release_17_08.rst
> index 842f46f..f067247 100644
> --- a/doc/guides/rel_notes/release_17_08.rst
> +++ b/doc/guides/rel_notes/release_17_08.rst
> @@ -75,6 +75,13 @@ New Features
>
> Added support for firmwares with multiple Ethernet ports per physical port.
>
> +* **Add Generic Receive Offload API support.**
> +
> + Generic Receive Offload (GRO) API supports reassembling TCP/IPv4
> + packets. The GRO API assumes all input packets have correct
> + checksums and doesn't update checksums for merged packets. If
> + input packets are IP fragmented, the GRO API assumes they are
> + complete packets (i.e. with L4 headers).
>
> Resolved Issues
> ---------------
> diff --git a/lib/librte_gro/Makefile b/lib/librte_gro/Makefile
> index 7e0f128..747eeec 100644
> --- a/lib/librte_gro/Makefile
> +++ b/lib/librte_gro/Makefile
> @@ -43,6 +43,7 @@ LIBABIVER := 1
>
> # source files
> SRCS-$(CONFIG_RTE_LIBRTE_GRO) += rte_gro.c
> +SRCS-$(CONFIG_RTE_LIBRTE_GRO) += gro_tcp4.c
>
> # install this header file
> SYMLINK-$(CONFIG_RTE_LIBRTE_GRO)-include += rte_gro.h
> diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
> new file mode 100644
> index 0000000..703282d
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.c
> @@ -0,0 +1,493 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +#include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +#include <rte_ethdev.h>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> +#include "gro_tcp4.h"
> +
> +void *
> +gro_tcp4_tbl_create(uint16_t socket_id,
> + uint16_t max_flow_num,
> + uint16_t max_item_per_flow)
> +{
> + struct gro_tcp4_tbl *tbl;
> + size_t size;
> + uint32_t entries_num;
> +
> + entries_num = max_flow_num * max_item_per_flow;
> + entries_num = entries_num > GRO_TCP4_TBL_MAX_ITEM_NUM ?
> + GRO_TCP4_TBL_MAX_ITEM_NUM : entries_num;
As I commented before, this check is not good: entries_num is a
uint32_t, so it can never be greater than (UINT32_MAX - 1). Besides, we
cannot allocate a memory area as big as sizeof(struct gro_tcp4_item) *
UINT32_MAX. If we really need a check, please make the limit smaller.
Considering that each item roughly corresponds to a flow, I think we
can limit it to 1M items for now. (Sorry, I should have commented at
the definition of GRO_TCP4_TBL_MAX_ITEM_NUM.)
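A possible shape for such a cap, as a sketch only (the 1M bound is the suggestion above, not a value from the patch; the names are illustrative):

```c
/* Illustrative clamp of the entry count. The 1M bound is a suggested
   value, not one taken from the patch. The product is computed in 64
   bits so the clamp sees the exact value even for maximal inputs. */
#include <stdint.h>

#define SKETCH_TBL_MAX_ITEM_NUM (1UL << 20)	/* 1M items */

static uint32_t
sketch_clamp_entries(uint16_t max_flow_num, uint16_t max_item_per_flow)
{
	uint64_t n = (uint64_t)max_flow_num * max_item_per_flow;

	return (n > SKETCH_TBL_MAX_ITEM_NUM) ?
			SKETCH_TBL_MAX_ITEM_NUM : (uint32_t)n;
}
```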
> +
> + if (entries_num == 0)
> + return NULL;
> +
> + tbl = rte_zmalloc_socket(__func__,
> + sizeof(struct gro_tcp4_tbl),
> + RTE_CACHE_LINE_SIZE,
> + socket_id);
> + if (tbl == NULL)
> + return NULL;
> +
> + size = sizeof(struct gro_tcp4_item) * entries_num;
> + tbl->items = rte_zmalloc_socket(__func__,
> + size,
> + RTE_CACHE_LINE_SIZE,
> + socket_id);
> + if (tbl->items == NULL) {
> + rte_free(tbl);
> + return NULL;
> + }
> + tbl->max_item_num = entries_num;
> +
> + size = sizeof(struct gro_tcp4_key) * entries_num;
> + tbl->keys = rte_zmalloc_socket(__func__,
> + size,
> + RTE_CACHE_LINE_SIZE,
> + socket_id);
> + if (tbl->keys == NULL) {
> + rte_free(tbl->items);
> + rte_free(tbl);
> + return NULL;
> + }
> + tbl->max_key_num = entries_num;
> +
> + return tbl;
> +}
> +
> +void
> +gro_tcp4_tbl_destroy(void *tbl)
> +{
> + struct gro_tcp4_tbl *tcp_tbl = tbl;
> +
> + if (tcp_tbl) {
> + rte_free(tcp_tbl->items);
> + rte_free(tcp_tbl->keys);
> + }
> + rte_free(tcp_tbl);
> +}
> +
> +/*
> + * merge two TCP/IPv4 packets without updating checksums.
> + * If cmp is larger than 0, append the new packet to the
> + * original packet. Otherwise, pre-pend the new packet to
> + * the original packet.
> + */
> +static inline int
> +merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> + struct rte_mbuf *pkt,
> + uint16_t ip_id,
> + uint32_t sent_seq,
> + int cmp)
> +{
> + struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> + uint16_t tcp_dl1;
We don't have a tcp_dl2, and for readability we should not hide
"datalen" behind "dl"; just change the name to tcp_datalen.
> +
> + if (cmp > 0) {
> + pkt_head = item_src->firstseg;
> + pkt_tail = pkt;
> + } else {
> + pkt_head = pkt;
> + pkt_tail = item_src->firstseg;
> + }
> +
> + /* check if the packet length will be beyond the max value */
> + tcp_dl1 = pkt_tail->pkt_len - pkt_tail->l2_len -
> + pkt_tail->l3_len - pkt_tail->l4_len;
> + if (pkt_head->pkt_len - pkt_head->l2_len + tcp_dl1 >
> + TCP4_MAX_L3_LENGTH)
> + return -1;
> +
> + /* remove packet header for the tail packet */
> + rte_pktmbuf_adj(pkt_tail,
> + pkt_tail->l2_len +
> + pkt_tail->l3_len +
> + pkt_tail->l4_len);
> +
> + /* chain two packets together */
> + if (cmp > 0) {
> + item_src->lastseg->next = pkt;
> + item_src->lastseg = rte_pktmbuf_lastseg(pkt);
> + /* update IP ID to the larger value */
> + item_src->ip_id = ip_id;
> + } else {
> + lastseg = rte_pktmbuf_lastseg(pkt);
> + lastseg->next = item_src->firstseg;
> + item_src->firstseg = pkt;
> + /* update sent_seq to the smaller value */
> + item_src->sent_seq = sent_seq;
> + }
> + item_src->nb_merged++;
> +
> + /* update mbuf metadata for the merged packet */
> + pkt_head->nb_segs += pkt_tail->nb_segs;
> + pkt_head->pkt_len += pkt_tail->pkt_len;
> +
> + return 1;
> +}
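The segment-chaining bookkeeping at the end of this function can be sketched with a toy mbuf stand-in (not rte_mbuf; the fields and names here are illustrative):

```c
/* Sketch of the merge bookkeeping: strip the tail packet's headers
   (the rte_pktmbuf_adj() step), then fold its segment count and
   remaining length into the head. Toy struct, not rte_mbuf. */
#include <stdint.h>

struct sketch_mbuf {
	uint16_t nb_segs;
	uint32_t pkt_len;
	uint16_t hdr_len;	/* l2 + l3 + l4 header bytes */
};

static void
sketch_chain(struct sketch_mbuf *head, struct sketch_mbuf *tail)
{
	tail->pkt_len -= tail->hdr_len;	/* header removal on the tail */
	head->nb_segs += tail->nb_segs;
	head->pkt_len += tail->pkt_len;
}
```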
> +
> +static inline int
> +check_seq_option(struct gro_tcp4_item *item,
> + struct tcp_hdr *tcp_hdr,
> + uint16_t tcp_hl,
> + uint16_t tcp_dl,
> + uint16_t ip_id,
> + uint32_t sent_seq)
> +{
> + struct rte_mbuf *pkt0 = item->firstseg;
> + struct ipv4_hdr *ipv4_hdr0;
> + struct tcp_hdr *tcp_hdr0;
> + uint16_t tcp_hl0, tcp_dl0;
> + uint16_t len;
> +
> + ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
> + pkt0->l2_len);
> + tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
> + tcp_hl0 = pkt0->l4_len;
> +
> + /* check if TCP option fields equal. If not, return 0. */
> + len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> + if ((tcp_hl != tcp_hl0) ||
> + ((len > 0) && (memcmp(tcp_hdr + 1,
> + tcp_hdr0 + 1,
> + len) != 0)))
> + return 0;
> +
> + /* check if the two packets are neighbors */
> + tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> + if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> + (ip_id == (item->ip_id + 1)))
> + /* append the new packet */
> + return 1;
> + else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> + ((ip_id + item->nb_merged) == item->ip_id))
> + /* pre-pend the new packet */
> + return -1;
> + else
> + return 0;
> +}
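The neighbor test above boils down to two arithmetic checks, which can be isolated in a small standalone sketch (simplified inputs, made-up names):

```c
/* Sketch of the adjacency arithmetic in check_seq_option(): append when
   the new segment starts exactly where the stored one ends (and the IP
   ID advances by one); prepend when it ends exactly where the stored
   one starts. Simplified scalar inputs only. */
#include <stdint.h>

static int
sketch_neighbor(uint32_t item_seq, uint16_t item_datalen,
		uint16_t item_ip_id, uint16_t item_nb_merged,
		uint32_t pkt_seq, uint16_t pkt_datalen, uint16_t pkt_ip_id)
{
	if (pkt_seq == item_seq + item_datalen &&
			pkt_ip_id == (uint16_t)(item_ip_id + 1))
		return 1;	/* append the new packet */
	if (pkt_seq + pkt_datalen == item_seq &&
			(uint16_t)(pkt_ip_id + item_nb_merged) == item_ip_id)
		return -1;	/* prepend the new packet */
	return 0;		/* not adjacent, can't merge */
}
```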
> +
> +static inline uint32_t
> +find_an_empty_item(struct gro_tcp4_tbl *tbl)
> +{
> + uint32_t i;
> +
> + for (i = 0; i < tbl->max_item_num; i++)
> + if (tbl->items[i].firstseg == NULL)
> + return i;
> + return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +find_an_empty_key(struct gro_tcp4_tbl *tbl)
> +{
> + uint32_t i;
> +
> + for (i = 0; i < tbl->max_key_num; i++)
> + if (tbl->keys[i].is_valid == 0)
> + return i;
> + return INVALID_ARRAY_INDEX;
> +}
> +
> +static inline uint32_t
> +insert_new_item(struct gro_tcp4_tbl *tbl,
> + struct rte_mbuf *pkt,
> + uint16_t ip_id,
> + uint32_t sent_seq,
> + uint32_t prev_idx,
> + uint64_t start_time)
> +{
> + uint32_t item_idx;
> +
> + item_idx = find_an_empty_item(tbl);
> + if (item_idx == INVALID_ARRAY_INDEX)
> + return INVALID_ARRAY_INDEX;
> +
> + tbl->items[item_idx].firstseg = pkt;
> + tbl->items[item_idx].lastseg = rte_pktmbuf_lastseg(pkt);
> + tbl->items[item_idx].start_time = start_time;
> + tbl->items[item_idx].next_pkt_idx = INVALID_ARRAY_INDEX;
> + tbl->items[item_idx].sent_seq = sent_seq;
> + tbl->items[item_idx].ip_id = ip_id;
> + tbl->items[item_idx].nb_merged = 1;
> + tbl->item_num++;
> +
> + /* if the previous packet exists, chain the new one with it */
> + if (prev_idx != INVALID_ARRAY_INDEX)
> + tbl->items[prev_idx].next_pkt_idx = item_idx;
> +
> + return item_idx;
> +}
> +
> +static inline uint32_t
> +delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx)
> +{
> + uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
> +
> + /* set NULL to firstseg to indicate it's an empty item */
> + tbl->items[item_idx].firstseg = NULL;
> + tbl->item_num--;
> +
> + return next_idx;
> +}
> +
> +static inline uint32_t
> +insert_new_key(struct gro_tcp4_tbl *tbl,
> + struct tcp4_key *key_src,
> + uint32_t item_idx)
> +{
> + struct tcp4_key *key_dst;
> + uint32_t key_idx;
> +
> + key_idx = find_an_empty_key(tbl);
> + if (key_idx == INVALID_ARRAY_INDEX)
> + return INVALID_ARRAY_INDEX;
> +
> + key_dst = &(tbl->keys[key_idx].key);
> +
> + ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
> + ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
> + key_dst->ip_src_addr = key_src->ip_src_addr;
> + key_dst->ip_dst_addr = key_src->ip_dst_addr;
> + key_dst->recv_ack = key_src->recv_ack;
> + key_dst->src_port = key_src->src_port;
> + key_dst->dst_port = key_src->dst_port;
> +
> + tbl->keys[key_idx].start_index = item_idx;
> + tbl->keys[key_idx].is_valid = 1;
> + tbl->key_num++;
> +
> + return key_idx;
> +}
> +
> +static inline int
> +compare_key(struct tcp4_key k1, struct tcp4_key k2)
> +{
> + uint16_t *c1, *c2;
> +
> + c1 = (uint16_t *)&(k1.eth_saddr);
> + c2 = (uint16_t *)&(k2.eth_saddr);
> + if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> + return -1;
> + c1 = (uint16_t *)&(k1.eth_daddr);
> + c2 = (uint16_t *)&(k2.eth_daddr);
> + if ((c1[0] != c2[0]) || (c1[1] != c2[1]) || (c1[2] != c2[2]))
> + return -1;
> + if ((k1.ip_src_addr != k2.ip_src_addr) ||
> + (k1.ip_dst_addr != k2.ip_dst_addr) ||
> + (k1.recv_ack != k2.recv_ack) ||
> + (k1.src_port != k2.src_port) ||
> + (k1.dst_port != k2.dst_port))
> + return -1;
> +
> + return 0;
> +}
The above function can be written in a cleaner way:
static inline int
is_same_key(struct tcp4_key k1, struct tcp4_key k2)
{
if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
return 0;
if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
return 0;
return ((k1.ip_src_addr == k2.ip_src_addr) &&
(k1.ip_dst_addr == k2.ip_dst_addr) &&
(k1.recv_ack == k2.recv_ack) &&
(k1.src_port == k2.src_port) &&
(k1.dst_port == k2.dst_port));
}
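For illustration, here is a cut-down standalone version of that helper (plain byte arrays instead of DPDK's struct ether_addr; all names are made up). Note the inverted return convention: nonzero now means the keys match, whereas compare_key() returned 0 on a match, so call sites must be adjusted accordingly.

```c
/* Standalone sketch of is_same_key() with simplified types.
   Returns nonzero when the keys match (opposite of compare_key()). */
#include <stdint.h>
#include <string.h>

struct sketch_tcp4_key {
	uint8_t eth_saddr[6];
	uint8_t eth_daddr[6];
	uint32_t ip_src_addr;
	uint32_t ip_dst_addr;
	uint32_t recv_ack;
	uint16_t src_port;
	uint16_t dst_port;
};

static int
sketch_is_same_key(const struct sketch_tcp4_key *k1,
		const struct sketch_tcp4_key *k2)
{
	if (memcmp(k1->eth_saddr, k2->eth_saddr, 6) != 0)
		return 0;
	if (memcmp(k1->eth_daddr, k2->eth_daddr, 6) != 0)
		return 0;
	return (k1->ip_src_addr == k2->ip_src_addr) &&
		(k1->ip_dst_addr == k2->ip_dst_addr) &&
		(k1->recv_ack == k2->recv_ack) &&
		(k1->src_port == k2->src_port) &&
		(k1->dst_port == k2->dst_port);
}
```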
> +
> +/*
> + * update packet length and IP ID for the flushed packet.
> + */
> +static inline void
> +update_packet_header(struct gro_tcp4_item *item)
> +{
> + struct ipv4_hdr *ipv4_hdr;
> + struct rte_mbuf *pkt = item->firstseg;
> +
> + ipv4_hdr = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt, char *) +
> + pkt->l2_len);
> + ipv4_hdr->total_length = rte_cpu_to_be_16(pkt->pkt_len -
> + pkt->l2_len);
> + ipv4_hdr->packet_id = rte_cpu_to_be_16(item->ip_id);
> +}
> +
> +int32_t
> +gro_tcp4_reassemble(struct rte_mbuf *pkt,
> + struct gro_tcp4_tbl *tbl,
> + uint64_t start_time)
> +{
> + struct ether_hdr *eth_hdr;
> + struct ipv4_hdr *ipv4_hdr;
> + struct tcp_hdr *tcp_hdr;
> + uint32_t sent_seq;
> + uint16_t tcp_dl, ip_id;
> +
> + struct tcp4_key key;
> + uint32_t cur_idx, prev_idx, item_idx;
> + uint32_t i;
> + int cmp;
> +
> + eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> + ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> + tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> +
> + /*
> + * if FIN, SYN, RST, PSH, URG, ECE or CWR is set, return immediately.
> + */
> + if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> + return -1;
> + /* if payload length is 0, return immediately */
> + tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> + pkt->l4_len;
> + if (tcp_dl == 0)
> + return -1;
> +
> + ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> + sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
> +
> + ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
> + ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
> + key.ip_src_addr = ipv4_hdr->src_addr;
> + key.ip_dst_addr = ipv4_hdr->dst_addr;
> + key.src_port = tcp_hdr->src_port;
> + key.dst_port = tcp_hdr->dst_port;
> + key.recv_ack = tcp_hdr->recv_ack;
> +
> + /* search for a key */
> + for (i = 0; i < tbl->max_key_num; i++) {
> + if ((tbl->keys[i].is_valid == 1) &&
> + (compare_key(tbl->keys[i].key, key) == 0))
> + break;
This can be simplified to:
for (i = 0; i < tbl->max_key_num; i++)
if (tbl->keys[i].is_valid &&
is_same_key(tbl->keys[i].key, key))
break;
> + }
> +
> + /* can't find a key, so insert a new key and a new item. */
> + if (i == tbl->max_key_num) {
> + item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
> + INVALID_ARRAY_INDEX, start_time);
> + if (item_idx == INVALID_ARRAY_INDEX)
> + return -1;
> + if (insert_new_key(tbl, &key, item_idx) ==
> + INVALID_ARRAY_INDEX) {
> + /* fail to insert a new key, delete the inserted item */
> + delete_item(tbl, item_idx);
> + return -1;
> + }
> + return 0;
> + }
> +
> + /* traverse all packets in the item group to find one to merge */
> + cur_idx = tbl->keys[i].start_index;
> + prev_idx = cur_idx;
> + do {
> + cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
> + pkt->l4_len, tcp_dl, ip_id, sent_seq);
> + if (cmp != 0) {
> + if (merge_two_tcp4_packets(&(tbl->items[cur_idx]), pkt,
> + ip_id, sent_seq, cmp) > 0)
> + return 1;
> + /*
> + * fail to merge two packets since the packet length
> + * will be greater than the max value. So insert the
> + * packet into the item group.
> + */
> + if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> + start_time) == INVALID_ARRAY_INDEX)
> + return -1;
> + return 0;
> + }
> + prev_idx = cur_idx;
> + cur_idx = tbl->items[cur_idx].next_pkt_idx;
> + } while (cur_idx != INVALID_ARRAY_INDEX);
> +
> + /*
> + * can't find a packet in the item group to merge,
> + * so insert the packet into the item group.
> + */
> + if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> + start_time) == INVALID_ARRAY_INDEX)
> + return -1;
> +
> + return 0;
> +}
> +
> +uint16_t
> +gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> + uint64_t timeout_cycles,
> + struct rte_mbuf **out,
> + uint16_t nb_out)
> +{
> + uint16_t k = 0;
> + uint32_t i, j;
> + uint64_t current_time;
> +
> + current_time = rte_rdtsc();
> +
> + for (i = 0; i < tbl->max_key_num; i++) {
> + /* all keys have been checked, return immediately */
> + if (tbl->key_num == 0)
> + return k;
> +
> + if (tbl->keys[i].is_valid == 0)
> + continue;
> +
> + j = tbl->keys[i].start_index;
> + do {
> + if ((current_time - tbl->items[j].start_time) >=
> + timeout_cycles) {
> + out[k++] = tbl->items[j].firstseg;
> + update_packet_header(&(tbl->items[j]));
> + /* delete the item and get the next packet index */
> + j = delete_item(tbl, j);
> +
> + /* delete the key as all of packets are flushed */
> + if (j == INVALID_ARRAY_INDEX) {
> + tbl->keys[i].is_valid = 0;
> + tbl->key_num--;
> + } else
> + /* update start_index of the key */
> + tbl->keys[i].start_index = j;
> +
> + if (k == nb_out)
> + return k;
> + } else
> + /*
> + * the remaining packets of this key won't time out, so go
> + * on to check other keys.
> + */
> + break;
> + } while (j != INVALID_ARRAY_INDEX);
> + }
> + return k;
> +}
> +
> +uint32_t
> +gro_tcp4_tbl_get_count(void *tbl)
> +{
> + struct gro_tcp4_tbl *gro_tbl = tbl;
> +
> + if (gro_tbl)
> + return gro_tbl->item_num;
> +
> + return 0;
> +}
> diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
> new file mode 100644
> index 0000000..4a57451
> --- /dev/null
> +++ b/lib/librte_gro/gro_tcp4.h
> @@ -0,0 +1,206 @@
> +/*-
> + * BSD LICENSE
> + *
> + * Copyright(c) 2017 Intel Corporation. All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in
> + * the documentation and/or other materials provided with the
> + * distribution.
> + * * Neither the name of Intel Corporation nor the names of its
> + * contributors may be used to endorse or promote products derived
> + * from this software without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _GRO_TCP4_H_
> +#define _GRO_TCP4_H_
> +
> +#define INVALID_ARRAY_INDEX 0xffffffffUL
> +#define GRO_TCP4_TBL_MAX_ITEM_NUM (UINT32_MAX - 1)
> +
> +/*
> + * the max L3 length of a TCP/IPv4 packet. The L3 length
> + * is the sum of ipv4 header, tcp header and L4 payload.
> + */
> +#define TCP4_MAX_L3_LENGTH UINT16_MAX
> +
> +/* criteria of merging packets */
> +struct tcp4_key {
> + struct ether_addr eth_saddr;
> + struct ether_addr eth_daddr;
> + uint32_t ip_src_addr;
> + uint32_t ip_dst_addr;
> +
> + uint32_t recv_ack;
> + uint16_t src_port;
> + uint16_t dst_port;
> +};
> +
> +struct gro_tcp4_key {
> + struct tcp4_key key;
> + /* the index of the first packet in the item group */
> + uint32_t start_index;
> + uint8_t is_valid;
> +};
> +
> +struct gro_tcp4_item {
> + /*
> + * first segment of the packet. If the value
> + * is NULL, it means the item is empty.
> + */
> + struct rte_mbuf *firstseg;
> + /* last segment of the packet */
> + struct rte_mbuf *lastseg;
> + /*
> + * the time when the first packet is inserted
> + * into the table. If a packet in the table is
> + * merged with an incoming packet, this value
> + * won't be updated. We set this value only
> + * when the first packet is inserted into the
> + * table.
> + */
> + uint64_t start_time;
> + /*
> + * we use next_pkt_idx to chain the packets that
> + * have same key value but can't be merged together.
> + */
> + uint32_t next_pkt_idx;
> + /* the sequence number of the packet */
> + uint32_t sent_seq;
> + /* the IP ID of the packet */
> + uint16_t ip_id;
> + /* the number of merged packets */
> + uint16_t nb_merged;
> +};
> +
> +/*
> + * TCP/IPv4 reassembly table structure.
> + */
> +struct gro_tcp4_tbl {
> + /* item array */
> + struct gro_tcp4_item *items;
> + /* key array */
> + struct gro_tcp4_key *keys;
> + /* current item number */
> + uint32_t item_num;
> + /* current key num */
> + uint32_t key_num;
> + /* item array size */
> + uint32_t max_item_num;
> + /* key array size */
> + uint32_t max_key_num;
> +};
> +
> +/**
> + * This function creates a TCP/IPv4 reassembly table.
> + *
> + * @param socket_id
> + * socket index for allocating the TCP/IPv4 reassembly table
> + * @param max_flow_num
> + * the maximum number of flows in the TCP/IPv4 GRO table
> + * @param max_item_per_flow
> + * the maximum packet number per flow.
> + *
> + * @return
> + * if create successfully, return a pointer which points to the
> + * created TCP/IPv4 GRO table. Otherwise, return NULL.
> + */
> +void *gro_tcp4_tbl_create(uint16_t socket_id,
> + uint16_t max_flow_num,
> + uint16_t max_item_per_flow);
> +
> +/**
> + * This function destroys a TCP/IPv4 reassembly table.
> + *
> + * @param tbl
> + * a pointer points to the TCP/IPv4 reassembly table.
> + */
> +void gro_tcp4_tbl_destroy(void *tbl);
> +
> +/**
> + * This function searches for a packet in the TCP/IPv4 reassembly table
> + * to merge with the input one. Merging two packets means chaining them
> + * together and updating the packet headers. Packets whose SYN, FIN,
> + * RST, PSH, CWR, ECE or URG bit is set are returned immediately.
> + * Packets which only have packet headers (i.e. without data) are also
> + * returned immediately. Otherwise, the packet is either merged or
> + * inserted into the table. Besides, if there is no available space to
> + * insert the packet, this function returns immediately too.
> + *
> + * This function assumes the input packet has correct IPv4 and TCP
> + * checksums, and it won't recalculate them when two packets are
> + * merged. Besides, if the input packet is IP fragmented, it assumes
> + * the packet is complete (with the TCP header).
> + *
> + * @param pkt
> + * packet to reassemble.
> + * @param tbl
> + * a pointer that points to a TCP/IPv4 reassembly table.
> + * @param start_time
> + * the start time at which the packet is inserted into the table
> + *
> + * @return
> + * if the packet doesn't have data, or SYN, FIN, RST, PSH, CWR, ECE
> + * or URG bit is set, or there is no available space in the table to
> + * insert a new item or a new key, return a negative value. If the
> + * packet is merged successfully, return a positive value. If the
> + * packet is inserted into the table, return 0.
> + */
> +int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
> + struct gro_tcp4_tbl *tbl,
> + uint64_t start_time);
> +
> +/**
> + * This function flushes timed-out packets in a TCP/IPv4 reassembly
> + * table to applications, without updating checksums for merged packets.
> + * The max number of flushed timeout packets is the element number of
> + * the array which is used to keep flushed packets.
> + *
> + * @param tbl
> + * a pointer that points to a TCP GRO table.
> + * @param timeout_cycles
> + * the maximum time that packets can stay in the table.
> + * @param out
> + * pointer array which is used to keep flushed packets.
> + * @param nb_out
> + * the element number of out. It's also the max number of timeout
> + * packets that can be flushed finally.
> + *
> + * @return
> + * the number of packets that are returned.
> + */
> +uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> + uint64_t timeout_cycles,
> + struct rte_mbuf **out,
> + uint16_t nb_out);
> +
> +/**
> + * This function returns the number of the packets in a TCP/IPv4
> + * reassembly table.
> + *
> + * @param tbl
> + * pointer points to a TCP/IPv4 reassembly table.
> + *
> + * @return
> + * the number of packets in the table
> + */
> +uint32_t gro_tcp4_tbl_get_count(void *tbl);
> +#endif
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index 24e5f2b..7488845 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -32,8 +32,11 @@
>
> #include <rte_malloc.h>
> #include <rte_mbuf.h>
> +#include <rte_cycles.h>
> +#include <rte_ethdev.h>
>
> #include "rte_gro.h"
> +#include "gro_tcp4.h"
>
> typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> uint16_t max_flow_num,
> @@ -41,9 +44,12 @@ typedef void *(*gro_tbl_create_fn)(uint16_t socket_id,
> typedef void (*gro_tbl_destroy_fn)(void *tbl);
> typedef uint32_t (*gro_tbl_get_count_fn)(void *tbl);
>
> -static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM];
> -static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM];
> -static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM];
> +static gro_tbl_create_fn tbl_create_fn[RTE_GRO_TYPE_MAX_NUM] = {
> + gro_tcp4_tbl_create, NULL};
> +static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
> + gro_tcp4_tbl_destroy, NULL};
> +static gro_tbl_get_count_fn tbl_get_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
> + gro_tcp4_tbl_get_count, NULL};
>
> /*
> * GRO context structure, which is used to merge packets. It keeps
> @@ -124,27 +130,116 @@ rte_gro_ctx_destroy(void *ctx)
> }
>
> uint16_t
> -rte_gro_reassemble_burst(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> - const struct rte_gro_param *param __rte_unused)
> + const struct rte_gro_param *param)
> {
> - return nb_pkts;
> + uint16_t i;
> + uint16_t nb_after_gro = nb_pkts;
> + uint32_t item_num;
> +
> + /* allocate a reassembly table for TCP/IPv4 GRO */
> + struct gro_tcp4_tbl tcp_tbl;
> + struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> + struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {0};
> +
> + struct rte_mbuf *unprocess_pkts[nb_pkts];
> + uint16_t unprocess_num = 0;
> + int32_t ret;
> + uint64_t current_time;
> +
> + if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
> + return nb_pkts;
> +
> + /* get the actual number of packets */
> + item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
> + param->max_item_per_flow));
> + item_num = RTE_MIN(item_num, RTE_GRO_MAX_BURST_ITEM_NUM);
> +
> + tcp_tbl.keys = tcp_keys;
> + tcp_tbl.items = tcp_items;
> + tcp_tbl.key_num = 0;
> + tcp_tbl.item_num = 0;
> + tcp_tbl.max_key_num = item_num;
> + tcp_tbl.max_item_num = item_num;
> +
> + current_time = rte_rdtsc();
> +
> + for (i = 0; i < nb_pkts; i++) {
> + if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> + (pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
Keep to one style when checking the ptypes: either the macro, or a
direct bit comparison like

(pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) ==
        (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)
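One C pitfall worth noting for the hand-written bit-comparison style: == binds tighter than &, so the mask expression must be parenthesized. A tiny self-contained check (the mask values here are illustrative, not the real RTE_PTYPE_* values):

```c
/* == binds tighter than & in C, so the mask must be parenthesized.
   Illustrative mask values only, not the real RTE_PTYPE_* flags. */
#include <stdint.h>

#define SKETCH_L3 0x10u
#define SKETCH_L4 0x01u

static int
sketch_has_both(uint32_t packet_type)
{
	/* correct form: mask first, then compare */
	return (packet_type & (SKETCH_L3 | SKETCH_L4)) ==
			(SKETCH_L3 | SKETCH_L4);
}
```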
> + ret = gro_tcp4_reassemble(pkts[i],
> + &tcp_tbl,
> + current_time);
> + if (ret > 0)
> + /* merge successfully */
> + nb_after_gro--;
> + else if (ret < 0)
> + unprocess_pkts[unprocess_num++] = pkts[i];
> + } else
> + unprocess_pkts[unprocess_num++] = pkts[i];
> + }
> +
> + /* re-arrange GROed packets */
> + if (nb_after_gro < nb_pkts) {
> + i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts);
> + if (unprocess_num > 0) {
> + memcpy(&pkts[i], unprocess_pkts,
> + sizeof(struct rte_mbuf *) * unprocess_num);
> + }
> + }
> +
> + return nb_after_gro;
> }
>
> uint16_t
> -rte_gro_reassemble(struct rte_mbuf **pkts __rte_unused,
> +rte_gro_reassemble(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> - void *ctx __rte_unused)
> + void *ctx)
> {
> - return nb_pkts;
> + uint16_t i, unprocess_num = 0;
> + struct rte_mbuf *unprocess_pkts[nb_pkts];
> + struct gro_ctx *gro_ctx = ctx;
> + uint64_t current_time;
> +
> + if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
> + return nb_pkts;
> +
> + current_time = rte_rdtsc();
> +
> + for (i = 0; i < nb_pkts; i++) {
> + if (RTE_ETH_IS_IPV4_HDR(pkts[i]->packet_type) &&
> + (pkts[i]->packet_type & RTE_PTYPE_L4_TCP)) {
> + if (gro_tcp4_reassemble(pkts[i],
> + gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
> + current_time) < 0)
> + unprocess_pkts[unprocess_num++] = pkts[i];
> + } else
> + unprocess_pkts[unprocess_num++] = pkts[i];
> + }
> + if (unprocess_num > 0) {
> + memcpy(pkts, unprocess_pkts,
> + sizeof(struct rte_mbuf *) * unprocess_num);
> + }
> +
> + return unprocess_num;
> }
>
> uint16_t
> -rte_gro_timeout_flush(void *ctx __rte_unused,
> - uint64_t gro_types __rte_unused,
> - struct rte_mbuf **out __rte_unused,
> - uint16_t max_nb_out __rte_unused)
> +rte_gro_timeout_flush(void *ctx,
> + uint64_t gro_types,
> + struct rte_mbuf **out,
> + uint16_t max_nb_out)
> {
> + struct gro_ctx *gro_ctx = ctx;
> +
> + gro_types = gro_types & gro_ctx->gro_types;
> + if (gro_types & RTE_GRO_TCP_IPV4) {
> + return gro_tcp4_tbl_timeout_flush(
> + gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX],
> + gro_ctx->max_timeout_cycles,
> + out, max_nb_out);
> + }
> return 0;
> }
>
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 54a6e82..c2140e6 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -45,8 +45,11 @@ extern "C" {
> /**< max number of supported GRO types */
> #define RTE_GRO_TYPE_MAX_NUM 64
> /**< current supported GRO num */
> -#define RTE_GRO_TYPE_SUPPORT_NUM 0
> +#define RTE_GRO_TYPE_SUPPORT_NUM 1
>
> +/**< TCP/IPv4 GRO flag */
> +#define RTE_GRO_TCP_IPV4_INDEX 0
> +#define RTE_GRO_TCP_IPV4 (1ULL << RTE_GRO_TCP_IPV4_INDEX)
>
> struct rte_gro_param {
> /**< desired GRO types */