[dpdk-dev] [PATCH v4 1/2] gro: code cleanup
Yao, Lei A
lei.a.yao at intel.com
Mon Jan 8 02:15:39 CET 2018
> -----Original Message-----
> From: Hu, Jiayu
> Sent: Friday, January 5, 2018 2:13 PM
> To: dev at dpdk.org
> Cc: Richardson, Bruce <bruce.richardson at intel.com>; Chen, Junjie J
> <junjie.j.chen at intel.com>; Tan, Jianfeng <jianfeng.tan at intel.com>;
> stephen at networkplumber.org; Yigit, Ferruh <ferruh.yigit at intel.com>;
> Ananyev, Konstantin <konstantin.ananyev at intel.com>; Yao, Lei A
> <lei.a.yao at intel.com>; Hu, Jiayu <jiayu.hu at intel.com>
> Subject: [PATCH v4 1/2] gro: code cleanup
>
> - Remove needless checks and variables
> - For better understanding, update the programmer guide and rename
> internal functions and variables
> - To support tunneled GRO, move common internal functions from
> gro_tcp4.c to gro_tcp4.h
> - Comply with RFC 6864 when processing the IPv4 ID field
>
> Signed-off-by: Jiayu Hu <jiayu.hu at intel.com>
> Reviewed-by: Junjie Chen <junjie.j.chen at intel.com>
Tested-by: Lei Yao <lei.a.yao at intel.com>
I have tested this patch with the following traffic flow:
NIC1(In kernel)-->NIC2(pmd, GRO on)-->vhost-user-->virtio-net(in VM)
The iperf test with 1 stream shows that GRO VxLAN can improve the
performance from 6 Gbps (GRO off) to 16 Gbps (GRO on).
> ---
> .../prog_guide/generic_receive_offload_lib.rst | 246 ++++++++-------
> doc/guides/prog_guide/img/gro-key-algorithm.svg | 223 ++++++++++++++
> lib/librte_gro/gro_tcp4.c | 339 +++++++--------------
> lib/librte_gro/gro_tcp4.h | 253 ++++++++++-----
> lib/librte_gro/rte_gro.c | 102 +++----
> lib/librte_gro/rte_gro.h | 92 +++---
> 6 files changed, 750 insertions(+), 505 deletions(-)
> create mode 100644 doc/guides/prog_guide/img/gro-key-algorithm.svg
>
> diff --git a/doc/guides/prog_guide/generic_receive_offload_lib.rst b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> index 22e50ec..c2d7a41 100644
> --- a/doc/guides/prog_guide/generic_receive_offload_lib.rst
> +++ b/doc/guides/prog_guide/generic_receive_offload_lib.rst
> @@ -32,128 +32,162 @@ Generic Receive Offload Library
> ===============================
>
> Generic Receive Offload (GRO) is a widely used SW-based offloading
> -technique to reduce per-packet processing overhead. It gains performance
> -by reassembling small packets into large ones. To enable more flexibility
> -to applications, DPDK implements GRO as a standalone library. Applications
> -explicitly use the GRO library to merge small packets into large ones.
> -
> -The GRO library assumes all input packets have correct checksums. In
> -addition, the GRO library doesn't re-calculate checksums for merged
> -packets. If input packets are IP fragmented, the GRO library assumes
> -they are complete packets (i.e. with L4 headers).
> -
> -Currently, the GRO library implements TCP/IPv4 packet reassembly.
> -
> -Reassembly Modes
> -----------------
> -
> -The GRO library provides two reassembly modes: lightweight and
> -heavyweight mode. If applications want to merge packets in a simple way,
> -they can use the lightweight mode API. If applications want more
> -fine-grained controls, they can choose the heavyweight mode API.
> -
> -Lightweight Mode
> -~~~~~~~~~~~~~~~~
> -
> -The ``rte_gro_reassemble_burst()`` function is used for reassembly in
> -lightweight mode. It tries to merge N input packets at a time, where
> -N should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
> -
> -In each invocation, ``rte_gro_reassemble_burst()`` allocates temporary
> -reassembly tables for the desired GRO types. Note that the reassembly
> -table is a table structure used to reassemble packets and different GRO
> -types (e.g. TCP/IPv4 GRO and TCP/IPv6 GRO) have different reassembly table
> -structures. The ``rte_gro_reassemble_burst()`` function uses the reassembly
> -tables to merge the N input packets.
> -
> -For applications, performing GRO in lightweight mode is simple. They
> -just need to invoke ``rte_gro_reassemble_burst()``. Applications can get
> -GROed packets as soon as ``rte_gro_reassemble_burst()`` returns.
> -
> -Heavyweight Mode
> -~~~~~~~~~~~~~~~~
> -
> -The ``rte_gro_reassemble()`` function is used for reassembly in heavyweight
> -mode. Compared with the lightweight mode, performing GRO in heavyweight mode
> -is relatively complicated.
> -
> -Before performing GRO, applications need to create a GRO context object
> -by calling ``rte_gro_ctx_create()``. A GRO context object holds the
> -reassembly tables of desired GRO types. Note that all update/lookup
> -operations on the context object are not thread safe. So if different
> -processes or threads want to access the same context object simultaneously,
> -some external syncing mechanisms must be used.
> -
> -Once the GRO context is created, applications can then use the
> -``rte_gro_reassemble()`` function to merge packets. In each invocation,
> -``rte_gro_reassemble()`` tries to merge input packets with the packets
> -in the reassembly tables. If an input packet is an unsupported GRO type,
> -or other errors happen (e.g. SYN bit is set), ``rte_gro_reassemble()``
> -returns the packet to applications. Otherwise, the input packet is either
> -merged or inserted into a reassembly table.
> -
> -When applications want to get GRO processed packets, they need to use
> -``rte_gro_timeout_flush()`` to flush them from the tables manually.
> +technique to reduce per-packet processing overhead. By reassembling
> +small packets into larger ones, GRO lets applications process fewer,
> +larger packets, which reduces the number of packets to be processed.
> +To benefit DPDK-based applications, such as Open vSwitch, DPDK provides
> +its own GRO implementation. In DPDK, GRO is implemented as a standalone
> +library. Applications explicitly use the GRO library to reassemble
> +packets.
> +
> +Overview
> +--------
> +
> +In the GRO library, there are many GRO types, which are defined by packet
> +types. Each GRO type is in charge of processing one kind of packet. For
> +example, TCP/IPv4 GRO processes TCP/IPv4 packets.
> +
> +Each GRO type has a reassembly function, which defines its own algorithm
> +and table structure to reassemble packets. Input packets are assigned to
> +the corresponding GRO functions according to MBUF->packet_type.
> +
> +The GRO library doesn't check whether input packets have correct checksums
> +and doesn't re-calculate checksums for merged packets. When IP
> +fragmentation is possible (i.e., DF==0), the GRO library assumes the
> +packets are complete (i.e., MF==0 && frag_off==0). Additionally, it
> +complies with RFC 6864 when processing the IPv4 ID field.
>
> -TCP/IPv4 GRO
> -------------
> +Currently, the GRO library provides GRO support for TCP/IPv4 packets.
> +
> +Two Sets of API
> +---------------
> +
> +For different usage scenarios, the GRO library provides two sets of API.
> +One is called the lightweight mode API, which enables applications to
> +merge a small number of packets rapidly; the other is called the
> +heavyweight mode API, which provides fine-grained control to
> +applications and supports merging a large number of packets.
> +
> +Lightweight Mode API
> +~~~~~~~~~~~~~~~~~~~~
> +
> +The lightweight mode has only one function, ``rte_gro_reassemble_burst()``,
> +which processes N packets at a time. Using the lightweight mode API to
> +merge packets is very simple. Calling ``rte_gro_reassemble_burst()`` is
> +enough. The GROed packets are returned to applications as soon as it
> +finishes.
> +
> +In ``rte_gro_reassemble_burst()``, table structures of different GRO
> +types are allocated on the stack. This design simplifies applications'
> +operations. However, limited by the stack size, the maximum number of
> +packets that ``rte_gro_reassemble_burst()`` can process in an invocation
> +should be less than or equal to ``RTE_GRO_MAX_BURST_ITEM_NUM``.
> +
> +Heavyweight Mode API
> +~~~~~~~~~~~~~~~~~~~~
> +
> +Compared with the lightweight mode, using the heavyweight mode API is
> +relatively complex. Firstly, applications need to create a GRO context
> +by ``rte_gro_ctx_create()``. ``rte_gro_ctx_create()`` allocates table
> +structures on the heap and stores their pointers in the GRO context.
> +Secondly, applications use ``rte_gro_reassemble()`` to merge packets.
> +If input packets have invalid parameters, ``rte_gro_reassemble()``
> +returns them to applications. For example, packets of unsupported GRO
> +types or TCP SYN packets are returned. Otherwise, the input packets are
> +either merged with the existing packets in the tables or inserted into
> +the tables. Finally, applications use ``rte_gro_timeout_flush()`` to
> +flush packets from the tables when they want to get the GROed packets.
> +
> +Note that update/lookup operations on the GRO context are not thread
> +safe. If different processes or threads want to access the same context
> +object simultaneously, an external synchronization mechanism must be
> +used.
> +
> +Reassembly Algorithm
> +--------------------
> +
> +The reassembly algorithm is used for reassembling packets. In the GRO
> +library, different GRO types can use different algorithms. In this
> +section, we will introduce an algorithm, which is used by TCP/IPv4 GRO.
>
> -TCP/IPv4 GRO supports merging small TCP/IPv4 packets into large ones,
> -using a table structure called the TCP/IPv4 reassembly table.
> +Challenges
> +~~~~~~~~~~
>
> -TCP/IPv4 Reassembly Table
> -~~~~~~~~~~~~~~~~~~~~~~~~~
> +The reassembly algorithm determines the efficiency of GRO. There are two
> +challenges in the algorithm design:
>
> -A TCP/IPv4 reassembly table includes a "key" array and an "item" array.
> -The key array keeps the criteria to merge packets and the item array
> -keeps the packet information.
> +- a high-cost algorithm/implementation would cause packet drops in a
> + high-speed network.
>
> -Each key in the key array points to an item group, which consists of
> -packets which have the same criteria values but can't be merged. A key
> -in the key array includes two parts:
> +- packet reordering makes it hard to merge packets. For example, Linux
> + GRO fails to merge packets when it encounters packet reordering.
>
> -* ``criteria``: the criteria to merge packets. If two packets can be
> - merged, they must have the same criteria values.
> +The above two challenges require our algorithm to be:
>
> -* ``start_index``: the item array index of the first packet in the item
> - group.
> +- lightweight enough to scale to fast networking speeds
>
> -Each element in the item array keeps the information of a packet. An item
> -in the item array mainly includes three parts:
> +- capable of handling packet reordering
>
> -* ``firstseg``: the mbuf address of the first segment of the packet.
> +In DPDK GRO, we use a key-based algorithm to address the two challenges.
>
> -* ``lastseg``: the mbuf address of the last segment of the packet.
> +Key-based Reassembly Algorithm
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +:numref:`figure_gro-key-algorithm` illustrates the procedure of the
> +key-based algorithm. Packets are classified into "flows" by some header
> +fields (we call them the "key"). To process an input packet, the
> +algorithm first searches for a matched "flow" (i.e., one with the same
> +key value), then checks all packets in the "flow" and tries to find a
> +"neighbor" for it. If a "neighbor" is found, the two packets are merged.
> +If no "neighbor" is found, the packet is stored into its "flow". If no
> +matched "flow" is found, a new "flow" is inserted and the packet is
> +stored into it.
> +
> +.. note::
> + Packets in the same "flow" that can't be merged are always the
> + result of packet reordering.
> +
> +The key-based algorithm has two characteristics:
> +
> +- classifying packets into "flows" to accelerate packet aggregation is
> + simple (addresses challenge 1).
> +
> +- storing out-of-order packets makes it possible to merge them later
> + (addresses challenge 2).
> +
> +.. _figure_gro-key-algorithm:
> +
> +.. figure:: img/gro-key-algorithm.*
> + :align: center
> +
> + Key-based Reassembly Algorithm
> +
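To check my reading of the key-based procedure described above, here is a minimal standalone sketch of the flow/neighbor logic. It is not the code from this patch: `struct pkt`, `struct table`, `process()` and the table sizes are hypothetical, a single integer stands in for the header fields that form the key, and the neighbor test uses only the sequence number and payload length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 4              /* hypothetical table sizes */
#define MAX_ITEMS 8
#define INVALID_IDX UINT32_MAX   /* marks an empty flow or the end of a chain */

/* Hypothetical, simplified packet: "key" stands in for the header fields. */
struct pkt { uint32_t key, seq, len; };

struct item { uint32_t seq, len, next; int used; };
struct flow { uint32_t key, start; };

struct table {
	struct flow flows[MAX_FLOWS];
	struct item items[MAX_ITEMS];
};

enum result { MERGED, STORED, NEW_FLOW, TABLE_FULL };

static void table_init(struct table *t)
{
	assert(t != NULL);
	for (uint32_t i = 0; i < MAX_FLOWS; i++)
		t->flows[i].start = INVALID_IDX;
	for (uint32_t i = 0; i < MAX_ITEMS; i++)
		t->items[i].used = 0;
}

/* Store a packet in the first free item slot and chain it before "next". */
static uint32_t alloc_item(struct table *t, const struct pkt *p, uint32_t next)
{
	for (uint32_t i = 0; i < MAX_ITEMS; i++)
		if (!t->items[i].used) {
			t->items[i] = (struct item){ p->seq, p->len, next, 1 };
			return i;
		}
	return INVALID_IDX;
}

static enum result process(struct table *t, const struct pkt *p)
{
	uint32_t f, i, empty = INVALID_IDX;

	/* Step 1: search for a flow with the same key. */
	for (f = 0; f < MAX_FLOWS; f++) {
		if (t->flows[f].start == INVALID_IDX) {
			if (empty == INVALID_IDX)
				empty = f;
			continue;
		}
		if (t->flows[f].key == p->key)
			break;
	}
	if (f == MAX_FLOWS) {
		/* No matched flow: insert a new flow and store the packet. */
		if (empty == INVALID_IDX)
			return TABLE_FULL;
		i = alloc_item(t, p, INVALID_IDX);
		if (i == INVALID_IDX)
			return TABLE_FULL;
		t->flows[empty].key = p->key;
		t->flows[empty].start = i;
		return NEW_FLOW;
	}

	/* Step 2: look for a neighbor among the packets of this flow. */
	for (i = t->flows[f].start; i != INVALID_IDX; i = t->items[i].next) {
		struct item *it = &t->items[i];

		if (p->seq == it->seq + it->len) {   /* append */
			it->len += p->len;
			return MERGED;
		}
		if (p->seq + p->len == it->seq) {    /* pre-pend */
			it->seq = p->seq;
			it->len += p->len;
			return MERGED;
		}
	}

	/* Step 3: no neighbor, so keep the packet for a later merge. */
	i = alloc_item(t, p, t->flows[f].start);
	if (i == INVALID_IDX)
		return TABLE_FULL;
	t->flows[f].start = i;
	return STORED;
}
```

With packets arriving as seq 1000, 1200, 1100 (100 bytes each, same key), the second one is stored as out-of-order and the third one is merged with it, which is the reordering case the guide mentions. The sketch merges with the first neighbor it finds and does not try to coalesce stored items with each other afterwards.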
> +TCP/IPv4 GRO
> +------------
>
> -* ``next_pkt_index``: the item array index of the next packet in the same
> - item group. TCP/IPv4 GRO uses ``next_pkt_index`` to chain the packets
> - that have the same criteria value but can't be merged together.
> +The table structure used by TCP/IPv4 GRO contains two arrays: a flow
> +array and an item array. The flow array keeps flow information, and the
> +item array keeps packet information.
>
> -Procedure to Reassemble a Packet
> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +Header fields used to define a TCP/IPv4 flow include:
>
> -To reassemble an incoming packet needs three steps:
> +- source and destination: Ethernet and IP address, TCP port
>
> -#. Check if the packet should be processed. Packets with one of the
> - following properties aren't processed and are returned immediately:
> +- TCP acknowledgment number
>
> - * FIN, SYN, RST, URG, PSH, ECE or CWR bit is set.
> +TCP/IPv4 packets whose FIN, SYN, RST, URG, PSH, ECE or CWR bit is set
> +won't be processed.
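For reference, the flag rule above can be expressed as a single mask test. This is only an illustration: the `TCP_FLAG_*` names and `flags_allow_gro()` are hypothetical, not identifiers from the patch; the bit values are the standard positions in the TCP header flags byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP flag bits (names are hypothetical, values are not). */
#define TCP_FLAG_FIN 0x01
#define TCP_FLAG_SYN 0x02
#define TCP_FLAG_RST 0x04
#define TCP_FLAG_PSH 0x08
#define TCP_FLAG_ACK 0x10
#define TCP_FLAG_URG 0x20
#define TCP_FLAG_ECE 0x40
#define TCP_FLAG_CWR 0x80

/* Flags that exclude a packet from GRO according to the guide text. */
#define TCP_FLAG_NO_GRO (TCP_FLAG_FIN | TCP_FLAG_SYN | TCP_FLAG_RST | \
			 TCP_FLAG_URG | TCP_FLAG_PSH | TCP_FLAG_ECE | \
			 TCP_FLAG_CWR)

/* Returns true if none of the excluding flags is set. */
static bool flags_allow_gro(uint8_t tcp_flags)
{
	return (tcp_flags & TCP_FLAG_NO_GRO) == 0;
}
```

Since ACK is the only flag bit left outside the mask, a plain ACK data segment passes and anything carrying one of the seven listed flags is handed back to the application unmerged.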
>
> - * L4 payload length is 0.
> +Header fields deciding if two packets are neighbors include:
>
> -#. Traverse the key array to find a key which has the same criteria
> - value with the incoming packet. If found, go to the next step.
> - Otherwise, insert a new key and a new item for the packet.
> +- TCP sequence number
>
> -#. Locate the first packet in the item group via ``start_index``. Then
> - traverse all packets in the item group via ``next_pkt_index``. If a
> - packet is found which can be merged with the incoming one, merge them
> - together. If one isn't found, insert the packet into this item group.
> - Note that to merge two packets is to link them together via mbuf's
> - ``next`` field.
> +- IPv4 ID. For packets whose DF bit is 0, the IPv4 ID of each packet
> + should be 1 greater than that of its preceding neighbor.
>
> -When packets are flushed from the reassembly table, TCP/IPv4 GRO updates
> -packet header fields for the merged packets. Note that before reassembling
> -the packet, TCP/IPv4 GRO doesn't check if the checksums of packets are
> -correct. Also, TCP/IPv4 GRO doesn't re-calculate checksums for merged
> -packets.
> +.. note::
> + We comply with RFC 6864 when processing the IPv4 ID field.
> + Specifically, we check IPv4 ID fields for packets whose DF bit is 0
> + and ignore IPv4 ID fields for packets whose DF bit is 1.
> + Additionally, packets with different DF bit values can't be
> + merged.
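A small standalone sketch of the neighbor rule described in the note above, as I understand it. `struct stored` and `neighbor_check()` are hypothetical names rather than the functions in gro_tcp4.h, and the explicit 16-bit cast for IPv4 ID wrap-around is my own assumption, not something the guide states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a packet already held in the table. */
struct stored {
	uint32_t seq;          /* TCP sequence number of the first payload byte */
	uint32_t payload_len;  /* total TCP payload merged so far */
	uint16_t ip_id;        /* IPv4 ID of the most recently appended packet */
	uint16_t nb_merged;    /* number of packets merged into this item */
	bool df;               /* DF bit shared by the merged packets */
};

/* Returns 1 to append, -1 to pre-pend, 0 if the packets are not neighbors. */
static int neighbor_check(const struct stored *it, uint32_t seq,
			  uint32_t payload_len, uint16_t ip_id, bool df)
{
	assert(it != NULL);

	/* Packets with different DF bit values are never merged. */
	if (it->df != df)
		return 0;

	/* DF == 1: the IPv4 ID is ignored. DF == 0: it must be consecutive. */
	if (seq == it->seq + it->payload_len &&
	    (df || ip_id == (uint16_t)(it->ip_id + 1)))
		return 1;
	if (seq + payload_len == it->seq &&
	    (df || (uint16_t)(ip_id + it->nb_merged) == it->ip_id))
		return -1;
	return 0;
}
```

For a stored item with seq 1000, 100 bytes of payload, IPv4 ID 7 and DF == 0, a packet with seq 1100 and ID 8 is appended, the same packet with ID 9 is rejected, and a packet with seq 900 and ID 6 is pre-pended. With DF == 1 on both sides only the sequence numbers matter.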
> diff --git a/doc/guides/prog_guide/img/gro-key-algorithm.svg b/doc/guides/prog_guide/img/gro-key-algorithm.svg
> new file mode 100644
> index 0000000..94e42f5
> --- /dev/null
> +++ b/doc/guides/prog_guide/img/gro-key-algorithm.svg
> @@ -0,0 +1,223 @@
> +<?xml version="1.0" encoding="UTF-8" standalone="no"?>
> +<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
> "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
> +<!-- Generated by Microsoft Visio 11.0, SVG Export, v1.0 gro-key-
> algorithm.svg Page-1 -->
> +<svg xmlns="http://www.w3.org/2000/svg"
> xmlns:xlink="http://www.w3.org/1999/xlink"
> xmlns:ev="http://www.w3.org/2001/xml-events"
> +
> xmlns:v="http://schemas.microsoft.com/visio/2003/SVGExtensions/
> " width="6.06163in" height="2.66319in"
> + viewBox="0 0 436.438 191.75" xml:space="preserve" color-
> interpolation-filters="sRGB" class="st10">
> + <v:documentProperties v:langID="1033" v:viewMarkup="false"/>
> +
> + <style type="text/css">
> + <![CDATA[
> + .st1 {fill:url(#grad30-4);stroke:#404040;stroke-
> linecap:round;stroke-linejoin:round;stroke-width:0.25}
> + .st2 {fill:#000000;font-family:Calibri;font-size:1.00001em}
> + .st3 {font-size:1em;font-weight:bold}
> + .st4 {fill:#000000;font-family:Calibri;font-size:1.00001em;font-
> weight:bold}
> + .st5 {font-size:1em;font-weight:normal}
> + .st6 {marker-end:url(#mrkr5-38);stroke:#404040;stroke-
> linecap:round;stroke-linejoin:round;stroke-width:1}
> + .st7 {fill:#404040;fill-opacity:1;stroke:#404040;stroke-
> opacity:1;stroke-width:0.28409090909091}
> + .st8 {fill:none;stroke:none;stroke-linecap:round;stroke-
> linejoin:round;stroke-width:0.25}
> + .st9 {fill:#000000;font-family:Calibri;font-size:0.833336em}
> + .st10 {fill:none;fill-rule:evenodd;font-
> size:12px;overflow:visible;stroke-linecap:square;stroke-miterlimit:3}
> + ]]>
> + </style>
> +
> + <defs id="Patterns_And_Gradients">
> + <linearGradient id="grad30-4" v:fillPattern="30"
> v:foreground="#c6d09f" v:background="#d1dab4" x1="0" y1="1" x2="0"
> y2="0">
> + <stop offset="0" style="stop-color:#c6d09f;stop-
> opacity:1"/>
> + <stop offset="1" style="stop-color:#d1dab4;stop-
> opacity:1"/>
> + </linearGradient>
> + <linearGradient id="grad30-35" v:fillPattern="30"
> v:foreground="#f0f0f0" v:background="#ffffff" x1="0" y1="1" x2="0" y2="0">
> + <stop offset="0" style="stop-color:#f0f0f0;stop-
> opacity:1"/>
> + <stop offset="1" style="stop-color:#ffffff;stop-
> opacity:1"/>
> + </linearGradient>
> + </defs>
> + <defs id="Markers">
> + <g id="lend5">
> + <path d="M 2 1 L 0 0 L 1.98117 -0.993387 C 1.67173 -
> 0.364515 1.67301 0.372641 1.98465 1.00043 " style="stroke:none"/>
> + </g>
> + <marker id="mrkr5-38" class="st7" v:arrowType="5"
> v:arrowSize="2" v:setback="6.16" refX="-6.16" orient="auto"
> + markerUnits="strokeWidth"
> overflow="visible">
> + <use xlink:href="#lend5" transform="scale(-3.52,-
> 3.52) "/>
> + </marker>
> + </defs>
> + <g v:mID="0" v:index="1" v:groupContext="foregroundPage">
> + <title>Page-1</title>
> + <v:pageProperties v:drawingScale="1" v:pageScale="1"
> v:drawingUnits="0" v:shadowOffsetX="9" v:shadowOffsetY="-9"/>
> + <v:layer v:name="Connector" v:index="0"/>
> + <g id="shape1-1" v:mID="1" v:groupContext="shape"
> transform="translate(0.25,-117.25)">
> + <title>Rounded rectangle</title>
> + <desc>Categorize into an existed “flow”</desc>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="90" cy="173.75" width="180"
> height="36"/>
> + <path d="M171 191.75 A9.00007 9.00007 -180 0 0 180
> 182.75 L180 164.75 A9.00007 9.00007 -180 0 0 171 155.75 L9 155.75
> + A9.00007 9.00007 -180 0 0 -0
> 164.75 L0 182.75 A9.00007 9.00007 -180 0 0 9 191.75 L171 191.75 Z"
> + class="st1"/>
> + <text x="8.91" y="177.35" class="st2"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>Categorize into
> an <tspan
> +
> class="st3">existed</tspan><tspan class="st3" v:langID="2052">
> </tspan>“<tspan class="st3">flow</tspan>”</text> </g>
> + <g id="shape2-9" v:mID="2" v:groupContext="shape"
> transform="translate(0.25,-58.75)">
> + <title>Rounded rectangle.2</title>
> + <desc>Search for a “neighbor”</desc>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="90" cy="173.75" width="180"
> height="36"/>
> + <path d="M171 191.75 A9.00007 9.00007 -180 0 0 180
> 182.75 L180 164.75 A9.00007 9.00007 -180 0 0 171 155.75 L9 155.75
> + A9.00007 9.00007 -180 0 0 -0
> 164.75 L0 182.75 A9.00007 9.00007 -180 0 0 9 191.75 L171 191.75 Z"
> + class="st1"/>
> + <text x="32.19" y="177.35" class="st2"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>Search for a
> “<tspan
> +
> class="st3">neighbor</tspan>”</text> </g>
> + <g id="shape3-14" v:mID="3" v:groupContext="shape"
> transform="translate(225.813,-117.25)">
> + <title>Rounded rectangle.3</title>
> + <desc>Insert a new “flow” and store the
> packet</desc>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="105.188" cy="173.75" width="210.38"
> height="36"/>
> + <path d="M201.37 191.75 A9.00007 9.00007 -180 0 0
> 210.37 182.75 L210.37 164.75 A9.00007 9.00007 -180 0 0 201.37 155.75
> + L9 155.75 A9.00007 9.00007 -
> 180 0 0 -0 164.75 L0 182.75 A9.00007 9.00007 -180 0 0 9 191.75 L201.37 191.75
> + Z" class="st1"/>
> + <text x="5.45" y="177.35" class="st2"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>Insert a <tspan
> + class="st3">new
> </tspan>“<tspan class="st3">flow</tspan>” and <tspan class="st3">store
> </tspan>the packet</text> </g>
> + <g id="shape4-21" v:mID="4" v:groupContext="shape"
> transform="translate(225.25,-58.75)">
> + <title>Rounded rectangle.4</title>
> + <desc>Store the packet</desc>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="83.25" cy="173.75" width="166.5"
> height="36"/>
> + <path d="M157.5 191.75 A9.00007 9.00007 -180 0 0
> 166.5 182.75 L166.5 164.75 A9.00007 9.00007 -180 0 0 157.5 155.75 L9
> + 155.75 A9.00007 9.00007 -180
> 0 0 -0 164.75 L0 182.75 A9.00007 9.00007 -180 0 0 9 191.75 L157.5 191.75 Z"
> + class="st1"/>
> + <text x="42.81" y="177.35" class="st4"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>Store <tspan
> + class="st5">the
> packet</tspan></text> </g>
> + <g id="shape5-26" v:mID="5" v:groupContext="shape"
> transform="translate(0.25,-0.25)">
> + <title>Rounded rectangle.5</title>
> + <desc>Merge the packet</desc>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="90" cy="173.75" width="180"
> height="36"/>
> + <path d="M171 191.75 A9.00007 9.00007 -180 0 0 180
> 182.75 L180 164.75 A9.00007 9.00007 -180 0 0 171 155.75 L9 155.75
> + A9.00007 9.00007 -180 0 0 -0
> 164.75 L0 182.75 A9.00007 9.00007 -180 0 0 9 191.75 L171 191.75 Z"
> + class="st1"/>
> + <text x="46.59" y="177.35" class="st4"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>Merge <tspan
> + class="st5">the
> packet</tspan></text> </g>
> + <g id="shape6-31" v:mID="6" v:groupContext="shape"
> v:layerMember="0" transform="translate(81.25,-175.75)">
> + <title>Dynamic connector</title>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <path d="M9 191.75 L9 208.09" class="st6"/>
> + </g>
> + <g id="shape7-39" v:mID="7" v:groupContext="shape"
> v:layerMember="0" transform="translate(81.25,-117.25)">
> + <title>Dynamic connector.7</title>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <path d="M9 191.75 L9 208.09" class="st6"/>
> + </g>
> + <g id="shape8-45" v:mID="8" v:groupContext="shape"
> v:layerMember="0" transform="translate(81.25,-58.75)">
> + <title>Dynamic connector.8</title>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <path d="M9 191.75 L9 208.09" class="st6"/>
> + </g>
> + <g id="shape9-51" v:mID="9" v:groupContext="shape"
> v:layerMember="0" transform="translate(180.25,-126.25)">
> + <title>Dynamic connector.9</title>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <path d="M0 182.75 L39.4 182.75" class="st6"/>
> + </g>
> + <g id="shape10-57" v:mID="10" v:groupContext="shape"
> v:layerMember="0" transform="translate(180.25,-67.75)">
> + <title>Dynamic connector.10</title>
> + <v:userDefs>
> + <v:ud v:nameU="visVersion"
> v:val="VT0(14):26"/>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <path d="M0 182.75 L38.84 182.75" class="st6"/>
> + </g>
> + <g id="shape11-63" v:mID="11" v:groupContext="shape"
> transform="translate(65.5,-173.5)">
> + <title>Sheet.11</title>
> + <desc>packet</desc>
> + <v:userDefs>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="24.75" cy="182.75" width="49.5"
> height="18"/>
> + <rect x="0" y="173.75" width="49.5" height="18"
> class="st8"/>
> + <text x="8.46" y="186.35" class="st2"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>packet</text>
> </g>
> + <g id="shape14-66" v:mID="14" v:groupContext="shape"
> transform="translate(98.125,-98.125)">
> + <title>Sheet.14</title>
> + <desc>find a “flow”</desc>
> + <v:userDefs>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="32.0625" cy="183.875" width="64.13"
> height="15.75"/>
> + <rect x="0" y="176" width="64.125" height="15.75"
> class="st8"/>
> + <text x="6.41" y="186.88" class="st9"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>find a
> “flow”</text> </g>
> + <g id="shape15-69" v:mID="15" v:groupContext="shape"
> transform="translate(99.25,-39.625)">
> + <title>Sheet.15</title>
> + <desc>find a “neighbor”</desc>
> + <v:userDefs>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="40.5" cy="183.875" width="81"
> height="15.75"/>
> + <rect x="0" y="176" width="81" height="15.75"
> class="st8"/>
> + <text x="5.48" y="186.88" class="st9"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>find a
> “neighbor”</text> </g>
> + <g id="shape13-72" v:mID="13" v:groupContext="shape"
> transform="translate(181.375,-79)">
> + <title>Sheet.13</title>
> + <desc>not find</desc>
> + <v:userDefs>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="21.375" cy="183.875" width="42.75"
> height="15.75"/>
> + <rect x="0" y="176" width="42.75" height="15.75"
> class="st8"/>
> + <text x="5.38" y="186.88" class="st9"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>not find</text>
> </g>
> + <g id="shape12-75" v:mID="12" v:groupContext="shape"
> transform="translate(181.375,-137.5)">
> + <title>Sheet.12</title>
> + <desc>not find</desc>
> + <v:userDefs>
> + <v:ud v:nameU="msvThemeColors"
> v:val="VT0(36):26"/>
> + <v:ud v:nameU="msvThemeEffects"
> v:val="VT0(16):26"/>
> + </v:userDefs>
> + <v:textBlock v:margins="rect(4,4,4,4)"/>
> + <v:textRect cx="21.375" cy="183.875" width="42.75"
> height="15.75"/>
> + <rect x="0" y="176" width="42.75" height="15.75"
> class="st8"/>
> + <text x="5.38" y="186.88" class="st9"
> v:langID="1033"><v:paragraph v:horizAlign="1"/><v:tabList/>not find</text>
> </g>
> + </g>
> +</svg>
> diff --git a/lib/librte_gro/gro_tcp4.c b/lib/librte_gro/gro_tcp4.c
> index 03e5ccf..27af23e 100644
> --- a/lib/librte_gro/gro_tcp4.c
> +++ b/lib/librte_gro/gro_tcp4.c
> @@ -6,8 +6,6 @@
> #include <rte_mbuf.h>
> #include <rte_cycles.h>
> #include <rte_ethdev.h>
> -#include <rte_ip.h>
> -#include <rte_tcp.h>
>
> #include "gro_tcp4.h"
>
> @@ -44,20 +42,20 @@ gro_tcp4_tbl_create(uint16_t socket_id,
> }
> tbl->max_item_num = entries_num;
>
> - size = sizeof(struct gro_tcp4_key) * entries_num;
> - tbl->keys = rte_zmalloc_socket(__func__,
> + size = sizeof(struct gro_tcp4_flow) * entries_num;
> + tbl->flows = rte_zmalloc_socket(__func__,
> size,
> RTE_CACHE_LINE_SIZE,
> socket_id);
> - if (tbl->keys == NULL) {
> + if (tbl->flows == NULL) {
> rte_free(tbl->items);
> rte_free(tbl);
> return NULL;
> }
> - /* INVALID_ARRAY_INDEX indicates empty key */
> + /* INVALID_ARRAY_INDEX indicates an empty flow */
> for (i = 0; i < entries_num; i++)
> - tbl->keys[i].start_index = INVALID_ARRAY_INDEX;
> - tbl->max_key_num = entries_num;
> + tbl->flows[i].start_index = INVALID_ARRAY_INDEX;
> + tbl->max_flow_num = entries_num;
>
> return tbl;
> }
> @@ -69,116 +67,15 @@ gro_tcp4_tbl_destroy(void *tbl)
>
> if (tcp_tbl) {
> rte_free(tcp_tbl->items);
> - rte_free(tcp_tbl->keys);
> + rte_free(tcp_tbl->flows);
> }
> rte_free(tcp_tbl);
> }
>
> -/*
> - * merge two TCP/IPv4 packets without updating checksums.
> - * If cmp is larger than 0, append the new packet to the
> - * original packet. Otherwise, pre-pend the new packet to
> - * the original packet.
> - */
> -static inline int
> -merge_two_tcp4_packets(struct gro_tcp4_item *item_src,
> - struct rte_mbuf *pkt,
> - uint16_t ip_id,
> - uint32_t sent_seq,
> - int cmp)
> -{
> - struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> - uint16_t tcp_datalen;
> -
> - if (cmp > 0) {
> - pkt_head = item_src->firstseg;
> - pkt_tail = pkt;
> - } else {
> - pkt_head = pkt;
> - pkt_tail = item_src->firstseg;
> - }
> -
> - /* check if the packet length will be beyond the max value */
> - tcp_datalen = pkt_tail->pkt_len - pkt_tail->l2_len -
> - pkt_tail->l3_len - pkt_tail->l4_len;
> - if (pkt_head->pkt_len - pkt_head->l2_len + tcp_datalen >
> - TCP4_MAX_L3_LENGTH)
> - return 0;
> -
> - /* remove packet header for the tail packet */
> - rte_pktmbuf_adj(pkt_tail,
> - pkt_tail->l2_len +
> - pkt_tail->l3_len +
> - pkt_tail->l4_len);
> -
> - /* chain two packets together */
> - if (cmp > 0) {
> - item_src->lastseg->next = pkt;
> - item_src->lastseg = rte_pktmbuf_lastseg(pkt);
> - /* update IP ID to the larger value */
> - item_src->ip_id = ip_id;
> - } else {
> - lastseg = rte_pktmbuf_lastseg(pkt);
> - lastseg->next = item_src->firstseg;
> - item_src->firstseg = pkt;
> - /* update sent_seq to the smaller value */
> - item_src->sent_seq = sent_seq;
> - }
> - item_src->nb_merged++;
> -
> - /* update mbuf metadata for the merged packet */
> - pkt_head->nb_segs += pkt_tail->nb_segs;
> - pkt_head->pkt_len += pkt_tail->pkt_len;
> -
> - return 1;
> -}
> -
> -static inline int
> -check_seq_option(struct gro_tcp4_item *item,
> - struct tcp_hdr *tcp_hdr,
> - uint16_t tcp_hl,
> - uint16_t tcp_dl,
> - uint16_t ip_id,
> - uint32_t sent_seq)
> -{
> - struct rte_mbuf *pkt0 = item->firstseg;
> - struct ipv4_hdr *ipv4_hdr0;
> - struct tcp_hdr *tcp_hdr0;
> - uint16_t tcp_hl0, tcp_dl0;
> - uint16_t len;
> -
> - ipv4_hdr0 = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt0, char *) +
> - pkt0->l2_len);
> - tcp_hdr0 = (struct tcp_hdr *)((char *)ipv4_hdr0 + pkt0->l3_len);
> - tcp_hl0 = pkt0->l4_len;
> -
> - /* check if TCP option fields equal. If not, return 0. */
> - len = RTE_MAX(tcp_hl, tcp_hl0) - sizeof(struct tcp_hdr);
> - if ((tcp_hl != tcp_hl0) ||
> - ((len > 0) && (memcmp(tcp_hdr + 1,
> - tcp_hdr0 + 1,
> - len) != 0)))
> - return 0;
> -
> - /* check if the two packets are neighbors */
> - tcp_dl0 = pkt0->pkt_len - pkt0->l2_len - pkt0->l3_len - tcp_hl0;
> - if ((sent_seq == (item->sent_seq + tcp_dl0)) &&
> - (ip_id == (item->ip_id + 1)))
> - /* append the new packet */
> - return 1;
> - else if (((sent_seq + tcp_dl) == item->sent_seq) &&
> - ((ip_id + item->nb_merged) == item->ip_id))
> - /* pre-pend the new packet */
> - return -1;
> - else
> - return 0;
> -}
> -
> static inline uint32_t
> find_an_empty_item(struct gro_tcp4_tbl *tbl)
> {
> - uint32_t i;
> - uint32_t max_item_num = tbl->max_item_num;
> + uint32_t max_item_num = tbl->max_item_num, i;
>
> for (i = 0; i < max_item_num; i++)
> if (tbl->items[i].firstseg == NULL)
> @@ -187,13 +84,12 @@ find_an_empty_item(struct gro_tcp4_tbl *tbl)
> }
>
> static inline uint32_t
> -find_an_empty_key(struct gro_tcp4_tbl *tbl)
> +find_an_empty_flow(struct gro_tcp4_tbl *tbl)
> {
> - uint32_t i;
> - uint32_t max_key_num = tbl->max_key_num;
> + uint32_t max_flow_num = tbl->max_flow_num, i;
>
> - for (i = 0; i < max_key_num; i++)
> - if (tbl->keys[i].start_index == INVALID_ARRAY_INDEX)
> + for (i = 0; i < max_flow_num; i++)
> + if (tbl->flows[i].start_index == INVALID_ARRAY_INDEX)
> return i;
> return INVALID_ARRAY_INDEX;
> }
> @@ -201,10 +97,11 @@ find_an_empty_key(struct gro_tcp4_tbl *tbl)
> static inline uint32_t
> insert_new_item(struct gro_tcp4_tbl *tbl,
> struct rte_mbuf *pkt,
> - uint16_t ip_id,
> - uint32_t sent_seq,
> + uint64_t start_time,
> uint32_t prev_idx,
> - uint64_t start_time)
> + uint32_t sent_seq,
> + uint16_t ip_id,
> + uint8_t is_atomic)
> {
> uint32_t item_idx;
>
> @@ -219,9 +116,10 @@ insert_new_item(struct gro_tcp4_tbl *tbl,
> tbl->items[item_idx].sent_seq = sent_seq;
> tbl->items[item_idx].ip_id = ip_id;
> tbl->items[item_idx].nb_merged = 1;
> + tbl->items[item_idx].is_atomic = is_atomic;
> tbl->item_num++;
>
> - /* if the previous packet exists, chain the new one with it */
> + /* If the previous packet exists, chain them together. */
> if (prev_idx != INVALID_ARRAY_INDEX) {
> tbl->items[item_idx].next_pkt_idx =
> tbl->items[prev_idx].next_pkt_idx;
> @@ -232,12 +130,13 @@ insert_new_item(struct gro_tcp4_tbl *tbl,
> }
>
> static inline uint32_t
> -delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
> +delete_item(struct gro_tcp4_tbl *tbl,
> + uint32_t item_idx,
> uint32_t prev_item_idx)
> {
> uint32_t next_idx = tbl->items[item_idx].next_pkt_idx;
>
> - /* set NULL to firstseg to indicate it's an empty item */
> + /* NULL indicates an empty item. */
> tbl->items[item_idx].firstseg = NULL;
> tbl->item_num--;
> if (prev_item_idx != INVALID_ARRAY_INDEX)
> @@ -247,53 +146,33 @@ delete_item(struct gro_tcp4_tbl *tbl, uint32_t item_idx,
> }
>
> static inline uint32_t
> -insert_new_key(struct gro_tcp4_tbl *tbl,
> - struct tcp4_key *key_src,
> +insert_new_flow(struct gro_tcp4_tbl *tbl,
> + struct tcp4_flow_key *src,
> uint32_t item_idx)
> {
> - struct tcp4_key *key_dst;
> - uint32_t key_idx;
> + struct tcp4_flow_key *dst;
> + uint32_t flow_idx;
>
> - key_idx = find_an_empty_key(tbl);
> - if (key_idx == INVALID_ARRAY_INDEX)
> + flow_idx = find_an_empty_flow(tbl);
> + if (unlikely(flow_idx == INVALID_ARRAY_INDEX))
> return INVALID_ARRAY_INDEX;
>
> - key_dst = &(tbl->keys[key_idx].key);
> + dst = &(tbl->flows[flow_idx].key);
>
> - ether_addr_copy(&(key_src->eth_saddr), &(key_dst->eth_saddr));
> - ether_addr_copy(&(key_src->eth_daddr), &(key_dst->eth_daddr));
> - key_dst->ip_src_addr = key_src->ip_src_addr;
> - key_dst->ip_dst_addr = key_src->ip_dst_addr;
> - key_dst->recv_ack = key_src->recv_ack;
> - key_dst->src_port = key_src->src_port;
> - key_dst->dst_port = key_src->dst_port;
> + ether_addr_copy(&(src->eth_saddr), &(dst->eth_saddr));
> + ether_addr_copy(&(src->eth_daddr), &(dst->eth_daddr));
> + dst->ip_src_addr = src->ip_src_addr;
> + dst->ip_dst_addr = src->ip_dst_addr;
> + dst->recv_ack = src->recv_ack;
> + dst->src_port = src->src_port;
> + dst->dst_port = src->dst_port;
>
> - /* non-INVALID_ARRAY_INDEX value indicates this key is valid */
> - tbl->keys[key_idx].start_index = item_idx;
> - tbl->key_num++;
> + tbl->flows[flow_idx].start_index = item_idx;
> + tbl->flow_num++;
>
> - return key_idx;
> + return flow_idx;
> }
>
> -static inline int
> -is_same_key(struct tcp4_key k1, struct tcp4_key k2)
> -{
> - if (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) == 0)
> - return 0;
> -
> - if (is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) == 0)
> - return 0;
> -
> - return ((k1.ip_src_addr == k2.ip_src_addr) &&
> - (k1.ip_dst_addr == k2.ip_dst_addr) &&
> - (k1.recv_ack == k2.recv_ack) &&
> - (k1.src_port == k2.src_port) &&
> - (k1.dst_port == k2.dst_port));
> -}
> -
> -/*
> - * update packet length for the flushed packet.
> - */
> static inline void
> update_header(struct gro_tcp4_item *item)
> {
> @@ -315,84 +194,106 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
> struct ipv4_hdr *ipv4_hdr;
> struct tcp_hdr *tcp_hdr;
> uint32_t sent_seq;
> - uint16_t tcp_dl, ip_id;
> + uint16_t tcp_dl, ip_id, frag_off, hdr_len;
> + uint8_t is_atomic;
>
> - struct tcp4_key key;
> + struct tcp4_flow_key key;
> uint32_t cur_idx, prev_idx, item_idx;
> - uint32_t i, max_key_num;
> + uint32_t i, max_flow_num, left_flow_num;
> int cmp;
> + uint8_t find;
>
> eth_hdr = rte_pktmbuf_mtod(pkt, struct ether_hdr *);
> ipv4_hdr = (struct ipv4_hdr *)((char *)eth_hdr + pkt->l2_len);
> tcp_hdr = (struct tcp_hdr *)((char *)ipv4_hdr + pkt->l3_len);
> + hdr_len = pkt->l2_len + pkt->l3_len + pkt->l4_len;
>
> /*
> - * if FIN, SYN, RST, PSH, URG, ECE or
> - * CWR is set, return immediately.
> + * Don't process the packet which has FIN, SYN, RST, PSH, URG, ECE
> + * or CWR set.
> */
> if (tcp_hdr->tcp_flags != TCP_ACK_FLAG)
> return -1;
> - /* if payload length is 0, return immediately */
> - tcp_dl = rte_be_to_cpu_16(ipv4_hdr->total_length) - pkt->l3_len -
> - pkt->l4_len;
> - if (tcp_dl == 0)
> + /*
> + * Don't process the packet whose payload length is less than or
> + * equal to 0.
> + */
> + tcp_dl = pkt->pkt_len - hdr_len;
> + if (tcp_dl <= 0)
> return -1;
>
> - ip_id = rte_be_to_cpu_16(ipv4_hdr->packet_id);
> + /*
> + * Save IPv4 ID for the packet whose DF bit is 0. For the packet
> + * whose DF bit is 1, IPv4 ID is ignored.
> + */
> + frag_off = rte_be_to_cpu_16(ipv4_hdr->fragment_offset);
> + is_atomic = (frag_off & IPV4_HDR_DF_FLAG) == IPV4_HDR_DF_FLAG;
> + ip_id = is_atomic ? 0 : rte_be_to_cpu_16(ipv4_hdr->packet_id);
> sent_seq = rte_be_to_cpu_32(tcp_hdr->sent_seq);
>
> ether_addr_copy(&(eth_hdr->s_addr), &(key.eth_saddr));
> ether_addr_copy(&(eth_hdr->d_addr), &(key.eth_daddr));
> key.ip_src_addr = ipv4_hdr->src_addr;
> key.ip_dst_addr = ipv4_hdr->dst_addr;
> + key.recv_ack = tcp_hdr->recv_ack;
> key.src_port = tcp_hdr->src_port;
> key.dst_port = tcp_hdr->dst_port;
> - key.recv_ack = tcp_hdr->recv_ack;
>
> - /* search for a key */
> - max_key_num = tbl->max_key_num;
> - for (i = 0; i < max_key_num; i++) {
> - if ((tbl->keys[i].start_index != INVALID_ARRAY_INDEX) &&
> - is_same_key(tbl->keys[i].key, key))
> - break;
> + /* Search for a matched flow. */
> + max_flow_num = tbl->max_flow_num;
> + left_flow_num = tbl->flow_num;
> + find = 0;
> + for (i = 0; i < max_flow_num && left_flow_num; i++) {
> + if (tbl->flows[i].start_index != INVALID_ARRAY_INDEX) {
> + if (is_same_tcp4_flow(tbl->flows[i].key, key)) {
> + find = 1;
> + break;
> + }
> + left_flow_num--;
> + }
> }
>
> - /* can't find a key, so insert a new key and a new item. */
> - if (i == tbl->max_key_num) {
> - item_idx = insert_new_item(tbl, pkt, ip_id, sent_seq,
> - INVALID_ARRAY_INDEX, start_time);
> + /*
> + * Fail to find a matched flow. Insert a new flow and store the
> + * packet into the flow.
> + */
> + if (find == 0) {
> + item_idx = insert_new_item(tbl, pkt, start_time,
> + INVALID_ARRAY_INDEX, sent_seq, ip_id,
> + is_atomic);
> if (item_idx == INVALID_ARRAY_INDEX)
> return -1;
> - if (insert_new_key(tbl, &key, item_idx) ==
> + if (insert_new_flow(tbl, &key, item_idx) ==
> INVALID_ARRAY_INDEX) {
> - /*
> - * fail to insert a new key, so
> - * delete the inserted item
> - */
> + /* Fail to insert a new flow. */
> delete_item(tbl, item_idx, INVALID_ARRAY_INDEX);
> return -1;
> }
> return 0;
> }
>
> - /* traverse all packets in the item group to find one to merge */
> - cur_idx = tbl->keys[i].start_index;
> + /*
> + * Check all packets in the flow and try to find a neighbor for
> + * the input packet.
> + */
> + cur_idx = tbl->flows[i].start_index;
> prev_idx = cur_idx;
> do {
> cmp = check_seq_option(&(tbl->items[cur_idx]), tcp_hdr,
> - pkt->l4_len, tcp_dl, ip_id, sent_seq);
> + sent_seq, ip_id, pkt->l4_len, tcp_dl, 0,
> + is_atomic);
> if (cmp) {
> if (merge_two_tcp4_packets(&(tbl->items[cur_idx]),
> - pkt, ip_id,
> - sent_seq, cmp))
> + pkt, cmp, sent_seq, ip_id, 0))
> return 1;
> /*
> - * fail to merge two packets since the packet
> - * length will be greater than the max value.
> - * So insert the packet into the item group.
> + * Fail to merge the two packets, as the packet
> + * length is greater than the max value. Store
> + * the packet into the flow.
> */
> - if (insert_new_item(tbl, pkt, ip_id, sent_seq,
> - prev_idx, start_time) ==
> + if (insert_new_item(tbl, pkt, start_time, prev_idx,
> + sent_seq, ip_id,
> + is_atomic) ==
> INVALID_ARRAY_INDEX)
> return -1;
> return 0;
> @@ -401,12 +302,9 @@ gro_tcp4_reassemble(struct rte_mbuf *pkt,
> cur_idx = tbl->items[cur_idx].next_pkt_idx;
> } while (cur_idx != INVALID_ARRAY_INDEX);
>
> - /*
> - * can't find a packet in the item group to merge,
> - * so insert the packet into the item group.
> - */
> - if (insert_new_item(tbl, pkt, ip_id, sent_seq, prev_idx,
> - start_time) == INVALID_ARRAY_INDEX)
> + /* Fail to find a neighbor, so store the packet into the flow. */
> + if (insert_new_item(tbl, pkt, start_time, prev_idx, sent_seq,
> + ip_id, is_atomic) == INVALID_ARRAY_INDEX)
> return -1;
>
> return 0;
> @@ -418,46 +316,35 @@ gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> struct rte_mbuf **out,
> uint16_t nb_out)
> {
> - uint16_t k = 0;
> + uint32_t max_flow_num = tbl->max_flow_num;
> uint32_t i, j;
> - uint32_t max_key_num = tbl->max_key_num;
> + uint16_t k = 0;
>
> - for (i = 0; i < max_key_num; i++) {
> - /* all keys have been checked, return immediately */
> - if (tbl->key_num == 0)
> + for (i = 0; i < max_flow_num; i++) {
> + if (unlikely(tbl->flow_num == 0))
> return k;
>
> - j = tbl->keys[i].start_index;
> + j = tbl->flows[i].start_index;
> while (j != INVALID_ARRAY_INDEX) {
> if (tbl->items[j].start_time <= flush_timestamp) {
> out[k++] = tbl->items[j].firstseg;
> if (tbl->items[j].nb_merged > 1)
> update_header(&(tbl->items[j]));
> /*
> - * delete the item and get
> - * the next packet index
> + * Delete the packet and get the next
> + * packet in the flow.
> */
> - j = delete_item(tbl, j,
> - INVALID_ARRAY_INDEX);
> + j = delete_item(tbl, j, INVALID_ARRAY_INDEX);
> + tbl->flows[i].start_index = j;
> + if (j == INVALID_ARRAY_INDEX)
> + tbl->flow_num--;
>
> - /*
> - * delete the key as all of
> - * packets are flushed
> - */
> - if (j == INVALID_ARRAY_INDEX) {
> - tbl->keys[i].start_index =
> - INVALID_ARRAY_INDEX;
> - tbl->key_num--;
> - } else
> - /* update start_index of the key */
> - tbl->keys[i].start_index = j;
> -
> - if (k == nb_out)
> + if (unlikely(k == nb_out))
> return k;
> } else
> /*
> - * left packets of this key won't be
> - * timeout, so go to check other keys.
> + * The remaining packets in this flow haven't
> + * timed out yet. Go on to check other flows.
> */
> break;
> }
> diff --git a/lib/librte_gro/gro_tcp4.h b/lib/librte_gro/gro_tcp4.h
> index d129523..c2b66a8 100644
> --- a/lib/librte_gro/gro_tcp4.h
> +++ b/lib/librte_gro/gro_tcp4.h
> @@ -5,17 +5,20 @@
> #ifndef _GRO_TCP4_H_
> #define _GRO_TCP4_H_
>
> +#include <rte_ip.h>
> +#include <rte_tcp.h>
> +
> #define INVALID_ARRAY_INDEX 0xffffffffUL
> #define GRO_TCP4_TBL_MAX_ITEM_NUM (1024UL * 1024UL)
>
> /*
> - * the max L3 length of a TCP/IPv4 packet. The L3 length
> - * is the sum of ipv4 header, tcp header and L4 payload.
> + * The max length of an IPv4 packet, which includes the length of the L3
> + * header, the L4 header and the data payload.
> */
> -#define TCP4_MAX_L3_LENGTH UINT16_MAX
> +#define MAX_IPV4_PKT_LENGTH UINT16_MAX
>
> -/* criteria of mergeing packets */
> -struct tcp4_key {
> +/* Header fields representing a TCP/IPv4 flow */
> +struct tcp4_flow_key {
> struct ether_addr eth_saddr;
> struct ether_addr eth_daddr;
> uint32_t ip_src_addr;
> @@ -26,77 +29,76 @@ struct tcp4_key {
> uint16_t dst_port;
> };
>
> -struct gro_tcp4_key {
> - struct tcp4_key key;
> +struct gro_tcp4_flow {
> + struct tcp4_flow_key key;
> /*
> - * the index of the first packet in the item group.
> - * If the value is INVALID_ARRAY_INDEX, it means
> - * the key is empty.
> + * The index of the first packet in the flow.
> + * INVALID_ARRAY_INDEX indicates an empty flow.
> */
> uint32_t start_index;
> };
>
> struct gro_tcp4_item {
> /*
> - * first segment of the packet. If the value
> + * The first MBUF segment of the packet. If the value
> * is NULL, it means the item is empty.
> */
> struct rte_mbuf *firstseg;
> - /* last segment of the packet */
> + /* The last MBUF segment of the packet */
> struct rte_mbuf *lastseg;
> /*
> - * the time when the first packet is inserted
> - * into the table. If a packet in the table is
> - * merged with an incoming packet, this value
> - * won't be updated. We set this value only
> - * when the first packet is inserted into the
> - * table.
> + * The time when the first packet is inserted into the table.
> + * This value won't be updated, even if the packet is merged
> + * with other packets.
> */
> uint64_t start_time;
> /*
> - * we use next_pkt_idx to chain the packets that
> - * have same key value but can't be merged together.
> + * next_pkt_idx is used to chain the packets that
> + * are in the same flow but can't be merged together
> + * (e.g. caused by packet reordering).
> */
> uint32_t next_pkt_idx;
> - /* the sequence number of the packet */
> + /* TCP sequence number of the packet */
> uint32_t sent_seq;
> - /* the IP ID of the packet */
> + /* IPv4 ID of the packet */
> uint16_t ip_id;
> - /* the number of merged packets */
> + /* The number of merged packets */
> uint16_t nb_merged;
> + /* Indicate if IPv4 ID can be ignored */
> + uint8_t is_atomic;
> };
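
As a reading aid for the new is_atomic field: the sketch below shows, under simplified assumptions, how the DF bit decides whether the IPv4 ID takes part in matching. The names SKETCH_DF_FLAG, sketch_is_atomic and sketch_effective_ip_id are hypothetical and not part of the patch; the input is assumed to be the fragment_offset field already converted to host byte order.

```c
#include <stdint.h>

/* DF is bit 14 of the 16-bit flags/fragment-offset word (host order). */
#define SKETCH_DF_FLAG (1u << 14)

/* Hypothetical helper: 1 if DF is set, i.e. the IPv4 ID can be ignored. */
static inline uint8_t
sketch_is_atomic(uint16_t frag_off)
{
	return (frag_off & SKETCH_DF_FLAG) == SKETCH_DF_FLAG;
}

/* Hypothetical helper: the ID value stored for later comparisons. */
static inline uint16_t
sketch_effective_ip_id(uint16_t frag_off, uint16_t packet_id)
{
	return sketch_is_atomic(frag_off) ? 0 : packet_id;
}
```

This mirrors only the DF test done in gro_tcp4_reassemble(); like the patch, it does not look at MF or the fragment offset.
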
>
> /*
> - * TCP/IPv4 reassembly table structure.
> + * TCP/IPv4 reassembly table structure
> */
> struct gro_tcp4_tbl {
> /* item array */
> struct gro_tcp4_item *items;
> - /* key array */
> - struct gro_tcp4_key *keys;
> + /* flow array */
> + struct gro_tcp4_flow *flows;
> /* current item number */
> uint32_t item_num;
> - /* current key num */
> - uint32_t key_num;
> + /* current flow num */
> + uint32_t flow_num;
> /* item array size */
> uint32_t max_item_num;
> - /* key array size */
> - uint32_t max_key_num;
> + /* flow array size */
> + uint32_t max_flow_num;
> };
>
> /**
> * This function creates a TCP/IPv4 reassembly table.
> *
> * @param socket_id
> - * socket index for allocating TCP/IPv4 reassemble table
> + * Socket index for allocating the TCP/IPv4 reassemble table
> * @param max_flow_num
> - * the maximum number of flows in the TCP/IPv4 GRO table
> + * The maximum number of flows in the TCP/IPv4 GRO table
> * @param max_item_per_flow
> - * the maximum packet number per flow.
> + * The maximum number of packets per flow
> *
> * @return
> - * if create successfully, return a pointer which points to the
> - * created TCP/IPv4 GRO table. Otherwise, return NULL.
> + * - Return the table pointer on success.
> + * - Return NULL on failure.
> */
> void *gro_tcp4_tbl_create(uint16_t socket_id,
> uint16_t max_flow_num,
> @@ -106,62 +108,56 @@ void *gro_tcp4_tbl_create(uint16_t socket_id,
> * This function destroys a TCP/IPv4 reassembly table.
> *
> * @param tbl
> - * a pointer points to the TCP/IPv4 reassembly table.
> + * Pointer pointing to the TCP/IPv4 reassembly table.
> */
> void gro_tcp4_tbl_destroy(void *tbl);
>
> /**
> - * This function searches for a packet in the TCP/IPv4 reassembly table
> - * to merge with the inputted one. To merge two packets is to chain them
> - * together and update packet headers. Packets, whose SYN, FIN, RST, PSH
> - * CWR, ECE or URG bit is set, are returned immediately. Packets which
> - * only have packet headers (i.e. without data) are also returned
> - * immediately. Otherwise, the packet is either merged, or inserted into
> - * the table. Besides, if there is no available space to insert the
> - * packet, this function returns immediately too.
> + * This function merges a TCP/IPv4 packet. It doesn't process packets
> + * that have SYN, FIN, RST, PSH, CWR, ECE or URG set, or that have no
> + * payload.
> *
> - * This function assumes the inputted packet is with correct IPv4 and
> - * TCP checksums. And if two packets are merged, it won't re-calculate
> - * IPv4 and TCP checksums. Besides, if the inputted packet is IP
> - * fragmented, it assumes the packet is complete (with TCP header).
> + * This function doesn't check if the packet has correct checksums and
> + * doesn't re-calculate checksums for the merged packet. Additionally,
> + * it assumes the packets are complete (i.e., MF==0 && frag_off==0),
> + * when IP fragmentation is possible (i.e., DF==0). It returns the
> + * packet, if the packet has invalid parameters (e.g. SYN bit is set)
> + * or there is no available space in the table.
> *
> * @param pkt
> - * packet to reassemble.
> + * Packet to reassemble
> * @param tbl
> - * a pointer that points to a TCP/IPv4 reassembly table.
> + * Pointer pointing to the TCP/IPv4 reassembly table
> * @start_time
> - * the start time that the packet is inserted into the table
> + * The time when the packet is inserted into the table
> *
> * @return
> - * if the packet doesn't have data, or SYN, FIN, RST, PSH, CWR, ECE
> - * or URG bit is set, or there is no available space in the table to
> - * insert a new item or a new key, return a negative value. If the
> - * packet is merged successfully, return an positive value. If the
> - * packet is inserted into the table, return 0.
> + * - Return a positive value if the packet is merged.
> + * - Return zero if the packet isn't merged but stored in the table.
> + * - Return a negative value for invalid parameters or no available
> + * space in the table.
> */
> int32_t gro_tcp4_reassemble(struct rte_mbuf *pkt,
> struct gro_tcp4_tbl *tbl,
> uint64_t start_time);
>
> /**
> - * This function flushes timeout packets in a TCP/IPv4 reassembly table
> - * to applications, and without updating checksums for merged packets.
> - * The max number of flushed timeout packets is the element number of
> - * the array which is used to keep flushed packets.
> + * This function flushes timed-out packets from a TCP/IPv4 reassembly
> + * table, without updating checksums.
> *
> * @param tbl
> - * a pointer that points to a TCP GRO table.
> + * TCP/IPv4 reassembly table pointer
> * @param flush_timestamp
> - * this function flushes packets which are inserted into the table
> - * before or at the flush_timestamp.
> + * Flush packets which are inserted into the table before or at the
> + * flush_timestamp.
> * @param out
> - * pointer array which is used to keep flushed packets.
> + * Pointer array used to keep flushed packets
> * @param nb_out
> - * the element number of out. It's also the max number of timeout
> + * The element number in 'out'. It also determines the maximum number of
> * packets that can be flushed finally.
> *
> * @return
> - * the number of packets that are returned.
> + * The number of flushed packets
> */
> uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> uint64_t flush_timestamp,
> @@ -173,10 +169,131 @@ uint16_t gro_tcp4_tbl_timeout_flush(struct gro_tcp4_tbl *tbl,
> * reassembly table.
> *
> * @param tbl
> - * pointer points to a TCP/IPv4 reassembly table.
> + * TCP/IPv4 reassembly table pointer
> *
> * @return
> - * the number of packets in the table
> + * The number of packets in the table
> */
> uint32_t gro_tcp4_tbl_pkt_count(void *tbl);
> +
> +/*
> + * Check if two TCP/IPv4 packets belong to the same flow.
> + */
> +static inline int
> +is_same_tcp4_flow(struct tcp4_flow_key k1, struct tcp4_flow_key k2)
> +{
> + return (is_same_ether_addr(&k1.eth_saddr, &k2.eth_saddr) &&
> + is_same_ether_addr(&k1.eth_daddr, &k2.eth_daddr) &&
> + (k1.ip_src_addr == k2.ip_src_addr) &&
> + (k1.ip_dst_addr == k2.ip_dst_addr) &&
> + (k1.recv_ack == k2.recv_ack) &&
> + (k1.src_port == k2.src_port) &&
> + (k1.dst_port == k2.dst_port));
> +}
> +
> +/*
> + * Check if two TCP/IPv4 packets are neighbors.
> + */
> +static inline int
> +check_seq_option(struct gro_tcp4_item *item,
> + struct tcp_hdr *tcph,
> + uint32_t sent_seq,
> + uint16_t ip_id,
> + uint16_t tcp_hl,
> + uint16_t tcp_dl,
> + uint16_t l2_offset,
> + uint8_t is_atomic)
> +{
> + struct rte_mbuf *pkt_orig = item->firstseg;
> + struct ipv4_hdr *iph_orig;
> + struct tcp_hdr *tcph_orig;
> + uint16_t len, l4_len_orig;
> +
> + iph_orig = (struct ipv4_hdr *)(rte_pktmbuf_mtod(pkt_orig, char *) +
> + l2_offset + pkt_orig->l2_len);
> + tcph_orig = (struct tcp_hdr *)((char *)iph_orig + pkt_orig->l3_len);
> + l4_len_orig = pkt_orig->l4_len;
> +
> + /* Check if TCP option fields equal */
> + len = RTE_MAX(tcp_hl, l4_len_orig) - sizeof(struct tcp_hdr);
> + if ((tcp_hl != l4_len_orig) || ((len > 0) &&
> + (memcmp(tcph + 1, tcph_orig + 1,
> + len) != 0)))
> + return 0;
> +
> + /* Don't merge packets whose DF bits are different */
> + if (unlikely(item->is_atomic ^ is_atomic))
> + return 0;
> +
> + /* Check if the two packets are neighbors */
> + len = pkt_orig->pkt_len - l2_offset - pkt_orig->l2_len -
> + pkt_orig->l3_len - l4_len_orig;
> + if ((sent_seq == item->sent_seq + len) && (is_atomic ||
> + (ip_id == item->ip_id + item->nb_merged)))
> + /* Append the new packet */
> + return 1;
> + else if ((sent_seq + tcp_dl == item->sent_seq) && (is_atomic ||
> + (ip_id + 1 == item->ip_id)))
> + /* Pre-pend the new packet */
> + return -1;
> +
> + return 0;
> +}
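
For anyone following the new neighbor rule, here is a minimal, self-contained sketch of the sequence/IP ID comparison that check_seq_option() performs after the TCP option check. It is an illustration only: struct sketch_item and sketch_check_neighbor are hypothetical names, the stored payload length is passed in directly instead of being derived from the mbuf, and the TCP option comparison is left out.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct gro_tcp4_item. */
struct sketch_item {
	uint32_t sent_seq;    /* TCP sequence number of the first stored byte */
	uint32_t payload_len; /* TCP payload bytes already stored */
	uint16_t ip_id;       /* IPv4 ID of the first stored packet */
	uint16_t nb_merged;   /* number of packets merged so far */
	uint8_t is_atomic;    /* 1 if DF is set and the IPv4 ID is ignored */
};

/* Returns 1 to append, -1 to pre-pend, 0 if the packets aren't neighbors. */
static inline int
sketch_check_neighbor(const struct sketch_item *item, uint32_t sent_seq,
		uint16_t ip_id, uint16_t tcp_dl, uint8_t is_atomic)
{
	/* Packets whose DF bits differ are never merged. */
	if (item->is_atomic != is_atomic)
		return 0;

	/* New packet directly follows the stored data. */
	if (sent_seq == item->sent_seq + item->payload_len &&
			(is_atomic || ip_id == item->ip_id + item->nb_merged))
		return 1;

	/* New packet directly precedes the stored data. */
	if (sent_seq + tcp_dl == item->sent_seq &&
			(is_atomic || ip_id + 1 == item->ip_id))
		return -1;

	return 0;
}
```

With DF set, only the TCP sequence numbers decide; with DF clear, the IPv4 IDs must also be consecutive.
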
> +
> +/*
> + * Merge two TCP/IPv4 packets without updating checksums.
> + * If cmp is larger than 0, append the new packet to the
> + * original packet. Otherwise, pre-pend the new packet to
> + * the original packet.
> + */
> +static inline int
> +merge_two_tcp4_packets(struct gro_tcp4_item *item,
> + struct rte_mbuf *pkt,
> + int cmp,
> + uint32_t sent_seq,
> + uint16_t ip_id,
> + uint16_t l2_offset)
> +{
> + struct rte_mbuf *pkt_head, *pkt_tail, *lastseg;
> + uint16_t hdr_len, l2_len;
> +
> + if (cmp > 0) {
> + pkt_head = item->firstseg;
> + pkt_tail = pkt;
> + } else {
> + pkt_head = pkt;
> + pkt_tail = item->firstseg;
> + }
> +
> + /* Check if the IPv4 packet length is greater than the max value */
> + hdr_len = l2_offset + pkt_head->l2_len + pkt_head->l3_len +
> + pkt_head->l4_len;
> + l2_len = l2_offset > 0 ? pkt_head->outer_l2_len : pkt_head->l2_len;
> + if (unlikely(pkt_head->pkt_len - l2_len + pkt_tail->pkt_len - hdr_len >
> + MAX_IPV4_PKT_LENGTH))
> + return 0;
> +
> + /* Remove the packet header */
> + rte_pktmbuf_adj(pkt_tail, hdr_len);
> +
> + /* Chain two packets together */
> + if (cmp > 0) {
> + item->lastseg->next = pkt;
> + item->lastseg = rte_pktmbuf_lastseg(pkt);
> + } else {
> + lastseg = rte_pktmbuf_lastseg(pkt);
> + lastseg->next = item->firstseg;
> + item->firstseg = pkt;
> + /* Update sent_seq and ip_id */
> + item->sent_seq = sent_seq;
> + item->ip_id = ip_id;
> + }
> + item->nb_merged++;
> +
> + /* Update MBUF metadata for the merged packet */
> + pkt_head->nb_segs += pkt_tail->nb_segs;
> + pkt_head->pkt_len += pkt_tail->pkt_len;
> +
> + return 1;
> +}
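
The length guard at the top of merge_two_tcp4_packets() can be read as the following stand-alone check. This is a sketch under assumptions: sketch_can_merge and SKETCH_MAX_IPV4_PKT_LENGTH are hypothetical names, lengths are passed as plain integers rather than read from mbufs, and the tail packet is assumed to be at least hdr_len bytes long.

```c
#include <stdint.h>

#define SKETCH_MAX_IPV4_PKT_LENGTH UINT16_MAX

/*
 * head_pkt_len, tail_pkt_len: total frame lengths, including L2.
 * l2_len: L2 header length of the head packet.
 * hdr_len: L2 + L3 + L4 header bytes that are stripped from the tail.
 * Returns 1 if the merged IPv4 packet length still fits in 16 bits.
 */
static inline int
sketch_can_merge(uint32_t head_pkt_len, uint32_t tail_pkt_len,
		uint16_t l2_len, uint16_t hdr_len)
{
	uint32_t merged_l3_len = head_pkt_len - l2_len +
		tail_pkt_len - hdr_len;

	return merged_l3_len <= SKETCH_MAX_IPV4_PKT_LENGTH;
}
```

For two full-size 1514-byte frames with 54 bytes of headers, the merged IPv4 length is 1500 + 1460 bytes, well under the limit.
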
> #endif
> diff --git a/lib/librte_gro/rte_gro.c b/lib/librte_gro/rte_gro.c
> index d6b8cd1..7176c0e 100644
> --- a/lib/librte_gro/rte_gro.c
> +++ b/lib/librte_gro/rte_gro.c
> @@ -23,11 +23,14 @@ static gro_tbl_destroy_fn tbl_destroy_fn[RTE_GRO_TYPE_MAX_NUM] = {
> static gro_tbl_pkt_count_fn tbl_pkt_count_fn[RTE_GRO_TYPE_MAX_NUM] = {
> gro_tcp4_tbl_pkt_count, NULL};
>
> +#define IS_IPV4_TCP_PKT(ptype) (RTE_ETH_IS_IPV4_HDR(ptype) && \
> + ((ptype & RTE_PTYPE_L4_TCP) == RTE_PTYPE_L4_TCP))
> +
> /*
> - * GRO context structure, which is used to merge packets. It keeps
> - * many reassembly tables of desired GRO types. Applications need to
> - * create GRO context objects before using rte_gro_reassemble to
> - * perform GRO.
> + * GRO context structure. It keeps the table structures, which are
> + * used to merge packets, for different GRO types. Before using
> + * rte_gro_reassemble(), applications need to create the GRO context
> + * first.
> */
> struct gro_ctx {
> /* GRO types to perform */
> @@ -65,7 +68,7 @@ rte_gro_ctx_create(const struct rte_gro_param *param)
> param->max_flow_num,
> param->max_item_per_flow);
> if (gro_ctx->tbls[i] == NULL) {
> - /* destroy all created tables */
> + /* Destroy all created tables */
> gro_ctx->gro_types = gro_types;
> rte_gro_ctx_destroy(gro_ctx);
> return NULL;
> @@ -85,8 +88,6 @@ rte_gro_ctx_destroy(void *ctx)
> uint64_t gro_type_flag;
> uint8_t i;
>
> - if (gro_ctx == NULL)
> - return;
> for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> gro_type_flag = 1ULL << i;
> if ((gro_ctx->gro_types & gro_type_flag) == 0)
> @@ -103,62 +104,54 @@ rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> const struct rte_gro_param *param)
> {
> - uint16_t i;
> - uint16_t nb_after_gro = nb_pkts;
> - uint32_t item_num;
> -
> - /* allocate a reassembly table for TCP/IPv4 GRO */
> + /* Allocate a reassembly table for TCP/IPv4 GRO */
> struct gro_tcp4_tbl tcp_tbl;
> - struct gro_tcp4_key tcp_keys[RTE_GRO_MAX_BURST_ITEM_NUM];
> + struct gro_tcp4_flow tcp_flows[RTE_GRO_MAX_BURST_ITEM_NUM];
> struct gro_tcp4_item tcp_items[RTE_GRO_MAX_BURST_ITEM_NUM] = {{0} };
>
> struct rte_mbuf *unprocess_pkts[nb_pkts];
> - uint16_t unprocess_num = 0;
> + uint32_t item_num;
> int32_t ret;
> - uint64_t current_time;
> + uint16_t i, unprocess_num = 0, nb_after_gro = nb_pkts;
>
> - if ((param->gro_types & RTE_GRO_TCP_IPV4) == 0)
> + if (unlikely((param->gro_types & RTE_GRO_TCP_IPV4) == 0))
> return nb_pkts;
>
> - /* get the actual number of packets */
> + /* Get the maximum number of packets */
> item_num = RTE_MIN(nb_pkts, (param->max_flow_num *
> - param->max_item_per_flow));
> + param->max_item_per_flow));
> item_num = RTE_MIN(item_num,
> RTE_GRO_MAX_BURST_ITEM_NUM);
>
> for (i = 0; i < item_num; i++)
> - tcp_keys[i].start_index = INVALID_ARRAY_INDEX;
> + tcp_flows[i].start_index = INVALID_ARRAY_INDEX;
>
> - tcp_tbl.keys = tcp_keys;
> + tcp_tbl.flows = tcp_flows;
> tcp_tbl.items = tcp_items;
> - tcp_tbl.key_num = 0;
> + tcp_tbl.flow_num = 0;
> tcp_tbl.item_num = 0;
> - tcp_tbl.max_key_num = item_num;
> + tcp_tbl.max_flow_num = item_num;
> tcp_tbl.max_item_num = item_num;
>
> - current_time = rte_rdtsc();
> -
> for (i = 0; i < nb_pkts; i++) {
> - if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
> - RTE_PTYPE_L4_TCP)) ==
> - (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
> - ret = gro_tcp4_reassemble(pkts[i],
> - &tcp_tbl,
> - current_time);
> + if (IS_IPV4_TCP_PKT(pkts[i]->packet_type)) {
> + /*
> + * The timestamp is ignored, since all packets
> + * will be flushed from the tables.
> + */
> + ret = gro_tcp4_reassemble(pkts[i], &tcp_tbl, 0);
> if (ret > 0)
> - /* merge successfully */
> + /* Merge successfully */
> nb_after_gro--;
> - else if (ret < 0) {
> - unprocess_pkts[unprocess_num++] =
> - pkts[i];
> - }
> + else if (ret < 0)
> + unprocess_pkts[unprocess_num++] = pkts[i];
> } else
> unprocess_pkts[unprocess_num++] = pkts[i];
> }
>
> - /* re-arrange GROed packets */
> if (nb_after_gro < nb_pkts) {
> - i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, current_time,
> - pkts, nb_pkts);
> + /* Flush all packets from the tables */
> + i = gro_tcp4_tbl_timeout_flush(&tcp_tbl, 0, pkts, nb_pkts);
> + /* Copy unprocessed packets */
> if (unprocess_num > 0) {
> memcpy(&pkts[i], unprocess_pkts,
> sizeof(struct rte_mbuf *) *
> @@ -174,31 +167,28 @@ rte_gro_reassemble(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> void *ctx)
> {
> - uint16_t i, unprocess_num = 0;
> struct rte_mbuf *unprocess_pkts[nb_pkts];
> struct gro_ctx *gro_ctx = ctx;
> + void *tcp_tbl;
> uint64_t current_time;
> + uint16_t i, unprocess_num = 0;
>
> - if ((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0)
> + if (unlikely((gro_ctx->gro_types & RTE_GRO_TCP_IPV4) == 0))
> return nb_pkts;
>
> + tcp_tbl = gro_ctx->tbls[RTE_GRO_TCP_IPV4_INDEX];
> current_time = rte_rdtsc();
>
> for (i = 0; i < nb_pkts; i++) {
> - if ((pkts[i]->packet_type & (RTE_PTYPE_L3_IPV4 |
> - RTE_PTYPE_L4_TCP)) ==
> - (RTE_PTYPE_L3_IPV4 | RTE_PTYPE_L4_TCP)) {
> - if (gro_tcp4_reassemble(pkts[i],
> - gro_ctx->tbls
> - [RTE_GRO_TCP_IPV4_INDEX],
> + if (IS_IPV4_TCP_PKT(pkts[i]->packet_type)) {
> + if (gro_tcp4_reassemble(pkts[i], tcp_tbl,
> current_time) < 0)
> unprocess_pkts[unprocess_num++] = pkts[i];
> } else
> unprocess_pkts[unprocess_num++] = pkts[i];
> }
> if (unprocess_num > 0) {
> - memcpy(pkts, unprocess_pkts,
> - sizeof(struct rte_mbuf *) *
> + memcpy(pkts, unprocess_pkts, sizeof(struct rte_mbuf *) *
> unprocess_num);
> }
>
> @@ -224,6 +214,7 @@ rte_gro_timeout_flush(void *ctx,
> flush_timestamp,
> out, max_nb_out);
> }
> +
> return 0;
> }
>
> @@ -232,19 +223,20 @@ rte_gro_get_pkt_count(void *ctx)
> {
> struct gro_ctx *gro_ctx = ctx;
> gro_tbl_pkt_count_fn pkt_count_fn;
> + uint64_t gro_types = gro_ctx->gro_types, flag;
> uint64_t item_num = 0;
> - uint64_t gro_type_flag;
> uint8_t i;
>
> - for (i = 0; i < RTE_GRO_TYPE_MAX_NUM; i++) {
> - gro_type_flag = 1ULL << i;
> - if ((gro_ctx->gro_types & gro_type_flag) == 0)
> + for (i = 0; i < RTE_GRO_TYPE_MAX_NUM && gro_types; i++) {
> + flag = 1ULL << i;
> + if ((gro_types & flag) == 0)
> continue;
>
> + gro_types ^= flag;
> pkt_count_fn = tbl_pkt_count_fn[i];
> - if (pkt_count_fn == NULL)
> - continue;
> - item_num += pkt_count_fn(gro_ctx->tbls[i]);
> + if (pkt_count_fn)
> + item_num += pkt_count_fn(gro_ctx->tbls[i]);
> }
> +
> return item_num;
> }
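
The reworked loop in rte_gro_get_pkt_count() stops as soon as every enabled GRO type has been visited. A minimal sketch of that pattern, with hypothetical names (sketch_sum_enabled, SKETCH_TYPE_MAX_NUM) and a plain counts array standing in for the per-type tables:

```c
#include <stdint.h>

#define SKETCH_TYPE_MAX_NUM 64

/* Sum counts[i] for every bit i set in types, clearing bits as we go. */
static inline uint64_t
sketch_sum_enabled(uint64_t types, const uint32_t counts[SKETCH_TYPE_MAX_NUM])
{
	uint64_t total = 0, flag;
	uint8_t i;

	/* The loop ends early once no enabled type is left in 'types'. */
	for (i = 0; i < SKETCH_TYPE_MAX_NUM && types; i++) {
		flag = 1ULL << i;
		if ((types & flag) == 0)
			continue;

		types ^= flag;
		total += counts[i];
	}

	return total;
}
```

With only the low bits set, the loop exits after a handful of iterations instead of walking all slots.
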
> diff --git a/lib/librte_gro/rte_gro.h b/lib/librte_gro/rte_gro.h
> index 81a2eac..7979a59 100644
> --- a/lib/librte_gro/rte_gro.h
> +++ b/lib/librte_gro/rte_gro.h
> @@ -31,8 +31,8 @@ extern "C" {
> /**< TCP/IPv4 GRO flag */
>
> /**
> - * A structure which is used to create GRO context objects or tell
> - * rte_gro_reassemble_burst() what reassembly rules are demanded.
> + * Structure used to create GRO context objects or used to pass
> + * application-determined parameters to rte_gro_reassemble_burst().
> */
> struct rte_gro_param {
> uint64_t gro_types;
> @@ -78,26 +78,23 @@ void rte_gro_ctx_destroy(void *ctx);
>
> /**
> * This is one of the main reassembly APIs, which merges numbers of
> - * packets at a time. It assumes that all inputted packets are with
> - * correct checksums. That is, applications should guarantee all
> - * inputted packets are correct. Besides, it doesn't re-calculate
> - * checksums for merged packets. If inputted packets are IP fragmented,
> - * this function assumes them are complete (i.e. with L4 header). After
> - * finishing processing, it returns all GROed packets to applications
> - * immediately.
> + * packets at a time. It doesn't check if input packets have correct
> + * checksums and doesn't re-calculate checksums for merged packets.
> + * It assumes the packets are complete (i.e., MF==0 && frag_off==0),
> + * when IP fragmentation is possible (i.e., DF==0). The GROed packets
> + * are returned as soon as the function finishes.
> *
> * @param pkts
> - * a pointer array which points to the packets to reassemble. Besides,
> - * it keeps mbuf addresses for the GROed packets.
> + * Pointer array pointing to the packets to reassemble. Besides, it
> + * keeps MBUF addresses for the GROed packets.
> * @param nb_pkts
> - * the number of packets to reassemble.
> + * The number of packets to reassemble
> * @param param
> - * applications use it to tell rte_gro_reassemble_burst() what rules
> - * are demanded.
> + * Application-determined parameters for reassembling packets.
> *
> * @return
> - * the number of packets after been GROed. If no packets are merged,
> - * the returned value is nb_pkts.
> + * The number of packets after being GROed. If no packets are merged,
> + * the return value is equal to nb_pkts.
> */
> uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> @@ -107,32 +104,28 @@ uint16_t rte_gro_reassemble_burst(struct rte_mbuf **pkts,
> * @warning
> * @b EXPERIMENTAL: this API may change without prior notice
> *
> - * Reassembly function, which tries to merge inputted packets with
> - * the packets in the reassembly tables of a given GRO context. This
> - * function assumes all inputted packets are with correct checksums.
> - * And it won't update checksums if two packets are merged. Besides,
> - * if inputted packets are IP fragmented, this function assumes they
> - * are complete packets (i.e. with L4 header).
> + * Reassembly function, which tries to merge input packets with the
> + * existing packets in the reassembly tables of a given GRO context.
> + * It doesn't check if input packets have correct checksums and doesn't
> + * re-calculate checksums for merged packets. Additionally, it assumes
> + * the packets are complete (i.e., MF==0 && frag_off==0), when IP
> + * fragmentation is possible (i.e., DF==0).
> *
> - * If the inputted packets don't have data or are with unsupported GRO
> - * types etc., they won't be processed and are returned to applications.
> - * Otherwise, the inputted packets are either merged or inserted into
> - * the table. If applications want get packets in the table, they need
> - * to call flush API.
> + * If the input packets have invalid parameters (e.g. no data payload,
> + * unsupported GRO types), they are returned to applications. Otherwise,
> + * they are either merged or inserted into the table. To get the
> + * GROed packets, applications need to flush them from the tables
> + * with the flush API.
> *
> * @param pkts
> - * packet to reassemble. Besides, after this function finishes, it
> - * keeps the unprocessed packets (e.g. without data or unsupported
> - * GRO types).
> + * Packets to reassemble. It's also used to store the unprocessed packets.
> * @param nb_pkts
> - * the number of packets to reassemble.
> + * The number of packets to reassemble.
> * @param ctx
> - * a pointer points to a GRO context object.
> + * GRO context object pointer
> *
> * @return
> - * return the number of unprocessed packets (e.g. without data or
> - * unsupported GRO types). If all packets are processed (merged or
> - * inserted into the table), return 0.
> + * The number of unprocessed packets.
> */
> uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
> uint16_t nb_pkts,
> @@ -142,29 +135,28 @@ uint16_t rte_gro_reassemble(struct rte_mbuf **pkts,
> * @warning
> * @b EXPERIMENTAL: this API may change without prior notice
> *
> - * This function flushes the timeout packets from reassembly tables of
> - * desired GRO types. The max number of flushed timeout packets is the
> - * element number of the array which is used to keep the flushed packets.
> + * This function flushes the timed-out packets from the reassembly tables
> + * of desired GRO types. The max number of flushed packets is the
> + * number of elements in 'out'.
> *
> - * Besides, this function won't re-calculate checksums for merged
> - * packets in the tables. That is, the returned packets may be with
> - * wrong checksums.
> + * Additionally, the flushed packets may have incorrect checksums, since
> + * this function doesn't re-calculate checksums for merged packets.
> *
> * @param ctx
> - * a pointer points to a GRO context object.
> + * GRO context object pointer.
> * @param timeout_cycles
> - * max TTL for packets in reassembly tables, measured in nanosecond.
> + * The max TTL for packets in reassembly tables, measured in nanoseconds.
> * @param gro_types
> - * this function only flushes packets which belong to the GRO types
> - * specified by gro_types.
> + * This function flushes packets whose GRO types are specified by
> + * gro_types.
> * @param out
> - * a pointer array that is used to keep flushed timeout packets.
> + * Pointer array used to keep flushed packets.
> * @param max_nb_out
> - * the element number of out. It's also the max number of timeout
> + * The number of elements in 'out'. It's also the max number of timed-out
> + * packets that can be flushed.
> *
> * @return
> - * the number of flushed packets. If no packets are flushed, return 0.
> + * The number of flushed packets.
> */
> uint16_t rte_gro_timeout_flush(void *ctx,
> uint64_t timeout_cycles,
> @@ -180,10 +172,10 @@ uint16_t rte_gro_timeout_flush(void *ctx,
> * of a given GRO context.
> *
> * @param ctx
> - * pointer points to a GRO context object.
> + * GRO context object pointer.
> *
> * @return
> - * the number of packets in all reassembly tables.
> + * The number of packets in the tables.
> */
> uint64_t rte_gro_get_pkt_count(void *ctx);
>
> --
> 2.7.4
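
For readers following the MF/frag_off/DF wording in the comments above, here is a minimal sketch of how those bits can be tested on the 16-bit IPv4 flags/fragment-offset field (host byte order, layout per RFC 791). The macro and helper names are hypothetical and are not taken from librte_gro. The sketch only illustrates the condition and does not show how the library implements it.

```c
#include <stdint.h>

/*
 * Masks for the 16-bit IPv4 "flags + fragment offset" field in host
 * byte order (RFC 791). These names are illustrative assumptions,
 * not definitions from DPDK.
 */
#define IPV4_DF_FLAG     0x4000u  /* Don't Fragment */
#define IPV4_MF_FLAG     0x2000u  /* More Fragments */
#define IPV4_OFFSET_MASK 0x1fffu  /* fragment offset, in 8-byte units */

/*
 * Hypothetical helper: returns 1 when the packet is complete, i.e.
 * MF == 0 and fragment offset == 0, and 0 when it is a fragment.
 */
static inline int ipv4_is_complete(uint16_t frag_field)
{
    return (frag_field & (IPV4_MF_FLAG | IPV4_OFFSET_MASK)) == 0;
}

/*
 * Hypothetical helper: returns 1 when DF is set, meaning the packet
 * must not be fragmented along the path.
 */
static inline int ipv4_df_is_set(uint16_t frag_field)
{
    return (frag_field & IPV4_DF_FLAG) != 0;
}
```

For example, a field value of 0x4000 (DF set, MF clear, offset 0) is reported as complete, while 0x2000 (MF set) or any non-zero offset is reported as a fragment.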