[dpdk-dev] [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4 offload

Xueming(Steven) Li xuemingl at mellanox.com
Wed Apr 4 10:20:49 CEST 2018


Hi Olivier, 

> -----Original Message-----
> From: Olivier Matz <olivier.matz at 6wind.com>
> Sent: Wednesday, March 28, 2018 8:53 PM
> To: Xueming(Steven) Li <xuemingl at mellanox.com>
> Cc: Yongseok Koh <yskoh at mellanox.com>; Wenzhuo Lu <wenzhuo.lu at intel.com>;
> Jingjing Wu <jingjing.wu at intel.com>; Thomas Monjalon <thomas at monjalon.net>;
> Shahaf Shuler <shahafs at mellanox.com>; Ferruh Yigit
> <ferruh.yigit at intel.com>; dev at dpdk.org
> Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel L3/L4
> offload
> 
> Hi Xueming, Yongseok,
> 
> On Thu, Mar 22, 2018 at 01:55:05PM +0000, Xueming(Steven) Li wrote:
> >
> >
> > > -----Original Message-----
> > > From: Yongseok Koh
> > > Sent: Wednesday, March 21, 2018 9:41 AM
> > > To: Xueming(Steven) Li <xuemingl at mellanox.com>
> > > Cc: Wenzhuo Lu <wenzhuo.lu at intel.com>; Jingjing Wu
> > > <jingjing.wu at intel.com>; Thomas Monjalon <thomas at monjalon.net>;
> > > Olivier MATZ <olivier.matz at 6wind.com>; Shahaf Shuler
> > > <shahafs at mellanox.com>; Ferruh Yigit <ferruh.yigit at intel.com>;
> > > dev at dpdk.org
> > > Subject: Re: [PATCH v3 1/7] ethdev: introduce Tx generic tunnel
> > > L3/L4 offload
> > >
> > > On Mon, Mar 05, 2018 at 10:51:15PM +0800, Xueming Li wrote:
> > > > This patch introduce new TX offload flags for device that supports
> > > > tunnel agnostic L3/L4 checksum and TSO offload.
> > > >
> > > > The support from the device is for inner and outer checksums on
> > > > IPV4/TCP/UDP and TSO for *any packet with the following format*:
> > > >
> > > > < some headers > / [optional IPv4/IPv6] / [optional TCP/UDP] /
> > > > <some
> > > > headers> / [optional inner IPv4/IPv6] / [optional TCP/UDP]
> > > >
> > > > For example the following packets can use this feature:
> > > >
> > > > 1. eth / ipv4 / udp / VXLAN / ip / tcp 2. eth / ipv4 / GRE / MPLS
> > > > /
> > > > ipv4 / udp
> > > >
> > > > Signed-off-by: Xueming Li <xuemingl at mellanox.com>
> > > > ---
> > > >  lib/librte_ether/rte_ethdev.h | 24 ++++++++++++++++++++++++
> > > >  lib/librte_mbuf/rte_mbuf.c    |  5 +++++
> > > >  lib/librte_mbuf/rte_mbuf.h    | 18 ++++++++++++++++--
> > > >  3 files changed, 45 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/lib/librte_ether/rte_ethdev.h
> > > > b/lib/librte_ether/rte_ethdev.h index 036153306..66d12d3e0 100644
> > > > --- a/lib/librte_ether/rte_ethdev.h
> > > > +++ b/lib/librte_ether/rte_ethdev.h
> > > > @@ -980,6 +980,30 @@ struct rte_eth_conf {
> > > >   *   the same mempool and has refcnt = 1.
> > > >   */
> > > >  #define DEV_TX_OFFLOAD_SECURITY         0x00020000
> > > > +/**< Generic tunnel L3/L4 checksum offload. To enable this
> > > > +offload feature
> > > > + * for a packet to be transmitted on hardware supporting generic
> > > > +tunnel L3/L4
> > > > + * checksum offload:
> > > > + *  - fill outer_l2_len and outer_l3_len in mbuf
> > > > + *  - fill l2_len and l3_len in mbuf
> > > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN
> > > > +if
> > > > +undefined)
> > > > + *  - set the flags PKT_TX_OUTER_IP_CKSUM
> > > > + *  - set the flags PKT_TX_IP_CKSUM
> > > > + *  - set the flags PKT_TX_TCP_CKSUM, PKT_TX_SCTP_CKSUM or
> > > > +PKT_TX_UDP_CKSUM  */
> > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_CKSUM	0x00040000
> > >
> > > It looks redundant to me. Isn't it same as having
> DEV_TX_OFFLOAD_*_CKSUM?
> > > According to the API document, when PKT_TX_*_CKSUM is set, all the
> > > necessary length fields should be filled in (e.g. mbuf->outer_l?_len
> > > and
> > > mbuf->l?_len). It doesn't need to specify any tunnel type for checksum.
> > > For example, in order to request checksum offload for an unknown
> > > tunnel type which is similar to VxLAN, what should app do? Even
> > > without defining this new offload flag, it is currently possible by
> setting
> > > 	(PKT_TX_OUTER_IP_CKSUM | PKT_TX_OUTER_IPV4 | PKT_TX_IP_CKSUM |
> > > 	 PKT_TX_IPV4 | PKT_TX_UDP_CKSUM)
> > > along with various length fields.
> >
> > Agree it almost same as existing checksum offloading requirement,
> besides:
> > 1. it does not demand pseudo checksum
> > 2. no need to clear IP checksum
> > I guess the requirement on API doc is a super set and differences
> > could be documented on PMD doc.
> 
> Since commit 4fb7e803eb1a ("ethdev: add Tx preparation"), setting the IP
> checksum to 0 and the pseudo header checksum in TCP is not required.
> The user just need to set the flags, then before transmitting the packet,
> call tx_prepare() which will do the proper work, according to the hw
> requirements.
> 
> It looks that the API documentation of PKT_TX_TCP_SEG should be updated.
> I'll submit a patch for this.
> 

Nice to hide intel specific requirement and now it's much clear, thanks.

> 
> > For mlx5 original chksum offloading, hw will parse everything w/o
> > demanding mbuf headers, a little better performance than SWP. The
> > question is how to choose between SWP and HW checksum offloading -
> > assuming PKT_TX_TUNNEL_UNKNOWN is the key. Take L3 VXLAN(no inner L2
> > header) as example, only SWP could offload it correctly and
> > PKT_TX_TUNNEL_UNKOWN should be used here, not PKT_TX_TUNNEL_VXLAN, make
> sense?
> >
> > Another question, how does user app know the NIC capability of generic
> > tunnel checksum offload?
> 
> After reading the patchset, I still don't see what you want to achieve.
> In patch 1, we talk about these 2 packets:
> 
> 1. eth / ipv4 / udp / VXLAN / ip / tcp     (L3 VXLAN as you said above)
> 2. eth / ipv4 / GRE / MPLS / ipv4 / udp
> 
> What is not doable with current API?
> 
> I didn't try, but I think the following would work currently with
> tx_prepare() + tx_burst().
> 
> - inner TSO on 1.
> 
>   flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
>           PKT_TX_TCP_CKSUM | PKT_TX_TCP_SEG
>   outer_l2_len = size(eth)
>   outer_l3_len = size(ipv4)
>   l2_len = size(udp + vxlan)
>   l3_len = size(ip)
> 
> - inner TCP checksum on 1.
> 
>   flags = PKT_TX_TUNNEL_VXLAN | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
>           PKT_TX_TCP_CKSUM
>   outer_l2_len = size(eth)
>   outer_l3_len = size(ipv4)
>   l2_len = size(udp + vxlan)
>   l3_len = size(ip)
> 
> - inner UDP checksum on 2.
> 
>   flags = PKT_TX_TUNNEL_GRE | PKT_TX_IPV4 | PKT_TX_IP_CKSUM |
>           PKT_TX_UDP_CKSUM
>   outer_l2_len = size(eth)
>   outer_l3_len = size(ipv4)
>   l2_len = size(gre + mpls)
>   l3_len = size(ipv4)
> 
> Yes, writing the tunnel len in l2_len is a bit strange, but it should do
> the trick.
> 
> It's still unclear for me why you need to add a "generic" flag.
> Is there something you can't do today with the current API?
> 

I think the problem is mlx5 current hw so smart that not generic enough, 
that's why this patchset tried to introduce new hw feature and new generic flag.
Since current checksum API already covered, no reason to add new, thanks.

> 
> >
> > >
> > > > +/**< Generic tunnel segmentation offload. To enable it, the user
> > > > +needs
> > > to:
> > > > + *  - fill outer_l2_len and outer_l3_len in mbuf
> > > > + *  - fill l2_len and l3_len in mbuf
> > > > + *  - set the flags PKT_TX_TUNNEL_xxx (use PKT_TX_TUNNEL_UNKNOWN
> > > > +if
> > > > +undefined)
> > > > + *  - set the flags PKT_TX_OUTER_IPV4 or PKT_TX_OUTER_IPV6
> > > > + *  - if it's UDP tunnel, set the flags PKT_TX_OUTER_UDP
> > > > + *  - set the flags PKT_TX_IPV4 or PKT_TX_IPV6
> > > > + *  - set the PKT_TX_TCP_SEG flag in mbuf->ol_flags (this flag
> implies
> > > > + *    PKT_TX_OUTER_IP_CKSUM, PKT_TX_IP_CKSUM and PKT_TX_TCP_CKSUM)
> > > > + * Hardware that supports generic tunnel TSO offload only update
> > > > +outer/inner
> > > > + * L3/L4 fields, tunnel fields are not touched.
> > > > + */
> > > > +#define DEV_TX_OFFLOAD_GENERIC_TNL_TSO		0x00080000
> > >
> > > Practically, for the existing TSO offloads for tunneled packets,
> > > tunnel type
> > > (PKT_TX_TUNNEL_*) is needed for knowing if transport is UDP-based or
> > > IP- based because the length field of UDP header should be updated for
> TSO.
> > > AFAIK, there's no TCP-based tunnel. For PKT_TX_TUNNEL_UNKNOWN, the
> > > only thing _unknown_ seems to be the type of transport and by
> > > defining PKT_TX_OUTER_UDP, you seem to recognize the type of
> > > transport between UDP and IP (it's not non-UDP). Given that, the new
> > > flags still look redundant and ambiguous. If PKT_TX_TUNNEL_VXLAN is
> > > set, there's no need to set PKT_TX_OUTER_UDP redundantly.
> > >
> > > Instead, I think defining PKT_TX_TUNNEL_UDP_UNKNOWN and
> > > PKT_TX_TUNNEL_IP_UNKNOWN could be a good alternative. It is only my
> > > humble two cents and I'd like to hear from more people regarding this.
> > >
> >
> 
> I agree with Yongseok here. The flag PKT_TX_TUNNEL_VXLAN implies that the
> protocol is UDP. Having a flag PKT_TX_TUNNEL_UDP or PKT_TX_TUNNEL_IP could
> make sense if we find a use case that cannot be done currently with the
> existing API (for instance, another tunnel type over UDP).

PKT_TX_TUNNEL_UDP + PKT_TX_TUNNEL_IP looks similar to PKT_TX_TUNNEL_UNKOWN + 
PKT_TX_OUTER_UDP, I'm fine with this proposal.

> 
> > This flag looks more like a feature notification. Current tunnel
> > offloading capability like "DEV_TX_OFFLOAD_VXLAN_TNL_TSO" ties to
> > VXLAN tunnel type, DEV_TX_OFFLOAD_GENERIC_TNL_TSO should be able to
> > let users know that NIC's capability of offloading generic tunnel type
> > like MPLS-in-UDP that not defined yet. PKT_TX_OUTER_UDP is something
> > must to have to flag an exists of outer UDP in such case.
> 
> So, maybe, instead of DEV_TX_OFFLOAD_GENERIC_TNL_TSO it could be
> DEV_TX_OFFLOAD_UDP_TUNNEL_TSO. Because it is not possible for a hardware
> to support TSO on top of any kind of tunnel type.
> 
> For instance, if the tunnel header has a checksum or a sequence id, the
> hardware must be aware of it.

Similarly, I'll split DEV_TX_OFFLOAD_GENERIC_TNL_TSO into DEV_TX_OFFLOAD_UDP_TNL_TSO
and DEV_TX_OFFLOAD_IP_TNL_TSO to cover tunnel types that no checksum and seq id.

Overall speaking, existing app using tunnel offloading should work w/o any change.
I'm working on a new version to reflect this change, and split this patch set into 
public and mlx5.


More information about the dev mailing list