[dpdk-dev] [PATCH v12 0/6] add Tx preparation

Kulasek, TomaszX tomaszx.kulasek at intel.com
Wed Nov 30 11:30:54 CET 2016


Hi,

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Wednesday, November 30, 2016 09:50
> To: Adrien Mazarguil <adrien.mazarguil at 6wind.com>; Kulasek, TomaszX
> <tomaszx.kulasek at intel.com>
> Cc: dev at dpdk.org; Ananyev, Konstantin <konstantin.ananyev at intel.com>;
> olivier.matz at 6wind.com
> Subject: Re: [dpdk-dev] [PATCH v12 0/6] add Tx preparation
> 
> 2016-11-30 08:40, Adrien Mazarguil:
> [...]
> > I understand tx_prep() automates this process, however I'm wondering
> > why isn't the TX burst function doing that itself. Using
> > nb_mtu_seg_max as an example, tx_prep() has an extra check in case of
> > TSO that the TX burst function does not perform. This ends up being
> > much more expensive to applications due to the additional loop doing
> > redundant testing on each mbuf.
> >
> > If, say as a performance improvement, we decided to leave the
> > validation part to the TX burst function; what remains in tx_prep() is
> > basically heavy "preparation" requiring mbuf changes (i.e. erasing
> > checksums, for now).
> >
> > Following the same logic, why can't such a thing be made part of the
> > TX burst function as well (through a direct call to
> > rte_phdr_cksum_fix() whenever necessary). From an application
> > standpoint, what are the advantages of having to:
> >
> >  if (tx_prep()) // iterate and update mbufs as needed
> >      tx_burst(); // iterate and send
> >
> > Compared to:
> >
> >  tx_burst(); // iterate, update as needed and send
> >
> > Note that PMDs could still provide different TX callbacks depending on
> > the set of enabled offloads so performance is not unnecessarily
> > impacted.
> >
> > In my opinion the second approach is both faster to applications and
> > more friendly from a usability perspective, am I missing something
> > obvious?
> 
> I think it was not clearly explained in this patchset, but this is my
> understanding:
> tx_prepare and tx_burst can be called at different stages of a pipeline,
> on different cores.

Yes, this API is intended to be used optionally, and not necessarily right before tx_burst.

1. Separating the two stages:
   a) We keep control over the burst (packet content, validation) when needed.
   b) Invalid packets can be restored, or handled in some other way, if needed (even at an early stage of processing).
   c) Tx burst stays as simple as it should be.

2. Merging the functionality of tx_prepare into tx_burst has some disadvantages:
   a) When a packet is invalid, the application cannot restore it, so it has to be dropped.
   b) Tx burst needs to modify the content of the packet.
   c) There is no way to eliminate the overhead of preparation (tx_prepare) for applications where performance is key.
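
To illustrate points 1b and 2a, here is a minimal sketch of an application that keeps ownership of a packet that fails validation. It does not use the real DPDK API: struct pkt, MAX_SEGS, mock_tx_prepare(), mock_tx_burst() and send_skipping_invalid() are all hypothetical stand-ins, and the only assumption carried over from the discussion is that the prepare step returns the number of leading packets that are ready to send.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; NOT the DPDK struct rte_mbuf. */
struct pkt {
    uint16_t nb_segs;   /* number of segments in the packet */
    int      sent;      /* set to 1 by the mock burst function */
};

#define MAX_SEGS 4      /* assumed device limit, for illustration only */

/* Mock prepare step: returns how many leading packets passed validation.
 * If the result is smaller than nb, pkts[result] is the first invalid one. */
static uint16_t mock_tx_prepare(struct pkt **pkts, uint16_t nb)
{
    uint16_t i;

    for (i = 0; i < nb; i++)
        if (pkts[i]->nb_segs == 0 || pkts[i]->nb_segs > MAX_SEGS)
            break;
    return i;
}

/* Mock burst step: stays trivial, it only "sends" what it is given. */
static uint16_t mock_tx_burst(struct pkt **pkts, uint16_t nb)
{
    uint16_t i;

    for (i = 0; i < nb; i++)
        pkts[i]->sent = 1;
    return nb;
}

/* Application loop: send every valid run of packets and skip each invalid
 * packet. Because the stages are separate, the application still owns the
 * invalid packet here and could repair it instead of skipping it. */
static uint16_t send_skipping_invalid(struct pkt **pkts, uint16_t nb,
                                      uint16_t *dropped)
{
    uint16_t idx = 0;   /* next packet to look at */
    uint16_t tx = 0;    /* packets actually handed to the burst step */

    while (idx < nb) {
        uint16_t ok = mock_tx_prepare(pkts + idx, (uint16_t)(nb - idx));

        tx += mock_tx_burst(pkts + idx, ok);
        idx += ok;
        if (idx < nb) {     /* pkts[idx] failed validation */
            (*dropped)++;
            idx++;
        }
    }
    return tx;
}
```

With a burst of three packets whose segment counts are 1, 9 and 2, this sketch sends the first and the third and reports one skipped packet; the second one is left untouched for the application to deal with.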

3. Using Tx callbacks:
   a) We still need different implementations for different devices.
   b) The performance overhead will be no better than with the tx_prepare/tx_burst pair, since both approaches use a very similar mechanism.

In addition, the tx_prepare mechanism can be turned off with a compilation flag (as discussed with Jerin in http://dpdk.org/dev/patchwork/patch/15770/) to provide real NOOP functionality (e.g. for low-end CPUs, where even an unnecessary memory dereference and check can have a significant impact on performance).
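
As a rough sketch of what such a compile-time switch could look like (the flag name TX_PREPARE_NOOP, struct txq, prep_fn, reject_all() and this tx_prepare() wrapper are hypothetical names chosen for illustration, not the ones from the patch):

```c
#include <stddef.h>
#include <stdint.h>

struct pkt;  /* opaque, hypothetical packet type */

/* Hypothetical per-device preparation callback. */
typedef uint16_t (*prep_fn)(struct pkt **pkts, uint16_t nb);

struct txq {
    prep_fn prepare;  /* NULL when the driver needs no preparation */
};

/* Example callback that reports no packet as ready. */
static uint16_t reject_all(struct pkt **pkts, uint16_t nb)
{
    (void)pkts;
    (void)nb;
    return 0;
}

#ifdef TX_PREPARE_NOOP  /* hypothetical flag name */
/* Compiled out: no pointer load and no branch, every packet is reported
 * as ready. */
static inline uint16_t tx_prepare(struct txq *q, struct pkt **pkts,
                                  uint16_t nb)
{
    (void)q;
    (void)pkts;
    return nb;
}
#else
/* Default build: one dereference and one check, then the device callback. */
static inline uint16_t tx_prepare(struct txq *q, struct pkt **pkts,
                                  uint16_t nb)
{
    if (q->prepare == NULL)
        return nb;
    return q->prepare(pkts, nb);
}
#endif
```

In the default build, a queue without a callback still pays for the load of q->prepare and the comparison; building with the flag defined removes both, which is the case that matters on low-end CPUs.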

Tomasz

