[dpdk-dev] Possible bug in mlx5_tx_burst_mpw?

Adrien Mazarguil adrien.mazarguil at 6wind.com
Fri Sep 16 09:14:03 CEST 2016


On Wed, Sep 14, 2016 at 09:33:18PM +0200, Luke Gorrie wrote:
> Hi Adrien,
> 
> On 14 September 2016 at 16:30, Adrien Mazarguil <adrien.mazarguil at 6wind.com>
> wrote:
> 
> > Your interpretation is correct (this is intentional and not a bug).
> >
> 
> Thanks very much for clarifying.
> 
> This is interesting to me because I am also working on a ConnectX-4 (Lx)
> driver based on the newly released driver interface specification [1] and I
> am wondering how interested I should be in this MPW feature that is
> currently not documented.

It seems this document only describes established features whose interface
won't be subject to firmware evolutions; I think MPW is not one of them.
AFAIK, MPW currently cannot be used with LSO, which we intend to support soon.

Our implementation is a stripped-down version of the code found in
libmlx5. I guess you could ask Mellanox directly if you need more
information.

> In the event successive packets share a few properties (length, number of
> > segments, offload flags), these can be factored out as an optimization to
> > lower the amount of traffic on the PCI bus. This feature is currently
> > supported by the ConnectX-4 Lx family of adapters.
> >
> 
> I have a concern here that I hope you will forgive me for voicing.
> 
> This optimization seems to run the risk of inflating scores on
> constant-packet-size IXIA-style benchmarks like [2] and making them less
> useful for predicting real-world performance. That seems like a negative to
> me as an application developer. I wonder if I am overlooking some practical
> benefits that motivate implementing this in silicon and in the driver and
> enabling it by default?

Your concern is understandable, no offense taken. You are obviously right
about benchmarks with constant-size packets, whose results can be improved
by MPW.

Performance-wise, with the right traffic patterns MPW allows ConnectX-4 Lx
adapters to outperform their non-Lx counterparts (e.g. comparing 40G EN Lx
PCIe x8 vs. 40G EN PCIe x8) when measuring traffic rate (Mpps), not
throughput. Disabling MPW yields results comparable to the non-Lx adapters,
which is why it is considered an optimization.

Since processing MPW consumes a few additional CPU cycles, it can be
disabled at runtime with the txq_mpw_en switch (documented in mlx5.rst).
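
For illustration only, here is a minimal sketch of how an application could
pass that parameter through the EAL device arguments (the PCI address, core
mask and channel count below are placeholders; on a testpmd command line the
equivalent would be appending ",txq_mpw_en=0" to the address given to -w):

    #include <rte_eal.h>
    #include <rte_common.h>

    int
    main(int argc, char **argv)
    {
            /* Disable MPW on one ConnectX-4 Lx port through its device
             * arguments; "0000:05:00.0" is an example PCI address. */
            char *eal_argv[] = {
                    argv[0],
                    "-c", "0xff",
                    "-n", "4",
                    "-w", "0000:05:00.0,txq_mpw_en=0",
            };

            (void)argc;
            if (rte_eal_init(RTE_DIM(eal_argv), eal_argv) < 0)
                    return -1;
            /* ... usual port and queue setup follows ... */
            return 0;
    }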

Now, about the real-world scenario: we are not talking about needing millions
of identical packets to notice an improvement. MPW is effective from 2 to at
most 5 consecutive packets that share some metadata (length, number of
segments and offload flags), all within the same burst. Just to be clear,
neither their destination nor their payload needs to be the same; it would
have been useless otherwise.
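
To make the idea concrete, here is a simplified sketch of such grouping (not
the actual mlx5 code; the mpw_* helpers and the constant below are made up
for the example): consecutive packets whose metadata matches are emitted as
one session so the shared fields cross the PCI bus only once.

    #include <stdint.h>
    #include <rte_mbuf.h>

    #define MPW_MAX_PKTS 5 /* MPW covers at most 5 consecutive packets */

    /* Hypothetical placeholders standing in for descriptor building. */
    static void mpw_start(struct rte_mbuf *m) { (void)m; /* write shared metadata once */ }
    static void mpw_add(struct rte_mbuf *m) { (void)m; /* append buffer address only */ }
    static void mpw_close(void) { /* ring the doorbell for the whole group */ }

    static inline int
    mpw_match(const struct rte_mbuf *a, const struct rte_mbuf *b)
    {
            return a->data_len == b->data_len &&
                   a->nb_segs == b->nb_segs &&
                   a->ol_flags == b->ol_flags;
    }

    static void
    tx_burst_mpw_sketch(struct rte_mbuf **pkts, uint16_t nb_pkts)
    {
            uint16_t i = 0;

            while (i < nb_pkts) {
                    uint16_t n = 1;

                    mpw_start(pkts[i]);
                    while (i + n < nb_pkts && n < MPW_MAX_PKTS &&
                           mpw_match(pkts[i], pkts[i + n])) {
                            mpw_add(pkts[i + n]);
                            n++;
                    }
                    mpw_close();
                    i += n;
            }
    }

In the driver this grouping naturally happens while building the send
descriptors, but the effect is the same: length, segment count and offload
flags are written once per group instead of once per packet.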

Sending a few packets at once with such similar properties is a common
occurrence in the real world; think about forwarding TCP traffic that has
been shaped to a constant size by LSO or MTU.

Like many optimizations, this one targets a specific yet common use case.
If you would rather get a constant rate out of any traffic pattern for
predictable latency, then DPDK, which is burst-oriented, is probably not
what your application needs if used as-is.

> [1]
> http://www.mellanox.com/related-docs/user_manuals/Ethernet_Adapters_Programming_Manual.pdf
> [2]
> https://www.mellanox.com/blog/2016/06/performance-beyond-numbers-stephen-curry-style-server-io/

-- 
Adrien Mazarguil
6WIND

