RE: Re: [RFC PATCH v1 0/4] Direct re-arming of buffers on receive side

Morten Brørup mb at smartsharesystems.com
Tue Jan 18 18:27:49 CET 2022


> From: Thomas Monjalon [mailto:thomas at monjalon.net]
> Sent: Tuesday, 18 January 2022 17.54
> 
> [quick summary: ethdev API to bypass mempool]
> 
> 18/01/2022 16:51, Ferruh Yigit:
> > On 12/28/2021 6:55 AM, Feifei Wang wrote:
> > > Morten Brørup <mb at smartsharesystems.com>:
> > >> The patch provides a significant performance improvement, but I am
> > >> wondering if any real world applications exist that would use this.
> > >> Only a "router on a stick" (i.e. a single-port router) comes to my
> > >> mind, and that is probably sufficient to call it useful in the real
> > >> world. Do you have any other examples to support the usefulness of
> > >> this patch?
> > >>
> > > One case I have is about network security. For a network firewall,
> > > all packets need to ingress on a specified port and egress on a
> > > specified port to do packet filtering. In this case, we know the
> > > flow direction in advance.
> >
> > I also have some concerns about how useful this API will be in real
> > life, and whether the use case is worth the complexity it brings.
> > And it looks like too much low-level detail for the application.
> 
> That's difficult to judge.
> The use case is limited and the API has some severe limitations.
> The benefit is measured with l3fwd, which is not exactly a real app.
> Do we want an API which improves performance in limited scenarios
> at the cost of breaking some general design assumptions?
> 
> Can we achieve the same level of performance with a mempool trick?

Perhaps the mbuf library could offer bulk functions for alloc/free of raw mbufs - essentially a shortcut directly to the mempool library.
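Something like this, just to illustrate the shortcut (the bulk function names below are hypothetical, not existing mbuf API; they assume the caller upholds the usual raw-mbuf contract, i.e. refcnt == 1 and no indirect/external attachments, like the existing single-mbuf raw alloc/free do):

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Hypothetical: allocate n raw mbufs straight from the mempool,
 * without any header/field initialization. */
static inline int
rte_mbuf_raw_alloc_bulk(struct rte_mempool *mp,
		struct rte_mbuf **mbufs, unsigned int n)
{
	return rte_mempool_get_bulk(mp, (void **)mbufs, n);
}

/* Hypothetical: return n raw mbufs straight to the mempool,
 * without any refcnt/segment handling. */
static inline void
rte_mbuf_raw_free_bulk(struct rte_mempool *mp,
		struct rte_mbuf * const *mbufs, unsigned int n)
{
	rte_mempool_put_bulk(mp, (void * const *)mbufs, n);
}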

There might be a few more details to micro-optimize in the mempool library, if approached with this use case in mind. E.g. rte_mempool_default_cache() could do with a few unlikely() hints in its comparisons.
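E.g. something along these lines - a sketch based on how rte_mempool_default_cache() looks in rte_mempool.h, renamed here so it doesn't read as an actual patch:

#include <rte_branch_prediction.h>
#include <rte_mempool.h>

/* Sketch: same body as rte_mempool_default_cache(), with unlikely()
 * added to the two early-exit comparisons, which are cold on the
 * fast path of a cache-enabled mempool. */
static __rte_always_inline struct rte_mempool_cache *
mempool_default_cache_hinted(struct rte_mempool *mp, unsigned int lcore_id)
{
	if (unlikely(mp->cache_size == 0))
		return NULL;

	if (unlikely(lcore_id >= RTE_MAX_LCORE))
		return NULL;

	return &mp->local_cache[lcore_id];
}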

Also, for this use case, the mempool library adds tracing overhead, which this API bypasses. And considering how short the code path through the mempool cache is, the tracing overhead is relatively large. I.e.: memcpy(NIC->NIC) vs. trace() + memcpy(NIC->cache) + trace() + memcpy(cache->NIC).

A key optimization point could be the number of mbufs being moved to/from the mempool cache. If that number were fixed at compile time, a faster memcpy() could be used. However, it seems that different PMDs use bursts of either 4, 8, or in this case 32 mbufs. If only they could agree on such a simple detail.
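To illustrate the point (the burst constant and helper are hypothetical):

#include <string.h>
#include <rte_mbuf.h>

/* If all PMDs agreed on one burst size, the copy between the PMD ring
 * and the mempool cache could use a compile-time constant size, letting
 * the compiler inline/unroll it into a short sequence of (vector) loads
 * and stores instead of a generic memcpy() call. */
#define MBUF_CACHE_BURST 32

static inline void
copy_mbuf_ptrs(struct rte_mbuf **dst, struct rte_mbuf * const *src)
{
	memcpy(dst, src, MBUF_CACHE_BURST * sizeof(*src));
}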

Overall, I strongly agree that it is preferable to optimize the core libraries, rather than bypass them. Bypassing will eventually lead to "spaghetti code".


