[dpdk-dev] [RFC 0/8] mbuf: structure reorganization

Bruce Richardson bruce.richardson at intel.com
Tue Feb 21 15:28:46 CET 2017


On Tue, Feb 21, 2017 at 03:20:23PM +0100, Morten Brørup wrote:
> Comments at the end.
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier Matz
> > Sent: Thursday, February 16, 2017 5:14 PM
> > To: Bruce Richardson
> > Cc: Ananyev, Konstantin; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [RFC 0/8] mbuf: structure reorganization
> > 
> > On Thu, 16 Feb 2017 15:46:19 +0000, Bruce Richardson
> > <bruce.richardson at intel.com> wrote:
> > > On Thu, Feb 16, 2017 at 02:48:07PM +0100, Olivier Matz wrote:
> > > > Hi Konstantin,
> > > >
> > > > Thanks for the feedback.
> > > > Comments inline.
> > > >
> > > >
> > > > On Mon, 6 Feb 2017 18:41:27 +0000, "Ananyev, Konstantin"
> > > > <konstantin.ananyev at intel.com> wrote:
> > > > > Hi Olivier,
> > > > > Looks good in general, some comments from me below.
> > > > > Thanks
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > The main changes are:
> > > > > > - reorder the structure to increase vector performance on some
> > > > > > non-IA platforms.
> > > > > > - add a 64-bit timestamp field in the 1st cache line
> > > > >
> > > > > Wonder why it deserves to be in the first cache line?
> > > > > How does it differ from seqn below (pure SW stuff right now)?
> > > >
> > > > In case the timestamp is set from a NIC value, it is set in the Rx
> > > > path. That's why I think it deserves to be located in the 1st
> > > > cache line.
> > > >
> > > > As you said, the seqn is pure SW stuff right now: it is set in a
> > > > lib, not in a PMD Rx path.
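
(As an illustration of the point above: a minimal sketch of a PMD Rx path
filling the proposed timestamp field. The "my_rx_desc" type and its
rx_timestamp field are made up for the example; real NICs expose the value
differently.)

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Sketch only: the descriptor layout is hypothetical.  The point is
     * that the store happens per packet in the Rx burst loop, which is
     * why the field wants to live in the mbuf's 1st cache line. */
    struct my_rx_desc {
            uint64_t rx_timestamp;
            /* ... other descriptor fields ... */
    };

    static inline void
    rx_fill_timestamp(struct rte_mbuf *m, const struct my_rx_desc *desc)
    {
            m->timestamp = desc->rx_timestamp; /* field added by this RFC */
    }
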
> > > >
> > > > > > - m->next, m->nb_segs, and m->refcnt are always initialized for
> > > > > > mbufs in the pool, avoiding the need to set m->next (located
> > > > > > in the 2nd cache line) in the Rx path for mono-segment packets.
> > > > > > - change port and nb_segs to 16 bits
> > > > >
> > > > > Not that I am completely against it, but changing nb_segs to 16
> > > > > bits seems like overkill to me.
> > > > > I think we can keep an extra 8 bits for something more useful in
> > > > > the future.
> > > >
> > > > In my case, I use the m->next field to chain more than 256 segments
> > > > for L4 socket buffers. Chaining also updates nb_segs, which can
> > > > overflow. It's not a big issue since, at the end, nb_segs is
> > > > decremented for each segment. On the other hand, if I enable the
> > > > sanity checks on mbufs, they complain because the number of segments
> > > > is not equal to nb_segs.
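
(The kind of check that complains is roughly the following - a sketch, not
the actual rte_mbuf_sanity_check() code:)

    #include <stddef.h>
    #include <rte_mbuf.h>

    /* Walk the m->next chain and compare the count with m->nb_segs.
     * With an 8-bit nb_segs, a chain of more than 255 segments wraps
     * around and the two values no longer match. */
    static int
    check_nb_segs(const struct rte_mbuf *m)
    {
            unsigned int count = 0;
            const struct rte_mbuf *seg;

            for (seg = m; seg != NULL; seg = seg->next)
                    count++;

            return (count == m->nb_segs) ? 0 : -1;
    }
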
> > > >
> > > > There is also another use case with fragmentation as discussed
> > > > recently: http://dpdk.org/dev/patchwork/patch/19819/
> > > >
> > > > Of course, dealing with a long mbuf list is not that efficient, but
> > > > the application can maintain another structure to accelerate the
> > > > access to the middle/end of the list.
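
(For example, a hypothetical application-side helper - not part of DPDK -
that keeps a tail pointer, so appending a segment is O(1) instead of
walking the whole chain:)

    #include <stdint.h>
    #include <rte_mbuf.h>

    /* Hypothetical helper: the application tracks head/tail itself and
     * keeps its own (wider) segment count. */
    struct pkt_chain {
            struct rte_mbuf *head;
            struct rte_mbuf *tail;
            uint32_t nb_segs;
    };

    static inline void
    pkt_chain_append(struct pkt_chain *pc, struct rte_mbuf *seg)
    {
            if (pc->head == NULL) {
                    pc->head = pc->tail = seg;
                    pc->nb_segs = 1;
                    return;
            }
            pc->tail->next = seg;  /* no walk of the full chain */
            pc->tail = seg;
            pc->nb_segs++;
    }
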
> > > >
> > > > Finally, we have other ideas to get additional 8 bits if required
> > > > in the future, so I don't think it's really a problem.
> > > >
> > > >
> > > > >
> > > > > > - move seqn to the 2nd cache line
> > > > > >
> > > > > > Things discussed but not done in the patchset:
> > > > > > - move refcnt and nb_segs to the 2nd cache line: many drivers
> > > > > > set them in the Rx path, so it could introduce a performance
> > > > > > regression, or
> > > > >
> > > > > I wonder, can refcnt alone be moved into the 2nd cache line?
> > > > > As I understand, thanks to the other change (from above) m->refcnt
> > > > > will already be initialized, so the RX code doesn't need to touch
> > > > > it. Though yes, it would still require changes in all PMDs.
> > > >
> > > > Yes, I agree, some fields could be moved to the 2nd cache line once
> > > > all PMDs stop writing them in the RX path. I propose to issue some
> > > > guidelines to PMD maintainers at the same time the patchset is
> > > > pushed. Then we can consider changing it in a future version, in
> > > > case we need more room in the 1st mbuf cache line.
> > > >
> > >
> > > If we are changing things, we should really do all that now, rather
> > > than storing up future breaks to mbuf. Worst case, we should plan for
> > > it immediately after the release where we make these changes: have
> > > two releases that break mbuf immediately after each other - and flag
> > > them as such - but keep it stable thereafter. I don't like having
> > > technical debt on mbuf just after we supposedly "fix" it.
> > 
> > I think there is no need to do this change now. And I'm not comfortable
> > with the idea of a patchset that updates all the PMDs to remove the
> > accesses to a field because it moved to the 2nd cache line
> > (especially thinking about vector PMDs).
> > 
> > That's why I think the plan could be:
> > - push an updated version of this patchset quickly
> > - advertise to PMD maintainers "you don't need to set m->next,
> >   m->refcnt, and m->nb_segs in the RX path, please update your drivers"
> > - later, if we need more room in the 1st cache line of the mbuf, we
> >   can move refcnt and nb_segs, probably without impacting
> >   performance.
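
(As a concrete example of what the second point means for a driver - a
sketch, not taken from a particular PMD - the per-packet init in a
mono-segment Rx path shrinks from the first function to the second,
because the pool already hands out mbufs with next/nb_segs/refcnt in the
right state:)

    #include <rte_mbuf.h>

    /* Before: typical per-packet init in a mono-segment Rx path.
     * "len" is the received packet length from the descriptor. */
    static inline void
    rx_init_mbuf_today(struct rte_mbuf *mb, uint16_t len)
    {
            mb->next = NULL;
            mb->nb_segs = 1;
            rte_mbuf_refcnt_set(mb, 1);
            mb->data_off = RTE_PKTMBUF_HEADROOM;
            mb->pkt_len = len;
            mb->data_len = len;
    }

    /* After: next, nb_segs and refcnt are guaranteed by the pool (and
     * restored when the mbuf is freed), so the Rx path only sets the
     * fields that really change per packet. */
    static inline void
    rx_init_mbuf_proposed(struct rte_mbuf *mb, uint16_t len)
    {
            mb->data_off = RTE_PKTMBUF_HEADROOM;
            mb->pkt_len = len;
            mb->data_len = len;
    }
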
> > 
> > 
> > Olivier
> 
> I suppose you mean that PMDs don't need to /initialize/ m->next, m->refcnt and m->nb_segs.
> 
> Forgive my ignorance, and this is wild speculation, but: Would a PMD not need to set m->next and m->nb_segs if it receives a jumbogram larger than an mbuf packet buffer? And if this is a realistic use case, these fields actually do belong in the 1st cache line. PMD developers please chime in.

Yes, it would. However, this is not really fast-path processing. If we
assume a 2GHz CPU, for 64-byte packets, a core has 34 cycles to process
each packet to achieve 40G line rate. For a packet of size 2k - the size
it would normally need to reach to overflow a buffer, unless you are
using small buffers - the core has 827 cycles per packet. Therefore, in
the latter case, with big packets, the core can afford the hit of
accessing the second cache line.
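
For reference, those numbers fall out of a simple calculation, assuming
20 bytes of preamble plus inter-frame gap per packet on the wire:

    #include <stdio.h>

    /* Cycle budget per packet = CPU_Hz / packets_per_second, where
     * packets_per_second = line_rate_bps / ((pkt_size + 20) * 8).
     * The extra 20 bytes are Ethernet preamble + inter-frame gap. */
    int main(void)
    {
            const double cpu_hz = 2e9;        /* 2 GHz core */
            const double line_bps = 40e9;     /* 40G line rate */
            const unsigned sizes[] = { 64, 2048 };

            for (unsigned i = 0; i < 2; i++) {
                    double pps = line_bps / ((sizes[i] + 20) * 8.0);
                    printf("%u-byte packets: %.0f cycles/packet\n",
                           sizes[i], cpu_hz / pps);
            }
            return 0;
    }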

/Bruce

