[dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Feb 14 15:16:12 CET 2018



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, February 14, 2018 12:35 PM
> To: Richardson, Bruce <bruce.richardson at intel.com>
> Cc: Yongseok Koh <yskoh at mellanox.com>; Olivier Matz <olivier.matz at 6wind.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> 
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Wednesday, February 14, 2018 12:12 PM
> > To: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> > Cc: Yongseok Koh <yskoh at mellanox.com>; Olivier Matz <olivier.matz at 6wind.com>; dev at dpdk.org
> > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> >
> > On Wed, Feb 14, 2018 at 12:03:55PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ananyev, Konstantin
> > > > Sent: Wednesday, February 14, 2018 11:48 AM
> > > > To: Yongseok Koh <yskoh at mellanox.com>; Olivier Matz <olivier.matz at 6wind.com>
> > > > Cc: dev at dpdk.org
> > > > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> > > >
> > > > Hi Yongseok,
> > > >
> > > > > > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh at mellanox.com> wrote:
> > > > > >
> > > > > > Hi Olivier
> > > > > >
> > > > > > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > > > > > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > > > > > to the cases where almost mbufs have single segment.
> > > > > >
> > > > > > A customer reported high rate of cache misses in the code and I thought the
> > > > > > following patch could be helpful. I haven't had them try it yet but just wanted
> > > > > > to hear from you.
> > > > > >
> > > > > > I'd appreciate if you can review this idea.
> > > > > >
> > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > > index 62740254d..96edbcb9e 100644
> > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > > >                        rte_pktmbuf_detach(m);
> > > > > >
> > > > > > -               if (m->next != NULL) {
> > > > > > +               if (m->nb_segs > 1) {
> > > > > >                        m->next = NULL;
> > > > > >                        m->nb_segs = 1;
> > > > > >                }
> > > > > > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > > >                        rte_pktmbuf_detach(m);
> > > > > >
> > > > > > -               if (m->next != NULL) {
> > > > > > +               if (m->nb_segs > 1) {
> > > > > >                        m->next = NULL;
> > > > > >                        m->nb_segs = 1;
> > > > > >                }
> > > > >
> > > > > Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> > > > > It looks like the cache miss is unavoidable.
> > > >
> > > > As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
> > > > then it could be something like:
> > > >
> > > > if (rte_pktmbuf_prefree_seg(m[x] != NULL)
> > > >    rte_mempool_put(pool[x], m[x]);
> > > >
> > > > Then what you suggested above might help.
> > >
> > > After another thought - we have to check m->next not m->nb_segs.
> > > There could be a situations where nb_segs==1, but m->next != NULL
> > > (2-nd segment of the 3 segment packet for example).
> > > So probably we have to keep it as it is.
> > > Sorry for the noise
> > > Konstantin
> >
> > It's still worth considering as an option. We could check nb_segs for
> > the first segment of a packet and thereafter iterate using the next
> > pointer.
> 
> In multi-seg case PMD frees segments (not packets).
> It could happen that first segment would be already freed while the second
> still not.
> 
> > It means that your idea of storing the pool pointer for each
> > mbuf becomes useful for single-segment packets.
> 
> But then we'll have to support 2 different flavors of prefree_seg().
> Alternative would be to change all PMDs multi-seg TX so when first segment is
> going to be freed we update nb_segs for the second and so on.
> Both options seems like too much hassle.
> 

As  a side thought what probably can be  done to minimize access
to 2-nd mbuf's cache line at PMD tx free:
Introduce something like that:
static __rte_always_inline struct rte_mepool *
xxx_prefree_seg(struct rte_mbuf *m)
{
        if (rte_mbuf_refcnt_read(m) == 1 && RTE_MBUF_DIRECT(m)) {
                if (m->next != NULL) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }
                return m->pool;
       }
       return NULL;
}

Then at tx_burst() before doing actual TX PMD can call that function
and store it's return value along with mbuf:
..
m[x] = pkt;
pool[x] = xxx_prefree_seg(m[x]);

Then at free time, we can do something ilike:
If (pool[x] != NULL) 
   rte_mempool_put(pool[x], m[x]);
else
    rte_pktmbuf_free_seg(m[x]);

We still access m->next but doing that before actual TX is done.
Hopefully there would be more chances that m->next
is still in the cache at that moment.
In theory, that might help for most common case when
we have direct mbufs with refcnt==1.
Though for indirect/refcnt>1 mbufs there would be extra overhead.
Konstantin


More information about the dev mailing list