[dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()

Message ID B5748614-2B5F-402D-9A09-1C0781109000@mellanox.com (mailing list archive)
State Not Applicable, archived
Headers

Checks

Context Check Description
ci/checkpatch warning coding style issues
ci/Intel-compilation fail apply patch file failure

Commit Message

Yongseok Koh Feb. 13, 2018, 10:45 p.m. UTC
  Hi Olivier

I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
to the cases where almost mbufs have single segment.

A customer reported high rate of cache misses in the code and I thought the
following patch could be helpful. I haven't had them try it yet but just wanted
to hear from you.

I'd appreciate if you can review this idea.


Thanks,
Yongseok
  

Comments

Yongseok Koh Feb. 14, 2018, 3:16 a.m. UTC | #1
> On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> Hi Olivier
> 
> I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> to the cases where almost mbufs have single segment.
> 
> A customer reported high rate of cache misses in the code and I thought the
> following patch could be helpful. I haven't had them try it yet but just wanted
> to hear from you.
> 
> I'd appreciate if you can review this idea.
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 62740254d..96edbcb9e 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>                if (RTE_MBUF_INDIRECT(m))
>                        rte_pktmbuf_detach(m);
> 
> -               if (m->next != NULL) {
> +               if (m->nb_segs > 1) {
>                        m->next = NULL;
>                        m->nb_segs = 1;
>                }
> @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
>                if (RTE_MBUF_INDIRECT(m))
>                        rte_pktmbuf_detach(m);
> 
> -               if (m->next != NULL) {
> +               if (m->nb_segs > 1) {
>                        m->next = NULL;
>                        m->nb_segs = 1;
>                }

Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
It looks like the cache miss is unavoidable.

Thanks
Yongseok
  
Ananyev, Konstantin Feb. 14, 2018, 11:48 a.m. UTC | #2
Hi Yongseok,

> > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >
> > Hi Olivier
> >
> > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > to the cases where almost mbufs have single segment.
> >
> > A customer reported high rate of cache misses in the code and I thought the
> > following patch could be helpful. I haven't had them try it yet but just wanted
> > to hear from you.
> >
> > I'd appreciate if you can review this idea.
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 62740254d..96edbcb9e 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >                if (RTE_MBUF_INDIRECT(m))
> >                        rte_pktmbuf_detach(m);
> >
> > -               if (m->next != NULL) {
> > +               if (m->nb_segs > 1) {
> >                        m->next = NULL;
> >                        m->nb_segs = 1;
> >                }
> > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> >                if (RTE_MBUF_INDIRECT(m))
> >                        rte_pktmbuf_detach(m);
> >
> > -               if (m->next != NULL) {
> > +               if (m->nb_segs > 1) {
> >                        m->next = NULL;
> >                        m->nb_segs = 1;
> >                }
> 
> Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> It looks like the cache miss is unavoidable.

As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
then it could be something like:

if (rte_pktmbuf_prefree_seg(m[x] != NULL)
   rte_mempool_put(pool[x], m[x]);

Then what you suggested above might help.
Konstantin
  
Ananyev, Konstantin Feb. 14, 2018, 12:03 p.m. UTC | #3
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, February 14, 2018 11:48 AM
> To: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> 
> Hi Yongseok,
> 
> > > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >
> > > Hi Olivier
> > >
> > > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > > to the cases where almost mbufs have single segment.
> > >
> > > A customer reported high rate of cache misses in the code and I thought the
> > > following patch could be helpful. I haven't had them try it yet but just wanted
> > > to hear from you.
> > >
> > > I'd appreciate if you can review this idea.
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 62740254d..96edbcb9e 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >                if (RTE_MBUF_INDIRECT(m))
> > >                        rte_pktmbuf_detach(m);
> > >
> > > -               if (m->next != NULL) {
> > > +               if (m->nb_segs > 1) {
> > >                        m->next = NULL;
> > >                        m->nb_segs = 1;
> > >                }
> > > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > >                if (RTE_MBUF_INDIRECT(m))
> > >                        rte_pktmbuf_detach(m);
> > >
> > > -               if (m->next != NULL) {
> > > +               if (m->nb_segs > 1) {
> > >                        m->next = NULL;
> > >                        m->nb_segs = 1;
> > >                }
> >
> > Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> > It looks like the cache miss is unavoidable.
> 
> As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
> then it could be something like:
> 
> if (rte_pktmbuf_prefree_seg(m[x] != NULL)
>    rte_mempool_put(pool[x], m[x]);
> 
> Then what you suggested above might help.

After another thought - we have to check m->next not m->nb_segs.
There could be a situations where nb_segs==1, but m->next != NULL
(2-nd segment of the 3 segment packet for example).
So probably we have to keep it as it is.
Sorry for the noise
Konstantin
  
Bruce Richardson Feb. 14, 2018, 12:11 p.m. UTC | #4
On Wed, Feb 14, 2018 at 12:03:55PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > Sent: Wednesday, February 14, 2018 11:48 AM
> > To: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> > 
> > Hi Yongseok,
> > 
> > > > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > > >
> > > > Hi Olivier
> > > >
> > > > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > > > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > > > to the cases where almost mbufs have single segment.
> > > >
> > > > A customer reported high rate of cache misses in the code and I thought the
> > > > following patch could be helpful. I haven't had them try it yet but just wanted
> > > > to hear from you.
> > > >
> > > > I'd appreciate if you can review this idea.
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > index 62740254d..96edbcb9e 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >                if (RTE_MBUF_INDIRECT(m))
> > > >                        rte_pktmbuf_detach(m);
> > > >
> > > > -               if (m->next != NULL) {
> > > > +               if (m->nb_segs > 1) {
> > > >                        m->next = NULL;
> > > >                        m->nb_segs = 1;
> > > >                }
> > > > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > >                if (RTE_MBUF_INDIRECT(m))
> > > >                        rte_pktmbuf_detach(m);
> > > >
> > > > -               if (m->next != NULL) {
> > > > +               if (m->nb_segs > 1) {
> > > >                        m->next = NULL;
> > > >                        m->nb_segs = 1;
> > > >                }
> > >
> > > Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> > > It looks like the cache miss is unavoidable.
> > 
> > As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
> > then it could be something like:
> > 
> > if (rte_pktmbuf_prefree_seg(m[x] != NULL)
> >    rte_mempool_put(pool[x], m[x]);
> > 
> > Then what you suggested above might help.
> 
> After another thought - we have to check m->next not m->nb_segs.
> There could be a situations where nb_segs==1, but m->next != NULL
> (2-nd segment of the 3 segment packet for example).
> So probably we have to keep it as it is.
> Sorry for the noise
> Konstantin

It's still worth considering as an option. We could check nb_segs for
the first segment of a packet and thereafter iterate using the next
pointer. It means that your idea of storing the pool pointer for each
mbuf becomes useful for single-segment packets.

/Bruce
  
Ananyev, Konstantin Feb. 14, 2018, 12:35 p.m. UTC | #5
> -----Original Message-----
> From: Richardson, Bruce
> Sent: Wednesday, February 14, 2018 12:12 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> 
> On Wed, Feb 14, 2018 at 12:03:55PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > Sent: Wednesday, February 14, 2018 11:48 AM
> > > To: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> > >
> > > Hi Yongseok,
> > >
> > > > > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > > > >
> > > > > Hi Olivier
> > > > >
> > > > > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > > > > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > > > > to the cases where almost mbufs have single segment.
> > > > >
> > > > > A customer reported high rate of cache misses in the code and I thought the
> > > > > following patch could be helpful. I haven't had them try it yet but just wanted
> > > > > to hear from you.
> > > > >
> > > > > I'd appreciate if you can review this idea.
> > > > >
> > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > index 62740254d..96edbcb9e 100644
> > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > >                        rte_pktmbuf_detach(m);
> > > > >
> > > > > -               if (m->next != NULL) {
> > > > > +               if (m->nb_segs > 1) {
> > > > >                        m->next = NULL;
> > > > >                        m->nb_segs = 1;
> > > > >                }
> > > > > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > >                        rte_pktmbuf_detach(m);
> > > > >
> > > > > -               if (m->next != NULL) {
> > > > > +               if (m->nb_segs > 1) {
> > > > >                        m->next = NULL;
> > > > >                        m->nb_segs = 1;
> > > > >                }
> > > >
> > > > Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> > > > It looks like the cache miss is unavoidable.
> > >
> > > As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
> > > then it could be something like:
> > >
> > > if (rte_pktmbuf_prefree_seg(m[x] != NULL)
> > >    rte_mempool_put(pool[x], m[x]);
> > >
> > > Then what you suggested above might help.
> >
> > After another thought - we have to check m->next not m->nb_segs.
> > There could be a situations where nb_segs==1, but m->next != NULL
> > (2-nd segment of the 3 segment packet for example).
> > So probably we have to keep it as it is.
> > Sorry for the noise
> > Konstantin
> 
> It's still worth considering as an option. We could check nb_segs for
> the first segment of a packet and thereafter iterate using the next
> pointer.

In multi-seg case PMD frees segments (not packets).
It could happen that first segment would be already freed while the second 
still not.

> It means that your idea of storing the pool pointer for each
> mbuf becomes useful for single-segment packets.

But then we'll have to support 2 different flavors of prefree_seg().
Alternative would be to change all PMDs multi-seg TX so when first segment is 
going to be freed we update nb_segs for the second and so on.
Both options seems like too much hassle.

Konstantin
  
Ananyev, Konstantin Feb. 14, 2018, 2:16 p.m. UTC | #6
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Wednesday, February 14, 2018 12:35 PM
> To: Richardson, Bruce <bruce.richardson@intel.com>
> Cc: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; dev@dpdk.org
> Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> 
> 
> 
> > -----Original Message-----
> > From: Richardson, Bruce
> > Sent: Wednesday, February 14, 2018 12:12 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>; dev@dpdk.org
> > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> >
> > On Wed, Feb 14, 2018 at 12:03:55PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> > > > Sent: Wednesday, February 14, 2018 11:48 AM
> > > > To: Yongseok Koh <yskoh@mellanox.com>; Olivier Matz <olivier.matz@6wind.com>
> > > > Cc: dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] Accessing 2nd cacheline in rte_pktmbuf_prefree_seg()
> > > >
> > > > Hi Yongseok,
> > > >
> > > > > > On Feb 13, 2018, at 2:45 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > > > > >
> > > > > > Hi Olivier
> > > > > >
> > > > > > I'm wondering why rte_pktmbuf_prefree_seg() checks m->next instead of
> > > > > > m->nb_segs? As 'next' is in the 2nd cacheline, checking nb_segs seems beneficial
> > > > > > to the cases where almost mbufs have single segment.
> > > > > >
> > > > > > A customer reported high rate of cache misses in the code and I thought the
> > > > > > following patch could be helpful. I haven't had them try it yet but just wanted
> > > > > > to hear from you.
> > > > > >
> > > > > > I'd appreciate if you can review this idea.
> > > > > >
> > > > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > > > index 62740254d..96edbcb9e 100644
> > > > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > > > @@ -1398,7 +1398,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > > >                        rte_pktmbuf_detach(m);
> > > > > >
> > > > > > -               if (m->next != NULL) {
> > > > > > +               if (m->nb_segs > 1) {
> > > > > >                        m->next = NULL;
> > > > > >                        m->nb_segs = 1;
> > > > > >                }
> > > > > > @@ -1410,7 +1410,7 @@ rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > > > > >                if (RTE_MBUF_INDIRECT(m))
> > > > > >                        rte_pktmbuf_detach(m);
> > > > > >
> > > > > > -               if (m->next != NULL) {
> > > > > > +               if (m->nb_segs > 1) {
> > > > > >                        m->next = NULL;
> > > > > >                        m->nb_segs = 1;
> > > > > >                }
> > > > >
> > > > > Well, m->pool in the 2nd cacheline has to be accessed anyway in order to put it back to the mempool.
> > > > > It looks like the cache miss is unavoidable.
> > > >
> > > > As a thought: in theory PMD can store pool pointer together with each mbuf it has to free,
> > > > then it could be something like:
> > > >
> > > > if (rte_pktmbuf_prefree_seg(m[x] != NULL)
> > > >    rte_mempool_put(pool[x], m[x]);
> > > >
> > > > Then what you suggested above might help.
> > >
> > > After another thought - we have to check m->next not m->nb_segs.
> > > There could be a situations where nb_segs==1, but m->next != NULL
> > > (2-nd segment of the 3 segment packet for example).
> > > So probably we have to keep it as it is.
> > > Sorry for the noise
> > > Konstantin
> >
> > It's still worth considering as an option. We could check nb_segs for
> > the first segment of a packet and thereafter iterate using the next
> > pointer.
> 
> In multi-seg case PMD frees segments (not packets).
> It could happen that first segment would be already freed while the second
> still not.
> 
> > It means that your idea of storing the pool pointer for each
> > mbuf becomes useful for single-segment packets.
> 
> But then we'll have to support 2 different flavors of prefree_seg().
> Alternative would be to change all PMDs multi-seg TX so when first segment is
> going to be freed we update nb_segs for the second and so on.
> Both options seems like too much hassle.
> 

As  a side thought what probably can be  done to minimize access
to 2-nd mbuf's cache line at PMD tx free:
Introduce something like that:
static __rte_always_inline struct rte_mepool *
xxx_prefree_seg(struct rte_mbuf *m)
{
        if (rte_mbuf_refcnt_read(m) == 1 && RTE_MBUF_DIRECT(m)) {
                if (m->next != NULL) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }
                return m->pool;
       }
       return NULL;
}

Then at tx_burst() before doing actual TX PMD can call that function
and store it's return value along with mbuf:
..
m[x] = pkt;
pool[x] = xxx_prefree_seg(m[x]);

Then at free time, we can do something ilike:
If (pool[x] != NULL) 
   rte_mempool_put(pool[x], m[x]);
else
    rte_pktmbuf_free_seg(m[x]);

We still access m->next but doing that before actual TX is done.
Hopefully there would be more chances that m->next
is still in the cache at that moment.
In theory, that might help for most common case when
we have direct mbufs with refcnt==1.
Though for indirect/refcnt>1 mbufs there would be extra overhead.
Konstantin
  

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 62740254d..96edbcb9e 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1398,7 +1398,7 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
                if (RTE_MBUF_INDIRECT(m))
                        rte_pktmbuf_detach(m);
 
-               if (m->next != NULL) {
+               if (m->nb_segs > 1) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }
@@ -1410,7 +1410,7 @@  rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
                if (RTE_MBUF_INDIRECT(m))
                        rte_pktmbuf_detach(m);
 
-               if (m->next != NULL) {
+               if (m->nb_segs > 1) {
                        m->next = NULL;
                        m->nb_segs = 1;
                }