[dpdk-dev] rte_mbuf.next in 2nd cacheline
Thomas Monjalon
thomas.monjalon at 6wind.com
Wed Jun 17 18:32:24 CEST 2015
2015-06-17 14:23, Damjan Marion:
>
> > On 17 Jun 2015, at 16:06, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >
> > On Wed, Jun 17, 2015 at 01:55:57PM +0000, Damjan Marion (damarion) wrote:
> >>
> >>> On 15 Jun 2015, at 16:12, Bruce Richardson <bruce.richardson at intel.com> wrote:
> >>>
> >>> The next pointers always start out as NULL when the mbuf pool is created. The
> >>> only time it is set to non-NULL is when we have chained mbufs. If we never have
> >>> any chained mbufs, we never need to touch the next field, or even read it - since
> >>> we have the num-segments count in the first cache line. If we do have a multi-segment
> >>> mbuf, it's likely to be a big packet, so we have more processing time available
> >>> and we can then take the hit of setting the next pointer.
> >>
> >> There are applications which are not using rx offload, but they deal with chained mbufs.
> >> Why they are less important than ones using rx offload? This is something people
> >> should be able to configure on build time.
> >
> > It's not that they are less important, it's that the packet processing cycle count
> > budget is going to be greater. A packet which is 64 bytes, or 128 bytes in size
> > can make use of a number of RX offloads to reduce it's processing time. However,
> > a 64/128 packet is not going to be split across multiple buffers [unless we
> > are dealing with a very unusual setup!].
> >
> > To handle 64 byte packets at 40G line rate, one has 50 cycles per core per packet
> > when running at 3GHz. [3000000000 cycles / 59.5 mpps].
> > If we assume that we are dealing with fairly small buffers
> > here, and that anything greater than 1k packets are chained, we still have 626
> > cycles per 3GHz core per packet to work with for that 1k packet. Given that
> > "normal" DPDK buffers are 2k in size, we have over a thousand cycles per packet
> > for any packet that is split.
> >
> > In summary, packets spread across multiple buffers are large packets, and so have
> > larger packet cycle count budgets and so can much better absorb the cost of
> > touching a second cache line in the mbuf than a 64-byte packet can. Therefore,
> > we optimize for the 64B packet case.
>
> This makes sense if there is no other work to do on the same core.
> Otherwise it is better to spent those cycles on actual work instead of waiting for
> 2nd cache line...
You're probably right.
I wonder wether this flexibility can be implemented only in static lib builds?
More information about the dev
mailing list