[dpdk-dev] [PATCH 11/13] mbuf: move l2_len and l3_len to second cache line

Bruce Richardson bruce.richardson at intel.com
Thu Sep 4 12:27:45 CEST 2014


On Thu, Sep 04, 2014 at 11:08:57AM +0600, Yerden Zhumabekov wrote:
> Hello Bruce,
> 
> I'm a little bit concerned about performance issues that would arise if
> these fields would go to the 2nd cache line.
> 
> For example, the l2_len and l3_len fields are used by librte_ip_frag to find
> the L3 and L4 header positions inside the mbuf data. Thus, these values
> should be calculated by NIC offload, or by the user on the RX leg.
> 
> Secondly, (I wouldn't say on behalf of everyone, but) we use these
> fields in our libraries as well for classification purposes. For
> instance, if you want to support other ethertypes which are not
> handled by NIC offload (MPLS, IPX, etc.), you still need to point
> out the L3 and L4 headers.
> 
> If my concerns are consistent, what would be possible suggestions?

Hi Yerden,

I understand your concerns and it's good to have this discussion.

There are a number of reasons why I've moved these particular fields
to the second cache line. The main one is that, obviously enough, not all
fields will fit in cache line 0, so we need to prioritize what does get
stored there. The guiding principle I've chosen for this patch set is to
move to the second cache line any fields that are not used on the receive
path (or the fast-path receive path, more specifically - so that we can
also move fields only used by jumbo frames that span mbufs). From a search
through the existing codebase, there are no drivers which set the l2/l3
length fields on RX; they are only used by the reassembly libraries/apps
and by the drivers on TX.

The other reason for moving it to the second cache line is that it logically
belongs with all the other length fields that we need to add to enable
tunneling support. [To get an idea of the extra fields that I propose adding
to the mbuf, please see the RFC patchset I sent out previously as "[RFC 
PATCH 00/14] Extend the mbuf structure"]. While we probably can fit the
16 bits needed for the l2/l3 lengths on cache line 0 of the mbuf, there is
not enough room for all the lengths, so we would end up splitting them,
with other fields in between.

So, in terms of what to do about this particular issue: I would hope that for
applications that use these fields the impact should be small and/or possible
to work around, e.g. by prefetching the second cache line on RX in the driver.
If not, then I'm happy to look at withdrawing this particular change and
seeing if we can keep the l2/l3 lengths on cache line 0, with the other
length fields on cache line 1.

Question: would you consider the IP fragmentation and reassembly example apps
in the Intel DPDK releases good test cases for measuring the impact of this
change, or is there some other test you would prefer that I look to do?
Can you perhaps test out the patch sets for the mbuf that I've upstreamed so
far and let me know what regressions, if any, you see in your use-case
scenarios?

Regards,
/Bruce

> 
> 03.09.2014 21:49, Bruce Richardson wrote:
> > The l2_len and l3_len fields are used for TX offloads and so should be
> > put on the second cache line, along with the other fields only used on
> > TX.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson at intel.com>
> 
> -- 
> Sincerely,
> 
> Yerden Zhumabekov
> STS, ACI
> Astana, KZ
> 
