[dpdk-dev] [PATCH 11/13] mbuf: move l2_len and l3_len to second cache line

Yerden Zhumabekov e_zhumabekov at sts.kz
Thu Sep 4 13:00:12 CEST 2014


I get your point. I've also read through the code of various PMDs and
likewise found no indication of the l2_len/l3_len fields being set on the RX path.

As for testing, we'd be happy to test the patchset, but we are currently in
the process of building our testing facilities, so we are not yet able to
generate enough workload for the hardware/software. I was also wondering
whether anyone has already run such tests and can provide some numbers on the matter.

Personally, I don't think the frag/reassembly apps are a good benchmark for
evaluating the penalty of touching the second cache line. The offsets to the
L3 and L4 headers need to be calculated for all TCP/IP traffic, and fragmented
traffic is not representative in this case. Maybe it would be better to
write an app which calculates these offsets for different sets of mbufs
and provides some stats, for example l2fwd/l3fwd plus an additional l2_len
and l3_len calculation, along the lines of the sketch below.
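
Something along these lines is what I have in mind; it's only a rough
sketch, and it assumes the reworked mbuf layout where l2_len/l3_len are
plain fields on the mbuf (the helper name and the IPv4-only parsing are
just for illustration, not taken from any existing example app):

#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_byteorder.h>

/* Parse the Ethernet (+ optional VLAN) and IPv4 headers and record the
 * resulting offsets in the mbuf, as an l2fwd/l3fwd-style benchmark could
 * do for each received packet. */
static inline void
set_l2_l3_len(struct rte_mbuf *m)
{
    struct ether_hdr *eth = rte_pktmbuf_mtod(m, struct ether_hdr *);
    uint16_t ether_type = rte_be_to_cpu_16(eth->ether_type);
    uint16_t l2 = sizeof(struct ether_hdr);

    if (ether_type == ETHER_TYPE_VLAN) {
        struct vlan_hdr *vh = (struct vlan_hdr *)(eth + 1);
        ether_type = rte_be_to_cpu_16(vh->eth_proto);
        l2 += sizeof(struct vlan_hdr);
    }
    m->l2_len = l2;

    if (ether_type == ETHER_TYPE_IPv4) {
        struct ipv4_hdr *ip = (struct ipv4_hdr *)((char *)eth + l2);
        /* IHL is the low nibble of version_ihl, in 32-bit words */
        m->l3_len = (ip->version_ihl & 0x0f) * 4;
    }
}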

And I'm also figuring out how to rewrite our app/libs (prefetching etc.) to
reflect the future changes in the mbuf, hence my concerns :)
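
For instance, on the RX side I expect the change to be mostly about touching
the second cache line of each mbuf early, roughly as in the sketch below
(the helper name is mine, and the cache line size macro may be spelled
differently depending on the DPDK release):

#include <rte_mbuf.h>
#include <rte_prefetch.h>

/* Prefetch the second cache line of each received mbuf so that later
 * accesses to l2_len/l3_len (and the other fields moved there) do not
 * stall the pipeline. The cache-line-size macro name may vary across
 * releases. */
static inline void
prefetch_mbuf_second_line(struct rte_mbuf **pkts, uint16_t nb_pkts)
{
    uint16_t i;

    for (i = 0; i < nb_pkts; i++)
        rte_prefetch0((char *)pkts[i] + RTE_CACHE_LINE_SIZE);
}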


On 04.09.2014 16:27, Bruce Richardson wrote:
> Hi Yerden,
>
> I understand your concerns and it's good to have this discussion.
>
> There are a number of reasons why I've moved these particular fields
> to the second cache line. Firstly, the main reason is that, obviously enough,
> not all fields will fit in cache line 0, and we need to prioritize what does
> get stored there. The guiding principle I've chosen for this patch set is to
> move fields that are not used on the receive path (or, more specifically, the
> fast-path receive path, so that we can also move fields only used by jumbo
> frames that span mbufs) to the second cache line. From a search through the
> existing codebase, there are no drivers which set the l2/l3 length fields on
> RX; they are only used in reassembly libraries/apps and by the drivers on TX.
>
> The other reason for moving it to the second cache line is that it logically
> belongs with all the other length fields that we need to add to enable
> tunneling support. [To get an idea of the extra fields that I propose adding
> to the mbuf, please see the RFC patchset I sent out previously as "[RFC 
> PATCH 00/14] Extend the mbuf structure"]. While we probably could fit the
> 16 bits needed for the l2/l3 lengths on mbuf cache line 0, there is not
> enough room for all the lengths, so we would end up splitting them, with
> other fields in between.
>
> So, in terms of what to do about this particular issue: I would hope that
> for applications that use these fields the impact should be small and/or
> possible to work around, e.g. by prefetching the second cache line on RX in
> the driver. If not,
> then I'm happy to see about withdrawing this particular change and seeing if
> we can keep l2/l3 lengths on cache line zero, with other length fields being
> on cache line 1.
>
> Question: would you consider the ip fragmentation and reassembly example apps
> in the Intel DPDK releases good examples to test to see the impacts of this
> change, or is there some other test you would prefer that I look to do? 
> Can you perhaps test out the patch sets for the mbuf that I've upstreamed so
> far and let me know what regressions, if any, you see in your use-case
> scenarios?
>
> Regards,
> /Bruce
>
-- 
Sincerely,

Yerden Zhumabekov
STS, ACI
Astana, KZ


