[dpdk-stable] [PATCH] mbuf: optimize memory loads during mbuf freeing
Olivier Matz
olivier.matz at 6wind.com
Thu Mar 19 10:30:25 CET 2020
Hi,
On Mon, Mar 16, 2020 at 06:31:40PM +0000, Alexander Kozyrev wrote:
> Introduction of pinned external buffers doubled memory loads in the
> rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
> code shows unnecessary load of the pool field of the rte_mbuf structure.
> Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
> Before the change the code was:
> movq 0x18(%rbx), %rax // load the ol_flags field
> test %r13, %rax // check if ol_flags equals to 0x60...0
> jz 0x9a8718 <Block 2> // jump out to "if (m->next != NULL)"
> After the change the code becomed:
> movq 0x18(%rbx), %rax // load ol_flags
> test %r14, %rax // check if ol_flags equals to 0x60...0
> jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
> movq 0x48(%rbx), %rax // load the pool field
> jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
> Look like this absolutely unneeded memory load of the pool field is an
> optimization for the external buffer case in GCC (4.8.5), since Clang
> generates the same assembly for both before and after the chenge versions.
> Plus, GCC favors the extrnal buffer case over the simple case.
> This assembly code layout causes the performance degradation because the
> rte_pktmbuf_prefree_seg() function is a part of a very hot path.
> Workaround this compilation issue by moving the check for pinned buffer
> apart from the check for external buffer and restore the initial code
> flow that favors the direct mbuf case over the external one.
>
> Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
> Cc: stable at dpdk.org
>
> Signed-off-by: Alexander Kozyrev <akozyrev at mellanox.com>
> Acked-by: Viacheslav Ovsiienko <viacheslavo at mellanox.com>
> ---
> lib/librte_mbuf/rte_mbuf.h | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 34679e0..ab9d3f5 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1335,10 +1335,9 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>  	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
>  
>  		if (!RTE_MBUF_DIRECT(m)) {
> -			if (!RTE_MBUF_HAS_EXTBUF(m) ||
> -			    !RTE_MBUF_HAS_PINNED_EXTBUF(m))
> -				rte_pktmbuf_detach(m);
> -			else if (__rte_pktmbuf_pinned_extbuf_decref(m))
> +			rte_pktmbuf_detach(m);
> +			if (RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
> +			    __rte_pktmbuf_pinned_extbuf_decref(m))
>  				return NULL;
>  		}
>
[...]
Reading the previous code again, it was correct but not easy
to understand, especially this condition:
	if (!RTE_MBUF_HAS_EXTBUF(m) || !RTE_MBUF_HAS_PINNED_EXTBUF(m))
Knowing we already checked that it is not a direct mbuf, it is equivalent to:
	if (!RTE_MBUF_HAS_PINNED_EXTBUF(m))
I think the objective was to avoid an access to the pool flags when it is
not necessary.
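To make the equivalence and the short-circuit concrete, here is a minimal
mock, not the DPDK code. The structures, the helper names and the
POOL_F_PINNED value are hypothetical stand-ins; the two ol_flags bits are
chosen to match the 0x60...0 mask mentioned in the quoted assembly. It also
assumes that an mbuf from a pinned pool always carries the external-buffer
flag, which is what the equivalence relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK definitions (assumptions). */
#define IND_ATTACHED (1ULL << 62)
#define EXT_ATTACHED (1ULL << 61)
#define POOL_F_PINNED 1u

struct mock_pool { uint32_t flags; };
struct mock_mbuf { uint64_t ol_flags; struct mock_pool *pool; };

static unsigned pool_reads; /* counts simulated accesses to the pool flags */

static bool has_extbuf(const struct mock_mbuf *m)
{
	return (m->ol_flags & EXT_ATTACHED) != 0;
}

static bool has_pinned_extbuf(const struct mock_mbuf *m)
{
	pool_reads++;
	return (m->pool->flags & POOL_F_PINNED) != 0;
}

/* Long form: the || stops before the pool is touched if there is no extbuf. */
static bool detach_long(const struct mock_mbuf *m)
{
	return !has_extbuf(m) || !has_pinned_extbuf(m);
}

/* Short form: same result for non-direct mbufs, but always reads the pool. */
static bool detach_short(const struct mock_mbuf *m)
{
	return !has_pinned_extbuf(m);
}

/* Number of pool reads a single evaluation of `test` performs on m. */
static unsigned reads_for(bool (*test)(const struct mock_mbuf *),
			  const struct mock_mbuf *m)
{
	pool_reads = 0;
	(void)test(m);
	return pool_reads;
}

/* Compare both forms over the non-direct cases allowed by the assumption. */
static void check_equivalence(void)
{
	struct mock_pool plain = { 0 }, pinned = { POOL_F_PINNED };
	struct mock_mbuf cases[] = {
		{ IND_ATTACHED, &plain },  /* indirect mbuf */
		{ EXT_ATTACHED, &plain },  /* external buffer, not pinned */
		{ EXT_ATTACHED, &pinned }, /* pinned external buffer */
	};

	for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
		assert(detach_long(&cases[i]) == detach_short(&cases[i]));
}
```

In this mock, calling check_equivalence() from a main() passes for all three
cases, and reads_for() shows the difference in cost: the long form performs
no pool read for an indirect mbuf, while the short form always performs one.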
Completely removing the test as you did is also functionally OK, because
rte_pktmbuf_detach() also does the check, and the code is even clearer.
I wonder, however, whether the following would avoid an access to the pool
flags for mbufs that have the IND_ATTACHED flag set:
	if (!RTE_MBUF_DIRECT(m)) {
		rte_pktmbuf_detach(m);
		if (RTE_MBUF_HAS_EXTBUF(m) &&
		    RTE_MBUF_HAS_PINNED_EXTBUF(m) &&
		    __rte_pktmbuf_pinned_extbuf_decref(m))
			return NULL;
	}
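A rough way to compare this variant with the one in the patch is the mock
below. It is a sketch under assumptions, not the library code: mock_detach()
assumes that detaching leaves a pinned external buffer attached and resets
every other mbuf to a direct one, mock_pinned_decref() assumes that a nonzero
return means the buffer is still referenced elsewhere, and all names and the
POOL_F_PINNED value are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK definitions (assumptions). */
#define IND_ATTACHED (1ULL << 62)
#define EXT_ATTACHED (1ULL << 61)
#define POOL_F_PINNED 1u

struct mock_pool { uint32_t flags; };
struct mock_mbuf {
	uint64_t ol_flags;
	struct mock_pool *pool;
	unsigned ext_refcnt; /* stands in for the shared external refcount */
};

static unsigned flag_reads; /* pool-flag reads made by the post-detach check */

static bool pool_is_pinned(const struct mock_mbuf *m)
{
	return (m->pool->flags & POOL_F_PINNED) != 0;
}

/* Assumed model of detach: pinned buffers stay attached, others are reset. */
static void mock_detach(struct mock_mbuf *m)
{
	if ((m->ol_flags & EXT_ATTACHED) && pool_is_pinned(m))
		return;
	m->ol_flags = 0;
}

/* Pinned check as used after detach, with the pool access counted. */
static bool counted_pinned_check(const struct mock_mbuf *m)
{
	flag_reads++;
	return pool_is_pinned(m);
}

/* Assumed model of the decref helper: 0 means this was the last reference. */
static int mock_pinned_decref(struct mock_mbuf *m)
{
	if (m->ext_refcnt == 1)
		return 0;
	m->ext_refcnt--;
	return 1;
}

/* Branch as in the quoted patch: NULL means the mbuf must not be freed. */
static struct mock_mbuf *prefree_patch(struct mock_mbuf *m)
{
	if (m->ol_flags & (IND_ATTACHED | EXT_ATTACHED)) {
		mock_detach(m);
		if (counted_pinned_check(m) && mock_pinned_decref(m))
			return NULL;
	}
	return m;
}

/* Branch as suggested in this reply: test the ol_flags bit first. */
static struct mock_mbuf *prefree_suggested(struct mock_mbuf *m)
{
	if (m->ol_flags & (IND_ATTACHED | EXT_ATTACHED)) {
		mock_detach(m);
		if ((m->ol_flags & EXT_ATTACHED) &&
		    counted_pinned_check(m) && mock_pinned_decref(m))
			return NULL;
	}
	return m;
}
```

Under these assumptions both variants return the same result, but for an
indirect mbuf the suggested variant makes no pool-flag read in the check
after detach, because the flags have already been cleared, while the patch
variant makes one.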
What do you think?
Nit: if you wish to send a v2, there are a few English fixes that could
be made (becomed, chenge, extrnal).
Thanks