[dpdk-dev] [PATCH v4 1/9] mbuf: new function to generate raw Tx offload value
Ananyev, Konstantin
konstantin.ananyev at intel.com
Sat Mar 30 15:20:31 CET 2019
Hi Olivier,
> > Operations to set/update bit-fields often cause compilers
> > to generate suboptimal code.
> > To help avoid such situation for tx_offload fields:
> > introduce new enum for tx_offload bit-fields lengths and offsets,
> > and new function to generate raw tx_offload value.
> >
> > Signed-off-by: Konstantin Ananyev <konstantin.ananyev at intel.com>
> > Acked-by: Akhil Goyal <akhil.goyal at nxp.com>
>
> I understand the need. Out of curiosity, do you have any performance
> numbers to share?
On my board (SKX):
for a micro-benchmark (doing nothing but setting tx_offload for 1M mbufs in a loop)
the difference is more than 150%: from ~55 cycles to ~20 cycles per iteration
(roughly 2.7x faster).
For ipsec-secgw: ~3% improvement for tunneled outbound packets.
>
> Few cosmetic questions below.
>
> > ---
> > lib/librte_mbuf/rte_mbuf.h | 79 ++++++++++++++++++++++++++++++++++----
> > 1 file changed, 72 insertions(+), 7 deletions(-)
> >
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index d961ccaf6..0b197e8ce 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -479,6 +479,31 @@ struct rte_mbuf_sched {
> > uint16_t reserved; /**< Reserved. */
> > }; /**< Hierarchical scheduler */
> >
> > +/**
> > + * Enum for the tx_offload bit-field lengths and offsets.
> > + * Defines the layout of the rte_mbuf tx_offload field.
> > + */
> > +enum {
> > + RTE_MBUF_L2_LEN_BITS = 7,
> > + RTE_MBUF_L3_LEN_BITS = 9,
> > + RTE_MBUF_L4_LEN_BITS = 8,
> > + RTE_MBUF_TSO_SEGSZ_BITS = 16,
> > + RTE_MBUF_OUTL3_LEN_BITS = 9,
> > + RTE_MBUF_OUTL2_LEN_BITS = 7,
> > + RTE_MBUF_L2_LEN_OFS = 0,
> > + RTE_MBUF_L3_LEN_OFS = RTE_MBUF_L2_LEN_OFS + RTE_MBUF_L2_LEN_BITS,
> > + RTE_MBUF_L4_LEN_OFS = RTE_MBUF_L3_LEN_OFS + RTE_MBUF_L3_LEN_BITS,
> > + RTE_MBUF_TSO_SEGSZ_OFS = RTE_MBUF_L4_LEN_OFS + RTE_MBUF_L4_LEN_BITS,
> > + RTE_MBUF_OUTL3_LEN_OFS =
> > + RTE_MBUF_TSO_SEGSZ_OFS + RTE_MBUF_TSO_SEGSZ_BITS,
> > + RTE_MBUF_OUTL2_LEN_OFS =
> > + RTE_MBUF_OUTL3_LEN_OFS + RTE_MBUF_OUTL3_LEN_BITS,
> > + RTE_MBUF_TXOFLD_UNUSED_OFS =
> > + RTE_MBUF_OUTL2_LEN_OFS + RTE_MBUF_OUTL2_LEN_BITS,
> > + RTE_MBUF_TXOFLD_UNUSED_BITS =
> > + sizeof(uint64_t) * CHAR_BIT - RTE_MBUF_TXOFLD_UNUSED_OFS,
> > +};
> > +
>
> What is the advantage of defining an enum instead of #defines?
No big difference here; it just looks nicer to me.
>
> In any case, I wonder if it wouldn't be clearer to change the order like
> this:
>
> enum {
> RTE_MBUF_L2_LEN_OFS = 0,
> RTE_MBUF_L2_LEN_BITS = 7,
> RTE_MBUF_L3_LEN_OFS = RTE_MBUF_L2_LEN_OFS + RTE_MBUF_L2_LEN_BITS,
> RTE_MBUF_L3_LEN_BITS = 9,
> RTE_MBUF_L4_LEN_OFS = RTE_MBUF_L3_LEN_OFS + RTE_MBUF_L3_LEN_BITS,
> RTE_MBUF_L4_LEN_BITS = 8,
> ...
NP, I can do it this way.
>
>
> > /**
> > * The generic rte_mbuf, containing a packet mbuf.
> > */
> > @@ -640,19 +665,24 @@ struct rte_mbuf {
> > uint64_t tx_offload; /**< combined for easy fetch */
> > __extension__
> > struct {
> > - uint64_t l2_len:7;
> > + uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
> > /**< L2 (MAC) Header Length for non-tunneling pkt.
> > * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
> > */
> > - uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> > - uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> > - uint64_t tso_segsz:16; /**< TCP TSO segment size */
> > + uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
> > + /**< L3 (IP) Header Length. */
> > + uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
> > + /**< L4 (TCP/UDP) Header Length. */
> > + uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
> > + /**< TCP TSO segment size */
> >
> > /* fields for TX offloading of tunnels */
> > - uint64_t outer_l3_len:9; /**< Outer L3 (IP) Hdr Length. */
> > - uint64_t outer_l2_len:7; /**< Outer L2 (MAC) Hdr Length. */
> > + uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
> > + /**< Outer L3 (IP) Hdr Length. */
> > + uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
> > + /**< Outer L2 (MAC) Hdr Length. */
> >
> > - /* uint64_t unused:8; */
> > + /* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
> > };
> > };
> >
> > @@ -2243,6 +2273,41 @@ static inline int rte_pktmbuf_chain(struct rte_mbuf *head, struct rte_mbuf *tail
> > return 0;
> > }
> >
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: This API may change without prior notice.
> > + *
> > + * For the given input values, generate the raw tx_offload value.
> > + * @param il2
> > + * l2_len value.
> > + * @param il3
> > + * l3_len value.
> > + * @param il4
> > + * l4_len value.
> > + * @param tso
> > + * tso_segsz value.
> > + * @param ol3
> > + * outer_l3_len value.
> > + * @param ol2
> > + * outer_l2_len value.
> > + * @param unused
> > + * unused value.
> > + * @return
> > + * raw tx_offload value.
> > + */
> > +static __rte_always_inline uint64_t
> > +rte_mbuf_tx_offload(uint64_t il2, uint64_t il3, uint64_t il4, uint64_t tso,
> > + uint64_t ol3, uint64_t ol2, uint64_t unused)
> > +{
> > + return il2 << RTE_MBUF_L2_LEN_OFS |
> > + il3 << RTE_MBUF_L3_LEN_OFS |
> > + il4 << RTE_MBUF_L4_LEN_OFS |
> > + tso << RTE_MBUF_TSO_SEGSZ_OFS |
> > + ol3 << RTE_MBUF_OUTL3_LEN_OFS |
> > + ol2 << RTE_MBUF_OUTL2_LEN_OFS |
> > + unused << RTE_MBUF_TXOFLD_UNUSED_OFS;
> > +}
> > +
> > /**
>
>
> From what I see, the problem is quite similar to what was done with
> rte_mbuf_sched_set() recently. So I wondered if it was possible to
> declare a structure like this:
>
> struct rte_mbuf_ol_len {
> uint64_t l2_len:7;
> uint64_t l3_len:9; /**< L3 (IP) Header Length. */
> uint64_t l4_len:8; /**< L4 (TCP/UDP) Header Length. */
> ...
> };
>
> And have the set function like this:
>
> m->l = (struct rte_mbuf_ol_len) {
> .l2_len = l2_len,
> .l3_len = l3_len,
> .l4_len = l4_len,
> ...
>
> This would avoid the definition of the offsets and bits, but I didn't
> find any way to declare these fields as anonymous in the mbuf structure.
> Did you try that way too?
I thought about such an approach, but, as you said above, it would change
an unnamed struct into a named one, which, as I understand it, means an API breakage.
So I don't think the hassle would be worth the benefit.
Also, the generated code wouldn't be identical: that approach generates a few
extra 'AND' instructions.
Konstantin