[dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data

Chilikin, Andrey andrey.chilikin at intel.com
Wed Jul 6 14:21:48 CEST 2016


Hi Oliver,

> -----Original Message-----
> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> Sent: Wednesday, July 6, 2016 1:09 PM
> To: Chilikin, Andrey <andrey.chilikin at intel.com>; Liang, Cunming
> <cunming.liang at intel.com>; dev at dpdk.org
> Cc: Ananyev, Konstantin <konstantin.ananyev at intel.com>
> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> from data
> 
> Hi Andrey,
> 
> On 07/06/2016 01:59 PM, Chilikin, Andrey wrote:
> > Hi Oliver,
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Olivier MATZ
> >> Sent: Wednesday, July 6, 2016 8:43 AM
> >> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get
> >> packet type from data
> >>
> >> Hi Cunming,
> >>
> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
> >>> Hi Olivier,
> >>>
> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
> >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a mbuf
> >>>> and returns its packet type. For now, the following packet types
> >>>> are
> >>>> parsed:
> >>>>     L2: Ether
> >>>>     L3: IPv4, IPv6
> >>>>     L4: TCP, UDP, SCTP
> >>>>
> >>>> The goal here is to provide a reference implementation for packet
> >>>> type parsing. This function will be used by testpmd in next
> >>>> commits, allowing to compare its result with the value given by the
> hardware.
> >>>>
> >>>> This function will also be useful when implementing Rx offload
> >>>> support in virtio pmd. Indeed, the virtio protocol gives the csum
> >>>> start and offset, but it does not give the L4 protocol nor it tells
> >>>> if the checksum is relevant for inner or outer. This information
> >>>> has to be known to properly set the ol_flags in mbuf.
> >>>>
> >>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
> >>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com>
> >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
> >>>> ---
> >>>>   doc/guides/rel_notes/release_16_11.rst |   5 +
> >>>>   lib/librte_mbuf/Makefile               |   5 +-
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234
> >>>> +++++++++++++++++++++++++++++++++
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
> >>>>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
> >>>>   5 files changed, 286 insertions(+), 2 deletions(-)
> >>>>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
> >>>>
> >>>> [...]
> >>>> +
> >>>> +/* parse mbuf data to get packet type */ uint32_t
> >>>> +rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
> >>>> +    struct rte_mbuf_hdr_lens *hdr_lens) {
> >>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
> >>>> +    const struct ether_hdr *eh;
> >>>> +    struct ether_hdr eh_copy;
> >>>> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
> >>>> +    uint32_t off = 0;
> >>>> +    uint16_t proto;
> >>>> +
> >>>> +    if (hdr_lens == NULL)
> >>>> +        hdr_lens = &local_hdr_lens;
> >>>> +
> >>>> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
> >>>> +    if (unlikely(eh == NULL))
> >>>> +        return 0;
> >>>> +    proto = eh->ether_type;
> >>>> +    off = sizeof(*eh);
> >>>> +    hdr_lens->l2_len = off;
> >>>> +
> >>>> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
> >>>> +        const struct ipv4_hdr *ip4h;
> >>>> +        struct ipv4_hdr ip4h_copy;
> >>>> +
> >>>> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
> >>>> +        if (unlikely(ip4h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
> >>>> +        hdr_lens->l3_len = ip4_hlen(ip4h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        if (ip4h->fragment_offset &
> >>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
> >>>> +                    IPV4_HDR_MF_FLAG)) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        proto = ip4h->next_proto_id;
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
> >>>> +        const struct ipv6_hdr *ip6h;
> >>>> +        struct ipv6_hdr ip6h_copy;
> >>>> +        int frag = 0;
> >>>> +
> >>>> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
> >>>> +        if (unlikely(ip6h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        proto = ip6h->proto;
> >>>> +        hdr_lens->l3_len = sizeof(*ip6h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        pkt_type |= ptype_l3_ip6(proto);
> >>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT)
> {
> >>>> +            proto = skip_ip6_ext(proto, m, &off, &frag);
> >>>> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
> >>>> +        }
> >>>> +        if (proto == 0)
> >>>> +            return pkt_type;
> >>>> +        if (frag) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    }
> >>>> +
> >>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
> >>>> +        const struct tcp_hdr *th;
> >>>> +        struct tcp_hdr th_copy;
> >>>> +
> >>>> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
> >>>> +        if (unlikely(th == NULL))
> >>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
> >>>> +                RTE_PTYPE_L3_MASK);
> >>>> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
> >>>> +    } else {
> >>>> +        hdr_lens->l4_len = 0;
> >>>> +    }
> >>>> +
> >>>> +    return pkt_type;
> >>>> +}
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> index 4a34678..f468520 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> @@ -545,6 +545,49 @@ extern "C" {
> >>>>           RTE_PTYPE_INNER_L3_MASK |                \
> >>>>           RTE_PTYPE_INNER_L4_MASK))
> >>>>   +struct rte_mbuf;
> >>>> +
> >>>> +/**
> >>>> + * Structure containing header lengths associated to a packet.
> >>>> + */
> >>>> +struct rte_mbuf_hdr_lens {
> >>>> +    uint8_t l2_len;
> >>>> +    uint8_t l3_len;
> >>>> +    uint8_t l4_len;
> >>>> +    uint8_t tunnel_len;
> >>>> +    uint8_t inner_l2_len;
> >>>> +    uint8_t inner_l3_len;
> >>>> +    uint8_t inner_l4_len;
> >>>> +};
> >>> [LC] The header parsing graph usually is not unique. The definition
> >>> maybe nice for the basic IP and L4 tunnel.
> >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac,
> >>> mpls
> >>> l2/l3 tunnel.
> >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
> >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
> >>> In this case, the fixed function is to support below.
> >>>
> >>> + * Supported packet types are:
> >>> + *   L2: Ether
> >>> + *   L3: IPv4, IPv6
> >>> + *   L4: TCP, UDP, SCTP
> >>>
> >>> Of course, it can add more packet type detection logic in future.
> >>> But the more support, the higher the cost.
> >>>
> >>> One of the alternative way is to allow registering parser pair. APP
> >>> decides to choose the predefined scheme(by DPDK LIB), or to
> >>> self-define the parsing logic.
> >>> In this way, the scheme can do some assumption for the specific case
> >>> and ignore some useless graph detection.
> >>> In addition, besides the SW parser, the HW parser(identified by
> >>> packet_type in mbuf) can be turn on/off by leveraging the same manner.
> >>
> >> Sorry, I'm not sure I'm fully getting what you are saying. If I
> >> understand well, you would like to have something more flexible that
> >> supports the registration of protocol to be recognized?
> >>
> >> I'm not sure having a function with a dynamic registration method
> >> would really increase the performance compared to a static complete
> function.
> >> Actually, we will never support a tons of protocols since each layer
> >> packet type is 4 bits, and since it requires that at least one hw supports it.
> >
> > This patch will be very useful as a reference implementation, but it also
> highlights an issue with the current implementation of packet types reporting
> by HW and SW - as you just mentioned there are only 4 bits per each layer. As
> these 4 bit are used as a enumeration it is impossible to reports multiple
> headers located on the same layer. MPLS is one example, different packets
> could have different numbers of MPLS labels, but it is impossible to report
> using current packet_type structure.
> >
> > It is possible, however, to  program HW to report user (application) specific
> packet types. For example, for IPoMPLS with one MPLS label, HW will report
> packet type A, but for IPoMPLS with two MPLS labels HW will reports packet
> type B. In this case, instead of defining and supporting tons of statically defined
> (or enumerated) protocol headers combinations, application will register
> packet types it expects from HW in addition to standard packet types. At the
> moment we  have high bits of packet_type reserved, so one possible solution
> would be to use the highest bit to indicate that this is user defined packet_type,
> specific to the application. Then it could be used with HW and with SW parser.
> For example, packet_type 0x8000000A is IPoMPLS with one MPLS label,
> 0x8000000B is IPoMPLS with two MPLS labels and so on.
> 
> Thank you for the explanation. From your description, I wonder if the flow
> director API recently [1] proposed by Adrien wouldn't solve this issue?
> 
> [1] http://dpdk.org/ml/archives/dev/2016-July/043365.html

I'm reviewing Adrien's proposal (it is a big document :)) and reply with my comment after reviewing.

Regards,
Andrey  

> Regards,
> Olivier


More information about the dev mailing list