[dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data

Olivier Matz olivier.matz at 6wind.com
Thu Jul 7 17:48:41 CEST 2016


Hi Cunming,

Thank you for your feedback.

On 07/07/2016 10:19 AM, Liang, Cunming wrote:
> Hi Olivier,
> 
>> -----Original Message-----
>> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
>> Sent: Wednesday, July 06, 2016 3:43 PM
>> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
>> from data
>>
>> Hi Cunming,
>>
>> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
>>> Hi Olivier,
>>>
>>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
>>>> Introduce the function rte_pktmbuf_get_ptype() that parses a
>>>> mbuf and returns its packet type. For now, the following packet
>>>> types are parsed:
>>>>     L2: Ether
>>>>     L3: IPv4, IPv6
>>>>     L4: TCP, UDP, SCTP
>>>>
>>>> The goal here is to provide a reference implementation for packet type
>>>> parsing. This function will be used by testpmd in next commits, allowing
>>>> to compare its result with the value given by the hardware.
>>>>
>>>> This function will also be useful when implementing Rx offload support
>>>> in virtio pmd. Indeed, the virtio protocol gives the csum start and
>>>> offset, but it does not give the L4 protocol nor it tells if the
>>>> checksum is relevant for inner or outer. This information has to be
>>>> known to properly set the ol_flags in mbuf.
>>>>
>>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
>>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com>
>>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
>>>> ---
>>>>   doc/guides/rel_notes/release_16_11.rst |   5 +
>>>>   lib/librte_mbuf/Makefile               |   5 +-
>>>>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234
>>>> +++++++++++++++++++++++++++++++++
>>>>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
>>>>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
>>>>   5 files changed, 286 insertions(+), 2 deletions(-)
>>>>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
>>>>
>>>> [...]
>>>> +
>>>> +/* parse mbuf data to get packet type */
>>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
>>>> +    struct rte_mbuf_hdr_lens *hdr_lens)
>>>> +{
>>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
>>>> +    const struct ether_hdr *eh;
>>>> +    struct ether_hdr eh_copy;
>>>> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
>>>> +    uint32_t off = 0;
>>>> +    uint16_t proto;
>>>> +
>>>> +    if (hdr_lens == NULL)
>>>> +        hdr_lens = &local_hdr_lens;
>>>> +
>>>> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
>>>> +    if (unlikely(eh == NULL))
>>>> +        return 0;
>>>> +    proto = eh->ether_type;
>>>> +    off = sizeof(*eh);
>>>> +    hdr_lens->l2_len = off;
>>>> +
>>>> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
>>>> +        const struct ipv4_hdr *ip4h;
>>>> +        struct ipv4_hdr ip4h_copy;
>>>> +
>>>> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
>>>> +        if (unlikely(ip4h == NULL))
>>>> +            return pkt_type;
>>>> +
>>>> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
>>>> +        hdr_lens->l3_len = ip4_hlen(ip4h);
>>>> +        off += hdr_lens->l3_len;
>>>> +        if (ip4h->fragment_offset &
>>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
>>>> +                    IPV4_HDR_MF_FLAG)) {
>>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +            hdr_lens->l4_len = 0;
>>>> +            return pkt_type;
>>>> +        }
>>>> +        proto = ip4h->next_proto_id;
>>>> +        pkt_type |= ptype_l4(proto);
>>>> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
>>>> +        const struct ipv6_hdr *ip6h;
>>>> +        struct ipv6_hdr ip6h_copy;
>>>> +        int frag = 0;
>>>> +
>>>> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
>>>> +        if (unlikely(ip6h == NULL))
>>>> +            return pkt_type;
>>>> +
>>>> +        proto = ip6h->proto;
>>>> +        hdr_lens->l3_len = sizeof(*ip6h);
>>>> +        off += hdr_lens->l3_len;
>>>> +        pkt_type |= ptype_l3_ip6(proto);
>>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
>>>> +            proto = skip_ip6_ext(proto, m, &off, &frag);
>>>> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
>>>> +        }
>>>> +        if (proto == 0)
>>>> +            return pkt_type;
>>>> +        if (frag) {
>>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
>>>> +            hdr_lens->l4_len = 0;
>>>> +            return pkt_type;
>>>> +        }
>>>> +        pkt_type |= ptype_l4(proto);
>>>> +    }
>>>> +
>>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
>>>> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
>>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
>>>> +        const struct tcp_hdr *th;
>>>> +        struct tcp_hdr th_copy;
>>>> +
>>>> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
>>>> +        if (unlikely(th == NULL))
>>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
>>>> +                RTE_PTYPE_L3_MASK);
>>>> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
>>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
>>>> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
>>>> +    } else {
>>>> +        hdr_lens->l4_len = 0;
>>>> +    }
>>>> +
>>>> +    return pkt_type;
>>>> +}
>>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> index 4a34678..f468520 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
>>>> @@ -545,6 +545,49 @@ extern "C" {
>>>>           RTE_PTYPE_INNER_L3_MASK |                \
>>>>           RTE_PTYPE_INNER_L4_MASK))
>>>>   +struct rte_mbuf;
>>>> +
>>>> +/**
>>>> + * Structure containing header lengths associated to a packet.
>>>> + */
>>>> +struct rte_mbuf_hdr_lens {
>>>> +    uint8_t l2_len;
>>>> +    uint8_t l3_len;
>>>> +    uint8_t l4_len;
>>>> +    uint8_t tunnel_len;
>>>> +    uint8_t inner_l2_len;
>>>> +    uint8_t inner_l3_len;
>>>> +    uint8_t inner_l4_len;
>>>> +};
>>> [LC] The header parsing graph usually is not unique. The definition
>>> maybe nice for the basic IP and L4 tunnel.
>>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls
>>> l2/l3 tunnel.
>>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
>>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
>>> In this case, the fixed function is to support below.
>>>
>>> + * Supported packet types are:
>>> + *   L2: Ether
>>> + *   L3: IPv4, IPv6
>>> + *   L4: TCP, UDP, SCTP
>>>
>>> Of course, it can add more packet type detection logic in future. But
>>> the more support, the higher the cost.
>>>
>>> One of the alternative way is to allow registering parser pair. APP
>>> decides to choose the predefined scheme(by DPDK LIB), or to self-define
>>> the parsing logic.
>>> In this way, the scheme can do some assumption for the specific case and
>>> ignore some useless graph detection.
>>> In addition, besides the SW parser, the HW parser(identified by
>>> packet_type in mbuf) can be turn on/off by leveraging the same manner.
>>
>>
>> Sorry, I'm not sure I'm fully getting what you are saying. If I
>> understand well, you would like to have something more flexible that
>> supports the registration of protocol to be recognized?
> [LC] Not on that granularity, but on the entire parsing routine.
> rte_pktmbuf_get_ptype() as the common API, and can present in different behavior.
> Usually in different scenario, the interested packet set is different.
> For the specific case, can do some speculation pre-checking on the optimization perspective.
> 
>>
>> I'm not sure having a function with a dynamic registration method would
>> really increase the performance compared to a static complete function.
> [LC] No, it won't. But the overhead is not much, refer to rx_pkt_burst(is a callback either).
> If someone only interest for IPv4-NoFrag-TCP stream, the easiest way maybe not layer by layer detection.
> The straight forward way maybe, 1) load n bytes 2) compare mask 3) update ptype.
> We require a normal way to do SW detection, current version is perfect.
> My point is, we can provide a simple mechanism to allow other way, and under the same unified API.

Again, sorry, I'm not perfectly sure I understand what you are saying.

What you describe (mask packet data, then compare with a value) seems
very similar to what ovs does. Do you mean we should have an API for that?

I think once we have masked+compared the data, we may know much more
than just a packet_type.



> 
>> Actually, we will never support a tons of protocols since each layer
>> packet type is 4 bits, and since it requires that at least one hw
>> supports it.
> [LC] Agree, it is today. But maybe dynamic in future, packet type definition as a template.
>>
>> As described in the cover letter, the 2 main goals of this patchset are
>> to provide a reference implementation for packet type recognition, and
>> enable the support of virtio offloads (I'll send the patchset soon).
>> This function is adapted to these 2 usages. Are you thinking of another
>> use-case that would not be covered?
> [LC] That's excellent work.  Furthermore I believe it can cover all ethdev actually.
> When HW can't report some demand packet type, then fallback to your SW parser version.
> If the auto-switch can be transparent, that's perfect. Maybe rx callback and update ptype in mbuf?

I was also thinking about calling rte_pktmbuf_get_ptype() from a driver.
I think drivers should not access to mbuf data if it's not absolutely
required.
Calling rte_pktmbuf_get_ptype() from inside a rx callback seems easily
feasible, it may be useful for applications that mostly relies on
packet_type to select an action.


Regards,
Olivier


More information about the dev mailing list