[dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type from data

Liang, Cunming cunming.liang at intel.com
Fri Jul 8 12:08:05 CEST 2016


Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz at 6wind.com]
> Sent: Thursday, July 07, 2016 11:49 PM
> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> from data
> 
> Hi Cunming,
> 
> Thank you for your feedback.
> 
> On 07/07/2016 10:19 AM, Liang, Cunming wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com]
> >> Sent: Wednesday, July 06, 2016 3:43 PM
> >> To: Liang, Cunming <cunming.liang at intel.com>; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH 05/18] mbuf: add function to get packet type
> >> from data
> >>
> >> Hi Cunming,
> >>
> >> On 07/06/2016 08:44 AM, Liang, Cunming wrote:
> >>> Hi Olivier,
> >>>
> >>> On 7/5/2016 11:41 PM, Olivier Matz wrote:
> >>>> Introduce the function rte_pktmbuf_get_ptype() that parses a
> >>>> mbuf and returns its packet type. For now, the following packet
> >>>> types are parsed:
> >>>>     L2: Ether
> >>>>     L3: IPv4, IPv6
> >>>>     L4: TCP, UDP, SCTP
> >>>>
> >>>> The goal here is to provide a reference implementation for packet type
> >>>> parsing. This function will be used by testpmd in next commits, allowing
> >>>> to compare its result with the value given by the hardware.
> >>>>
> >>>> This function will also be useful when implementing Rx offload support
> >>>> in virtio pmd. Indeed, the virtio protocol gives the csum start and
> >>>> offset, but it does not give the L4 protocol nor it tells if the
> >>>> checksum is relevant for inner or outer. This information has to be
> >>>> known to properly set the ol_flags in mbuf.
> >>>>
> >>>> Signed-off-by: Didier Pallard <didier.pallard at 6wind.com>
> >>>> Signed-off-by: Jean Dao <jean.dao at 6wind.com>
> >>>> Signed-off-by: Olivier Matz <olivier.matz at 6wind.com>
> >>>> ---
> >>>>   doc/guides/rel_notes/release_16_11.rst |   5 +
> >>>>   lib/librte_mbuf/Makefile               |   5 +-
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.c       | 234
> >>>> +++++++++++++++++++++++++++++++++
> >>>>   lib/librte_mbuf/rte_mbuf_ptype.h       |  43 ++++++
> >>>>   lib/librte_mbuf/rte_mbuf_version.map   |   1 +
> >>>>   5 files changed, 286 insertions(+), 2 deletions(-)
> >>>>   create mode 100644 lib/librte_mbuf/rte_mbuf_ptype.c
> >>>>
> >>>> [...]
> >>>> +
> >>>> +/* parse mbuf data to get packet type */
> >>>> +uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
> >>>> +    struct rte_mbuf_hdr_lens *hdr_lens)
> >>>> +{
> >>>> +    struct rte_mbuf_hdr_lens local_hdr_lens;
> >>>> +    const struct ether_hdr *eh;
> >>>> +    struct ether_hdr eh_copy;
> >>>> +    uint32_t pkt_type = RTE_PTYPE_L2_ETHER;
> >>>> +    uint32_t off = 0;
> >>>> +    uint16_t proto;
> >>>> +
> >>>> +    if (hdr_lens == NULL)
> >>>> +        hdr_lens = &local_hdr_lens;
> >>>> +
> >>>> +    eh = rte_pktmbuf_read(m, off, sizeof(*eh), &eh_copy);
> >>>> +    if (unlikely(eh == NULL))
> >>>> +        return 0;
> >>>> +    proto = eh->ether_type;
> >>>> +    off = sizeof(*eh);
> >>>> +    hdr_lens->l2_len = off;
> >>>> +
> >>>> +    if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv4)) {
> >>>> +        const struct ipv4_hdr *ip4h;
> >>>> +        struct ipv4_hdr ip4h_copy;
> >>>> +
> >>>> +        ip4h = rte_pktmbuf_read(m, off, sizeof(*ip4h), &ip4h_copy);
> >>>> +        if (unlikely(ip4h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        pkt_type |= ptype_l3_ip(ip4h->version_ihl);
> >>>> +        hdr_lens->l3_len = ip4_hlen(ip4h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        if (ip4h->fragment_offset &
> >>>> +                rte_cpu_to_be_16(IPV4_HDR_OFFSET_MASK |
> >>>> +                    IPV4_HDR_MF_FLAG)) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        proto = ip4h->next_proto_id;
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    } else if (proto == rte_cpu_to_be_16(ETHER_TYPE_IPv6)) {
> >>>> +        const struct ipv6_hdr *ip6h;
> >>>> +        struct ipv6_hdr ip6h_copy;
> >>>> +        int frag = 0;
> >>>> +
> >>>> +        ip6h = rte_pktmbuf_read(m, off, sizeof(*ip6h), &ip6h_copy);
> >>>> +        if (unlikely(ip6h == NULL))
> >>>> +            return pkt_type;
> >>>> +
> >>>> +        proto = ip6h->proto;
> >>>> +        hdr_lens->l3_len = sizeof(*ip6h);
> >>>> +        off += hdr_lens->l3_len;
> >>>> +        pkt_type |= ptype_l3_ip6(proto);
> >>>> +        if ((pkt_type & RTE_PTYPE_L3_MASK) == RTE_PTYPE_L3_IPV6_EXT) {
> >>>> +            proto = skip_ip6_ext(proto, m, &off, &frag);
> >>>> +            hdr_lens->l3_len = off - hdr_lens->l2_len;
> >>>> +        }
> >>>> +        if (proto == 0)
> >>>> +            return pkt_type;
> >>>> +        if (frag) {
> >>>> +            pkt_type |= RTE_PTYPE_L4_FRAG;
> >>>> +            hdr_lens->l4_len = 0;
> >>>> +            return pkt_type;
> >>>> +        }
> >>>> +        pkt_type |= ptype_l4(proto);
> >>>> +    }
> >>>> +
> >>>> +    if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_UDP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct udp_hdr);
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_TCP) {
> >>>> +        const struct tcp_hdr *th;
> >>>> +        struct tcp_hdr th_copy;
> >>>> +
> >>>> +        th = rte_pktmbuf_read(m, off, sizeof(*th), &th_copy);
> >>>> +        if (unlikely(th == NULL))
> >>>> +            return pkt_type & (RTE_PTYPE_L2_MASK |
> >>>> +                RTE_PTYPE_L3_MASK);
> >>>> +        hdr_lens->l4_len = (th->data_off & 0xf0) >> 2;
> >>>> +    } else if ((pkt_type & RTE_PTYPE_L4_MASK) == RTE_PTYPE_L4_SCTP) {
> >>>> +        hdr_lens->l4_len = sizeof(struct sctp_hdr);
> >>>> +    } else {
> >>>> +        hdr_lens->l4_len = 0;
> >>>> +    }
> >>>> +
> >>>> +    return pkt_type;
> >>>> +}
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> index 4a34678..f468520 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> +++ b/lib/librte_mbuf/rte_mbuf_ptype.h
> >>>> @@ -545,6 +545,49 @@ extern "C" {
> >>>>           RTE_PTYPE_INNER_L3_MASK |                \
> >>>>           RTE_PTYPE_INNER_L4_MASK))
> >>>>   +struct rte_mbuf;
> >>>> +
> >>>> +/**
> >>>> + * Structure containing header lengths associated to a packet.
> >>>> + */
> >>>> +struct rte_mbuf_hdr_lens {
> >>>> +    uint8_t l2_len;
> >>>> +    uint8_t l3_len;
> >>>> +    uint8_t l4_len;
> >>>> +    uint8_t tunnel_len;
> >>>> +    uint8_t inner_l2_len;
> >>>> +    uint8_t inner_l3_len;
> >>>> +    uint8_t inner_l4_len;
> >>>> +};
> >>> [LC] The header parsing graph usually is not unique. The definition
> >>> maybe nice for the basic IP and L4 tunnel.
> >>> However it can't scale out to other cases, e.g. qinq, mac-in-mac, mpls
> >>> l2/l3 tunnel.
> >>> The parsing logic of "rte_pktmbuf_get_ptype()" and the definition of
> >>> "struct rte_mbuf_hdr_lens" consist a pair for one specific parser scheme.
> >>> In this case, the fixed function is to support below.
> >>>
> >>> + * Supported packet types are:
> >>> + *   L2: Ether
> >>> + *   L3: IPv4, IPv6
> >>> + *   L4: TCP, UDP, SCTP
> >>>
> >>> Of course, it can add more packet type detection logic in future. But
> >>> the more support, the higher the cost.
> >>>
> >>> One of the alternative way is to allow registering parser pair. APP
> >>> decides to choose the predefined scheme(by DPDK LIB), or to self-define
> >>> the parsing logic.
> >>> In this way, the scheme can do some assumption for the specific case and
> >>> ignore some useless graph detection.
> >>> In addition, besides the SW parser, the HW parser(identified by
> >>> packet_type in mbuf) can be turn on/off by leveraging the same manner.
> >>
> >>
> >> Sorry, I'm not sure I'm fully getting what you are saying. If I
> >> understand well, you would like to have something more flexible that
> >> supports the registration of protocol to be recognized?
> > [LC] Not on that granularity, but on the entire parsing routine.
> > rte_pktmbuf_get_ptype() as the common API, and can present in different
> behavior.
> > Usually in different scenario, the interested packet set is different.
> > For the specific case, can do some speculation pre-checking on the optimization
> perspective.
> >
> >>
> >> I'm not sure having a function with a dynamic registration method would
> >> really increase the performance compared to a static complete function.
> > [LC] No, it won't. But the overhead is not much, refer to rx_pkt_burst(is a
> callback either).
> > If someone only interest for IPv4-NoFrag-TCP stream, the easiest way maybe
> not layer by layer detection.
> > The straight forward way maybe, 1) load n bytes 2) compare mask 3) update
> ptype.
> > We require a normal way to do SW detection, current version is perfect.
> > My point is, we can provide a simple mechanism to allow other way, and under
> the same unified API.
> 
> Again, sorry, I'm not perfectly sure I understand what you are saying.
> 
> What you describe (mask packet data, then compare with a value) seems
> very similar to what ovs does. Do you mean we should have an API for that?
[LC] No. Sorry to make you confused.
If there's one function can well detect all kinds of packet in low cost, it's perfect.
But from case to case, the packet detection interest is difficult(IPDC, wireless, metro Ethernet and etc).
Considering the possible tradeoff of performance and completeness, to
allow dedicated parser tuned for special purpose is an alternative way.

> 
> I think once we have masked+compared the data, we may know much more
> than just a packet_type.
[LC] Detection packet layer by layer is the normal way. In some case, it doesn't have to.
For example, we assume there's one network using VXLAN-GPE..
To detect the packet layer by layer, need to walk through two step, UDP Port and VXLAN NP.
In fact, UDP+VXLAN(16B) as a whole to compare mask once, you can know it's a VXLAN w/ inner Ethernet or not.

Probably it's not a perfect cases. SW Parser is not a low cost stuff, from cases to cases, if there are some special, it has potential space to optimize. One possible pseudo code as below.

struct rte_ptype_parser {
	char name[128];
	uint32_t (*get_ptype)(const struct rte_mbuf *m, void *hdr_lens);
};

struct rte_ptype_parser def_parser = 
{
	.name = "ipdc"; 
	.get_ptype = ipdc_get_ptype;
};

uint32_t rte_pktmbuf_get_ptype(const struct rte_mbuf *m,
		void *hdr_lens)
{
	struct rte_ptype_parser parser = def_parser;
	
	[...]
	parser->get_ptype(m, hdr_lens);
	[...]
}

/* scheme for ipdc */
struct ipdc_hdr_lens {
	uint8_t l2_len;
	uint8_t l3_len;
	uint8_t l4_len;
	uint8_t tunnel_len;
	uint8_t inner_l2_len;
	uint8_t inner_l3_len;
	uint8_t inner_l4_len;
};
uint32_t ipdc_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
	struct ipdc_hdr_lens ihl = (struct ipdc_hdr_lens*)hdr_lens;

	/* parser logic optimized for typical IP datacenter packet */
	[...]
}

/* scheme for l2mpls */
struct l2mpls_hdr_lens {
	uint8_t l2_len;
	uint8_t mpls_len;            /* total length for multi-layer */
	uint8_t inner_l2_len;
	uint8_t inner_l3_len;
};
uint32_t l2mpls_get_ptype(const struct rte_mbuf *m, void *hdr_lens)
{
	struct l2mpls_hdr_lens ihl = (struct l2mpls_hdr_lens*)hdr_lens;

	/* parser logic optimized for typical L2MPLS */
	[...]
}

> 
> 
> 
> >
> >> Actually, we will never support a tons of protocols since each layer
> >> packet type is 4 bits, and since it requires that at least one hw
> >> supports it.
> > [LC] Agree, it is today. But maybe dynamic in future, packet type definition as a
> template.
> >>
> >> As described in the cover letter, the 2 main goals of this patchset are
> >> to provide a reference implementation for packet type recognition, and
> >> enable the support of virtio offloads (I'll send the patchset soon).
> >> This function is adapted to these 2 usages. Are you thinking of another
> >> use-case that would not be covered?
> > [LC] That's excellent work.  Furthermore I believe it can cover all ethdev actually.
> > When HW can't report some demand packet type, then fallback to your SW
> parser version.
> > If the auto-switch can be transparent, that's perfect. Maybe rx callback and
> update ptype in mbuf?
> 
> I was also thinking about calling rte_pktmbuf_get_ptype() from a driver.
> I think drivers should not access to mbuf data if it's not absolutely
> required.
> Calling rte_pktmbuf_get_ptype() from inside a rx callback seems easily
> feasible, it may be useful for applications that mostly relies on
> packet_type to select an action.
> 
> 
> Regards,
> Olivier


More information about the dev mailing list