[dpdk-dev] [PATCH 01/17] mbuf: add definitions of unified packet types
Olivier MATZ
olivier.matz at 6wind.com
Mon Feb 2 12:18:16 CET 2015
Hi Helin,
On 02/02/2015 02:43 AM, Zhang, Helin wrote:
>>> +/*
>>> + * Sixteen bits are divided into several fields to mark packet types.
>>> +Note that
>>> + * each field is indexical.
>>> + * - Bit 3:0 is for tunnel types.
>>> + * - Bit 7:4 is for L3 or outer L3 (for tunneling case) types.
>>> + * - Bit 10:8 is for L4 types. It can also be used for inner L4 types for
>>> + * tunneling packets.
>>> + * - Bit 13:11 is for inner L3 types.
>>> + * - Bit 15:14 is reserved.
>>
>> Is there a reason why using this specific order?
> Yes, to support ixgbe Vector PMD, outer L3 types and L4 types need to be contiguous
> and in this order.
When you say "need to be", do you mean it's impossible to do in another
manner or just that it would be slower?
>> Also, there are 4 bits for outer L3 types and 3 bits for inner L3 types, but both of
>> them have 6 different supported types. Is it intentional?
> Yes, it is to support ixgbe Vector PMD. Contiguous 7 bits are needed, though 1 bit wasted.
To be honnest, I'm always a surprised that in dpdk we prefer having
a strange API just because it's faster or easier to do on one
specific driver (usually i40e or ixgbe). Unfortunately, trying to
optimize the API for one driver may result in making the rest of the
code (application and other drivers) slower and more complex.
In your proposition, there is no inner l4_type. I consider it's as
useful as the other fields. From what I see, there are only 2 bits
left. What do you think about changing the packet type to 64 bits now?
>From an API point of view, I think it would be good to have the
same structure for inner and outer types. For instance (this is
just an example):
union layer_pkt_type {
struct {
uint16_t l2_type:4;
uint16_t l3_type:4;
uint16_t l4_type:4;
uint16_t tun_type:4;
};
uint16_t u16;
};
struct pkt_type {
union layer_pkt_type outer;
union layer_pkt_type inner;
};
When your application decapsulates tunnels, you can just do
outer = inner and enter into the same code.
>>> + * RTE_PTYPE_L3_IPV6, RTE_PTYPE_L3_IPV6_EXT, RTE_PTYPE_L4_TCP,
>>> +RTE_PTYPE_L4_UDP
>>> + * and RTE_PTYPE_L4_SCTP should be kept as below in a contiguous 7 bits.
>>> + *
>>> + * Note that L3 types values are selected for checking IPV4/IPV6
>>> +header from
>>> + * performance point of view. Reading annotations of
>>> +RTE_ETH_IS_IPV4_HDR and
>>> + * RTE_ETH_IS_IPV6_HDR is needed for any future changes of L3 type
>> values.
>>> + */
>>> +#define RTE_PTYPE_UNKNOWN 0x0000 /*
>> 0b0000000000000000 */
>>> +/* bit 3:0 for tunnel types */
>>> +#define RTE_PTYPE_TUNNEL_IP 0x0001 /*
>> 0b0000000000000001 */
>>> +#define RTE_PTYPE_TUNNEL_TCP 0x0002 /*
>> 0b0000000000000010 */
>>> +#define RTE_PTYPE_TUNNEL_UDP 0x0003 /*
>> 0b0000000000000011 */
>>> +#define RTE_PTYPE_TUNNEL_GRE 0x0004 /*
>> 0b0000000000000100 */
>>> +#define RTE_PTYPE_TUNNEL_VXLAN 0x0005 /*
>> 0b0000000000000101 */
>>> +#define RTE_PTYPE_TUNNEL_NVGRE 0x0006 /*
>> 0b0000000000000110 */
>>> +#define RTE_PTYPE_TUNNEL_GENEVE 0x0007 /*
>> 0b0000000000000111 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT 0x0008 /*
>> 0b0000000000001000 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MAC 0x0009 /*
>> 0b0000000000001001 */
>>> +#define RTE_PTYPE_TUNNEL_GRENAT_MACVLAN 0x000a /*
>> 0b0000000000001010 */
>>> +#define RTE_PTYPE_TUNNEL_MASK 0x000f /*
>> 0b0000000000001111 */
>>> +/* bit 7:4 for L3 types */
>>> +#define RTE_PTYPE_L3_IPV4 0x0010 /*
>> 0b0000000000010000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT 0x0030 /*
>> 0b0000000000110000 */
>>> +#define RTE_PTYPE_L3_IPV6 0x0040 /*
>> 0b0000000001000000 */
>>> +#define RTE_PTYPE_L3_IPV4_EXT_UNKNOWN 0x0090 /*
>> 0b0000000010010000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT 0x00c0 /*
>> 0b0000000011000000 */
>>> +#define RTE_PTYPE_L3_IPV6_EXT_UNKNOWN 0x00e0 /*
>> 0b0000000011100000 */
>>> +#define RTE_PTYPE_L3_MASK 0x00f0 /*
>> 0b0000000011110000 */
>>
>> can we expect that when RTE_PTYPE_L3_IPV4, RTE_PTYPE_L3_IPV4_EXT or
>> RTE_PTYPE_L3_IPV4_EXT_UNKNOWN is set, the hardware also verified the
>> L3 checksum?
> RTE_PTYPE_L3_IPV4 means there is NONE-EXT. Each time only one of above 3 can be set.
> These bits don't indicate any checksum, checksum should be indicated by other flags.
> They are just for packet types hardware can recognized.
I think these 2 information are linked:
- if the hardware cannot recognize packet, it cannot calculate the
checksum because it does not know the packet type
- if the hardware can recognize the packet, it can verify that the
checksum is good or wrong
Today, we have:
- PKT_RX_IPV4_HDR and PKT_RX_IPV4_HDR_EXT to tell if the packet is
seen as IPv4 by the hw.
- We can suppose that:
- PKT_RX_IPV4_HDR(_EXT)=0 -> no hw checksum information
- PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=0 -> checksum
is correct
- PKT_RX_IPV4_HDR(_EXT)=1 and PKT_RX_IP_CKSUM_BAD=1 -> checksum
is not correct
- We cannot do the same with L4 because we have no L4 type info,
but it would be good to be able to do the same.
With your patch, you are removing the PKT_RX_IPV4_HDR and
PKT_RX_IPV4_HDR_EXT flags, but I think the above assumption
about checksum should be kept. As you are adding a L4 type
info, the same method could be applied to L4 checksums.
I think this would definitely solve the problem described by Stephen.
>> My understanding is:
>>
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 0
>> -> checksum was checked by hw and is good
>> - if packet_type is IPv4* and PKT_RX_IP_CKSUM_BAD is 1
>> -> checksum was checked by hw and is bad
>> - if packet_type is not IPv4*
>> -> checksum was not checked by hw
>>
>> I think it would solve the problem asked by Stephen
>> http://dpdk.org/ml/archives/dev/2015-January/011550.html
>>
>>> +/* bit 10:8 for L4 types */
>>> +#define RTE_PTYPE_L4_TCP 0x0100 /*
>> 0b0000000100000000 */
>>> +#define RTE_PTYPE_L4_UDP 0x0200 /*
>> 0b0000001000000000 */
>>> +#define RTE_PTYPE_L4_FRAG 0x0300 /*
>> 0b0000001100000000 */
>>> +#define RTE_PTYPE_L4_SCTP 0x0400 /*
>> 0b0000010000000000 */
>>> +#define RTE_PTYPE_L4_ICMP 0x0500 /*
>> 0b0000010100000000 */
>>> +#define RTE_PTYPE_L4_NONFRAG 0x0600 /*
>> 0b0000011000000000 */
>>> +#define RTE_PTYPE_L4_MASK 0x0700 /*
>> 0b0000011100000000 */
>>
>> Same question for L4.
>>
>> Note: it would means that if a hardware is able to recognize a TCP packet but
>> not to verify the checksum, it has to set RTE_PTYPE_L4 to unknown.
>>
>>> +/* bit 13:11 for inner L3 types */
>>> +#define RTE_PTYPE_INNER_L3_IPV4 0x0800 /*
>> 0b0000100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT 0x1000 /*
>> 0b0001000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6 0x1800 /*
>> 0b0001100000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV6_EXT 0x2000 /*
>> 0b0010000000000000 */
>>> +#define RTE_PTYPE_INNER_L3_IPV4_EXT_UNKNOWN 0x2800 /*
>>> +0b0010100000000000 */ #define
>> RTE_PTYPE_INNER_L3_IPV6_EXT_UNKNOWN 0x3000 /* 0b0011000000000000
>> */
> We cannot define the hardware behaviors, it just reports the hardware recognized
> packet information directly to the mbuf.
> Based on my experiment on i40e hardware, if a IPV4 packet with wrong checksum,
> by default, the PMD driver cannot see the packet at all. So we don't need to care
> about it too much!
I agree that the hardware reports some info that can be different
depending on the hw. But the role of the driver is to convert these info
into a common API with a well-defined behavior.
Regards,
Olivier
More information about the dev
mailing list