[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine

Olivier MATZ olivier.matz at 6wind.com
Tue Jan 20 13:39:12 CET 2015


Hi,

On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote:
>>>> I think a good definition would
>>>>     be:
>>>>
>>>>     Packet is IPv4. This flag must be set when using any offload
>>>>     feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
>>>>     is an IPv4 packet.
>>>>
>>>>     That's why I added PKT_TX_IPV4 in the examples.
>>>
>>> I suppose we discussed it several times: both ways are possible.
>>>  From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
>>> As mutually exclusive seems a bit more plausible.
>>>  From the upper layer - my understanding, that it is doesn't really matter.
>>> I thought we had an agreement about it in 1.8, no?
>>
>> Indeed, this was already discussed, but there was a lot of pressure
>> for 1.8.0 to push something, even not perfect. The fog around comments
>> shows that the API was not very clearly defined for 1.8.0. If you read
>> the comments of the API, it is impossible to understand when the
>> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say
>> more: the only place where the comments bring a valuable information
>> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and
>> PKT_TX_IP_CSUM are not exclusive...
>>
>> So I will fix that in my coming patch series. Just for information,
>> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
>> exclusive flag would not require any change anywhere in the PMDs (even
>> in i40e).
>
> Right now - no.
> Though as I said from PMD perspective having them exclusive is a bit preferable.
> Again, I don't see any big difference from upper layer code.

Sure, it does not make a big difference in terms of code. But
in terms of API, the naming of the flag is coherent to what it is
used for. And it's easier to find a simple definition, like:

  * Packet is IPv4. This flag must be set when using any offload feature
  * (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
  * packet.

>> On the contrary, making them exclusive would require to
>> change the ixgbe TSO code because we check.
>
> Hmm, so you are saying there is a bug somewhere  in ixbe_rxtx.c?
> What particular place you are talking about?

Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not
PKT_TX_IPV4 as I thought), so it would work for both methods without
patching the code.

In this case, it means that both approach would not require to
modify the code.

>>>>     *Problem 3*: without using the word "fortville", it is difficult
>>>>     to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
>>>>     once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
>>>>     tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
>>>>     flag. In linux, the driver doesn't care about the tunnel type,
>>>>     it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
>>>
>>> It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set,
>>> but it is not obvious what type of tunnelling it would be.
>>> FVL HW supports HW TX offloads for different type of tunnelling and
>>> requires that SW provide information about tunnelling type.
>>>  From i40e datasheet:
>>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
>>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
>>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
>>> 10b - GRE tunneling header
>>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
>>> PKT_TX_UDP_TUNNEL_PKT flag.
>>
>> As I've said: in linux, the driver doesn't care about the tunnel type,
>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
>
> Ok, and why it should be our problem?
> We have a lot of things done in a different manner then linux/freebsd kernel drivers,
> Why now it became a problem?

If linux doesn't need an equivalent flag for doing the same thing,
it probably means we don't need it either.

In a performance-oriented software like dpdk, having a flag that we
don't know what the hardware does with, that is not needed in other
drivers of the same harware, that makes the API harder to understand
could be a problem.

Another argument: if we can remove this flag, it would make the
testpmd commands reworkd proposed by Jijiang much more easy to
understand: only a new "csum parse-tunnel on|off" would be required,
and it can be explained in a few words.

I'll try to do some tests on a fortville NIC if I can find one. I'm
curious to see if we can transmit any encapsulation packet (ip in ip,
ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel).

We should avoid the need to specify the tunnel type in the OUTER
checksum API if we can, else it would limit us to specific
supported protocols.

>>>> I think the following cases should be *forbidden by the API*:
>>>>
>>>> case 9) calculate checksum of in_ip and in_tcp  (was case B.1 in [1])
>>>>
>>>>     mb->outer_l2_len = len(out_eth)
>>>>     mb->outer_l3_len = len(out_ip)
>>>>     mb->l2_len = len(out_udp + vxlan + in_eth)
>>>>     mb->l3_len = len(out_ip)
>>>>     mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>>>>       PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
>>>>     set out_ip checksum to 0 in the packet
>>>>     set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>>>
>>>>     If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
>>>>     supported, but there is no reason to support it as there is
>>>>     already one way to do the same.
>>>>
>>>>     I think the driver should not even look at mb->outer_l2_len
>>>>     and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
>>>
>>> Why it should be forbidden?
>>> I admit it might be a bit slower than case 4),
>>> but I think absolutely legal way to setup HW offloads for inner L3/L4.
>>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
>>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
>>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
>>
>> I don't understand. The result in terms of hardware is exactly the
>> same than case 4). Why should we have 2 different ways for doing the
>> same thing?
>
> If HW supports that capability, why should we forbid it?
> Let user to choose himself what way to use.
> FVL spec lists it as a valid approach.

It is not a hardware feature. Case 4) and case 9) would fill the
hardware registers exactly the same. To me, it's just an API question.

> As one of possible use-cases:  HW VLAN tags insertion for both inner and outer packets.
> FVL can do that, though as I know our PMD doesn't implement it yet.
> For that, we'll need to specify at least:
> outer_l2_len, outer_l3_len, l2_len.
> While PKT_TX_OUTER_* might stay cleared.

If a VLAN flag has to be inserted in outer header, a new flag
PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
would still be correct:

   The driver should look at mb->outer_lX_len only if a
   PKT_TX_OUTER_* flag is present.

>> This is really confusing for an API. Moreover, you said
>> it: it is slower that case 4).
>
> I don't know would be slower then 4) or not for sure.
> That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1.
> Though I didn't measure the difference.
> That's actually one more reason why to allow and support it -
> so people can make sure that on FVL both ways work as expected and measure the difference.
>
> Konstantin


Regards,
Olivier



More information about the dev mailing list