[dpdk-dev] [PATCH v3 0/3] enhance TX checksum command and csum forwarding engine
Olivier MATZ
olivier.matz at 6wind.com
Tue Jan 20 13:39:12 CET 2015
Hi,
On 01/20/2015 02:12 AM, Ananyev, Konstantin wrote:
>>>> I think a good definition would
>>>> be:
>>>>
>>>> Packet is IPv4. This flag must be set when using any offload
>>>> feature (TSO, L3 or L4 checksum) to tell the NIC that the packet
>>>> is an IPv4 packet.
>>>>
>>>> That's why I added PKT_TX_IPV4 in the examples.
>>>
>>> I suppose we discussed it several times: both ways are possible.
>>> From PMD perspective - treating PKT_TX_IPV4 and PKT_TX_IP_CSUM
>>> As mutually exclusive seems a bit more plausible.
>>> From the upper layer - my understanding, that it is doesn't really matter.
>>> I thought we had an agreement about it in 1.8, no?
>>
>> Indeed, this was already discussed, but there was a lot of pressure
>> for 1.8.0 to push something, even not perfect. The fog around comments
>> shows that the API was not very clearly defined for 1.8.0. If you read
>> the comments of the API, it is impossible to understand when the
>> PKT_TX_IPV4 or PKT_TX_IP_CSUM flags must be set. I would even say
>> more: the only place where the comments bring a valuable information
>> (L4 checksum and TSO) describe the case where PKT_TX_IPV4 and
>> PKT_TX_IP_CSUM are not exclusive...
>>
>> So I will fix that in my coming patch series. Just for information,
>> I'm pretty sure that having PKT_TX_IPV4 and PKT_TX_IP_CSUM as not
>> exclusive flag would not require any change anywhere in the PMDs (even
>> in i40e).
>
> Right now - no.
> Though as I said from PMD perspective having them exclusive is a bit preferable.
> Again, I don't see any big difference from upper layer code.
Sure, it does not make a big difference in terms of code. But
in terms of API, the naming of the flag is coherent to what it is
used for. And it's easier to find a simple definition, like:
* Packet is IPv4. This flag must be set when using any offload feature
* (TSO, L3 or L4 checksum) to tell the NIC that the packet is an IPv4
* packet.
>> On the contrary, making them exclusive would require to
>> change the ixgbe TSO code because we check.
>
> Hmm, so you are saying there is a bug somewhere in ixbe_rxtx.c?
> What particular place you are talking about?
Sorry, I spoke too fast. In TSO code, we check PKT_TX_IP_CKSUM (and not
PKT_TX_IPV4 as I thought), so it would work for both methods without
patching the code.
In this case, it means that both approach would not require to
modify the code.
>>>> *Problem 3*: without using the word "fortville", it is difficult
>>>> to understand the goal of the flag PKT_TX_UDP_TUNNEL_PKT. Indeed,
>>>> once PKT_TX_OUTER_IPV4/6 is set, it looks obvious that it's a
>>>> tunnel packet. I suggest to remove the PKT_TX_UDP_TUNNEL_PKT
>>>> flag. In linux, the driver doesn't care about the tunnel type,
>>>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations [6].
>>>
>>> It might be obvious that it is a tunnel packet from PKT_TX_OUTER_* is set,
>>> but it is not obvious what type of tunnelling it would be.
>>> FVL HW supports HW TX offloads for different type of tunnelling and
>>> requires that SW provide information about tunnelling type.
>>> From i40e datasheet:
>>> L4TUNT L4 Tunneling Type (Teredo / GRE header / VXLAN header) indication:
>>> 00b - No UDP / GRE tunneling (field must be set to zero if EIPT equals to zero)
>>> 01b - UDP tunneling header (any UDP tunneling, VXLAN and Geneve).
>>> 10b - GRE tunneling header
>>> As we do plan to support other than UDP tunnelling types, I suppose we'll need to keep
>>> PKT_TX_UDP_TUNNEL_PKT flag.
>>
>> As I've said: in linux, the driver doesn't care about the tunnel type,
>> it always set I40E_TXD_CTX_UDP_TUNNELING for all encapsulations.
>
> Ok, and why it should be our problem?
> We have a lot of things done in a different manner then linux/freebsd kernel drivers,
> Why now it became a problem?
If linux doesn't need an equivalent flag for doing the same thing,
it probably means we don't need it either.
In a performance-oriented software like dpdk, having a flag that we
don't know what the hardware does with, that is not needed in other
drivers of the same harware, that makes the API harder to understand
could be a problem.
Another argument: if we can remove this flag, it would make the
testpmd commands reworkd proposed by Jijiang much more easy to
understand: only a new "csum parse-tunnel on|off" would be required,
and it can be explained in a few words.
I'll try to do some tests on a fortville NIC if I can find one. I'm
curious to see if we can transmit any encapsulation packet (ip in ip,
ip in gre, eth in gre, eth in vxlan, or even a proprietary tunnel).
We should avoid the need to specify the tunnel type in the OUTER
checksum API if we can, else it would limit us to specific
supported protocols.
>>>> I think the following cases should be *forbidden by the API*:
>>>>
>>>> case 9) calculate checksum of in_ip and in_tcp (was case B.1 in [1])
>>>>
>>>> mb->outer_l2_len = len(out_eth)
>>>> mb->outer_l3_len = len(out_ip)
>>>> mb->l2_len = len(out_udp + vxlan + in_eth)
>>>> mb->l3_len = len(out_ip)
>>>> mb->ol_flags |= PKT_TX_IPV4 | PKT_TX_UDP_TUNNEL_PKT | \
>>>> PKT_TX_IP_CSUM | PKT_TX_UDP_CKSUM;
>>>> set out_ip checksum to 0 in the packet
>>>> set out_udp checksum to pseudo header using rte_ipv4_phdr_cksum()
>>>>
>>>> If we remove the flag PKT_TX_UDP_TUNNEL_PKT, this cannot be
>>>> supported, but there is no reason to support it as there is
>>>> already one way to do the same.
>>>>
>>>> I think the driver should not even look at mb->outer_l2_len
>>>> and mb->outer_l3_len if no flag PKT_TX_OUTER_* is set.
>>>
>>> Why it should be forbidden?
>>> I admit it might be a bit slower than case 4),
>>> but I think absolutely legal way to setup HW offloads for inner L3/L4.
>>> As I said we need a PKT_TX_UDP_TUNNEL_PKT anyway, so I suppose
>>> PKT_TX_*_TUNNEL_PKT should be an indication is it a tunnel packet or not.
>>> PKT_TX_OUTER_* flags indicate does outer cksum offload is required or not.
>>
>> I don't understand. The result in terms of hardware is exactly the
>> same than case 4). Why should we have 2 different ways for doing the
>> same thing?
>
> If HW supports that capability, why should we forbid it?
> Let user to choose himself what way to use.
> FVL spec lists it as a valid approach.
It is not a hardware feature. Case 4) and case 9) would fill the
hardware registers exactly the same. To me, it's just an API question.
> As one of possible use-cases: HW VLAN tags insertion for both inner and outer packets.
> FVL can do that, though as I know our PMD doesn't implement it yet.
> For that, we'll need to specify at least:
> outer_l2_len, outer_l3_len, l2_len.
> While PKT_TX_OUTER_* might stay cleared.
If a VLAN flag has to be inserted in outer header, a new flag
PKT_TX_OUTER_INSERT_VLAN would be added. So my specification
would still be correct:
The driver should look at mb->outer_lX_len only if a
PKT_TX_OUTER_* flag is present.
>> This is really confusing for an API. Moreover, you said
>> it: it is slower that case 4).
>
> I don't know would be slower then 4) or not for sure.
> That's my guess, based on the fact that for 9) we need to fill 2 descriptors, while fro 4) - only 1.
> Though I didn't measure the difference.
> That's actually one more reason why to allow and support it -
> so people can make sure that on FVL both ways work as expected and measure the difference.
>
> Konstantin
Regards,
Olivier
More information about the dev
mailing list