[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

Vlad Zolotarov vladz at cloudius-systems.com
Sun Sep 13 14:24:59 CEST 2015



On 09/13/15 14:47, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Avi Kivity
>> Sent: Friday, September 11, 2015 6:48 PM
>> To: Thomas Monjalon; Vladislav Zolotarov; didier.pallard
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598
>>
>> On 09/11/2015 07:08 PM, Thomas Monjalon wrote:
>>> 2015-09-11 18:43, Avi Kivity:
>>>> On 09/11/2015 06:12 PM, Vladislav Zolotarov wrote:
>>>>> On Sep 11, 2015 5:55 PM, "Thomas Monjalon" <thomas.monjalon at 6wind.com
>>>>> <mailto:thomas.monjalon at 6wind.com>> wrote:
>>>>>> 2015-09-11 17:47, Avi Kivity:
>>>>>>> On 09/11/2015 05:25 PM, didier.pallard wrote:
>>>>>>>> Hi vlad,
>>>>>>>>
>>>>>>>> Documentation states that a packet (or multiple packets in transmit
>>>>>>>> segmentation) can span any number of
>>>>>>>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH
>>>>>>>> minus 2.
>>>>>>>>
>>>>>>>> Shouldn't there be a test in transmit function that drops
>>>>> properly the
>>>>>>>> mbufs with a too large number of
>>>>>>>> segments, while incrementing a statistic; otherwise transmit
>>>>> function
>>>>>>>> may be locked by the faulty packet without
>>>>>>>> notification.
>>>>>>>>
>>>>>>> What we proposed is that the pmd expose to dpdk, and dpdk expose
>>>>> to the
>>>>>>> application, an mbuf check function.  This way applications that can
>>>>>>> generate complex packets can verify that the device will be able to
>>>>>>> process them, and applications that only generate simple mbufs can
>>>>> avoid
>>>>>>> the overhead by not calling the function.
>>>>>> More than a check, it should be exposed as a capability of the port.
>>>>>> Anyway, if the application sends too much segments, the driver must
>>>>>> drop it to avoid hang, and maintain a dedicated statistic counter to
>>>>>> allow easy debugging.
>>>>> I agree with Thomas - this should not be optional. Malformed packets
>>>>> should be dropped. In the icgbe case it's a very simple test - it's a
>>>>> single branch per packet so i doubt that it could impose any
>>>>> measurable performance degradation.
>>>> A drop allows the application no chance to recover.  The driver must
>>>> either provide the ability for the application to know that it cannot
>>>> accept the packet, or it must fix it up itself.
>>> I have the feeling that everybody agrees on the same thing:
>>> the application must be able to make a well formed packet by checking
>>> limitations of the port. What about a field rte_eth_dev_info.max_tx_segs?
>> It is not generic enough.  i40e has a limit that it imposes post-TSO.
>>
>>
>>> In case the application fails in its checks, the driver must drop it and
>>> notify the user via a stat counter.
>>> The driver can also remove the hardware limitation by gathering the segments
>>> but it may be hard to implement and would be a slow operation.
>> I think that to satisfy both the 64b full line rate applications and the
>> more complicated full stack applications, this must be made optional.
>> In particular, and application that only forwards packets will never hit
>> a NIC's limits, so it need not take any action. That's why I think a
>> verification function is ideal; a forwarding application can ignore it,
>> and a complex application can call it, and if it fails the packet, it
>> can linearize it itself, removing complexity from dpdk itself.
> I think that's a good approach to that problem.
> As I remember we discussed something similar a while ago -
> A function (tx_prep() or something) that would check nb_segs and probably some other HW specific restrictions,
> calculate pseudo-header checksum, reset ip header len, etc.
>
>  From other hand we also can add two more fields into rte_eth_dev_info:
> 1) Max num of segs per TSO packet (tx_max_seg ?).
> 2) Max num of segs per single packet/TSO segment (tx_max_mtu_seg ?).
> So for ixgbe both will have value 40 - wthresh,
> while for i40e 1) would be UINT8_MAX and 2) will be 8.
> Then upper layer can use that information to select an optimal size for its TX buffers.

HW limitations differ from HW to HW not only by values but also by their 
nature - for instance for Qlogic bnx2x NICs the limitations may not be 
expressed in the values above so this must be a callback.

>   
> Konstantin
>



More information about the dev mailing list