[dpdk-dev] [PATCH] net/mlx5: fix link state update

Benoit Ganne (bganne) bganne at cisco.com
Mon Mar 30 14:03:10 CEST 2020


Hi Matan,

>>>> mlx5 PMD refuses to update link state if link speed is defined but
>>>> status is down or if link speed is undefined but status is up, even
>>>> if the ioctl() succeeded.
>>>> This prevents the application from detecting link up/down events,
>>>> especially when the link speed is not correctly detected.
>>> Do you use the wait option? Or no wait?
>> We are using the no wait option.
> I suggest calling again, for up to N retries, if it fails.

Unfortunately, that will not solve our problem: if the link speed is undefined but the link is up, the test '!dev_link.link_speed && dev_link.link_status' at http://git.dpdk.org/dpdk/tree/drivers/net/mlx5/mlx5_ethdev.c#n899 will always be true, and the function will always return EAGAIN.
This actually happens in Azure with CX4-Lx VFs.
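
To make the point concrete, here is a minimal, self-contained sketch of that consistency test and of why retrying does not help. The struct and function names are hypothetical stand-ins, not the actual PMD code; only the two conditions are taken from the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two link fields discussed in this thread. */
struct link_sketch {
	uint32_t link_speed; /* 0 means "speed undefined" */
	bool link_status;    /* true means link up */
};

/* Speed and status must agree, otherwise the update is refused. */
static int check_link_consistency(const struct link_sketch *l)
{
	if ((l->link_speed && !l->link_status) ||
	    (!l->link_speed && l->link_status))
		return -EAGAIN;
	return 0;
}

/*
 * Hypothetical caller-side retry loop. It assumes the device keeps
 * reporting the same values on every attempt, as a VF with an
 * undefined speed would.
 */
static int query_with_retries(const struct link_sketch *reported,
			      int max_retries)
{
	int ret = -EAGAIN;

	for (int i = 0; i < max_retries && ret == -EAGAIN; i++)
		ret = check_link_consistency(reported);
	return ret;
}
```

With link_speed == 0 and link_status == true, query_with_retries() returns -EAGAIN no matter how large max_retries is, so the application never sees the link come up.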

>> What I meant was to let the app decide whether it should retry or not,
>> based on the data it gets.
>> Right now, the PMD *prevents* the app from getting the link state if the link
>> speed is undefined, even if the app does not care about link speed.

> In mlx5 this is not the case: it is not that one is updated and the other is
> not; they go together.
> You can see that we have 2 different system calls: one to get up/down and
> a second to get the link speed.
> If the link speed does not match the link state, it may mean that
> something changed between the calls and the link status we got from
> the first call is not correct anymore.
> In this case, we should call both calls again, that’s what we are doing in
> "nowait" option.
> If the user doesn't want "nowait" option, (means PMD is not allowed to
> take more time for response) he should call again when the callback failed
> in the time and retries manner the user prefers.

OK, now I understand the logic behind the current behavior: since the 2 syscalls are not atomic, you try to detect inconsistencies that way.
But if the link speed is undefined, the state will never be correctly updated.
I still believe it is unnecessarily heavy-handed: in most networking applications I have seen (and I have 2 examples of currently shipping networking products), a missing link speed is not critical, whereas a link being reported as down means no traffic flows.
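
As an illustration of what I mean, here is a minimal sketch of the application-side policy, assuming the PMD always reported the link state and left the speed at 0 when unknown. All names are hypothetical; this is neither the patch nor an existing DPDK API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of what the application gets back from a link query. */
struct link_report {
	uint32_t speed; /* 0 means the speed could not be determined */
	bool up;        /* true means link up */
};

/* Forwarding only depends on the link being up, not on its speed. */
static bool app_can_forward(const struct link_report *r)
{
	return r->up;
}

/*
 * An application that does care about the speed can still spot the
 * gap itself and decide whether and when to query again.
 */
static bool app_should_requery_speed(const struct link_report *r)
{
	return r->up && r->speed == 0;
}
```

This way the retry decision stays with the application, based on the data it gets.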

Best
ben

