[PATCH v2] net/failsafe: link_update request crashing at boot

Ferruh Yigit ferruh.yigit at amd.com
Fri Apr 12 13:27:20 CEST 2024


On 10/17/2023 5:43 PM, Stephen Hemminger wrote:
> On Tue, 15 Feb 2022 22:16:28 +0530
> Vipul Ashri <vipul.ashri at oracle.com> wrote:
> 
>> On 2/14/2022 10:24 PM, Stephen Hemminger wrote:
>>> On Mon, 14 Feb 2022 13:09:19 +0000
>>> Vipul Ashri <vipul.ashri at oracle.com> wrote:
>>>  
>>>> PORT 0 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe, driver_type = 16)
>>>>
>>>> PORT 0 is polling for link-change, interrupts disabled
>>>>
>>>> [DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File exists  
>>> Looks like secondary process support doesn't work with the flow rules logic.
>>> Maybe after that you are into error paths that may not recover correctly??  
>> Thanks! Stephen for looking at my analysis,
>>
>> yes some hotplug synchronization issue between eal_intr_thread and primary
>> thread, but we are able to recover with this patch.
>>
>> Reason is this fail-safe flow is inside our custom added boot-time polling
>> to update DPDK stats and calling ifindex ioctl to get interface data.
>> Ideally we should not start polling so early. but moreover calling ifindex
>> ioctl is generic functionality and should not break failsafe. We added this
>> patch and gracefully prevented the so many multiple crashes.
>>
>> Setup details :
>> Azure testbed with Accelerated Networking(SRIOV) enabled, failsafe using
>> tap + mellanox driver.
> 
> I don't work for Azure anymore, so can't really test this.
> A short explanation why this patch is stalled.
> 
> It seems like this patch is trying to avoid a crash when an earlier problem
> occurred, it is ok to do that but the original problem is still there
> and the testing it is impossible without having modified application.
> For the normal user, this just adds more always true checks in the
> configuration path. Ok, but it does add clutter.
> 
> Since failsafe should be deprecated fixing this seems less relevant
> as well.
>

As we are at the beginning of a new release, it is cleanup time.

This patch is originally from 2021; first of all, sorry for not being able
to conclude it in a timely manner.


@Vipul, is this patch still valid? Are you still using failsafe actively?


@Gaetan, what is the status of the failsafe driver? I am aware it was
kind of a temporary solution; is it still required or actively used?
We are not getting many failsafe patches, but when we do, the review
sometimes takes a long time. Should we seek more help there? What do
you think?


@Vipul, @Gaetan, instead of dragging this old patch along, would you be
OK if I mark it as "Changes Requested", and you send a fresh version on
top of the latest code if it is still valid?


Thanks,
ferruh


