[PATCH V2 1/4] net/bonding: fix non-active slaves aren't stopped

Ferruh Yigit ferruh.yigit at xilinx.com
Tue May 3 21:04:33 CEST 2022


On 5/3/2022 7:54 AM, Min Hu (Connor) wrote:
> Hi, Ferruh,
> 
> 在 2022/4/29 21:31, Ferruh Yigit 写道:
>> On 4/29/2022 7:45 AM, Min Hu (Connor) wrote:
>>> Hi, Ferruh,
>>>
>>> 在 2022/4/27 2:19, Ferruh Yigit 写道:
>>>> On 3/24/2022 3:00 AM, Min Hu (Connor) wrote:
>>>>> From: Huisong Li <lihuisong at huawei.com>
>>>>>
>>>>> When stopping a bonded port, all slaves should be deactivated. But 
>>>>> only
>>>>
>>>> s/deactivated/stopped/ ?
>>> not agreed. deactivated and stopped are different state for slave.
>>>
>>
>> Just to clarify the sentences, otherwise I see the 'stopped' and 
>> 'deactivated' states are different.
>> Next sentences complains that not all ports are stopped: "But only 
>> active slaves are stopped.", so I thought intention in this sentences 
>> to claim that all slaves should be stopped (but it mentions all slaves 
>> should be 'deactivated').
>> As long as you address the disconnection between two sentences, I 
>> don't mind the wording.
> Actually, there is something wrong with the wording.
> Yes, I should take your advice.
> 
>>
>>>>
>>>>> active slaves are stopped. So fix it and do "deactivae_slave()" for 
>>>>> active
>>>>
>>>> s/deactivae_slave()/deactivate_slave()/
>>>>
>>> agreed.
>>>
>>>>> slaves.
>>>>
>>>> Hi Connor,
>>>>
>>>> When a bonding port is closed, is it clear if all slave ports or 
>>>> active slave ports should be stopped?
>>> Yes, I think all the slave ports should be stopped(or try to be 
>>> stopped).
>>>>
>>>>>
>>>>> Fixes: 0911d4ec0183 ("net/bonding: fix crash when stopping mode 4 
>>>>> port")
>>>>> Cc: stable at dpdk.org
>>>>>
>>>>> Signed-off-by: Huisong Li <lihuisong at huawei.com>
>>>>> Signed-off-by: Min Hu (Connor) <humin29 at huawei.com>
>>>>> ---
>>>>>   drivers/net/bonding/rte_eth_bond_pmd.c | 20 +++++++++++---------
>>>>>   1 file changed, 11 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c 
>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index b305b6a35b..469dc71170 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -2118,18 +2118,20 @@ bond_ethdev_stop(struct rte_eth_dev *eth_dev)
>>>>>       internals->link_status_polling_enabled = 0;
>>>>>       for (i = 0; i < internals->slave_count; i++) {
>>>>>           uint16_t slave_id = internals->slaves[i].port_id;
>>>>> +
>>>>> +        internals->slaves[i].last_link_status = 0;
>>>>> +        ret = rte_eth_dev_stop(slave_id);
>>>>> +        if (ret != 0) {
>>>>> +            RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>>> +                     slave_id);
>>>>> +            return ret;
>>>>
>>>> Should it return here or try to stop all ports?
>>>> What about to record the return status, but keep continue to stop 
>>>> all ports. And return error if any of the stop failed?
> Well, I am glad you have found something unreasaonable about 'stop'.
> Let us see API 'rte_eth_dev_stop'
> 
> rte_eth_dev_stop(dev)
> {
>      ....
>      dev->data->dev_started = 0;
>      ret = (*dev->dev_ops->dev_stop)(dev)
>      retur ret;
> }
> This is unreasaonable. No matter 'dev_ops->dev_stop' succeed or fail,
> the state 'dev_started ' will always set to be '0'.
> 
> But this does not only influence bonding device but other devices like
> eth dev or vdev.
> This is the bug in rte ethdev level. I will send another patch to fix
> it.
> 

Hi Connor,

I agree this is an issue in the API, cc'ed Andrew and Thomas.

I vaguely remember that "dev_started = 0" was required for some dev_ops, 
but not quite sure, let me check this.
At worst we can do as following to be sure:

   dev->data->dev_started = 0;
   ret = (*dev->dev_ops->dev_stop)(dev)
   if (ret)
     dev->data->dev_started = 1;

Also we need to clarify in the API documentation (.h file), what is the 
status of the device if 'rte_eth_dev_stop()' returned error.


Btw, would you be OK to separate this ethdev patch from your bonding 
patch, to not stuck your series because of ethdev one.


> 
>>> I think no need to do this. APP only see the bonded device. If bonded
>>> device stop failed, it means it works failed. And the number of 
>>> "stopped" successfully slave does not make any sense.
>>>
>>
>> OK if trying to stop as much as possible 'slave' devices doesn't make 
>> sense, we can keep as it is.
>>
>> Btw, when functions fails at this point, bonding device itself already 
>> marked as stopped, right? And some of the slave devices may be stopped 
>> already before failure.
>> I don't know how confusing this is for the user, that stop() function 
>> is failed but bonding device state is 'stopped'. I don't know if 
>> function should recover at least bonding device status (back to 
>> started) on failure, what do you think?
>>
>>>>
>>>>> +        }
>>>>> +
>>>>> +        /* active slaves need to deactivate. */
>>>>
>>>> " active slaves need to be deactivated. " ?
>>> agreed.
>>>>
>>>>>           if (find_slave_by_id(internals->active_slaves,
>>>>>                   internals->active_slave_count, slave_id) !=
>>>>> -                        internals->active_slave_count) {
>>>>> -            internals->slaves[i].last_link_status = 0;
>>>>> -            ret = rte_eth_dev_stop(slave_id);
>>>>> -            if (ret != 0) {
>>>>> -                RTE_BOND_LOG(ERR, "Failed to stop device on port %u",
>>>>> -                         slave_id);
>>>>> -                return ret;
>>>>> -            }
>>>>> +                internals->active_slave_count)
>>>>
>>>> I think original indentation for this line is better.
>>>>
>>> agreed.
>>>>>               deactivate_slave(eth_dev, slave_id);
>>>>> -        }
>>>>>       }
>>>>>       return 0;
>>>>
>>>> .
>>
>> .



More information about the stable mailing list