[dpdk-dev] [PATCH 1/3] app/testpmd: fix port status of active slave device

lihuisong (C) lihuisong at huawei.com
Tue Feb 8 02:19:45 CET 2022


On 2022/2/4 20:07, Ferruh Yigit wrote:
> On 10/25/2021 7:39 AM, Min Hu (Connor) wrote:
>> From: Huisong Li <lihuisong at huawei.com>
>>
>> Stopping a bond device also stops all active slaves under the bond device.
>> If this port is a bond device, we need to change the port status of all
>> active slaves from RTE_PORT_STARTED to RTE_PORT_STOPPED.
>>
>> Fixes: 0e545d3047fe ("app/testpmd: check stopping port is not in bonding")
>> Cc: stable at dpdk.org
>>
>> Signed-off-by: Huisong Li <lihuisong at huawei.com>
>> Signed-off-by: Min Hu (Connor) <humin29 at huawei.com>
>> ---
>>   app/test-pmd/cmdline.c |  1 +
>>   app/test-pmd/testpmd.c | 49 +++++++++++++++++++++++++++++++++++++++---
>>   app/test-pmd/testpmd.h |  3 ++-
>>   3 files changed, 49 insertions(+), 4 deletions(-)
>>
>> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
>> index 722f4fb9d9..5bfb4b509b 100644
>> --- a/app/test-pmd/cmdline.c
>> +++ b/app/test-pmd/cmdline.c
>> @@ -6639,6 +6639,7 @@ static void cmd_create_bonded_device_parsed(void *parsed_result,
>>                   "Failed to enable promiscuous mode for port %u: %s - ignore\n",
>>                   port_id, rte_strerror(-ret));
>>   +        ports[port_id].bond_flag = 1;
>>           ports[port_id].need_setup = 0;
>>           ports[port_id].port_status = RTE_PORT_STOPPED;
>>       }
>> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
>> index af0e79fe6d..d6b9ebc4dd 100644
>> --- a/app/test-pmd/testpmd.c
>> +++ b/app/test-pmd/testpmd.c
>> @@ -65,6 +65,9 @@
>>   #ifdef RTE_EXEC_ENV_WINDOWS
>>   #include <process.h>
>>   #endif
>> +#ifdef RTE_NET_BOND
>> +#include <rte_eth_bond.h>
>> +#endif
>>     #include "testpmd.h"
>>   @@ -2986,6 +2989,35 @@ start_port(portid_t pid)
>>       return 0;
>>   }
>>   +#ifdef RTE_NET_BOND
>> +static void
>> +change_bonding_active_slave_port_status(portid_t bond_pid)
>
> The function sets the status explicitly to PORT_STOPPED, but function
> name is more generic, should we update the function name to reflect the
> functionality?
ok
>
>> +{
>> +    portid_t slave_pids[RTE_MAX_ETHPORTS];
>> +    struct rte_port *port;
>> +    int num_active_slaves;
>> +    portid_t slave_pid;
>> +    int i;
>> +
>> +    num_active_slaves = rte_eth_bond_active_slaves_get(bond_pid, slave_pids,
>> +                               RTE_MAX_ETHPORTS);
>> +    if (num_active_slaves < 0) {
>> +        fprintf(stderr, "Failed to get slave list for port = %u\n",
>> +            bond_pid);
>> +        return;
>> +    }
>> +
>> +    for (i = 0; i < num_active_slaves; i++) {
>> +        slave_pid = slave_pids[i];
>> +        port = &ports[slave_pid];
>> +        if (rte_atomic16_cmpset(&(port->port_status),
>> +            RTE_PORT_STARTED, RTE_PORT_STOPPED) == 0)
>> +            fprintf(stderr, "Port %u can not be set into stopped\n",
>> +                slave_pid);
>> +    }
>> +}
>> +#endif
>> +
>>   void
>>   stop_port(portid_t pid)
>>   {
>> @@ -3042,9 +3074,20 @@ stop_port(portid_t pid)
>>           if (port->flow_list)
>>               port_flow_flush(pi);
>>   -        if (eth_dev_stop_mp(pi) != 0)
>> -            RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
>> -                pi);
>
> Can you please remove the 'eth_dev_stop_mp()' function in this patch,
> which is removed in patch 2/3.
ok
>
>> +        if (is_proc_primary()) {
>> +#ifdef RTE_NET_BOND
>> +            /*
>> +             * Stopping a bond device also stops all active slaves
>> +             * under the bond device. If this port is bond device,
>> +             * we need to modify the port status of all slaves.
>> +             */
>> +            if (port->bond_flag == 1)
>> +                change_bonding_active_slave_port_status(pi);
>> +#endif
>> +            if (rte_eth_dev_stop(pi) != 0)
>> +                RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n",
>> +                    pi);
>
> Should we roll back the slave port status if 'rte_eth_dev_stop(pi)' fails?
Yes, a rollback is necessary here, for the case where a slave fails to
execute dev_stop() in the bonding driver.
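
A minimal sketch of such a rollback, written as a self-contained model
rather than as testpmd code. The names here (port_status[],
stop_bond_with_rollback(), stop_ok(), stop_fail()) are hypothetical;
stop_fn stands in for rte_eth_dev_stop(), and the model assumes that a
failed stop leaves every slave running, which may not hold if the
bonding driver stops only some of them:

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PORTS 8

/* Hypothetical stand-ins for RTE_PORT_STOPPED / RTE_PORT_STARTED. */
enum port_state { PORT_STOPPED, PORT_STARTED };

static enum port_state port_status[MAX_PORTS];

/*
 * Stop a bond port. stop_fn models rte_eth_dev_stop(): 0 on success,
 * negative on failure. Slaves moved to PORT_STOPPED are remembered so
 * that exactly those can be restored if the stop fails.
 */
static int
stop_bond_with_rollback(int bond_pid, const int *slave_pids, int n,
			int (*stop_fn)(int))
{
	int changed[MAX_PORTS];
	int n_changed = 0;
	int ret, i;

	assert(n >= 0 && n <= MAX_PORTS);

	for (i = 0; i < n; i++) {
		if (port_status[slave_pids[i]] == PORT_STARTED) {
			port_status[slave_pids[i]] = PORT_STOPPED;
			changed[n_changed++] = slave_pids[i];
		}
	}

	ret = stop_fn(bond_pid);
	if (ret != 0) {
		fprintf(stderr, "stop failed for port %d, rolling back\n",
			bond_pid);
		for (i = 0; i < n_changed; i++)
			port_status[changed[i]] = PORT_STARTED;
	}
	return ret;
}

/* Hypothetical stop callbacks used to exercise both paths. */
static int stop_ok(int pid) { (void)pid; return 0; }
static int stop_fail(int pid) { (void)pid; return -1; }
```

Recording which slaves were actually changed, instead of restoring the
whole active list, avoids marking as started a slave that was already
stopped before the call.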

By the way, while thinking about this, I found a behavior that does not
seem reasonable: only active slaves are stopped when a bonding device is
stopped. This can make the port status confusing. For example, an
application has to set only the active slaves' status to
RTE_PORT_STOPPED, while the status of non-active slaves remains
RTE_PORT_STARTED.
I think the bonding PMD should stop all slaves when a bonding device is
stopped.
I checked the history of this code in the bonding PMD. The behavior was
introduced by the following patch.

/*
commit 0911d4ec01839c9149a0df5758d00d9d57a47cea
Author: Radu Nicolau <radu.nicolau at intel.com>
Date:   Thu Nov 8 15:26:42 2018 +0000

     net/bonding: fix crash when stopping mode 4 port

     When stopping a bonded port all slaves are deactivated. Attempting
     to deactivate a slave that was never activated will result in a
     segfault when mode 4 is used.

     Fixes: 7486331308f6 ("net/bonding: stop and deactivate slaves on stop")
     Cc: stable at dpdk.org

     Signed-off-by: Radu Nicolau <radu.nicolau at intel.com>
     Acked-by: Chas Williams <chas3 at att.com>
*/

The root cause of the problem described in the above patch is that, in
mode 4, the bonding PMD does not allocate rx/tx rings for non-active
slave devices. The call stack is as follows:
#0  0x0000000000b1250c in rte_ring_dequeue_bulk_elem (available=0x0, n=1, esize=8, obj_table=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring_elem.h:380
#1  rte_ring_dequeue_elem (esize=8, obj_p=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring_elem.h:476
#2  rte_ring_dequeue (obj_p=0xffffffff7c80, r=0x0) at ../dpdk-next-net/lib/ring/rte_ring.h:463
#3  bond_mode_8023ad_deactivate_slave (bond_dev=0x4753200 <rte_eth_devices+33024>, slave_id=0) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_8023ad.c:1163
#4  0x0000000000b29e10 in deactivate_slave (eth_dev=0x4753200 <rte_eth_devices+33024>, port_id=0) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_api.c:117
#5  0x0000000000b44208 in bond_ethdev_stop (eth_dev=0x4753200 <rte_eth_devices+33024>) at ../dpdk-next-net/drivers/net/bonding/rte_eth_bond_pmd.c:2103
#6  0x00000000007966fc in rte_eth_dev_stop (port_id=2) at ../dpdk-next-net/lib/ethdev/rte_ethdev.c:1894
#7  0x000000000055ea60 in eth_dev_stop_mp (port_id=2) at ../dpdk-next-net/app/test-pmd/testpmd.c:613
#8  0x0000000000565230 in stop_port (pid=2) at ../dpdk-next-net/app/test-pmd/testpmd.c:3059
#9  0x00000000004f7614 in cmd_operate_specific_port_parsed (parsed_result=0xffffffff91b0, cl=0x4829250, data=0x0) at ../dpdk-next-net/app/test-pmd/cmdline.c:1261
#10 0x000000000078be24 in cmdline_parse (cl=0x4829250, buf=0x4829298 "port stop 2\n") at ../dpdk-next-net/lib/cmdline/cmdline_parse.c:290
#11 0x0000000000789c34 in cmdline_valid_buffer (rdl=0x4829260, buf=0x4829298 "port stop 2\n", size=13) at ../dpdk-next-net/lib/cmdline/cmdline.c:26
#12 0x000000000078f160 in rdline_char_in (rdl=0x4829260, c=10 '\n') at ../dpdk-next-net/lib/cmdline/cmdline_rdline.c:446
#13 0x000000000078a0c8 in cmdline_in (cl=0x4829250, buf=0xfffffffff2e7 "\n", size=1) at ../dpdk-next-net/lib/cmdline/cmdline.c:148
#14 0x000000000078a3b4 in cmdline_interact (cl=0x4829250) at ../dpdk-next-net/lib/cmdline/cmdline.c:222
#15 0x000000000050bf98 in prompt () at ../dpdk-next-net/app/test-pmd/cmdline.c:18001
#16 0x00000000005687c4 in main (argc=4, argv=0xfffffffff510) at ../dpdk-next-net/app/test-pmd/testpmd.c:4268

For the problem Radu encountered, we only need to ensure that
non-active slaves are not deactivated.
I plan to add a patch to this patchset to fix this problem.
What do you think, Ferruh?
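
A rough sketch of that idea, as a self-contained model and not the
bonding PMD code. struct slave_model and bond_stop_model() are
hypothetical names; the ring pointer models the mode 4 rx/tx rings,
which are assumed to exist only for slaves that were activated. Every
slave is stopped, but deactivation (which touches the ring) runs only
for active slaves:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one slave under a bond device. */
struct slave_model {
	bool started;	/* slave device is running */
	bool active;	/* slave was activated by the bond */
	int *ring;	/* models mode 4 rings: NULL unless activated */
};

/*
 * Stop all slaves of a bond. Only active slaves are deactivated, so
 * the NULL ring of a never-activated slave is never dereferenced.
 * Returns the number of slaves that were deactivated.
 */
static int
bond_stop_model(struct slave_model *slaves, size_t n)
{
	int deactivated = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (slaves[i].active) {
			assert(slaves[i].ring != NULL);
			*slaves[i].ring = 0;	/* drain the ring */
			slaves[i].ring = NULL;
			slaves[i].active = false;
			deactivated++;
		}
		/* Stop the slave whether or not it was active. */
		slaves[i].started = false;
	}
	return deactivated;
}
```

With this split, an application could mark every slave as stopped after
stopping the bond, instead of tracking the active subset.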
>
>> +        }
>>             if (rte_atomic16_cmpset(&(port->port_status),
>>               RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0)
>> diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
>> index e3995d24ab..ad3b4f875c 100644
>> --- a/app/test-pmd/testpmd.h
>> +++ b/app/test-pmd/testpmd.h
>> @@ -237,7 +237,8 @@ struct rte_port {
>>       struct rte_eth_txconf tx_conf[RTE_MAX_QUEUES_PER_PORT+1]; /**< per queue tx configuration */
>>       struct rte_ether_addr   *mc_addr_pool; /**< pool of multicast addrs */
>>       uint32_t                mc_addr_nb; /**< nb. of addr. in mc_addr_pool */
>> -    uint8_t                 slave_flag; /**< bonding slave port */
>> +    uint8_t                 slave_flag : 1, /**< bonding slave port */
>> +                bond_flag : 1; /**< port is bond device */
>
> Can't we detect if the port is a bonding port without introducing a new
> variable/state?
The bonding device is also an ethdev, and I could not find an external
API that can be used to detect whether a port is a bonding port.
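
For illustration, the flag-based approach can be modelled as below. This
is a simplified sketch: struct port_model, ports_model[],
mark_bond_created() and needs_slave_status_update() are hypothetical
names, and uint8_t bit-fields rely on compiler support (GCC and Clang
accept them). The flag is recorded once when the bonded device is
created and consulted when the port is stopped:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_PORTS 8

/* Hypothetical, reduced version of the per-port state in the patch. */
struct port_model {
	uint8_t slave_flag : 1,	/* port is a bonding slave */
		bond_flag : 1;	/* port is a bond device */
};

static struct port_model ports_model[MAX_PORTS];

/* Called where the bonded device is created. */
static void
mark_bond_created(unsigned int pid)
{
	assert(pid < MAX_PORTS);
	ports_model[pid].bond_flag = 1;
}

/* Called on the stop path to decide whether slave status must change. */
static int
needs_slave_status_update(unsigned int pid)
{
	assert(pid < MAX_PORTS);
	return ports_model[pid].bond_flag == 1;
}
```

Packing both flags into one byte keeps the structure the same size as
with the single slave_flag field on typical compilers.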
>
>>       struct port_flow        *flow_list; /**< Associated flows. */
>>       struct port_indirect_action *actions_list;
>>       /**< Associated indirect actions. */
>
> .

