[PATCH] app/testpmd: remove invalid ports when other process detach

Ferruh Yigit ferruh.yigit at amd.com
Fri May 20 17:14:58 CEST 2022


On 5/20/2022 4:05 PM, Ferruh Yigit wrote:
> [CAUTION: External Email]
> 
> On 3/2/2022 8:36 AM, fengchengwen wrote:
>> On 2022/3/2 16:26, Thomas Monjalon wrote:
>>> 02/03/2022 03:33, Chengwen Feng:
>>>> Start main and secondary process:
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=0
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>>     --num-procs=2 --proc-id=1
>>>> Execute following command in main process:
>>>>     port stop 0
>>>>     port detach 0
>>>> Execute following command in secondary process:
>>>>     set fwd mac
>>>>     start
>>>> The secondary process will display:
>>>>     Invalid port_id=0
>>>>     telcore 19 called rx_pkt_burst for not ready port 0
>>>>     stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>>>     7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>>>     6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>>>     5: [/usr/app/testpmd() [0x9c21d8]]
>>>>     4: [/usr/app/testpmd() [0x9c2108]]
>>>>     3: [/usr/app/testpmd() [0x9b6cf0]]
>>>>     2: [/usr/app/testpmd() [0xad8620]]
>>>>     1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>>
>>>> The root cause it that the secondary process has not removed invalid
>>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why the ports are not removed?
>>
> 
> It is referring to application (testpmd) level state, ethdev port is
> removed.
> 
> Above mentioned problem is valid since testpmd secondary process support
> is added.
> When primary hot remove a device, it updates relevant application state
> too, but for secondary process ethdev removes device without secondary
> process updating its state.
> 
> I agree that using ethdev events is appropriate for this case, since
> action originated from ethdev for secondary process, need a way to
> notify application about it.
> Another option can be checking and removing invalid ports in secondary
> before each forwarding start, but since we already have an event for
> destroy, using it looks good to me.
> 

Closing port in the primary has the same problem, secondary process 
state is not updated and crashes.
A high level design decision can be to reduce the application level 
states and rely on ethdev states more, to reduce/remove this ethdev and 
application state sync requirement,
@Aman, @Yuying, what do you think about this high level, long term goal?

(Like why there is a testpmd 'RTE_PORT_CLOSED' state, should be in ethdev?)

((BTW, why testpmd defined 'RTE_PORT_CLOSED' has 'RTE_' prefix, @Aman 
can you do a cleanup for it, 'RTE_' -> 'TESTPMD_' ?))


> 
>> Testpmd register function eth_event_callback to deal with DESTROY event,
>> currently it only assign ports[port_id].port_status with 
>> RTE_PORT_CLOSED, it
>> doesn't update other global variables like nb_ports.
>>
>>>
>>>> This patch adds a delay remove invalid ports invoke when process the
>>>> RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why do we need this delay?
>>
>> The remove_invalid_ports will scan rte_eth_devices[], when process the 
>> DESTROY
>> event, the rte_eth_devices[x] still valid, so here we should a delay 
>> logic.
>>
> 
> There is a dependency problem in the DESTROY event.
> 
> We need some ethdev information to be able to deliver the event, so
> ethdev can't be completely destroyed until DESTROY event is processed.
> Which means when application received DESTROY event, ethdev is not fully
> destroyed yet.
> 
> Except from above, 'remove_invalid_ports()' is called twice for primary
> process (as mentioned in commit log), this timer can be helping these
> two calls not conflict, but I am not sure if we can rely on a time for 
> this.
> Since the problem is mainly for secondary process, assuming control
> commands like detaching a device will be called by primary process, so
> @Feng what do you think about adding a secondary process check to in the
> 'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is
> called only once for primary process.
> 
>>>
>>> [...]
>>>> +static void
>>>> +remove_invalid_ports_callback(void *arg)
>>>> +{
>>>> +   RTE_SET_USED(arg);
>>>> +   remove_invalid_ports();
>>>> +}
>>> [...]
>>>>     case RTE_ETH_EVENT_DESTROY:
>>>>             ports[port_id].port_status = RTE_PORT_CLOSED;
>>>>             printf("Port %u is closed\n", port_id);
>>>> +           if (rte_eal_alarm_set(100000, 
>>>> remove_invalid_ports_callback,
>>>> +                           (void *)(intptr_t)port_id))
>>>> +                   fprintf(stderr,
>>>> +                           "Could not set up deferred device 
>>>> released\n");
>>>>             break;
>>>
>>>
>>>
>>>
>>> .
>>>
>>
> 



More information about the dev mailing list