[PATCH] app/testpmd: remove invalid ports when other process detach
Ferruh Yigit
ferruh.yigit at amd.com
Fri May 20 17:14:58 CEST 2022
On 5/20/2022 4:05 PM, Ferruh Yigit wrote:
> [CAUTION: External Email]
>
> On 3/2/2022 8:36 AM, fengchengwen wrote:
>> On 2022/3/2 16:26, Thomas Monjalon wrote:
>>> 02/03/2022 03:33, Chengwen Feng:
>>>> Start main and secondary process:
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>> --num-procs=2 --proc-id=0
>>>> ./dpdk-testpmd -a BDF0 -a BDF1 --proc-type=auto -- -i --rxq=8 --txq=8
>>>> --num-procs=2 --proc-id=1
>>>> Execute following command in main process:
>>>> port stop 0
>>>> port detach 0
>>>> Execute following command in secondary process:
>>>> set fwd mac
>>>> start
>>>> The secondary process will display:
>>>> Invalid port_id=0
>>>> telcore 19 called rx_pkt_burst for not ready port 0
>>>> stpmd> 8: [/lib64/libc.so.6(+0xdf600) [0xffff9e1dc600]]
>>>> 7: [/lib64/libpthread.so.0(+0x7c48) [0xffff9e28ac48]]
>>>> 6: [/usr/app/testpmd(eal_thread_loop+0x2c4) [0xb23574]]
>>>> 5: [/usr/app/testpmd() [0x9c21d8]]
>>>> 4: [/usr/app/testpmd() [0x9c2108]]
>>>> 3: [/usr/app/testpmd() [0x9b6cf0]]
>>>> 2: [/usr/app/testpmd() [0xad8620]]
>>>> 1: [/usr/app/testpmd(rte_dump_stack+0x20) [0xb1a130]]
>>>>
>>>> The root cause it that the secondary process has not removed invalid
>>>> ports when it processes RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why the ports are not removed?
>>
>
> It is referring to application (testpmd) level state, ethdev port is
> removed.
>
> Above mentioned problem is valid since testpmd secondary process support
> is added.
> When primary hot remove a device, it updates relevant application state
> too, but for secondary process ethdev removes device without secondary
> process updating its state.
>
> I agree that using ethdev events is appropriate for this case, since
> action originated from ethdev for secondary process, need a way to
> notify application about it.
> Another option can be checking and removing invalid ports in secondary
> before each forwarding start, but since we already have an event for
> destroy, using it looks good to me.
>
Closing port in the primary has the same problem, secondary process
state is not updated and crashes.
A high level design decision can be to reduce the application level
states and rely on ethdev states more, to reduce/remove this ethdev and
application state sync requirement,
@Aman, @Yuying, what do you think about this high level, long term goal?
(Like why there is a testpmd 'RTE_PORT_CLOSED' state, should be in ethdev?)
((BTW, why testpmd defined 'RTE_PORT_CLOSED' has 'RTE_' prefix, @Aman
can you do a cleanup for it, 'RTE_' -> 'TESTPMD_' ?))
>
>> Testpmd register function eth_event_callback to deal with DESTROY event,
>> currently it only assign ports[port_id].port_status with
>> RTE_PORT_CLOSED, it
>> doesn't update other global variables like nb_ports.
>>
>>>
>>>> This patch adds a delay remove invalid ports invoke when process the
>>>> RTE_ETH_EVENT_DESTROY event.
>>>
>>> Why do we need this delay?
>>
>> The remove_invalid_ports will scan rte_eth_devices[], when process the
>> DESTROY
>> event, the rte_eth_devices[x] still valid, so here we should a delay
>> logic.
>>
>
> There is a dependency problem in the DESTROY event.
>
> We need some ethdev information to be able to deliver the event, so
> ethdev can't be completely destroyed until DESTROY event is processed.
> Which means when application received DESTROY event, ethdev is not fully
> destroyed yet.
>
> Except from above, 'remove_invalid_ports()' is called twice for primary
> process (as mentioned in commit log), this timer can be helping these
> two calls not conflict, but I am not sure if we can rely on a time for
> this.
> Since the problem is mainly for secondary process, assuming control
> commands like detaching a device will be called by primary process, so
> @Feng what do you think about adding a secondary process check to in the
> 'remove_invalid_ports_callback()' call? So 'remove_invalid_ports()' is
> called only once for primary process.
>
>>>
>>> [...]
>>>> +static void
>>>> +remove_invalid_ports_callback(void *arg)
>>>> +{
>>>> + RTE_SET_USED(arg);
>>>> + remove_invalid_ports();
>>>> +}
>>> [...]
>>>> case RTE_ETH_EVENT_DESTROY:
>>>> ports[port_id].port_status = RTE_PORT_CLOSED;
>>>> printf("Port %u is closed\n", port_id);
>>>> + if (rte_eal_alarm_set(100000,
>>>> remove_invalid_ports_callback,
>>>> + (void *)(intptr_t)port_id))
>>>> + fprintf(stderr,
>>>> + "Could not set up deferred device
>>>> released\n");
>>>> break;
>>>
>>>
>>>
>>>
>>> .
>>>
>>
>
More information about the dev
mailing list