[dpdk-dev] [PATCH v6 1/3] ethdev: support device reset and recovery events
Ferruh Yigit
ferruh.yigit at intel.com
Thu Feb 18 16:32:45 CET 2021
On 10/12/2020 9:09 AM, Andrew Rybchenko wrote:
> On 10/12/20 12:29 AM, Thomas Monjalon wrote:
>> 09/10/2020 05:48, Kalesh A P:
>>> From: Kalesh AP <kalesh-anakkur.purayil at broadcom.com>
>>>
>>> Adding support for device reset and recovery events in the
>>> rte_eth_event framework. FW error and FW reset conditions would be
>>> managed internally by the PMD without application intervention.
>>> In such cases, the PMD needs reset/recovery events to notify the application
>>> that it is undergoing a reset.
>>>
>>> Signed-off-by: Somnath Kotur <somnath.kotur at broadcom.com>
>>> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil at broadcom.com>
>>> Reviewed-by: Ajit Khaparde <ajit.khaparde at broadcom.com>
>>> Reviewed-by: Asaf Penso <asafp at nvidia.com>
>>
>> The ethdev maintainers are not Cc'ed.
>> Please use the option --cc-cmd devtools/get-maintainer.sh
>>
>>
>>> +Error recovery support
>>> +~~~~~~~~~~~~~~~~~~~~~~
>>> +
>>> +When the PMD detects a FW reset or error condition, it will try to recover
>>> +from the error without application intervention. In such cases, the PMD
>>> +needs events to notify the application that it is undergoing error
>>> +recovery.
>>> +
>>> +The PMD will trigger the RTE_ETH_EVENT_ERR_RECOVERING event to notify the
>>> +application that the PMD detected a FW reset or FW error condition. The PMD
>>> +will try to recover from the error by itself. The data path will be halted
>>> +and control path operations will fail during the recovery period.
>>> +
>>> +The PMD will trigger the RTE_ETH_EVENT_RECOVERED event to notify the application
>>> +that it has recovered from the error condition. The control path and data path
>>> +are up again. Since the device has undergone a reset, flow rules offloaded prior
>>> +to the reset are lost and the application has to recreate them.
>
> What should be done if the state is not recoverable?
>
>>> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
>>> index 9759f13..9b4b015 100644
>>> --- a/lib/librte_ethdev/rte_ethdev.h
>>> +++ b/lib/librte_ethdev/rte_ethdev.h
>>> @@ -3207,6 +3207,23 @@ enum rte_eth_event_type {
>>> RTE_ETH_EVENT_DESTROY, /**< port is released */
>>> RTE_ETH_EVENT_IPSEC, /**< IPsec offload related event */
>>> RTE_ETH_EVENT_FLOW_AGED,/**< New aged-out flows is detected */
>>> + RTE_ETH_EVENT_ERR_RECOVERING,
>>> + /**< port recovering from an error
>>> + *
>>> + * PMD detected a FW reset or error condition.
>>> + * PMD will try to recover from the error.
>>> + * The data path will be halted and control path operations
>>> + * will fail at this time.
>>> + */
>>
>> Does it mean the application has nothing to do when receiving this event?
>> I think the app should stop polling at least.
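The patch text does not say what the application must do on RTE_ETH_EVENT_ERR_RECOVERING. As an illustration of the "stop polling" suggestion, here is a minimal sketch, assuming the application registers one callback for both events with rte_eth_dev_callback_register() and that its forwarding lcores check a per-port flag before each Rx/Tx burst. The DEMO_* enum values, the flag array and the function names are hypothetical stand-ins so the sketch compiles without DPDK; the callback only mirrors the parameter list of rte_eth_dev_cb_fn. The callback may run on a different thread than the lcores (typically the EAL interrupt thread), hence the atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two event values proposed in this
 * patch; a real application would use the rte_ethdev.h enum instead. */
enum demo_eth_event_type {
	DEMO_EVENT_ERR_RECOVERING,
	DEMO_EVENT_RECOVERED,
};

#define DEMO_MAX_PORTS 4

/* One flag per port: written by the event callback, read by the
 * forwarding lcores. Static storage, so every flag starts as false. */
static atomic_bool demo_port_recovering[DEMO_MAX_PORTS];

/* Same parameter list as rte_eth_dev_cb_fn: (port_id, event, cb_arg,
 * ret_param). Registered once per event type for each port. */
static int
demo_recovery_event_cb(uint16_t port_id, enum demo_eth_event_type event,
		       void *cb_arg, void *ret_param)
{
	(void)cb_arg;
	(void)ret_param;

	if (port_id >= DEMO_MAX_PORTS)
		return -1;

	switch (event) {
	case DEMO_EVENT_ERR_RECOVERING:
		/* PMD has halted the data path: tell lcores to stop polling. */
		atomic_store_explicit(&demo_port_recovering[port_id], true,
				      memory_order_release);
		break;
	case DEMO_EVENT_RECOVERED:
		/* Data path is back: lcores may resume Rx/Tx bursts. */
		atomic_store_explicit(&demo_port_recovering[port_id], false,
				      memory_order_release);
		break;
	}
	return 0;
}

/* Checked by a forwarding lcore before each Rx/Tx burst on port_id. */
static bool
demo_port_pollable(uint16_t port_id)
{
	assert(port_id < DEMO_MAX_PORTS);
	return !atomic_load_explicit(&demo_port_recovering[port_id],
				     memory_order_acquire);
}
```

The forwarding loop would skip rte_eth_rx_burst()/rte_eth_tx_burst() for a port while demo_port_pollable() returns false. A release store paired with an acquire load is enough for a single flag; if the application must also know that no lcore is still inside a burst when the callback returns, it needs an extra handshake that this sketch does not cover.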
>>
>>> + RTE_ETH_EVENT_RECOVERED,
>>> + /**< port recovered from an error
>>> + *
>>> + * PMD has recovered from the error condition.
>>> + * The control path and data path are up again.
>>> + * Since the device has undergone a reset, flow rules
>>> + * offloaded prior to the reset are lost and
>>> + * the application has to recreate them.
>>> + */
>>
>> Please be more precise.
>> Should the app re-configure the port, setup the queues, start the port?
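The RECOVERED text only commits to offloaded flow rules being lost; whether the port must also be reconfigured, its queues set up again or the port restarted is exactly the open question here. A minimal sketch of the part the patch does specify, assuming the application keeps its own copy of every rule it offloaded and replays it on RTE_ETH_EVENT_RECOVERED. struct demo_flow_rule, demo_flow_install_fn and demo_stub_install are hypothetical; in a real application the installer would wrap rte_flow_create() with the attributes, pattern and actions originally used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side copy of one offloaded rule. A real
 * application would keep the rte_flow attr/pattern/actions it passed
 * to rte_flow_create(); two illustrative fields stand in for them. */
struct demo_flow_rule {
	uint16_t port_id;
	uint32_t match_dst_ip; /* illustrative match criterion */
	uint16_t dst_queue;    /* illustrative action: steer to this Rx queue */
	void *handle;          /* handle from the last successful install */
};

/* Hypothetical installer type; returns a non-NULL handle on success. */
typedef void *(*demo_flow_install_fn)(const struct demo_flow_rule *rule);

#define DEMO_NB_RX_QUEUES 4

/* Stand-in installer so the sketch runs without a device: it accepts
 * any rule that targets an existing Rx queue and uses the rule's own
 * address as a dummy handle. */
static void *
demo_stub_install(const struct demo_flow_rule *rule)
{
	return rule->dst_queue < DEMO_NB_RX_QUEUES ? (void *)rule : NULL;
}

/* Re-create every cached rule that belongs to port_id. Pre-reset handles
 * are assumed to be invalid after the device reset, so they are simply
 * overwritten (NULL on failure). Returns the number of rules that could
 * not be installed. */
static size_t
demo_replay_flows(struct demo_flow_rule *rules, size_t n_rules,
		  uint16_t port_id, demo_flow_install_fn install)
{
	size_t failed = 0;

	assert(install != NULL);
	for (size_t i = 0; i < n_rules; i++) {
		if (rules[i].port_id != port_id)
			continue;
		rules[i].handle = install(&rules[i]);
		if (rules[i].handle == NULL)
			failed++;
	}
	return failed;
}
```

Keeping the rule descriptions on the application side is what makes the replay possible at all, since the PMD no longer holds them after the reset. A non-zero return tells the application that part of its offload state could not be restored and it has to decide how to proceed.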
>>
>>
>
Hi Kalesh Anakkur,
The mechanics of notifying the application look good, but the concerns seem to
be more about what the application should do with this information.
The PMD notifies the application of the FW/HW reset and pushes some
tasks/responsibilities to the application, but for this to be useful, these
tasks must be clear to the application.
Put yourself in the position of developing an application that receives these
events from a device whose internals you don't know: what would you do?
Both Thomas and Andrew raised cases that need more clarification for the
application.
Can you please send a new version with those clarifications?
Thanks,
ferruh