[PATCH v4 2/3] ethdev: add standby state for live migration
Ori Kam
orika at nvidia.com
Tue Jan 31 14:50:06 CET 2023
Hi Rongwei,
> -----Original Message-----
> From: Rongwei Liu <rongweil at nvidia.com>
> Sent: Wednesday, 18 January 2023 17:45
>
> When a DPDK application must be upgraded,
> the traffic downtime should be shortened as much as possible.
> During the migration time, the old application may stay alive
> while the new application is starting and being configured.
>
> In order to optimize the switch to the new application,
> the old application may need to be aware of the presence
> of the new application being prepared.
> This is achieved with a new API allowing the user to change the
> new application state to standby and active later.
>
> The added function is trying to apply the new state to all probed
> ethdev ports. To make this API simple and easy to use,
> the same flags have to be accepted by all devices.
>
> This is the scenario of operations in the old and new applications:
> . device: already configured by the old application
> . new: start as active
> . new: probe the same device
> . new: set as standby
> . new: configure the device
> . device: has configurations from old and new applications
> . old: clear its device configuration
> . device: has only 1 configuration from new application
> . new: set as active
> . device: downtime for connecting all to the new application
> . old: shutdown
>
> The active role means network handling configurations are programmed
> to the HW immediately, and no behavior changed. This is the default state.
> The standby role means configurations are queued in the HW.
> If there is no application with active role,
> any configuration is effective immediately.
>
> Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> ---
> doc/guides/rel_notes/release_23_03.rst | 7 ++++
> lib/ethdev/ethdev_driver.h | 20 +++++++++
> lib/ethdev/rte_ethdev.c | 42 +++++++++++++++++++
> lib/ethdev/rte_ethdev.h | 56 ++++++++++++++++++++++++++
> lib/ethdev/version.map | 3 ++
> 5 files changed, 128 insertions(+)
>
> diff --git a/doc/guides/rel_notes/release_23_03.rst
> b/doc/guides/rel_notes/release_23_03.rst
> index b8c5b68d6c..5367123f24 100644
> --- a/doc/guides/rel_notes/release_23_03.rst
> +++ b/doc/guides/rel_notes/release_23_03.rst
> @@ -55,6 +55,13 @@ New Features
> Also, make sure to start the actual text at the margin.
> =======================================================
>
> +* **Added process state in ethdev to improve live migration.**
> +
> + Hot upgrade of an application may be accelerated by configuring
> + the new application in standby state while the old one is still active.
> + Such double ethdev configuration of the same device is possible
> + with the added process state API.
> +
>
> Removed Items
> -------------
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index 6a550cfc83..4a098410d5 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -219,6 +219,23 @@ typedef int (*eth_dev_reset_t)(struct rte_eth_dev
> *dev);
> /** @internal Function used to detect an Ethernet device removal. */
> typedef int (*eth_is_removed_t)(struct rte_eth_dev *dev);
>
> +/**
> + * @internal
> + * Set the role of the process to active or standby during live migration.
> + *
> + * @param dev
> + * Port (ethdev) handle.
> + * @param standby
> + * Role active if false, standby if true.
> + * @param flags
> + * Role specific flags.
> + *
> + * @return
> + * Negative value on error, 0 on success.
> + */
> +typedef int (*eth_dev_process_set_role_t)(struct rte_eth_dev *dev,
> + bool standby, uint32_t flags);
> +
> /**
> * @internal
> * Function used to enable the Rx promiscuous mode of an Ethernet device.
> @@ -1186,6 +1203,9 @@ struct eth_dev_ops {
> /** Check if the device was physically removed */
> eth_is_removed_t is_removed;
>
> + /** Set role during live migration */
> + eth_dev_process_set_role_t process_set_role;
> +
> eth_promiscuous_enable_t promiscuous_enable; /**< Promiscuous
> ON */
> eth_promiscuous_disable_t promiscuous_disable;/**< Promiscuous
> OFF */
> eth_allmulticast_enable_t allmulticast_enable;/**< Rx multicast ON
> */
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 5d5e18db1e..3a1fb64053 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -558,6 +558,48 @@ rte_eth_dev_owner_get(const uint16_t port_id,
> struct rte_eth_dev_owner *owner)
> return 0;
> }
>
> +int
> +rte_eth_process_set_role(bool standby, uint32_t flags)
> +{
> + struct rte_eth_dev_info dev_info = {0};
> + struct rte_eth_dev *dev;
> + uint16_t port_id;
> + int ret = 0;
> +
> + /* Check if all devices support process role. */
> + RTE_ETH_FOREACH_DEV(port_id) {
> + dev = &rte_eth_devices[port_id];
> + if (*dev->dev_ops->process_set_role != NULL &&
> + *dev->dev_ops->dev_infos_get != NULL &&
> + (*dev->dev_ops->dev_infos_get)(dev, &dev_info) ==
> 0 &&
> + (dev_info.dev_capa &
> RTE_ETH_DEV_CAPA_PROCESS_ROLE) != 0)
> + continue;
> + rte_errno = ENOTSUP;
> + return -rte_errno;
> + }
> + /* Call the driver callbacks. */
> + RTE_ETH_FOREACH_DEV(port_id) {
> + dev = &rte_eth_devices[port_id];
> + if ((*dev->dev_ops->process_set_role)(dev, standby, flags) <
> 0)
> + goto failure;
> + ret++;
> + }
> + return ret;
> +
> +failure:
> + /* Rollback all changed devices in case one failed. */
> + if (ret) {
> + RTE_ETH_FOREACH_DEV(port_id) {
> + dev = &rte_eth_devices[port_id];
> + (*dev->dev_ops->process_set_role)(dev, !standby,
> flags);
> + if (--ret == 0)
> + break;
> + }
> + }
> + rte_errno = EPERM;
> + return -rte_errno;
> +}
> +
> int
> rte_eth_dev_socket_id(uint16_t port_id)
> {
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index c129ca1eaf..1505396ced 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -1606,6 +1606,8 @@ struct rte_eth_conf {
> #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP RTE_BIT64(3)
> /** Device supports keeping shared flow objects across restart. */
> #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
> +/** Device supports process role changing. @see
> rte_eth_process_set_active */
> +#define RTE_ETH_DEV_CAPA_PROCESS_ROLE RTE_BIT64(5)
> /**@}*/
>
> /*
> @@ -2204,6 +2206,60 @@ int rte_eth_dev_owner_delete(const uint64_t
> owner_id);
> int rte_eth_dev_owner_get(const uint16_t port_id,
> struct rte_eth_dev_owner *owner);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Set the role of the process to active or standby,
> + * affecting network traffic handling.
> + *
> + * If one device does not support this operation or fails,
> + * the whole operation is failed and rolled back.
> + *
> + * It is forbidden to have multiple processes with the same role
> + * unless only one of them is configured to handle the traffic.
> + *
> + * The application is active by default.
> + * The configuration from the active process is effective immediately
> + * while the configuration from the standby process is queued by hardware.
> + * When configuring the device from a standby process,
> + * it has no effect except for below situations:
> + * - traffic not handled by the active process configuration
> + * - no active process
> + *
> + * When a process is changed from a standby to an active role,
> + * all preceding configurations that are queued by hardware
> + * should become effective immediately.
> + * Before role transition, all the traffic handling configurations
> + * set by the active process should be flushed first.
> + *
> + * In summary, the operations are expected to happen in this order
> + * in "old" and "new" applications:
> + * device: already configured by the old application
> + * new: start as active
> + * new: probe the same device
> + * new: set as standby
> + * new: configure the device
> + * device: has configurations from old and new applications
> + * old: clear its device configuration
> + * device: has only 1 configuration from new application
> + * new: set as active
> + * device: downtime for connecting all to the new application
> + * old: shutdown
> + *
> + * @param standby
> + * Role active if false, standby if true.
> + * @param flags
> + * Role specific flags.
> + * @return
> + * Positive value on success, -rte_errno value on error:
> + * - (> 0) Number of switched devices.
> + * - (-ENOTSUP) if not supported by a device.
> + * - (-EPERM) if operation failed with a device.
> + */
> +__rte_experimental
> +int rte_eth_process_set_role(bool standby, uint32_t flags);
> +
> /**
> * Get the number of ports which are usable for the application.
> *
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 17201fbe0f..d5d3ea5421 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -298,6 +298,9 @@ EXPERIMENTAL {
> rte_flow_get_q_aged_flows;
> rte_mtr_meter_policy_get;
> rte_mtr_meter_profile_get;
> +
> + # added in 23.03
> + rte_eth_process_set_role;
> };
>
> INTERNAL {
> --
> 2.27.0
Acked-by: Ori Kam <orika at nvidia.com>
Best,
Ori
More information about the dev
mailing list