[PATCH v4 3/3] ethdev: add standby flags for live migration

Jerin Jacob jerinjacobk at gmail.com
Tue Jan 31 15:37:18 CET 2023


On Tue, Jan 31, 2023 at 2:31 PM Rongwei Liu <rongweil at nvidia.com> wrote:
>
> Hi Jerin:
>
> BR
> Rongwei
>
> > -----Original Message-----
> > From: Jerin Jacob <jerinjacobk at gmail.com>
> > Sent: Tuesday, January 31, 2023 16:46
> > To: Rongwei Liu <rongweil at nvidia.com>
> > Cc: dev at dpdk.org; Matan Azrad <matan at nvidia.com>; Slava Ovsiienko
> > <viacheslavo at nvidia.com>; Ori Kam <orika at nvidia.com>; NBU-Contact-
> > Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
> > stephen at networkplumber.org; Raslan Darawsheh <rasland at nvidia.com>;
> > Ferruh Yigit <ferruh.yigit at amd.com>; Andrew Rybchenko
> > <andrew.rybchenko at oktetlabs.ru>
> > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live migration
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, Jan 31, 2023 at 8:23 AM Rongwei Liu <rongweil at nvidia.com> wrote:
> > >
> > > HI Jerin:
> > >
> > > BR
> > > Rongwei
> > >
> > > > -----Original Message-----
> > > > From: Jerin Jacob <jerinjacobk at gmail.com>
> > > > Sent: Tuesday, January 31, 2023 01:10
> > > > To: Rongwei Liu <rongweil at nvidia.com>
> > > > Cc: dev at dpdk.org; Matan Azrad <matan at nvidia.com>; Slava Ovsiienko
> > > > <viacheslavo at nvidia.com>; Ori Kam <orika at nvidia.com>; NBU-Contact-
> > > > Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
> > > > stephen at networkplumber.org; Raslan Darawsheh <rasland at nvidia.com>;
> > > > Ferruh Yigit <ferruh.yigit at amd.com>; Andrew Rybchenko
> > > > <andrew.rybchenko at oktetlabs.ru>
> > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > migration
> > > >
> > > >
> > > > On Mon, Jan 30, 2023 at 8:17 AM Rongwei Liu <rongweil at nvidia.com> wrote:
> > > > >
> > > > > Hi Jerin
> > > > >
> > > > > BR
> > > > > Rongwei
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jerin Jacob <jerinjacobk at gmail.com>
> > > > > > Sent: Monday, January 23, 2023 21:20
> > > > > > To: Rongwei Liu <rongweil at nvidia.com>
> > > > > > Cc: dev at dpdk.org; Matan Azrad <matan at nvidia.com>; Slava
> > > > > > Ovsiienko <viacheslavo at nvidia.com>; Ori Kam <orika at nvidia.com>;
> > > > > > NBU-Contact- Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>;
> > > > > > stephen at networkplumber.org; Raslan Darawsheh
> > > > > > <rasland at nvidia.com>; Ferruh Yigit <ferruh.yigit at amd.com>;
> > > > > > Andrew Rybchenko <andrew.rybchenko at oktetlabs.ru>
> > > > > > Subject: Re: [PATCH v4 3/3] ethdev: add standby flags for live
> > > > > > migration
> > > > > >
> > > > > >
> > > > > > > On Wed, Jan 18, 2023 at 9:15 PM Rongwei Liu <rongweil at nvidia.com> wrote:
> > > > > > > >
> > > > > > > > Some flags are added to the process state API for live
> > > > > > > > migration in order to change the behavior of the flow rules in a
> > > > > > > > standby process.
> > > > > > >
> > > > > > > Signed-off-by: Rongwei Liu <rongweil at nvidia.com>
> > > > > > > ---
> > > > > > >  lib/ethdev/rte_ethdev.h | 21 +++++++++++++++++++++
> > > > > > >  1 file changed, 21 insertions(+)
> > > > > > >
> > > > > > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > > > > > index 1505396ced..9ae4f426a7 100644
> > > > > > > > --- a/lib/ethdev/rte_ethdev.h
> > > > > > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > > > > > @@ -2260,6 +2260,27 @@ int rte_eth_dev_owner_get(const uint16_t port_id,
> > > > > > > >  __rte_experimental
> > > > > > > >  int rte_eth_process_set_role(bool standby, uint32_t flags);
> > > > > > >
> > > > > > > +/**@{@name Process role flags
> > > > > > > + * used when migrating from an application to another one.
> > > > > > > + * @see rte_eth_process_set_active  */
> > > > > > > +/**
> > > > > > > > + * When set on a standby process, ingress flow rules will be effective
> > > > > > > > + * in active and standby processes, so the ingress traffic may be duplicated.
> > > > > > > > + */
> > > > > > > > +#define RTE_ETH_PROCESS_FLAG_STANDBY_DUP_FLOW_INGRESS RTE_BIT32(0)
> > > > > >
> > > > > >
> > > > > > > How do you duplicate rules if an action has stateful items? For example:
> > > > > > > rte_flow_action_security::security_session -> it stores a live pointer
> > > > > > > rte_flow_action_meter::mtr_id -> MTR object ID created with
> > > > > > > rte_mtr_create()
> > > > > > I agree with you, not all actions can be supported in the
> > > > > > active/standby model.
> > > >
> > > > IMO, wherever rules are not standalone (as QUEUE, RSS, etc. are), it
> > > > is architecturally not possible to migrate them with pointers.
> > > > That's where I have a concern about generalizing this feature for ethdev.
> > > >
> > > Not sure I understand your concern correctly. What's the pointer concept here?
> >
> > I meant that any HW resource the driver handles through "pointers" or a
> > "fixed ID" cannot get the same value in the new application. That's why I
> > believe this whole concept works only for very standalone rte_flow patterns
> > and actions.
> >
> >
> > > Queue/RSS actions can be migrated per my local test. The active and standby
> > > applications each have their own rxq/txq.
> >
> > Yes. That is because it is standalone.
> >
> > > They are totally separate processes, like two members in a pipeline. The 2nd
> > > member can't be fed while the 1st member is alive and handling the traffic.
> > >
> > > > Also, I don't believe there is any real HW support needed for this.
> > > > IMO, standard DPDK multiprocess can do this: let the secondary
> > > > application migrate while keeping all the SW logic in the primary
> > > > process, with the housekeeping done in the application. On the
> > > > plus side, it works with pointers too.
> >
> > > IMO, in the multi-process model, the primary process usually owns the hardware
> > > resources via mmap/iomap/pci_map etc.
> > > A secondary process is not able to run if the primary quits, whether gracefully
> > > or by crashing.
> > > This patch wants to introduce a "backup to alive" model.
> > > Assume the user wants to upgrade from DPDK version 22.03 to 23.03: 22.03 is
> > > running in the active role while 23.03 comes up in standby.
> > > Both DPDK processes have their own resources and don't rely on each other.
> > > The user can migrate the application following the steps in the commit message
> > > with minimum traffic downtime.
> > > SW logic like flow rules can be handled following the
> > > iptables-save/iptables-restore approach.
> > > >
> > > > I am not sure how much housekeeping is offloaded to _HW_ in your case. In
> > > > my view, there should be generic util functions to track the flows and
> > > > install the rules using rte_flow APIs, keeping the scope to rte_flow only.
> > > For the rules part, I totally agree with you. The issue is that there may be
> > > millions of flow rules in the field, and each rule may take different steps to
> > > re-install depending on the vendor's implementation.
> >
> > I understand the desire for million-flow migrations, which makes sense. IMO, it
> > may be easier to scope this feature to the rte_flow namespace: just have
> > APIs to export() the existing rules for a given port and import() the exported
> > rules, rather than going to ethdev space and calling it "live migration".
> >
> Do you mean the API naming should be "rte_flow_process_set_role()" instead of "rte_eth_process_set_role()" ?
> Also move to rte_flow.c/.h files? Are we good to keep the PMD callback in eth_dev layer?

Yes, something with the rte_flow_ prefix. I am not sure about a _set_role() kind of scheme.

> Simple export()/import() may not work. Imagine some flow rules are exclusive and can't be issued from both applications.
> We would need to stop the old application. I am afraid this will introduce a big time window in which traffic stops.

Yes, I think the sequence is:
rte_flow_rules_export() on app 1
stop app 1
rte_flow_rules_import() of app 1's rules by app 2.
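
A minimal sketch of that ordering in plain C. Everything named demo_* is
hypothetical and only models the idea; none of it is an existing DPDK API, and
rte_flow_rules_export()/rte_flow_rules_import() are just names floated in this
thread. The import is refused while the old app is still running, which is an
assumed way to model rules that are exclusive to one application.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified flow rule; NOT a DPDK type. */
struct demo_rule {
    unsigned int priority;
    unsigned int rx_queue; /* standalone action: steer to an Rx queue */
};

#define DEMO_MAX_RULES 8

/* Hypothetical application state; NOT a DPDK type. */
struct demo_app {
    struct demo_rule rules[DEMO_MAX_RULES];
    size_t nb_rules;
    int running;
};

/* Export: copy the rules of an app into a caller-provided buffer.
 * Returns the number of rules, or -1 if the buffer is too small. */
static int demo_rules_export(const struct demo_app *app,
                             struct demo_rule *buf, size_t cap)
{
    if (app->nb_rules > cap)
        return -1;
    memcpy(buf, app->rules, app->nb_rules * sizeof(buf[0]));
    return (int)app->nb_rules;
}

/* Import: install previously exported rules into the new app.
 * Refused (-1) while the old app still runs, modelling exclusive rules. */
static int demo_rules_import(struct demo_app *new_app,
                             const struct demo_app *old_app,
                             const struct demo_rule *buf, size_t n)
{
    if (old_app->running || n > DEMO_MAX_RULES)
        return -1;
    memcpy(new_app->rules, buf, n * sizeof(buf[0]));
    new_app->nb_rules = n;
    new_app->running = 1;
    return (int)n;
}

/* Runs export -> stop -> import; returns the number of migrated rules. */
int demo_migrate(void)
{
    struct demo_app app1 = {
        .rules = { { 0, 1 }, { 1, 3 } }, .nb_rules = 2, .running = 1,
    };
    struct demo_app app2 = { .nb_rules = 0 };
    struct demo_rule buf[DEMO_MAX_RULES];

    int n = demo_rules_export(&app1, buf, DEMO_MAX_RULES); /* step 1 */
    assert(n == 2);
    app1.running = 0;                                      /* step 2 */
    return demo_rules_import(&app2, &app1, buf, (size_t)n); /* step 3 */
}
```

The gap between step 2 and step 3 is the window in which no application owns
the rules; the sketch does not try to shrink it.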


> Applications won't like this behavior.
> With this callback, each PMD can decide how to handle each rule: queue it, use a lower priority if it is exclusive, or return an error.
>
> > > This series proposes a unified interface that is easy for upper-layer
> > > applications to use.
> > > >
> > > > That's just my view. I leave to ethdev maintainers for the rest of
> > > > the review and decision on this series.
> > > >
> > > > > That's why we have return value checking and rollback.
> > > > > In the Nvidia driver doc, we suggest users start from 'rss/queue/jump'
> > > > > actions.
> > > > > Meter is possible, at least in my view.
> > > > > Assume: "meter g_action queue 0 / y_action drop / r_action drop"
> > > > > Old application: create meter_id 'A' with a pre-defined limit.
> > > > > New application: create meter_id 'B' which has the same parameters
> > > > > as 'A'.
> > > > > 1. 1st possible approach:
> > > > >         Hardware duplicates the traffic; the old application uses meter
> > > > >         'A' and the new application uses meter 'B' to control traffic
> > > > >         throughput. Since traffic is duplicated, it can go to different
> > > > >         meters.
> > > > > 2. 2nd possible approach:
> > > > >         Meter 'A' and 'B' point to the same hardware resource; traffic
> > > > >         reaches this part first and, if green, duplication happens.

