[dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

Ananyev, Konstantin konstantin.ananyev at intel.com
Wed Jun 23 11:49:08 CEST 2021


> 
> On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
> >
> >> Currently, there is a hard limitation on the PMD power management
> >> support that only allows it to support a single queue per lcore. This is
> >> not ideal as most DPDK use cases will poll multiple queues per core.
> >>
> >> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> >> is very difficult to implement such support because callbacks are
> >> effectively stateless and have no visibility into what the other ethdev
> >> devices are doing.  This places limitations on what we can do within the
> >> framework of Rx callbacks, but the basics of this implementation are as
> >> follows:
> >>
> >> - Replace per-queue structures with per-lcore ones, so that any device
> >>    polled from the same lcore can share data
> >> - Any queue that is going to be polled from a specific lcore has to be
> >>    added to the list of queues to poll, so that the callback is aware of
> >>    other queues being polled by the same lcore
> >> - Both the empty poll counter and the actual power saving mechanism are
> >>    shared between all queues polled on a particular lcore, and power
> >>    saving is only activated when a special designated "power saving"
> >>    queue is polled. To put it another way, we have no idea in what
> >>    order the user will poll their queues, so we rely on them telling us
> >>    that queue X is the last one in the polling loop, so any power
> >>    management should happen there.
> >> - A new API is added to mark a specific Rx queue as "power saving".
> >
> > Honestly, I don't understand the logic behind that new function.
> > I understand that depending on HW we can monitor either one or multiple queues.
> > That's ok, but why do we now need to mark one queue as a 'very special' one?
> 
> Because we don't know which of the queues we are supposed to sleep on.
> 
> Imagine a situation where you have 3 queues. What usually happens is you
> poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to
> enter the power-optimized state on polling q2, because otherwise we risk
> going into the power-optimized state while q1 or q2 still have traffic.

That's why, before going to sleep, we need to make sure that for *all* queues
we have at least EMPTYPOLL_MAX empty polls.
Then the order of queue checking wouldn't matter.
With your example, it should be:
if (q0.empty_polls > EMPTYPOLL_MAX && q1.empty_polls > EMPTYPOLL_MAX &&
    q2.empty_polls > EMPTYPOLL_MAX)
        goto_sleep;

Don't take me wrong, I am not suggesting to make *precisely* that checks
in the actual code (it could be time consuming if number of checks is big),
but the logic needs to remain.
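
To keep the per-poll cost down, one option (just a sketch, all names below are
invented and not from the patch) is to keep a per-lcore count of queues that
have already crossed the threshold, and only go to sleep once that count
reaches the number of queues polled on the lcore:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EMPTYPOLL_MAX 512 /* illustrative; stands for the library's threshold */

struct queue_stats {
        uint64_t empty_polls;
};

struct lcore_state {
        size_t n_queues;        /* queues polled on this lcore */
        size_t n_sleepy_queues; /* queues with empty_polls > EMPTYPOLL_MAX */
};

static inline bool
lcore_can_sleep(struct lcore_state *st, struct queue_stats *q, uint16_t nb_rx)
{
        if (nb_rx != 0) {
                /* traffic seen: this queue no longer counts towards sleeping */
                if (q->empty_polls > EMPTYPOLL_MAX)
                        st->n_sleepy_queues--;
                q->empty_polls = 0;
                return false;
        }
        /* count the queue once, when it first crosses the threshold */
        if (++q->empty_polls == EMPTYPOLL_MAX + 1)
                st->n_sleepy_queues++;
        /* sleep only when *every* queue on this lcore is past the threshold */
        return st->n_sleepy_queues == st->n_queues;
}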

> 
> Worst case scenario, we enter sleep after polling q0, then traffic
> arrives at q2, we wake up, and then attempt to go to sleep on q1 instead
> of skipping it. Essentially, we will be attempting to sleep at every
> queue, instead of once per loop. This *might* be OK for multi-monitor
> because we'll be aborting sleep due to sleep condition check failure,
> but for modes like rte_pause()/rte_power_pause()-based sleep, we will be
> entering sleep unconditionally, and will risk sleeping at q1 while
> there's traffic at q2.
> 
> So, we need this mechanism to be activated once every *loop*, not per queue.
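
To make the ordering problem concrete, here is a rough sketch of the kind of
application loop being discussed (purely illustrative):

/* needs <rte_ethdev.h>; port 0, BURST_SIZE, quit and process() are placeholders */
struct rte_mbuf *bufs[BURST_SIZE];

while (!quit) {
        uint16_t q;

        for (q = 0; q < 3; q++) {
                uint16_t nb = rte_eth_rx_burst(0, q, bufs, BURST_SIZE);

                /*
                 * The Rx callback runs here for every queue. If it may
                 * decide to sleep after any queue (e.g. after q0), q1/q2
                 * might still have packets waiting; taking the decision
                 * only once per iteration - on the last queue, q2 -
                 * avoids that.
                 */
                process(bufs, nb);
        }
}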
> 
> > Why can't rte_power_ethdev_pmgmt_queue_enable() just:
> > Check whether the number of monitored queues exceeds HW/SW capabilities,
> > and if so, just return a failure.
> > Otherwise, add the queue to the list and treat them all equally, i.e.:
> > go to power save mode when the number of sequential empty polls on
> > all monitored queues exceeds the EMPTYPOLL_MAX threshold?
> >
> >>    Failing to call this API will result in no power management; however,
> >>    when there is only one queue per core it is obvious which queue is the
> >>    "power saving" one, so things will still work without this new API for
> >>    use cases that previously worked without it.
> >> - The limitation on UMWAIT-based polling is not removed because UMWAIT
> >>    is incapable of monitoring more than one address.
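
For reference, a sketch of how an application might set this up with the
proposed API. rte_power_ethdev_pmgmt_queue_enable() is the existing call
mentioned earlier in this thread; the name of the new "power saving" marking
call is assumed from the description, as its exact name is not quoted here:

/* needs <rte_power_pmd_mgmt.h>; error handling omitted */
unsigned int lcore = 1;
uint16_t port = 0, q;

/* enable PMD power management for the three queues polled by lcore 1 */
for (q = 0; q < 3; q++)
        rte_power_ethdev_pmgmt_queue_enable(lcore, port, q,
                        RTE_POWER_MGMT_TYPE_PAUSE);

/*
 * q2 is the last queue polled in the loop, so mark it as the queue where
 * power management decisions are taken (function name assumed, not quoted
 * anywhere in this thread).
 */
rte_power_ethdev_pmgmt_queue_set_power_save(lcore, port, 2);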
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> >> ---
> >>   lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
> >>   lib/power/rte_power_pmd_mgmt.h |  34 ++++
> >>   lib/power/version.map          |   3 +
> >>   3 files changed, 306 insertions(+), 66 deletions(-)
> >>
> >> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> >> index 0707c60a4f..60dd21a19c 100644
> >> --- a/lib/power/rte_power_pmd_mgmt.c
> >> +++ b/lib/power/rte_power_pmd_mgmt.c
> >> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
> >>        PMD_MGMT_ENABLED
> >>   };
> >>
> >> -struct pmd_queue_cfg {
> >> +struct queue {
> >> +     uint16_t portid;
> >> +     uint16_t qid;
> >> +};
> >
> > Just a thought: if that would help somehow, it can be changed to:
> > union queue {
> >         uint32_t raw;
> >         struct {
> >                 uint16_t portid, qid;
> >         };
> > };
> >
> > That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
> > Probably not that important, as all these functions are on the slow path, but it might look nicer.
> 
> Sure, that can work. We actually do comparisons with the power save queue
> on the fast path, so maybe that'll help.
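
As a sketch, with that layout the compare (and copy) helpers from the patch
collapse to single 32-bit operations:

/* needs <stdbool.h>/<stdint.h>; relies on a C11 anonymous struct */
union queue {
        uint32_t raw;
        struct {
                uint16_t portid;
                uint16_t qid;
        };
};

static inline bool
queue_equal(const union queue *l, const union queue *r)
{
        return l->raw == r->raw;
}

static inline void
queue_copy(union queue *dst, const union queue *src)
{
        dst->raw = src->raw;
}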
> 
> >
> >> +struct pmd_core_cfg {
> >> +     struct queue queues[RTE_MAX_ETHPORTS];
> >
> > If we'll have the ability to monitor multiple queues per lcore, would it always be enough?
> > On the other hand, it is updated on the control path only.
> > Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?
> 
> You're right, it should be dynamically allocated.
> 
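
One possible shape for that, just as a sketch (a sys/queue.h TAILQ; entry and
function names are invented, removal and error handling omitted):

/* sketch: per-lcore queue list grown on demand instead of a fixed
 * RTE_MAX_ETHPORTS-sized array; needs <sys/queue.h>, <errno.h>
 * and <rte_malloc.h> */
struct queue_list_entry {
        TAILQ_ENTRY(queue_list_entry) next;
        struct queue queue;
};

struct pmd_core_cfg {
        TAILQ_HEAD(, queue_list_entry) head;
        size_t n_queues;
        /* ... rest of the per-lcore state ... */
};

/* control path only: add a port/queue pair to the lcore's list */
static int
queue_list_add(struct pmd_core_cfg *cfg, uint16_t portid, uint16_t qid)
{
        struct queue_list_entry *qle = rte_malloc(NULL, sizeof(*qle), 0);

        if (qle == NULL)
                return -ENOMEM;
        qle->queue.portid = portid;
        qle->queue.qid = qid;
        TAILQ_INSERT_TAIL(&cfg->head, qle, next);
        cfg->n_queues++;
        return 0;
}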
> >
> >> +     /**< Which port-queue pairs are associated with this lcore? */
> >> +     struct queue power_save_queue;
> >> +     /**< When polling multiple queues, all but this one will be ignored */
> >> +     bool power_save_queue_set;
> >> +     /**< When polling multiple queues, power save queue must be set */
> >> +     size_t n_queues;
> >> +     /**< How many queues are in the list? */
> >>        volatile enum pmd_mgmt_state pwr_mgmt_state;
> >>        /**< State of power management for this queue */
> >>        enum rte_power_pmd_mgmt_type cb_mode;
> >> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
> >>        uint64_t empty_poll_stats;
> >>        /**< Number of empty polls */
> >>   } __rte_cache_aligned;
> >> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
> >>
> >> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> >> +static inline bool
> >> +queue_equal(const struct queue *l, const struct queue *r)
> >> +{
> >> +     return l->portid == r->portid && l->qid == r->qid;
> >> +}
> >> +
> >> +static inline void
> >> +queue_copy(struct queue *dst, const struct queue *src)
> >> +{
> >> +     dst->portid = src->portid;
> >> +     dst->qid = src->qid;
> >> +}
> >> +
> >> +static inline bool
> >> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
> >
> > Here and in other places - any reason why standard DPDK coding style is not used?
> 
> Just accidental :)
> 
> >
> >> +     const struct queue *pwrsave = &cfg->power_save_queue;
> >> +
> >> +     /* if there's only single queue, no need to check anything */
> >> +     if (cfg->n_queues == 1)
> >> +             return true;
> >> +     return cfg->power_save_queue_set && queue_equal(q, pwrsave);
> >> +}
> >> +
> 
> 
> --
> Thanks,
> Anatoly

