[dpdk-dev] [PATCH v3 6/7] power: support monitoring multiple Rx queues
Ananyev, Konstantin
konstantin.ananyev at intel.com
Tue Jun 29 02:07:44 CEST 2021
> >> Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
> >> Rx queues while entering the energy efficient power state. The multi
> >> version will be used unconditionally if supported, and the UMWAIT one
> >> will only be used when multi-monitor is not supported by the hardware.
> >>
> >> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
> >> ---
> >> doc/guides/prog_guide/power_man.rst | 9 ++--
> >> lib/power/rte_power_pmd_mgmt.c | 76 ++++++++++++++++++++++++++++-
> >> 2 files changed, 80 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
> >> index fac2c19516..3245a5ebed 100644
> >> --- a/doc/guides/prog_guide/power_man.rst
> >> +++ b/doc/guides/prog_guide/power_man.rst
> >> @@ -221,13 +221,16 @@ power saving whenever empty poll count reaches a certain number.
> >> The "monitor" mode is only supported in the following configurations and scenarios:
> >>
> >> * If ``rte_cpu_get_intrinsics_support()`` function indicates that
> >> + ``rte_power_monitor_multi()`` function is supported by the platform, then
> >> + monitoring multiple Ethernet Rx queues for traffic will be supported.
> >> +
> >> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
> >> ``rte_power_monitor()`` is supported by the platform, then monitoring will be
> >> limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
> >> monitored from a different lcore).
> >>
> >> -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
> >> - ``rte_power_monitor()`` function is not supported, then monitor mode will not
> >> - be supported.
> >> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the
> >> + two monitoring functions are supported, then monitor mode will not be supported.
> >>
> >> * Not all Ethernet devices support monitoring, even if the underlying
> >> platform may support the necessary CPU instructions. Please refer to
> >> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> >> index 7762cd39b8..aab2d4f1ee 100644
> >> --- a/lib/power/rte_power_pmd_mgmt.c
> >> +++ b/lib/power/rte_power_pmd_mgmt.c
> >> @@ -155,6 +155,24 @@ queue_list_remove(struct pmd_core_cfg *cfg, const union queue *q)
> >> return 0;
> >> }
> >>
> >> +static inline int
> >> +get_monitor_addresses(struct pmd_core_cfg *cfg,
> >> + struct rte_power_monitor_cond *pmc)
> >> +{
> >> + const struct queue_list_entry *qle;
> >> + size_t i = 0;
> >> + int ret;
> >> +
> >> + TAILQ_FOREACH(qle, &cfg->head, next) {
> >> + struct rte_power_monitor_cond *cur = &pmc[i];
> >
> > Looks like you never increment 'i' value inside that function.
> > Also it probably will be safer to add 'num' parameter to check that
> > we will never over-run pmc[] boundaries.
>
> Will fix in v4, good catch!
>
> >
> >> + const union queue *q = &qle->queue;
> >> + ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
> >> + if (ret < 0)
> >> + return ret;
> >> + }
> >> + return 0;
> >> +}
> >> +
> >> static void
> >> calc_tsc(void)
> >> {
> >> @@ -183,6 +201,48 @@ calc_tsc(void)
> >> }
> >> }
> >>
> >> +static uint16_t
> >> +clb_multiwait(uint16_t port_id, uint16_t qidx,
> >> + struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> >> + uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> >> +{
> >> + const unsigned int lcore = rte_lcore_id();
> >> + const union queue q = {.portid = port_id, .qid = qidx};
> >> + const bool empty = nb_rx == 0;
> >> + struct pmd_core_cfg *q_conf;
> >> +
> >> + q_conf = &lcore_cfg[lcore];
> >> +
> >> + /* early exit */
> >> + if (likely(!empty)) {
> >> + q_conf->empty_poll_stats = 0;
> >> + } else {
> >> + /* do we care about this particular queue? */
> >> + if (!queue_is_power_save(q_conf, &q))
> >> + return nb_rx;
> >
> > I still don't understand the need of 'special' power_save queue here...
> > Why we can't just have a function:
> >
> > get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(struct pmd_core_cfg *lcore_cfg),
> > and then just:
> >
> > /* all queues have at least EMPTYPOLL_MAX sequential empty polls */
> > if (get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(q_conf) == 0) {
> > /* go into power-save mode here */
> > }
>
> Okay, let's go through this step by step :)
>
> Let's suppose we have three queues - q0, q1 and q2. We want to sleep
> whenever there's no traffic on *all of them*, however we cannot know
> that until we have checked all of them.
>
> So, let's suppose that q0, q1 and q2 were empty all this time, but now
> some traffic arrived at q2 while we're still checking q0. We see that q0
> is empty, and all of the queues were empty for the last N polls, so we
> think we will be safe to sleep at q0 despite the fact that traffic has
> just arrived at q2.
> This is not an issue with MONITOR mode because we will be able to see if
> current Rx ring descriptor is busy or not via the NIC callback, *but
> this is not possible* with PAUSE and SCALE modes, because they don't
> have the sneaky lookahead function of MONITOR! So, with PAUSE and SCALE
> modes, it is possible to end up in a situation where you *think* you
> don't have any traffic, but you actually do, you just haven't checked
> the relevant queue yet.
I think such situation is unavoidable.
Yes, traffic can arrive to *any* queue at *any* time.
With your example above - user choose q2 as 'special' queue, but
traffic actually arrives on q0 or q1.
And yes, if user choose PAUSE or SCALE methods he *can* miss the traffic,
because as you said for these methods there is no notification mechanisms.
I think there are just unavoidable limitations with these power-save methods.
> In order to prevent this from happening, we do not sleep on every queue,
> instead we sleep *once* per loop.
Yes, totally agree we shouldn't sleep on *every* queue.
We need to go to sleep when there is no traffic on *any* of queues we monitor.
> That is, we check q0, check q1, check
> q2, and only then we decide whether we want to sleep or not.
> Of course, with such scheme it is still possible to e.g. sleep in q2
> while there's traffic waiting in q0,
Yes, exactly.
> but worst case is less bad with
> this scheme, because we'll be doing at worst 1 extra sleep.
Hmm, I think it would be one extra sleep anyway.
> Whereas with what you're suggesting, if we had e.g. 10 queues to poll,
> and we checked q1 but traffic has just arrived at q0, we'll be sleeping
> at q1, then we'll be sleeping at q2, then we'll be sleeping at q3, then
> we'll be sleeping at q4, then we'll be sleeping at q5.... and 9 sleeps
> later we finally reach q0 and find out after all this time that we
> shouldn't have slept in the first place.
Ah ok, I think I understand now what you are saying.
Sure, to avoid such situation, we'll need to maintain extra counters and
update them properly when we go to sleep.
I should state it clearly at the beginning.
It might be easier to explain what I meant by code snippet:
lcore_conf needs 2 counters:
uint64_t nb_queues_ready_to_sleep;
uint64_t nb_sleeps;
Plus each queue needs 2 counters:
uint64_t nb_empty_polls;
uint64_t nb_sleeps;
Now, at rx_callback():
/* check did sleep happen since previous call,
if yes, then reset queue counters */
if (queue->nb_sleeps != lcore_conf->nb_sleeps) {
queue->nb_sleeps = lcore_conf->nb_sleeps;
queue->nb_empty_polls = 0;
}
/* packet arrived, reset counters */
if (nb_rx != 0) {
/* queue is not 'ready_to_sleep' any more */
if (queue->nb_empty_polls > EMPTYPOLL_MAX)
lcore_conf-> nb_queues_ready_to_sleep--;
queue->nb_empty_polls = 0;
/* empty poll */
} else {
/* queue reaches EMPTYPOLL_MAX threshold, mark it as 'ready_to_sleep' */
if (queue->nb_empty_polls == EMPTYPOLL_MAX)
lcore_conf-> nb_queues_ready_to_sleep++;
queue->nb_empty_polls++;
}
/* no traffic on any queue for at least EMPTYPOLL_MAX iterations */
if (lcore_conf-> nb_queues_ready_to_sleep == lcore_conf->n_queues) {
/* update counters and sleep */
lcore_conf->nb_sleeps++;
lcore_conf-> nb_queues_ready_to_sleep = 0;
goto_sleep();
}
}
> Hopefully you get the point now :)
>
> So, the idea here is, for any N queues, sleep only once, not N times.
>
> >
> >> +
> >> + /*
> >> + * we can increment unconditionally here because if there were
> >> + * non-empty polls in other queues assigned to this core, we
> >> + * dropped the counter to zero anyway.
> >> + */
> >> + q_conf->empty_poll_stats++;
> >> + if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> >> + struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
> >
> > I think you need here:
> > struct rte_power_monitor_cond pmc[q_conf->n_queues];
>
> I think VLA's are generally agreed upon to be unsafe, so i'm avoiding
> them here.
Wonder why?
These days DPDK uses VLA in dozens of places...
But if you'd like to avoid VLA - you can use alloca(),
or have lcore_conf->pmc[] and realloc() it when new queue is
added/removed from the list.
>
> >
> >
> >> + uint16_t ret;
> >> +
> >> + /* gather all monitoring conditions */
> >> + ret = get_monitor_addresses(q_conf, pmc);
> >> +
> >> + if (ret == 0)
> >> + rte_power_monitor_multi(pmc,
> >> + q_conf->n_queues, UINT64_MAX);
> >> + }
> >> + }
> >> +
> >> + return nb_rx;
> >> +}
> >> +
> >> static uint16_t
> >> clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> >> uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> >> @@ -348,14 +408,19 @@ static int
> >> check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
> >> {
> >> struct rte_power_monitor_cond dummy;
> >> + bool multimonitor_supported;
> >>
> >> /* check if rte_power_monitor is supported */
> >> if (!global_data.intrinsics_support.power_monitor) {
> >> RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> >> return -ENOTSUP;
> >> }
> >> + /* check if multi-monitor is supported */
> >> + multimonitor_supported =
> >> + global_data.intrinsics_support.power_monitor_multi;
> >>
> >> - if (cfg->n_queues > 0) {
> >> + /* if we're adding a new queue, do we support multiple queues? */
> >> + if (cfg->n_queues > 0 && !multimonitor_supported) {
> >> RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
> >> return -ENOTSUP;
> >> }
> >> @@ -371,6 +436,13 @@ check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
> >> return 0;
> >> }
> >>
> >> +static inline rte_rx_callback_fn
> >> +get_monitor_callback(void)
> >> +{
> >> + return global_data.intrinsics_support.power_monitor_multi ?
> >> + clb_multiwait : clb_umwait;
> >> +}
> >> +
> >> int
> >> rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
> >> uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
> >> @@ -434,7 +506,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
> >> if (ret < 0)
> >> goto end;
> >>
> >> - clb = clb_umwait;
> >> + clb = get_monitor_callback();
> >> break;
> >> case RTE_POWER_MGMT_TYPE_SCALE:
> >> /* check if we can add a new queue */
> >> --
> >> 2.25.1
> >
>
>
> --
> Thanks,
> Anatoly
More information about the dev
mailing list