[dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Jun 24 17:25:55 CEST 2021


> >>>>>> I did a quick prototype for this, and i don't think it is going to work.
> >>>>>>
> >>>>>> Callbacks with just "current value" as argument will be pretty limited
> >>>>>> and will only really work for cases where we know what we are expecting.
> >>>>>> However, for cases like event/dlb or net/mlx5, the expected value is (or
> >>>>>> appears to be) dependent upon some internal device data, and is not
> >>>>>> constant like in case of net/ixgbe for example.
> >>>>>>
> >>>>>> This can be fixed by passing an opaque pointer, either by storing it in
> >>>>>> the monitor condition, or by passing it directly to rte_power_monitor at
> >>>>>> invocation time.
> >>>>>>
> >>>>>> The latter doesn't work well because when we call rte_power_monitor from
> >>>>>> inside the rte_power library, we lack the context necessary to get said
> >>>>>> opaque pointer.
> >>>>>>
> >>>>>> The former doesn't work either, because the only place where we can get
> >>>>>> this argument is inside get_monitor_addr, but the opaque pointer must
> >>>>>> persist after we exit that function in order to avoid use-after-free -
> >>>>>> which means that it either has to be statically allocated (which means
> >>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated
> >>>>>> (which a big no-no on a hotpath).
> >>>>>
> >>>>> If I get you right, expected_value (and probably mask) can be variable ones.
> >>>>> So for callback approach to work we need to pass all this as parameters
> >>>>> to PMD comparison callback:
> >>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
> >>>>> Correct?
> >>>>
> >>>> If we have both expected value, mask, and current value, then what's the
> >>>> point of the callback? The point of the callback would be to pass just
> >>>> the current value, and let the callback decide what's the expected value
> >>>> and how to compare it.
> >>>
> >>> For me the main point of callback is to hide PMD specific comparison semantics.
> >>> Basically they provide us with some values in struct rte_power_monitor_cond,
> >>> and then it is up to them how to interpret them in their comparison function.
> >>> All we'll do for them: will read the value at address provided.
> >>> I understand that it looks like an overkill, as majority of these comparison functions
> >>> will be like:
> >>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
> >>> {
> >>>           return ((real_val & mask) == expected_val);
> >>> }
> >>> Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it.
> >>>
> >>>>
> >>>> So, we can either let callback handle expected values itself by having
> >>>> an opaque callback-specific argument (which means it has to persist
> >>>> between .get_monitor_addr() and rte_power_monitor() calls),
> >>>
> >>> But that's what we doing already - PMD fills rte_power_monitor_cond values
> >>> for us, we store them somewhere and then use them to decide should we go to sleep or not.
> >>> All callback does - moves actual values interpretation back to PMD:
> >>> Right now:
> >>> PMD:      provide PMC values
> >>> POWER: store PMC values somewhere
> >>>                   read the value at address provided in PMC
> >>>                   interpret PMC values and newly read value and make the decision
> >>>
> >>> With callback:
> >>> PMD:      provide PMC values
> >>> POWER: store PMC values somewhere
> >>>                   read the value at address provided in PMC
> >>> PMD:      interpret PMC values and newly read value and make the decision
> >>>
> >>> Or did you mean something different here?
> >>>
> >>>> or we do the
> >>>> comparisons inside rte_power_monitor(), and store the expected/mask
> >>>> values in the monitor condition, and *don't* have any callbacks at all.
> >>>> Are you suggesting an alternative to the above two options?
> >>>
> >>> As I said in my first mail - we can just replace 'inverse' with 'op'.
> >>> That at least will make this API extendable, if someone will need
> >>> something different in future.
> >>>
> >>> Another option is
> >>
> >> Right, so the idea is store the PMD-specific data in the monitor
> >> condition, and leave it to the callback to interpret it.
> >>
> >> The obvious question then is, how many values is enough? Two? Three?
> >> Four? This option doesn't really solve the basic issue, it just kicks
> >> the can down the road. I guess three values should be enough for
> >> everyone (tm) ? :D
> >>
> >> I don't like the 'op' thing because if the goal is to be flexible, it's
> >> unnecessarily limiting *and* makes the API even more complex to use. I
> >> would rather have a number of PMD-specific values and leave it up to the
> >> callback to interpret them, because at least that way we're not limited
> >> to predefined operations on the monitor condition data.
> >
> > Just to make sure we are talking about the same, does what you propose
> > looks like that:
> >
> >   struct rte_power_monitor_cond {
> >          volatile void *addr;  /**< Address to monitor for changes */
> >          uint8_t size;    /**< Data size (in bytes) that will be used to compare
> >                            *   expected value (`val`) with data read from the
> >                            *   monitored memory location (`addr`). Can be 1, 2,
> >                            *   4, or 8. Supplying any other value will result in
> >                            *   an error.
> >                            */
> >          int (*cmp)(uint64_t real_value, const uint64_t opaque[4]);
> >          uint64_t opaque[4];  /*PMD specific data, used by comparison call-back below */
> > };
> >
> > And then in rte_power_monitor():
> > ...
> > uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size);
> > if (pmc->cmp(cur_value, pmc->opaque) != 0) {
> >      /* goto sleep */
> > }
> >
> > ?
> >
> 
> Something like that, yes.
> 

Seems reasonable to me.
Thanks
Konstantin


More information about the dev mailing list