[dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Ananyev, Konstantin konstantin.ananyev at intel.com
Thu Jun 24 16:57:17 CEST 2021



> >>>> I did a quick prototype for this, and i don't think it is going to work.
> >>>>
> >>>> Callbacks with just "current value" as argument will be pretty limited
> >>>> and will only really work for cases where we know what we are expecting.
> >>>> However, for cases like event/dlb or net/mlx5, the expected value is (or
> >>>> appears to be) dependent upon some internal device data, and is not
> >>>> constant like in case of net/ixgbe for example.
> >>>>
> >>>> This can be fixed by passing an opaque pointer, either by storing it in
> >>>> the monitor condition, or by passing it directly to rte_power_monitor at
> >>>> invocation time.
> >>>>
> >>>> The latter doesn't work well because when we call rte_power_monitor from
> >>>> inside the rte_power library, we lack the context necessary to get said
> >>>> opaque pointer.
> >>>>
> >>>> The former doesn't work either, because the only place where we can get
> >>>> this argument is inside get_monitor_addr, but the opaque pointer must
> >>>> persist after we exit that function in order to avoid use-after-free -
> >>>> which means that it either has to be statically allocated (which means
> >>>> it's not thread-safe for a non-trivial case), or dynamically allocated
> >>>> (which a big no-no on a hotpath).
> >>>
> >>> If I get you right, expected_value (and probably mask) can be variable ones.
> >>> So for callback approach to work we need to pass all this as parameters
> >>> to PMD comparison callback:
> >>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
> >>> Correct?
> >>
> >> If we have both expected value, mask, and current value, then what's the
> >> point of the callback? The point of the callback would be to pass just
> >> the current value, and let the callback decide what's the expected value
> >> and how to compare it.
> >
> > For me the main point of callback is to hide PMD specific comparison semantics.
> > Basically they provide us with some values in struct rte_power_monitor_cond,
> > and then it is up to them how to interpret them in their comparison function.
> > All we'll do for them: will read the value at address provided.
> > I understand that it looks like an overkill, as majority of these comparison functions
> > will be like:
> > int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
> > {
> >          return ((real_val & mask) == expected_val);
> > }
> > Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it.
> >
> >>
> >> So, we can either let callback handle expected values itself by having
> >> an opaque callback-specific argument (which means it has to persist
> >> between .get_monitor_addr() and rte_power_monitor() calls),
> >
> > But that's what we doing already - PMD fills rte_power_monitor_cond values
> > for us, we store them somewhere and then use them to decide should we go to sleep or not.
> > All callback does - moves actual values interpretation back to PMD:
> > Right now:
> > PMD:      provide PMC values
> > POWER: store PMC values somewhere
> >                  read the value at address provided in PMC
> >                  interpret PMC values and newly read value and make the decision
> >
> > With callback:
> > PMD:      provide PMC values
> > POWER: store PMC values somewhere
> >                  read the value at address provided in PMC
> > PMD:      interpret PMC values and newly read value and make the decision
> >
> > Or did you mean something different here?
> >
> >> or we do the
> >> comparisons inside rte_power_monitor(), and store the expected/mask
> >> values in the monitor condition, and *don't* have any callbacks at all.
> >> Are you suggesting an alternative to the above two options?
> >
> > As I said in my first mail - we can just replace 'inverse' with 'op'.
> > That at least will make this API extendable, if someone will need
> > something different in future.
> >
> > Another option is
> 
> Right, so the idea is store the PMD-specific data in the monitor
> condition, and leave it to the callback to interpret it.
> 
> The obvious question then is, how many values is enough? Two? Three?
> Four? This option doesn't really solve the basic issue, it just kicks
> the can down the road. I guess three values should be enough for
> everyone (tm) ? :D
> 
> I don't like the 'op' thing because if the goal is to be flexible, it's
> unnecessarily limiting *and* makes the API even more complex to use. I
> would rather have a number of PMD-specific values and leave it up to the
> callback to interpret them, because at least that way we're not limited
> to predefined operations on the monitor condition data.

Just to make sure we are talking about the same, does what you propose
looks like that:

 struct rte_power_monitor_cond {
        volatile void *addr;  /**< Address to monitor for changes */
        uint8_t size;    /**< Data size (in bytes) that will be used to compare
                          *   expected value (`val`) with data read from the
                          *   monitored memory location (`addr`). Can be 1, 2,
                          *   4, or 8. Supplying any other value will result in
                          *   an error.  
                          */
        int (*cmp)(uint64_t real_value, const uint64_t opaque[4]);
        uint64_t opaque[4];  /*PMD specific data, used by comparison call-back below */
};

And then in rte_power_monitor():
...
uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size);
if (pmc->cmp(cur_value, pmc->opaque) != 0) {
    /* goto sleep */
}

?
 


More information about the dev mailing list