[dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Burakov, Anatoly anatoly.burakov at intel.com
Thu Jun 24 17:04:45 CEST 2021


On 24-Jun-21 3:57 PM, Ananyev, Konstantin wrote:
> 
> 
>>>>>> I did a quick prototype for this, and i don't think it is going to work.
>>>>>>
>>>>>> Callbacks with just "current value" as argument will be pretty limited
>>>>>> and will only really work for cases where we know what we are expecting.
>>>>>> However, for cases like event/dlb or net/mlx5, the expected value is (or
>>>>>> appears to be) dependent upon some internal device data, and is not
>>>>>> constant like in case of net/ixgbe for example.
>>>>>>
>>>>>> This can be fixed by passing an opaque pointer, either by storing it in
>>>>>> the monitor condition, or by passing it directly to rte_power_monitor at
>>>>>> invocation time.
>>>>>>
>>>>>> The latter doesn't work well because when we call rte_power_monitor from
>>>>>> inside the rte_power library, we lack the context necessary to get said
>>>>>> opaque pointer.
>>>>>>
>>>>>> The former doesn't work either, because the only place where we can get
>>>>>> this argument is inside get_monitor_addr, but the opaque pointer must
>>>>>> persist after we exit that function in order to avoid use-after-free -
>>>>>> which means that it either has to be statically allocated (which means
>>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated
>>>>>> (which a big no-no on a hotpath).
>>>>>
>>>>> If I get you right, expected_value (and probably mask) can be variable ones.
>>>>> So for callback approach to work we need to pass all this as parameters
>>>>> to PMD comparison callback:
>>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
>>>>> Correct?
>>>>
>>>> If we have both expected value, mask, and current value, then what's the
>>>> point of the callback? The point of the callback would be to pass just
>>>> the current value, and let the callback decide what's the expected value
>>>> and how to compare it.
>>>
>>> For me the main point of callback is to hide PMD specific comparison semantics.
>>> Basically they provide us with some values in struct rte_power_monitor_cond,
>>> and then it is up to them how to interpret them in their comparison function.
>>> All we'll do for them: will read the value at address provided.
>>> I understand that it looks like an overkill, as majority of these comparison functions
>>> will be like:
>>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
>>> {
>>>           return ((real_val & mask) == expected_val);
>>> }
>>> Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it.
>>>
>>>>
>>>> So, we can either let callback handle expected values itself by having
>>>> an opaque callback-specific argument (which means it has to persist
>>>> between .get_monitor_addr() and rte_power_monitor() calls),
>>>
>>> But that's what we doing already - PMD fills rte_power_monitor_cond values
>>> for us, we store them somewhere and then use them to decide should we go to sleep or not.
>>> All callback does - moves actual values interpretation back to PMD:
>>> Right now:
>>> PMD:      provide PMC values
>>> POWER: store PMC values somewhere
>>>                   read the value at address provided in PMC
>>>                   interpret PMC values and newly read value and make the decision
>>>
>>> With callback:
>>> PMD:      provide PMC values
>>> POWER: store PMC values somewhere
>>>                   read the value at address provided in PMC
>>> PMD:      interpret PMC values and newly read value and make the decision
>>>
>>> Or did you mean something different here?
>>>
>>>> or we do the
>>>> comparisons inside rte_power_monitor(), and store the expected/mask
>>>> values in the monitor condition, and *don't* have any callbacks at all.
>>>> Are you suggesting an alternative to the above two options?
>>>
>>> As I said in my first mail - we can just replace 'inverse' with 'op'.
>>> That at least will make this API extendable, if someone will need
>>> something different in future.
>>>
>>> Another option is
>>
>> Right, so the idea is store the PMD-specific data in the monitor
>> condition, and leave it to the callback to interpret it.
>>
>> The obvious question then is, how many values is enough? Two? Three?
>> Four? This option doesn't really solve the basic issue, it just kicks
>> the can down the road. I guess three values should be enough for
>> everyone (tm) ? :D
>>
>> I don't like the 'op' thing because if the goal is to be flexible, it's
>> unnecessarily limiting *and* makes the API even more complex to use. I
>> would rather have a number of PMD-specific values and leave it up to the
>> callback to interpret them, because at least that way we're not limited
>> to predefined operations on the monitor condition data.
> 
> Just to make sure we are talking about the same, does what you propose
> looks like that:
> 
>   struct rte_power_monitor_cond {
>          volatile void *addr;  /**< Address to monitor for changes */
>          uint8_t size;    /**< Data size (in bytes) that will be used to compare
>                            *   expected value (`val`) with data read from the
>                            *   monitored memory location (`addr`). Can be 1, 2,
>                            *   4, or 8. Supplying any other value will result in
>                            *   an error.
>                            */
>          int (*cmp)(uint64_t real_value, const uint64_t opaque[4]);
>          uint64_t opaque[4];  /*PMD specific data, used by comparison call-back below */
> };
> 
> And then in rte_power_monitor():
> ...
> uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size);
> if (pmc->cmp(cur_value, pmc->opaque) != 0) {
>      /* goto sleep */
> }
> 
> ?
> 

Something like that, yes.

-- 
Thanks,
Anatoly


More information about the dev mailing list