[dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion

Burakov, Anatoly anatoly.burakov at intel.com
Thu Jun 24 17:54:03 CEST 2021


On 24-Jun-21 4:25 PM, Ananyev, Konstantin wrote:
> 
>>>>>>>> I did a quick prototype for this, and i don't think it is going to work.
>>>>>>>>
>>>>>>>> Callbacks with just "current value" as argument will be pretty limited
>>>>>>>> and will only really work for cases where we know what we are expecting.
>>>>>>>> However, for cases like event/dlb or net/mlx5, the expected value is (or
>>>>>>>> appears to be) dependent upon some internal device data, and is not
>>>>>>>> constant like in case of net/ixgbe for example.
>>>>>>>>
>>>>>>>> This can be fixed by passing an opaque pointer, either by storing it in
>>>>>>>> the monitor condition, or by passing it directly to rte_power_monitor at
>>>>>>>> invocation time.
>>>>>>>>
>>>>>>>> The latter doesn't work well because when we call rte_power_monitor from
>>>>>>>> inside the rte_power library, we lack the context necessary to get said
>>>>>>>> opaque pointer.
>>>>>>>>
>>>>>>>> The former doesn't work either, because the only place where we can get
>>>>>>>> this argument is inside get_monitor_addr, but the opaque pointer must
>>>>>>>> persist after we exit that function in order to avoid use-after-free -
>>>>>>>> which means that it either has to be statically allocated (which means
>>>>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated
>>>>>>>> (which a big no-no on a hotpath).
>>>>>>>
>>>>>>> If I get you right, expected_value (and probably mask) can be variable ones.
>>>>>>> So for callback approach to work we need to pass all this as parameters
>>>>>>> to PMD comparison callback:
>>>>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
>>>>>>> Correct?
>>>>>>
>>>>>> If we have both expected value, mask, and current value, then what's the
>>>>>> point of the callback? The point of the callback would be to pass just
>>>>>> the current value, and let the callback decide what's the expected value
>>>>>> and how to compare it.
>>>>>
>>>>> For me the main point of callback is to hide PMD specific comparison semantics.
>>>>> Basically they provide us with some values in struct rte_power_monitor_cond,
>>>>> and then it is up to them how to interpret them in their comparison function.
>>>>> All we'll do for them: will read the value at address provided.
>>>>> I understand that it looks like an overkill, as majority of these comparison functions
>>>>> will be like:
>>>>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask)
>>>>> {
>>>>>            return ((real_val & mask) == expected_val);
>>>>> }
>>>>> Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it.
>>>>>
>>>>>>
>>>>>> So, we can either let callback handle expected values itself by having
>>>>>> an opaque callback-specific argument (which means it has to persist
>>>>>> between .get_monitor_addr() and rte_power_monitor() calls),
>>>>>
>>>>> But that's what we doing already - PMD fills rte_power_monitor_cond values
>>>>> for us, we store them somewhere and then use them to decide should we go to sleep or not.
>>>>> All callback does - moves actual values interpretation back to PMD:
>>>>> Right now:
>>>>> PMD:      provide PMC values
>>>>> POWER: store PMC values somewhere
>>>>>                    read the value at address provided in PMC
>>>>>                    interpret PMC values and newly read value and make the decision
>>>>>
>>>>> With callback:
>>>>> PMD:      provide PMC values
>>>>> POWER: store PMC values somewhere
>>>>>                    read the value at address provided in PMC
>>>>> PMD:      interpret PMC values and newly read value and make the decision
>>>>>
>>>>> Or did you mean something different here?
>>>>>
>>>>>> or we do the
>>>>>> comparisons inside rte_power_monitor(), and store the expected/mask
>>>>>> values in the monitor condition, and *don't* have any callbacks at all.
>>>>>> Are you suggesting an alternative to the above two options?
>>>>>
>>>>> As I said in my first mail - we can just replace 'inverse' with 'op'.
>>>>> That at least will make this API extendable, if someone will need
>>>>> something different in future.
>>>>>
>>>>> Another option is
>>>>
>>>> Right, so the idea is store the PMD-specific data in the monitor
>>>> condition, and leave it to the callback to interpret it.
>>>>
>>>> The obvious question then is, how many values is enough? Two? Three?
>>>> Four? This option doesn't really solve the basic issue, it just kicks
>>>> the can down the road. I guess three values should be enough for
>>>> everyone (tm) ? :D
>>>>
>>>> I don't like the 'op' thing because if the goal is to be flexible, it's
>>>> unnecessarily limiting *and* makes the API even more complex to use. I
>>>> would rather have a number of PMD-specific values and leave it up to the
>>>> callback to interpret them, because at least that way we're not limited
>>>> to predefined operations on the monitor condition data.
>>>
>>> Just to make sure we are talking about the same, does what you propose
>>> looks like that:
>>>
>>>    struct rte_power_monitor_cond {
>>>           volatile void *addr;  /**< Address to monitor for changes */
>>>           uint8_t size;    /**< Data size (in bytes) that will be used to compare
>>>                             *   expected value (`val`) with data read from the
>>>                             *   monitored memory location (`addr`). Can be 1, 2,
>>>                             *   4, or 8. Supplying any other value will result in
>>>                             *   an error.
>>>                             */
>>>           int (*cmp)(uint64_t real_value, const uint64_t opaque[4]);
>>>           uint64_t opaque[4];  /*PMD specific data, used by comparison call-back below */
>>> };
>>>
>>> And then in rte_power_monitor():
>>> ...
>>> uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size);
>>> if (pmc->cmp(cur_value, pmc->opaque) != 0) {
>>>       /* goto sleep */
>>> }
>>>
>>> ?
>>>
>>
>> Something like that, yes.
>>
> 
> Seems reasonable to me.
> Thanks
> Konstantin
> 

OK, i'll implement this in v2. Thanks for your input!

-- 
Thanks,
Anatoly


More information about the dev mailing list