[dpdk-dev] [PATCH v2] examples/distributor: detect high frequency cores
Hunt, David
david.hunt at intel.com
Thu Mar 28 16:20:38 CET 2019
On 28/3/2019 3:10 PM, Burakov, Anatoly wrote:
> On 28-Mar-19 2:42 PM, Hunt, David wrote:
>>
>> On 28/3/2019 1:58 PM, Burakov, Anatoly wrote:
>>> On 28-Mar-19 1:13 PM, David Hunt wrote:
>>>> The distributor application is bottlenecked by the distributor core,
>>>> so if we can give more frequency to this core, then the overall
>>>> performance of the application may increase.
>>>>
>>>> This patch uses the rte_power_get_capabilities() API to query the
>>>> cores provided in the core mask, and if any high frequency cores are
>>>> found (e.g. Turbo Boost is enabled), we will pin the distributor
>>>> workload to that core.
>>>>
>>>> Signed-off-by: Liang Ma <liang.j.ma at intel.com>
>>>> Signed-off-by: David Hunt <david.hunt at intel.com>
>>>> ---
>>>
>>> <...>
>>>
>>>> + if (power_lib_initialised)
>>>> + rte_power_exit(rte_lcore_id());
>>>> printf("\nCore %u exiting tx task.\n", rte_lcore_id());
>>>> return 0;
>>>> }
>>>> @@ -575,9 +582,35 @@ lcore_worker(struct lcore_params *p)
>>>> if (num > 0)
>>>> app_stats.worker_bursts[p->worker_id][num-1]++;
>>>> }
>>>> + if (power_lib_initialised)
>>>> + rte_power_exit(rte_lcore_id());
>>>> + rte_free(p);
>>>> return 0;
>>>> }
>>>> +static int
>>>> +init_power_library(void)
>>>> +{
>>>> + int ret = 0, lcore_id;
>>>> + RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>>>> + if (rte_lcore_is_enabled(lcore_id)) {
>>>
>>> Please correct me if i'm wrong, but RTE_LCORE_FOREACH_SLAVE already
>>> checks if the lcore is enabled.
>>
>>
>> You're correct, I'll fix in next version.
>>
>>
>>>
>>> <...>
>>>
>>>> + if (power_lib_initialised) {
>>>> + /*
>>>> + * Here we'll pre-assign lcore ids to the rx, tx and
>>>> + * distributor workloads if there's higher frequency
>>>> + * on those cores e.g. if Turbo Boost is enabled.
>>>> + * It's also worth mentioning that it will assign cores in a
>>>> + * specific order, so that if there's less than three
>>>> + * available, the higher frequency cores will go to the
>>>> + * distributor first, then rx, then tx.
>>>> + */
>>>> + RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>>>> +
>>>> + rte_power_get_capabilities(lcore_id, &lcore_cap);
>>>> +
>>>> + if (lcore_cap.turbo == 1) {
>>>> + priority_num++;
>>>> + switch (priority_num) {
>>>> + case 1:
>>>> + distr_core_id = lcore_id;
>>>> + printf("Distributor on priority core %d\n",
>>>> + lcore_id);
>>>> + break;
>>>> + case 2:
>>>> + rx_core_id = lcore_id;
>>>> + printf("Rx on priority core %d\n",
>>>> + lcore_id);
>>>> + break;
>>>> + case 3:
>>>> + tx_core_id = lcore_id;
>>>> + printf("Tx on priority core %d\n",
>>>> + lcore_id);
>>>> + break;
>>>> + default:
>>>> + break;
>>>> + }
>>>
>>> This seems to be doing the same thing as right below (assigning
>>> lcore id's in order), yet in one case you use a switch, and in the
>>> other you use a simple loop. I don't see priority_num used anywhere
>>> else, so you might as well simplify this loop to be similar to what
>>> you have below, with "skip-if-not-turbo, if not assigned,
>>> assign-and-continue" type flow.
>>
>>
>> There doing different things. The loop with the switch is looking for
>> up to three priority cores, and storing those choices in
>> distr_core_id, tx_core_id and rx_core_id. This is because we don't
>> know which are the priority cores ahead of time. priority_num is used
>> in the switch statement, and when it finds a priority core, it
>> increments, so we know which variable to assign with the next
>> available priority core. Imagine we have turbo enabled on cores 2,4
>> and 6. That's what I'm trying to solve here.
>>
>> Then, when we get to the next loop, we're just assigning the
>> non-priority cores if the three key workloads have not already been
>> assigned a core, hence the simple loop, using the remaining cores.
>>
>> I looked at simplifying the flow, but as far as I can see, I need two
>> stages, a 'discovery' for the priority cores first, then whatever is
>> left can be done in a normal loop.
>>
>> Does that make sense, or my I missing an obvious refactor opportunity?
>
> I don't see how this is different from what you're doing below.
>
> You are looping over cores, checking if it's a priority core, and
> assigning any priority cores found to distributor, Rx and Tx cores, in
> that order.
>
> Below, you're looping over cores, checking if the core is already
> assigned, and assigning these cores to distributor, Rx and Tx cores,
> in that order.
>
> So, the only tangible difference between the two is 1) the check for
> whether the cores are already assigned (you don't need that because
> these cores *cannot* be attached - you haven't looped over them yet!),
> and 2) check for whether the core is priority.
>
> Just to clarify: i'm not saying merge the two loops, that can't work
> :) I'm saying, drop the switch and rewrite it like this:
>
Ah, gocha now. Will do in next rev. Thanks!
> for (cores) {
> if (core not priority)
> continue;
> if (dist_core not assigned) {
> assign dist core;
> continue;
> }
> if (rx core not assigned) {
> assign rx core;
> continue;
> }
> if (tx core not assigned) {
> assign tx core;
> continue;
> }
> }
>
> The functionality would be equivalent to the current switch method,
> but the code would be much clearer :)
>
>>
>>
>>>
>>> Once that is fixed,
>>>
>>> Reviewed-by: Anatoly Burakov <anatoly.burakov at intel.com>
>>>
>>
>
>
More information about the dev
mailing list