[dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores

Tan, Jianfeng jianfeng.tan at intel.com
Thu Mar 10 02:36:33 CET 2016



On 3/10/2016 3:33 AM, Ananyev, Konstantin wrote:
>
>>>>>>>>>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>>>>>>>>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
>>>>>>>>>>>> This patch adds option, --avail-cores, to use lcores which are
>>>>>>>>>>>> available
>>>>>>>>>>>> by calling pthread_getaffinity_np() to narrow down detected cores
>>>>>>>>>>>> before
>>>>>>>>>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores).
>>>>>>>>>>>>
>>>>>>>>>>>> Test example:
>>>>>>>>>>>> $ taskset 0xc0000 ./examples/helloworld/build/helloworld \
>>>>>>>>>>>>             --avail-cores -m 1024
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>>>>>>>>>>>> Acked-by: Neil Horman <nhorman at tuxdriver.com>
>>>>>>>>>>> Hmm, to me this sounds like something that should be done always so
>>>>>>>>>>> there's no need for an option. Or if there's a chance it might do the
>>>>>>>>>>> wrong thing in some rare circumstance then perhaps there should be a
>>>>>>>>>>> disabler option instead?
>>>>>>>>>> Thanks for comments.
>>>>>>>>>>
>>>>>>>>>> Yes, there's a use case that we cannot handle.
>>>>>>>>>>
>>>>>>>>>> If we make it the default, DPDK applications may fail to start when the
>>>>>>>>>> user specifies a core in isolcpus and the parent process (say bash) has
>>>>>>>>>> a cpuset affinity that excludes isolcpus. Originally, DPDK applications
>>>>>>>>>> just blindly call pthread_setaffinity_np(), and it always succeeds
>>>>>>>>>> because the application has root privilege to change any cpu affinity.
>>>>>>>>>>
>>>>>>>>>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>>>>>>>>>> flagged as undetected (as in my older implementation), which leads to
>>>>>>>>>> failure. To make it work, we would always have to add "taskset mask"
>>>>>>>>>> (or some other way) before DPDK application command lines.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>> I still think it sounds like something that should be done by default
>>>>>>>>> and maybe be overridable with some flag, rather than the other way
>>>>>>>>> around. Another alternative might be detecting the cores always but if
>>>>>>>>> running as root, override but with a warning.
>>>>>>>> For your second solution: can only root set affinity to isolcpus?
>>>>>>>> Your first solution seems promising to me.
>>>>>>>>
>>>>>>>>> But I don't know, just wondering. To look at it from another angle: why
>>>>>>>>> would somebody use this new --avail-cores option and in what
>>>>>>>>> situation, if things "just work" otherwise anyway?
>>>>>>>> For DPDK applications, the most common way to initialize DPDK is like
>>>>>>>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
>>>>>>>> to specify which cores to run on and how much hugepage memory to use.
>>>>>>>> Suppose we need this dpdk-app to run in a container: users have already
>>>>>>>> given that information when they built the cgroup for it to run in.
>>>>>>>> This option (this patch) makes DPDK smart enough to discover how much
>>>>>>>> resource it can use. Does that make sense?
>>>>>>> But then, all we need might be just a script that would extract this information from the system
>>>>>>> and form a proper cmdline parameter for DPDK?
>>>>>> Yes, a script will work. Or to construct (argc, argv) to call
>>>>>> rte_eal_init() in the application. But as Neil Horman once suggested, a
>>>>>> simple pthread_getaffinity_np() will get all things done. So is it worth
>>>>>> a patch here?
>>>>> Don't know...
>>>>> Personally I would prefer not to put extra logic inside EAL.
>>>>> For me - there are too many different options already.
>>>> Then how about making it the default in rte_eal_cpu_init()? It is already
>>>> known that this will cause trouble for isolcpus users: they need to
>>>> add "taskset [mask]" before starting a DPDK app.
>>> As I said - provide a script?
>> Yes. But what I want to say is that this script is hard to get right if
>> there are different kinds of limitations. (That rarely happens, though :-) )
> My thought was to keep dpdk code untouched, i.e. let it still blindly call pthread_setaffinity_np()
> based on the input parameters, and in addition provide a script for those who want to run
> in '--avail-cores' mode.
> So it could do 'taskset -p $$' and then either form the -c parameter list for the app,
> or check the existing -c/-l/--lcores parameter and complain if a disallowed pcpu is detected.
> But ok, maybe it is easier and more convenient to have this logic inside EAL
> than in a separate script.
>
>>> The same might go for the amount of hugepage memory available to the user?
>> Ditto. There are limitations like hugetlbfs quota and cgroup hugetlb, and
>> some memory is used by the app itself (more like an artificial argument) ...
>>>>>    From the other side, looking at the patch itself:
>>>>> You are updating lcore_count and lcore_config[] based on physical cpu availability,
>>>>> but these days it is not always a one-to-one mapping between EAL lcore and physical cpu.
>>>>> Shouldn't that be taken into account?
>>>> I have not seen the problem so far, because this work is done before
>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores). If a core
>>>> is disabled here, it's like it is not detected in rte_eal_cpu_init(). Or
>>>> could you please give more hints?
>>> I didn't test the changes, so probably I am missing something.
>>> Let's say a user is allowed to use only cpus 0-3.
>>> If he would run with:
>>>    --avail-cores  --lcores='(1-7)@2',
>>> then only lcores 1-3 would be started.
>>> Again, if the user would specify '2@(1-7)', it would also go undetected
>>> that cpus 4-7 are not available to the user.
>>> Is that so?
>> After reading the code:
>> For case --lcores='(1-7)@2', lcores 1-7 would be started and bound to
>> pcore 2.
>> For case --lcores='2@(1-7)', this will fail with "core 4 unavailable".
>>
>> It's because:
>> a. a 1:1 mapping is built up first, and each lcore is flagged as detected
>> if its pcore is found in sysfs. (ROLE_RTE, cpuset, detected is true)
>> b. at the beginning of eal_parse_lcores(), the lcore config is reset.
>> (ROLE_OFF, cpuset is empty, detected is still true)
>> c. the pcore cpuset is then checked by convert_to_cpuset using the previous
>> "detected" value.
> Ok, my bad then - I misunderstood the code.
> Thanks for explanation.
> So if I get it right now: first, inside lib/librte_eal/common/eal_common_lcore.c,
> both lcore_count and lcore_config relate to the pcpus.
> Then later, in lib/librte_eal/common/eal_common_options.c,
> they are overwritten with lcore-related information,
> except lcore_config[].detected, which seems to be kept intact.
> Is that correct?

Yes, exactly. And I really appreciate you raising this question for
discussion.
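
To make steps a-c above concrete, here is a minimal toy model in C. It is
only a sketch of the behavior described in this thread, not the actual EAL
code: struct toy_lcore, toy_cpu_init(), toy_reset(), toy_map() and
toy_enabled_count() are hypothetical names, a plain 64-bit mask stands in
for cpu_set_t, and MAX_LCORE is an arbitrary small limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LCORE 8 /* toy limit, chosen only for this illustration */

/* Hypothetical, simplified stand-in for EAL's struct lcore_config. */
struct toy_lcore {
    bool detected;   /* set once at cpu detection, never reset afterwards */
    bool enabled;    /* stands in for ROLE_RTE (true) vs ROLE_OFF (false) */
    uint64_t cpuset; /* bitmask of pcpus this lcore may run on */
};

static struct toy_lcore cfg[MAX_LCORE];

/* Step a: build the 1:1 mapping; a pcpu is "detected" only if it is in avail. */
static void toy_cpu_init(uint64_t avail)
{
    for (unsigned i = 0; i < MAX_LCORE; i++) {
        cfg[i].detected = ((avail >> i) & 1) != 0;
        cfg[i].enabled = cfg[i].detected;
        cfg[i].cpuset = cfg[i].detected ? (UINT64_C(1) << i) : 0;
    }
}

/* Step b: reset role and cpuset, but leave "detected" intact. */
static void toy_reset(void)
{
    for (unsigned i = 0; i < MAX_LCORE; i++) {
        cfg[i].enabled = false;
        cfg[i].cpuset = 0;
    }
}

/*
 * Step c: map lcores [lfirst, llast] onto pcpus [pfirst, plast].
 * Only the pcpu side is checked against the earlier "detected" value.
 * Returns 0 on success, -1 at the first unavailable pcpu.
 */
static int toy_map(unsigned lfirst, unsigned llast,
                   unsigned pfirst, unsigned plast)
{
    uint64_t set = 0;

    for (unsigned p = pfirst; p <= plast; p++) {
        if (p >= MAX_LCORE || !cfg[p].detected) {
            printf("core %u unavailable\n", p);
            return -1;
        }
        set |= UINT64_C(1) << p;
    }
    for (unsigned l = lfirst; l <= llast && l < MAX_LCORE; l++) {
        cfg[l].enabled = true;
        cfg[l].cpuset = set;
    }
    return 0;
}

/* Count lcores that would be started. */
static unsigned toy_enabled_count(void)
{
    unsigned n = 0;

    for (unsigned i = 0; i < MAX_LCORE; i++)
        n += cfg[i].enabled ? 1 : 0;
    return n;
}
```

With toy_cpu_init(0xf) followed by toy_reset(), toy_map(1, 7, 2, 2) models
--lcores='(1-7)@2' and succeeds with seven lcores enabled, while
toy_map(2, 2, 1, 7) models --lcores='2@(1-7)' and stops at pcpu 4, in line
with the two test runs quoted below.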

>
>> I have tested it with the patch. The result matches the analysis above.
>> For case --lcores='(1-7)@2': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='(1-7)@2'
>> ...
>> hello from core 2
>> hello from core 3
>> hello from core 4
>> hello from core 5
>> hello from core 6
>> hello from core 7
>> hello from core 1
>>
>> For case --lcores='2@(1-7)': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='2@(1-7)'
>> ...
>> EAL: core 4 unavailable
>> EAL: invalid parameter for --lcores
>> ...
>>
>> One thing may be worth mentioning: should "detected" be maintained in struct
>> lcore_config? Maybe we need to maintain a separate data structure for pcores?
> Yes, it might be good to split pcpu and lcores information somehow,
> as it is a bit confusing right now.
> But I suppose this is a subject for another patch/discussion.

Yes, just another topic.

Thanks,
Jianfeng

> Konstantin
>
>


