Bug 1137 - CPU affinity set incorrectly when lcore_id 0 is not the master-lcore
Summary: CPU affinity set incorrectly when lcore_id 0 is not the master-lcore
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: core
Version: 22.11
Hardware: All Linux
Importance: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2022-11-30 19:41 CET by Lawrence Troup
Modified: 2023-04-04 18:29 CEST

Attachments
Logs showing incorrect CPU set assignment (3.29 KB, text/plain)
2022-11-30 19:41 CET, Lawrence Troup
Proposed fix for issue (732 bytes, patch)
2022-12-01 10:43 CET, Lawrence Troup

Description Lawrence Troup 2022-11-30 19:41:16 CET
Created attachment 233
Logs showing incorrect CPU set assignment

When a range of CPUs is used (e.g. 0-3) and the master-lcore is set to a non-zero value, the CPU affinity for lcore-id 0 is set incorrectly, because its cpuset is overwritten during control-thread creation.

CPU arguments passed are '-c f --master-lcore 3', to indicate that CPUs 0-3 should be used, with the master on CPU 3. In particular, DPDK itself is initialized from CPU 3.
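
For readers unfamiliar with the coremask form, the sketch below decodes a hexadecimal mask such as 'f' into the CPU indices it selects. It is only an illustration: coremask_to_cpus() is a hypothetical helper written for this report, not a DPDK function, and it handles only masks that fit in an unsigned long.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of DPDK): decode a hex coremask string
 * into the list of CPU indices whose bits are set.
 * Returns the number of CPUs written to cpus[], or -1 if the string is
 * not a valid hex number.
 */
int coremask_to_cpus(const char *hex, int *cpus, int max_cpus)
{
    char *end;
    unsigned long mask;
    int n = 0;

    assert(hex != NULL && cpus != NULL);

    mask = strtoul(hex, &end, 16);
    if (end == hex || *end != '\0')
        return -1; /* nothing parsed, or trailing garbage */

    /* Walk the mask from bit 0 upwards, recording each set bit. */
    for (int bit = 0; mask != 0 && n < max_cpus; bit++, mask >>= 1) {
        if (mask & 1UL)
            cpus[n++] = bit;
    }
    return n;
}
```

Calling coremask_to_cpus("f", cpus, 8) fills cpus with 0, 1, 2, 3, which matches the "CPUs 0-3" reading of '-c f' given above.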

When the control threads (eal-intr-thread, rte_mp_handle) are created, they are created from CPU3, so they inherit a cpuset containing just this CPU. When calling __rte_thread_init(), ctrl_thread_init() passes the result of rte_lcore_id(), but this has not yet been initialized for the new thread, so it is 0.

This means that internally the lcore_id for the control thread is set to 0. In particular, the call to thread_update_affinity() overwrites the cpuset for lcore_id=0 with the cpuset of CPU3:

memmove(&lcore_config[lcore_id].cpuset, cpusetp,
			sizeof(rte_cpuset_t));

This all occurs before the main __rte_thread_init() call for each slave thread, so the slave thread associated with lcore_id 0, which should be running on CPU0, instead has its affinity incorrectly set to CPU3.
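
The overwrite can be reproduced in miniature without DPDK. The sketch below is a simplified model, not DPDK code: cpuset_model_t, lcore_table, struct ctrl_thread_model and simulate_overwrite() are hypothetical names, and the zero-valued lcore_id field stands in for the uninitialized rte_lcore_id() result described above. Each cpuset is modelled as a plain bitmask.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_LCORE 4

/* Hypothetical stand-in for rte_cpuset_t: one bit per CPU. */
typedef struct { unsigned long mask; } cpuset_model_t;

/* Hypothetical stand-in for the per-lcore configuration table. */
static cpuset_model_t lcore_table[MODEL_MAX_LCORE];

/* Model of a freshly created control thread. */
struct ctrl_thread_model {
    unsigned lcore_id;        /* never assigned, so it reads as 0 */
    cpuset_model_t inherited; /* cpuset inherited from the creating thread */
};

/* Same shape as the memmove quoted in the report. */
static void update_affinity_model(unsigned lcore_id, const cpuset_model_t *set)
{
    memmove(&lcore_table[lcore_id], set, sizeof(cpuset_model_t));
}

/*
 * Set up lcores 0..3 pinned to CPUs 0..3, then let a control thread
 * created from main_lcore run its init. Returns the resulting cpuset
 * mask recorded for lcore 0.
 */
unsigned long simulate_overwrite(unsigned main_lcore)
{
    assert(main_lcore < MODEL_MAX_LCORE);

    for (unsigned i = 0; i < MODEL_MAX_LCORE; i++)
        lcore_table[i].mask = 1UL << i;

    struct ctrl_thread_model ctrl = {
        .lcore_id = 0,
        .inherited = { 1UL << main_lcore },
    };

    /* The control thread records its cpuset under lcore_id 0. */
    update_affinity_model(ctrl.lcore_id, &ctrl.inherited);

    return lcore_table[0].mask;
}
```

In this model simulate_overwrite(3) returns 0x8 (CPU3 only) instead of 0x1, mirroring the wrong affinity seen for lcore 0, while simulate_overwrite(0) returns 0x1, which is why the problem stays hidden when the master lcore is 0.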

RTE logs are attached showing this behavior (and including some additional logs added locally to print the lcore-id and cpusets being passed).

The fix for this should be to make ctrl_thread_init() more similar to rte_thread_register(), so that it calls eal_lcore_non_eal_allocate() to assign an lcore-id, then passes this to __rte_thread_init(). I have tested a fix for this locally to confirm.
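
The idea behind that fix can be sketched with a small standalone model. This is not the attached patch and not DPDK's implementation: the slot table, the role enum and the functions setup_eal_lcores(), alloc_non_eal_lcore_model(), ctrl_thread_init_fixed() and lcore_mask() are all hypothetical, and returning MODEL_MAX_LCORE on exhaustion is an assumption made for the sketch. It only shows that giving the control thread its own free slot leaves lcore 0 untouched.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_LCORE 8

/* Hypothetical roles for a slot in the model table. */
enum model_role { MODEL_ROLE_OFF = 0, MODEL_ROLE_EAL, MODEL_ROLE_NON_EAL };

struct lcore_slot {
    enum model_role role;
    unsigned long cpuset_mask; /* one bit per CPU */
};

static struct lcore_slot slots[MODEL_MAX_LCORE];

/* Mark lcores 0..n-1 as EAL lcores, each pinned to the CPU of the same index. */
void setup_eal_lcores(unsigned n)
{
    assert(n <= MODEL_MAX_LCORE);
    memset(slots, 0, sizeof(slots)); /* every slot starts as MODEL_ROLE_OFF */
    for (unsigned i = 0; i < n; i++) {
        slots[i].role = MODEL_ROLE_EAL;
        slots[i].cpuset_mask = 1UL << i;
    }
}

/* Claim the first unused slot; MODEL_MAX_LCORE means none is free (assumption). */
unsigned alloc_non_eal_lcore_model(void)
{
    for (unsigned i = 0; i < MODEL_MAX_LCORE; i++) {
        if (slots[i].role == MODEL_ROLE_OFF) {
            slots[i].role = MODEL_ROLE_NON_EAL;
            return i;
        }
    }
    return MODEL_MAX_LCORE;
}

/*
 * Control-thread init with the proposed ordering: allocate an id first,
 * then record the inherited cpuset under that id.
 */
unsigned ctrl_thread_init_fixed(unsigned creator_cpu)
{
    unsigned id = alloc_non_eal_lcore_model();

    if (id == MODEL_MAX_LCORE)
        return MODEL_MAX_LCORE; /* no slot: touch nothing */

    slots[id].cpuset_mask = 1UL << creator_cpu;
    return id;
}

unsigned long lcore_mask(unsigned id)
{
    assert(id < MODEL_MAX_LCORE);
    return slots[id].cpuset_mask;
}
```

With setup_eal_lcores(4) followed by ctrl_thread_init_fixed(3), the control thread lands in slot 4 with mask 0x8, and lcore_mask(0) is still 0x1.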
Comment 1 Lawrence Troup 2022-12-01 10:43:21 CET
Created attachment 234
Proposed fix for issue

The attached diff fixes the issue by allocating a non-EAL lcore-id for control threads to use.
