[dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file
Yuanhan Liu
yuanhan.liu at linux.intel.com
Mon Mar 7 14:13:22 CET 2016
CC'ed the EAL hugepage maintainer, which is something you should do when
sending a patch.
On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
> Originally, there are two cons in using hugepages: a. it needs root
> privilege to touch /proc/self/pagemap, which is a prerequisite for
> allocating physically contiguous memsegs; b. possibly too many
> hugepage files are created, especially when used with 2M hugepages.
>
> Virtual devices don't care about the physical contiguity of the
> allocated hugepages at all. The --single-file option provides a way
> to allocate all hugepages in a single memory-backed file.
>
> Known issues:
> a. The single-file option relies on the kernel to allocate
> NUMA-affine memory.
> b. Possible ABI break: originally, --no-huge uses anonymous memory
> instead of a file-backed mapping to create memory.
>
> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
...
> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
> "be specified together with --"OPT_NO_HUGE"\n");
> return -1;
> }
> + if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
> + RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
> + "be specified together with --"OPT_SOCKET_MEM"\n");
> + return -1;
> + }
> + if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
> + RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
> + "be specified together with --"OPT_SINGLE_FILE"\n");
> + return -1;
> + }
These two limitations don't make sense to me.
> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6008533..68ef49a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1102,20 +1102,54 @@ rte_eal_hugepage_init(void)
> /* get pointer to global configuration */
> mcfg = rte_eal_get_configuration()->mem_config;
>
> - /* hugetlbfs can be disabled */
> - if (internal_config.no_hugetlbfs) {
> - addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> - MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> + /* when hugetlbfs is disabled or single-file option is specified */
> + if (internal_config.no_hugetlbfs || internal_config.single_file) {
> + int fd;
> + uint64_t pagesize;
> + unsigned socket_id = rte_socket_id();
> + char filepath[MAX_HUGEPAGE_PATH];
> +
> + if (internal_config.no_hugetlbfs) {
> + eal_get_hugefile_path(filepath, sizeof(filepath),
> + "/dev/shm", 0);
> + pagesize = RTE_PGSIZE_4K;
> + } else {
> + struct hugepage_info *hpi;
> +
> + hpi = &internal_config.hugepage_info[0];
> + eal_get_hugefile_path(filepath, sizeof(filepath),
> + hpi->hugedir, 0);
> + pagesize = hpi->hugepage_sz;
> + }
> + fd = open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> + if (fd < 0) {
> + RTE_LOG(ERR, EAL, "%s: open %s failed: %s\n",
> + __func__, filepath, strerror(errno));
> + return -1;
> + }
> +
> + if (ftruncate(fd, internal_config.memory) < 0) {
> +			RTE_LOG(ERR, EAL, "ftruncate %s failed: %s\n",
> + filepath, strerror(errno));
> + return -1;
> + }
> +
> + addr = mmap(NULL, internal_config.memory,
> + PROT_READ | PROT_WRITE,
> + MAP_SHARED | MAP_POPULATE, fd, 0);
> if (addr == MAP_FAILED) {
> - RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> - strerror(errno));
> + RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n",
> + __func__, strerror(errno));
> return -1;
> }
> mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
> mcfg->memseg[0].addr = addr;
> - mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> + mcfg->memseg[0].hugepage_sz = pagesize;
> mcfg->memseg[0].len = internal_config.memory;
> - mcfg->memseg[0].socket_id = 0;
> + mcfg->memseg[0].socket_id = socket_id;
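As an aside, the reason a file-backed MAP_SHARED mapping matters here is
that another process (e.g. a vhost backend outside the container) can
open the same file and map the very same memory. A minimal, purely
illustrative sketch of such a consumer (the path below is hypothetical;
the real name would come from eal_get_hugefile_path()):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical path; not what the patch actually names the file */
        const char *path = "/dev/hugepages/rtemap_0";
        struct stat st;
        void *va;
        int fd = open(path, O_RDWR);

        if (fd < 0 || fstat(fd, &st) < 0)
            return 1;
        /* map the whole backing file; the pages are shared with the
         * process that created it */
        va = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                  MAP_SHARED, fd, 0);
        close(fd);
        if (va == MAP_FAILED)
            return 1;
        printf("mapped %lld bytes at %p\n", (long long)st.st_size, va);
        return 0;
    }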
I see quite a few issues:
- Assume I have a system with two hugepage sizes: 1G (x4) and 2M (x512),
  mounted at /dev/hugepages and /mnt, respectively.
  internal_config.memory then comes to 5G, and your code will try to
  mmap all 5G on the first mount point (/dev/hugepages) due to the
  hardcoded logic in your code:
      hpi = &internal_config.hugepage_info[0];
      eal_get_hugefile_path(filepath, sizeof(filepath),
                            hpi->hugedir, 0);
  But that mount point has only 4G in total, so the mmap will fail
  (see the sketch after this list for one way to spread the request
  across mounts).
- As you stated, socket_id is hardcoded, which could be wrong.
- As stated above, the option limitations don't seem right to me.
  I mean, --single-file should be able to work with the --socket-mem
  option semantically.
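For the first issue, here is a rough, untested sketch of what spreading
the request across all discovered mount points might look like (the
struct and function names are made up; in the real EAL the directories
and available sizes would come from internal_config.hugepage_info[]):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct mount { const char *dir; uint64_t avail; };

    static int
    map_across_mounts(const struct mount *m, int n, uint64_t need)
    {
        char path[256];
        int i;

        for (i = 0; i < n && need > 0; i++) {
            /* take as much as this mount can give */
            uint64_t len = need < m[i].avail ? need : m[i].avail;
            void *va;
            int fd;

            if (len == 0)
                continue;
            snprintf(path, sizeof(path), "%s/rtemap_%d", m[i].dir, i);
            fd = open(path, O_CREAT | O_RDWR, 0600);
            if (fd < 0)
                return -1;
            if (ftruncate(fd, (off_t)len) < 0) {
                close(fd);
                return -1;
            }
            va = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_POPULATE, fd, 0);
            close(fd);
            if (va == MAP_FAILED)
                return -1;
            /* here the real code would fill in one memseg per mapping */
            need -= len;
        }
        return need == 0 ? 0 : -1;
    }

On the 1G+2M system above, this would put 4G on /dev/hugepages and the
remaining 1G on /mnt instead of failing.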
I have been thinking about how to deal with those issues properly, and a
__very immature__ solution came to my mind (which could simply not
work), but anyway, here it is FYI: we put the --single-file option
through the same process that handles normal hugepage initialization,
but take different actions, or no action at all, at some stages when
that option is given, which is a bit similar to the way
RTE_EAL_SINGLE_FILE_SEGMENTS is handled.
And we create one hugepage file for each node and each page size. For a
system like mine above (2 nodes), it may populate files like the following:
- 1G x 2 on node0
- 1G x 2 on node1
- 2M x 256 on node0
- 2M x 256 on node1
That would normally fit your case. Though 4 nodes looks like the maximum
node count, the --socket-mem option may relieve the limit a bit.
And if we "could" not care the socket_id being set correctly, we could
just simply allocate one file for each hugepage size. That would work
well for your container enabling.
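To make the per-node, per-size idea concrete, a rough sketch (again
untested; the names and the single pages_per_node parameter are
hypothetical simplifications, and the NUMA binding step is only hinted
at in a comment):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct pgsize { const char *dir; uint64_t sz; };

    static int
    populate_per_node(const struct pgsize *p, int nsizes, int nnodes,
                      uint64_t pages_per_node)
    {
        char path[256];
        int i, node;

        for (i = 0; i < nsizes; i++) {
            for (node = 0; node < nnodes; node++) {
                uint64_t len = pages_per_node * p[i].sz;
                void *va;
                int fd;

                /* one backing file per (page size, node) pair */
                snprintf(path, sizeof(path), "%s/rtemap_node%d",
                         p[i].dir, node);
                fd = open(path, O_CREAT | O_RDWR, 0600);
                if (fd < 0)
                    return -1;
                if (ftruncate(fd, (off_t)len) < 0) {
                    close(fd);
                    return -1;
                }
                va = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
                close(fd);
                if (va == MAP_FAILED)
                    return -1;
                /* bind va to 'node' (e.g. with mbind(2)) and only then
                 * fault the pages in; MAP_POPULATE here would fault
                 * them before any NUMA policy could take effect. */
            }
        }
        return 0;
    }

Dropping the inner loop (and the binding step) gives the simplified
one-file-per-page-size variant mentioned above.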
BTW, since we already have the SINGLE_FILE_SEGMENTS (config) option,
adding another option named --single-file looks really confusing to me.
Maybe you could build on the SINGLE_FILE_SEGMENTS option and add
another option, say --no-sort (I confess this name sucks, but you get
my point). With that, we could make sure to create as few hugepage
files as possible, to fit your case.
--yliu