[dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file

Yuanhan Liu yuanhan.liu at linux.intel.com
Mon Mar 7 14:13:22 CET 2016


CC'ed EAL hugepage maintainer, which is something you should do when
send a patch.

On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
> Originally, there're two cons in using hugepage: a. needs root
> privilege to touch /proc/self/pagemap, which is a premise to
> alllocate physically contiguous memseg; b. possibly too many
> hugepage file are created, especially used with 2M hugepage.
> 
> For virtual devices, they don't care about physical-contiguity
> of allocated hugepages at all. Option --single-file is to
> provide a way to allocate all hugepages into single mem-backed
> file.
> 
> Known issue:
> a. single-file option relys on kernel to allocate numa-affinitive
> memory.
> b. possible ABI break, originally, --no-huge uses anonymous memory
> instead of file-backed way to create memory.
> 
> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
...
> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
>  			"be specified together with --"OPT_NO_HUGE"\n");
>  		return -1;
>  	}
> +	if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
> +		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
> +			"be specified together with --"OPT_SOCKET_MEM"\n");
> +		return -1;
> +	}
> +	if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
> +		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
> +			"be specified together with --"OPT_SINGLE_FILE"\n");
> +		return -1;
> +	}

The two limitation doesn't make sense to me.

> diff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c b/lib/librte_eal/linuxapp/eal/eal_memory.c
> index 6008533..68ef49a 100644
> --- a/lib/librte_eal/linuxapp/eal/eal_memory.c
> +++ b/lib/librte_eal/linuxapp/eal/eal_memory.c
> @@ -1102,20 +1102,54 @@ rte_eal_hugepage_init(void)
>  	/* get pointer to global configuration */
>  	mcfg = rte_eal_get_configuration()->mem_config;
>  
> -	/* hugetlbfs can be disabled */
> -	if (internal_config.no_hugetlbfs) {
> -		addr = mmap(NULL, internal_config.memory, PROT_READ | PROT_WRITE,
> -				MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> +	/* when hugetlbfs is disabled or single-file option is specified */
> +	if (internal_config.no_hugetlbfs || internal_config.single_file) {
> +		int fd;
> +		uint64_t pagesize;
> +		unsigned socket_id = rte_socket_id();
> +		char filepath[MAX_HUGEPAGE_PATH];
> +
> +		if (internal_config.no_hugetlbfs) {
> +			eal_get_hugefile_path(filepath, sizeof(filepath),
> +					      "/dev/shm", 0);
> +			pagesize = RTE_PGSIZE_4K;
> +		} else {
> +			struct hugepage_info *hpi;
> +
> +			hpi = &internal_config.hugepage_info[0];
> +			eal_get_hugefile_path(filepath, sizeof(filepath),
> +					      hpi->hugedir, 0);
> +			pagesize = hpi->hugepage_sz;
> +		}
> +		fd = open(filepath, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
> +		if (fd < 0) {
> +			RTE_LOG(ERR, EAL, "%s: open %s failed: %s\n",
> +				__func__, filepath, strerror(errno));
> +			return -1;
> +		}
> +
> +		if (ftruncate(fd, internal_config.memory) < 0) {
> +			RTE_LOG(ERR, EAL, "ftuncate %s failed: %s\n",
> +				filepath, strerror(errno));
> +			return -1;
> +		}
> +
> +		addr = mmap(NULL, internal_config.memory,
> +			    PROT_READ | PROT_WRITE,
> +			    MAP_SHARED | MAP_POPULATE, fd, 0);
>  		if (addr == MAP_FAILED) {
> -			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n", __func__,
> -					strerror(errno));
> +			RTE_LOG(ERR, EAL, "%s: mmap() failed: %s\n",
> +				__func__, strerror(errno));
>  			return -1;
>  		}
>  		mcfg->memseg[0].phys_addr = (phys_addr_t)(uintptr_t)addr;
>  		mcfg->memseg[0].addr = addr;
> -		mcfg->memseg[0].hugepage_sz = RTE_PGSIZE_4K;
> +		mcfg->memseg[0].hugepage_sz = pagesize;
>  		mcfg->memseg[0].len = internal_config.memory;
> -		mcfg->memseg[0].socket_id = 0;
> +		mcfg->memseg[0].socket_id = socket_id;

I saw quite few issues:

- Assume I have a system with two hugepage sizes: 1G (x4) and 2M (x512),
  mounted at /dev/hugepages and /mnt, respectively.

  Here we then got an 5G internal_config.memory, and your code will
  try to mmap 5G on the first mount point (/dev/hugepages) due to the
  hardcode logic in your code:

      hpi = &internal_config.hugepage_info[0];
      eal_get_hugefile_path(filepath, sizeof(filepath),
      		      hpi->hugedir, 0);

  But it has 4G in total, therefore, it will fails.

- As you stated, socket_id is hardcoded, which could be wrong.

- As stated in above, the option limitation doesn't seem right to me.

  I mean, --single-file should be able to work with --socket-mem option
  in semantic.


And I have been thinking how to deal with those issues properly, and a
__very immature__ solution come to my mind (which could be simply not
working), but anyway, here is FYI: we go through the same process to
handle normal huge page initilization to --single-file option as well.
But we take different actions or no actions at all at some stages when
that option is given, which is a bit similiar with the way of handling
RTE_EAL_SINGLE_FILE_SEGMENTS.

And we create one hugepage file for each node, each page size. For a
system like mine above (2 nodes), it may populate files like following:

- 1G x 2 on node0
- 1G x 2 on node1
- 2M x 256 on node0
- 2M x 256 on node1

That could normally fit your case. Though 4 nodes looks like the maximum
node number, --socket-mem option may relieve the limit a bit.

And if we "could" not care the socket_id being set correctly, we could
just simply allocate one file for each hugepage size. That would work
well for your container enabling.

BTW, since we already have SINGLE_FILE_SEGMENTS (config) option, adding
another option --single-file looks really confusing to me.

To me, maybe you could base the SINGLE_FILE_SEGMENTS option, and add
another option, say --no-sort (I confess this name sucks, but you get
my point). With that, we could make sure to create as least huge page
files as possible, to fit your case.

	--yliu


More information about the dev mailing list