[dpdk-dev] [PATCH v2 1/5] mem: add --single-file to create single mem-backed file

Tan, Jianfeng jianfeng.tan at intel.com
Wed Mar 9 15:44:01 CET 2016


Hi,

On 3/8/2016 10:44 AM, Yuanhan Liu wrote:
> On Tue, Mar 08, 2016 at 09:55:10AM +0800, Tan, Jianfeng wrote:
>> Hi Yuanhan,
>>
>> On 3/7/2016 9:13 PM, Yuanhan Liu wrote:
>>> CC'ed EAL hugepage maintainer, which is something you should do when
>>> send a patch.
>> Thanks for doing this.
>>
>>> On Fri, Feb 05, 2016 at 07:20:24PM +0800, Jianfeng Tan wrote:
>>>> Originally, there're two cons in using hugepage: a. needs root
>>>> privilege to touch /proc/self/pagemap, which is a premise to
>>>> allocate physically contiguous memseg; b. possibly too many
>>>> hugepage file are created, especially used with 2M hugepage.
>>>>
>>>> For virtual devices, they don't care about physical-contiguity
>>>> of allocated hugepages at all. Option --single-file is to
>>>> provide a way to allocate all hugepages into single mem-backed
>>>> file.
>>>>
>>>> Known issue:
>>>> a. single-file option relies on kernel to allocate numa-affinitive
>>>> memory.
>>>> b. possible ABI break, originally, --no-huge uses anonymous memory
>>>> instead of file-backed way to create memory.
>>>>
>>>> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>>> ...
>>>> @@ -956,6 +961,16 @@ eal_check_common_options(struct internal_config *internal_cfg)
>>>>   			"be specified together with --"OPT_NO_HUGE"\n");
>>>>   		return -1;
>>>>   	}
>>>> +	if (internal_cfg->single_file && internal_cfg->force_sockets == 1) {
>>>> +		RTE_LOG(ERR, EAL, "Option --"OPT_SINGLE_FILE" cannot "
>>>> +			"be specified together with --"OPT_SOCKET_MEM"\n");
>>>> +		return -1;
>>>> +	}
>>>> +	if (internal_cfg->single_file && internal_cfg->hugepage_unlink) {
>>>> +		RTE_LOG(ERR, EAL, "Option --"OPT_HUGE_UNLINK" cannot "
>>>> +			"be specified together with --"OPT_SINGLE_FILE"\n");
>>>> +		return -1;
>>>> +	}
>>> The two limitations don't make sense to me.
>> For the force_sockets option, my original thought on --single-file option
>> is, we don't sort those pages (require root/cap_sys_admin) and even don't
>> look up numa information because it may contain both sockets' memory.
>>
>> For the hugepage_unlink option, those hugepage files get closed in the end
>> of memory initialization, if we even unlink those hugepage files, so we
>> cannot share those with other processes (say backend).
> Yeah, I know how the two limitations come, from your implementation. I
> was just wondering if they both are __truly__ the limitations. I mean,
> can we get rid of them somehow?
>
> For --socket-mem option, if we can't handle it well, or if we could
> ignore the socket_id for allocated huge page, yes, the limitation is
> a true one.

To make it work with the --socket-mem option, we need to call 
mbind()/set_mempolicy(), which would make "LDFLAGS += -lnuma" a 
mandatory line in the mk file. Is it acceptable to bring in a 
dependency on libnuma.so?
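
For reference, a rough sketch of what binding a mapped region to one
socket could look like. It is not part of this patch. It assumes Linux
and issues the raw mbind system call through syscall(2), so nothing here
links against libnuma; bind_region_to_node() is a hypothetical name, and
the sketch only handles node ids that fit in one unsigned long.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/mempolicy.h>	/* MPOL_BIND */
#include <stddef.h>
#include <sys/syscall.h>	/* SYS_mbind */
#include <unistd.h>		/* syscall() */

/*
 * Hypothetical helper (not from the patch): ask the kernel to place the
 * pages of [addr, addr + len) on a single NUMA node. addr must be
 * page-aligned. Returns 0 on success, -1 with errno set otherwise.
 */
int
bind_region_to_node(void *addr, size_t len, unsigned int node)
{
	unsigned long nodemask;
	const unsigned long nbits = sizeof(nodemask) * CHAR_BIT;

	/* Assumption of this sketch: the node id fits in one word. */
	if (node >= nbits) {
		errno = EINVAL;
		return -1;
	}
	nodemask = 1UL << node;

	/*
	 * mbind(addr, len, mode, nodemask, maxnode, flags).
	 * maxnode is passed as the mask size plus one, the same
	 * convention libnuma uses when it calls mbind().
	 */
	if (syscall(SYS_mbind, addr, (unsigned long)len, (long)MPOL_BIND,
		    &nodemask, nbits + 1, 0UL) != 0)
		return -1;
	return 0;
}
```

The call can still fail at run time (for example on a kernel without
NUMA support, or for a node that does not exist), so a caller would need
a fallback path for that case.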


>
> But for the second option, no, we should be able to co-work it with
> well. One extra action is you should not invoke "close(fd)" for those
> huge page files. And then you can get all the informations as I stated
> in a reply to your 2nd patch.
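
To illustrate the point about not calling close(fd), here is a minimal
sketch. It uses an ordinary temporary file rather than a hugetlbfs file,
and open_unlinked_file() is a hypothetical helper, not code from the
patch. It only shows that an unlinked file stays usable through a
descriptor that is kept open.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a file from a mkstemp() template, unlink it right away, and
 * check that data written through the still-open descriptor can be read
 * back. path_template must be writable and end in "XXXXXX".
 * Returns the open fd on success, -1 on failure.
 */
int
open_unlinked_file(char *path_template)
{
	static const char msg[] = "still reachable";
	char buf[sizeof(msg)];
	int fd = mkstemp(path_template);

	if (fd < 0)
		return -1;

	/* The name goes away; the inode lives as long as fd stays open. */
	if (unlink(path_template) != 0 ||
	    write(fd, msg, sizeof(msg)) != (ssize_t)sizeof(msg) ||
	    pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf) ||
	    memcmp(buf, msg, sizeof(msg)) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Whether the same descriptor can then be handed to another process (say
the backend) is a separate question that this sketch does not cover.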

As discussed yesterday, I think there is a per-process limit on open 
files. If we keep those FDs open, existing programs may hit that limit 
and fail. Do others see this as a problem?
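
A small sketch of how the concern could be checked at run time, assuming
one descriptor is kept per hugepage file. fds_fit_in_limit() is a
hypothetical name and is not in the patch; it compares the number of
descriptors we would keep open against the soft RLIMIT_NOFILE limit.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical check: would keeping nb_files descriptors open, plus
 * `reserve` descriptors for everything else in the process, stay within
 * the soft RLIMIT_NOFILE limit?
 * Returns 1 if yes, 0 if no, -1 if the limit cannot be read.
 */
int
fds_fit_in_limit(size_t nb_files, size_t reserve)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	/* Compare without overflowing nb_files + reserve. */
	if (reserve > rl.rlim_cur)
		return 0;
	return nb_files <= rl.rlim_cur - reserve;
}
```

For example, 1 GB of memory backed by 2M hugepages means 512 files, so
a process with a soft limit of 1024 descriptors would be close to the
limit once a second gigabyte is added.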
...
>>> BTW, since we already have SINGLE_FILE_SEGMENTS (config) option, adding
>>> another option --single-file looks really confusing to me.
>>>
>>> To me, maybe you could base the SINGLE_FILE_SEGMENTS option, and add
>>> another option, say --no-sort (I confess this name sucks, but you get
>>> my point). With that, we could make sure to create as least huge page
>>> files as possible, to fit your case.
>> This is a great advice. So how do you think of --converged, or
>> --no-scattered-mem, or any better idea?
> TBH, none of them looks great to me, either. But I have no better
> options. Well, --no-phys-continuity looks like the best option to
> me so far :)

I'd like to make it a little more concise. How about --no-phys-contig? 
In addition, Yuanhan thinks the name still does not literally say that 
we create just one file for each hugetlbfs (or socket). But from my 
side, it is implied: if there is no need to guarantee physical 
contiguity, then there is no need to create hugepages one by one. Can 
anyone give an opinion here? Thanks.

Thanks,
Jianfeng

