[dpdk-dev] [PATCH 3/4] mempool: introduce block size align flag

santosh santosh.shukla at caviumnetworks.com
Wed Jul 5 09:35:57 CEST 2017


Hi Olivier,

On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:

> On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla <santosh.shukla at caviumnetworks.com> wrote:
>> Some mempool hw, like the octeontx/fpa block, demands a block size aligned
>> buffer address.
>>
> What is the meaning of block size aligned?

Block size is total_elt_sz.

> Does it mean that the address has to be a multiple of total_elt_size?

Yes.

> Is this constraint on the virtual address only?
>
No, it applies to both the virtual and the physical address.

>> Introduce a MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
>> If this flag is set:
>> 1) Adjust the 'off' value to a block size aligned value.
>> 2) Allocate one additional buffer. This buffer is used to make sure that
>> the requested 'n' buffers get correctly populated to the mempool.
>> Example:
>> 	elem_sz = 2432 // total element size.
>> 	n = 2111 // requested number of buffers.
>> 	off = 2304 // new buf_offset value after step 1)
>> 	vaddr = 0x0 // actual start address of pool
>> 	pool_len = 5133952 // total pool length, i.e. (elem_sz * n)
>>
>> Since 'off' is non-zero, the condition below would fail in the
>> block size aligned case:
>>
>> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
>>
>> This is incorrect behavior. The additional buffer solves this
>> problem and correctly populates 'n' buffers to the mempool in
>> aligned mode.
> Sorry, but the example is not very clear.
>
Which part?

I'll try to reword.

The problem statement is:
- We want the start address of every buffer aligned to block_sz, i.e. total_elt_sz.

Proposed solution in this patch:
- Say we get a memory chunk of size 'x' from the memzone.
- Ideally, buffers would occupy offsets 0 to (x - block_sz) of that chunk.
- However, the first buffer address (offset 0) is not necessarily aligned to block_sz.
- So we derive an offset value, 'off', for block_sz alignment.
- 'off' makes sure that the first va/pa buffer address is block_sz aligned.
- Skipping 'off' bytes may sacrifice the first buffer of the pool, so the pool
  would hold only n-1 buffers, which is incorrect. That is why we add one
  additional buffer: when the MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag is set, we ask
  the memzone for a pool area of ((n + 1) * total_elt_sz).
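
The steps above can be sketched in standalone C. This is only an illustration
of the arithmetic, not the patch itself: blk_align_off() and
aligned_objs_in_chunk() are hypothetical helper names, and addresses are plain
integers rather than real memzone addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes to skip so that (vaddr + off) is a
 * multiple of total_elt_sz. It mirrors the formula used in the patch, so an
 * already aligned vaddr yields total_elt_sz (not 0) and one full element
 * is skipped.
 */
static size_t
blk_align_off(uintptr_t vaddr, size_t total_elt_sz)
{
	assert(total_elt_sz != 0);
	return total_elt_sz - (vaddr % total_elt_sz);
}

/*
 * Hypothetical helper: number of whole elements that fit in a chunk of
 * 'len' bytes starting at 'vaddr' once the alignment offset is skipped.
 */
static size_t
aligned_objs_in_chunk(uintptr_t vaddr, size_t len, size_t total_elt_sz)
{
	size_t off = blk_align_off(vaddr, total_elt_sz);

	if (off > len)
		return 0;
	return (len - off) / total_elt_sz;
}
```

With total_elt_sz = 2432 and n = 2111, a chunk of n * total_elt_sz bytes holds
only n-1 aligned buffers under this formula, while a chunk of
(n + 1) * total_elt_sz bytes holds all n.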

>> Signed-off-by: Santosh Shukla <santosh.shukla at caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob at caviumnetworks.com>
>> ---
>>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>>  lib/librte_mempool/rte_mempool.h |  1 +
>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>> index 7dec2f51d..2010857f0 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>  {
>>  	unsigned total_elt_sz;
>>  	unsigned i = 0;
>> -	size_t off;
>> +	size_t off, delta;
>>  	struct rte_mempool_memhdr *memhdr;
>>  	int ret;
>>  
>> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>  	memhdr->free_cb = free_cb;
>>  	memhdr->opaque = opaque;
>>  
>> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
>> +		delta = (uintptr_t)vaddr % total_elt_sz;
>> +		off = total_elt_sz - delta;
>> +		/* Validate alignment */
>> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
>> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
>> +			return -EINVAL;
>> +		}
>> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>>  	else
>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
> What is the purpose of this test? Can it fail?

The purpose is to sanity check the blk_sz alignment. No, it won't fail.
I thought it better to keep the sanity check, but if you see no value
in it, I will remove it in v2.
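
To illustrate why the check cannot fail: writing vaddr = q * total_elt_sz + delta,
we get vaddr + off = (q + 1) * total_elt_sz, which is always a multiple of
total_elt_sz (assuming the addition does not overflow). A small standalone
sketch, where aligned_after_off() is a hypothetical name and not part of the
patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if the start address produced by the patch's offset formula
 * is a multiple of total_elt_sz, 0 otherwise.
 */
static int
aligned_after_off(uintptr_t vaddr, size_t total_elt_sz)
{
	size_t delta, off;

	assert(total_elt_sz != 0);
	delta = vaddr % total_elt_sz;   /* distance past the previous boundary */
	off = total_elt_sz - delta;     /* distance to the next boundary */

	return ((vaddr + off) % total_elt_sz) == 0;
}
```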

> Not sure having the delta variable is helpful. However, adding a
> small comment like this could help:
>
> 	/* align object start address to a multiple of total_elt_sz */
> 	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
> 	
> About style, please don't mix brackets and no-bracket blocks in the
> same if/elseif/else.

OK, will fix in v2.

>> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>  	}
>>  
>>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>> +
>>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
>> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
>> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
>> +							pg_shift);
>> +		else
>> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>  
>>  		ret = snprintf(mz_name, sizeof(mz_name),
>>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
>
> One issue I see here is that this new flag breaks the function
> rte_mempool_xmem_size(), which calculates the maximum amount of memory
> required to store a given number of objects.
>
> It also probably breaks rte_mempool_xmem_usage().
>
> I don't have any good solution for now. A possibility is to change
> the behavior of these functions for everyone, meaning that we will
> always reserve more memory than really required. If this is done on
> every memory chunk (struct rte_mempool_memhdr), it can eat a lot
> of memory.
>
> Another approach would be to change the API of this function to
> pass the capability flags, or the mempool pointer... but there is
> a problem because these functions are usually called before the
> mempool is instantiated.
>
As per my description in [1/4]: if we agree to call
_ops_get_capability() at the very beginning, i.e. in _populate_default(),
then 'mp->flags' holds the capability flag, and we could add one more
argument to _xmem_size(, flag) / _xmem_usage(, flag):
- _xmem_size() / _xmem_usage() check for that capability bit in 'flag'.
- If it is set, they increase 'elt_num' by one.

That way your approach 2) makes sense to me and fits well into the
design. It won't waste memory as you mentioned for approach 1).
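
A rough sketch of what I mean. This is only a simplified stand-in:
xmem_size_sketch() is a hypothetical name, the extra 'flags' argument is the
proposal and not an existing API, and the page rounding done via pg_shift in
the real function is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as proposed in this patch. */
#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080

/*
 * Hypothetical flag-aware size calculation for a contiguous area.
 * Only pools that set the alignment capability pay for the spare element.
 */
static size_t
xmem_size_sketch(uint32_t elt_num, size_t total_elt_sz, unsigned int flags)
{
	if (flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
		assert(elt_num != UINT32_MAX); /* guard against wraparound */
		elt_num += 1; /* spare element absorbs the alignment offset */
	}

	return (size_t)elt_num * total_elt_sz;
}
```

Pools without the flag get the same size as today, so nobody else reserves
extra memory.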

Does that make sense?

>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>> index fd8722e69..99a20263d 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -267,6 +267,7 @@ struct rte_mempool {
>>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>>  #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
>> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
>>  
>>  /**
>>   * @internal When debug is enabled, store some statistics.
> Same comment as for patch 3: the explanation should really be clarified.
> It's a hw specific limitation, which won't be obvious to the people who
> will read that code, so we must document it as clearly as possible.
>
I don't see this as a HW limitation. As mentioned in [1/4], even an application
can request block alignment, right?

But I agree, and I will reword the comment.

Thanks.
