[dpdk-dev,3/4] mempool: introduce block size align flag

Message ID 20170621173248.1313-4-santosh.shukla@caviumnetworks.com (mailing list archive)
State Changes Requested, archived
Delegated to: Thomas Monjalon
Headers

Checks

Context               Check    Description
ci/checkpatch         warning  coding style issues
ci/Intel-compilation  fail     Compilation issues

Commit Message

Santosh Shukla June 21, 2017, 5:32 p.m. UTC
Some mempool hardware, like the octeontx/fpa block, demands block-size-aligned
buffer addresses.

Introduce a MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
If this flag is set:
1) adjust the 'off' value to a block-size-aligned value.
2) allocate one additional buffer. This buffer is used to make sure that
the requested 'n' buffers get correctly populated into the mempool.
Example:
	elem_sz = 2432 // total element size.
	n = 2111 // requested number of buffers.
	off = 2304 // new buf_offset value after step 1)
	vaddr = 0x0 // actual start address of pool
	pool_len = 5133952 // total pool length, i.e. (elem_sz * n)

Since 'off' is non-zero, the condition below would fail in the
block-size-aligned case:

(((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))

which is incorrect behavior. The additional buffer solves this
problem and lets all 'n' buffers be correctly populated into the
mempool in aligned mode.
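
A minimal standalone check of this arithmetic (values copied from the
example above; the inequality mirrors the bound check used while
populating the pool):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t elem_sz = 2432, n = 2111, off = 2304, vaddr = 0;
		uint64_t pool_len = elem_sz * n;	/* 5133952 */

		/* do 'n' objects still fit once 'off' is applied? */
		int fits = ((vaddr + off) + elem_sz * n) <= (vaddr + pool_len);
		printf("n objects fit: %d\n", fits);	/* 0: check fails */

		/* reserving one extra element makes the same check pass */
		uint64_t padded_len = elem_sz * (n + 1);	/* 5136384 */
		fits = ((vaddr + off) + elem_sz * n) <= (vaddr + padded_len);
		printf("n objects fit with +1 element: %d\n", fits);	/* 1 */
		return 0;
	}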

Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
---
 lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
 lib/librte_mempool/rte_mempool.h |  1 +
 2 files changed, 17 insertions(+), 3 deletions(-)
  

Comments

Olivier Matz July 3, 2017, 4:37 p.m. UTC | #1
On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:
> Some mempool hw like octeontx/fpa block, demands block size aligned
> buffer address.
> 

What is the meaning of block size aligned?

Does it mean that the address has to be a multiple of total_elt_size?

Is this constraint on the virtual address only?


> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
> If this flag is set:
> 1) adjust 'off' value to block size aligned value.
> 2) Allocate one additional buffer. This buffer is used to make sure that
> requested 'n' buffers get correctly populated to mempool.
> Example:
> 	elem_sz = 2432 // total element size.
> 	n = 2111 // requested number of buffer.
> 	off = 2304 // new buf_offset value after step 1)
> 	vaddr = 0x0 // actual start address of pool
> 	pool_len = 5133952 // total pool length i.e.. (elem_sz * n)
> 
> Since 'off' is a non-zero value so below condition would fail for the
> block size align case.
> 
> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
> 
> Which is incorrect behavior. Additional buffer will solve this
> problem and correctly populate 'n' buffer to mempool for the aligned
> mode.

Sorry, but the example is not very clear.


> 
> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>  lib/librte_mempool/rte_mempool.h |  1 +
>  2 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 7dec2f51d..2010857f0 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>  {
>  	unsigned total_elt_sz;
>  	unsigned i = 0;
> -	size_t off;
> +	size_t off, delta;
>  	struct rte_mempool_memhdr *memhdr;
>  	int ret;
>  
> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>  	memhdr->free_cb = free_cb;
>  	memhdr->opaque = opaque;
>  
> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
> +		delta = (uintptr_t)vaddr % total_elt_sz;
> +		off = total_elt_sz - delta;
> +		/* Validate alignment */
> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
> +			return -EINVAL;
> +		}
> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>  	else
>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;

What is the purpose of this test? Can it fail?

Not sure having the delta variable is helpful. However, adding a
small comment like this could help:

	/* align object start address to a multiple of total_elt_sz */
	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
	
About style, please don't mix brackets and no-bracket blocks in the
same if/elseif/else.
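
Taken together, the aligned branch could then look like this (a sketch
of the suggested shape, not the actual v2 hunk):

	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
		/* align object start address to a multiple of total_elt_sz */
		off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN) {
		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
	} else {
		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
	}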

> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>  	}
>  
>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> +
>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
> +							pg_shift);
> +		else
> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>  
>  		ret = snprintf(mz_name, sizeof(mz_name),
>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);


One issue I see here is that this new flag breaks the function
rte_mempool_xmem_size(), which calculates the maximum amount of memory
required to store a given number of objects.

It also probably breaks rte_mempool_xmem_usage().

I don't have any good solution for now. A possibility is to change
the behavior of these functions for everyone, meaning that we would
always reserve more memory than really required. If this is done on
every memory chunk (struct rte_mempool_memhdr), it can eat a lot
of memory.

Another approach would be to change the API of these functions to
pass the capability flags, or the mempool pointer... but there is
a problem because these functions are usually called before the
mempool is instantiated.


> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index fd8722e69..99a20263d 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -267,6 +267,7 @@ struct rte_mempool {
>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>  #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
>  
>  /**
>   * @internal When debug is enabled, store some statistics.

Same comment as for patch 3: the explanation should really be clarified.
It's a hw-specific limitation, which won't be obvious to the people who
will read that code, so we must document it as clearly as possible.
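
For instance, the flag's doxygen comment could spell the constraint out
along these lines (illustrative wording only):

	/**
	 * Align object start addresses to a multiple of total_elt_sz,
	 * i.e. header_size + elt_size + trailer_size. Required by mempool
	 * hardware such as the octeontx/fpa block, which can only manage
	 * block-size-aligned buffers.
	 */
	#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080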
  
Santosh Shukla July 5, 2017, 7:35 a.m. UTC | #2
Hi Olivier,

On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:

> On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:
>> Some mempool hw like octeontx/fpa block, demands block size aligned
>> buffer address.
>>
> What is the meaning of block size aligned?

Block size is total_elt_sz.

> Does it mean that the address has to be a multiple of total_elt_size?

yes.

> Is this constraint on the virtual address only?
>
both.

>> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
>> If this flag is set:
>> 1) adjust 'off' value to block size aligned value.
>> 2) Allocate one additional buffer. This buffer is used to make sure that
>> requested 'n' buffers get correctly populated to mempool.
>> Example:
>> 	elem_sz = 2432 // total element size.
>> 	n = 2111 // requested number of buffer.
>> 	off = 2304 // new buf_offset value after step 1)
>> 	vaddr = 0x0 // actual start address of pool
>> 	pool_len = 5133952 // total pool length i.e.. (elem_sz * n)
>>
>> Since 'off' is a non-zero value so below condition would fail for the
>> block size align case.
>>
>> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
>>
>> Which is incorrect behavior. Additional buffer will solve this
>> problem and correctly populate 'n' buffer to mempool for the aligned
>> mode.
> Sorry, but the example is not very clear.
>
which part?

I'll try to reword.

The problem statement is:
- We want the buffer start address aligned to block_sz, a.k.a. total_elt_sz.

Proposed solution in this patch:
- Let's say we get a memory chunk of size 'x' from the memzone.
- Ideally we would use buffers from address 0 up to (x - block_sz).
- But the first buffer address (here 0) is not necessarily aligned to block_sz.
- So we derive an offset value, 'off', for block_sz alignment.
- That 'off' makes sure the first va/pa buffer address is blk_sz aligned.
- Applying 'off' may sacrifice the first buffer of the pool, leaving only
n-1 buffers, which is incorrect behavior. That's why we add one additional
buffer: we request the memzone to allocate ((n+1) * total_elt_sz) of pool
area when the F_BLK_SZ_ALIGNED flag is set (see the sketch below).
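
A minimal sketch of that layout math (the helper names are illustrative,
not part of the patch):

	#include <stdint.h>
	#include <stddef.h>

	/* Offset of the first object, so that it starts on a block_sz
	 * (total_elt_sz) boundary. Note that an already-aligned vaddr
	 * yields off == total_elt_sz (a whole element), never 0 --
	 * another reason the chunk is sized for one extra element. */
	static size_t
	blk_align_off(uintptr_t vaddr, size_t total_elt_sz)
	{
		return total_elt_sz - (vaddr % total_elt_sz);
	}

	/* Chunk length requested from the memzone in aligned mode. */
	static size_t
	aligned_chunk_len(unsigned int n, size_t total_elt_sz)
	{
		return (size_t)(n + 1) * total_elt_sz;
	}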

>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>> ---
>>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>>  lib/librte_mempool/rte_mempool.h |  1 +
>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>> index 7dec2f51d..2010857f0 100644
>> --- a/lib/librte_mempool/rte_mempool.c
>> +++ b/lib/librte_mempool/rte_mempool.c
>> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>  {
>>  	unsigned total_elt_sz;
>>  	unsigned i = 0;
>> -	size_t off;
>> +	size_t off, delta;
>>  	struct rte_mempool_memhdr *memhdr;
>>  	int ret;
>>  
>> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>  	memhdr->free_cb = free_cb;
>>  	memhdr->opaque = opaque;
>>  
>> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
>> +		delta = (uintptr_t)vaddr % total_elt_sz;
>> +		off = total_elt_sz - delta;
>> +		/* Validate alignment */
>> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
>> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
>> +			return -EINVAL;
>> +		}
>> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>>  	else
>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
> What is the purpose of this test? Can it fail?

The purpose is to sanity-check the blk_sz alignment. No, it won't fail.
I thought it better to keep the sanity check, but if you see no value
in it, shall I remove it in v2?

> Not sure having the delta variable is helpful. However, adding a
> small comment like this could help:
>
> 	/* align object start address to a multiple of total_elt_sz */
> 	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
> 	
> About style, please don't mix brackets and no-bracket blocks in the
> same if/elseif/else.

ok, in v2. 

>> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>  	}
>>  
>>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>> +
>>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
>> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
>> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
>> +							pg_shift);
>> +		else
>> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>  
>>  		ret = snprintf(mz_name, sizeof(mz_name),
>>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
>
> One issue I see here is that this new flag breaks the function
> rte_mempool_xmem_size(), which calculates the maximum amount of memory
> required to store a given number of objects.
>
> It also probably breaks rte_mempool_xmem_usage().
>
> I don't have any good solution for now. A possibility is to change
> the behavior of these functions for everyone, meaning that we will
> always reserve more memory that really required. If this is done on
> every memory chunk (struct rte_mempool_memhdr), it can eat a lot
> of memory.
>
> Another approach would be to change the API of this function to
> pass the capability flags, or the mempool pointer... but there is
> a problem because these functions are usually called before the
> mempool is instanciated.
>
Per my description on [1/4]: if we agree to call
_ops_get_capability() at the very beginning, i.e. at _populate_default(),
then 'mp->flags' holds the capability flag, and we could add one more
argument to _xmem_size(.., flags)/_xmem_usage(.., flags):
- xmem_size()/xmem_usage() check for that capability bit in 'flags';
- if it is set, they increase 'elt_num' accordingly.

That way your approach 2) makes sense to me and fits the design very
well, without wasting memory as in approach 1).

Does that make sense?
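
A rough illustration of that proposal (the wrapper name and the extra
'flags' parameter are hypothetical; rte_mempool_xmem_size() is today's
three-argument function):

	/* Reserve one extra element when the block-alignment bit is set. */
	static size_t
	xmem_size_with_flags(uint32_t elt_num, size_t total_elt_sz,
			     uint32_t pg_shift, unsigned int flags)
	{
		if (flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
			elt_num++;
		return rte_mempool_xmem_size(elt_num, total_elt_sz, pg_shift);
	}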

>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>> index fd8722e69..99a20263d 100644
>> --- a/lib/librte_mempool/rte_mempool.h
>> +++ b/lib/librte_mempool/rte_mempool.h
>> @@ -267,6 +267,7 @@ struct rte_mempool {
>>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>>  #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
>> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
>>  
>>  /**
>>   * @internal When debug is enabled, store some statistics.
> Same comment than for patch 3: the explanation should really be clarified.
> It's a hw specific limitation, which won't be obvious for the people that
> will read that code, so we must document it as clear as possible.
>
I don't see this as a HW limitation. As mentioned in [1/4], even an
application can request block alignment, right?

But I agree that I will reword the comment.

Thanks.
  
Olivier Matz July 10, 2017, 1:15 p.m. UTC | #3
On Wed, 5 Jul 2017 13:05:57 +0530, santosh <santosh.shukla@caviumnetworks.com> wrote:
> Hi Olivier,
> 
> On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:
> 
> > On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:  
> >> Some mempool hw like octeontx/fpa block, demands block size aligned
> >> buffer address.
> >>  
> > What is the meaning of block size aligned?  
> 
> block size is total_elem_sz.
> 
> > Does it mean that the address has to be a multiple of total_elt_size?  
> 
> yes.
> 
> > Is this constraint on the virtual address only?
> >  
> both.

You mean the virtual address and the physical address must both be
multiples of total_elt_size? How is that possible?

For instance, I have this on my dpdk instance:
Segment 0: phys:0x52c00000, len:4194304, virt:0x7fed26000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 1: phys:0x53400000, len:163577856, virt:0x7fed1c200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
Segment 2: phys:0x5d400000, len:20971520, virt:0x7fed1ac00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
...

I assume total_elt_size is 2176.

In segment 0, I need to use (virt = 0x7fed26000000 + 1536) to get a
multiple of 2176. But the corresponding physical address (0x52c00000 + 1536)
is not a multiple of 2176, as the quick check below shows.
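
	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t virt = 0x7fed26000000ULL;
		uint64_t phys = 0x52c00000ULL;
		uint64_t sz = 2176;
		uint64_t off = sz - virt % sz;	/* 1536 */

		/* the virtual address can be aligned with off = 1536 ... */
		printf("(virt + off) %% sz = %llu\n",
		       (unsigned long long)((virt + off) % sz));	/* 0 */
		/* ... but that same offset leaves the physical address
		 * unaligned */
		printf("(phys + off) %% sz = %llu\n",
		       (unsigned long long)((phys + off) % sz));	/* non-zero */
		return 0;
	}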


Please clarify if only the virtual address has to be aligned.


> 
> >> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
> >> If this flag is set:
> >> 1) adjust 'off' value to block size aligned value.
> >> 2) Allocate one additional buffer. This buffer is used to make sure that
> >> requested 'n' buffers get correctly populated to mempool.
> >> Example:
> >> 	elem_sz = 2432 // total element size.
> >> 	n = 2111 // requested number of buffer.
> >> 	off = 2304 // new buf_offset value after step 1)
> >> 	vaddr = 0x0 // actual start address of pool
> >> 	pool_len = 5133952 // total pool length i.e.. (elem_sz * n)
> >>
> >> Since 'off' is a non-zero value so below condition would fail for the
> >> block size align case.
> >>
> >> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
> >>
> >> Which is incorrect behavior. Additional buffer will solve this
> >> problem and correctly populate 'n' buffer to mempool for the aligned
> >> mode.  
> > Sorry, but the example is not very clear.
> >  
> which part?
> 
> I'll try to reword.
> 
> The problem statement is:
> - We want start of buffer address aligned to block_sz aka total_elt_sz.
> 
> Proposed solution in this patch:
> - Let's say that we get 'x' size of memory chunk from memzone.
> - Ideally we start using buffer at address 0 to...(x-block_sz).
> - Not necessarily first buffer address i.e. 0 is aligned to block_sz.
> - So we derive offset value for block_sz alignment purpose i.e..'off' . 
> - That 'off' makes sure that first va/pa address of buffer is blk_sz aligned.
> - Calculating 'off' may end up sacrificing first buffer of pool. So total
> number of buffer in pool is n-1, Which is incorrect behavior, Thats why
> we add 1 addition buffer. We request memzone to allocate (n+1 * total_elt_sz) pool
> area when F_BLK_SZ_ALIGNED flag is set.
> 
> >> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
> >> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> >> ---
> >>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
> >>  lib/librte_mempool/rte_mempool.h |  1 +
> >>  2 files changed, 17 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> >> index 7dec2f51d..2010857f0 100644
> >> --- a/lib/librte_mempool/rte_mempool.c
> >> +++ b/lib/librte_mempool/rte_mempool.c
> >> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
> >>  {
> >>  	unsigned total_elt_sz;
> >>  	unsigned i = 0;
> >> -	size_t off;
> >> +	size_t off, delta;
> >>  	struct rte_mempool_memhdr *memhdr;
> >>  	int ret;
> >>  
> >> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
> >>  	memhdr->free_cb = free_cb;
> >>  	memhdr->opaque = opaque;
> >>  
> >> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
> >> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
> >> +		delta = (uintptr_t)vaddr % total_elt_sz;
> >> +		off = total_elt_sz - delta;
> >> +		/* Validate alignment */
> >> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
> >> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
> >> +			return -EINVAL;
> >> +		}
> >> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
> >>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
> >>  	else
> >>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;  
> > What is the purpose of this test? Can it fail?  
> 
> Purpose is to sanity check blk_sz alignment. No it won;t fail.
> I thought better to keep sanity check but if you see no value
> then will remove in v2?

Yes, please.


> > Not sure having the delta variable is helpful. However, adding a
> > small comment like this could help:
> >
> > 	/* align object start address to a multiple of total_elt_sz */
> > 	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
> > 	
> > About style, please don't mix brackets and no-bracket blocks in the
> > same if/elseif/else.  
> 
> ok, in v2. 
> 
> >> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
> >>  	}
> >>  
> >>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
> >> +
> >>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
> >> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
> >> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
> >> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
> >> +							pg_shift);
> >> +		else
> >> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
> >>  
> >>  		ret = snprintf(mz_name, sizeof(mz_name),
> >>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);  
> >
> > One issue I see here is that this new flag breaks the function
> > rte_mempool_xmem_size(), which calculates the maximum amount of memory
> > required to store a given number of objects.
> >
> > It also probably breaks rte_mempool_xmem_usage().
> >
> > I don't have any good solution for now. A possibility is to change
> > the behavior of these functions for everyone, meaning that we will
> > always reserve more memory that really required. If this is done on
> > every memory chunk (struct rte_mempool_memhdr), it can eat a lot
> > of memory.
> >
> > Another approach would be to change the API of this function to
> > pass the capability flags, or the mempool pointer... but there is
> > a problem because these functions are usually called before the
> > mempool is instanciated.
> >  
> Per my description on [1/4]. If we agree to call
> _ops_get_capability() at very beginning i.e.. at _populate_default()
> then 'mp->flag' has capability flag. and We could add one more argument
> in _xmem_size( , flag)/_xmem_usage(, flag). 
> - xmem_size / xmem_usage() to check for that capability bit in 'flag'. 
> - if set then increase 'elt_num' by num.
> 
> That way your approach 2) make sense to me and it will very well fit
> in design. Won't waste memory like you mentioned in approach 1).
> 
> Does that make sense?

The use case of rte_mempool_xmem_size()/rte_mempool_xmem_usage()
is to determine how much memory is needed to instantiate a mempool:

  sz = rte_mempool_xmem_size(...);
  ptr = allocate(sz);
  paddr_table = get_phys_map(ptr);
  mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);

If we want to transform the code to use another mempool_ops, it is
not possible:

  mp = rte_mempool_create_empty();
  rte_mempool_set_ops_byname(mp, "my-handler");
  sz = rte_mempool_xmem_size(...);   /* <<< the mp pointer is not passed */
  ptr = allocate(sz);
  paddr_table = get_phys_map(ptr);
  mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);

So, yes, this approach would work, but it needs an API change.
I think it is possible to keep compatibility with previous versions.
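
For example, one compat option could be to add a new flag-aware entry
point and keep the old symbol as a thin wrapper (the name
rte_mempool_xmem_size2() is purely illustrative):

	size_t rte_mempool_xmem_size2(uint32_t elt_num, size_t total_elt_sz,
				      uint32_t pg_shift, unsigned int flags);

	/* old API unchanged, so existing callers keep working */
	size_t
	rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz,
			      uint32_t pg_shift)
	{
		return rte_mempool_xmem_size2(elt_num, total_elt_sz,
					      pg_shift, 0);
	}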

 
> >> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> >> index fd8722e69..99a20263d 100644
> >> --- a/lib/librte_mempool/rte_mempool.h
> >> +++ b/lib/librte_mempool/rte_mempool.h
> >> @@ -267,6 +267,7 @@ struct rte_mempool {
> >>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
> >>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
> >>  #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
> >> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
> >>  
> >>  /**
> >>   * @internal When debug is enabled, store some statistics.  
> > Same comment than for patch 3: the explanation should really be clarified.
> > It's a hw specific limitation, which won't be obvious for the people that
> > will read that code, so we must document it as clear as possible.
> >  
> I won't see this as HW limitation. As mentioned in [1/4], even application
> can request for block alignment, right?

What would be the reason for an application to request this block alignment?
The total size of the element is usually not known by the application,
because the mempool adds its header and footer.
  
Santosh Shukla July 10, 2017, 4:22 p.m. UTC | #4
On Monday 10 July 2017 06:45 PM, Olivier Matz wrote:

> On Wed, 5 Jul 2017 13:05:57 +0530, santosh <santosh.shukla@caviumnetworks.com> wrote:
>> Hi Olivier,
>>
>> On Monday 03 July 2017 10:07 PM, Olivier Matz wrote:
>>
>>> On Wed, 21 Jun 2017 17:32:47 +0000, Santosh Shukla <santosh.shukla@caviumnetworks.com> wrote:  
>>>> Some mempool hw like octeontx/fpa block, demands block size aligned
>>>> buffer address.
>>>>  
>>> What is the meaning of block size aligned?  
>> block size is total_elem_sz.
>>
>>> Does it mean that the address has to be a multiple of total_elt_size?  
>> yes.
>>
>>> Is this constraint on the virtual address only?
>>>  
>> both.
> You mean virtual address and physical address must be a multiple of
> total_elt_size? How is it possible?
>
> For instance, I have this on my dpdk instance:
> Segment 0: phys:0x52c00000, len:4194304, virt:0x7fed26000000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 1: phys:0x53400000, len:163577856, virt:0x7fed1c200000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> Segment 2: phys:0x5d400000, len:20971520, virt:0x7fed1ac00000, socket_id:0, hugepage_sz:2097152, nchannel:0, nrank:0
> ...
>
> I assume total_elt_size is 2176.
>
> In segment 0, I need to use (virt = 0x7fed26000000 + 1536) to be
> multiple of 2176. But the corresponding physical address (0x52c00000 + 1536)
> is not multiple of 2176. 
>
>
> Please clarify if only the virtual address has to be aligned.

Yes, it's the virtual address only.

>
>>>> Introducing an MEMPOOL_F_POOL_BLK_SZ_ALIGNED flag.
>>>> If this flag is set:
>>>> 1) adjust 'off' value to block size aligned value.
>>>> 2) Allocate one additional buffer. This buffer is used to make sure that
>>>> requested 'n' buffers get correctly populated to mempool.
>>>> Example:
>>>> 	elem_sz = 2432 // total element size.
>>>> 	n = 2111 // requested number of buffer.
>>>> 	off = 2304 // new buf_offset value after step 1)
>>>> 	vaddr = 0x0 // actual start address of pool
>>>> 	pool_len = 5133952 // total pool length i.e.. (elem_sz * n)
>>>>
>>>> Since 'off' is a non-zero value so below condition would fail for the
>>>> block size align case.
>>>>
>>>> (((vaddr + off) + (elem_sz * n)) <= (vaddr + pool_len))
>>>>
>>>> Which is incorrect behavior. Additional buffer will solve this
>>>> problem and correctly populate 'n' buffer to mempool for the aligned
>>>> mode.  
>>> Sorry, but the example is not very clear.
>>>  
>> which part?
>>
>> I'll try to reword.
>>
>> The problem statement is:
>> - We want start of buffer address aligned to block_sz aka total_elt_sz.
>>
>> Proposed solution in this patch:
>> - Let's say that we get 'x' size of memory chunk from memzone.
>> - Ideally we start using buffer at address 0 to...(x-block_sz).
>> - Not necessarily first buffer address i.e. 0 is aligned to block_sz.
>> - So we derive offset value for block_sz alignment purpose i.e..'off' . 
>> - That 'off' makes sure that first va/pa address of buffer is blk_sz aligned.
>> - Calculating 'off' may end up sacrificing first buffer of pool. So total
>> number of buffer in pool is n-1, Which is incorrect behavior, Thats why
>> we add 1 addition buffer. We request memzone to allocate (n+1 * total_elt_sz) pool
>> area when F_BLK_SZ_ALIGNED flag is set.
>>
>>>> Signed-off-by: Santosh Shukla <santosh.shukla@caviumnetworks.com>
>>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>> ---
>>>>  lib/librte_mempool/rte_mempool.c | 19 ++++++++++++++++---
>>>>  lib/librte_mempool/rte_mempool.h |  1 +
>>>>  2 files changed, 17 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>>> index 7dec2f51d..2010857f0 100644
>>>> --- a/lib/librte_mempool/rte_mempool.c
>>>> +++ b/lib/librte_mempool/rte_mempool.c
>>>> @@ -350,7 +350,7 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  {
>>>>  	unsigned total_elt_sz;
>>>>  	unsigned i = 0;
>>>> -	size_t off;
>>>> +	size_t off, delta;
>>>>  	struct rte_mempool_memhdr *memhdr;
>>>>  	int ret;
>>>>  
>>>> @@ -387,7 +387,15 @@ rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
>>>>  	memhdr->free_cb = free_cb;
>>>>  	memhdr->opaque = opaque;
>>>>  
>>>> -	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>> +	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
>>>> +		delta = (uintptr_t)vaddr % total_elt_sz;
>>>> +		off = total_elt_sz - delta;
>>>> +		/* Validate alignment */
>>>> +		if (((uintptr_t)vaddr + off) % total_elt_sz) {
>>>> +			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
>>>> +			return -EINVAL;
>>>> +		}
>>>> +	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
>>>>  	else
>>>>  		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;  
>>> What is the purpose of this test? Can it fail?  
>> Purpose is to sanity check blk_sz alignment. No it won;t fail.
>> I thought better to keep sanity check but if you see no value
>> then will remove in v2?
> yes please
>
>
>>> Not sure having the delta variable is helpful. However, adding a
>>> small comment like this could help:
>>>
>>> 	/* align object start address to a multiple of total_elt_sz */
>>> 	off = total_elt_sz - ((uintptr_t)vaddr % total_elt_sz);
>>> 	
>>> About style, please don't mix brackets and no-bracket blocks in the
>>> same if/elseif/else.  
>> ok, in v2. 
>>
>>>> @@ -555,8 +563,13 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>>>  	}
>>>>  
>>>>  	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
>>>> +
>>>>  	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
>>>> -		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>> +		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
>>>> +			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
>>>> +							pg_shift);
>>>> +		else
>>>> +			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
>>>>  
>>>>  		ret = snprintf(mz_name, sizeof(mz_name),
>>>>  			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);  
>>> One issue I see here is that this new flag breaks the function
>>> rte_mempool_xmem_size(), which calculates the maximum amount of memory
>>> required to store a given number of objects.
>>>
>>> It also probably breaks rte_mempool_xmem_usage().
>>>
>>> I don't have any good solution for now. A possibility is to change
>>> the behavior of these functions for everyone, meaning that we will
>>> always reserve more memory that really required. If this is done on
>>> every memory chunk (struct rte_mempool_memhdr), it can eat a lot
>>> of memory.
>>>
>>> Another approach would be to change the API of this function to
>>> pass the capability flags, or the mempool pointer... but there is
>>> a problem because these functions are usually called before the
>>> mempool is instanciated.
>>>  
>> Per my description on [1/4]. If we agree to call
>> _ops_get_capability() at very beginning i.e.. at _populate_default()
>> then 'mp->flag' has capability flag. and We could add one more argument
>> in _xmem_size( , flag)/_xmem_usage(, flag). 
>> - xmem_size / xmem_usage() to check for that capability bit in 'flag'. 
>> - if set then increase 'elt_num' by num.
>>
>> That way your approach 2) make sense to me and it will very well fit
>> in design. Won't waste memory like you mentioned in approach 1).
>>
>> Does that make sense?
> The use case of rte_mempool_xmem_size()/rte_mempool_xmem_usage()
> is to determine how much memory is needed to instanciate a mempool:
>
>   sz = rte_mempool_xmem_size(...);
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> If we want to transform the code to use another mempool_ops, it is
> not possible:
>
>   mp = rte_mempool_create_empty();
>   rte_mempool_set_ops_byname(mp, "my-handler");
>   sz = rte_mempool_xmem_size(...);   /* <<< the mp pointer is not passed */
>   ptr = allocate(sz);
>   paddr_table = get_phys_map(ptr);
>   mp = rte_mempool_xmem_create(..., ptr, ..., paddr_table, ...);
>
> So, yes, this approach would work but it needs to change the API.
> I think it is possible to keep a compat with previous versions.
>
>  

OK. I'm planning to send out a deprecation notice for the xmem_size/usage
APIs. Does that make sense to you?

>>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>>> index fd8722e69..99a20263d 100644
>>>> --- a/lib/librte_mempool/rte_mempool.h
>>>> +++ b/lib/librte_mempool/rte_mempool.h
>>>> @@ -267,6 +267,7 @@ struct rte_mempool {
>>>>  #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
>>>>  #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
>>>>  #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
>>>> +#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
>>>>  
>>>>  /**
>>>>   * @internal When debug is enabled, store some statistics.  
>>> Same comment than for patch 3: the explanation should really be clarified.
>>> It's a hw specific limitation, which won't be obvious for the people that
>>> will read that code, so we must document it as clear as possible.
>>>  
>> I won't see this as HW limitation. As mentioned in [1/4], even application
>> can request for block alignment, right?
> What would be the reason for an application would request this block aligment?
> The total size of the element is usually not known by the application,
> because the mempool adds its header and footer.
>
There were patches for custom alignment in the past [1], so it's not a new
initiative.
The application doesn't have to know the internal block_size (total_elt_sz),
but my point is: an application can very much request a start address
aligned to block_size.
Besides that, we have enough space in the MEMPOOL_F_ area, so keeping an
alignment flag is not a problem.

Thanks.
[1] http://dpdk.org/dev/patchwork/patch/23885/

  

Patch

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 7dec2f51d..2010857f0 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -350,7 +350,7 @@  rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 {
 	unsigned total_elt_sz;
 	unsigned i = 0;
-	size_t off;
+	size_t off, delta;
 	struct rte_mempool_memhdr *memhdr;
 	int ret;
 
@@ -387,7 +387,15 @@  rte_mempool_populate_phys(struct rte_mempool *mp, char *vaddr,
 	memhdr->free_cb = free_cb;
 	memhdr->opaque = opaque;
 
-	if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
+	if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED) {
+		delta = (uintptr_t)vaddr % total_elt_sz;
+		off = total_elt_sz - delta;
+		/* Validate alignment */
+		if (((uintptr_t)vaddr + off) % total_elt_sz) {
+			RTE_LOG(ERR, MEMPOOL, "vaddr(%p) not aligned to total_elt_sz(%u)\n", (vaddr + off), total_elt_sz);
+			return -EINVAL;
+		}
+	} else if (mp->flags & MEMPOOL_F_NO_CACHE_ALIGN)
 		off = RTE_PTR_ALIGN_CEIL(vaddr, 8) - vaddr;
 	else
 		off = RTE_PTR_ALIGN_CEIL(vaddr, RTE_CACHE_LINE_SIZE) - vaddr;
@@ -555,8 +563,13 @@  rte_mempool_populate_default(struct rte_mempool *mp)
 	}
 
 	total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size;
+
 	for (mz_id = 0, n = mp->size; n > 0; mz_id++, n -= ret) {
-		size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
+		if (mp->flags & MEMPOOL_F_POOL_BLK_SZ_ALIGNED)
+			size = rte_mempool_xmem_size(n + 1, total_elt_sz,
+							pg_shift);
+		else
+			size = rte_mempool_xmem_size(n, total_elt_sz, pg_shift);
 
 		ret = snprintf(mz_name, sizeof(mz_name),
 			RTE_MEMPOOL_MZ_FORMAT "_%d", mp->name, mz_id);
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index fd8722e69..99a20263d 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -267,6 +267,7 @@  struct rte_mempool {
 #define MEMPOOL_F_POOL_CREATED   0x0010 /**< Internal: pool is created. */
 #define MEMPOOL_F_NO_PHYS_CONTIG 0x0020 /**< Don't need physically contiguous objs. */
 #define MEMPOOL_F_POOL_CONTIG    0x0040 /**< Detect physcially contiguous objs */
+#define MEMPOOL_F_POOL_BLK_SZ_ALIGNED 0x0080 /**< Align buffer address to block size*/
 
 /**
  * @internal When debug is enabled, store some statistics.