[dpdk-dev] [PATCH v2 23/41] mempool: add support for the new allocation methods

Burakov, Anatoly anatoly.burakov at intel.com
Fri Mar 23 12:25:20 CET 2018


On 20-Mar-18 11:35 AM, Shreyansh Jain wrote:
> Hello Anatoly,
> 
> On Wed, Mar 7, 2018 at 10:26 PM, Anatoly Burakov
> <anatoly.burakov at intel.com> wrote:
>> If a user has specified that the zone should have contiguous memory,
>> use the new _contig allocation APIs instead of normal ones.
>> Otherwise, account for the fact that unless we're in IOVA_AS_VA
>> mode, we cannot guarantee that the pages would be physically
>> contiguous, so we calculate the memzone size and alignments as if
>> we were getting the smallest page size available.
>>
>> Signed-off-by: Anatoly Burakov <anatoly.burakov at intel.com>
>> ---
> 
> [...]
> 
>>   static void
>>   mempool_add_elem(struct rte_mempool *mp, void *obj, rte_iova_t iova)
>>   {
>> @@ -549,6 +570,7 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>          unsigned mz_id, n;
>>          unsigned int mp_flags;
>>          int ret;
>> +       bool force_contig, no_contig;
>>
>>          /* mempool must not be populated */
>>          if (mp->nb_mem_chunks != 0)
>> @@ -563,10 +585,46 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>          /* update mempool capabilities */
>>          mp->flags |= mp_flags;
>>
>> -       if (rte_eal_has_hugepages()) {
>> -               pg_shift = 0; /* not needed, zone is physically contiguous */
>> +       no_contig = mp->flags & MEMPOOL_F_NO_PHYS_CONTIG;
>> +       force_contig = mp->flags & MEMPOOL_F_CAPA_PHYS_CONTIG;
>> +
>> +       /*
>> +        * there are several considerations for page size and page shift here.
>> +        *
>> +        * if we don't need our mempools to have physically contiguous objects,
>> +        * then just set page shift and page size to 0, because the user has
>> +        * indicated that there's no need to care about anything.
> 
> I think the above case is not handled properly here.
> Reason below...
> 
>> +        *
>> +        * if we do need contiguous objects, there is also an option to reserve
>> +        * the entire mempool memory as one contiguous block of memory, in
>> +        * which case the page shift and alignment wouldn't matter as well.
>> +        *
>> +        * if we require contiguous objects, but not necessarily the entire
>> +        * mempool reserved space to be contiguous, then there are two options.
>> +        *
>> +        * if our IO addresses are virtual, not actual physical (IOVA as VA
>> +        * case), then no page shift needed - our memory allocation will give us
>> +        * contiguous physical memory as far as the hardware is concerned, so
>> +        * act as if we're getting contiguous memory.
>> +        *
>> +        * if our IO addresses are physical, we may get memory from bigger
>> +        * pages, or we might get memory from smaller pages, and how much of it
>> +        * we require depends on whether we want bigger or smaller pages.
>> +        * However, requesting each and every memory size is too much work, so
>> +        * what we'll do instead is walk through the page sizes available, pick
>> +        * the smallest one and set up page shift to match that one. We will be
>> +        * wasting some space this way, but it's much nicer than looping around
>> +        * trying to reserve each and every page size.
>> +        */
>> +
>> +       if (no_contig || force_contig || rte_eal_iova_mode() == RTE_IOVA_VA) {
>>                  pg_sz = 0;
>> +               pg_shift = 0;
>>                  align = RTE_CACHE_LINE_SIZE;
> 
> So, taking dpaa2 as an example, I ran testpmd. IOVA=VA is the mode,
> so pg_sz = 0 is set. This is the same as before applying the hotplug
> patchset, except that earlier this decision was based purely on the
> availability of hugepages (rte_eal_has_hugepages()).
> Moving on...
> 
>> +       } else if (rte_eal_has_hugepages()) {
>> +               pg_sz = get_min_page_size();
>> +               pg_shift = rte_bsf32(pg_sz);
>> +               align = pg_sz;
>>          } else {
>>                  pg_sz = getpagesize();
>>                  pg_shift = rte_bsf32(pg_sz);
>> @@ -585,23 +643,34 @@ rte_mempool_populate_default(struct rte_mempool *mp)
>>                          goto fail;
>>                  }
>>
>> -               mz = rte_memzone_reserve_aligned(mz_name, size,
>> -                       mp->socket_id, mz_flags, align);
>> -               /* not enough memory, retry with the biggest zone we have */
>> -               if (mz == NULL)
>> -                       mz = rte_memzone_reserve_aligned(mz_name, 0,
>> +               if (force_contig) {
>> +                       /*
>> +                        * if contiguous memory for entire mempool memory was
>> +                        * requested, don't try reserving again if we fail.
>> +                        */
>> +                       mz = rte_memzone_reserve_aligned_contig(mz_name, size,
>> +                               mp->socket_id, mz_flags, align);
>> +               } else {
>> +                       mz = rte_memzone_reserve_aligned(mz_name, size,
>>                                  mp->socket_id, mz_flags, align);
>> +                       /* not enough memory, retry with the biggest zone we
>> +                        * have
>> +                        */
>> +                       if (mz == NULL)
>> +                               mz = rte_memzone_reserve_aligned(mz_name, 0,
>> +                                       mp->socket_id, mz_flags, align);
>> +               }
>>                  if (mz == NULL) {
>>                          ret = -rte_errno;
>>                          goto fail;
>>                  }
>>
>> -               if (mp->flags & MEMPOOL_F_NO_PHYS_CONTIG)
>> +               if (no_contig)
>>                          iova = RTE_BAD_IOVA;
>>                  else
>>                          iova = mz->iova;
>>
>> -               if (rte_eal_has_hugepages())
>> +               if (rte_eal_has_hugepages() && force_contig)
> 
> So, before the hotplug patchset, this call used to enter
> rte_mempool_populate_iova. But with 'force_contig' not set (testpmd
> does not set it when calling rte_pktmbuf_pool_create at
> app/test-pmd/testpmd.c:521), rte_mempool_populate_virt is called
> instead.
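> 
> For context, rte_pktmbuf_pool_create() exposes no flag for requesting
> fully contiguous pool memory; the testpmd call is roughly the
> following (paraphrased, so treat it as illustrative):
> 
>     rte_mp = rte_pktmbuf_pool_create(pool_name, nb_mbuf,
>             mb_mempool_cache, 0, mbuf_seg_size, socket_id);
> 
> So MEMPOOL_F_CAPA_PHYS_CONTIG could only come from the pool's mempool
> ops reporting that capability, which does not happen here.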
> 
>>                          ret = rte_mempool_populate_iova(mp, mz->addr,
>>                                  iova, mz->len,
>>                                  rte_mempool_memchunk_mz_free,
>> --
>> 2.7.4
> 
> This is called with pg_sz = 0 (rte_mempool.c, around line 679):
> 
>     else
>             ret = rte_mempool_populate_virt(mp, mz->addr,
>                     mz->len, pg_sz,
>                     rte_mempool_memchunk_mz_free,
>                     (void *)(uintptr_t)mz);
> 
> In this function, this check (around line 513) is where the error is
> returned:
> 
>     /* address and len must be page-aligned */
>     if (RTE_PTR_ALIGN_CEIL(addr, pg_sz) != addr)
>             return -EINVAL;
> 
> I don't think RTE_PTR_ALIGN_CEIL is designed to handle pg_sz = 0.
> 
> It is roughly equivalent to RTE_PTR_ALIGN_FLOOR((uintptr_t)addr - 1,
> pg_sz), and with pg_sz == 0 the floor mask ~((uintptr_t)pg_sz - 1)
> evaluates to 0, so the macro returns NULL.
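> 
> A minimal standalone sketch that mimics the rte_common.h alignment
> arithmetic (my own re-creation, not the DPDK macros themselves)
> demonstrates this:
> 
>     #include <stdint.h>
>     #include <stdio.h>
> 
>     /* mimics RTE_PTR_ALIGN_CEIL: floor-align (ptr + align - 1) */
>     static void *ptr_align_ceil(void *ptr, uintptr_t align)
>     {
>             uintptr_t v = (uintptr_t)ptr + align - 1;
>             /* with align == 0, the mask ~(align - 1) is 0 */
>             return (void *)(v & ~(align - 1));
>     }
> 
>     int main(void)
>     {
>             void *addr = (void *)0x100000;
>             printf("%p\n", ptr_align_ceil(addr, 4096)); /* 0x100000 */
>             printf("%p\n", ptr_align_ceil(addr, 0));    /* (nil) */
>             return 0;
>     }
> 
> So with pg_sz == 0, the page-alignment check compares NULL against
> addr and always takes the -EINVAL path.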
> 
> Basically, this ends up making rte_mempool_populate_default fail.
> 
> I think the root cause is the assumption that rte_mempool_populate_virt
> can handle a page size of 0 when it is called, which is wrong. There
> would be issues besides the above RTE_PTR_ALIGN_CEIL as well, like the
> for-loop that advances by off += pg_sz (sketched below). It needs a
> valid (non-zero) page-size value to work with.
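> 
> For illustration, that loop is shaped roughly like this (paraphrased,
> not the exact code):
> 
>     for (off = 0; off + pg_sz <= len &&
>                     mp->populated_size < mp->size; off += pg_sz) {
>             /* ... per-page IOVA lookup ... */
>             ret = rte_mempool_populate_iova(mp, (char *)addr + off,
>                     iova, pg_sz, free_cb, opaque);
>             /* ... */
>     }
> 
> With pg_sz == 0, off never advances, so even if the alignment check
> were bypassed, this would loop forever over zero-sized chunks.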
> 
> So, basically, DPAA2 is broken by this patch because of the above
> issue, if I am comprehending it correctly.
> 
> Regards,
> Shreyansh
> 

Thanks for finding this issue. A fix is now pushed to github for testing.
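
The gist of it, as a simplified sketch (the actual change on github may
differ in detail): never let rte_mempool_populate_virt() see a zero page
size. When pg_sz == 0, the reserved memzone is IOVA-contiguous as far as
the mempool is concerned, so it can take the IOVA path instead:

	if (pg_sz == 0 || (rte_eal_has_hugepages() && force_contig))
		ret = rte_mempool_populate_iova(mp, mz->addr,
			iova, mz->len,
			rte_mempool_memchunk_mz_free,
			(void *)(uintptr_t)mz);
	else
		ret = rte_mempool_populate_virt(mp, mz->addr,
			mz->len, pg_sz,
			rte_mempool_memchunk_mz_free,
			(void *)(uintptr_t)mz);

This way the virt path is only taken when there is an actual non-zero
page size to walk.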

-- 
Thanks,
Anatoly

