[dpdk-dev] [PATCH v1 0/6] mempool: add bucket driver
Olivier Matz
olivier.matz at 6wind.com
Thu Apr 19 18:41:10 CEST 2018
Hi Andrew,
Sorry for the late feedback; a few comments below.
On Mon, Mar 26, 2018 at 05:12:53PM +0100, Andrew Rybchenko wrote:
> The initial patch series [1] (RFCv1 is [2]) is split into two to simplify
> processing. It is the second part which relies on the first one [3].
>
> It should be applied on top of [4] and [3].
>
> The patch series adds a bucket mempool driver which allows allocating
> (both physically and virtually) contiguous blocks of objects, and adds
> a mempool API to do it. It is still capable of providing separate
> objects, but it is definitely more heavy-weight than the ring/stack
> drivers. The driver will be used by future Solarflare driver
> enhancements which make it possible to utilize physically contiguous
> blocks in the NIC firmware.
>
> The target use case is dequeuing in blocks and enqueuing separate
> objects back (which are collected into buckets to be dequeued). So, a
> memory pool with the bucket driver is created by an application and
> provided to a networking PMD receive queue. The bucket driver is
> selected using rte_eth_dev_pool_ops_supported(). A PMD that relies
> upon contiguous block allocation should report the bucket driver as
> the only supported and preferred one.
>
> The benefit of introducing the contiguous block dequeue operation is
> demonstrated by performance measurements using the autotest with minor
> enhancements:
> - in the original test, bulks are powers of two, which is unacceptable
>   for us, so they are changed to multiples of contig_block_size;
> - the test code is duplicated to support both plain dequeue and
>   dequeue_contig_blocks;
> - all the extra test variations (with/without cache, etc.) are
>   eliminated;
> - a fake read from the dequeued buffer is added (in both cases) to
>   simulate mbuf access.
>
> start performance test for bucket (without cache)
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 111935488
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 115290931
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 353055539
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 353330790
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 224657407
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 230411468
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 706700902
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 703673139
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Srate_persec= 425236887
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Srate_persec= 437295512
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Srate_persec= 1343409356
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Srate_persec= 1336567397
> start performance test for bucket (without cache + contiguous dequeue)
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 122945536
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 126458265
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 374262988
> mempool_autotest cache= 0 cores= 1 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 377316966
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 244842496
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 251618917
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 751226060
> mempool_autotest cache= 0 cores= 2 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 756233010
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 30 Crate_persec= 462068120
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 1 n_keep= 60 Crate_persec= 476997221
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 30 Crate_persec= 1432171313
> mempool_autotest cache= 0 cores= 4 n_get_bulk= 15 n_put_bulk= 15 n_keep= 60 Crate_persec= 1438829771
>
> The number of objects in a contiguous block is a function of the
> bucket memory size (a .config option) and the total element size. In
> the future, an additional API allowing parameters to be passed at
> mempool allocation may be added.
>
> It breaks the ABI since it changes rte_mempool_ops. The ABI version
> is already bumped in [4].
>
>
> [1] https://dpdk.org/ml/archives/dev/2018-January/088698.html
> [2] https://dpdk.org/ml/archives/dev/2017-November/082335.html
> [3] https://dpdk.org/ml/archives/dev/2018-March/093807.html
> [4] https://dpdk.org/ml/archives/dev/2018-March/093196.html
As discussed privately, at first glance I was a bit reluctant to
introduce a new API in mempool that would only be available in one
mempool driver.
There has been the same debate for several ethdev features: should we
provide a generic API for a feature available in only one driver, or
should the driver provide its own API?
Given that the mempool driver API is not that big currently, and that
it can bring a performance enhancement (which is the primary goal of
DPDK), I think we can give this patchset a chance. Drivers that want
to use this new bucket driver can take advantage of the new API, while
keeping a fallback mode so they still work with other mempool drivers.
The bet is:
- drivers and applications try the bucket driver and its new API, and
  notice a performance increase;
- the get_contig_block API gets implemented in some other drivers,
  where possible (not easy for the default one, at least);
- the bucket driver could become the default driver, if the
  performance increase is significant and wide enough.
By integrating this patchset, I hope we can also get some feedback
about the performance of this driver in other situations. My worries
are about pipeline + multicore use cases, where it may add some
pressure (race conditions?) on the bucket header.
Finally, I think (as discussed privately) that the tests should be
updated so the results in this cover letter can be reproduced. I just
did a quick test by replacing "stack" with "bucket" in the autotest
(see patch below) and it fails in populate(). I did not investigate
further; maybe the parameters are not correct for bucket.
--- a/test/test/test_mempool.c
+++ b/test/test/test_mempool.c
@@ -498,7 +498,7 @@ test_mempool(void)
printf("cannot allocate mp_stack mempool\n");
goto err;
}
- if (rte_mempool_set_ops_byname(mp_stack, "stack", NULL) < 0) {
+ if (rte_mempool_set_ops_byname(mp_stack, "bucket", NULL) < 0) {
printf("cannot set stack handler\n");
goto err;
}
Thank you for this work, hope we'll be on time for rc1.
Olivier