Overview:
An attempt to allocate a new mlx5 MR entry when the global Btree cache is full ends up calling mlx5_malloc() with the EXTERNAL_HEAP_MIN_SOCKET_ID socket ID, even though no external-memory heap has been created; the rte_extmem_* API is used instead.

Steps to reproduce:
- start a primary DPDK process on a NIC compatible with the mlx5_core driver;
- use rte_extmem_register() to register more than 512 pages of 4KB;
- use rte_dev_dma_map() to DMA-map each page;
- call rte_eth_tx_burst() with an mbuf whose external buffer lies in the last page of the registered memory (or any page above index 512), i.e. at a virtual address that will not be found in the global Btree cache.

Actual results:
mlx5_malloc() in mlx5_mr_create_primary() fails with "Unable to allocate memory for a new MR". From this point forward, packets never reach the other end.

Expected results:
MR entries should be successfully retrieved from backup or created when the cache becomes full; calling mlx5_malloc() on the external heap socket should not be possible when the rte_extmem_* API is used. As stated in the DPDK documentation [1], "Memory added this way will not be available for any regular DPDK allocators".

Build Date & Hardware:
20 Jun 2023 on Debian GNU/Linux 4.18.0

[1]: https://doc.dpdk.org/guides-21.11/prog_guide/env_abstraction_layer.html
This was fixed by this patch:
https://git.dpdk.org/dpdk/commit/?h=releases&id=147f6fb42bd7637b37a9180b0774275531c05f9b
Could you kindly confirm?
Hi,

Unfortunately, that patch only targets a memory socket issue with the ASO mechanism. In my setup, however, ASO is never an issue; I do not believe it is even enabled.

To give a little more insight, the problem I am describing manifests on the data path:
- rte_eth_tx_burst() is called;
- mlx5_tx_burst_*() is called;
- at some later point, in mr_lookup_caches(), mr_btree_lookup() returns UINT32_MAX because all 256 entries in the cache are occupied and the most recent memory registration did not find an empty slot;
- when mr_lookup_caches() fails, mlx5_mr_create() -> mlx5_mr_create_primary() is called;
- mlx5_malloc() at line 723 fails because it is called with an inappropriate socket ID: the socket ID of the memseg list associated with the external buffer (previously registered with rte_extmem_register()), which is EXTERNAL_HEAP_MIN_SOCKET_ID. That socket ID has no valid heap associated with it from which memory could be allocated.
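
To make the sequence above easier to follow, here is a minimal, self-contained C sketch that imitates it outside of DPDK. It is not the mlx5 code: every sim_* name, the linear lookup (standing in for the B-tree), the two-socket allocator and the value 256 used for the external socket ID are assumptions made purely for illustration. Only the 256-entry capacity and the UINT32_MAX miss value are taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not the real mlx5/EAL definitions. */
#define SIM_CACHE_SIZE 256              /* cache capacity quoted in this report */
#define SIM_NUM_SOCKETS 2               /* assumed: sockets that own a regular heap */
#define SIM_EXT_HEAP_MIN_SOCKET_ID 256  /* assumed value of the external socket ID */

struct sim_entry { uintptr_t start; uintptr_t end; };
struct sim_cache { struct sim_entry e[SIM_CACHE_SIZE]; uint32_t len; };

/* Look up addr; return the entry index, or UINT32_MAX on a miss. */
static uint32_t sim_lookup(const struct sim_cache *c, uintptr_t addr)
{
    assert(c != NULL);
    for (uint32_t i = 0; i < c->len; i++)
        if (addr >= c->e[i].start && addr < c->e[i].end)
            return i;
    return UINT32_MAX;
}

/* Cache one registered region; return -1 once the cache is full. */
static int sim_insert(struct sim_cache *c, uintptr_t start, size_t len)
{
    assert(c != NULL);
    if (c->len == SIM_CACHE_SIZE)
        return -1;
    c->e[c->len].start = start;
    c->e[c->len].end = start + len;
    c->len++;
    return 0;
}

/* Socket-aware allocator: only sockets that own a heap can serve requests. */
static void *sim_malloc(size_t size, int socket_id)
{
    if (socket_id < 0 || socket_id >= SIM_NUM_SOCKETS)
        return NULL; /* no heap behind this socket ID */
    return malloc(size);
}

/*
 * Data-path resolution: on a cache miss, try to allocate bookkeeping for a
 * new MR on the socket that the buffer's memory belongs to.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int sim_resolve(const struct sim_cache *c, uintptr_t addr, int mem_socket_id)
{
    if (sim_lookup(c, addr) != UINT32_MAX)
        return 0; /* fast path: address already covered by the cache */
    void *mr = sim_malloc(64, mem_socket_id);
    if (mr == NULL)
        return -1; /* analogous to "Unable to allocate memory for a new MR" */
    free(mr);
    return 0;
}
```

Inserting 600 pages of 4KB into a sim_cache fills it after 256 entries, so a lookup of the last page misses. sim_resolve() then fails when passed SIM_EXT_HEAP_MIN_SOCKET_ID, whereas the same call with socket 0 succeeds. That difference is the behaviour I would expect the fix to address: the allocation for a new MR should not be attempted on a socket ID that has no heap behind it.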