[dpdk-dev] vhost: fix iotlb pool out-of-memory handling

Message ID 20180125150653.32284-1-maxime.coquelin@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Yuanhan Liu

Checks

Context               Check    Description
ci/checkpatch         success  coding style OK
ci/Intel-compilation  fail     Compilation issues

Commit Message

Maxime Coquelin Jan. 25, 2018, 3:06 p.m. UTC
In the unlikely case the IOTLB memory pool runs out of memory,
an issue may happen if all entries are used by the IOTLB cache
and an IOTLB miss happens. If the IOTLB pending list is empty,
no memory is freed and the allocation fails a second time.

This patch fixes this by randomly evicting an entry from the
IOTLB cache if the IOTLB pending list is empty, ensuring the
second allocation attempt will succeed.

In the same spirit, the opposite is done when inserting an
IOTLB entry into the IOTLB cache fails due to an out-of-memory
condition. In this case, the IOTLB pending list is flushed if
the IOTLB cache is empty, ensuring the new entry can be inserted.

Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
Fixes: f72c2ad63aeb ("vhost: add pending IOTLB miss request list and helpers")

Cc: stable@dpdk.org
Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/iotlb.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)
  

Comments

Tiwei Bie Jan. 26, 2018, 8:03 a.m. UTC | #1
On Thu, Jan 25, 2018 at 04:06:53PM +0100, Maxime Coquelin wrote:
> In the unlikely case the IOTLB memory pool runs out of memory,
> an issue may happen if all entries are used by the IOTLB cache
> and an IOTLB miss happens. If the IOTLB pending list is empty,
> no memory is freed and the allocation fails a second time.
> 
> This patch fixes this by randomly evicting an entry from the
> IOTLB cache if the IOTLB pending list is empty, ensuring the
> second allocation attempt will succeed.
> 
> In the same spirit, the opposite is done when inserting an
> IOTLB entry into the IOTLB cache fails due to an out-of-memory
> condition. In this case, the IOTLB pending list is flushed if
> the IOTLB cache is empty, ensuring the new entry can be inserted.
> 
> Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
> Fixes: f72c2ad63aeb ("vhost: add pending IOTLB miss request list and helpers")
> 
[...]
> @@ -95,9 +98,11 @@ vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
>  
>  	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
>  	if (ret) {
> -		RTE_LOG(INFO, VHOST_CONFIG,
> -				"IOTLB pool empty, clear pending misses\n");
> -		vhost_user_iotlb_pending_remove_all(vq);
> +		RTE_LOG(INFO, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");
> +		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
> +			vhost_user_iotlb_pending_remove_all(vq);
> +		else
> +			vhost_user_iotlb_cache_random_evict(vq);
>  		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
>  		if (ret) {
>  			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
> @@ -186,8 +191,11 @@ vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
>  
>  	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
>  	if (ret) {
> -		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, evict one entry\n");
> -		vhost_user_iotlb_cache_random_evict(vq);
> +		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");

Maybe we should use the same log level for both cases?
Currently, the first one is INFO, and this one is DEBUG.

> +		if (!TAILQ_EMPTY(&vq->iotlb_list))
> +			vhost_user_iotlb_cache_random_evict(vq);
> +		else
> +			vhost_user_iotlb_pending_remove_all(vq);

The IOTLB entry will be invalidated explicitly if it's
unmapped in the frontend, so all the IOTLB entries in the
IOTLB cache are supposed to be valid. I think it may be
better to always prefer freeing memory from the IOTLB
pending list. Something like:

	if (TAILQ_EMPTY(&vq->iotlb_list))
		vhost_user_iotlb_pending_remove_all(vq);
	else if (TAILQ_EMPTY(&vq->iotlb_pending_list))
		vhost_user_iotlb_cache_random_evict(vq);
	else
		vhost_user_iotlb_pending_random_evict(vq);
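
Such a vhost_user_iotlb_pending_random_evict() helper does not
exist yet; a minimal sketch, assuming it mirrors the existing
vhost_user_iotlb_cache_random_evict() and reuses the
iotlb_pending_lock and iotlb_pending_list fields (the pending
list keeps no entry counter, so the entries are counted first):

	static void
	vhost_user_iotlb_pending_random_evict(struct vhost_virtqueue *vq)
	{
		struct vhost_iotlb_entry *node, *temp_node;
		int entry_idx, nr_entries = 0;

		rte_rwlock_write_lock(&vq->iotlb_pending_lock);

		/* No counter is maintained for this list, count first */
		TAILQ_FOREACH(node, &vq->iotlb_pending_list, next)
			nr_entries++;

		if (nr_entries) {
			entry_idx = rte_rand() % nr_entries;

			/* Walk to the randomly picked entry and drop it */
			TAILQ_FOREACH_SAFE(node, &vq->iotlb_pending_list,
					next, temp_node) {
				if (!entry_idx) {
					TAILQ_REMOVE(&vq->iotlb_pending_list,
							node, next);
					rte_mempool_put(vq->iotlb_pool, node);
					break;
				}
				entry_idx--;
			}
		}

		rte_rwlock_write_unlock(&vq->iotlb_pending_lock);
	}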

Besides, in __vhost_iova_to_vva():

uint64_t
__vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
		    uint64_t iova, uint64_t size, uint8_t perm)
{
	uint64_t vva, tmp_size;

	if (unlikely(!size))
		return 0;

	tmp_size = size;

	vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
	if (tmp_size == size)
		return vva;

	if (!vhost_user_iotlb_pending_miss(vq, iova + tmp_size, perm)) {
		/*
		 * iotlb_lock is read-locked for a full burst,
		 * but it only protects the iotlb cache.
		 * In case of IOTLB miss, we might block on the socket,
		 * which could cause a deadlock with QEMU if an IOTLB update
		 * is being handled. We can safely unlock here to avoid it.
		 */
		vhost_user_iotlb_rd_unlock(vq);

		vhost_user_iotlb_pending_insert(vq, iova + tmp_size, perm);
		vhost_user_iotlb_miss(dev, iova + tmp_size, perm);

vhost_user_iotlb_miss() may fail. If it does, the
inserted pending entry should be removed from the
pending list.
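
A minimal sketch of such handling, assuming the existing
vhost_user_iotlb_pending_remove() helper is used to drop the
entry again on failure (the log message is only illustrative):

	vhost_user_iotlb_pending_insert(vq, iova + tmp_size, perm);
	if (vhost_user_iotlb_miss(dev, iova + tmp_size, perm)) {
		RTE_LOG(ERR, VHOST_CONFIG,
			"IOTLB miss request failed for IOVA 0x%" PRIx64 "\n",
			iova + tmp_size);
		/* Drop the pending entry inserted just above */
		vhost_user_iotlb_pending_remove(vq, iova + tmp_size, 1, perm);
	}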

Best regards,
Tiwei Bie
  
Jens Freimann Jan. 26, 2018, 12:40 p.m. UTC | #2
On Thu, Jan 25, 2018 at 04:06:53PM +0100, Maxime Coquelin wrote:
>In the unlikely case the IOTLB memory pool runs out of memory,
>an issue may happen if all entries are used by the IOTLB cache
>and an IOTLB miss happens. If the IOTLB pending list is empty,
>no memory is freed and the allocation fails a second time.
>
>This patch fixes this by randomly evicting an entry from the
>IOTLB cache if the IOTLB pending list is empty, ensuring the
>second allocation attempt will succeed.
>
>In the same spirit, the opposite is done when inserting an
>IOTLB entry into the IOTLB cache fails due to an out-of-memory
>condition. In this case, the IOTLB pending list is flushed if
>the IOTLB cache is empty, ensuring the new entry can be inserted.
>
>Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
>Fixes: f72c2ad63aeb ("vhost: add pending IOTLB miss request list and helpers")
>
>Cc: stable@dpdk.org
>Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>---
> lib/librte_vhost/iotlb.c | 18 +++++++++++++-----
> 1 file changed, 13 insertions(+), 5 deletions(-)
>

Reviewed-by: Jens Freimann <jfreimann@redhat.com>

regards,
Jens
  
Maxime Coquelin Jan. 26, 2018, 3:23 p.m. UTC | #3
Hi Tiwei,

On 01/26/2018 09:03 AM, Tiwei Bie wrote:
> On Thu, Jan 25, 2018 at 04:06:53PM +0100, Maxime Coquelin wrote:
>> In the unlikely case the IOTLB memory pool runs out of memory,
>> an issue may happen if all entries are used by the IOTLB cache
>> and an IOTLB miss happens. If the IOTLB pending list is empty,
>> no memory is freed and the allocation fails a second time.
>>
>> This patch fixes this by randomly evicting an entry from the
>> IOTLB cache if the IOTLB pending list is empty, ensuring the
>> second allocation attempt will succeed.
>>
>> In the same spirit, the opposite is done when inserting an
>> IOTLB entry into the IOTLB cache fails due to an out-of-memory
>> condition. In this case, the IOTLB pending list is flushed if
>> the IOTLB cache is empty, ensuring the new entry can be inserted.
>>
>> Fixes: d012d1f293f4 ("vhost: add IOTLB helper functions")
>> Fixes: f72c2ad63aeb ("vhost: add pending IOTLB miss request list and helpers")
>>
> [...]
>> @@ -95,9 +98,11 @@ vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
>>   
>>   	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
>>   	if (ret) {
>> -		RTE_LOG(INFO, VHOST_CONFIG,
>> -				"IOTLB pool empty, clear pending misses\n");
>> -		vhost_user_iotlb_pending_remove_all(vq);
>> +		RTE_LOG(INFO, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");
>> +		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
>> +			vhost_user_iotlb_pending_remove_all(vq);
>> +		else
>> +			vhost_user_iotlb_cache_random_evict(vq);
>>   		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
>>   		if (ret) {
>>   			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
>> @@ -186,8 +191,11 @@ vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
>>   
>>   	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
>>   	if (ret) {
>> -		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, evict one entry\n");
>> -		vhost_user_iotlb_cache_random_evict(vq);
>> +		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");
> 
> Maybe we should use the same log level for both cases?
> Currently, the first one is INFO, and this one is DEBUG.

Right, I will set DEBUG log level for both.

> 
>> +		if (!TAILQ_EMPTY(&vq->iotlb_list))
>> +			vhost_user_iotlb_cache_random_evict(vq);
>> +		else
>> +			vhost_user_iotlb_pending_remove_all(vq);
> 
> The IOTLB entry will be invalidated explicitly if it's
> unmapped in the frontend, so all the IOTLB entries in the
> IOTLB cache are supposed to be valid. I think it may be
> better to always prefer freeing memory from the IOTLB
> pending list. Something like:
> 
> 	if (TAILQ_EMPTY(&vq->iotlb_list))
> 		vhost_user_iotlb_pending_remove_all(vq);
> 	else if (TAILQ_EMPTY(&vq->iotlb_pending_list))
> 		vhost_user_iotlb_cache_random_evict(vq);
> 	else
> 		vhost_user_iotlb_pending_random_evict(vq);

I agree that all the IOTLB entries in the cache are supposed to
be valid. But removing from the pending list first might not be
a good idea: if a pending entry is valid, it means an IOTLB update
is on its way to the backend, so removing it could cause another
MISS request to be sent for the same address if the PMD thread
retries the translation afterwards.

> Besides, in __vhost_iova_to_vva():
> 
> uint64_t
> __vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
> 		    uint64_t iova, uint64_t size, uint8_t perm)
> {
> 	uint64_t vva, tmp_size;
> 
> 	if (unlikely(!size))
> 		return 0;
> 
> 	tmp_size = size;
> 
> 	vva = vhost_user_iotlb_cache_find(vq, iova, &tmp_size, perm);
> 	if (tmp_size == size)
> 		return vva;
> 
> 	if (!vhost_user_iotlb_pending_miss(vq, iova + tmp_size, perm)) {
> 		/*
> 		 * iotlb_lock is read-locked for a full burst,
> 		 * but it only protects the iotlb cache.
> 		 * In case of IOTLB miss, we might block on the socket,
> 		 * which could cause a deadlock with QEMU if an IOTLB update
> 		 * is being handled. We can safely unlock here to avoid it.
> 		 */
> 		vhost_user_iotlb_rd_unlock(vq);
> 
> 		vhost_user_iotlb_pending_insert(vq, iova + tmp_size, perm);
> 		vhost_user_iotlb_miss(dev, iova + tmp_size, perm);
> 
> vhost_user_iotlb_miss() may fail. If it does, the
> inserted pending entry should be removed from the
> pending list.

Right, thanks for spotting this.
I will fix this in a separate patch.

Regards,
Maxime
> Best regards,
> Tiwei Bie
>
  

Patch

diff --git a/lib/librte_vhost/iotlb.c b/lib/librte_vhost/iotlb.c
index b74cc6a78..459894b93 100644
--- a/lib/librte_vhost/iotlb.c
+++ b/lib/librte_vhost/iotlb.c
@@ -50,6 +50,9 @@  struct vhost_iotlb_entry {
 
 #define IOTLB_CACHE_SIZE 2048
 
+static void
+vhost_user_iotlb_cache_random_evict(struct vhost_virtqueue *vq);
+
 static void
 vhost_user_iotlb_pending_remove_all(struct vhost_virtqueue *vq)
 {
@@ -95,9 +98,11 @@  vhost_user_iotlb_pending_insert(struct vhost_virtqueue *vq,
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 	if (ret) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-				"IOTLB pool empty, clear pending misses\n");
-		vhost_user_iotlb_pending_remove_all(vq);
+		RTE_LOG(INFO, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");
+		if (!TAILQ_EMPTY(&vq->iotlb_pending_list))
+			vhost_user_iotlb_pending_remove_all(vq);
+		else
+			vhost_user_iotlb_cache_random_evict(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&node);
 		if (ret) {
 			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");
@@ -186,8 +191,11 @@  vhost_user_iotlb_cache_insert(struct vhost_virtqueue *vq, uint64_t iova,
 
 	ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 	if (ret) {
-		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, evict one entry\n");
-		vhost_user_iotlb_cache_random_evict(vq);
+		RTE_LOG(DEBUG, VHOST_CONFIG, "IOTLB pool empty, clear entries\n");
+		if (!TAILQ_EMPTY(&vq->iotlb_list))
+			vhost_user_iotlb_cache_random_evict(vq);
+		else
+			vhost_user_iotlb_pending_remove_all(vq);
 		ret = rte_mempool_get(vq->iotlb_pool, (void **)&new_node);
 		if (ret) {
 			RTE_LOG(ERR, VHOST_CONFIG, "IOTLB pool still empty, failure\n");