[5/5] vhost: remove useless casts to volatile

Message ID 20181205094957.1938-6-maxime.coquelin@redhat.com (mailing list archive)
State Superseded, archived
Delegated to: Maxime Coquelin
Headers
Series vhost: add missing barriers, remove useless volatiles |

Checks

Context Check Description
ci/Intel-compilation success Compilation OK

Commit Message

Maxime Coquelin Dec. 5, 2018, 9:49 a.m. UTC
  Cast to volatile is done when reading avail index and writing
the used index. This would not be necessary if proper barriers
are used.

Now that the read barrier has been added, we can remove these
cast to volatile.

Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
---
 lib/librte_vhost/virtio_net.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
  

Comments

Ilya Maximets Dec. 5, 2018, 1:52 p.m. UTC | #1
On 05.12.2018 12:49, Maxime Coquelin wrote:
> Cast to volatile is done when reading avail index and writing
> the used index. This would not be necessary if proper barriers
> are used.

'volatile' and barriers are not really connected. 'volatile' is
the disabling of the compiler optimizations, while barriers are
for runtime CPU level optimizations. In general, casts here made
to force compiler to actually read the value and not cache it
somehow. In fact that vhost library never writes to avail index,
"very smart" compiler could drop it at all. None of modern compilers
will do that for a single operation within a function, so,
volatiles are not really necessary in current code, but they could
save some nerves in case of code/compiler changes.

OTOH, IMHO, the main purpose of the casts in current code is
the self-documenting. Casts forces to pay special attention to
these variables and reminds that they could be updated in other
process. Casts allows to understand which variables are local and
which are shared. I don't think that we should remove them anyway.

> 
> Now that the read barrier has been added, we can remove these
> cast to volatile.
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
>  lib/librte_vhost/virtio_net.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
> index 679ce388b..eab1a5b4c 100644
> --- a/lib/librte_vhost/virtio_net.c
> +++ b/lib/librte_vhost/virtio_net.c
> @@ -114,7 +114,7 @@ flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>  
>  	vhost_log_cache_sync(dev, vq);
>  
> -	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
> +	vq->used->idx += vq->shadow_used_idx;
>  	vq->shadow_used_idx = 0;
>  	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
>  		sizeof(vq->used->idx));
> @@ -794,7 +794,7 @@ virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  	struct buf_vector buf_vec[BUF_VECTOR_MAX];
>  	uint16_t avail_head;
>  
> -	avail_head = *((volatile uint16_t *)&vq->avail->idx);
> +	avail_head = vq->avail->idx;
>  
>  	/*
>  	 * The ordering between avail index and
> @@ -1379,8 +1379,7 @@ virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
>  		}
>  	}
>  
> -	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
> -			vq->last_avail_idx;
> +	free_entries = vq->avail->idx - vq->last_avail_idx;
>  	if (free_entries == 0)
>  		return 0;
>  
> 
>
  
Maxime Coquelin Dec. 6, 2018, 4:59 p.m. UTC | #2
Hi Ilya,

On 12/5/18 2:52 PM, Ilya Maximets wrote:
> On 05.12.2018 12:49, Maxime Coquelin wrote:
>> Cast to volatile is done when reading avail index and writing
>> the used index. This would not be necessary if proper barriers
>> are used.
> 
> 'volatile' and barriers are not really connected. 'volatile' is
> the disabling of the compiler optimizations, while barriers are
> for runtime CPU level optimizations. In general, casts here made
> to force compiler to actually read the value and not cache it
> somehow. In fact that vhost library never writes to avail index,
> "very smart" compiler could drop it at all. None of modern compilers
> will do that for a single operation within a function, so,
> volatiles are not really necessary in current code, but they could
> save some nerves in case of code/compiler changes.

Ok, thanks for the explanation.
Why don't we do the same in Virtio PMD?

> OTOH, IMHO, the main purpose of the casts in current code is
> the self-documenting. Casts forces to pay special attention to
> these variables and reminds that they could be updated in other
> process. Casts allows to understand which variables are local and
> which are shared. I don't think that we should remove them anyway.

It is not only self-documenting, it has an impact on generated code:

>>
>> Now that the read barrier has been added, we can remove these
>> cast to volatile.
>>
>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>> ---
>>   lib/librte_vhost/virtio_net.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>> index 679ce388b..eab1a5b4c 100644
>> --- a/lib/librte_vhost/virtio_net.c
>> +++ b/lib/librte_vhost/virtio_net.c
>> @@ -114,7 +114,7 @@ flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>   
>>   	vhost_log_cache_sync(dev, vq);
>>   
>> -	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
>> +	vq->used->idx += vq->shadow_used_idx;

With cast to volatile:
	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
     35f8:	49 8b 53 10          	mov    0x10(%r11),%rdx
	vq->shadow_used_idx = 0;
     35fc:	31 db                	xor    %ebx,%ebx
	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
     35fe:	0f b7 42 02          	movzwl 0x2(%rdx),%eax
     3602:	66 41 03 43 70       	add    0x70(%r11),%ax
     3607:	66 89 42 02          	mov    %ax,0x2(%rdx)
	vq->shadow_used_idx = 0;

Without it:
	vq->used->idx += vq->shadow_used_idx;
     35f8:	49 8b 43 10          	mov    0x10(%r11),%rax
     35fc:	41 0f b7 53 70       	movzwl 0x70(%r11),%edx
	vq->shadow_used_idx = 0;
     3601:	31 db                	xor    %ebx,%ebx
	vq->used->idx += vq->shadow_used_idx;
     3603:	66 01 50 02          	add    %dx,0x2(%rax)
	vq->shadow_used_idx = 0;

If my understanding is correct there is no functional change, but we 
save one instruction by removing the cast to volatile.

Thanks,
Maxime
  
Ilya Maximets Dec. 7, 2018, 11:16 a.m. UTC | #3
On 06.12.2018 19:59, Maxime Coquelin wrote:
> Hi Ilya,
> 
> On 12/5/18 2:52 PM, Ilya Maximets wrote:
>> On 05.12.2018 12:49, Maxime Coquelin wrote:
>>> Cast to volatile is done when reading avail index and writing
>>> the used index. This would not be necessary if proper barriers
>>> are used.
>>
>> 'volatile' and barriers are not really connected. 'volatile' is
>> the disabling of the compiler optimizations, while barriers are
>> for runtime CPU level optimizations. In general, casts here made
>> to force compiler to actually read the value and not cache it
>> somehow. In fact that vhost library never writes to avail index,
>> "very smart" compiler could drop it at all. None of modern compilers
>> will do that for a single operation within a function, so,
>> volatiles are not really necessary in current code, but they could
>> save some nerves in case of code/compiler changes.
> 
> Ok, thanks for the explanation.
> Why don't we do the same in Virtio PMD?

Maybe we should. It works because in virtio all the accesses wrapped
by short access functions like 'vq_update_avail_idx'. And we, actually,
never reading the same value twice in the same function. Compilers
today does not optimize such memory accesses.

> 
>> OTOH, IMHO, the main purpose of the casts in current code is
>> the self-documenting. Casts forces to pay special attention to
>> these variables and reminds that they could be updated in other
>> process. Casts allows to understand which variables are local and
>> which are shared. I don't think that we should remove them anyway.
> 
> It is not only self-documenting, it has an impact on generated code:
> 
>>>
>>> Now that the read barrier has been added, we can remove these
>>> cast to volatile.
>>>
>>> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
>>> ---
>>>   lib/librte_vhost/virtio_net.c | 7 +++----
>>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
>>> index 679ce388b..eab1a5b4c 100644
>>> --- a/lib/librte_vhost/virtio_net.c
>>> +++ b/lib/librte_vhost/virtio_net.c
>>> @@ -114,7 +114,7 @@ flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
>>>         vhost_log_cache_sync(dev, vq);
>>>   -    *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
>>> +    vq->used->idx += vq->shadow_used_idx;
> 
> With cast to volatile:
>     *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
>     35f8:    49 8b 53 10              mov    0x10(%r11),%rdx
>     vq->shadow_used_idx = 0;
>     35fc:    31 db                    xor    %ebx,%ebx
>     *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
>     35fe:    0f b7 42 02              movzwl 0x2(%rdx),%eax
>     3602:    66 41 03 43 70           add    0x70(%r11),%ax
>     3607:    66 89 42 02              mov    %ax,0x2(%rdx)
>     vq->shadow_used_idx = 0;
> 
> Without it:
>     vq->used->idx += vq->shadow_used_idx;
>     35f8:    49 8b 43 10              mov    0x10(%r11),%rax
>     35fc:    41 0f b7 53 70           movzwl 0x70(%r11),%edx
>     vq->shadow_used_idx = 0;
>     3601:    31 db                    xor    %ebx,%ebx
>     vq->used->idx += vq->shadow_used_idx;
>     3603:    66 01 50 02              add    %dx,0x2(%rax)
>     vq->shadow_used_idx = 0;
> 
> If my understanding is correct there is no functional change, but we save one instruction by removing the cast to volatile.

IMHO, it's a gcc issue that it could not understand that cast and
dereference could be dropped. For example, clang on my ubuntu
generates equal code:

With cast to volatile:

        *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
   32550:       41 0f b7 42 70          movzwl 0x70(%r10),%eax
   32555:       49 8b 4a 10             mov    0x10(%r10),%rcx
   32559:       66 01 41 02             add    %ax,0x2(%rcx)
        vq->shadow_used_idx = 0;
   3255d:       66 41 c7 42 70 00 00    movw   $0x0,0x70(%r10)

Without it:

        vq->used->idx += vq->shadow_used_idx;
   32550:       41 0f b7 42 70          movzwl 0x70(%r10),%eax
   32555:       49 8b 4a 10             mov    0x10(%r10),%rcx
   32559:       66 01 41 02             add    %ax,0x2(%rcx)
        vq->shadow_used_idx = 0;
   3255d:       66 41 c7 42 70 00 00    movw   $0x0,0x70(%r10)


However, different code appears only in '+=' case.
Why we have this increment at all? Following change will eliminate
the generated code difference:

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 5e1a1a727..5776975ca 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -114,7 +114,7 @@ flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
        vhost_log_cache_sync(dev, vq);
 
-       *(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
+       *(volatile uint16_t *)&vq->used->idx = vq->last_used_idx;
        vq->shadow_used_idx = 0;
        vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
                sizeof(vq->used->idx));
---

What do you think?


Best regards, Ilya Maximets.
  

Patch

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 679ce388b..eab1a5b4c 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -114,7 +114,7 @@  flush_shadow_used_ring_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 
 	vhost_log_cache_sync(dev, vq);
 
-	*(volatile uint16_t *)&vq->used->idx += vq->shadow_used_idx;
+	vq->used->idx += vq->shadow_used_idx;
 	vq->shadow_used_idx = 0;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -794,7 +794,7 @@  virtio_dev_rx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct buf_vector buf_vec[BUF_VECTOR_MAX];
 	uint16_t avail_head;
 
-	avail_head = *((volatile uint16_t *)&vq->avail->idx);
+	avail_head = vq->avail->idx;
 
 	/*
 	 * The ordering between avail index and
@@ -1379,8 +1379,7 @@  virtio_dev_tx_split(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		}
 	}
 
-	free_entries = *((volatile uint16_t *)&vq->avail->idx) -
-			vq->last_avail_idx;
+	free_entries = vq->avail->idx - vq->last_avail_idx;
 	if (free_entries == 0)
 		return 0;