[dpdk-dev] [PATCH] cast used->idx to volatile

Xie, Huawei huawei.xie at intel.com
Mon Mar 30 17:56:18 CEST 2015


On 3/30/2015 5:21 PM, Linhaifeng wrote:
>
> On 2015/3/24 18:06, Xie, Huawei wrote:
>> On 3/24/2015 3:44 PM, Linhaifeng wrote:
>>> On 2015/3/24 9:53, Xie, Huawei wrote:
>>>> On 3/24/2015 9:00 AM, Linhaifeng wrote:
>>>>> On 2015/3/23 20:54, Xie, Huawei wrote:
>>>>>>> -----Original Message-----
>>>>>>> From: Linhaifeng [mailto:haifeng.lin at huawei.com]
>>>>>>> Sent: Monday, March 23, 2015 8:24 PM
>>>>>>> To: dev at dpdk.org
>>>>>>> Cc: Ouyang, Changchun; Xie, Huawei
>>>>>>> Subject: Re: [dpdk-dev] [PATCH] cast used->idx to volatile
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 2015/3/21 16:07, linhaifeng wrote:
>>>>>>>> From: Linhaifeng <haifeng.lin at huawei.com>
>>>>>>>>
>>>>>>>> Same as rte_vhost_enqueue_burst we should cast used->idx
>>>>>>>> to volatile before notify guest.
>>>>>>>>
>>>>>>>> Signed-off-by: Linhaifeng <haifeng.lin at huawei.com>
>>>>>>>> ---
>>>>>>>>  lib/librte_vhost/vhost_rxtx.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> index 535c7a1..8d674d1 100644
>>>>>>>> --- a/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> +++ b/lib/librte_vhost/vhost_rxtx.c
>>>>>>>> @@ -722,7 +722,7 @@ rte_vhost_dequeue_burst(struct virtio_net *dev, uint16_t queue_id,
>>>>>>>>  	}
>>>>>>>>
>>>>>>>>  	rte_compiler_barrier();
>>>>>>>> -	vq->used->idx += entry_success;
>>>>>>>> +	*(volatile uint16_t *)&vq->used->idx += entry_success;
>>>>>> Haifeng:
>>>>>> We have a compiler barrier before it and an external function call after it, so we don't need volatile here.
>>>>>> Did you hit an issue?
>>>>>>
>>>>> The tx_q sometimes stops when we use virtio_net: vhost thinks there are no buffers in the tx_q, while virtio_net
>>>>> thinks vhost hasn't handled all the packets, so we have to restart the VM to recover.
>>>>>
>>>>> The status in VM is:
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246687] net eth7: virtnet_poll
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246690] net eth7: receive_buf
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246693] net eth7: vi->num=239
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246695] net eth7: svq:avail->idx=52939 used->idx=52939 num_free=18 num_added=0 svq->last_used_idx=52820
>>>>> Mar 18 17:11:10 linux-b2ij kernel: [46337.246699] net eth7: rvq:avail->idx=36215 used->idx=35977 num_free=18 num_added=0 rvq->last_used_idx=35977
>>>>> Mar 18 17:11:11 linux-b2ij kernel: [46337.901038] net eth7: dev_queue_xmit, qdisc->flags=4, qdisc->state deactiveed=0
>>>>> Mar 18 17:11:12 linux-b2ij kernel: [46337.901042] net eth7: dev_queue_xmit, txq->state=1, stopped=1
>>>>>
>>>>> Why does the compiler barrier not take effect in our case? Does the compiler barrier depend on the -O3 option? We use -O2.
>>>> The compiler barrier always works, regardless of the optimization level.
>>>> I don't follow your story, but the key question is: did you check the asm code?
>>>> If the function is called from outside as an API, how could the write be optimized away?
>>>> There is only one update to used->idx in that function.
>>> Do you mean rte_vhost_enqueue_burst also doesn't need to cast used->idx to volatile? Why not remove it?
>> I checked the code. It seems we can remove it, but that is a separate issue.
>> As for your issue: you hit a problem and submitted this patch, but I am
>> not convinced this is the root cause. Did you check in the asm code whether
>> the non-volatile write is optimized away?
>>
> I wrote a demo to try to find out the difference between rte_compiler_barrier and volatile.
> It seems rte_compiler_barrier() has no effect.

Haifeng:

I don't think it makes much sense to use volatile on local variables.
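
To illustrate (a minimal sketch, not DPDK code): the locals in the demo
never have their address escape main(), so the compiler is free to keep
them in registers whether or not there is a barrier. A closer test writes
through a pointer to memory the caller can see. Here compiler_barrier() is
my stand-in for rte_compiler_barrier(), assumed to be the usual GCC
memory-clobber idiom, and update_idx() is a hypothetical name:

```c
#include <stdint.h>

/* Stand-in for rte_compiler_barrier(); assumed to be the usual GCC idiom. */
#define compiler_barrier() __asm__ volatile ("" : : : "memory")

/* Hypothetical helper: idx points to memory that the caller can observe. */
uint16_t update_idx(uint16_t *idx, uint16_t n)
{
	*idx += n;           /* store to caller-visible memory */
	compiler_barrier();  /* memory accesses may not be moved across this */
	*idx += n;           /* second update, after the barrier */
	return *idx;
}
```

Built with gcc -O2 -S, I would expect a store to *idx before the barrier
and a reload after it, but the generated asm is what counts, so please
check it rather than rely on my expectation.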

In our rte_vhost_dequeue_burst, there is a single memory write to
used->idx, and there is a compiler barrier to keep the ordering.
Besides, since this is an API, how could that memory write be optimized
into a register access?

Even if you call rte_vhost_dequeue_burst from the same source file, that
is, within the same translation unit, the write is followed by a function
call that has side effects, so it still couldn't be optimized away.
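
The pattern I am describing looks roughly like the sketch below. This is
a simplified illustration and not the actual vhost code: struct used_ring,
publish_used(), notify_fn and record_kick() are hypothetical names, and the
notifier only stands in for the eventfd_write() kick. It also says nothing
about CPU memory ordering, only about what the compiler may do.

```c
#include <stdint.h>

/* Stand-in for rte_compiler_barrier(); assumed to be the usual GCC idiom. */
#define compiler_barrier() __asm__ volatile ("" : : : "memory")

/* Hypothetical, simplified used ring; not the real struct vring_used. */
struct used_ring {
	uint16_t flags;
	uint16_t idx;
};

/* Hypothetical notifier type standing in for the eventfd_write() kick. */
typedef void (*notify_fn)(void *ctx);

/* Publish 'count' entries, then kick the other side. */
void publish_used(struct used_ring *used, uint16_t count,
		  notify_fn notify, void *ctx)
{
	compiler_barrier();  /* earlier ring writes stay before the idx update */
	used->idx += count;  /* the single write to used->idx */
	notify(ctx);         /* opaque call: it may read used->idx from memory */
}

/* Example notifier: records the idx value visible at the time of the kick. */
struct kick_ctx {
	const struct used_ring *used;
	uint16_t seen_idx;
};

void record_kick(void *ctx)
{
	struct kick_ctx *k = ctx;

	k->seen_idx = k->used->idx;
}
```

Because the callee may read used->idx through its own pointer, the
compiler has to have the updated value in memory before making the call.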

Anyway, could we directly check the asm code of rte_vhost_dequeue_burst
to see whether it is optimized?

-huawei
>
> -------->test1: without rte_compiler_barrier and volatile
>
> #include <stdio.h>
> #include <rte_atomic.h>
>
> int main()
> {
>         int i,j;
>
>         *(int*)&i = 2;
>         *(int*)&j = 3;
>         printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>         .file   "test.c"
>         .section        .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>         .string "i=%d j=%d"
>         .text
>         .p2align 4,,15
> .globl main
>         .type   main, @function
> main:
> .LFB571:
>         movl    $3, %edx
>         movl    $2, %esi
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
>         jmp     printf
> .LFE571:
>         .size   main, .-main
> 		
> 		
> -------->test2: use rte_compiler_barrier
> note: the asm code is the same as in test1
> 		
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.c
> #include <stdio.h>
> #include <rte_atomic.h>
>
> int main()
> {
>         int i,j;
>
>         *(int*)&i = 2;
>         rte_compiler_barrier();
>         *(int*)&j = 3;
>         printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>         .file   "test.c"
>         .section        .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>         .string "i=%d j=%d"
>         .text
>         .p2align 4,,15
> .globl main
>         .type   main, @function
> main:
> .LFB571:
>         movl    $3, %edx
>         movl    $2, %esi
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
>         jmp     printf
> .LFE571:
>         .size   main, .-main
>
> 		
> -------->test3: use volatile
> 		
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.c
> #include <stdio.h>
> #include <rte_atomic.h>
>
> int main()
> {
>         int i,j;
>
>         *(volatile int*)&i = 2;
>         *(volatile int*)&j = 3;
>         printf("i=%d j=%d", i, j);
> }
> linux-LOubNs:/mnt/sdc/linhf/test # gcc -S test.c -I /usr/include/dpdk-1.7.0/x86_64-native-linuxapp-gcc/include/ -O3
> linux-LOubNs:/mnt/sdc/linhf/test # cat test.s |grep main -B 10
>         .file   "test.c"
>         .section        .rodata.str1.1,"aMS",@progbits,1
> .LC0:
>         .string "i=%d j=%d"
>         .text
>         .p2align 4,,15
> .globl main
>         .type   main, @function
> main:
> .LFB571:
>         movl    $2, -4(%rsp)
>         movl    $3, -8(%rsp)
>         movl    $.LC0, %edi
>         movl    -8(%rsp), %edx
>         movl    -4(%rsp), %esi
>         xorl    %eax, %eax
>         jmp     printf
> .LFE571:
>         .size   main, .-main
>
>>>>>>>>  	/* Kick guest if required. */
>>>>>>>>  	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
>>>>>>>>  		eventfd_write((int)vq->callfd, 1);
>>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Haifeng
>>>> .
>>>>
>>
>> .
>>
>
>


