[dpdk-dev] [PATCH] vhost: fix connect hang in client mode

Ilya Maximets i.maximets at samsung.com
Thu Jul 21 11:45:32 CEST 2016


On 21.07.2016 12:37, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
>> If something abnormal happened to QEMU, 'connect()' can block calling
>> thread (e.g. main thread of OVS) forever or for a really long time.
>> This can break whole application or block the reconnection thread.
>>
>> Example with OVS:
>>
>> 	ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
>> 	(gdb) bt
>> 	#0  connect () from /lib64/libpthread.so.0
>> 	#1  vhost_user_create_client (vsocket=0xa816e0)
>> 	#2  rte_vhost_driver_register
>> 	#3  netdev_dpdk_vhost_user_construct
>> 	#4  netdev_open (name=0xa664b0 "vhost1")
>> 	[...]
>> 	#11 main
>>
>> Fix that by setting non-blocking mode for client sockets for connection.
>>
>> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
> 
> Thanks for spotting and fixing yet another bug!
> 
>>  
>> +static int
>> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
> 
> I don't quite understand why this is needed: connect() with O_NONBLOCK
> flag set is not enough?

There is a subtle issue with a non-blocking connect() call: connection
establishment may have started, yet the call returns '-1' with
'errno = EINPROGRESS'. In this case we must wait on the fd until it becomes
writable, and then check the actual status of the connection using
getsockopt() with SO_ERROR.

I'm not sure we can actually hit this situation here, but it's documented,
and I think we should handle it.

See 'man connect' for details.

Best regards, Ilya Maximets.

