[dpdk-dev] [PATCH] vhost: fix connect hang in client mode
Ilya Maximets
i.maximets at samsung.com
Thu Jul 21 11:45:32 CEST 2016
On 21.07.2016 12:37, Yuanhan Liu wrote:
> On Thu, Jul 21, 2016 at 11:21:15AM +0300, Ilya Maximets wrote:
>> If something abnormal happened to QEMU, 'connect()' can block calling
>> thread (e.g. main thread of OVS) forever or for a really long time.
>> This can break whole application or block the reconnection thread.
>>
>> Example with OVS:
>>
>> ovs_rcu(urcu2)|WARN|blocked 512000 ms waiting for main to quiesce
>> (gdb) bt
>> #0 connect () from /lib64/libpthread.so.0
>> #1 vhost_user_create_client (vsocket=0xa816e0)
>> #2 rte_vhost_driver_register
>> #3 netdev_dpdk_vhost_user_construct
>> #4 netdev_open (name=0xa664b0 "vhost1")
>> [...]
>> #11 main
>>
>> Fix that by setting non-blocking mode for client sockets for connection.
>>
>> Fixes: 64ab701c3d1e ("vhost: add vhost-user client mode")
>
> Thanks for spotting and fixing yet another bug!
>
>>
>> +static int
>> +vhost_user_connect_nonblock(int fd, struct sockaddr *un, size_t sz)
>
> I don't quite understand why this is needed: connect() with O_NONBLOCK
> flag set is not enough?
There is a subtlety with the non-blocking connect() call. Connection
establishment may have started, yet connect() returns '-1' with
'errno = EINPROGRESS'. In that case we must wait on the fd until it
becomes writable, and then check the resulting connection status using
getsockopt() (SO_ERROR). I'm not sure we can actually hit this situation
here, but it's documented and, I think, we should handle it.
See 'man connect' for details.
Best regards, Ilya Maximets.