[dpdk-dev] [PATCH] vhost: fix MQ fails to startup

Yuanhan Liu yuanhan.liu at linux.intel.com
Fri Apr 28 04:25:58 CEST 2017


On Thu, Apr 27, 2017 at 10:52:20AM +0200, Maxime Coquelin wrote:
> 
> 
> On 04/27/2017 10:20 AM, Yuanhan Liu wrote:
> >On Thu, Apr 27, 2017 at 09:56:47AM +0200, Maxime Coquelin wrote:
> >>Hi Zhiyong,
> >>
> >>+Marc-André
> >>
> >>On 04/27/2017 08:34 AM, Zhiyong Yang wrote:
> >>>vhost since dpdk17.02 + qemu2.7 and above will cause failures of
> >>>new connection when negotiating to set MQ. (one queue pair works
> >>>well).Because there exist some bugs in qemu code when introducing
> >>>VHOST_USER_PROTOCOL_F_REPLY_ACK to qemu. when dealing with the vhost
> >>>message VHOST_USER_SET_MEM_TABLE for the second time, qemu indeed
> >>>doesn't send the messge (The message needs to be sent only once)but
> >>>still will be waiting for dpdk's reply ack, then, qemu is always
> >>>freezing. DPDK code works in the right way.
> >>
> >>I'm looking at Qemu's vhost_user_set_mem_table() function, but fail to
> >>see how it could wait for the reply-ack if it didn't send the
> >>VHOST_USER_SET_MEM_TABLE request before.
> >>
> >>>But the feature
> >>>VHOST_USER_PROTOCOL_F_REPLY_ACK has to be disabled by default at the
> >>>dpdk side in order to avoid the feature support of DPDK + qemu at
> >>>the same time. if doing like that, MQ can works well. Once Qemu bugs
> >>>have been fixed and upstreamed, we can enable it.
> >>
> >>The problem is for DPDK to detect whether bug is fixed in Qemu.
> >>Maybe only way would be to have a new protocol feature flag, which is
> >>not really its role.
> >
> >Wouldn't that be an overkill, judging that REPLY_ACK is not a must
> >feature?
> 
> Yes, maybe. But it was introduced to fix (possible) race conditions:
> https://lists.gnu.org/archive/html/qemu-devel/2016-07/msg06173.html

But AFAIK, that commit has been reverted:

    commit 94c9cb31c04737f86be29afefbff401cd23bc24d
    Author: Michael S. Tsirkin <mst at redhat.com>
    Date:   Mon Aug 15 16:35:24 2016 +0300
    
        Revert "vhost-user: Attempt to fix a race with set_mem_table."
    
        This reverts commit 28ed5ef16384f12500abd3647973ee21b03cbe23.
    
        I still think it's the right thing to do, but
        tests have been failing sporadically.
    
        Revert for now, and hope to fix it before the release.

> 
> Note that I planned to use this feature for the device IOTLB
> implementation to let the backend decide whether it wants the IOTLB
> misses synchronous or asynchronous. But I can still change the protocol
> spec to make this behavior specific to this request.

Maybe we could introduce a version message? With that, we could tell
whether the frontend has fixed the known bug or not.

Note that we already has the "version" info in current vhost-user spec.
It's just 2 bits in the message "flag" field though, which is not quite
enough.

	--yliu


More information about the dev mailing list