[dpdk-dev] Vhost user: Increase robustness by kicking guest at ring full

Patrik Andersson R patrik.r.andersson at ericsson.com
Thu Aug 25 08:55:32 CEST 2016


Hi,

During troubleshooting sessions (OVS 2.4.1, DPDK 2.2.0) it was noticed
that some guests trigger the SET_VRING_CALL message rather frequently. The
rate ranges from a few times per minute up to 10 times per second.

From the DPDK log:
...
2016-08-01T19:58:39.829222+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:39.829232+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:251
2016-08-01T19:58:39.829246+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:39.829250+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:215
2016-08-01T19:58:40.778491+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:40.778501+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:251
2016-08-01T19:58:40.778517+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:40.778521+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:215
2016-08-01T19:58:41.813467+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:41.813479+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:251
2016-08-01T19:58:41.813499+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: read message VHOST_USER_SET_VRING_CALL, 1
2016-08-01T19:58:41.813505+09:00 compute-0-6 ovs-vswitchd[140481]: VHOST_CONFIG: vring call idx:0 file:215
...

Note that the ", 1" at the end of the log entries is the file handle index
added in a debug build of DPDK, not part of vanilla DPDK.


At high packet rates this might cause kicking the guest to fail
repeatedly while enqueueing packets, because vq->callfd is not valid
while it is being reconfigured.
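
To illustrate the failure mode: an eventfd_write() on a descriptor that is
not (or no longer) valid returns -1, and the enqueue code quoted in this
mail ignores that return value, so a lost kick goes unnoticed. The helper
below is a minimal standalone sketch on Linux; try_kick() is a hypothetical
name for illustration and not a DPDK function.

```c
#include <errno.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Hypothetical helper (not part of DPDK): attempt one kick and report
 * the outcome instead of silently dropping it.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_kick(int callfd)
{
    if (eventfd_write(callfd, (eventfd_t)1) < 0)
        return -errno;  /* e.g. -EBADF if callfd is not an open descriptor */
    return 0;
}
```

With such a return value the caller could at least count failed kicks, which
would help confirm whether the kicks are really lost around SET_VRING_CALL.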

Sporadically this leads to the virtio ring becoming full. Once it is full,
the enqueue path in DPDK stops kicking the guest. As the guest is
interrupt driven and has not received all kicks, it will not empty the
virtio ring. Possibly a flaw in the guest virtio driver also contributes
to this.

To "solve" this problem, the kick operation in virtio_dev_merge_rx() was
moved out of the pkt_idx > 0 condition, so that the guest is kicked even
when no packets were enqueued. A similar change was made in
virtio_dev_rx().


Original vhost_rxtx.c, virtio_dev_merge_rx():
...
merge_rx_exit:
     if (likely(pkt_idx)) {
         /* flush used->idx update before we read avail->flags. */
         rte_mb();

         /* Kick the guest if necessary. */
         if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
             eventfd_write(vq->callfd, (eventfd_t)1);
     }

     return pkt_idx;
}
...
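
The modified exit path is not quoted here. The following is a simplified
standalone model of the change described above, not the actual patch:
struct vq_model and exit_path_always_kick() are hypothetical names, the
struct only mirrors the two fields used, and __sync_synchronize() stands in
for rte_mb(). Keeping the barrier unconditional is an assumption.

```c
#include <stdint.h>
#include <sys/eventfd.h>

#ifndef VRING_AVAIL_F_NO_INTERRUPT
#define VRING_AVAIL_F_NO_INTERRUPT 1  /* value used by the virtio ring ABI */
#endif

/* Hypothetical stand-in for the virtqueue fields used on the exit path. */
struct vq_model {
    uint16_t avail_flags;  /* models vq->avail->flags */
    int callfd;            /* models vq->callfd */
};

/*
 * Exit path with the kick moved out of the pkt_idx check: the guest is
 * kicked even when nothing was enqueued, for example at ring full.
 * Returns 1 if a kick was written successfully, 0 otherwise.
 */
static int exit_path_always_kick(struct vq_model *vq, uint32_t pkt_idx)
{
    (void)pkt_idx;         /* no longer gates the kick */

    /* Full barrier before reading the flags (stands in for rte_mb()). */
    __sync_synchronize();

    /* Kick the guest unless it asked for interrupts to be suppressed. */
    if (!(vq->avail_flags & VRING_AVAIL_F_NO_INTERRUPT))
        return eventfd_write(vq->callfd, (eventfd_t)1) == 0;

    return 0;
}
```

In this model a call with pkt_idx == 0 still writes to the eventfd, which is
the behaviour the change aims for; the cost is one extra write per enqueue
attempt for as long as the ring stays full.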


Questions

   - Is it a valid operation to change the call/kick file descriptors
     (frequently) during device operation?

   - For stability reasons it seems prudent to me to perform a kick even
     when the virtio ring is full. Since the code checks whether any
     packets were put on the ring at all, could there be a penalty for
     kicking at ring full?

   - Would there be other ways to protect against the call file descriptor
     changing frequently? This assumes that virtio device events in the
     guest will cause the occasional SET_VRING_CALL message as part of
     normal operation.


Any discussion on this topic will be appreciated.


Regards,

Patrik


