[dpdk-dev] [PATCH v3] vhost: support inflight share memory protocol feature

lin li chuckylinchuckylin at gmail.com
Tue Apr 30 10:37:58 CEST 2019


Tiwei Bie <tiwei.bie at intel.com> 于2019年4月29日周一 下午6:55写道:
>
> On Mon, Apr 29, 2019 at 12:07:05PM +0800, lin li wrote:
> > Tiwei Bie <tiwei.bie at intel.com> 于2019年4月28日周日 下午7:18写道:
> > > On Fri, Apr 26, 2019 at 05:40:21AM -0400, Li Lin wrote:
> [...]
> > > > @@ -98,12 +102,26 @@ struct rte_vhost_memory {
> > > >       struct rte_vhost_mem_region regions[];
> > > >  };
> > > >
> > > > +typedef struct VhostUserInflightEntry {
> > > > +     uint8_t inflight;
> > > > +} VhostUserInflightEntry;
> > > > +
> > > > +typedef struct VhostInflightInfo {
> > > > +     uint16_t version;
> > > > +     uint16_t last_inflight_io;
> > > > +     uint16_t used_idx;
> > > > +     VhostUserInflightEntry desc[0];
> > > > +} VhostInflightInfo;
> > >
> > > Is there any details on above structure? Why does it not match
> > > QueueRegionSplit or QueueRegionPacked structures described in
> > > qemu/docs/interop/vhost-user.txt?
> >
> > Qemu have its vhost-user backend,
>
> Do you mean contrib/libvhost-user in QEMU?
>
> > qemu did the submission of IO in it.
>
> What does this mean?
>
> > The implementation of dpdk is more general. It is just to mark inflight entry.
>
> Based on the discussion in below threads:
>
> https://patchwork.kernel.org/patch/10753893/
> https://patchwork.kernel.org/patch/10817735/
>
> IIUC, above structure is the extra buffer allocated by slave to
> store the information of inflight descriptors and share with master
> for persistent. All vhost-user backends' implementation should
> follow the vhost-user protocol (including the format of above
> structure) defined in qemu/docs/interop/vhost-user.txt. Right?

Yes you're right. I've confirmed with QEMU that these data structures
are consistent with qemu.
It will be consistent with QEMU in the next version.

>
> > The submission of inflight entry is handle over to different backends.
> > They have their own ways to handle it, such as spdk.
> > So there are some differences in data structure.
> >
> > >
> [...]
> > > > +static int
> > > > +vhost_user_set_inflight_fd(struct virtio_net **pdev, VhostUserMsg *msg,
> > > > +             int main_fd __rte_unused)
> > > > +{
> > > > +     int fd, i;
> > > > +     uint64_t mmap_size, mmap_offset;
> > > > +     uint16_t num_queues, queue_size;
> > > > +     uint32_t pervq_inflight_size;
> > > > +     void *rc;
> > > > +     struct vhost_virtqueue *vq;
> > > > +     struct virtio_net *dev = *pdev;
> > > > +
> > > > +     fd = msg->fds[0];
> > > > +     if (msg->size != sizeof(msg->payload.inflight) || fd < 0) {
> > > > +             RTE_LOG(ERR, VHOST_CONFIG, "Invalid set_inflight_fd message size is %d,fd is %d\n",
> > > > +                     msg->size, fd);
> > > > +             return -1;
> > >
> > > Ditto.
> > >
> > > > +     }
> > > > +
> > > > +     mmap_size = msg->payload.inflight.mmap_size;
> > > > +     mmap_offset = msg->payload.inflight.mmap_offset;
> > > > +     num_queues = msg->payload.inflight.num_queues;
> > > > +     queue_size = msg->payload.inflight.queue_size;
> > > > +     pervq_inflight_size = get_pervq_shm_size(queue_size);
> > > > +
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd mmap_size: %lu\n", mmap_size);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd mmap_offset: %lu\n", mmap_offset);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd num_queues: %u\n", num_queues);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd queue_size: %u\n", queue_size);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd fd: %d\n", fd);
> > > > +     RTE_LOG(INFO, VHOST_CONFIG,
> > > > +             "set_inflight_fd pervq_inflight_size: %d\n",
> > > > +             pervq_inflight_size);
> > > > +
> > > > +     if (dev->inflight_info.addr)
> > > > +             munmap(dev->inflight_info.addr, dev->inflight_info.size);
> > > > +
> > > > +     rc = mmap(0, mmap_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > > > +                     fd, mmap_offset);
> > >
> > > Why call it rc? Maybe addr is a better name?
> >
> > In some scenarios, shared memory is reallocated or resized by qemu, so
> > again mmap is needed.
>
> In which case the shared memory will be reallocated or resized by QEMU?

QEMU replies that it needs to copy inflight share memory data when
live migration occurs.
If the size of inflight buffer on two machines is not the same, QEMU
will reallocate memory according to the old inflight buffer size.

>
> >
> > >
> > > > +     if (rc == MAP_FAILED) {
> > > > +             RTE_LOG(ERR, VHOST_CONFIG, "failed to mmap share memory.\n");
> > > > +             return -1;
> > >
> > > Should always return RTE_VHOST_MSG_RESULT_* in
> > > message handler.
> > >
> > > > +     }
> > > > +
> > > > +     if (dev->inflight_info.fd)
> > > > +             close(dev->inflight_info.fd);
> > > > +
> > > > +     dev->inflight_info.fd = fd;
> > > > +     dev->inflight_info.addr = rc;
> > > > +     dev->inflight_info.size = mmap_size;
> > > > +
> > > > +     for (i = 0; i < num_queues; i++) {
> > > > +             vq = dev->virtqueue[i];
> > > > +             vq->inflight = (VhostInflightInfo *)rc;
> > > > +             rc = (void *)((char *)rc + pervq_inflight_size);
> > > > +     }
> > > > +
> > > > +     return RTE_VHOST_MSG_RESULT_OK;
> > > > +}
> [...]
> > > > diff --git a/lib/librte_vhost/vhost_user.h b/lib/librte_vhost/vhost_user.h
> > > > index 2a650fe4b..35f969b1b 100644
> > > > --- a/lib/librte_vhost/vhost_user.h
> > > > +++ b/lib/librte_vhost/vhost_user.h
> > > > @@ -23,7 +23,8 @@
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_CRYPTO_SESSION) | \
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_SLAVE_SEND_FD) | \
> > > >                                        (1ULL << VHOST_USER_PROTOCOL_F_HOST_NOTIFIER) | \
> > > > -                                      (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT))
> > > > +                                     (1ULL << VHOST_USER_PROTOCOL_F_PAGEFAULT) | \
> > > > +                                     (1ULL << VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD))
> > >
> > > It will advertise this feature for builtin net and crypto
> > > backends. It's probably not what you intended.
> > >
> >
> > Indeed, this feature is mainly used for spdk-like backends. You mean
> > this function is disabled by default?
>
> External backends should use rte_vhost_driver_set_protocol_features()
> to advertise the protocol features they support.

Thanks,It will be modified in the next version.


More information about the dev mailing list