[dpdk-dev,RFC] vhost: new rte_vhost API proposal

Message ID 1525958573-184361-1-git-send-email-dariuszx.stojaczyk@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Maxime Coquelin
Checks

Context               Check     Description
ci/checkpatch         warning   coding style issues
ci/Intel-compilation  success   Compilation OK

Commit Message

Stojaczyk, DariuszX May 10, 2018, 1:22 p.m. UTC
  rte_vhost has been confirmed not to work with some Virtio devices
(it's not vhost-user spec compliant, see details below) and fixing
it directly would require a large amount of changes that would
completely break backwards compatibility. This library is intended
to smooth out the transition. It exposes a low-level API for
implementing new Virtio drivers/targets. The existing rte_vhost
is about to be refactored to use the rte_virtio library underneath,
and more demanding drivers could then use rte_virtio directly.

rte_virtio would offer both vhost and virtio driver APIs. These two
have a lot of common code for vhost-user handling or PCI access for
initiator/virtio-vhost-user (and possibly vDPA) so there's little
sense to keep target and initiator code separated between different
libs. Of course, the APIs would be separate - only some parts of
the code would be shared.

rte_virtio intends to abstract away most vhost-user/virtio-vhost-user
specifics and to allow developers to implement Virtio targets/drivers
with ease. It calls user-provided callbacks once a proper device
initialization state has been reached, that is - memory mappings
have changed, virtqueues are ready to be processed, features have
changed at runtime, etc.

Compared to rte_vhost, this lib additionally allows the following:
* ability to start/stop particular queues - that's required
by the vhost-user spec. rte_vhost has already been confirmed
not to work with some Virtio devices which do not initialize
some of their management queues.
* most callbacks are now asynchronous - this greatly simplifies
event handling for asynchronous applications and doesn't
make anything harder for synchronous ones.
* this is a low-level API. It doesn't have any vhost-net, nvme
or crypto references. These backend-specific libraries will
later be refactored to use *this* generic library underneath.
This implies that the library doesn't do any virtqueue processing;
it only delivers vring addresses to the user, so the user can
process virtqueues themselves.
* abstracting away PCI/vhost-user.
* the API imposes how public functions can be called and how
internal data can change, so only minimal work is required
to ensure thread-safety. Possibly no mutexes are required at all.
* full Virtio 1.0/vhost-user specification compliance.

This patch only introduces the API. Some additional functions
for vDPA might still be required, but everything present here
so far shouldn't need changing.
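
For illustration, a minimal usage sketch of the proposed target API,
assuming only the types and prototypes declared in rte_virtio.h below;
the socket path, function names and the 0 feature mask are placeholders,
not part of the patch:

static void
my_device_init(struct rte_virtio_dev *vdev)
{
	/* vdev->mem and vdev->features are valid from now on. */
	rte_virtio_tgt_cb_complete(vdev, 0);
}

static void
my_queue_start(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq)
{
	/* Remember vq->desc/avail/used and start processing the ring. */
	rte_virtio_tgt_cb_complete(vdev, 0);
}

static struct rte_virtio_tgt_ops my_tgt_ops = {
	.device_init = my_device_init,
	.queue_start = my_queue_start,
};

int
my_register_target(void)
{
	/* Expose a vhost-user Unix domain socket; 0 means no extra
	 * device features beyond what the library appends itself. */
	return rte_virtio_tgt_register("vhost-user", "/tmp/my_vhost.0",
				       NULL, &my_tgt_ops, 0);
}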

Signed-off-by: Dariusz Stojaczyk <dariuszx.stojaczyk@intel.com>
---
 lib/librte_virtio/rte_virtio.h | 245 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)
 create mode 100644 lib/librte_virtio/rte_virtio.h
  

Comments

Stojaczyk, DariuszX May 11, 2018, 5:55 a.m. UTC | #1
Hi,

> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, May 11, 2018 12:37 AM
> On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > rte_virtio would offer both vhost and virtio driver APIs. These two
> > have a lot of common code for vhost-user handling or PCI access for
> > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > sense to keep target and initiator code separated between different
> > libs. Of course, the APIs would be separate - only some parts of
> > the code would be shared.
> 
> The API below seems to be for vhost backends (aka slaves).  rte_virtio_*
> is a misnomer because vhost and virtio are two different things.  This
> is not for implementing virtio devices, it's specifically for vhost
> devices.

I agree it's named a bit off if we're talking about vhost. My idea was to introduce a generic library for userspace Virtio processing and that's where the name came from. Even when you use just the vhost API that's introduced here, you are only required to implement vring processing, config access, and possibly interrupt handling, all of which are typical Virtio things. The vhost logic is hidden inside.

> 
> Vhost does not offer the full virtio device model - otherwise it would
> be just another transport in the VIRTIO specification.  Instead vhost is
> a protocol for vhost devices, which are subsets of virtio devices.
> 
> I suggest calling it rte_vhost2 since it's basically a new, incompatible
> rte_vhost API.

rte_vhost2 sounds better for what we have now, but would that name still be valid once we add true Virtio driver functionality? (I believe it's called Virtio PMD in DPDK right now.) That driver would reuse a lot of the vhost code for PCI and vhost-user, so it makes some sense to put these two together.

I don't think rte_vhost2 is a permanent name anyway, so maybe we could call it that for now and rename it later once I introduce that additional Virtio functionality? Would that work?

> 
> Also, the initiator/target terminology does not match the vhost-user
> specification.  It uses master/client and slave/backend/server.  Adding
> another pair of words makes things more confusing.  Please stick to the
> words used by the spec.

Ack.

> 
> > +/**
> > + * Device/queue related callbacks, all optional. Provided callback
> > + * parameters are guaranteed not to be NULL until explicitly specified.
> 
> s/until/unless/ ?

Ack.

> > + /**
> > + * Stop processing vq. It shouldn't be accessed after this callback
> > + * completes (via tgt_cb_complete). This can be called prior to
> shutdown
> 
> s/tgt_cb_complete/rte_virtio_tgt_cb_complete/

Ack.

> 
> > + * or before actions that require changing vhost device/vq state.
> > + */
> > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq);
> > + /** Device disconnected. All queues are guaranteed to be stopped by
> now */
> > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > + /**
> > + * Custom message handler. `vdev` and `vq` can be NULL. This is called
> > + * for backend-specific actions. The `id` should be prefixed by the
> 
> Since vdev can be NULL, does this mean custom_msg() may be invoked at
> any time during the lifecycle and even before/after
> device_create()/device_destroy()?

Theoretically. I was thinking of some poorly-written backends notifying that they're out of internal resources, but I agree it's just poor design. I'll remove the `vdev can be NULL` part.

> > + */
> > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq,
> > +   char *id, void *ctx);
> 
> What is the purpose of id and why is it char* instead of const char*?

Ack, it should be const. (The same goes for every other char* in this patch.)

For example, vhost-crypto introduces two new vhost-user messages for initializing and destroying a crypto session. After receiving such a message, the underlying vhost-crypto vhost-user backend could execute this callback as follows:

struct my_crypto_data *data = calloc(1, sizeof(*data));
[...]
ops->custom_msg(vdev, NULL, "crypto_sess_init", data);
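
For context, the matching user-side handler could look roughly like this
(a sketch only; app_crypto_sess_init() and struct my_crypto_data are
made-up application code):

#include <string.h>
#include <errno.h>

static void
my_custom_msg(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
	      char *id, void *ctx)
{
	int rc = -ENOTSUP;

	(void)vq;
	if (strcmp(id, "crypto_sess_init") == 0)
		rc = app_crypto_sess_init(vdev, (struct my_crypto_data *)ctx);

	/* custom_msg is asynchronous - completion must be reported
	 * explicitly so the library can send the vhost-user reply. */
	rte_virtio_tgt_cb_complete(vdev, rc);
}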

> 
> Is ctx the "message"?  If ctx is untrusted message data from an external
> process, why is there no size argument?  Who validates the message size?

Ack. Will add size parameter.

> 
> > +
> > + /**
> > + * Interrupt handler, synchronous. If this callback is set to NULL,
> > + * rte_virtio will hint the initiators not to send any interrupts.
> > + */
> > + void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq
> *vq);
> 
> Devices often have multiple types of queues.  Some of them may be
> suitable for polling, others may be suitable for an interrupt-driven
> model.  Is there a way to enable/disable interrupts for specific queues?

Thanks, I didn't think of that. I'll need to move the responsibility of setting vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT to the user.
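
A rough sketch of what that per-queue control could look like on the user
side once the ring flags become the user's responsibility (function names
are illustrative; per the Virtio spec the device side sets
VRING_USED_F_NO_NOTIFY to ask the driver not to kick a polled queue, and
checks VRING_AVAIL_F_NO_INTERRUPT, set by the driver, before interrupting):

#include <linux/virtio_ring.h>
#include <stdbool.h>

static void
my_vq_set_polled(struct rte_virtio_vq *vq, bool polled)
{
	if (polled)
		vq->used->flags |= VRING_USED_F_NO_NOTIFY;   /* stop kicks */
	else
		vq->used->flags &= ~VRING_USED_F_NO_NOTIFY;  /* allow kicks */
}

static void
my_vq_notify(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq)
{
	/* Honor the driver's interrupt suppression hint. */
	if (!(vq->avail->flags & VRING_AVAIL_F_NO_INTERRUPT))
		rte_virtio_dev_call(vdev, vq);
}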

> 
> > + /** Device config read, synchronous. */
> 
> What is the meaning of the return value?

How about the following:
\return 0 if `config` has been successfully set, -1 otherwise.

An error (-1) is propagated all the way to the master, which can handle it its own way.
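
For instance, a sketch under the return semantics proposed above
(struct my_net_config and my_netdev_mac() are made-up):

#include <stdint.h>
#include <string.h>

struct my_net_config {
	uint8_t mac[6];
	uint16_t status;
} __attribute__((packed));

static int
my_get_config(struct rte_virtio_dev *vdev, uint8_t *config, uint32_t config_len)
{
	struct my_net_config cfg = { .status = 1 /* link up */ };

	if (config_len < sizeof(cfg))
		return -1; /* error is propagated back to the master */

	memcpy(cfg.mac, my_netdev_mac(vdev), sizeof(cfg.mac));
	memcpy(config, &cfg, sizeof(cfg));
	return 0; /* `config` has been successfully set */
}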

> 
> > + int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config,
> > +  uint32_t config_len);
> > + /** Device config changed by the driver, synchronous. */
> 
> What is the meaning of the return value?
> 
> What is the meaning of the flags?

Good call. I actually can't find any doc/usage of this API.
Changpeng (the original get/set_config author, +CC'ed), could you document this function briefly here?

> 
> > + int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config,
> > +  uint32_t offset, uint32_t len, uint32_t flags);
> > +};
> > +
> > +/**
> > + * Registers a new vhost target accepting remote connections. Multiple
> > + * available transports are available. It is possible to create a Vhost-
> user
> > + * Unix domain socket polling local connections or connect to a
> physical
> > + * Virtio device and install an interrupt handler .
> > + * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost-
> user",
> > + * "PCI-vDPA", "vhost-user".
> > + * \param trid identifier of the device. For PCI this would be the BDF
> address,
> > + * for vhost-user the socket name.
> > + * \param trctx additional data for the specified transport. Can be
> NULL.
> > + * \param tgt_ops callbacks to be called upon reaching specific
> initialization
> > + * states.
> > + * \param features supported Virtio features. To be negotiated with
> the
> > + * driver ones. rte_virtio will append a couple of generic feature bits
> > + * which are required by the Virtio spec. TODO list these features here
> > + * \return 0 on success, negative errno otherwise
> > + */
> > +int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx,
> > +   struct rte_virtio_tgt_ops *tgt_ops,
> > +   uint64_t features);
> > +
> > +/**
> > + * Finish async device tgt ops callback. Unless a tgt op has been
> documented
> > + * as 'synchronous' this function must be called at the end of the op
> handler.
> > + * It can be called either before or after the op handler returns.
> rte_virtio
> > + * won't call any callbacks while another one hasn't been finished yet.
> > + * \param vdev vhost device
> > + * \param rc 0 on success, negative errno otherwise.
> > + */
> > +int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc);
> 
> How can this function fail and how is the caller supposed to handle
> failure?

If -1 is returned, the current callback will be perceived as failed. So if `device_create` completes with rc != 0, the lib will tear down the device and no subsequent `device_destroy` will be called. A similar thing goes for queues - if a queue failed to start, it won't need to be stopped. Since you pointed it out, I'll mention it somewhere in the doc. I didn't do so in the first place because it's analogous to how rte_vhost works now.
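
To illustrate the intended flow - a sketch where my_start_rings_async()
and its completion callback are made-up application code:

static void
my_start_done(void *arg, int rc)
{
	struct rte_virtio_dev *vdev = arg;

	/* rc != 0 marks the callback as failed - per the above, the queue
	 * won't need to be stopped and no queue_stop will follow for it. */
	rte_virtio_tgt_cb_complete(vdev, rc);
}

static void
my_queue_start(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq)
{
	/* It's fine to return before completion is reported; the library
	 * won't invoke another callback until cb_complete is called. */
	my_start_rings_async(vdev, vq, my_start_done, vdev);
}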

> 
> Are there any thread-safety rules regarding this API?  Can it be called
> from a different thread than the callback?

Yes. I should mention it.

> 
> > +
> > +/**
> > + * Unregisters a vhost target asynchronously.
> 
> How are existing device instances affected?

Ack. How about:
All active queues will be stopped and all devices destroyed.

This is analogous to what rte_vhost has now.

> 
> > + * \param cb_fn callback to be called on finish
> > + * \param cb_arg argument for \c cb_fn
> > + */
> > +void rte_virtio_tgt_unregister(char *trid,
> 
> One of the rte_vhost API limitations is that the ID namespace is shared
> between transports.  The same seems to be the case here.
> 
> It assumes that "PCI", "PCI-vhost-user", "PCI-vDPA", and "vhost-user"
> are never instantiated with the same trid.  UNIX domain sockets can have
> arbitrary filenames (that resemble a PCI BDF).  And who knows what
> other
> transports will be added in the future.
> 
> I think namespace collisions could be a problem.

Ack, I'll add `char *trtype` param to the unregister func.

> 
> > +      void (*cb_fn)(void *arg), void *cb_arg);
> 
> Which thread is the callback invoked from?

It'll be called from the same thread that calls the rte_virtio_tgt_ops callbacks. I'll mention it in the doc, thanks.

rte_virtio_tgt_unregister should also return an error code for the case where a device with the given trtype/trid can't be found. That will prevent some implementations from waiting endlessly for cb_fn to be called.
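
Something along these lines - note this sketch assumes the proposed
changes (a `trtype` argument and an int return value), which are not in
the patch below:

#include <semaphore.h>

static sem_t unreg_sem;

static void
my_unreg_done(void *arg)
{
	(void)arg;
	sem_post(&unreg_sem);
}

static void
my_shutdown(void)
{
	sem_init(&unreg_sem, 0, 0);
	if (rte_virtio_tgt_unregister("vhost-user", "/tmp/my_vhost.0",
				      my_unreg_done, NULL) != 0)
		return; /* no such target - cb_fn will never be called */
	sem_wait(&unreg_sem);
	sem_destroy(&unreg_sem);
}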

Thanks,
D.
  
Stojaczyk, DariuszX May 18, 2018, 7:51 a.m. UTC | #2
> -----Original Message-----
> From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> Sent: Friday, May 11, 2018 6:06 PM
> On Fri, May 11, 2018 at 05:55:45AM +0000, Stojaczyk, DariuszX wrote:
> > > -----Original Message-----
> > > From: Stefan Hajnoczi [mailto:stefanha@redhat.com]
> > > Sent: Friday, May 11, 2018 12:37 AM
> > > On Thu, May 10, 2018 at 03:22:53PM +0200, Dariusz Stojaczyk wrote:
> > > > rte_virtio would offer both vhost and virtio driver APIs. These two
> > > > have a lot of common code for vhost-user handling or PCI access for
> > > > initiator/virtio-vhost-user (and possibly vDPA) so there's little
> > > > sense to keep target and initiator code separated between different
> > > > libs. Of course, the APIs would be separate - only some parts of
> > > > the code would be shared.
> > >
> > > The API below seems to be for vhost backends (aka slaves).
> rte_virtio_*
> > > is a misnomer because vhost and virtio are two different things.  This
> > > is not for implementing virtio devices, it's specifically for vhost
> > > devices.
> >
> > I agree it's named a bit off if we're talking about vhost. My idea was to
> introduce a generic library for userspace Virtio processing and that's
> where the name came from. Even when you use just the vhost API that's
> introduced here, you are only required to implement vring processing,
> config access, and possibly interrupt handling, all of which are typical
> Virtio things. The vhost logic is hidden inside.
> 
> No, the vhost logic is not hidden: there is custom_msg() and the whole
> tgt_ops struct is an abstraction of the vhost protocol, not virtio.
> 
> It sounds like you're hoping to create a single API that can support
> both vhost and virtio access.  For example, one "net" device backend
> implementation using rte_virtio can be accessed via vhost or virtio.
> 
> This won't work because vhost and virtio are not equivalent.  vhost-net
> devices don't implement the virtio-net config space and they only have a
> subset of the virtqueues.  vhost-net devices support special vhost
> messages that don't exist in virtio-net.
> 
> Additionally, the virtio and vhost-user specifications are independent
> and make no promise of a 1:1 mapping.  They have the freedom to
> change
> in ways which will break any abstraction you come up with today.
> 
> I hope it will be possible to unify the two in the future, but that
> needs to happen at the spec level first, before trying to unify them in
> code.
> 
> This is why I'm belaboring the point that vhost should not be confused
> with virtio.  Each needs to be separate and clearly identified to avoid
> confusion.
> 	

Ok, I'm convinced now. Thanks for the explanation. I'll name the lib rte_vhost2 in v2.


> >
> > >
> > > Vhost does not offer the full virtio device model - otherwise it would
> > > be just another transport in the VIRTIO specification.  Instead vhost is
> > > a protocol for vhost devices, which are subsets of virtio devices.
> > >
> > > I suggest calling it rte_vhost2 since it's basically a new, incompatible
> > > rte_vhost API.
> >
> > Rte_vhost2 sounds better for what we have now, but would that name
> be still valid once we add a true Virtio driver functionality? (I believe it's
> called Virtio PMD in DPDK right now). That driver would reuse a lot of the
> vhost code for PCI and vhost-user, so it makes some sense to put these
> two together.
> >
> > I don't think rte_vhost2 is a permanent name anyway, so maybe we
> could call it like so for now, and rename it later once I introduce that
> additional Virtio functionality? Would that work?
> 
> The natural layering is that vhost depends on virtio.  Virtio header
> files (feature bits, config space layout, vring layout) and the vring
> API can be reused by vhost.
> 
> Vhost doesn't need knowledge of virtio though and the two can be in
> separate packages without code duplication.
> 
> That said, it doesn't really matter whether there are rte_virtio +
> rte_vhost2 packages or a single rte_virtio package, as long as the
> function and struct names for vhost interfaces contain the name "vhost"
> so they cannot be confused with virtio.
> 
> > > > + * or before actions that require changing vhost device/vq state.
> > > > + */
> > > > + void (*queue_stop)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq);
> > > > + /** Device disconnected. All queues are guaranteed to be stopped
> by
> > > now */
> > > > + void (*device_destroy)(struct rte_virtio_dev *vdev);
> > > > + /**
> > > > + * Custom message handler. `vdev` and `vq` can be NULL. This is
> called
> > > > + * for backend-specific actions. The `id` should be prefixed by the
> > >
> > > Since vdev can be NULL, does this mean custom_msg() may be invoked
> at
> > > any time during the lifecycle and even before/after
> > > device_create()/device_destroy()?
> >
> > Theoretically. I was thinking of some poorly-written backends notifying
> they're out of internal resources, but I agree it's just poor. I'll remove the
> `vdev can be NULL` part.
> 
> Okay, I wasn't suggesting it's bad, I just wanted the docs to state at
> which points in the lifecycle this callback can be invoked.
> 
> > > > + */
> > > > + void (*custom_msg)(struct rte_virtio_dev *vdev, struct
> rte_virtio_vq
> > > *vq,
> > > > +   char *id, void *ctx);
> > >
> > > What is the purpose of id and why is it char* instead of const char*?
> >
> > Ack, It should be const. (same thing goes to every other char* in this
> patch)
> >
> > For example vhost-crypto introduces two new vhost-user messages for
> initializing and destroying crypto session. The underlying vhost-crypto
> vhost-user backend after receiving such message could execute this
> callback as follows:
> >
> > struct my_crypto_data *data = calloc();
> > [...]
> > Ops->custom_msg(vdev, NULL, "crypto_sess_init", data);
> 
> So it's necessary to modify rte_virtio vhost code when implementing new
> device backends with custom messages?
> 
> It seems like rte_virtio needs to have knowledge of how to parse any
> custom messages :(.  It would be cleaner for rte_virtio to have no
> knowledge of device-specific messages.
> 
> And how does the device backend reply to custom messages?
>

The library would send the proper response after rte_virtio_tgt_cb_complete() is called. If it needs additional data from the user, there's the `ctx` field in custom_msg that the user can write into.

However, I started to work on the implementation and came to the conclusion that it's unnecessarily difficult to implement new Vhost device backends this way. I've changed the custom_msg callback to parse raw Vhost-user messages now. Still, new Vhost-user messages are usually a type of protocol extension negotiated by a protocol feature flag, and protocol extensions should be implemented inside the lib in my opinion. If a protocol extension changes an existing message rather than introducing a new one, we'll *need* to implement it inside the lib.

Both solutions have their good and bad points.
I'm sending v2 in a couple minutes, maybe it'll help us decide which one is better.

> > > > +/**
> > > > + * Unregisters a vhost target asynchronously.
> > >
> > > How are existing device instances affected?
> >
> > Ack. How about:
> > All active queues will be stopped and all devices destroyed.
> >
> > This is analogous to what rte_vhost has now.
> 
> Sounds good.
> 
> Stefan

Regards,
D.
  

Patch

diff --git a/lib/librte_virtio/rte_virtio.h b/lib/librte_virtio/rte_virtio.h
new file mode 100644
index 0000000..0203d5e
--- /dev/null
+++ b/lib/librte_virtio/rte_virtio.h
@@ -0,0 +1,245 @@ 
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright (c) Intel Corporation.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/vhost.h>
+
+/** Single memory region. Both physically and virtually contiguous */
+struct rte_virtio_mem_region {
+	uint64_t guest_phys_addr;
+	uint64_t guest_user_addr;
+	uint64_t host_user_addr;
+	uint64_t size;
+	void *mmap_addr;
+	uint64_t mmap_size;
+	int fd;
+};
+
+struct rte_virtio_memory {
+	uint32_t nregions;
+	struct rte_virtio_mem_region regions[];
+};
+
+/**
+ * Vhost device created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks. This is only a part of the real
+ * vhost device data. This struct is published just for inline vdev
+ * functions to access their data directly.
+ */
+struct rte_virtio_dev {
+	struct rte_virtio_memory *mem;
+	uint64_t features;
+};
+
+/**
+ * Virtqueue created and managed by rte_virtio. Accessible via
+ * \c rte_virtio_tgt_ops callbacks.
+ */
+struct rte_virtio_vq {
+	struct vring_desc *desc;
+	struct vring_avail *avail;
+	struct vring_used *used;
+	/* available only if F_LOG_ALL has been negotiated */
+	void *log;
+	uint16_t size;
+};
+
+/**
+ * Device/queue related callbacks, all optional. Provided callback
+ * parameters are guaranteed not to be NULL until explicitly specified.
+ */
+struct rte_virtio_tgt_ops {
+	/** New initiator connected. */
+	void (*device_create)(struct rte_virtio_dev *vdev);
+	/**
+	 * Device is ready to operate. vdev->mem is now available.
+	 * This callback may be called multiple times as memory mappings
+	 * can change dynamically. All queues are guaranteed to be stopped
+	 * by now.
+	 */
+	void (*device_init)(struct rte_virtio_dev *vdev);
+	/**
+	 * Features have changed in runtime. Queues might be still running
+	 * at this point.
+	 */
+	void (*device_features_changed)(struct rte_virtio_dev *vdev);
+	/**
+	 * Start processing vq. The `vq` is guaranteed not to be modified before
+	 * `queue_stop` is called.
+	 */
+	void (*queue_start)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/**
+	 * Stop processing vq. It shouldn't be accessed after this callback
+	 * completes (via tgt_cb_complete). This can be called prior to shutdown
+	 * or before actions that require changing vhost device/vq state.
+	 */
+	void (*queue_stop)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/** Device disconnected. All queues are guaranteed to be stopped by now */
+	void (*device_destroy)(struct rte_virtio_dev *vdev);
+	/**
+	 * Custom message handler. `vdev` and `vq` can be NULL. This is called
+	 * for backend-specific actions. The `id` should be prefixed by the
+	 * backend name (net/crypto/scsi) and `ctx` is message-specific data
+	 * that should be available until tgt_cb_complete is called.
+	 */
+	void (*custom_msg)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+			   char *id, void *ctx);
+
+	/**
+	 * Interrupt handler, synchronous. If this callback is set to NULL,
+	 * rte_virtio will hint the initiators not to send any interrupts.
+	 */
+	void (*queue_kick)(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+	/** Device config read, synchronous. */
+	int (*get_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+			  uint32_t config_len);
+	/** Device config changed by the driver, synchronous. */
+	int (*set_config)(struct rte_virtio_dev *vdev, uint8_t *config,
+			  uint32_t offset, uint32_t len, uint32_t flags);
+};
+
+/**
+ * Registers a new vhost target accepting remote connections. Multiple
+ * transports are available. It is possible to create a Vhost-user
+ * Unix domain socket polling local connections or to connect to a physical
+ * Virtio device and install an interrupt handler.
+ * \param trtype type of the transport used, e.g. "PCI", "PCI-vhost-user",
+ * "PCI-vDPA", "vhost-user".
+ * \param trid identifier of the device. For PCI this would be the BDF address,
+ * for vhost-user the socket name.
+ * \param trctx additional data for the specified transport. Can be NULL.
+ * \param tgt_ops callbacks to be called upon reaching specific initialization
+ * states.
+ * \param features supported Virtio features. To be negotiated with the
+ * driver ones. rte_virtio will append a couple of generic feature bits
+ * which are required by the Virtio spec. TODO list these features here
+ * \return 0 on success, negative errno otherwise
+ */
+int rte_virtio_tgt_register(char *trtype, char *trid, void *trctx,
+   struct rte_virtio_tgt_ops *tgt_ops,
+   uint64_t features);
+
+/**
+ * Finish async device tgt ops callback. Unless a tgt op has been documented
+ * as 'synchronous' this function must be called at the end of the op handler.
+ * It can be called either before or after the op handler returns. rte_virtio
+ * won't call any callbacks while another one hasn't been finished yet.
+ * \param vdev vhost device
+ * \param rc 0 on success, negative errno otherwise.
+ */
+int rte_virtio_tgt_cb_complete(struct rte_virtio_dev *vdev, int rc);
+
+/**
+ * Unregisters a vhost target asynchronously.
+ * \param cb_fn callback to be called on finish
+ * \param cb_arg argument for \c cb_fn
+ */
+void rte_virtio_tgt_unregister(char *trid,
+      void (*cb_fn)(void *arg), void *cb_arg);
+
+/**
+ * Bypass F_IOMMU_PLATFORM and translate gpa directly.
+ * \param mem vhost device memory
+ * \param gpa guest physical address
+ * \param len length of the memory to translate (in bytes). If the requested
+ * memory chunk crosses a memory region boundary, *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another gpa_to_vva(gpa + *len).
+ * \return vhost virtual address or NULL if requested `gpa` is not mapped.
+ */
+static inline void *
+rte_virtio_gpa_to_vva(struct rte_virtio_memory *mem, uint64_t gpa, uint64_t *len)
+{
+	struct rte_virtio_mem_region *r;
+	uint32_t i;
+
+	for (i = 0; i < mem->nregions; i++) {
+		r = &mem->regions[i];
+		if (gpa >= r->guest_phys_addr &&
+		    gpa < r->guest_phys_addr + r->size) {
+
+			if (unlikely(*len > r->guest_phys_addr + r->size - gpa))
+				*len = r->guest_phys_addr + r->size - gpa;
+
+			/* arithmetic is on integers; cast to return a pointer */
+			return (void *)(uintptr_t)(gpa - r->guest_phys_addr +
+						   r->host_user_addr);
+		}
+	}
+	*len = 0;
+
+	return NULL;
+}
+
+/**
+ * Translate I/O virtual address to vhost address space.
+ * If F_IOMMU_PLATFORM has been negotiated, this might potentially
+ * send a TLB miss and wait for the TLB update response.
+ * If F_IOMMU_PLATFORM has not been negotiated, `iova` is
+ * a physical address and `perm` is ignored.
+ * \param vdev vhost device
+ * \param iova I/O virtual address
+ * \param len length of the memory to translate (in bytes). If the requested
+ * memory chunk crosses a memory region boundary, *len will be set to
+ * the remaining, maximum length of virtually contiguous memory. In such
+ * case the user will be required to call another iova_to_vva(iova + *len).
+ * \param perm VHOST_ACCESS_RO, VHOST_ACCESS_WO or VHOST_ACCESS_RW
+ * \return vhost virtual address or NULL if requested `iova` is not mapped
+ * or the `perm` doesn't match.
+ */
+static inline void *
+rte_virtio_iova_to_vva(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq,
+      uint64_t iova, uint32_t *len, uint8_t perm)
+{
+	void *__vhost_iova_to_vva(struct virtio_net *dev, struct vhost_virtqueue *vq,
+				  uint64_t iova, uint64_t size, uint8_t perm);
+
+	if (!(vdev->features & (1ULL << VIRTIO_F_IOMMU_PLATFORM)))
+		return rte_virtio_gpa_to_vva(vdev->mem, iova, len);
+
+	return __vhost_iova_to_vva(vdev, vq, iova, len, perm);
+}
+
+/**
+ * Notify the driver about vq change. This is an eventfd_write for vhost-user
+ * or MMIO write for PCI devices.
+ */
+void rte_virtio_dev_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+
+/**
+ * Notify the driver about device config change. This will result in \c
+ * rte_virtio_tgt_ops->get_config being called. This is an eventfd_write
+ * for vhost-user or MMIO write for PCI devices
+ */
+void rte_virtio_dev_cfg_call(struct rte_virtio_dev *vdev, struct rte_virtio_vq *vq);
+