[dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

Jerin Jacob jerinjacobk at gmail.com
Tue Jun 22 19:25:24 CEST 2021


On Fri, Jun 18, 2021 at 3:11 PM fengchengwen <fengchengwen at huawei.com> wrote:
>
> On 2021/6/18 13:52, Jerin Jacob wrote:
> > On Thu, Jun 17, 2021 at 2:46 PM Bruce Richardson
> > <bruce.richardson at intel.com> wrote:
> >>
> >> On Wed, Jun 16, 2021 at 08:07:26PM +0530, Jerin Jacob wrote:
> >>> On Wed, Jun 16, 2021 at 3:47 PM fengchengwen <fengchengwen at huawei.com> wrote:
> >>>>
> >>>> On 2021/6/16 15:09, Morten Brørup wrote:
> >>>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson
> >>>>>> Sent: Tuesday, 15 June 2021 18.39
> >>>>>>
> >>>>>> On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
> >>>>>>> This patch introduces 'dmadevice' which is a generic type of DMA
> >>>>>>> device.
> >>>>>>>
> >>>>>>> The APIs of the dmadev library expose some generic operations which
> >>>>>>> enable configuration and I/O with the DMA devices.
> >>>>>>>
> >>>>>>> Signed-off-by: Chengwen Feng <fengchengwen at huawei.com>
> >>>>>>> ---
> >>>>>> Thanks for sending this.
> >>>>>>
> >>>>>> Of most interest to me right now are the key data-plane APIs. While we
> >>>>>> are
> >>>>>> still in the prototyping phase, below is a draft of what we are
> >>>>>> thinking
> >>>>>> for the key enqueue/perform_ops/completed_ops APIs.
> >>>>>>
> >>>>>> Some key differences I note in below vs your original RFC:
> >>>>>> * Use of void pointers rather than iova addresses. While using iovas
> >>>>>>   makes sense in the general case when using hardware, in that it can
> >>>>>>   work with both physical and virtual addresses, if we change the APIs
> >>>>>>   to use void pointers instead it will still work for DPDK in VA mode,
> >>>>>>   while at the same time allowing software fallbacks in error cases,
> >>>>>>   and also a stub driver that uses memcpy in the background. Finally,
> >>>>>>   using iovas makes the APIs a lot more awkward to use with anything
> >>>>>>   but mbufs or similar buffers where we already have a pre-computed
> >>>>>>   physical address.
> >>>>>> * Use of id values rather than user-provided handles. Allowing the
> >>>>>>   user/app to manage the amount of data stored per operation is a
> >>>>>>   better solution, I feel, than prescribing a certain amount of
> >>>>>>   in-driver tracking. Some apps may not care about anything other
> >>>>>>   than a job being completed, while other apps may have significant
> >>>>>>   metadata to be tracked. Taking the user-context handles out of the
> >>>>>>   API also makes the driver code simpler.
> >>>>>> * I've kept a single combined API for completions, which differs from
> >>>>>>   the separate error-handling completion API you propose. I need to
> >>>>>>   give the two-function approach a bit of thought, but likely both
> >>>>>>   could work. If we (likely) never expect failed ops, then the
> >>>>>>   specifics of error handling should not matter that much.
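
As a sketch only: the enqueue/perform_ops/completed_ops shape described in
the three points above might look roughly like the prototypes below. The
function names come from the draft mentioned above; the exact signatures
and return conventions here are assumptions for illustration, not the
actual draft.

    #include <stdbool.h>
    #include <stdint.h>

    /* Enqueue a copy; returns a monotonically increasing job id (>= 0) on
     * success, or a negative error code. The id is the only handle the
     * application gets back -- any per-job metadata is tracked by the
     * application itself, e.g. in a ring indexed by id. */
    int rte_dmadev_enqueue_copy(uint16_t dev_id, void *src, void *dst,
                                unsigned int length);

    /* Trigger the hardware to start processing the enqueued jobs. */
    int rte_dmadev_perform_ops(uint16_t dev_id);

    /* Single combined completion call: returns the number of completed
     * jobs, writes the id of the most recently completed job, and flags
     * whether any of them failed. */
    int rte_dmadev_completed_ops(uint16_t dev_id, uint16_t max_ops,
                                 uint64_t *last_completed_id,
                                 bool *has_error);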
> >>>>>>
> >>>>>> For the rest, the control / setup APIs are likely to be rather
> >>>>>> uncontroversial, I suspect. However, I think that rather than xstats
> >>>>>> APIs, the library should first provide a set of standardized stats
> >>>>>> like ethdev does. If driver-specific stats are needed, we can add
> >>>>>> xstats later to the API.
> >>>>>>
> >>>>>> Appreciate your further thoughts on this, thanks.
> >>>>>>
> >>>>>> Regards,
> >>>>>> /Bruce
> >>>>>
> >>>>> I generally agree with Bruce's points above.
> >>>>>
> >>>>> I would like to share a couple of ideas for further discussion:
> >>>
> >>>
> >>> I believe some of the other requirements and comments for generic DMA will be
> >>>
> >>> 1) Support for the _channel_. Each channel may have different
> >>> capabilities and functionality.
> >>> Typical cases are where each channel has a separate source and
> >>> destination device, like DMA between PCIe EP and host memory, host
> >>> memory and host memory, or PCIe EP and PCIe EP.
> >>> So we need some notion of a channel in the specification.
> >>>
> >>
> >> Can you share a bit more detail on what constitutes a channel in this case?
> >> Is it equivalent to a device queue (which we are flattening to individual
> >> devices in this API), or to a specific configuration on a queue?
> >
> > It is not a queue. It is one of the attributes of the transfer.
> > I.e. in the same queue, a given transfer can specify different "source"
> > and "destination" devices, like CPU to sound card, CPU to network card,
> > etc.
> >
> >
> >>
> >>> 2) I assume the current data-plane APIs are not thread-safe. Is that right?
> >>>
> >> Yes.
> >>
> >>>
> >>> 3) The cookie scheme outlined earlier looks good to me, rather than
> >>> having a generic dequeue() API.
> >>>
> >>> 4) We can split rte_dmadev_enqueue_copy(uint16_t dev_id, void *src,
> >>> void *dst, unsigned int length); into a two-stage API, where one stage
> >>> will be used in the fast path and the other in the slow path.
> >>>
> >>> - The slow-path API will take the channel and the other fixed
> >>> attributes of the transfer.
> >>>
> >>> Example syntax will be:
> >>>
> >>> struct rte_dmadev_desc {
> >>>         uint16_t channel_id;
> >>>         uint8_t op;      /* copy, xor, fill, etc. */
> >>>         /* other arguments specific to the DMA transfer; they can be
> >>>          * set based on capability */
> >>> };
> >>>
> >>> rte_dmadev_desc_t rte_dmadev_prepare(uint16_t dev_id,
> >>>                                      struct rte_dmadev_desc *desc);
> >>>
> >>> - The fast path takes the arguments that change per transfer, along
> >>> with the slow-path handle:
> >>>
> >>> rte_dmadev_enqueue(uint16_t dev_id, void *src, void *dst,
> >>>                    unsigned int length, rte_dmadev_desc_t desc);
> >>>
> >>> This will help the driver to:
> >>> - in the former API, form the device-specific descriptors in the slow
> >>> path for a given channel and the fixed per-transfer attributes;
> >>> - in the latter API, blend the "variable" arguments such as src and
> >>> dst addresses with the descriptors created in the slow path.
> >>>
> >>
> >> This seems like an API for a context-aware device, where the channel is the
> >> config data/context that is preserved across operations - is that correct?
> >> At least from the Intel DMA accelerators side, we have no concept of this
> >> context, and each operation is completely self-described. The location or
> >> type of memory for copies is irrelevant, you just pass the src/dst
> >> addresses to reference.
> >
> > It is not a context-aware device. Each HW job is self-described.
> > You can view it as different attributes of the transfer.
> >
> >
> >>
> >>> The above will give better performance and is the best trade-off
> >>> between performance and per-transfer variables.
> >>
> >> We may need to have different APIs for context-aware and context-unaware
> >> processing, with which one to use determined by capability discovery.
> >> Given that for these DMA devices the offload cost is critical, more so
> >> than for any other dev class I've looked at before, I'd like to avoid
> >> having APIs with extra parameters that need to be passed about, since
> >> that just adds extra CPU cycles to the offload.
> >
> > If the driver does not support additional attributes and/or the
> > application does not need them, rte_dmadev_desc_t can be NULL, so that
> > it won't have any cost in the datapath. I think we can go to separate
> > API cases only if we cannot abstract the problem without a performance
> > impact. Otherwise, it will be too much pain for applications.
>
> Yes, currently we plan to use different APIs for different cases, e.g.
>   rte_dmadev_memcpy()    -- local-to-local memory copy
>   rte_dmadev_memset()    -- fill local memory with a pattern
> maybe:
>   rte_dmadev_imm_data()  -- copy of very small (immediate) data
>   rte_dmadev_p2pcopy()   -- peer-to-peer copy between different PCIe addresses
>
> These API capabilities will be reflected in the device capability set so
> that the application can discover them through a standard API.
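
As a sketch only, such per-case calls and their capability discovery might
be declared roughly as below; the function names come from the list above,
while the capability flag names and exact prototypes are assumptions.

    #include <stdint.h>

    /* Hypothetical capability bits reported by the device, so the
     * application can discover which of the optional calls are usable. */
    #define RTE_DMA_DEV_CAPA_MEM_COPY  (1ULL << 0) /* local-to-local copy */
    #define RTE_DMA_DEV_CAPA_MEM_FILL  (1ULL << 1) /* fill with a pattern */
    #define RTE_DMA_DEV_CAPA_IMM_DATA  (1ULL << 2) /* very small, immediate data */
    #define RTE_DMA_DEV_CAPA_P2P_COPY  (1ULL << 3) /* copy between PCIe peers */

    int rte_dmadev_memcpy(uint16_t dev_id, void *src, void *dst,
                          unsigned int length);
    int rte_dmadev_memset(uint16_t dev_id, void *dst, uint64_t pattern,
                          unsigned int length);
    int rte_dmadev_imm_data(uint16_t dev_id, uint64_t data, void *dst);
    int rte_dmadev_p2pcopy(uint16_t dev_id, uint64_t src, uint64_t dst,
                           unsigned int length); /* src/dst: PCIe bus addresses */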


There will be a lot of combinations of these; it becomes an M x N cross
product of base cases, and it won't scale.

>
> >
> > Just to understand: I think we need to look at the HW capabilities and
> > how to have a common API.
> > I assume the HW will have some HW job descriptors which will be filled
> > in SW and submitted to HW.
> > In our HW, the job descriptor has the following main elements:
> >
> > - Channel // We don't expect the application to change this per transfer
> > - Source address - it can be scatter-gather too - will change per
> > transfer
> > - Destination address - it can be scatter-gather too - will change per
> > transfer
> > - Transfer length - it can be scatter-gather too - will change per
> > transfer
> > - IOVA address where HW posts the job completion status, per job
> > descriptor - will change per transfer
> > - Other sideband information related to the channel // We don't expect
> > the application to change this per transfer
> > - As an option, job completion can be posted as an event to an
> > rte_event_queue too // We don't expect the application to change this
> > per transfer
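
As a sketch only, the elements above could be grouped along these lines,
separating the fields that stay fixed per channel from the ones that change
on every transfer; the struct and field names are illustrative assumptions,
not a proposed API.

    #include <stdbool.h>
    #include <stdint.h>

    /* Fields not expected to change per transfer (slow path). */
    struct dma_job_fixed {
        uint16_t channel;          /* channel for this class of transfers */
        uint64_t sideband;         /* channel-related sideband information */
        bool     event_completion; /* post completion to an event queue */
    };

    /* Fields that change on every transfer (fast path). Each of the
     * address/length fields may also describe a scatter-gather list. */
    struct dma_job_variable {
        uint64_t src;              /* source address */
        uint64_t dst;              /* destination address */
        uint32_t length;           /* transfer length */
        uint64_t completion_addr;  /* IOVA where HW posts the job status */
    };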
>
> The 'option' field looks like a software interface field, not a HW
> descriptor field.

It is in the HW descriptor.

>
> >
> > @Richardson, Bruce @fengchengwen @Hemant Agrawal
> >
> > Could you share the options of your HW descriptors which you are
> > planning to expose through the API, like the above, so that we can
> > easily converge on the fast-path API?
> >
>
> The Kunpeng HW descriptor is self-describing and doesn't need to refer to
> context info.
>
> Maybe the fields which are fixed for a given transfer type could be set up
> by the driver and not exposed to the application.

Yes, I agree. I think that is the reason why I thought to have an
rte_dmadev_prep() call to convert the DPDK DMA transfer attributes to
HW-specific descriptors, and to have a single enq() operation taking the
variable arguments (through the enq parameters) and the fixed arguments
through the object returned by the rte_dmadev_prep() call.
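
As a sketch only, usage of that prep()/enq() split might look roughly like
the fragment below; rte_dmadev_prepare(), rte_dmadev_enqueue() and
struct rte_dmadev_desc follow the earlier sketch in this thread, while
DMA_OP_COPY and struct job (with src/dst/len members) are assumed here
purely for illustration.

    /* Usage sketch: prepare once per channel in the slow path, then
     * enqueue many transfers in the fast path. */
    static void
    submit_copies(uint16_t dev_id, uint16_t channel,
                  struct job *jobs, unsigned int nb_jobs)
    {
        /* Slow path: fixed attributes -> HW-specific template. */
        struct rte_dmadev_desc d = {
            .channel_id = channel,
            .op = DMA_OP_COPY,          /* hypothetical op value */
        };
        rte_dmadev_desc_t tmpl = rte_dmadev_prepare(dev_id, &d);

        /* Fast path: only the variable arguments change per transfer;
         * tmpl carries the fixed ones. NULL could mean "no extra
         * attributes" for drivers that do not need any. */
        for (unsigned int i = 0; i < nb_jobs; i++)
            rte_dmadev_enqueue(dev_id, jobs[i].src, jobs[i].dst,
                               jobs[i].len, tmpl);
    }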

>
> So that we could use a more generic way to define the API.
>
> >
> >
> >>
> >> /Bruce
> >
> > .
> >
>

