[dpdk-dev] [PATCH] gpudev: introduce memory API

Thomas Monjalon thomas at monjalon.net
Mon Jun 7 12:29:50 CEST 2021


06/06/2021 07:28, Jerin Jacob:
> On Sun, Jun 6, 2021 at 6:44 AM Honnappa Nagarahalli
> > This patch does not provide a big-picture view of what the processing looks like when using a GPU. It would be good to explain that.
> > For ex:
> > 1) Will the notion of a GPU be hidden from the application? I.e., is the application allowed to launch kernels?
> >         1a) Will DPDK provide abstract APIs to launch kernels?
> >      This would require us to have the notion of a GPU in DPDK, and the application would depend on the availability of a GPU in the system.

Not sure "kernels" is a well-known term in this context.
I propose talking about computing tasks.
The DPDK application running on the CPU must be synchronized
with the tasks running on devices, so yes, we need a way
to decide what to launch and when from the DPDK application.
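
To make that concrete, here is a minimal sketch of what such an
application-driven flow could look like. The rte_gpu_task_launch/_wait
calls are hypothetical placeholders for whatever launch/synchronization
mechanism we end up with; only the ethdev calls exist today.

  #define BURST_SIZE 32

  /* Illustration only: the application decides when a device task runs
   * and when to wait for it, while packet I/O stays on the CPU lcore.
   * rte_gpu_task_launch/_wait are placeholders, not a proposed API.
   */
  struct rte_mbuf *pkts[BURST_SIZE];
  uint16_t nb = rte_eth_rx_burst(port_id, 0, pkts, BURST_SIZE);

  rte_gpu_task_launch(gpu_id, task_id, pkts, nb);   /* placeholder */
  /* ... the lcore can do other work here ... */
  rte_gpu_task_wait(gpu_id, task_id);               /* placeholder */

  rte_eth_tx_burst(port_id, 0, pkts, nb);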

> > 2) Is launching kernels hidden? I.e., the application still calls abstract DPDK APIs (such as encryption/decryption APIs) without knowing that the encryption/decryption is happening on a GPU.
> >      This does not require us to have a notion of GPU in DPDK at the API level.
> 
> I will leave this to Thomas.

The general need is to allow running any kind of processing on devices.
Some processing may be very specific; other processing could fit in
existing class APIs like crypto and regex.
I think implementing such specific class drivers, based on tasks
dynamically loaded on the device, can be done as a second step.

Thank you for the questions; they help define the big picture
for the next revision of the patch.

> > If we keep CXL in mind, I would imagine that in the future the devices on PCIe could have their own local memory. Maybe some of the APIs could use generic names. For example, instead of calling it "rte_gpu_malloc", we could call it "rte_dev_malloc". This way any future device which hosts its own memory that needs to be managed by the application can use these APIs.
> 
> That is a good thought. It is possible to hook firmware download,
> memory management and job management (as messages to/from the device)
> to rte_device itself.
> One also needs to consider how to integrate with the existing
> DPDK subsystems. For example, if one decided to implement bbdev or
> regexdev with such a computing device, is it better for the bbdev
> driver to depend on gpudev, or for rte_device to have this callback
> and be used with the bbdev driver?

Absolutely. If a workload fits an existing specialized driver class,
it is best handled by a driver in that class.
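
On the naming point above (rte_gpu_malloc vs rte_dev_malloc), the
allocation part itself can stay device-agnostic. A rough sketch of what
such generic prototypes could look like; these names are illustrative
only, not the proposed API:

  /* Illustrative prototypes, names not final: allocate memory local to
   * a device (GPU, FPGA, CXL-attached accelerator, ...) and make
   * existing CPU memory visible to the device.
   */
  void *rte_dev_malloc(int16_t dev_id, size_t size);
  int rte_dev_free(int16_t dev_id, void *ptr);

  /* let the device access memory allocated on the CPU,
   * e.g. an mbuf mempool */
  int rte_dev_mem_register(int16_t dev_id, void *addr, size_t size);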

> > > > > Yes, baseband processing is one possible usage of a GPU with DPDK.
> > > > > We could also imagine some security analysis, or any machine learning...
> > > > >
> > > > > > I can think of "coprocessor-dev" as one of the name.
> > > > >
> > > > > "coprocessor" looks too long as a prefix for the functions.
> > >
> > > Yes. The library name can be lengthy, but the API prefix should be
> > > a short form of about 3 letters.
> > >
> > >
> > > > >
> > > > > > We do have similar machine-learning co-processors (for compute);
> > > > > > if we keep a generic name that covers the above functions, we
> > > > > > may use this subsystem as well in the future.
> > > > >
> > > >
> > > > Accelerator, 'acce_dev' ? ;-)
> > >
> > > It may get confused with HW accelerators.
> > >
> > >
> > > Some of the options I can think of. Sorting in my preference.
> > >
> > > library name, API prefix
> > > 1) libhpc-dev, rte_hpc_ (hpc-> Heterogeneous processor compute)
> > > 2) libhc-dev, rte_hc_
> > > (https://en.wikipedia.org/wiki/Heterogeneous_computing see: Example
> > > hardware)
> > > 3) libpu-dev, rte_pu_ (pu -> processing unit)
> > > 4) libhp-dev, rte_hp_ (hp->heterogeneous processor)
> > > 5) libcoprocessor-dev, rte_cps_ ?
> > > 6) libcompute-dev, rte_cpt_ ?
> > > 7) libgpu-dev, rte_gpu_
> > 
> > These seem to assume that the application can launch its own workload on the device? Does DPDK need to provide abstract APIs for launching work on a device?

That's the difficult part.
We should not try to re-invent CUDA or OpenCL.
I think this part should not be in DPDK.
We only need to synchronize with the dynamic nature of the device workload.
We will be more specific in the v2.
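
To illustrate that last point, something as small as a flag in shared
memory, updated from both sides, may be enough on the DPDK side,
leaving the task code itself to CUDA/OpenCL. Rough sketch with a
hypothetical layout:

  /* CPU/device synchronization through shared memory, without any
   * kernel-launch API in DPDK. A persistent task on the device polls
   * "status"; the CPU sets it to TASK_READY once a burst is in place
   * and waits for TASK_DONE. Names and layout are illustrative only.
   */
  enum task_status { TASK_IDLE, TASK_READY, TASK_DONE };

  struct task_ctl {
      volatile uint32_t status;     /* written by both CPU and device */
      void *pkt_addr[32];
      uint32_t nb_pkts;
  };

  /* ctl points to memory visible to both CPU and device */
  ctl->nb_pkts = nb;
  ctl->status = TASK_READY;           /* hand the burst to the device */
  while (ctl->status != TASK_DONE)    /* wait for task completion */
      rte_pause();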




