[dpdk-dev] [PATCH] gpudev: introduce memory API

Thomas Monjalon thomas at monjalon.net
Thu Jun 3 09:26:02 CEST 2021


03/06/2021 09:06, Andrew Rybchenko:
> On 6/2/21 11:35 PM, Thomas Monjalon wrote:
> > From: Elena Agostini <eagostini at nvidia.com>
> > 
> > The new library gpudev is for dealing with GPU from a DPDK application
> > in a vendor-agnostic way.
> > 
> > As a first step, the features are focused on memory management.
> > A function allows to allocate memory inside the GPU,
> > while another one allows to use main (CPU) memory from the GPU.
> > 
> > The infrastructure is prepared to welcome drivers in drivers/gpu/
> > as the upcoming NVIDIA one, implementing the gpudev API.
> > Other additions planned for next revisions:
> >   - C implementation file
> >   - guide documentation
> >   - unit tests
> >   - integration in testpmd to enable Rx/Tx to/from GPU memory.
> > 
> > The next step should focus on GPU processing task control.
> > 
> > Signed-off-by: Elena Agostini <eagostini at nvidia.com>
> > Signed-off-by: Thomas Monjalon <thomas at monjalon.net>
> 
> 
> LGTM as an RFC. It is definitely to a patch to apply
> since implementation is missing. See my notes below.

Yes sorry I forgot the RFC tag when sending.

[...]
> > +typedef int (*gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void **ptr);
> > +typedef int (*gpu_free_t)(struct rte_gpu_dev *dev, void *ptr);
> 
> Not that important but I always prefer to typedef
> function prototypes w/o pointer and use pointer in
> the structure below. I.e.
> 
> typedef int (gpu_malloc_t)(struct rte_gpu_dev *dev, size_t size, void
> **ptr);
> 
> It allows to specify that corresponding callback
> must comply to the prototype and produce build
> error otherwise (and do not rely on warnings), e.g.
> 
> static gpu_malloc_t mlx5_gpu_malloc;
> static int
> mlx5_gpu_malloc(struct rte_gpu_dev *dev, size_t size, void **ptr)
> {
>      ...
> }
> 
> May be a new library should go this way.

I agree.
> 
> > +
> > +struct rte_gpu_dev {
> > +	/* Backing device. */
> > +	struct rte_device *device;
> > +	/* GPU info structure. */
> > +	struct rte_gpu_info info;
> > +	/* Counter of processes using the device. */
> > +	uint16_t process_cnt;
> > +	/* If device is currently used or not. */
> > +	enum rte_gpu_state state;
> > +	/* FUNCTION: Allocate memory on the GPU. */
> > +	gpu_malloc_t gpu_malloc;
> > +	/* FUNCTION: Allocate memory on the CPU visible from the GPU. */
> > +	gpu_malloc_t gpu_malloc_visible;
> > +	/* FUNCTION: Free allocated memory on the GPU. */
> > +	gpu_free_t gpu_free;
> 
> Don't we need a callback to get dev_info?

Yes it's my miss.

[...]
> > +__rte_experimental
> > +int rte_gpu_dev_info_get(uint16_t gpu_id, struct rte_gpu_info **info);
> 
> Hm, I think it is better to have 'struct rte_gpu_info *info'.
> Why should it allocate and return memory to be freed by caller?

No you're right, I overlooked it.

[...]
> > + * Allocate a chunk of memory on the GPU.
> 
> Looking a below function it is required to clarify here if
> the memory is visible or invisible to GPU (or both allowed).

This function allocates on the GPU so it is visible by the GPU.
I feel I misunderstand your question.

> > + *
> > + * @param gpu_id
> > + *   GPU ID to allocate memory.
> > + * @param size
> > + *   Number of bytes to allocate.
> 
> Is behaviour defined if zero size is requested?
> IMHO, it would be good to define.

OK

> > + * @param ptr
> > + *   Pointer to store the address of the allocated memory.
> > + *
> > + * @return
> > + *   0 on success, -1 otherwise.
> 
> Don't we want to differentiate various errors using
> negative errno as it is done in many DPDK libraries?

Yes I think so, I was just too much lazy to do it in this RFC.

> > + */
> > +__rte_experimental
> > +int rte_gpu_malloc(uint16_t gpu_id, size_t size, void **ptr);
> 
> May be *malloc() should return a pointer and "negative"
> values used to report various errnos?

I don't understand what you mean by negative values if it is a pointer.
We could return a pointer and use rte_errno.

> The problem with the approach that comparison vs NULL will
> not work in this case and we need special macro or small
> inline function to check error condition.
> 
> Returned pointer is definitely more convenient, but above
> not may result in bugs.

I don't know what is better.

[...]
> > + * Deallocate a chunk of memory allocated with rte_gpu_malloc*.
> > + *
> > + * @param gpu_id
> > + *   Reference GPU ID.
> > + * @param ptr
> > + *   Pointer to the memory area to be deallocated.
> 
> I think it should be NOP in the case of NULL pointer and it
> should be documented. If not, it must be documented as well.

OK for NOP.




More information about the dev mailing list