[EXT] Re: [dpdk-dev] [PATCH v1 00/12] mldev: introduce machine learning device library

Thomas Monjalon thomas at monjalon.net
Fri Jan 27 12:34:15 CET 2023


Hi,

Shivah Shankar, please quote your replies
so we can distinguish what I said from what you say.

Please try to understand my questions; you tend to reply to something else.


27/01/2023 05:29, Jerin Jacob:
> On Fri, Jan 27, 2023 at 8:04 AM Shivah Shankar Shankar Narayan Rao
> <sshankarnara at marvell.com> wrote:
> > 25/01/2023 20:01, Jerin Jacob:
> > > On Wed, Jan 25, 2023 at 7:50 PM Thomas Monjalon <thomas at monjalon.net> wrote:
> > > > 14/11/2022 13:02, jerinj at marvell.com:
> > > > > > ML Model: An ML model is an algorithm trained over a dataset. A
> > > > > > model consists of procedure/algorithm and data/pattern required to
> > > > > > make predictions on live data. Once the model is created and
> > > > > > trained outside of the DPDK scope,
> > > > > > the model can be loaded via rte_ml_model_load() and then start it
> > > > > > using rte_ml_model_start() API. The rte_ml_model_params_update()
> > > > > > can be used to update the model
> > > > > > parameters such as weight and bias without unloading the model
> > > > > > using rte_ml_model_unload().
> > > > > 
> > > > > The fact that the model is prepared outside means the model format
> > > > > is free and probably different per mldev driver.
> > > > > I think it is OK but it requires a lot of documentation effort to
> > > > > explain how to bind the model and its parameters with the DPDK API.
> > > > > Also we may need to pass some metadata from the model builder to the
> > > > > inference engine in order to enable optimizations prepared in the
> > > > > model.
> > > > > And the other way, we may need inference capabilities in order to
> > > > > generate an optimized model which can run in the inference engine.
> > > > 
> > > > The base API specification is kept to an absolute minimum. Currently, weight
> > > > and bias parameters are updated through rte_ml_model_params_update(). It
> > > > can be extended when there are drivers that support it, or if you have any
> > > > specific parameter you would like to add to
> > > > rte_ml_model_params_update().
> > > 
> > > This function is
> > > int rte_ml_model_params_update(int16_t dev_id, int16_t model_id, void
> > > *buffer);
> > > 
> > > How are we supposed to provide separate parameters in this void* ?
> >
> > Just to clarify on what "parameters" mean,
> > they just mean weights and biases of the model,
> > which are the parameters for a model.
> > Also, the Proposed APIs are for running the inference
> > on a pre-trained model.
> > For running the inference the amount of parameters tuning
> > needed/done is limited/none.

Why is it limited?
I think you are limiting to *your* model.

> > The only parameters that may get changed are the Weights and Bias
> > which the API rte_ml_model_params_update() caters to.

Can't we imagine a model with more types of parameters?

> > While running the inference on a Model there won't be any random
> > addition or removal of operators to/from the model or there won't
> > be any changes in the actual flow of model.
> > Since the only parameter that can be changed is Weights and Biases
> > the above API should take care.

No, you don't reply to my question.
I want to be able to change a single parameter.
I am expecting a more fine-grained API than a simple "void*".
We could give the name of the parameter and a value, why not?
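
A minimal sketch of what such a name/value update could look like, for illustration only. Every identifier here (the sketch_ml_* names, the struct layout, the error codes chosen) is hypothetical and not part of the proposed mldev API; it assumes parameters are float arrays that the model exposes by name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: none of these names exist in the proposed mldev API. */
#define SKETCH_PARAM_NAME_MAX 32

struct sketch_ml_param {
	char name[SKETCH_PARAM_NAME_MAX]; /* e.g. "fc1.bias" */
	float *values;                    /* storage owned by the model */
	size_t nb_values;                 /* number of floats in values */
};

struct sketch_ml_model {
	struct sketch_ml_param *params;
	size_t nb_params;
};

/*
 * Update a single named parameter.
 * Returns 0 on success, -ENOENT if the name is unknown,
 * -EINVAL on bad arguments or a size mismatch.
 */
static int
sketch_ml_model_param_set(struct sketch_ml_model *model, const char *name,
		const float *values, size_t nb_values)
{
	size_t i;

	if (model == NULL || name == NULL || values == NULL)
		return -EINVAL;

	for (i = 0; i < model->nb_params; i++) {
		struct sketch_ml_param *p = &model->params[i];

		if (strncmp(p->name, name, SKETCH_PARAM_NAME_MAX) != 0)
			continue;
		if (p->nb_values != nb_values)
			return -EINVAL;
		assert(p->values != NULL); /* model must own the storage */
		memcpy(p->values, values, nb_values * sizeof(*values));
		return 0;
	}
	return -ENOENT;
}
```

With something of this shape, the application touches only the parameter it names, and the driver can reject an unknown name or a wrong size instead of trusting an opaque buffer.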

> > > > Other metadata like batch, shapes and formats is queried using
> > > > rte_ml_io_info().
> > > Copying:
> > > +/** Input and output data information structure
> > > + *
> > > + * Specifies the type and shape of input and output data.
> > > + */
> > > +struct rte_ml_io_info {
> > > +       char name[RTE_ML_STR_MAX];
> > > +       /**< Name of data */
> > > +       struct rte_ml_io_shape shape;
> > > +       /**< Shape of data */
> > > +       enum rte_ml_io_type qtype;
> > > +       /**< Type of quantized data */
> > > +       enum rte_ml_io_type dtype;
> > > > +       /**< Type of de-quantized data */
> > > > +};
> > > 
> > > Is it the right place to notify the app that some model optimizations
> > > are supported? (example: merge some operations in the graph)
> >
> > The inference is run on a pre-trained model, which means
> > any merges /additions of operations to the graph are NOT done.
> > If any such things are done then the changed model needs to go
> > through the training and compilation once again
> > which is out of scope of these APIs.

Please try to understand what I am saying.
I want the application to be able to know some capabilities are supported
by the inference driver.
This would allow generating the model with some optimizations.

> > > > > [...]
> > > > > > Typical application utilisation of the ML API will follow the
> > > > > > following programming flow.
> > > > > > 
> > > > > > - rte_ml_dev_configure()
> > > > > > - rte_ml_dev_queue_pair_setup()
> > > > > > - rte_ml_model_load()
> > > > > > - rte_ml_model_start()
> > > > > > - rte_ml_model_info()
> > > > > > - rte_ml_dev_start()
> > > > > > - rte_ml_enqueue_burst()
> > > > > > - rte_ml_dequeue_burst()
> > > > > > - rte_ml_model_stop()
> > > > > > - rte_ml_model_unload()
> > > > > > - rte_ml_dev_stop()
> > > > > > - rte_ml_dev_close()
> > > > > 
> > > > > Where is parameters update in this flow?
> > > > 
> > > > Added the mandatory APIs in the top level flow doc.
> > > > rte_ml_model_params_update() used to update the parameters.
> > > 
> > > The question is "where" should it be done?
> > > Before/after start?
> >
> > The model image comes with the Weights and Bias
> > and will be loaded and used as a part of rte_ml_model_load
> > and rte_ml_model_start.
> > In rare scenarios where the user wants to update
> > the Weights and Bias of an already loaded model,
> > the rte_ml_model_stop can be called to stop the model
> > and the Weights and Biases can be updated using the
> > rte_ml_model_params_update() API,
> > followed by rte_ml_model_start to start the model
> > with the new Weights and Biases.

OK, please make sure it is documented that parameters update
must be done on a stopped engine.
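
The stop / update / start ordering described above can be sketched with a small mock. This is not the mldev implementation: the sketch_model_* names, the state enum, the fixed-size parameter array and the choice of -EBUSY are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock of a model's lifecycle; not the mldev API. */
enum sketch_model_state {
	SKETCH_MODEL_STOPPED, /* loaded but not running */
	SKETCH_MODEL_STARTED,
};

struct sketch_model {
	enum sketch_model_state state;
	float params[4]; /* weights and biases, flattened */
};

static int
sketch_model_start(struct sketch_model *m)
{
	assert(m != NULL);
	m->state = SKETCH_MODEL_STARTED;
	return 0;
}

static int
sketch_model_stop(struct sketch_model *m)
{
	assert(m != NULL);
	m->state = SKETCH_MODEL_STOPPED;
	return 0;
}

/* Replace all parameters; refused with -EBUSY while the model is started. */
static int
sketch_model_params_update(struct sketch_model *m, const void *buffer)
{
	if (m == NULL || buffer == NULL)
		return -EINVAL;
	if (m->state == SKETCH_MODEL_STARTED)
		return -EBUSY;
	memcpy(m->params, buffer, sizeof(m->params));
	return 0;
}
```

Returning an error on a started model, rather than silently applying the update, is one way to make the documented constraint enforceable.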

> > > > > Should we update all parameters at once or can it be done more
> > > > > fine-grain?
> > > > 
> > > > Currently, rte_ml_model_params_update() can be used to update weight
> > > > and bias via buffer when device is in stop state and without unloading
> > > > the model.

Passing a raw buffer is a really dark API.
We need to know how to fill the buffer.

> > > The question is "can we update a single parameter"?
> > > And how?
> > 
> > As mentioned above for running inference the model is already trained
> > the only parameter that is updated is the Weights and Biases.
> > "Parameters" is another word for Weights and Bias.
> > No other parameters are considered.

You are not replying to the question.
How can we update a single parameter?

> > Are there any other parameters you have on your mind?

No

> > > > > Question about the memory used by mldev:
> > > > > Can we manage where the memory is allocated (host, device, mix,
> > > > > etc)?
> > > > 
> > > > Just passing buffer pointers now like other subsystem.
> > > > Other EAL infra service can take care of the locality of memory as it
> > > > is not specific to ML dev.
> > > 
> > > I was thinking about memory allocation required by the inference engine.
> > > How to specify where to allocate? Is it just hardcoded in the driver?
> >
> > Any memory within the hardware is managed by the driver.
> 
> I think Thomas is asking about input and output memory for inference. If
> so, struct rte_ml_buff_seg needs additional parameters, or a type
> or so. Thomas, please propose what parameters you want here.
> If it is for internal driver memory, we can pass the memory
> type in rte_ml_dev_configure(). If so, please propose
> the memory types you need and the parameters.

I'm talking about the memory used by the driver to make the inference work.
In some cases we may prefer the hardware using host memory,
sometimes use the device memory.
I think that's something we may tune in the configuration.
I suppose we are fine with allocation hardcoded in the driver for now,
as I don't have a clear need.



