[dpdk-dev] [RFC v2] doc compression API for DPDK

Verma, Shally Shally.Verma at cavium.com
Fri Jan 12 14:49:16 CET 2018


Hi Fiona

> -----Original Message-----
> From: Trahe, Fiona [mailto:fiona.trahe at intel.com]
> Sent: 12 January 2018 00:24
> To: Verma, Shally <Shally.Verma at cavium.com>; Ahmed Mansour
> <ahmed.mansour at nxp.com>; dev at dpdk.org
> Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
> <Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> <Mahipal.Challa at cavium.com>; Jain, Deepak K <deepak.k.jain at intel.com>;
> Hemant Agrawal <hemant.agrawal at nxp.com>; Roy Pledge
> <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>; Trahe,
> Fiona <fiona.trahe at intel.com>
> Subject: RE: [RFC v2] doc compression API for DPDK
> 
> Hi Shally, Ahmed,
> 
> 
> > -----Original Message-----
> > From: Verma, Shally [mailto:Shally.Verma at cavium.com]
> > Sent: Wednesday, January 10, 2018 12:55 PM
> > To: Ahmed Mansour <ahmed.mansour at nxp.com>; Trahe, Fiona
> > <fiona.trahe at intel.com>; dev at dpdk.org
> > Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> > Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
> > <Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> > <Mahipal.Challa at cavium.com>; Jain, Deepak K
> > <deepak.k.jain at intel.com>; Hemant Agrawal
> > <hemant.agrawal at nxp.com>; Roy Pledge
> > <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>
> > Subject: RE: [RFC v2] doc compression API for DPDK
> >
> > Hi Ahmed
> >
> > > -----Original Message-----
> > > From: Ahmed Mansour [mailto:ahmed.mansour at nxp.com]
> > > Sent: 10 January 2018 00:38
> > > To: Verma, Shally <Shally.Verma at cavium.com>; Trahe, Fiona
> > > <fiona.trahe at intel.com>; dev at dpdk.org
> > > Cc: Athreya, Narayana Prasad <NarayanaPrasad.Athreya at cavium.com>;
> > > Gupta, Ashish <Ashish.Gupta at cavium.com>; Sahu, Sunila
> > > <Sunila.Sahu at cavium.com>; De Lara Guarch, Pablo
> > > <pablo.de.lara.guarch at intel.com>; Challa, Mahipal
> > > <Mahipal.Challa at cavium.com>; Jain, Deepak K
> > > <deepak.k.jain at intel.com>;
> > > Hemant Agrawal <hemant.agrawal at nxp.com>; Roy Pledge
> > > <roy.pledge at nxp.com>; Youri Querry <youri.querry_1 at nxp.com>
> > > Subject: Re: [RFC v2] doc compression API for DPDK
> > >
> > > Hi Shally,
> > >
> > > Thanks for the summary. It is very helpful. Please see comments below.
> > >
> > >
> > > On 1/4/2018 6:45 AM, Verma, Shally wrote:
> > > > This is an RFC v2 document summarizing the current understanding of, and
> > > > requirements for, the compression API proposal in DPDK. It is based on
> > > > "[RFC v3] Compression API in DPDK"
> > > > http://dpdk.org/dev/patchwork/patch/32331/ .
> > > > The intention of this document is to align on the concepts built into the
> > > > compression API and its usage, and to identify further requirements.
> > > >
> > > > Going forward, it could serve as the basis for the Compression Module
> > > > Programmer's Guide.
> > > >
> > > > Current scope is limited to:
> > > > - definition of the terminology that forms the foundation of the
> > > >   compression API
> > > > - the typical API flow expected to be used by applications
> > > > - Stateless and Stateful operation definition and usage, following the
> > > >   review of the RFC v1 doc
> > > >   http://dev.dpdk.narkive.com/CHS5l01B/dpdk-dev-rfc-v1-doc-compression-api-for-dpdk
> > > >
> > > > 1. Overview
> > > > ~~~~~~~~~~~
> > > >
> > > > A. Compression Methodologies in compression API
> > > > ===========================================
> > > > DPDK compression supports two types of compression methodologies:
> > > > - Stateless - each data object is compressed individually without any
> > > >   reference to previous data.
> > > > - Stateful - each data object is compressed with reference to previous
> > > >   data objects, i.e. the history of the data is needed for compression /
> > > >   decompression.
> > > > For more explanation, please refer to RFC 1951
> > > > https://www.ietf.org/rfc/rfc1951.txt
> > > >
> > > > To support both methodologies, DPDK compression introduces two key
> > > > concepts: Session and Stream.
> > > >
> > > > B. Notion of a session in compression API
> > > > ==================================
> > > > A Session in DPDK compression is a logical entity that is set up once
> > > > with immutable parameters, i.e. parameters that don't change across
> > > > operations and devices.
> > > > A session can be shared across multiple devices and multiple operations
> > > > simultaneously.
> > > > Typical session parameters include information such as:
> > > > - compress / decompress
> > > > - compression algorithm and associated configuration parameters
> > > >
> > > > An application can create different sessions on a device, initialized
> > > > with the same or different xforms. Once a session is initialized with one
> > > > xform, it cannot be re-initialized.
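To make the session rules above concrete, here is a minimal C sketch. The `demo_*` types and `demo_session_init()` are hypothetical stand-ins modeled on the RFC's wording, not the DPDK compressdev API; the sketch only shows a session holding an immutable xform and rejecting a second initialization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RFC's xform/session; not the DPDK API. */
enum demo_xform_type { DEMO_XFORM_COMPRESS, DEMO_XFORM_DECOMPRESS };
enum demo_algo { DEMO_ALGO_DEFLATE };

struct demo_xform {
    enum demo_xform_type type; /* compress or decompress */
    enum demo_algo algo;       /* compression algorithm */
    int level;                 /* algorithm-specific configuration */
};

struct demo_session {
    bool initialized;
    struct demo_xform xform;   /* immutable once initialized */
};

/* Copies the xform into the session exactly once.
 * Returns 0 on success, -1 on bad arguments or re-initialization. */
int demo_session_init(struct demo_session *sess, const struct demo_xform *xform)
{
    if (sess == NULL || xform == NULL || sess->initialized)
        return -1;
    sess->xform = *xform;
    sess->initialized = true;
    return 0;
}
```

A real PMD would presumably also validate the xform against its capabilities; that is left out here.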
> > > >
> > > > C. Notion of stream in compression API
> > > > =======================================
> > > > Unlike a session, which carries a common set of information across
> > > > operations, a stream in DPDK compression is a logical entity that
> > > > identifies a related set of operations and carries operation-specific
> > > > information needed by the device during processing.
> > > > It is a device-specific data structure, opaque to the application, set up
> > > > and maintained by the device.
> > > >
> > > > A stream can be used with *only* one op at a time, i.e. no two operations
> > > > can share the same stream simultaneously.
> > > > A stream is *mandatory* for stateful op processing and optional for
> > > > stateless (please see the respective sections for more details).
> > > >
> > > > This enables sharing of a session by multiple threads handling different
> > > > data sets, as each op carries its own context (internal state, history
> > > > buffers, etc.) in its attached stream.
> > > > An application should call rte_comp_stream_create() and attach the stream
> > > > to an op before operation processing begins, and free it via
> > > > rte_comp_stream_free() after processing is complete.
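A minimal C sketch of the stream rules (create, one op in flight at a time, free), assuming hypothetical `demo_*` names rather than the RFC's `rte_comp_stream_*` functions. The `in_use` flag models the "only one op at a time" rule; since the stream is opaque and device-owned in the RFC, a real PMD would keep such bookkeeping internally.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stream: device-private context such as history and state. */
struct demo_stream {
    bool in_use;                  /* true while an op using it is in flight */
    unsigned char history[32768]; /* e.g. room for a DEFLATE history window */
};

struct demo_op {
    struct demo_stream *stream;
};

struct demo_stream *demo_stream_create(void)
{
    return calloc(1, sizeof(struct demo_stream));
}

void demo_stream_free(struct demo_stream *stream)
{
    free(stream);
}

/* Enqueue-time check for an op carrying a stream: refuse it if the
 * stream already has an op in flight. */
int demo_enqueue(struct demo_op *op)
{
    if (op->stream == NULL || op->stream->in_use)
        return -1;
    op->stream->in_use = true;
    return 0;
}

/* Dequeue: the stream becomes free for the next related op. */
void demo_dequeue(struct demo_op *op)
{
    op->stream->in_use = false;
}
```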
> > > >
> > > > C. Notion of burst operations in compression API
> > > > ================================================
> > > > A burst in DPDK compression is an array of operations where each op
> > > > carries an independent set of data, i.e. a burst can look like:
> > > >
> > > >     ----------------------------------------------------------------------------------------------
> > > >     enqueue_burst(|op1.no_flush | op2.no_flush | op3.flush_final | op4.no_flush | op5.no_flush |)
> > > >     ----------------------------------------------------------------------------------------------
> > > >
> > > > Here op1 .. op5 are all independent of each other and carry entirely
> > > > different sets of data.
> > > > Each op can be attached to the same or a different session but *must* be
> > > > attached to a different stream.
> > > >
> > > > Each op (struct rte_comp_op) carries compression/decompression
> > > > operational parameters and is both an input and an output parameter.
> > > > The PMD gets source, destination and checksum information at input, and
> > > > updates it with the bytes consumed, bytes produced and checksum at output.
> > > >
> > > > Since each operation in a burst is independent and can thus complete
> > > > out of order, applications that need ordering should set up a per-op
> > > > user data area with reordering information so that they can determine
> > > > the enqueue order at dequeue.
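One way to use the per-op user data area for reordering, sketched in C under the assumption that the application writes a sequence number into each op before enqueue. `struct demo_op` and its `seq` field are hypothetical; the RFC does not define the layout of the user data area.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical op whose user-data area holds the enqueue sequence number. */
struct demo_op {
    uint32_t seq; /* written by the application before enqueue */
    int status;
};

/* Put dequeued ops back in enqueue order. Assumes this batch was enqueued
 * with seq = base, base+1, ..., base+n-1 and that out[] holds n entries. */
void demo_reorder(struct demo_op **dequeued, size_t n, uint32_t base,
                  struct demo_op **out)
{
    for (size_t i = 0; i < n; i++)
        out[dequeued[i]->seq - base] = dequeued[i];
}
```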
> > > >
> > > > Also, if multiple threads call enqueue_burst() on the same queue pair, it
> > > > is the application's responsibility to use a proper locking mechanism to
> > > > ensure exclusive enqueuing of operations.
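A minimal sketch of application-side locking around a shared queue pair, using a POSIX mutex. `struct demo_qp`, `demo_pmd_enqueue()` and `demo_locked_enqueue()` are hypothetical; the point is only that the application's lock, not the PMD, serializes concurrent enqueue calls.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue pair shared by several application threads. */
struct demo_qp {
    pthread_mutex_t lock; /* application-owned lock */
    uint32_t enqueued;    /* ops accepted so far */
};

/* Stand-in for a PMD enqueue that is not safe to call concurrently. */
static uint16_t demo_pmd_enqueue(struct demo_qp *qp, uint16_t nb_ops)
{
    qp->enqueued += nb_ops;
    return nb_ops;
}

/* Application wrapper: only one thread at a time enters the PMD enqueue. */
uint16_t demo_locked_enqueue(struct demo_qp *qp, uint16_t nb_ops)
{
    pthread_mutex_lock(&qp->lock);
    uint16_t n = demo_pmd_enqueue(qp, nb_ops);
    pthread_mutex_unlock(&qp->lock);
    return n;
}

/* Example thread body (suitable for pthread_create): 1000 bursts of 8 ops. */
void *demo_worker(void *arg)
{
    struct demo_qp *qp = arg;
    for (int i = 0; i < 1000; i++)
        demo_locked_enqueue(qp, 8);
    return NULL;
}
```

An alternative that avoids the lock is to give each thread its own queue pair, if the device offers enough of them.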
> > > >
> > > > D. Stateless Vs Stateful
> > > > ===================
> > > > The compression API provides the RTE_COMP_FF_STATEFUL feature flag for a
> > > > PMD to indicate its support for stateful operation. Each op carries an op
> > > > type indicating whether it is to be processed statefully or statelessly.
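A sketch of how an application might consult the stateful feature flag before choosing an op type. `DEMO_COMP_FF_STATEFUL` stands in for RTE_COMP_FF_STATEFUL, and its bit position here is an arbitrary assumption; `demo_pick_op_type()` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the bit position is an arbitrary assumption. */
#define DEMO_COMP_FF_STATEFUL (UINT64_C(1) << 0)

enum demo_op_type { DEMO_OP_STATELESS, DEMO_OP_STATEFUL };

/* Returns the op type to use, or -1 if stateful processing is needed
 * (e.g. output size unknown, data split across calls) but the PMD does
 * not advertise the stateful feature flag. */
int demo_pick_op_type(uint64_t feature_flags, int need_stateful)
{
    if (!need_stateful)
        return DEMO_OP_STATELESS;
    if (feature_flags & DEMO_COMP_FF_STATEFUL)
        return DEMO_OP_STATEFUL;
    return -1;
}
```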
> > > >
> > > > D.1 Compression API Stateless operation
> > > > ------------------------------------------------------
> > > > An op is processed statelessly if it has:
> > > > - flush value set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL (required only on
> > > >   the compression side),
> > > > - op_type set to RTE_COMP_OP_STATELESS,
> > > > - all of the required input, and an output buffer large enough to store
> > > >   the output, i.e. OUT_OF_SPACE can never occur.
> > > >
> > > > When all of the above conditions are met, the PMD initiates stateless
> > > > processing and releases acquired resources after processing of the
> > > > current operation is complete, i.e. full input consumed and full output
> > > > written.
> [Fiona] I think the 3rd condition conflicts with D.1.1 below and anyway cannot
> be a precondition, i.e. the PMD must initiate stateless processing based on
> RTE_COMP_OP_STATELESS. It can't always know whether the output buffer is big
> enough before processing; it must process the input data, and only when it
> has consumed all of it can it know whether all the output data fits in the
> output buffer.
>
> I'd suggest rewording as follows:
> An op is processed statelessly if op_type is set to RTE_COMP_OP_STATELESS.
> In this case:
> - The flush value must be set to RTE_FLUSH_FULL or RTE_FLUSH_FINAL
>   (required only on the compression side),
> - All of the input data must be in the src buffer,
> - The dst buffer should be large enough to hold the expected output.
> The PMD acquires the necessary resources to process the op. After processing
> of the current operation is complete, whether successful or not, it releases
> the acquired resources and no state, history or data is held in the PMD or
> carried over to subsequent ops.
> In the SUCCESS case, the full input is consumed, the full output is written
> and status is set to RTE_COMP_OP_STATUS_SUCCESS.
> OUT-OF-SPACE as in D.1.1 below.
> 

[Shally] Ok. Agreed.
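Following the agreed rewording, a C sketch of the checks an application could make before enqueuing a stateless op. All `demo_*` names are hypothetical. The output-buffer size is deliberately not treated as a hard precondition, since it cannot be verified until the input has been processed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RFC's op fields; not the DPDK API. */
enum demo_op_type { DEMO_OP_STATELESS, DEMO_OP_STATEFUL };
enum demo_flush { DEMO_FLUSH_NONE, DEMO_FLUSH_SYNC, DEMO_FLUSH_FULL, DEMO_FLUSH_FINAL };
enum demo_dir { DEMO_COMPRESS, DEMO_DECOMPRESS };

struct demo_op {
    enum demo_op_type op_type;
    enum demo_dir dir;
    enum demo_flush flush;
    size_t src_len; /* all input must be present for stateless */
    size_t dst_len; /* should be large enough for the expected output */
};

/* Checks what can be checked up front for a stateless op. Whether dst is
 * really big enough is only known after processing (OUT_OF_SPACE). */
bool demo_stateless_op_valid(const struct demo_op *op)
{
    if (op->op_type != DEMO_OP_STATELESS)
        return false;
    /* The flush restriction applies only on the compression side. */
    if (op->dir == DEMO_COMPRESS &&
        op->flush != DEMO_FLUSH_FULL && op->flush != DEMO_FLUSH_FINAL)
        return false;
    return op->src_len > 0 && op->dst_len > 0;
}
```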

> > > > An application can optionally attach a stream to such ops. In that case,
> > > > the application must attach a different stream to each op.
> > > >
> > > > An application can enqueue a stateless burst by making consecutive
> > > > enqueue_burst() calls, e.g.:
> > > >
> > > > enqueued = rte_compressdev_enqueue_burst(dev_id, qp_id, ops1, nb_ops);
> > > > enqueued = rte_compressdev_enqueue_burst(dev_id, qp_id, ops2, nb_ops);
> > > >
> > > > *Note - Every call uses a different ops array, i.e. the same rte_comp_op
> > > > array *cannot be re-enqueued* to process the next batch of data until the
> > > > previous ops are completely processed.
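A minimal sketch of the "different ops array per call" rule, assuming the application alternates between two op arrays and refills one only after every op from it has been dequeued. `struct demo_batch` and `demo_next_batch()` are hypothetical; the `int ops[]` member is only a placeholder for an array of `struct rte_comp_op *`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BURST 4

/* Hypothetical op array plus the number of its ops still held by the PMD. */
struct demo_batch {
    int ops[DEMO_BURST]; /* placeholder for struct rte_comp_op *ops[] */
    uint16_t in_flight;  /* enqueued but not yet dequeued */
};

/* An op array may be refilled only after every op from it was dequeued. */
bool demo_batch_reusable(const struct demo_batch *b)
{
    return b->in_flight == 0;
}

/* Alternate between two arrays, skipping one that still has ops in flight.
 * Returns NULL if both are busy; *turn says which array to try first. */
struct demo_batch *demo_next_batch(struct demo_batch pair[2], int *turn)
{
    for (int i = 0; i < 2; i++) {
        struct demo_batch *b = &pair[(*turn + i) % 2];
        if (demo_batch_reusable(b)) {
            *turn = (*turn + i + 1) % 2;
            return b;
        }
    }
    return NULL;
}
```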
> > > >
> > > > D.1.1 Stateless and OUT_OF_SPACE
> > > > ------------------------------------------------
> > > > OUT_OF_SPACE is a condition where the output buffer runs out of space
> > > > while the PMD still has more data to produce. If the PMD runs into such a
> > > > condition, it is an error condition in stateless processing.
> > > > In such a case, the PMD resets itself and returns with status
> > > > RTE_COMP_OP_STATUS_OUT_OF_SPACE with produced = consumed = 0, i.e. no
> > > > input read, no output written.
> > > > The application can resubmit the full input with a larger output buffer.
> > >
> > > [Ahmed] Can we add an option to allow the user to read the data that was
> > > produced while still reporting OUT_OF_SPACE? This is mainly useful for
> > > decompression applications doing search.
> >
> > [Shally] It is there, but applicable to the stateful operation type (please
> > refer to the handling of out_of_space under the "Stateful" section).
> > By definition, "stateless" here means that the application (such as IPCOMP)
> > knows the maximum output size with certainty and ensures that the
> > uncompressed data size cannot grow beyond the provided output buffer.
> > Such apps can submit an op with type = STATELESS and provide the full input;
> > the PMD then assumes it has sufficient input and output and thus doesn't
> > need to maintain any context after the op is processed.
> > If the application doesn't know the maximum output size, it should process
> > the data as a stateful op, i.e. set up the op with type = STATEFUL and attach
> > a stream so that the PMD can maintain the relevant context to handle such a
> > condition.
> [Fiona] There may be an alternative that's useful for Ahmed, while still
> respecting the stateless concept.
> In the stateless case, where a PMD reports OUT_OF_SPACE in decompression, it
> could also return consumed = 0, produced = x, where x > 0. x indicates the
> amount of valid data which has been written to the output buffer. It is not
> complete, but if an application wants to search it, it may be sufficient.
> If the application still wants the data, it must resubmit the whole input
> with a bigger output buffer, and decompression will be repeated from the
> start; it cannot expect to continue on, as the PMD has not maintained state,
> history or data.
> I don't think there would be any need to indicate this in capabilities: PMDs
> which cannot provide this functionality would always return
> produced = consumed = 0, while PMDs which can could set produced > 0.
> If this works for you both, we could consider a similar case for compression.
> 

[Shally] Sounds fine to me. However, in that case consumed should also be updated to the actual number of bytes consumed by the PMD.
Setting consumed = 0 with produced > 0 is inconsistent.
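A C sketch of the stateless OUT_OF_SPACE behaviour discussed above, including Fiona's proposed produced > 0 variant. To stay self-contained it uses a toy run-length decoder (input is (count, byte) pairs) instead of DEFLATE, and all `demo_*` names are hypothetical. It reports consumed = 0 as in Fiona's proposal; per Shally's comment a PMD might instead report the bytes actually consumed, which this application loop ignores anyway because it always resubmits the whole input.

```c
#include <stddef.h>
#include <stdlib.h>

enum demo_status { DEMO_STATUS_SUCCESS, DEMO_STATUS_OUT_OF_SPACE };

/* Toy stateless "decompressor": input is (count, byte) pairs, a simple
 * run-length encoding used here instead of DEFLATE to keep it short. */
enum demo_status demo_stateless_decompress(const unsigned char *src, size_t src_len,
                                           unsigned char *dst, size_t dst_len,
                                           size_t *consumed, size_t *produced)
{
    size_t out = 0;
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        for (unsigned char n = 0; n < src[i]; n++) {
            if (out == dst_len) {
                /* No state is kept; only the valid partial output is reported. */
                *consumed = 0;
                *produced = out;
                return DEMO_STATUS_OUT_OF_SPACE;
            }
            dst[out++] = src[i + 1];
        }
    }
    *consumed = src_len;
    *produced = out;
    return DEMO_STATUS_SUCCESS;
}

/* Application: on OUT_OF_SPACE, resubmit the whole input with a doubled
 * output buffer. Returns a malloc'd buffer (caller frees) or NULL. */
unsigned char *demo_decompress_all(const unsigned char *src, size_t src_len,
                                   size_t *out_len)
{
    size_t cap = 16;
    for (;;) {
        unsigned char *dst = malloc(cap);
        size_t consumed, produced;
        if (dst == NULL)
            return NULL;
        if (demo_stateless_decompress(src, src_len, dst, cap,
                                      &consumed, &produced) == DEMO_STATUS_SUCCESS) {
            *out_len = produced;
            return dst;
        }
        /* dst[0..produced) is valid and could be searched here. */
        free(dst);
        cap *= 2;
    }
}
```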

> >
> > >
> > > > D.2 Compression API Stateful operation
> > > > ----------------------------------------------------------
> > > > A stateful operation in DPDK compression means the application invokes
> > > > enqueue_burst() multiple times to process related chunks of data, either
> > > > because:
> > > > - the application broke the data into several ops, and/or
> > > > - the PMD ran into an out_of_space situation during input processing
> > > >
> > > > In either or both of the above cases, the PMD is required to maintain the
> > > > state of the op across enqueue_burst() calls, and the ops are set up with
> > > > op_type RTE_COMP_OP_STATEFUL, beginning with flush value
> > > > RTE_COMP_NO/SYNC_FLUSH and ending with flush value
> > > > RTE_COMP_FULL/FINAL_FLUSH.
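A sketch of how an application might split one data set into stateful ops, with NO_FLUSH on every chunk except the last and FINAL on the last. The `demo_*` names and the fixed chunk size are assumptions for illustration only.

```c
#include <stddef.h>

enum demo_flush { DEMO_FLUSH_NONE, DEMO_FLUSH_SYNC, DEMO_FLUSH_FULL, DEMO_FLUSH_FINAL };

/* Hypothetical per-chunk op description for one stateful stream. */
struct demo_chunk_op {
    size_t offset;         /* start of this chunk in the source data */
    size_t len;            /* bytes in this chunk */
    enum demo_flush flush; /* NONE for all but the last, FINAL for the last */
};

/* Split src_len bytes into chunks of at most chunk_size bytes.
 * Returns the number of ops written (at most max_ops), or 0 on bad input. */
size_t demo_split_stateful(size_t src_len, size_t chunk_size,
                           struct demo_chunk_op *ops, size_t max_ops)
{
    if (src_len == 0 || chunk_size == 0)
        return 0;
    size_t n = (src_len + chunk_size - 1) / chunk_size;
    if (n > max_ops)
        return 0;
    for (size_t i = 0; i < n; i++) {
        ops[i].offset = i * chunk_size;
        ops[i].len = (i == n - 1) ? src_len - ops[i].offset : chunk_size;
        ops[i].flush = (i == n - 1) ? DEMO_FLUSH_FINAL : DEMO_FLUSH_NONE;
    }
    return n;
}
```

Each resulting op would then be enqueued one at a time on the same stream, as D.2.1 describes.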
> > > >
> > > > D.2.1 Stateful operation state maintenance
> > > > ---------------------------------------------------------------
> > > > Ideally, the application should parse through all related chunks of
> > > > source data, build its mbuf chain and enqueue it for stateless processing.
> > > > However, if it needs to break the data into several enqueue_burst() calls,
> > > > the expected call flow would be something like:
> > > >
> > > > enqueue_burst( |op.no_flush |)
> > >
> > > [Ahmed] The work is now in flight to the PMD. The user will call dequeue
> > > burst in a loop until all ops are received. Is this correct?
> > >
> > > > dequeue_burst(op) // should dequeue before we enqueue next
> >
> > [Shally] Yes. Ideally every submitted op needs to be dequeued. However, this
> > illustration is specifically in the context of stateful op processing: it
> > shows that if a stream is broken into chunks, each chunk should be submitted
> > as one op at a time with type = STATEFUL, and must be dequeued before the
> > next chunk is enqueued.
> >
> > > > enqueue_burst( |op.no_flush |)
> > > > dequeue_burst(op) // should dequeue before we enqueue next
> > > > enqueue_burst( |op.full_flush |)
> > >
> > > [Ahmed] Why not allow multiple work items in flight? I understand that
> > > occasionally there will be an OUT_OF_SPACE exception. Can we just
> > > distinguish the response in exception cases?
> >
> > [Shally] Multiple ops are allowed in flight; however, the condition is that
> > each op in such a case is independent of the others, i.e. they belong to
> > different streams altogether.
> > Earlier (as part of the RFC v1 doc) we did consider the proposal to process
> > all related chunks of data in a single burst by passing them as an ops
> > array, but later found it not so useful for PMD handling, for various
> > reasons. Please refer to the RFC v1 doc review comments for details.
> [Fiona] Agree with Shally. In summary, as only one op can be processed at a
> time, since each needs the state of the previous one, allowing more than one
> op to be in flight at a time would force PMDs to implement internal queueing
> and exception handling for the OUT_OF_SPACE conditions you mention.
> If the application has all the data, it can put it into chained mbufs in a
> single op rather than multiple ops, which avoids pushing all that complexity
> down to the PMDs.
> 
> >
> > > >
> > > > Here an op *must* be attached to a stream, and every subsequent
> > > > enqueue_burst() call should carry the *same* stream. Since the PMD
> > > > maintains the op's state in the stream, it is mandatory for the
> > > > application to attach a stream to such ops.
> [Fiona] I think you're referring only to a single stream above, but as there
> may be many different streams, maybe add the following?
> The above is simplified to show just a single stream. However, there may be
> many streams, and each enqueue_burst() may contain ops from different
> streams, as long as there is only one op in flight from any stream at a given
> time.
> 

[Shally] Ok, got it.
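A sketch of the multi-stream rule just agreed: a burst may mix ops from different streams, but at most one op per stream may be in flight. The pending list, `demo_build_burst()` and the fixed stream-id range are hypothetical; the caller is assumed to clear `in_flight[s]` when it dequeues the op of stream s.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_STREAMS 8

/* Hypothetical pending op: its stream and its chunk index in that stream. */
struct demo_pending {
    int stream_id; /* assumed 0 .. DEMO_MAX_STREAMS-1 */
    int chunk;
    bool sent;
};

/* Build one burst from pend[] (kept in per-stream chunk order): take the
 * earliest unsent op of every stream that has no op in flight.
 * in_flight[s] is set here and must be cleared by the caller when the
 * op of stream s is dequeued. Returns the number of ops placed in burst[]. */
size_t demo_build_burst(struct demo_pending *pend, size_t n,
                        bool in_flight[DEMO_MAX_STREAMS],
                        struct demo_pending **burst, size_t max_burst)
{
    size_t nb = 0;
    for (size_t i = 0; i < n && nb < max_burst; i++) {
        struct demo_pending *p = &pend[i];
        if (p->stream_id < 0 || p->stream_id >= DEMO_MAX_STREAMS)
            continue; /* ignore malformed entries */
        if (p->sent || in_flight[p->stream_id])
            continue;
        in_flight[p->stream_id] = true;
        p->sent = true;
        burst[nb++] = p;
    }
    return nb;
}
```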

> 
> > > >
> > > > D.2.2 Stateful and Out_of_Space
> > > > --------------------------------------------
> > > > If the PMD supports stateful processing and runs into an OUT_OF_SPACE
> > > > situation, it is not an error condition for the PMD. In such a case, the
> > > > PMD returns with status RTE_COMP_OP_STATUS_OUT_OF_SPACE, with consumed =
> > > > number of input bytes read and produced = length of the complete output
> > > > buffer.
> [Fiona] - produced would be <= output buffer len (typically =, but could be a
> few bytes less)
> 
> 
> > > > The application should re-enqueue the op with the source starting at
> > > > offset consumed (i.e. the (consumed+1)-th input byte) and an output
> > > > buffer with available space.
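A sketch of resuming a stateful op after OUT_OF_SPACE. It uses a toy run-length decoder (not DEFLATE) whose partially emitted run is carried in a hypothetical `struct demo_stream`; all `demo_*` names are assumptions. With 0-based offsets, each new call starts at `src + consumed` and gets a fresh output window.

```c
#include <stddef.h>

enum demo_status { DEMO_STATUS_SUCCESS, DEMO_STATUS_OUT_OF_SPACE };

/* Hypothetical stream state for a toy run-length decoder (input is
 * (count, byte) pairs): a run cut short by a full buffer is remembered. */
struct demo_stream {
    unsigned remaining;  /* bytes of the current run still to emit */
    unsigned char value; /* byte value of the current run */
};

/* Mock stateful op: on OUT_OF_SPACE the state stays in the stream and
 * consumed/produced report exactly how far processing got. */
enum demo_status demo_stateful_decompress(struct demo_stream *st,
                                          const unsigned char *src, size_t src_len,
                                          unsigned char *dst, size_t dst_len,
                                          size_t *consumed, size_t *produced)
{
    size_t in = 0, out = 0;
    for (;;) {
        while (st->remaining > 0) {  /* finish the current run */
            if (out == dst_len) {
                *consumed = in;
                *produced = out;
                return DEMO_STATUS_OUT_OF_SPACE;
            }
            dst[out++] = st->value;
            st->remaining--;
        }
        if (in + 1 >= src_len)       /* no complete pair left */
            break;
        st->remaining = src[in];     /* start the next run */
        st->value = src[in + 1];
        in += 2;
    }
    *consumed = in;
    *produced = out;
    return DEMO_STATUS_SUCCESS;
}

/* Application loop: resume at src + consumed with a fresh window of
 * `window` (> 0) bytes each time. `out` must have room for the total
 * output plus one window. Returns the total bytes produced. */
size_t demo_decompress_in_windows(const unsigned char *src, size_t src_len,
                                  unsigned char *out, size_t window)
{
    struct demo_stream st = { 0, 0 };
    size_t off = 0, total = 0, consumed, produced;
    enum demo_status s;
    do {
        s = demo_stateful_decompress(&st, src + off, src_len - off,
                                     out + total, window, &consumed, &produced);
        off += consumed;
        total += produced;
    } while (s == DEMO_STATUS_OUT_OF_SPACE);
    return total;
}
```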
> > >
> > > [Ahmed] Related to OUT_OF_SPACE: what status does the user receive in a
> > > decompression case when the end block is encountered before the end of
> > > the input? Does the PMD continue decompression? Does it stop there and
> > > return the stop index?
> > >
> >
> > [Shally] Before I can answer this, please help me understand your use case.
> > When you say "when the end block is encountered before the end of the
> > input?", do you mean "the decompressor processes a final block (i.e. one
> > with BFINAL=1 in its header) and there's some footer data after that"? Or do
> > you mean "the decompressor processes one block and has more to process until
> > its final block"?
> > What do "end block" and "end of input" refer to here?
> >
> > > >
> > > > D.2.3 Sliding Window Size
> > > > ------------------------------------
> > > > Every PMD will report in its algorithm capability structure the maximum
> > > > length of the sliding window in bytes, which indicates the maximum
> > > > history buffer length used by the algorithm.
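A sketch of how an application might use a per-algorithm maximum window from the capability structure. `struct demo_algo_caps` and the two helpers are hypothetical; the 32 KB constant is the largest back-reference distance DEFLATE allows (RFC 1951).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-algorithm capability, as described in D.2.3. */
struct demo_algo_caps {
    uint32_t max_window_bytes; /* maximum sliding window (history) length */
};

/* DEFLATE (RFC 1951) distances reach at most 32768 bytes back. */
#define DEMO_DEFLATE_MAX_WINDOW 32768u

/* Compression: any window up to the PMD maximum is valid, so clamp. */
uint32_t demo_compress_window(const struct demo_algo_caps *caps, uint32_t wanted)
{
    return wanted < caps->max_window_bytes ? wanted : caps->max_window_bytes;
}

/* Decompression: the PMD must keep at least as much history as the
 * compressor used, or back-references may point outside its window. */
bool demo_can_decompress(const struct demo_algo_caps *caps, uint32_t stream_window)
{
    return stream_window <= caps->max_window_bytes;
}
```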
> > > >
> > > > 2. Example API illustration
> > > > ~~~~~~~~~~~~~~~~~~~~~~~
> > > >
> [Fiona] I think it would be useful to show an example of both a STATELESS
> flow and a STATEFUL flow.
> 

[Shally] Ok. I can add a simplified version to illustrate API usage in both cases.

> > > > The following is an illustration of API usage (this is just one flow;
> > > > other variants are also possible):
> > > > 1. rte_comp_session *sess = rte_compressdev_session_create(rte_mempool *pool);
> > > > 2. rte_compressdev_session_init(int dev_id, rte_comp_session *sess,
> > > >    rte_comp_xform *xform, rte_mempool *sess_pool);
> > > > 3. rte_comp_op_pool_create(rte_mempool ..)
> > > > 4. rte_comp_op_bulk_alloc(struct rte_mempool *mempool,
> > > >    struct rte_comp_op **ops, uint16_t nb_ops);
> > > > 5. for every rte_comp_op in ops[],
> > > >     5.1 rte_comp_op_attach_session(rte_comp_op *op, rte_comp_session *sess);
> > > >     5.2 op.op_type = RTE_COMP_OP_STATELESS
> > > >     5.3 op.flush = RTE_FLUSH_FINAL
> > > > 6. [Optional] for every rte_comp_op in ops[],
> > > >     6.1 rte_comp_stream_create(int dev_id, rte_comp_session *sess, void **stream);
> > > >     6.2 rte_comp_op_attach_stream(rte_comp_op *op, void *stream);
> > >
> > > [Ahmed] What is the semantic effect of attaching a stream to every op? Will
> > > this application benefit from it, given that it is set up with op_type
> > > STATELESS?
> >
> > [Shally] By role, a stream is a data structure that holds all the
> > information the PMD needs to maintain for op processing, and thus it is
> > marked device specific. It is required for stateful processing but optional
> > for stateless, as the PMD doesn't need to maintain context once an op is
> > processed, unlike stateful.
> > Using a stream for stateless ops may be advantageous for some PMDs. They can
> > be designed to do one-time per-op setup (such as mapping session params)
> > during stream_create() in the control path rather than in the data path.
> >
> [Fiona] Yes, we agreed that stream_create() should be called for every
> session, and if it returns non-NULL, the PMD needs it, so op_attach_stream()
> must be called.
> However, I've just realised we don't have a STATEFUL/STATELESS param on the
> xform, just on the op.
> So we could either add a stateful/stateless param to stream_create(),
> OR add a stateful/stateless param to the xform so it would be in the session?

[Shally] No, it shouldn't be part of the session or xform, as sessions aren't stateless/stateful.
So we shouldn't alter the current definition of sessions or xforms.
If we need to mention it, it could be added as part of stream_create(), as the stream is device specific and, depending upon op_type, the device can then set up stream resources.

> However, Shally, can you reconsider whether you really need it for STATELESS,
> or whether the data you want to store there could be stored in the session?
> Or, if it's needed per op, does it really need to be visible on the API as a
> stream, or could it be hidden within the PMD?

[Shally] I would say it is not mandatory, but a desirable feature that I am suggesting.
I am only trying to enable a data-path optimization that may help some PMD designs, as they can use stream_create() to do setup that is one-time per op regardless of op_type, such as, as I mentioned, mapping user session params to device session params.
We can hide it inside the PMD; however, there may be slight overhead in the data path depending on the PMD design.
But I would say it's not a blocker for us to freeze the current spec. We can revisit this feature later because it will not alter base API functionality.
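A sketch of the optional-stream idea for stateless ops as discussed above: the application always calls stream create and attaches whatever comes back, and a PMD that needs no stream leaves it NULL. The `demo_*` names, the op_type parameter on stream create (one of the options raised in this thread) and the level encoding are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

enum demo_op_type { DEMO_OP_STATELESS, DEMO_OP_STATEFUL };

struct demo_session { int level; };        /* user-level parameter */
struct demo_stream { int hw_level_code; }; /* device-specific form */
struct demo_op {
    struct demo_session *sess;
    struct demo_stream *stream;            /* NULL if the PMD needs none */
};

/* Hypothetical PMD hook. A PMD that pre-maps session params in the
 * control path returns a stream; one that needs nothing for stateless
 * ops leaves *stream NULL. Returns 0 on success, -1 on allocation failure. */
int demo_stream_create(const struct demo_session *sess, enum demo_op_type type,
                       int pmd_wants_stream, struct demo_stream **stream)
{
    *stream = NULL;
    if (type == DEMO_OP_STATELESS && !pmd_wants_stream)
        return 0;
    *stream = malloc(sizeof(**stream));
    if (*stream == NULL)
        return -1;
    (*stream)->hw_level_code = sess->level * 16; /* made-up device encoding */
    return 0;
}

/* Data path: use the pre-mapped value if a stream exists, else map per op. */
int demo_pmd_level_code(const struct demo_op *op)
{
    return op->stream ? op->stream->hw_level_code : op->sess->level * 16;
}
```

Both paths yield the same result; the stream only moves the mapping work out of the data path, which is the trade-off Shally describes.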

Thanks
Shally

> 
> > >
> > > > 7. for every rte_comp_op in ops[],
> > > >      7.1 set up src/dst buffers
> > > > 8. enq = rte_compressdev_enqueue_burst(dev_id, qp_id, ops, nb_ops);
> > > > 9. dqu = 0; while (dqu < enq) // Wait until all enqueued ops are dequeued
> > > >     9.1 dqu += rte_compressdev_dequeue_burst(dev_id, qp_id, &ops[dqu], enq - dqu);
> > >
> > > [Ahmed] I am assuming that waiting for all enqueued ops to be dequeued is
> > > not strictly necessary, but is just the chosen example in this case.
> > >
> >
> > [Shally] Yes. By design, for burst_size > 1 each op is independent of the
> > others, so the app may proceed as soon as it dequeues any.
> >
> > > > 10. Repeat steps 7-9 for the next batch of data
> > > > 11. [If step 6 was done] for every op in ops[]
> > > >       11.1 rte_comp_stream_free(op->stream);
> > > > 12. rte_comp_session_clear(sess);
> > > > 13. rte_comp_session_terminate(rte_comp_session *session)
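A self-contained C mock of the stateless flow above (steps 5 and 7-9), with a toy queue pair that "processes" ops by copying src to dst. None of the `demo_*` functions are the DPDK API; they only mirror the step structure, including accumulating `dqu` across dequeue calls.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QP_DEPTH 8

enum demo_op_type { DEMO_OP_STATELESS, DEMO_OP_STATEFUL };
enum demo_flush { DEMO_FLUSH_NONE, DEMO_FLUSH_FINAL };
enum demo_status { DEMO_STATUS_NOT_PROCESSED, DEMO_STATUS_SUCCESS };

struct demo_session { int initialized; };

struct demo_op {
    struct demo_session *sess;
    enum demo_op_type op_type;
    enum demo_flush flush;
    const uint8_t *src;
    size_t src_len;
    uint8_t *dst;
    size_t dst_len;
    size_t consumed, produced;
    enum demo_status status;
};

/* Toy queue pair: a fixed-depth FIFO that "processes" ops at dequeue. */
struct demo_qp {
    struct demo_op *ring[DEMO_QP_DEPTH];
    uint16_t head, count;
};

uint16_t demo_enqueue_burst(struct demo_qp *qp, struct demo_op **ops, uint16_t nb_ops)
{
    uint16_t n = 0;
    while (n < nb_ops && qp->count < DEMO_QP_DEPTH) {
        qp->ring[(qp->head + qp->count) % DEMO_QP_DEPTH] = ops[n++];
        qp->count++;
    }
    return n;
}

uint16_t demo_dequeue_burst(struct demo_qp *qp, struct demo_op **ops, uint16_t nb_ops)
{
    uint16_t n = 0;
    while (n < nb_ops && qp->count > 0) {
        struct demo_op *op = qp->ring[qp->head];
        size_t len = op->src_len < op->dst_len ? op->src_len : op->dst_len;
        for (size_t i = 0; i < len; i++) /* copy stands in for (de)compression */
            op->dst[i] = op->src[i];
        op->consumed = op->produced = len;
        op->status = DEMO_STATUS_SUCCESS;
        ops[n++] = op;
        qp->head = (qp->head + 1) % DEMO_QP_DEPTH;
        qp->count--;
    }
    return n;
}

/* Steps 5 and 7-9 for one batch of equal-sized chunks; returns ops dequeued. */
uint16_t demo_run_batch(struct demo_qp *qp, struct demo_session *sess,
                        struct demo_op **ops, uint16_t nb_ops,
                        const uint8_t *src, size_t chunk, uint8_t *dst)
{
    for (uint16_t i = 0; i < nb_ops; i++) {
        ops[i]->sess = sess;                   /* 5.1 attach session */
        ops[i]->op_type = DEMO_OP_STATELESS;   /* 5.2 */
        ops[i]->flush = DEMO_FLUSH_FINAL;      /* 5.3 */
        ops[i]->src = src + (size_t)i * chunk; /* 7.1 src/dst buffers */
        ops[i]->src_len = chunk;
        ops[i]->dst = dst + (size_t)i * chunk;
        ops[i]->dst_len = chunk;
    }
    uint16_t enq = demo_enqueue_burst(qp, ops, nb_ops); /* step 8 */
    uint16_t dqu = 0;
    while (dqu < enq)                                    /* step 9 */
        dqu += demo_dequeue_burst(qp, &ops[dqu], enq - dqu);
    return dqu;
}
```

A real application would typically do other work between dequeue attempts, or bound its retries, rather than spin.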
> > > >
> > > > Thanks
> > > > Shally
> > > >
> > > >


