Bug 935 - aesni_mb_pmd does not trigger parallel processing for multiple jobs
Summary: aesni_mb_pmd does not trigger parallel processing for multiple jobs
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: cryptodev
Version: 20.11
Hardware: x86 Linux
Importance: Normal major
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2022-02-07 01:56 CET by changchun zhang
Modified: 2022-02-17 18:28 CET
CC List: 2 users

Description changchun zhang 2022-02-07 01:56:12 CET
The issue exists in DPDK 20.11 and later.

The intel-ipsec-mb library supports gathering multiple jobs and processing them in parallel. However, in the current aesni_mb_pmd, aesni_mb_dequeue_burst() has a bug that prevents intel-ipsec-mb from ever running in parallel mode. Each time a crypto op is dequeued from the ring, aesni_mb_dequeue_burst() calls flush_mb_mgr() directly to finish processing that job.

In detail:
static uint16_t
aesni_mb_dequeue_burst(void *queue_pair, struct rte_crypto_op **ops,
		uint16_t nb_ops)
{
	struct ipsec_mb_qp *qp = queue_pair;
	IMB_MGR *mb_mgr = qp->mb_mgr;
	struct rte_crypto_op *op;
	IMB_JOB *job;
	int retval, processed_jobs = 0;

	if (unlikely(nb_ops == 0 || mb_mgr == NULL))
		return 0;

	uint8_t digest_idx = qp->digest_idx;

	do {
		/* Get next free mb job struct from mb manager */
		job = IMB_GET_NEXT_JOB(mb_mgr);
......
		retval = rte_ring_dequeue(qp->ingress_queue, (void **)&op);
......
		job = IMB_SUBMIT_JOB(mb_mgr);
......
		if (job)
			processed_jobs += handle_completed_jobs(qp, mb_mgr,
					job, &ops[processed_jobs],
					nb_ops - processed_jobs);

	} while (processed_jobs < nb_ops);

	/*
	 * If nothing completed above (e.g. only a single op was submitted),
	 * flush the manager so at least one op can be returned; this forces
	 * the outstanding job through in single-buffer mode.
	 */
	if (processed_jobs < 1)
		processed_jobs += flush_mb_mgr(qp, mb_mgr,
				&ops[processed_jobs],
				nb_ops - processed_jobs);

	return processed_jobs;
}

After the first job is submitted, the intel-ipsec-mb library does not process it yet, because it is waiting for enough jobs to be submitted. In this PMD, however, flush_mb_mgr() is triggered whenever the first submitted job has not completed. Consequently, parallel processing never actually happens: we end up processing the packets in single-buffer mode.
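For illustration, a minimal sketch of the underlying intel-ipsec-mb submit/flush contract (job setup, session details and error handling are omitted; how many jobs are needed to fill the lanes depends on the algorithm and SIMD width):

	#include <intel-ipsec-mb.h>

	static void
	submit_then_flush(IMB_MGR *mb_mgr, unsigned int nb_jobs)
	{
		IMB_JOB *job;
		unsigned int i;

		for (i = 0; i < nb_jobs; i++) {
			job = IMB_GET_NEXT_JOB(mb_mgr);
			/* ... fill in cipher/auth parameters on 'job' for this buffer ... */

			/*
			 * IMB_SUBMIT_JOB() returns NULL until the manager has
			 * gathered enough jobs to fill its SIMD lanes; only then
			 * do completed jobs start coming back. This is where the
			 * parallelism comes from.
			 */
			job = IMB_SUBMIT_JOB(mb_mgr);
		}

		/*
		 * IMB_FLUSH_JOB() forces outstanding jobs to complete immediately,
		 * lane by lane. Calling it after every single submit (as the PMD
		 * effectively does when processed_jobs < 1) degenerates to
		 * single-buffer processing.
		 */
		while ((job = IMB_FLUSH_JOB(mb_mgr)) != NULL)
			;
	}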

During my debugging I had to disable the code below

	if (processed_jobs < 1)
		processed_jobs += flush_mb_mgr(qp, mb_mgr,
				&ops[processed_jobs],
				nb_ops - processed_jobs);

so that intel-ipsec-mb could gather enough jobs to launch parallel processing.
Comment 1 Pablo de Lara 2022-02-17 14:09:14 CET
Hi,

I would not consider this a bug. This only happens if a single operation is enqueued and the algorithm is implemented in a multi-buffer "way". The PMD expects a burst of packets to be submitted, in which case crypto processing is done and some operations are returned, so these last lines of code are not executed.
The purpose of this check is to avoid a scenario where no operations are returned from crypto_dequeue_burst() while there are still outstanding operations in the intel-ipsec-mb buffer manager.
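For illustration, a minimal sketch of the expected usage on the application side (dev_id, qp_id and the burst size of 32 are placeholders, not values required by the PMD):

	#include <rte_cryptodev.h>

	#define BURST_SIZE 32

	static void
	process_burst(uint8_t dev_id, uint16_t qp_id, struct rte_crypto_op **ops)
	{
		struct rte_crypto_op *deq_ops[BURST_SIZE];
		uint16_t nb_enq, nb_deq = 0;

		/*
		 * Enqueuing a full burst lets the PMD fill the intel-ipsec-mb
		 * lanes, so completed operations come back through the normal
		 * submit path and the flush fallback is rarely reached.
		 */
		nb_enq = rte_cryptodev_enqueue_burst(dev_id, qp_id, ops, BURST_SIZE);

		while (nb_deq < nb_enq)
			nb_deq += rte_cryptodev_dequeue_burst(dev_id, qp_id,
					deq_ops + nb_deq, nb_enq - nb_deq);
	}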

Thanks,
Pablo
Comment 2 changchun zhang 2022-02-17 15:53:23 CET
Hi Pablo,

Thanks for looking into it.

So, to make sure operations are processed in parallel inside the intel-ipsec-mb library, the application has to enqueue enough operations each time. It seems the application layer is responsible for gathering multiple operations before enqueuing the burst, even though intel-ipsec-mb itself normally waits for multiple jobs to fill the SIMD lanes when no flush is issued.

Sometimes the security traffic is not that high, so there may be only one pending operation in the queue at a time. In that situation, parallel processing does not actually happen unless the application waits for multiple packets before enqueuing.

So, if this is not a bug, I am wondering if this is a limitation.

Thanks,
Changchun
Comment 3 Fan Zhang 2022-02-17 18:18:18 CET
Hi Changchun,

Thanks!
If the change is made the way you modified it, some packets would be left in the mb mgr queue at the end, which we cannot allow. The flush is a necessary "evil" to make sure all packets are eventually returned to the application.

I guess there is no solution that makes everybody happy, but this can indeed be tweaked on the application side. If latency is not a big issue, you can keep a counter of enqueued packets and dequeue once it reaches a certain threshold - this should achieve the parallelism you expect, with one necessary flush at the end.
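A rough sketch of that counter/threshold idea, purely on the application side (DEQ_THRESHOLD, the struct and the helper are illustrative assumptions, not existing PMD or library API; the final drain when traffic stops is not shown):

	#include <rte_cryptodev.h>

	#define DEQ_THRESHOLD 16	/* hypothetical batching threshold */

	struct batched_qp {
		uint8_t dev_id;
		uint16_t qp_id;
		uint16_t pending;	/* ops enqueued but not yet dequeued */
	};

	static uint16_t
	enqueue_one(struct batched_qp *bqp, struct rte_crypto_op *op,
			struct rte_crypto_op **deq_ops, uint16_t deq_sz)
	{
		uint16_t nb_deq = 0;

		bqp->pending += rte_cryptodev_enqueue_burst(bqp->dev_id,
				bqp->qp_id, &op, 1);

		/*
		 * Dequeue only once enough ops have accumulated, so the PMD
		 * can fill the multi-buffer lanes; one flush still happens
		 * inside the PMD when the last partial batch is drained.
		 */
		if (bqp->pending >= DEQ_THRESHOLD) {
			nb_deq = rte_cryptodev_dequeue_burst(bqp->dev_id,
					bqp->qp_id, deq_ops, deq_sz);
			bqp->pending -= nb_deq;
		}
		return nb_deq;
	}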

But as a library and a PMD we cannot make such assumptions - for example, we don't know when the next enqueue/dequeue call will come. Of course we could enrich the API, e.g. to force a manual flush or add other functionality - feel free to send a patch to the mailing list.

Regards,
Fan
Comment 4 changchun zhang 2022-02-17 18:28:47 CET
Hi Fan,

Yes, I understand my change is just a workaround, with the side effect you mentioned.

And I agree with you that the application can do something. If we prefer to do nothing in the PMD, would it be possible to add more comments explaining this situation or limitation? Anyway, thanks for confirming this observation.

Best,
Changchun
