[dpdk-dev,RFC] lib/librte_ether: consistent PMD batching behavior

Message ID 1484905876-60165-1-git-send-email-zhiyong.yang@intel.com (mailing list archive)
State RFC, archived
Delegated to: Thomas Monjalon

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel compilation success Compilation OK

Commit Message

Yang, Zhiyong Jan. 20, 2017, 9:51 a.m. UTC
The rte_eth_tx_burst() function in the file rte_ethdev.h is invoked by
DPDK applications to transmit output packets on an output queue, as
follows.

static inline uint16_t
rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
                 struct rte_mbuf **tx_pkts, uint16_t nb_pkts);

Note: the fourth parameter, nb_pkts, is the number of packets to transmit.
The rte_eth_tx_burst() function returns the number of packets it actually
sent. A return value equal to *nb_pkts* means that all packets have been
sent, and this is likely to signify that other output packets could be
transmitted again immediately. Applications that implement a "send as many
packets as possible" policy can check for this specific case and keep
invoking the rte_eth_tx_burst() function until a value less than
*nb_pkts* is returned.
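
For illustration, a minimal sketch of such an application-level policy
(send_all() is a hypothetical helper, not part of the ethdev API; what to
do with unsent packets is left to the application):

#include <rte_ethdev.h>

static inline uint16_t
send_all(uint8_t port_id, uint16_t queue_id,
	 struct rte_mbuf **pkts, uint16_t nb_pkts)
{
	uint16_t sent = 0;

	/* Keep invoking TX burst until it returns fewer packets than
	 * requested, i.e. the ring is full - or a PMD per-call limit was
	 * hit, which is exactly the ambiguity discussed below.
	 */
	while (sent < nb_pkts) {
		uint16_t req = nb_pkts - sent;
		uint16_t n = rte_eth_tx_burst(port_id, queue_id,
					      &pkts[sent], req);

		sent += n;
		if (n < req)
			break;
	}
	return sent;
}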

When you call TX only once per burst via rte_eth_tx_burst(), you may get
different behavior from different PMDs. One problem every DPDK user has to
face is that they must take this policy into consideration at the
application level when using any specific PMD to send packets, whether or
not it is necessary. This brings usage complexity and easily confuses DPDK
users, since they have to learn the TX limits of each specific PMD and
handle the different return values (the number of packets transmitted
successfully) accordingly. Some PMD TX functions have a limit of sending
at most 32 packets per invocation, some PMDs have a limit of at most 64
packets per call, while others are implemented to send as many packets as
possible, etc. This easily leads to incorrect usage by DPDK users.

This patch proposes to implement the above policy in the DPDK library in
order to simplify application implementations and avoid incorrect
invocations as well. DPDK users then no longer need to consider the
per-PMD policy or write duplicated code at the application level when
sending packets. In addition, users do not need to know the differences
between specific PMD TX implementations and can transmit an arbitrary
number of packets in one rte_eth_tx_burst() call, then check the return
value to get the number of packets actually sent.

How should the policy be implemented in the DPDK library? Two solutions
are proposed below.

Solution 1:
Implement wrapper functions inside each specific PMD that remove these
limits, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple already do.
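
For reference, the wrapper pattern in those drivers looks roughly like
this (paraphrased from the i40e simple TX path; tx_xmit_pkts() stands for
the driver-internal burst routine and I40E_TX_MAX_BURST for its per-call
limit):

static uint16_t
i40e_xmit_pkts_simple(void *tx_queue, struct rte_mbuf **tx_pkts,
		      uint16_t nb_pkts)
{
	uint16_t nb_tx = 0;

	if (likely(nb_pkts <= I40E_TX_MAX_BURST))
		return tx_xmit_pkts(tx_queue, tx_pkts, nb_pkts);

	/* Chop an oversized request into bursts the core routine accepts,
	 * stopping early if the ring fills up.
	 */
	while (nb_pkts) {
		uint16_t ret, num = (uint16_t)RTE_MIN(nb_pkts,
						      I40E_TX_MAX_BURST);

		ret = tx_xmit_pkts(tx_queue, &tx_pkts[nb_tx], num);
		nb_tx = (uint16_t)(nb_tx + ret);
		nb_pkts = (uint16_t)(nb_pkts - ret);
		if (ret < num)
			break;
	}
	return nb_tx;
}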

Solution 2:
Implement the policy in the rte_eth_tx_burst() function at the ethdev
layer in a consistent batching way: make a best effort to send *nb_pkts*
packets in bursts of no more than 32 by default, since many DPDK TX PMDs
use this max TX burst size (32). In addition, a data member that defines
the max TX burst size, such as "uint16_t max_tx_burst_pkts;", will be
added to rte_eth_dev_data, which drivers can override if they work with
bursts of 64 or another size (thanks to Bruce <bruce.richardson@intel.com>
for the suggestion). This keeps the performance impact to a minimum.
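
A driver that works with a different burst size would then override the
default during initialization; a hypothetical sketch (foo_dev_init() and
the value 64 are illustrative only):

static int
foo_dev_init(struct rte_eth_dev *eth_dev)
{
	/* ... usual queue and function-pointer setup ... */

	/* This PMD's burst routines handle at most 64 packets per call,
	 * so let the ethdev layer split larger requests accordingly.
	 */
	eth_dev->data->max_tx_burst_pkts = 64;
	eth_dev->data->max_rx_burst_pkts = 64;

	return 0;
}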

I prefer the latter of the two solutions because it makes the DPDK code
more consistent and simpler and avoids writing much duplicated logic in
the DPDK source code. In addition, I expect solution 2 to bring little or
no performance drop. However, it introduces an ABI change.

In fact, the current rte_eth_rx_burst() function uses a similar mechanism
and faces the same problem as rte_eth_tx_burst().

static inline uint16_t
rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
                 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts);

Applications are responsible for implementing the policy "retrieve as many
received packets as possible", checking for this specific case and
repeatedly invoking the rte_eth_rx_burst() function until a value less
than *nb_pkts* is returned.
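
A minimal sketch of that RX-side policy (process_pkts() is a hypothetical
application handler; port_id and queue_id are assumed to be set up):

struct rte_mbuf *pkts[32];
uint16_t n;

/* Drain the queue: stop once a burst comes back short. */
do {
	n = rte_eth_rx_burst(port_id, queue_id, pkts, 32);
	process_pkts(pkts, n);
} while (n == 32);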

The patch proposes to apply the above method to rte_eth_rx_burst() as well.

In summary, the purpose of this RFC is to make the job easier and simpler
for driver writers and to avoid writing much duplicated code at the
application level.

Signed-off-by: Zhiyong Yang <zhiyong.yang@intel.com>
---
 lib/librte_ether/rte_ethdev.h | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)
  

Comments

Andrew Rybchenko Jan. 20, 2017, 10:26 a.m. UTC | #1
On 01/20/2017 12:51 PM, Zhiyong Yang wrote:
> [...]
>
> Solution 1:
> Implement wrapper functions inside each specific PMD that remove these
> limits, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple already do.

IMHO, this solution is a bit better since it:
  1. Does not affect other PMDs at all
  2. Could be a bit faster for the PMDs which require it, since it has no
     indirect function call on each iteration
  3. No ABI change

> Solution 2:
> Implement the policy in the rte_eth_tx_burst() function at the ethdev
> layer in a consistent batching way: make a best effort to send *nb_pkts*
> packets in bursts of no more than 32 by default, since many DPDK TX PMDs
> use this max TX burst size (32). In addition, a data member that defines
> the max TX burst size, such as "uint16_t max_tx_burst_pkts;", will be
> added to rte_eth_dev_data, which drivers can override if they work with
> bursts of 64 or another size (thanks to Bruce <bruce.richardson@intel.com>
> for the suggestion). This keeps the performance impact to a minimum.

I see no noticeable difference in performance, so I don't mind if this is
finally chosen. Just be sure that you update all PMDs to set reasonable
default values, or maybe even better, set UINT16_MAX in a generic place -
0 is a bad default here. (I lost a few seconds wondering why nothing was
sent and it could not be stopped.)
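
For instance (a hypothetical sketch - the exact generic init site is an
assumption, not something this RFC specifies):

/* In the generic ethdev allocation path, before any PMD override: */
eth_dev->data->max_rx_burst_pkts = UINT16_MAX;
eth_dev->data->max_tx_burst_pkts = UINT16_MAX;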

> [...]
  
Ananyev, Konstantin Jan. 20, 2017, 11:24 a.m. UTC | #2
> From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
> Sent: Friday, January 20, 2017 10:26 AM
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior
>
> [...]
>
> > Solution 1:
> > Implement wrapper functions inside each specific PMD that remove these
> > limits, as i40e_xmit_pkts_simple and ixgbe_xmit_pkts_simple already do.
>
> IMHO, this solution is a bit better since it:
>  1. Does not affect other PMDs at all
>  2. Could be a bit faster for the PMDs which require it, since it has no
>     indirect function call on each iteration
>  3. No ABI change

I also would prefer solution number 1 for the reasons outlined by Andrew
above. Also, IMO the current limits on the number of packets to TX in some
Intel PMD TX routines are sort of artificial:
- they are not caused by any real HW limitations
- avoiding them at the PMD level shouldn't cause any performance or
  functional degradation.
So I don't see any good reason why, instead of fixing these limitations in
our own PMDs, we are trying to push them to the upper (rte_ethdev) layer.

Konstantin

  
Bruce Richardson Jan. 20, 2017, 11:48 a.m. UTC | #3
On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > [...]
>
> I also would prefer solution number 1 for the reasons outlined by Andrew
> above. Also, IMO the current limits on the number of packets to TX in some
> Intel PMD TX routines are sort of artificial:
> - they are not caused by any real HW limitations
> - avoiding them at the PMD level shouldn't cause any performance or
>   functional degradation.
> So I don't see any good reason why, instead of fixing these limitations in
> our own PMDs, we are trying to push them to the upper (rte_ethdev) layer.
>
> Konstantin
> 
The main advantage I see is that it should make things a bit easier for
driver writers, since they have a tighter set of constraints to work with
for packet RX and TX. The routines only have to handle requests for
packets in the range 0-N, rather than having no upper bound on the
request. It also avoids code duplication across drivers that would
otherwise each carry the same outer-loop code for handling arbitrarily
large requests.

No big deal to me either way though.

/Bruce
  
Yang, Zhiyong Jan. 21, 2017, 4:07 a.m. UTC | #4
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Friday, January 20, 2017 7:25 PM
> Subject: RE: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
>
> [...]
>
> I also would prefer solution number 1 for the reasons outlined by Andrew
> above. Also, IMO the current limits on the number of packets to TX in some
> Intel PMD TX routines are sort of artificial:
> - they are not caused by any real HW limitations
> - avoiding them at the PMD level shouldn't cause any performance or
>   functional degradation.
> So I don't see any good reason why, instead of fixing these limitations in
> our own PMDs, we are trying to push them to the upper (rte_ethdev) layer.
>
> Konstantin

Solution 1 indeed has advantages, as Andrew and Konstantin said.

Zhiyong
  
Yang, Zhiyong Jan. 21, 2017, 4:13 a.m. UTC | #5
From: Andrew Rybchenko [mailto:arybchenko@solarflare.com]
Sent: Friday, January 20, 2017 6:26 PM
Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching behavior

> [...]
>
> I see no noticeable difference in performance, so I don't mind if this is
> finally chosen. Just be sure that you update all PMDs to set reasonable
> default values, or maybe even better, set UINT16_MAX in a generic place -
> 0 is a bad default here. (I lost a few seconds wondering why nothing was
> sent and it could not be stopped.)

Agree with you, 0 is not a good default value. I recommend 32 by default
here; of course, driver writers can configure it as they expect before
they start sending packets.

Zhiyong
  
Adrien Mazarguil Jan. 23, 2017, 4:36 p.m. UTC | #6
On Fri, Jan 20, 2017 at 11:48:22AM +0000, Bruce Richardson wrote:
> On Fri, Jan 20, 2017 at 11:24:40AM +0000, Ananyev, Konstantin wrote:
> > [...]
> > I also would prefer solution number 1 for the reasons outlined by
> > Andrew above. Also, IMO the current limits on the number of packets to
> > TX in some Intel PMD TX routines are sort of artificial:
> > - they are not caused by any real HW limitations
> > - avoiding them at the PMD level shouldn't cause any performance or
> >   functional degradation.
> > So I don't see any good reason why, instead of fixing these
> > limitations in our own PMDs, we are trying to push them to the upper
> > (rte_ethdev) layer.

For what it's worth, I agree with Konstantin. Wrappers should be as thin
as possible on top of PMD functions; they are not helpers. We could define
a set of higher-level functions for this purpose though.

In the meantime, exposing and documenting PMD limitations seems safe
enough.

We could assert that RX/TX burst requests larger than the size of the
target queue are unlikely to be fully met (i.e. PMDs usually do not check
for completions in the middle of a TX burst).

> > Konstantin
> >
> The main advantage I see is that it should make things a bit easier for
> driver writers, since they have a tighter set of constraints to work with
> for packet RX and TX. The routines only have to handle requests for
> packets in the range 0-N, rather than having no upper bound on the
> request. It also avoids code duplication across drivers that would
> otherwise each carry the same outer-loop code for handling arbitrarily
> large requests.
>
> No big deal to me either way though.
>
> /Bruce

Right, but there is a cost in doing so, as unlikely() as the additional
code is. We should leave that choice to applications.
  
Yang, Zhiyong Feb. 7, 2017, 7:50 a.m. UTC | #7
Hi Adrien,

	Sorry for the late reply due to Chinese New Year.

> -----Original Message-----
> From: Adrien Mazarguil [mailto:adrien.mazarguil@6wind.com]
> Sent: Tuesday, January 24, 2017 12:36 AM
> Subject: Re: [dpdk-dev] [RFC] lib/librte_ether: consistent PMD batching
> behavior
>
> [...]
>
> For what it's worth, I agree with Konstantin. Wrappers should be as thin
> as possible on top of PMD functions; they are not helpers. We could
> define a set of higher-level functions for this purpose though.
>
> In the meantime, exposing and documenting PMD limitations seems safe
> enough.
>
> We could assert that RX/TX burst requests larger than the size of the
> target queue are unlikely to be fully met (i.e. PMDs usually do not
> check for completions in the middle of a TX burst).

For a tool, it is very important that its users can consume it easily and
make it work in the right way. These sorts of artificial limits make
things look a little confusing and will probably get some users into
trouble when writing drivers. Why not correct them and make things
easier? :)

Zhiyong
  

Patch

diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 1c356c1..6fa83cf 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1712,6 +1712,9 @@  struct rte_eth_dev_data {
 	uint32_t min_rx_buf_size;
 	/**< Common rx buffer size handled by all queues */
 
+	uint16_t max_rx_burst_pkts;
+	uint16_t max_tx_burst_pkts;
+
 	uint64_t rx_mbuf_alloc_failed; /**< RX ring mbuf allocation failures. */
 	struct ether_addr* mac_addrs;/**< Device Ethernet Link address. */
 	uint64_t mac_pool_sel[ETH_NUM_RECEIVE_MAC_ADDR];
@@ -2695,11 +2698,15 @@  int rte_eth_dev_set_vlan_pvid(uint8_t port_id, uint16_t pvid, int on);
  *   of pointers to *rte_mbuf* structures effectively supplied to the
  *   *rx_pkts* array.
  */
+
 static inline uint16_t
 rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
 		 struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	int16_t nb_rx = 0;
+	uint16_t pkts = 0;
+	uint16_t rx_nb_pkts = nb_pkts;
 
 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
@@ -2710,8 +2717,20 @@  rte_eth_rx_burst(uint8_t port_id, uint16_t queue_id,
 		return 0;
 	}
 #endif
-	int16_t nb_rx = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
+	if (likely(nb_pkts <= dev->data->max_rx_burst_pkts))
+		return (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
 			rx_pkts, nb_pkts);
+	while (rx_nb_pkts) {
+		uint16_t num_burst = RTE_MIN(rx_nb_pkts,
+					     dev->data->max_rx_burst_pkts);
+
+		pkts = (*dev->rx_pkt_burst)(dev->data->rx_queues[queue_id],
+						&rx_pkts[nb_rx], num_burst);
+		nb_rx += pkts;
+		rx_nb_pkts -= pkts;
+		if (pkts < num_burst)
+			break;
+	}
 
 #ifdef RTE_ETHDEV_RXTX_CALLBACKS
 	struct rte_eth_rxtx_callback *cb = dev->post_rx_burst_cbs[queue_id];
@@ -2833,11 +2852,13 @@  rte_eth_rx_descriptor_done(uint8_t port_id, uint16_t queue_id, uint16_t offset)
  *   the transmit ring. The return value can be less than the value of the
  *   *tx_pkts* parameter when the transmit ring is full or has been filled up.
  */
+
 static inline uint16_t
 rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 		 struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
+	uint16_t nb_tx = 0;
 
 #ifdef RTE_LIBRTE_ETHDEV_DEBUG
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, 0);
@@ -2860,8 +2881,24 @@  rte_eth_tx_burst(uint8_t port_id, uint16_t queue_id,
 		} while (cb != NULL);
 	}
 #endif
+	if (likely(nb_pkts <= dev->data->max_tx_burst_pkts))
+		return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
+						tx_pkts, nb_pkts);
+
+	while (nb_pkts) {
+		uint16_t num_burst = RTE_MIN(nb_pkts,
+					     dev->data->max_tx_burst_pkts);
+		uint16_t pkts;
+
+		pkts = (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id],
+						&tx_pkts[nb_tx], num_burst);
+		nb_tx += pkts;
+		nb_pkts -= pkts;
+		if (pkts < num_burst)
+			break;
+	}
 
-	return (*dev->tx_pkt_burst)(dev->data->tx_queues[queue_id], tx_pkts, nb_pkts);
+	return nb_tx;
 }
 
 /**