[dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx cycles/packet

De Lara Guarch, Pablo pablo.de.lara.guarch at intel.com
Mon Oct 13 14:56:29 CEST 2014



> -----Original Message-----
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Liang, Cunming
> Sent: Sunday, October 12, 2014 12:11 PM
> To: Neil Horman
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> cycles/packet
> 
> Hi Neil,
> 
> Very appreciate your comments.
> I add inline reply, will send v3 asap when we get alignment.
> 
> BRs,
> Liang Cunming
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman at tuxdriver.com]
> > Sent: Saturday, October 11, 2014 1:52 AM
> > To: Liang, Cunming
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v2 1/4] app/test: unit test for rx and tx
> cycles/packet
> >
> > On Fri, Oct 10, 2014 at 08:29:58PM +0800, Cunming Liang wrote:
> > > It provides unit test to measure cycles/packet in NIC loopback mode.
> > > It simply gives the average cycles of IO used per packet without test
> equipment.
> > > When doing the test, make sure the link is UP.
> > >
> > > Usage Example:
> > > 1. Run unit test app in interactive mode
> > >     app/test -c f -n 4 -- -i
> > > 2. Run and wait for the result
> > >     pmd_perf_autotest
> > >
> > > There's option to choose rx/tx pair, default is vector.
> > >     set_rxtx_mode [vector|scalar|full|hybrid]
> > > Note: To get acurate scalar fast, please choose 'vector' or 'hybrid' without
> > INC_VEC=y in config
> > >
> > > Signed-off-by: Cunming Liang <cunming.liang at intel.com>
> > > Acked-by: Bruce Richardson <bruce.richardson at intel.com>
> >
> > Notes inline
> >
> > > ---
> > >  app/test/Makefile                   |    1 +
> > >  app/test/commands.c                 |   38 +++
> > >  app/test/packet_burst_generator.c   |    4 +-
> > >  app/test/test.h                     |    4 +
> > >  app/test/test_pmd_perf.c            |  626
> > +++++++++++++++++++++++++++++++++++
> > >  lib/librte_pmd_ixgbe/ixgbe_ethdev.c |    6 +
> > >  6 files changed, 677 insertions(+), 2 deletions(-)
> > >  create mode 100644 app/test/test_pmd_perf.c
> > >
> > > diff --git a/app/test/Makefile b/app/test/Makefile
> > > index 6af6d76..ebfa0ba 100644
> > > --- a/app/test/Makefile
> > > +++ b/app/test/Makefile
> > > @@ -56,6 +56,7 @@ SRCS-y += test_memzone.c
> > >
> > >  SRCS-y += test_ring.c
> > >  SRCS-y += test_ring_perf.c
> > > +SRCS-y += test_pmd_perf.c
> > >
> > >  ifeq ($(CONFIG_RTE_LIBRTE_TABLE),y)
> > >  SRCS-y += test_table.c
> > > diff --git a/app/test/commands.c b/app/test/commands.c
> > > index a9e36b1..f1e746e 100644
> > > --- a/app/test/commands.c
> > > +++ b/app/test/commands.c
> > > @@ -310,12 +310,50 @@ cmdline_parse_inst_t cmd_quit = {
> > >
> > > +#define NB_ETHPORTS_USED                (1)
> > > +#define NB_SOCKETS                      (2)
> > > +#define MEMPOOL_CACHE_SIZE 250
> > > +#define MBUF_SIZE (2048 + sizeof(struct rte_mbuf) +
> > RTE_PKTMBUF_HEADROOM)
> > Don't you want to size this in accordance with the amount of data your
> sending
> > (64 Bytes as noted above)?
> [Liang, Cunming] The case is designed to measure small packet IO cost with
> normal mbuf size.
> Even if decreasing the size, it won't gain significant cycles.
> >
> > > +static void
> > > +print_ethaddr(const char *name, const struct ether_addr *eth_addr)
> > > +{
> > > +	printf("%s%02X:%02X:%02X:%02X:%02X:%02X", name,
> > > +		eth_addr->addr_bytes[0],
> > > +		eth_addr->addr_bytes[1],
> > > +		eth_addr->addr_bytes[2],
> > > +		eth_addr->addr_bytes[3],
> > > +		eth_addr->addr_bytes[4],
> > > +		eth_addr->addr_bytes[5]);
> > > +}
> > > +
> > This was copieed from print_ethaddr.  Seems like a good candidate for a
> common
> > function in rte_ether.h
> [Liang, Cunming] Agree with you, some of samples now use it with the same
> copy.
> I'll rework it. Adding 'ether_format_addr' in rte_ether.h only for format the
> 48bits address output.
> And leaving other prints for application customization.
> >
> >
> > > +}
> > > +
> > > +static void
> > > +signal_handler(int signum)
> > > +{
> > > +	/* When we receive a USR1 signal, print stats */
> > I think you mean SIGUSR2, below, SIGUSR1 tears the test down and exits
> the
> > program
> [Liang, Cunming] Thanks, it's a typo.
> >
> > > +	if (signum == SIGUSR1) {
> > SIGINT instead.  Thats the common practice.
> [Liang, Cunming] I understood your opinion.
> The considerations I'm not using SIGINT instead are:
> 1. We unset ISIG in c_lflag of term. CRTL+C won't trigger SIGINT in command
> interactive.
>   It always has to explicitly send signal. No matter SIGUSR1 or SIGINT.
> 2. By SIGINT semantic, expect to terminate the process.
>   Here I expect to force stop this case, but still alive in command line.
>   After it stopped, it can run again or start to run other test cases.
>   So I keep SIGINT, SIGUSR1 in different behavior.
> 3. It should be rarely used.
>   Only when exception timeout, I leave this backdoor for automation test
> control.
>   For manual test, we can easily force kill the process.
> 
> >
> > > +		printf("Force Stop!\n");
> > > +		stop = 1;
> > > +	}
> > > +	if (signum == SIGUSR2)
> > > +		stats_display(0);
> > > +}
> > > +/* main processing loop */
> > > +static int
> > > +main_loop(__rte_unused void *args)
> > > +{
> > > +#define PACKET_SIZE 64
> > > +#define FRAME_GAP 12
> > > +#define MAC_PREAMBLE 8
> > > +	struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
> > > +	unsigned lcore_id;
> > > +	unsigned i, portid, nb_rx = 0, nb_tx = 0;
> > > +	struct lcore_conf *conf;
> > > +	uint64_t prev_tsc, cur_tsc;
> > > +	int pkt_per_port;
> > > +	uint64_t packets_per_second, total_packets;
> > > +
> > > +	lcore_id = rte_lcore_id();
> > > +	conf = &lcore_conf[lcore_id];
> > > +	if (conf->status != LCORE_USED)
> > > +		return 0;
> > > +
> > > +	pkt_per_port =  MAX_TRAFIC_BURST / conf->nb_ports;
> > > +
> > > +	int idx = 0;
> > > +	for (i = 0; i < conf->nb_ports; i++) {
> > > +		int num = pkt_per_port;
> > > +		portid = conf->portlist[i];
> > > +		printf("inject %d packet to port %d\n", num, portid);
> > > +		while (num) {
> > > +			nb_tx = RTE_MIN(MAX_PKT_BURST, num);
> > > +			nb_tx = rte_eth_tx_burst(portid, 0,
> > > +						&tx_burst[idx], nb_tx);
> > > +			num -= nb_tx;
> > > +			idx += nb_tx;
> > > +		}
> > > +	}
> > > +	printf("Total packets inject to prime ports = %u\n", idx);
> > > +
> > > +	packets_per_second = (link_mbps * 1000 * 1000) /
> > > +		+((PACKET_SIZE + FRAME_GAP + MAC_PREAMBLE) *
> CHAR_BIT);
> > > +	printf("Each port will do %"PRIu64" packets per second\n",
> > > +		+packets_per_second);
> > > +
> > > +	total_packets = RTE_TEST_DURATION * conf->nb_ports *
> > packets_per_second;
> > > +	printf("Test will stop after at least %"PRIu64" packets received\n",
> > > +		+ total_packets);
> > > +
> > > +	prev_tsc = rte_rdtsc();
> > > +
> > > +	while (likely(!stop)) {
> > > +		for (i = 0; i < conf->nb_ports; i++) {
> > > +			portid = conf->portlist[i];
> > > +			nb_rx = rte_eth_rx_burst((uint8_t) portid, 0,
> > > +						 pkts_burst,
> MAX_PKT_BURST);
> > > +			if (unlikely(nb_rx == 0)) {
> > > +				idle++;
> > > +				continue;
> > > +			}
> > > +
> > > +			count += nb_rx;
> > Doesn't take into consideration error conditions.  rte_eth_rx_burst can
> return
> > -ENOTSUP
> [Liang, Cunming] It returns -ENOTSUP when turning on ETHDEV_DEBUG
> CONFIG.
> The error is used to avoid no function call register.
> When ETHDEV_DEBUG turn off, the NULL function call cause segfault
> directly.
> So I think it's a library internal error.
> In such library exceptional case, I prefer not expecting sample/application to
> condition check library functional error.
> >
> > > +			nb_tx = rte_eth_tx_burst(portid, 0, pkts_burst,
> nb_rx);
> > Ditto with -ENOTSUP
> >
> > > +			if (unlikely(nb_tx < nb_rx)) {
> > What makes this unlikely?  Seems like a perfectly reasonable condition to
> happen
> > to me.  If the network is busy, its completely likely that you will receive
> more
> > frames than you send, if you elect to receive all frames.
> [Liang, Cunming] For this case, NIC works in MAC loopback mode.
> It firstly injects numbers of packets to NIC. Then NIC will loopback all packets
> to ingress side.
> The code here will receive the packets, and send them back to NIC
> Packets looping inside all come from initial injection.
> As the total number of injected packets is much less than in-chip queue size,
> the tx egress queue shouldn't block desc. ring update.
> So when receiving packets in line rate and nb_tx < nb_rx, it means tx cannot
> archive line rate.
> When it happens, the cycles/packets result make no sense, as the bottle
> neck is NIC.
> The drop counter can record it.
> >
> > > +				drop += (nb_rx - nb_tx);
> > > +				do {
> > > +
> 	rte_pktmbuf_free(pkts_burst[nb_tx]);
> > Defer this, it skews your timing
> [Liang, Cunming] Agree with you, I ever thought about it.
> This test cases is designed to measure pure IO RX/TX routine.
> When doing tx burst with result nb_tx < nb_rx, we either repeat re-send or
> drop it.
> Each way introduces noise(adding additional control code), resending much
> times even cost more than free it.
> The cycles/packets is useful when there's no packet drop, otherwise it gives
> the hint where the problem comes from (by idle or drop).
> >
> > > +				} while (++nb_tx < nb_rx);
> > > +			}
> > > +		}
> > > +		if (unlikely(count >= total_packets))
> > > +			break;
> > Whats the reasoning here?  Do you only ever expect to receive frames that
> you
> > send?  If so, seems like this should call for a big warning to get printed.
> [Liang, Cunming] The loop exits when the pre-calculated total_packets are
> received.
> As the nb_rx is unpredictable, the packet counter may large equal than
> total_packets the last time.
> The reason unlikely used here is because the condition becomes true only
> the last time.
> >
> > > +	}
> > > +
> > > +	cur_tsc = rte_rdtsc();
> > > +
> > > +	for (i = 0; i < conf->nb_ports; i++) {
> > > +		portid = conf->portlist[i];
> > > +		int nb_free = pkt_per_port;
> > > +		do { /* dry out */
> > > +			nb_rx = rte_eth_rx_burst((uint8_t) portid, 0,
> > > +						 pkts_burst,
> MAX_PKT_BURST);
> > > +			nb_tx = 0;
> > > +			while (nb_tx < nb_rx)
> > > +				rte_pktmbuf_free(pkts_burst[nb_tx++]);
> > > +			nb_free -= nb_rx;
> > > +		} while (nb_free != 0);
> > > +		printf("free %d mbuf left in port %u\n", pkt_per_port,
> portid);
> > > +	}
> > > +
> > Whats the purpose of this?  Are you trying to flush the device?  Wouldn't it
> be
> > enough just to stop the interface?
> [Liang, Cunming] If we only run the cases once and exit, it doesn't matter.
> But it's designed to run multi-times without exit, for the purpose of warming
> up or for getting average number.
> So stopping device is not enough, we have to release the flying packets.
> 
> >
> > > +	if (count == 0)
> > > +		return -1;
> > > +
> > > +	printf("%lu packet, %lu drop, %lu idle\n", count, drop, idle);
> > > +	printf("Result: %ld cycles per packet\n", (cur_tsc - prev_tsc) / count);
> > > +
> > Bad math here.  Theres no guarantee that the tsc hasn't wrapped
> (potentially
> > more than once) depending on your test length.  you need to check the tsc
> before
> > and after each burst and record an average of deltas instead, accounting in
> each
> > instance for the possibility of wrap.
> [Liang, Cunming] I'm not sure catch your point correctly.
> I think both cur_tsc and prev_tsc are 64 bits width.
> For 3GHz, I think it won't wrapped so quick.
> As it's uint64_t, so even get wrapped, the delta should still be correct.

You need to change those %lu to %PRIu64, or you will get a compilation error when 
using 32-bit targets, since those variables are uint64_t. There are other parts of the 
code with same problem, as well.


> >
> > > +	return 0;
> > > +}
> > > +
> > > +static int
> > > +test_pmd_perf(void)
> > > +{
> > > +	uint16_t nb_ports, num, nb_lcores, slave_id = (uint16_t)-1;
> > > +	uint16_t nb_rxd = RTE_TEST_RX_DESC_DEFAULT;
> > > +	uint16_t nb_txd = RTE_TEST_TX_DESC_DEFAULT;
> > > +	uint16_t portid;
> > > +	uint16_t nb_rx_queue = 1, nb_tx_queue = 1;
> > > +	int socketid = -1;
> > > +	int ret;
> > > +
> > > +	printf("Start PMD RXTX cycles cost test.\n");
> > > +
> > > +	signal(SIGUSR1, signal_handler);
> > Again SIGINT here.
> >
> > > +	signal(SIGUSR2, signal_handler);
> > > +
> > > +	nb_ports = rte_eth_dev_count();
> > > +	if (nb_ports < NB_ETHPORTS_USED) {
> > > +		printf("At least %u port(s) used for perf. test\n",
> > > +		       NB_ETHPORTS_USED);
> > > +		return -1;
> > > +	}
> > > +
> > > +	if (nb_ports > RTE_MAX_ETHPORTS)
> > > +		nb_ports = RTE_MAX_ETHPORTS;
> > > +
> > > +	nb_lcores = rte_lcore_count();
> > > +
> > > +	memset(lcore_conf, 0, sizeof(lcore_conf));
> > > +	init_lcores();
> > > +
> > > +	init_mbufpool(NB_MBUF);
> > > +
> > > +	reset_count();
> > > +	num = 0;
> > > +	for (portid = 0; portid < nb_ports; portid++) {
> > > +		if (socketid == -1) {
> > > +			socketid = rte_eth_dev_socket_id(portid);
> > > +			slave_id = alloc_lcore(socketid);
> > > +			if (slave_id == (uint16_t)-1) {
> > > +				printf("No avail lcore to run test\n");
> > > +				return -1;
> > > +			}
> > > +			printf("Performance test runs on lcore %u socket
> %u\n",
> > > +			       slave_id, socketid);
> > > +		}
> > > +
> > > +		if (socketid != rte_eth_dev_socket_id(portid)) {
> > > +			printf("Skip port %d\n", portid);
> > > +			continue;
> > > +		}
> > > +
> > > +		/* port configure */
> > > +		ret = rte_eth_dev_configure(portid, nb_rx_queue,
> > > +					    nb_tx_queue, &port_conf);
> > > +		if (ret < 0)
> > > +			rte_exit(EXIT_FAILURE,
> > > +				"Cannot configure device: err=%d,
> port=%d\n",
> > > +				 ret, portid);
> > > +
> > > +		rte_eth_macaddr_get(portid, &ports_eth_addr[portid]);
> > > +		printf("Port %u ", portid);
> > > +		print_ethaddr("Address:", &ports_eth_addr[portid]);
> > > +		printf("\n");
> > > +
> > > +		/* tx queue setup */
> > > +		ret = rte_eth_tx_queue_setup(portid, 0, nb_txd,
> > > +					     socketid, &tx_conf);
> > > +		if (ret < 0)
> > > +			rte_exit(EXIT_FAILURE,
> > > +				"rte_eth_tx_queue_setup: err=%d, "
> > > +				"port=%d\n", ret, portid);
> > > +
> > > +		/* rx queue steup */
> > > +		ret = rte_eth_rx_queue_setup(portid, 0, nb_rxd,
> > > +						socketid, &rx_conf,
> > > +						mbufpool[socketid]);
> > > +		if (ret < 0)
> > > +			rte_exit(EXIT_FAILURE, "rte_eth_rx_queue_setup:
> err=%d,"
> > > +				 "port=%d\n", ret, portid);
> > > +
> > > +		/* Start device */
> > > +		stop = 0;
> > > +		ret = rte_eth_dev_start(portid);
> > > +		if (ret < 0)
> > > +			rte_exit(EXIT_FAILURE,
> > > +				"rte_eth_dev_start: err=%d, port=%d\n",
> > > +				ret, portid);
> > > +
> > > +		/* always eanble promiscuous */
> > > +		rte_eth_promiscuous_enable(portid);
> > > +
> > > +		lcore_conf[slave_id].portlist[num++] = portid;
> > > +		lcore_conf[slave_id].nb_ports++;
> > > +	}
> > > +	check_all_ports_link_status(nb_ports, RTE_PORT_ALL);
> > > +
> > > +	init_traffic(mbufpool[socketid], tx_burst, MAX_TRAFIC_BURST);
> > > +
> > > +	rte_eal_remote_launch(main_loop, NULL, slave_id);
> > > +	if (rte_eal_wait_lcore(slave_id) < 0)
> > > +		return -1;
> > > +
> > > +	/* port tear down */
> > > +	for (portid = 0; portid < nb_ports; portid++) {
> > > +		if (socketid != rte_eth_dev_socket_id(portid))
> > > +			continue;
> > > +
> > > +		rte_eth_dev_stop(portid);
> > > +	}
> > > +
> > Clean up your allocated memory/lcores/etc?
> [Liang, Cunming] It do cleanup on the beginning of case.
> "Init_lcores","init_mbufpool","reset_count" guarantee the data clean before
> each testing.
> And mbufpool only allocated once even if we run multiple times.
> >
> > Neil



More information about the dev mailing list