[v1] net/af_xdp: support need wakeup feature

Message ID 20190617142303.85240-1-xiaolong.ye@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Ferruh Yigit
Series [v1] net/af_xdp: support need wakeup feature

Checks

Context                         Check    Description
ci/checkpatch                   success  coding style OK
ci/intel-Performance-Testing    success  Performance Testing PASS
ci/mellanox-Performance-Testing success  Performance Testing PASS
ci/Intel-compilation            success  Compilation OK

Commit Message

Xiaolong Ye June 17, 2019, 2:23 p.m. UTC
This patch adds a new devarg to support the need_wakeup flag for the Tx and
fill rings. When this flag is set by the kernel driver, the userspace
application has to explicitly wake up kernel Rx or kernel Tx processing by
issuing a syscall: poll() can wake up both, while sendto() or its
alternatives wake up Tx processing only.

This feature provides efficient support for the case where the application
and the driver execute on the same core.

Signed-off-by: Xiaolong Ye <xiaolong.ye@intel.com>
---

The original busy-poll feature has morphed into the need_wakeup flag on the
kernel side. The main purpose is the same: to let the application and the
driver execute efficiently on the same core.

The kernel-side patch set can be found on the netdev mailing list:
https://lore.kernel.org/netdev/CAJ8uoz2szX=+JXXAMyuVmvSsMXZuDqp6a8rjDQpTioxbZwxFmQ@mail.gmail.com/T/#t

It is targeted for kernel v5.3.

 drivers/net/af_xdp/rte_eth_af_xdp.c | 51 ++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 15 deletions(-)
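The Rx half of the need_wakeup rule described in the commit message can be sketched as a small standalone model. This is not the PMD's code: `struct demo_rxq`, `demo_rxq_init()`, `demo_rx_wakeup_if_needed()` and `DEMO_RING_NEED_WAKEUP` are hypothetical names, and the flag value is assumed to mirror `XDP_RING_NEED_WAKEUP` from `<linux/if_xdp.h>`. The sketch only issues `poll()` when the socket was bound with need_wakeup and the fill-ring flag is set, since the kernel's AF_XDP documentation describes the flag as meaningful only in that mode.

```c
#include <poll.h>
#include <stdint.h>

/* Hypothetical constant, assumed to mirror XDP_RING_NEED_WAKEUP. */
#define DEMO_RING_NEED_WAKEUP (1u << 0)

/* Hypothetical, simplified Rx queue: one pollfd for the XSK socket and a
 * pointer to the flags word the kernel shares for the fill ring. */
struct demo_rxq {
	struct pollfd fds[1];
	const uint32_t *fill_flags;
	int need_wakeup; /* 1 if the socket was bound with need_wakeup */
};

static void demo_rxq_init(struct demo_rxq *rxq, int xsk_fd,
			  const uint32_t *fill_flags, int need_wakeup)
{
	rxq->fds[0].fd = xsk_fd;
	rxq->fds[0].events = POLLIN;	/* wait for Rx readiness */
	rxq->fds[0].revents = 0;
	rxq->fill_flags = fill_flags;
	rxq->need_wakeup = !!need_wakeup;	/* normalize to 0/1 */
}

/* Called when the Rx ring came back empty. Returns 1 if poll() was
 * issued to wake up the kernel, 0 if no syscall was needed. */
static int demo_rx_wakeup_if_needed(struct demo_rxq *rxq, int timeout_ms)
{
	if (!rxq->need_wakeup ||
	    !(*rxq->fill_flags & DEMO_RING_NEED_WAKEUP))
		return 0;

	(void)poll(rxq->fds, 1, timeout_ms);
	return 1;
}
```

The patch itself passes a fixed 1000 ms timeout to poll() at this point; the sketch leaves the timeout to the caller.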
  

Comments

David Marchand June 17, 2019, 8:03 a.m. UTC | #1
On Mon, Jun 17, 2019 at 9:42 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:

> [...]

- Is this really optional? Adding too many options is just a nightmare
later...

- I suppose this will break compilation with kernels that have af_xdp but
are < 5.3.
  
David Marchand June 17, 2019, 8:51 a.m. UTC | #2
On Mon, Jun 17, 2019 at 10:45 AM Ye Xiaolong <xiaolong.ye@intel.com> wrote:

> On 06/17, David Marchand wrote:
> >On Mon, Jun 17, 2019 at 9:42 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> >
> >> [...]
> >
> >- Is this really optional? Adding too many options is just a nightmare
> >later...
>
> Hmm, I think we can remove this option and always turn the need_wakeup flag
> on, since it provides better performance for the 1-core case and doesn't
> degrade performance in the 2-core case.
>
> >
> >- I suppose this will break compilation with kernels that have af_xdp but
> >are < 5.3.
>
> Yes, that is true. It will break compilation with earlier kernels. I feel
> it's a common issue: we enable some features in DPDK that are based on
> kernel features, then the kernel-side features keep evolving and we need to
> keep pace, but that hurts compatibility with older kernels.
>
> What's DPDK's convention for handling this kind of case? Add some notes in
> the docs to remind users of the prerequisite, or use the KERNEL_VERSION
> macro in code?
>

Rather than a kernel version, you can check that XDP_USE_NEED_WAKEUP is
defined (present in the uapi kernel header).
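A minimal sketch of that suggestion, assuming the uapi header is `<linux/if_xdp.h>` and a compiler that supports `__has_include` (GCC 5+ and Clang do): the build tests for the `XDP_USE_NEED_WAKEUP` macro instead of comparing kernel versions, and falls back to no bind flags when the installed headers predate the feature. `demo_xsk_bind_flags()` and `demo_have_need_wakeup()` are hypothetical helper names.

```c
#include <stdint.h>

#if defined(__has_include)
#  if __has_include(<linux/if_xdp.h>)
#    include <linux/if_xdp.h>
#  endif
#endif

/* Returns the bind flags to request. The feature test is on the uapi
 * macro, not on the kernel version: XDP_USE_NEED_WAKEUP is only defined
 * by headers that know about need_wakeup. The libbpf bind_flags field
 * is assumed to be 16 bits wide. */
static uint16_t demo_xsk_bind_flags(int want_need_wakeup)
{
	uint16_t flags = 0;

#if defined(XDP_USE_NEED_WAKEUP)
	if (want_need_wakeup)
		flags |= XDP_USE_NEED_WAKEUP;
#else
	(void)want_need_wakeup;	/* older headers: fall back to 0 */
#endif
	return flags;
}

/* Reports at build time whether need_wakeup support was compiled in. */
static int demo_have_need_wakeup(void)
{
#if defined(XDP_USE_NEED_WAKEUP)
	return 1;
#else
	return 0;
#endif
}
```

Code built this way compiles against both old and new headers, and any datapath use of the flag (such as the needs_wakeup checks) would sit behind the same `#if defined(XDP_USE_NEED_WAKEUP)` guard.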
  
Bruce Richardson June 17, 2019, 10:05 a.m. UTC | #3
On Mon, Jun 17, 2019 at 10:51:52AM +0200, David Marchand wrote:
> On Mon, Jun 17, 2019 at 10:45 AM Ye Xiaolong <xiaolong.ye@intel.com> wrote:
> 
> > On 06/17, David Marchand wrote:
> > >On Mon, Jun 17, 2019 at 9:42 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
> > >
> > >> [...]
> > >
> > >- Is this really optional? Adding too many options is just a nightmare
> > >later...
> >
> > Hmm, I think we can remove this option and always turn the need_wakeup flag
> > on, since it provides better performance for the 1-core case and doesn't
> > degrade performance in the 2-core case.
> >
> > >
> > >- I suppose this will break compilation with kernels that have af_xdp but
> > >are < 5.3.
> >
> > Yes, that is true. It will break compilation with earlier kernels. I feel
> > it's a common issue: we enable some features in DPDK that are based on
> > kernel features, then the kernel-side features keep evolving and we need to
> > keep pace, but that hurts compatibility with older kernels.
> >
> > What's DPDK's convention for handling this kind of case? Add some notes in
> > the docs to remind users of the prerequisite, or use the KERNEL_VERSION
> > macro in code?
> >
> 
> Rather than a kernel version, you can check that XDP_USE_NEED_WAKEUP is
> defined (present in the uapi kernel header).
> 
+1 for this.

Also, since AF_XDP is still fairly new, with ongoing development on the
kernel side, I think it is reasonable to limit our PMD to working only with
sufficiently up-to-date kernels. Hopefully in 6 months or so the feature set
we need will be locked down and we can specify a fixed baseline
requirement.

/Bruce
  
Xiaolong Ye June 17, 2019, 3:27 p.m. UTC | #4
On 06/17, David Marchand wrote:
>On Mon, Jun 17, 2019 at 9:42 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>
>> [...]
>
>- Is this really optional? Adding too many options is just a nightmare
>later...

Hmm, I think we can remove this option and always turn the need_wakeup flag on,
since it provides better performance for the 1-core case and doesn't degrade
performance in the 2-core case.

>
>- I suppose this will break compilation with kernels that have af_xdp but
>are < 5.3.

Yes, that is true. It will break compilation with earlier kernels. I feel it's
a common issue: we enable some features in DPDK that are based on kernel
features, then the kernel-side features keep evolving and we need to keep pace,
but that hurts compatibility with older kernels.

What's DPDK's convention for handling this kind of case? Add some notes in the
docs to remind users of the prerequisite, or use the KERNEL_VERSION macro in
code?

Thanks,
Xiaolong

>
>
>-- 
>David Marchand
  
Xiaolong Ye June 17, 2019, 3:39 p.m. UTC | #5
On 06/17, David Marchand wrote:
>On Mon, Jun 17, 2019 at 10:45 AM Ye Xiaolong <xiaolong.ye@intel.com> wrote:
>
>> On 06/17, David Marchand wrote:
>> >On Mon, Jun 17, 2019 at 9:42 AM Xiaolong Ye <xiaolong.ye@intel.com> wrote:
>> >
>> >> [...]
>> >
>> >- Is this really optional? Adding too many options is just a nightmare
>> >later...
>>
>> Hmm, I think we can remove this option and always turn the need_wakeup flag
>> on, since it provides better performance for the 1-core case and doesn't
>> degrade performance in the 2-core case.
>>
>> >
>> >- I suppose this will break compilation with kernels that have af_xdp but
>> >are < 5.3.
>>
>> Yes, that is true. It will break compilation with earlier kernels. I feel
>> it's a common issue: we enable some features in DPDK that are based on
>> kernel features, then the kernel-side features keep evolving and we need to
>> keep pace, but that hurts compatibility with older kernels.
>>
>> What's DPDK's convention for handling this kind of case? Add some notes in
>> the docs to remind users of the prerequisite, or use the KERNEL_VERSION
>> macro in code?
>>
>
>Rather than a kernel version, you can check that XDP_USE_NEED_WAKEUP is
>defined (present in the uapi kernel header).

Sounds better, will try.

Thanks,
Xiaolong
>
>
>-- 
>David Marchand
  

Patch

diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index c638d9227..198b00147 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -5,6 +5,7 @@ 
 #include <errno.h>
 #include <stdlib.h>
 #include <string.h>
+#include <poll.h>
 #include <netinet/in.h>
 #include <net/if.h>
 #include <sys/socket.h>
@@ -90,6 +91,7 @@  struct pkt_rx_queue {
 	struct rx_stats stats;
 
 	struct pkt_tx_queue *pair;
+	struct pollfd fds[1];
 	int xsk_queue_idx;
 };
 
@@ -117,6 +119,7 @@  struct pmd_internals {
 	int combined_queue_cnt;
 
 	int pmd_zc;
+	int need_wakeup;
 	struct rte_ether_addr eth_addr;
 
 	struct pkt_rx_queue *rx_queues;
@@ -127,12 +130,14 @@  struct pmd_internals {
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
 #define ETH_AF_XDP_PMD_ZC_ARG			"pmd_zero_copy"
+#define ETH_AF_XDP_NEED_WAKEUP_ARG		"need_wakeup"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
 	ETH_AF_XDP_PMD_ZC_ARG,
+	ETH_AF_XDP_NEED_WAKEUP_ARG,
 	NULL
 };
 
@@ -206,8 +211,12 @@  eth_af_xdp_rx(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		return 0;
 
 	rcvd = xsk_ring_cons__peek(rx, nb_pkts, &idx_rx);
-	if (rcvd == 0)
+	if (rcvd == 0) {
+		if (xsk_ring_prod__needs_wakeup(fq))
+			(void)poll(rxq->fds, 1, 1000);
+
 		goto out;
+	}
 
 	if (xsk_prod_nb_free(fq, free_thresh) >= free_thresh)
 		(void)reserve_fill_queue(umem, ETH_AF_XDP_RX_BATCH_SIZE);
@@ -279,16 +288,17 @@  kick_tx(struct pkt_tx_queue *txq)
 {
 	struct xsk_umem_info *umem = txq->pair->umem;
 
-	while (send(xsk_socket__fd(txq->pair->xsk), NULL,
-		      0, MSG_DONTWAIT) < 0) {
-		/* some thing unexpected */
-		if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
-			break;
+	if (xsk_ring_prod__needs_wakeup(&txq->tx))
+		while (send(xsk_socket__fd(txq->pair->xsk), NULL,
+			    0, MSG_DONTWAIT) < 0) {
+			/* some thing unexpected */
+			if (errno != EBUSY && errno != EAGAIN && errno != EINTR)
+				break;
 
-		/* pull from completion queue to leave more space */
-		if (errno == EAGAIN)
-			pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
-	}
+			/* pull from completion queue to leave more space */
+			if (errno == EAGAIN)
+				pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
+		}
 	pull_umem_cq(umem, ETH_AF_XDP_TX_BATCH_SIZE);
 }
 
@@ -621,7 +631,7 @@  xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	cfg.tx_size = ring_size;
 	cfg.libbpf_flags = 0;
 	cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
-	cfg.bind_flags = 0;
+	cfg.bind_flags = internals->need_wakeup ? XDP_USE_NEED_WAKEUP : 0;
 	ret = xsk_socket__create(&rxq->xsk, internals->if_name,
 			rxq->xsk_queue_idx, rxq->umem->umem, &rxq->rx,
 			&txq->tx, &cfg);
@@ -683,6 +693,9 @@  eth_rx_queue_setup(struct rte_eth_dev *dev,
 		goto err;
 	}
 
+	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
+	rxq->fds[0].events = POLLIN;
+
 	rxq->umem->pmd_zc = internals->pmd_zc;
 
 	dev->data->rx_queues[rx_queue_id] = rxq;
@@ -856,7 +869,7 @@  xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt, int *pmd_zc)
+		 int *queue_cnt, int *pmd_zc, int *need_wakeup)
 {
 	int ret;
 
@@ -882,6 +895,9 @@  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_NEED_WAKEUP_ARG,
+				 &parse_integer_arg, need_wakeup);
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -919,7 +935,7 @@  get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt, int pmd_zc)
+	       int start_queue_idx, int queue_cnt, int pmd_zc, int need_wakeup)
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -935,6 +951,7 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->start_queue_idx = start_queue_idx;
 	internals->queue_cnt = queue_cnt;
 	internals->pmd_zc = pmd_zc;
+	internals->need_wakeup = !!need_wakeup;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
@@ -993,6 +1010,9 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	if (internals->pmd_zc)
 		AF_XDP_LOG(INFO, "Zero copy between umem and mbuf enabled.\n");
 
+	if (internals->need_wakeup)
+		AF_XDP_LOG(INFO, "need_wakeup feature is explicitly turned on.\n");
+
 	return eth_dev;
 
 err_free_tx:
@@ -1014,6 +1034,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 	int pmd_zc = 0;
+	int need_wakeup = 0;
 
 	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
 		rte_vdev_device_name(dev));
@@ -1041,7 +1062,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt, &pmd_zc) < 0) {
+			     &xsk_queue_cnt, &pmd_zc, &need_wakeup) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1052,7 +1073,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt, pmd_zc);
+				 xsk_queue_cnt, pmd_zc, need_wakeup);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;