net/mlx5: fix FD configuration for Rx interrupt
Checks
Commit Message
The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
FD list for Rx interrupts.
The driver wrongly configured the FD with a non-blocking flag which
prevent waiting on this FD.
This patch removes O_NONBLOCK flag adding.
Fixes: 3c7d44af252a ("net/mlx5: support user space Rx interrupt event")
Cc: stable@dpdk.org
Signed-off-by: Michael Baum <michaelba@nvidia.com>
Acked-by: Matan Azrad <matan@nvidia.com>
---
drivers/net/mlx5/mlx5_rxq.c | 16 +---------------
1 file changed, 1 insertion(+), 15 deletions(-)
Comments
On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com> wrote:
>
> The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
> FD list for Rx interrupts.
>
> The driver wrongly configured the FD with a non-blocking flag which
> prevent waiting on this FD.
>
> This patch removes O_NONBLOCK flag adding.
- Maybe I deserve a Reported-by: credit on this issue.
I sent a proposal to make use of Rx interrupts in OVS
https://patchwork.ozlabs.org/project/openvswitch/patch/20220304161132.22065-1-david.marchand@redhat.com/.
And that's when I noticed that mlx5 rx fds were waking OVS too often
and reported it to mlx5 maintainers.
- Testing this patch by starting dpdk-l3fwd-power example (and no
traffic sent at all):
# strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
...
[pid 534983] 0.000348 epoll_wait(26, [], 1, 10) = 0
[pid 534983] 0.010082 read(24,
For some reason, there is an event available for fd 18 right away
(which is the issue in the first place).
When reading this fd, read() blocks until an actual packet is received.
Then, I send exactly one packet:
[pid 534983] 0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
[pid 534983] 9.228478 epoll_wait(26, [], 1, 10) = 0
[pid 534983] 0.010082 read(24,
That makes mlx5 rx interrupts unusable for an application that does
more than just polling one rxq.
10/03/2022 16:12, David Marchand:
> On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com> wrote:
> >
> > The mlx5_rx_intr_vec_enable() function allocates queue vector and fill
> > FD list for Rx interrupts.
> >
> > The driver wrongly configured the FD with a non-blocking flag which
> > prevent waiting on this FD.
> >
> > This patch removes O_NONBLOCK flag adding.
>
> - Maybe I deserve a Reported-by: credit on this issue.
> I sent a proposal to make use of Rx interrupts in OVS
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220304161132.22065-1-david.marchand@redhat.com/.
> And that's when I noticed that mlx5 rx fds were waking OVS too often
> and reported it to mlx5 maintainers.
>
>
> - Testing this patch by starting dpdk-l3fwd-power example (and no
> traffic sent at all):
>
> # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> ...
> [pid 534983] 0.000348 epoll_wait(26, [], 1, 10) = 0
> [pid 534983] 0.010082 read(24,
>
> For some reason, there is an event available for fd 18 right away
> (which is the issue in the first place).
> When reading this fd, read() blocks until an actual packet is received.
>
> Then, I send exactly one packet:
> [pid 534983] 0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
> [pid 534983] 9.228478 epoll_wait(26, [], 1, 10) = 0
> [pid 534983] 0.010082 read(24,
>
> That makes mlx5 rx interrupts unusable for an application that does
> more than just polling one rxq.
Excuse me, I don't understand this trace.
What is the first read?
Having a second read after epoll_wait is normal if a packet is received,
isn't it?
On Thu, Mar 10, 2022 at 6:16 PM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 10/03/2022 16:12, David Marchand:
> > On Thu, Mar 10, 2022 at 2:19 PM Michael Baum <michaelba@nvidia.com>
> wrote:
> > >
> > > The mlx5_rx_intr_vec_enable() function allocates queue vector and
> > > fill FD list for Rx interrupts.
> > >
> > > The driver wrongly configured the FD with a non-blocking flag which
> > > prevent waiting on this FD.
> > >
> > > This patch removes O_NONBLOCK flag adding.
> >
> > - Maybe I deserve a Reported-by: credit on this issue.
Hi David,
You are right, you should have get the credit.
> > I sent a proposal to make use of Rx interrupts in OVS
> >
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatc
> hwork.ozlabs.org%2Fproject%2Fopenvswitch%2Fpatch%2F20220304161132.2
> 2065-1-
> david.marchand%40redhat.com%2F&data=04%7C01%7Cmichaelba%40n
> vidia.com%7C3679e1e70986425d465b08da02b14b34%7C43083d15727340c1b7
> db39efd9ccc17a%7C0%7C0%7C637825257758477604%7CUnknown%7CTWFpb
> GZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI
> 6Mn0%3D%7C3000&sdata=LamFMZTLgZzQoUDBYAGn2aL9z7RWylWCkn
> q2SGFuKOY%3D&reserved=0.
> > And that's when I noticed that mlx5 rx fds were waking OVS too often
> > and reported it to mlx5 maintainers.
> >
> >
> > - Testing this patch by starting dpdk-l3fwd-power example (and no
> > traffic sent at all):
> >
> > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > ...
> > [pid 534983] 0.000348 epoll_wait(26, [], 1, 10) = 0
> > [pid 534983] 0.010082 read(24,
> >
> > For some reason, there is an event available for fd 18 right away
> > (which is the issue in the first place).
I don't understand who is FD 18, I cannot see it in your log
> > When reading this fd, read() blocks until an actual packet is received.
> >
> > Then, I send exactly one packet:
> > [pid 534983] 0.010082 read(24, "@\217:\370\21\0\0\0", 136) = 8
> > [pid 534983] 9.228478 epoll_wait(26, [], 1, 10) = 0
> > [pid 534983] 0.010082 read(24,
> >
> > That makes mlx5 rx interrupts unusable for an application that does
> > more than just polling one rxq.
>
> Excuse me, I don't understand this trace.
> What is the first read?
> Having a second read after epoll_wait is normal if a packet is received, isn't it?
>
On Thu, Mar 10, 2022 at 6:04 PM Michael Baum <michaelba@nvidia.com> wrote:
> > > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power --lcores
> > > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > > ...
> > > [pid 534983] 0.000348 epoll_wait(26, [], 1, 10) = 0
> > > [pid 534983] 0.010082 read(24,
> > >
> > > For some reason, there is an event available for fd 18 right away
> > > (which is the issue in the first place).
>
> I don't understand who is FD 18, I cannot see it in your log
I had refreshed the trace before sending, but did not update the mail,
I meant fd 24.
Now, looking from scratch (rather than my one month old mail),
epoll_wait() returning 0 is normal since it's returning on a 10ms
timeout.
The mlx5 rx fd should be left in non blocking mode.
From: David Marchand
> On Thu, Mar 10, 2022 at 6:04 PM Michael Baum <michaelba@nvidia.com>
> wrote:
> > > > # strace -r -f ./dpdk-dir/v21.11/examples/dpdk-l3fwd-power
> > > > --lcores
> > > > 0@3,1@5 -a 0000:82:00.0 --in-memory -- -p 0x1 -P --config '(0,0,1)'
> > > > ...
> > > > [pid 534983] 0.000348 epoll_wait(26, [], 1, 10) = 0
> > > > [pid 534983] 0.010082 read(24,
> > > >
> > > > For some reason, there is an event available for fd 18 right away
> > > > (which is the issue in the first place).
> >
> > I don't understand who is FD 18, I cannot see it in your log
>
> I had refreshed the trace before sending, but did not update the mail, I meant fd
> 24.
>
>
> Now, looking from scratch (rather than my one month old mail),
> epoll_wait() returning 0 is normal since it's returning on a 10ms timeout.
> The mlx5 rx fd should be left in non blocking mode.
+1
>
> --
> David Marchand
@@ -1075,7 +1075,6 @@ mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev)
/* This rxq obj must not be released in this function. */
struct mlx5_rxq_priv *rxq = mlx5_rxq_get(dev, i);
struct mlx5_rxq_obj *rxq_obj = rxq ? rxq->ctrl->obj : NULL;
- int rc;
/* Skip queues that cannot request interrupts. */
if (!rxq_obj || (!rxq_obj->ibv_channel &&
@@ -1097,23 +1096,10 @@ mlx5_rx_intr_vec_enable(struct rte_eth_dev *dev)
rte_errno = ENOMEM;
return -rte_errno;
}
- rc = mlx5_os_set_nonblock_channel_fd(rxq_obj->fd);
- if (rc < 0) {
- rte_errno = errno;
- DRV_LOG(ERR,
- "port %u failed to make Rx interrupt file"
- " descriptor %d non-blocking for queue index"
- " %d",
- dev->data->port_id, rxq_obj->fd, i);
- mlx5_rx_intr_vec_disable(dev);
- return -rte_errno;
- }
-
if (rte_intr_vec_list_index_set(intr_handle, i,
RTE_INTR_VEC_RXTX_OFFSET + count))
return -rte_errno;
- if (rte_intr_efds_index_set(intr_handle, count,
- rxq_obj->fd))
+ if (rte_intr_efds_index_set(intr_handle, count, rxq_obj->fd))
return -rte_errno;
count++;
}