[v2,2/3] net/af_xdp: support pinning of IRQs

Message ID 20190930164205.19419-3-ciara.loftus@intel.com (mailing list archive)
State Rejected, archived
Delegated to: Ferruh Yigit
Series AF_XDP tx halt fix, IRQ pinning and unaligned chunks

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Loftus, Ciara Sept. 30, 2019, 4:42 p.m. UTC
  Network devices using the AF_XDP PMD will trigger interrupts
on reception of packets. The new PMD argument 'queue_irq'
allows the user to specify a core on which to pin interrupts
for a given queue. Multiple queue_irq arguments can be specified.
For example:

  --vdev=net_af_xdp1,iface=eth0,queue_count=2,
           queue_irq=0:2,queue_irq=1:5

...will pin queue 0 interrupts to core 2 and queue 1 interrupts
to core 5.

The queue argument refers to the ethdev queue as opposed to the
netdev queue. These values are the same unless a value greater
than 0 is specified in a start_queue argument.

The drivers supported for this feature are those with support for
AF_XDP zero copy in the kernel, namely ixgbe, i40e and mlx5_core.

Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/nics/af_xdp.rst             |  15 +
 doc/guides/rel_notes/release_19_11.rst |   7 +
 drivers/net/af_xdp/rte_eth_af_xdp.c    | 366 ++++++++++++++++++++++++-
 3 files changed, 383 insertions(+), 5 deletions(-)
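A minimal sketch of how a single queue_irq value such as "0:2" might be parsed, and how the ethdev queue ID relates to the netdev queue ID when start_queue is non-zero. The helper names (parse_uint, parse_queue_irq, ethdev_to_netdev_qid) are hypothetical and not taken from the patch. Inside the PMD such a parser would presumably be called once per queue_irq key from an rte_kvargs_process() callback; it is written as plain C here so it compiles standalone.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parse one unsigned decimal field; *end receives the first unparsed char. */
static int
parse_uint(const char *s, char **end, unsigned int *out)
{
	unsigned long v;

	/* Reject empty fields, signs and leading spaces, which strtoul accepts. */
	if (!isdigit((unsigned char)*s))
		return -EINVAL;
	errno = 0;
	v = strtoul(s, end, 10);
	if (errno != 0 || v > UINT_MAX)
		return -EINVAL;
	*out = (unsigned int)v;
	return 0;
}

/* Hypothetical helper: parse a queue_irq value of the form "<qid>:<core>". */
static int
parse_queue_irq(const char *value, unsigned int *qid, unsigned int *core)
{
	char *end;

	if (parse_uint(value, &end, qid) != 0 || *end != ':')
		return -EINVAL;
	if (parse_uint(end + 1, &end, core) != 0 || *end != '\0')
		return -EINVAL;
	return 0;
}

/* queue_irq uses ethdev queue IDs; the kernel (netdev) queue is offset by
 * start_queue, so the two only coincide when start_queue is 0. */
static unsigned int
ethdev_to_netdev_qid(unsigned int ethdev_qid, unsigned int start_queue)
{
	return start_queue + ethdev_qid;
}
```

With start_queue=4, for example, ethdev queue 1 maps to netdev queue 5, and that netdev queue is the one whose IRQ would need to be located and pinned.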
  

Comments

Stephen Hemminger Sept. 30, 2019, 5:11 p.m. UTC | #1
On Mon, 30 Sep 2019 16:42:04 +0000
Ciara Loftus <ciara.loftus@intel.com> wrote:

> +/* drivers supported for the queue_irq option */
> +enum supported_drivers {
> +	I40E_DRIVER,
> +	IXGBE_DRIVER,
> +	MLX5_DRIVER,
> +	NUM_DRIVERS
> +};

Anything device-specific like this raises a red flag to me.

This regex etc. seems like a huge hack. Is there a better way using
irqbalance and smp_affinity in kernel drivers?

NACK
  
Loftus, Ciara Oct. 3, 2019, 1:23 p.m. UTC | #2
> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Monday 30 September 2019 18:12
> To: Loftus, Ciara <ciara.loftus@intel.com>
> Cc: dev@dpdk.org; Ye, Xiaolong <xiaolong.ye@intel.com>; Laatz, Kevin
> <kevin.laatz@intel.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: support pinning of IRQs
> 
> On Mon, 30 Sep 2019 16:42:04 +0000
> Ciara Loftus <ciara.loftus@intel.com> wrote:
> 
> > +/* drivers supported for the queue_irq option */
> > +enum supported_drivers {
> > +	I40E_DRIVER,
> > +	IXGBE_DRIVER,
> > +	MLX5_DRIVER,
> > +	NUM_DRIVERS
> > +};
> 
> Anything device specific like this raises a red flag to me.
> 
> This regex etc, seems like a huge hack. Is there a better way  using
> irqbalance and smp_affinity in kernel drivers?
> 
> NACK

Hi Stephen,
 
Thanks for looking at the patch. I understand your concern; however, I haven't been able to find a way to achieve the desired outcome using your suggestions of irqbalance and smp_affinity. Did you have something specific in mind, or are you aware of any generic way of retrieving interrupt numbers for NICs regardless of vendor or range?
 
I think this feature is really important for the usability of this PMD. Without it, to configure the IRQs the user has to open /proc/interrupts, trawl through it to identify the correct IRQ number for their NIC and qid (the format of which is unlikely to be known off-hand), and manually pin each IRQ by writing the appropriate value, in the appropriate format, to the appropriate file. IMO this is error-prone unless automated.
If the user fails to set the affinity, it's probably fine for a single PMD; however, with multiple PMDs all IRQs will by default land on core 0, leading to terrible performance.
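The manual pinning step described above comes down to writing a CPU list to the IRQ's affinity file. A minimal sketch, assuming the IRQ number has already been found and that the standard /proc/irq/<N>/smp_affinity_list interface is available (writing it requires root, and the kernel may refuse CPUs that are offline or not permitted for that IRQ). The function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path that holds the CPU list for one IRQ. */
static int
irq_affinity_path(char *buf, size_t len, int irq)
{
	int n = snprintf(buf, len, "/proc/irq/%d/smp_affinity_list", irq);

	return (n < 0 || (size_t)n >= len) ? -ENAMETOOLONG : 0;
}

/* Write a single CPU number to an smp_affinity_list-style file. */
static int
write_affinity(const char *path, unsigned int core)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (f == NULL) {
		ret = -errno;
		fprintf(stderr, "cannot open %s: %s\n", path, strerror(-ret));
		return ret;
	}
	if (fprintf(f, "%u\n", core) < 0)
		ret = -EIO;
	/* stdio buffers the write, so for procfs the kernel only sees
	 * (and validates) the value when fclose() flushes it. */
	if (fclose(f) != 0 && ret == 0)
		ret = -errno;
	return ret;
}

/* Pin one IRQ to one core; needs root against the real /proc. */
static int
pin_irq_to_core(int irq, unsigned int core)
{
	char path[64];
	int ret = irq_affinity_path(path, sizeof(path), irq);

	return ret != 0 ? ret : write_affinity(path, core);
}
```

Since the affinity does not persist across runs, a step like this would have to be repeated every time the application starts.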

It should be possible to rework the code to remove the regexes and use a direct string compare. Would that make the solution more palatable?
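A rough sketch of what a direct string compare could look like: scan /proc/interrupts and return the IRQ whose last field (the action name) exactly matches an expected name. An exact match avoids confusing queue 1 with queue 10. The expected name is still driver-specific (ixgbe-style names are assumed here to look like "eth0-TxRx-0" and i40e-style ones like "i40e-eth0-TxRx-0"; verify against /proc/interrupts on the target kernel), so this sketch does not address the underlying objection to per-driver code. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* If this /proc/interrupts row belongs to the IRQ named `name` (its last
 * whitespace-separated field), return the IRQ number, otherwise -1.
 * Rows without a numeric "<irq>:" prefix (e.g. "NMI:") never match. */
static int
irq_from_line(const char *line, const char *name)
{
	const char *end = line + strlen(line);
	const char *last;
	char *colon;
	long irq;

	/* Drop trailing whitespace, including the newline. */
	while (end > line && isspace((unsigned char)end[-1]))
		end--;
	/* Walk back to the start of the last field. */
	last = end;
	while (last > line && !isspace((unsigned char)last[-1]))
		last--;

	/* Exact compare: "eth0-TxRx-1" must not match "eth0-TxRx-10". */
	if ((size_t)(end - last) != strlen(name) ||
	    strncmp(last, name, (size_t)(end - last)) != 0)
		return -1;

	irq = strtol(line, &colon, 10);
	if (colon == line || *colon != ':' || irq < 0)
		return -1;
	return (int)irq;
}

/* Scan an open /proc/interrupts stream for `name`. Rows longer than the
 * buffer (very many CPUs) are split by fgets() and would not be matched. */
static int
find_irq(FILE *f, const char *name)
{
	char line[16384];
	int irq;

	while (fgets(line, sizeof(line), f) != NULL) {
		irq = irq_from_line(line, name);
		if (irq >= 0)
			return irq;
	}
	return -1;
}
```

A caller would open /proc/interrupts with fopen() and pass the stream to find_irq().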
 
Let me know what you think.
 
Thanks,
Ciara
  
Bruce Richardson Oct. 14, 2019, 2:43 p.m. UTC | #3
On Thu, Oct 03, 2019 at 02:23:07PM +0100, Loftus, Ciara wrote:
> 
> 
> > On Mon, 30 Sep 2019 16:42:04 +0000 Ciara Loftus
> > <ciara.loftus@intel.com> wrote:
> > 
> > > +/* drivers supported for the queue_irq option */
> > > +enum supported_drivers {
> > > +	I40E_DRIVER,
> > > +	IXGBE_DRIVER,
> > > +	MLX5_DRIVER,
> > > +	NUM_DRIVERS
> > > +};
> > 
> > Anything device specific like this raises a red flag to me.
> > 
> > This regex etc, seems like a huge hack. Is there a better way  using
> > irqbalance and smp_affinity in kernel drivers?
> > 
> > NACK
> 
> Hi Stephen,
>  
> Thanks for looking at the patch. I understand your concern however
> unfortunately I haven't been able to identify a way to achieve the
> desired outcome by using your suggestions of irqbalance and smp_affinity.
> Did you have something specific in mind or are aware of any generic way
> of retrieving interrupt numbers for NICs regardless of vendor or range?
>  
> I think this feature is really important for the usability of this PMD.
> Without it, to configure the IRQs the user has to open up
> /proc/interrupts, trawl through it and identify the correct IRQ number
> for their given NIC and qid (the format for which is unlikely to be known
> off-hand), and manually pin them by writing the appropriate values in the
> appropriate format to the appropriate file - prone to error if not
> automated IMO.  If the user fails to set the affinity it's probably fine
> for a single pmd, however with multiple pmds all irqs will by default
> land on core 0 and lead to terrible performance.
> 
> It should be possible to rework the code to remove the regexes and use a
> direct string compare. Would that make the solution more palatable?
>  

Hi Ciara, Stephen,

is there any way forward on this patch?

From my experience with using AF_XDP the pinning of interrupts is both
necessary for performance and sadly rather awkward to implement in
practice. If we can't find a better way to do this, I think merging this
patch is the best thing to do. It may be a bit messy, but the overall user
experience should be far improved over not having it.

Regards,
/Bruce
  
Ray Kinsella Oct. 15, 2019, 11:14 a.m. UTC | #4
On 03/10/2019 14:23, Loftus, Ciara wrote:
> 
> 
>> On Mon, 30 Sep 2019 16:42:04 +0000
>> Ciara Loftus <ciara.loftus@intel.com> wrote:
>>
>>> +/* drivers supported for the queue_irq option */
>>> +enum supported_drivers {
>>> +	I40E_DRIVER,
>>> +	IXGBE_DRIVER,
>>> +	MLX5_DRIVER,
>>> +	NUM_DRIVERS
>>> +};
>>
>> Anything device specific like this raises a red flag to me.
>>
>> This regex etc, seems like a huge hack. Is there a better way  using
>> irqbalance and smp_affinity in kernel drivers?
>>
>> NACK
> 
> Hi Stephen,
>  
> Thanks for looking at the patch. I understand your concern however unfortunately I haven't been able to identify a way to achieve the desired outcome by using your suggestions of irqbalance and smp_affinity. Did you have something specific in mind or are aware of any generic way of retrieving interrupt numbers for NICs regardless of vendor or range?
>  
> I think this feature is really important for the usability of this PMD. Without it, to configure the IRQs the user has to open up /proc/interrupts, trawl through it and identify the correct IRQ number for their given NIC and qid (the format for which is unlikely to be known off-hand), and manually pin them by writing the appropriate values in the appropriate format to the appropriate file - prone to error if not automated IMO.
> If the user fails to set the affinity it's probably fine for a single pmd, however with multiple pmds all irqs will by default land on core 0 and lead to terrible performance.
> 
> It should be possible to rework the code to remove the regexes and use a direct string compare. Would that make the solution more palatable?
>  
> Let me know what you think.
>  
> Thanks,
> Ciara
> 

Assuming there is no easier way to correlate an Ethernet device with an interrupt (which would make the strcmp calls go away), my preference is for DPDK to take care of its requirements, even in a less-than-ideal way, rather than asking the user to figure it out.

Ray K
  
Xiaolong Ye Oct. 18, 2019, 11:49 p.m. UTC | #5
On 09/30, Ciara Loftus wrote:
>Network devices using the AF_XDP PMD will trigger interrupts
>on reception of packets. The new PMD argument 'queue_irq'
>allows the user to specify a core on which to pin interrupts
>for a given queue. Multiple queue_irq arguments can be specified.
>For example:
>
>  --vdev=net_af_xdp1,iface=eth0,queue_count=2,
>           queue_irq=0:2,queue_irq=1:5
>
>..will pin queue 0 interrupts to core 2 and queue 1 interrupts
>to core 5.
>
>The queue argument refers to the ethdev queue as opposed to the
>netdev queue. These values are the same unless a value greater
>than 0 is specified in a start_queue argument.
>
>The drivers supported for this feature are those with support for
>AF_XDP zero copy in the kernel, namely ixgbe, i40e and mlx5_core.
>
>Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
>Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>---
> doc/guides/nics/af_xdp.rst             |  15 +
> doc/guides/rel_notes/release_19_11.rst |   7 +
> drivers/net/af_xdp/rte_eth_af_xdp.c    | 366 ++++++++++++++++++++++++-
> 3 files changed, 383 insertions(+), 5 deletions(-)
>

Reviewed-by: Xiaolong Ye <xiaolong.ye@intel.com>
  
Loftus, Ciara Oct. 21, 2019, 10:04 a.m. UTC | #6
> > On Mon, 30 Sep 2019 16:42:04 +0000
> > Ciara Loftus <ciara.loftus@intel.com> wrote:
> >
> > > +/* drivers supported for the queue_irq option */
> > > +enum supported_drivers {
> > > +	I40E_DRIVER,
> > > +	IXGBE_DRIVER,
> > > +	MLX5_DRIVER,
> > > +	NUM_DRIVERS
> > > +};
> >
> > Anything device specific like this raises a red flag to me.
> >
> > This regex etc, seems like a huge hack. Is there a better way  using
> > irqbalance and smp_affinity in kernel drivers?
> >
> > NACK
> 
> Hi Stephen,
> 
> Thanks for looking at the patch. I understand your concern however
> unfortunately I haven't been able to identify a way to achieve the desired
> outcome by using your suggestions of irqbalance and smp_affinity. Did you
> have something specific in mind or are aware of any generic way of retrieving
> interrupt numbers for NICs regardless of vendor or range?
> 
> I think this feature is really important for the usability of this PMD. Without it,
> to configure the IRQs the user has to open up /proc/interrupts, trawl through
> it and identify the correct IRQ number for their given NIC and qid (the format
> for which is unlikely to be known off-hand), and manually pin them by writing
> the appropriate values in the appropriate format to the appropriate file -
> prone to error if not automated IMO.
> If the user fails to set the affinity it's probably fine for a single pmd, however
> with multiple pmds all irqs will by default land on core 0 and lead to terrible
> performance.

Hi,

Following this up with some performance data which shows the impact of no pinning.

The test case is N instances of testpmd in macswap forwarding mode, where N is the number of interfaces.

ifaces  no pinning  pinning
1       9059100     9171612
2       9261635     18376552
3       9332804     27696702

For the no-pinning case, all IRQs are landing on the default core 0, which results in very poor scaling versus the pinned case where scaling is linear.
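Working the table through as ratios makes the scaling claim concrete. The thread does not state the units of the figures, so this sketch only computes the result with N interfaces relative to the single-interface result.

```c
#include <stdio.h>

/* Figures from the table above; the thread does not state their units. */
static const double no_pin[] = { 9059100, 9261635, 9332804 };
static const double pin[] = { 9171612, 18376552, 27696702 };

/* Result with n interfaces relative to the single-interface result. */
static double
scaling(const double *results, int n)
{
	return results[n - 1] / results[0];
}

static void
print_scaling(void)
{
	for (int n = 1; n <= 3; n++)
		printf("%d iface(s): no pinning x%.2f, pinning x%.2f\n",
		       n, scaling(no_pin, n), scaling(pin, n));
}
```

print_scaling() reports x2.00 and x3.02 for the pinned case against x1.02 and x1.03 without pinning: near-linear scaling versus almost none.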

Thanks,
Ciara

> 
> It should be possible to rework the code to remove the regexes and use a
> direct string compare. Would that make the solution more palatable?
> 
> Let me know what you think.
> 
> Thanks,
> Ciara
  
Varghese, Vipin Oct. 21, 2019, 12:52 p.m. UTC | #7
Hi Ciara,

snipped
> 
> ifaces  no pinning  pinning
> 1       9059100     9171612
> 2       9261635     18376552
> 3       9332804     27696702
> 
> For the no-pinning case, all IRQs are landing on the default core 0, which
> results in very poor scaling versus the pinned case where scaling is linear.

Thanks for the information, but a question here: `Is the reason all IRQs land on core 0 that the kernel command line ('isol' or 'no interrupts') is configured for all cores except core 0?`

If the cores are not isolated and no interrupts are redirected, `cat /proc/interrupts` normally shows how the IRQs are mapped to cores. Depending on FDIR (Intel X522 and X710), could this be core 0 or 'n-1'?
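/proc/interrupts reports per-CPU interrupt counts for each IRQ (the configured mask itself lives in /proc/irq/<N>/smp_affinity), so one quick way to see where a queue's interrupts are actually landing is to find the busiest CPU column in its row. A small sketch, assuming the usual row layout of "<irq>:" followed by one count per CPU; the function name is hypothetical.

```c
#include <stdlib.h>

/* Given one /proc/interrupts row such as " 45:  10  9000  0  3  ...",
 * return the index of the CPU column with the highest count, or -1.
 * ncpus is the number of CPU columns in the header row; parsing stops
 * after that many columns or at the first non-numeric field. */
static int
busiest_cpu(const char *line, int ncpus)
{
	unsigned long long count, best_count = 0;
	char *p, *end;
	int best = -1;

	/* Skip the "<irq>:" prefix; rows like "NMI:" are rejected. */
	(void)strtol(line, &p, 10);
	if (p == line || *p != ':')
		return -1;
	p++;

	for (int cpu = 0; cpu < ncpus; cpu++) {
		count = strtoull(p, &end, 10);
		if (end == p)
			break; /* fewer count columns than expected */
		if (best < 0 || count > best_count) {
			best = cpu;
			best_count = count;
		}
		p = end;
	}
	return best;
}
```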

> 
> Thanks,
> Ciara
> 
> >
> > It should be possible to rework the code to remove the regexes and use
> > a direct string compare. Would that make the solution more palatable?
> >
> > Let me know what you think.
> >
> > Thanks,
> > Ciara
  
Bruce Richardson Oct. 21, 2019, 1:04 p.m. UTC | #8
On Mon, Oct 21, 2019 at 01:52:27PM +0100, Varghese, Vipin wrote:
> Hi Ciara,
> 
> snipped
> > 
> > ifaces  no pinning  pinning
> > 1       9059100     9171612
> > 2       9261635     18376552
> > 3       9332804     27696702
> > 
> > For the no-pinning case, all IRQs are landing on the default core 0, which
> > results in very poor scaling versus the pinned case where scaling is linear.
> 
> Thanks for the information, but a question here `Is the reason for landing all IRQ on core '0' is because the Kernel CMD line 'isol or no interupts' is done for all expect core 0?`
> 
> If the cores are not isolated and no interrupts are redirected; normally `cat /proc/interrupts` shows IRQ mask to cores. Depending upon FDIR (intel X522 and X710) this could be core 0 or 'n-1'?
> 
Yes, the default interrupt pinning depends somewhat on the exact setup,
but the fact remains that in just about any setup the interrupts for an
AF_XDP queue are unlikely to end up on exactly the one core that the
user wants them on. This is what makes this patch so necessary, from both
a usability and a performance point of view.

In the absence of alternatives, I really think this patch should be merged,
since with more than one port the difference between having correctly and
incorrectly pinned interrupts is huge. I'd also point out that in my
testing the interrupts need to be pinned each and every time an app is run;
it's not a set-once-and-forget thing. Having the driver pin the
interrupts for the user would be a big timesaver for developers too,
who may be constantly re-running apps when testing.

Regards,
/Bruce
  
Varghese, Vipin Oct. 21, 2019, 1:11 p.m. UTC | #9
Hi Bruce,

snipped
> > >
> > > For the no-pinning case, all IRQs are landing on the default core 0,
> > > which results in very poor scaling versus the pinned case where scaling is
> linear.
> >
> > Thanks for the information, but a question here `Is the reason for
> > landing all IRQ on core '0' is because the Kernel CMD line 'isol or no
> > interupts' is done for all expect core 0?`
> >
> > If the cores are not isolated and no interrupts are redirected; normally `cat
> /proc/interrupts` shows IRQ mask to cores. Depending upon FDIR (intel X522
> and X710) this could be core 0 or 'n-1'?
> >
> Yes, the interrupt pinning default is somewhat dependent on the exact setup,
> but the fact remains that in just about any setup the interrupts for an AF_XDP
> queue are unlikely to end up on the exactly the one core that the user wants
> them on. This is what makes this patch so necessary, both from a usability and
> performance point of view.
> 
> In the absense of alternatives, I really think this patch should be merged, since
> with more than one point the difference between having correctly or
> incorrectly pinned interrupts is huge. I'd also point out that in my testing the
> interrupts need to be pinned each and every time an app is run - it's not a set
> once and forget thing.
Yes, I agree with you; in my testing with XDP and FDIR we had to do this on every test run.

> This ability to have the driver pin the interrupts for the
> user would be a big timesaver for developers too, who may be constantly re-
> running apps when testing.
My understanding here is that the user cannot, or should not, pass DPDK cores for interrupt pinning. So should we have the driver fetch `rte_eal_configuration` and ensure this?

> 
> Regards,
> /Bruce
  
Bruce Richardson Oct. 21, 2019, 1:17 p.m. UTC | #10
On Mon, Oct 21, 2019 at 02:11:39PM +0100, Varghese, Vipin wrote:
> Hi Bruce,
> 
> snipped
> > > >
> > > > For the no-pinning case, all IRQs are landing on the default core 0,
> > > > which results in very poor scaling versus the pinned case where scaling is
> > linear.
> > >
> > > Thanks for the information, but a question here `Is the reason for
> > > landing all IRQ on core '0' is because the Kernel CMD line 'isol or no
> > > interupts' is done for all expect core 0?`
> > >
> > > If the cores are not isolated and no interrupts are redirected; normally `cat
> > /proc/interrupts` shows IRQ mask to cores. Depending upon FDIR (intel X522
> > and X710) this could be core 0 or 'n-1'?
> > >
> > Yes, the interrupt pinning default is somewhat dependent on the exact setup,
> > but the fact remains that in just about any setup the interrupts for an AF_XDP
> > queue are unlikely to end up on the exactly the one core that the user wants
> > them on. This is what makes this patch so necessary, both from a usability and
> > performance point of view.
> > 
> > In the absense of alternatives, I really think this patch should be merged, since
> > with more than one point the difference between having correctly or
> > incorrectly pinned interrupts is huge. I'd also point out that in my testing the
> > interrupts need to be pinned each and every time an app is run - it's not a set
> > once and forget thing.
> Yes, I agree with you as in my testing with XDP and FDIR we had to do this in each test run.
> 
>  This ability to have the driver pin the interrupts for the
> > user would be a big timesaver for developers too, who may be constantly re-
> > running apps when testing.
> Here my understanding, user can not or should not pass DPDK cores for interrupt pinning. So should we ask the driver to fetch `rte_eal_configuration` and ensure the same?
> 

Actually I disagree. I think the user should pass the cores for interrupt
pinning, because unlike with other PMDs it is perfectly valid to have the
interrupts pinned to dedicated cores separate from those used by DPDK.

Or, to take another example, suppose the app takes 8 cores in the coremask
but only one of those cores is used for I/O. Which core should the driver
pin the interrupts to? It probably should be the core used for I/O, but the
driver can't know which core that will be. Alternatively, the user might
want to split AF_XDP across two cores, in which case any core on the system
might be the intended one for interrupts.

Regards,
/Bruce
  
Varghese, Vipin Oct. 21, 2019, 1:45 p.m. UTC | #11
Hi Bruce,

snipped
> >  This ability to have the driver pin the interrupts for the
> > > user would be a big timesaver for developers too, who may be
> > > constantly re- running apps when testing.
> > Here my understanding, user can not or should not pass DPDK cores for
> interrupt pinning. So should we ask the driver to fetch `rte_eal_configuration`
> and ensure the same?
> >
> 
> Actually I disagree. I think the user should pass the cores for interrupt pinning,
I agree with this.

> because unlike other PMDs it is perfectly valid to have the interrupts pinned to
> dedicated cores separate from those used by DPDK.
My point is the same, but the interrupt cores should not be DPDK data-plane (DP) or service cores.

> 
> Or taking another example, suppose the app takes 8 cores in the coremask, but
> only one of those cores is to be used for I/O, what cores should the driver pin
> the interrupts to?
They can be any cores on the machine (guest or host) that are not used by DPDK.

> It probably should be the same core used for I/O, but the
> driver can't know which cores will be for that, or alternatively the user might
> want to use AF_XDP split across two cores, in which case any core on the
> system might be the intended one for interrupts.
I agree with the patch; my only difference concerns the dev->probe function. Shouldn't there be validation to ensure the IRQ core is not a DPDK lcore or service core, since the interface is owned by the kernel and packets not matched by the eBPF program are handled by the kernel as skbs?
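The probe-time check proposed here might look roughly like this sketch. It assumes the application lcores and service cores are available as 64-bit masks, which is a simplification; within DPDK the equivalent lookup would presumably go through rte_lcore_is_enabled() and the service-core API instead. The helper names are hypothetical. A hard rejection like this would also rule out the same-core layout that the reply below describes as recommended since Linux 5.4, so a warning may be the safer behaviour.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* True if core is set in a 64-bit core mask (cores >= 64 never are). */
static bool
core_in_mask(unsigned int core, uint64_t mask)
{
	return core < 64 && ((mask >> core) & 1) != 0;
}

/* Hypothetical probe-time check: refuse an IRQ core that is also an
 * application lcore or a service core. */
static int
validate_irq_core(unsigned int irq_core, uint64_t lcore_mask,
		  uint64_t service_mask)
{
	if (core_in_mask(irq_core, lcore_mask | service_mask))
		return -EINVAL;
	return 0;
}
```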
  
Bruce Richardson Oct. 21, 2019, 1:56 p.m. UTC | #12
On Mon, Oct 21, 2019 at 02:45:05PM +0100, Varghese, Vipin wrote:
> Hi Bruce,
> 
> snipped
> > >  This ability to have the driver pin the interrupts for the
> > > > user would be a big timesaver for developers too, who may be
> > > > constantly re- running apps when testing.
> > > Here my understanding, user can not or should not pass DPDK cores for
> > interrupt pinning. So should we ask the driver to fetch `rte_eal_configuration`
> > and ensure the same?
> > >
> > 
> > Actually I disagree. I think the user should pass the cores for interrupt pinning,
> I agree to this.
> 
> > because unlike other PMDs it is perfectly valid to have the interrupts pinned to
> > dedicated cores separate from those used by DPDK.
> My point is the same, but not on DPDK DP or service cores.
> 
> > 
> > Or taking another example, suppose the app takes 8 cores in the coremask, but
> > only one of those cores is to be used for I/O, what cores should the driver pin
> > the interrupts to?
> It can be cores on machine (guest or host) which is not used by DPDK.
> 
>  It probably should be the same core used for I/O, but the
> > driver can't know which cores will be for that, or alternatively the user might
> > want to use AF_XDP split across two cores, in which case any core on the
> > system might be the intended one for interrupts.
> I agree to the patch, only difference in dev->probe function, should not there be validation to ensure the IRQ core is not DPDK core or Service core as the Interface is owned by kernel and for non matched eBPF skb buff is used by kernel.
> 
No. Since the 5.4 kernel, it's a usable configuration to run both the
kernel and userspace portions of AF_XDP on the same core. In order to get
best performance with a fixed number of cores, this setup - with interrupts
pinned to the polling RX core - is now recommended. [For absolute best perf
using any number of cores, a separate interrupt core may still work best,
though]
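For the same-core layout described above, the queue_irq arguments simply mirror the application's queue-to-lcore assignment. A minimal sketch that builds the devargs fragment from such an assignment; the rx_lcore array and the helper name are hypothetical, and how an application assigns queues to lcores is outside the scope of this patch.

```c
#include <stdio.h>

/* Build "queue_irq=<q>:<core>,..." so each queue's interrupts land on the
 * lcore that polls it. rx_lcore[q] is the lcore polling ethdev queue q.
 * Returns 0 on success, -1 if buf is too small. */
static int
build_queue_irq_args(char *buf, size_t len, const unsigned int *rx_lcore,
		     unsigned int nb_queues)
{
	size_t off = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';
	for (unsigned int q = 0; q < nb_queues; q++) {
		int n = snprintf(buf + off, len - off, "%squeue_irq=%u:%u",
				 q > 0 ? "," : "", q, rx_lcore[q]);

		if (n < 0 || (size_t)n >= len - off)
			return -1;
		off += (size_t)n;
	}
	return 0;
}
```

With queue 0 polled on lcore 2 and queue 1 on lcore 5 it yields queue_irq=0:2,queue_irq=1:5, the same arguments as the example in the commit message.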

/Bruce
  
Varghese, Vipin Oct. 21, 2019, 2:06 p.m. UTC | #13
Ok thanks

> -----Original Message-----
> From: Bruce Richardson <bruce.richardson@intel.com>
> Sent: Monday, October 21, 2019 7:26 PM
> To: Varghese, Vipin <vipin.varghese@intel.com>
> Cc: Loftus, Ciara <ciara.loftus@intel.com>; 'Stephen Hemminger'
> <stephen@networkplumber.org>; 'dev@dpdk.org' <dev@dpdk.org>; Ye,
> Xiaolong <xiaolong.ye@intel.com>; Laatz, Kevin <kevin.laatz@intel.com>;
> Yigit, Ferruh <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/3] net/af_xdp: support pinning of IRQs
> 
> On Mon, Oct 21, 2019 at 02:45:05PM +0100, Varghese, Vipin wrote:
> > Hi Bruce,
> >
> > snipped
> > > >  This ability to have the driver pin the interrupts for the
> > > > > user would be a big timesaver for developers too, who may be
> > > > > constantly re- running apps when testing.
> > > > Here my understanding, user can not or should not pass DPDK cores
> > > > for
> > > interrupt pinning. So should we ask the driver to fetch
> > > `rte_eal_configuration` and ensure the same?
> > > >
> > >
> > > Actually I disagree. I think the user should pass the cores for
> > > interrupt pinning,
> > I agree to this.
> >
> > > because unlike other PMDs it is perfectly valid to have the
> > > interrupts pinned to dedicated cores separate from those used by DPDK.
> > My point is the same, but not on DPDK DP or service cores.
> >
> > >
> > > Or taking another example, suppose the app takes 8 cores in the
> > > coremask, but only one of those cores is to be used for I/O, what
> > > cores should the driver pin the interrupts to?
> > It can be cores on machine (guest or host) which is not used by DPDK.
> >
> >  It probably should be the same core used for I/O, but the
> > > driver can't know which cores will be for that, or alternatively the
> > > user might want to use AF_XDP split across two cores, in which case
> > > any core on the system might be the intended one for interrupts.
> > I agree to the patch, only difference in dev->probe function, should not there
> be validation to ensure the IRQ core is not DPDK core or Service core as the
> Interface is owned by kernel and for non matched eBPF skb buff is used by
> kernel.
> >
> No. Since the 5.4 kernel, it's a usable configuration to run both the kernel and
> userspace portions of AF_XDP on the same core. In order to get best
> performance with a fixed number of cores, this setup - with interrupts pinned
> to the polling RX core - is now recommended. [For absolute best perf using any
> number of cores, a separate interrupt core may still work best, though]
> 
> /Bruce
  
Ferruh Yigit Oct. 21, 2019, 3:24 p.m. UTC | #14
On 10/14/2019 3:43 PM, Bruce Richardson wrote:
> On Thu, Oct 03, 2019 at 02:23:07PM +0100, Loftus, Ciara wrote:
>>
>>
>>> On Mon, 30 Sep 2019 16:42:04 +0000 Ciara Loftus
>>> <ciara.loftus@intel.com> wrote:
>>>
>>>> +/* drivers supported for the queue_irq option */
>>>> +enum supported_drivers {
>>>> +	I40E_DRIVER,
>>>> +	IXGBE_DRIVER,
>>>> +	MLX5_DRIVER,
>>>> +	NUM_DRIVERS
>>>> +};
>>>
>>> Anything device specific like this raises a red flag to me.
>>>
>>> This regex etc, seems like a huge hack. Is there a better way  using
>>> irqbalance and smp_affinity in kernel drivers?
>>>
>>> NACK
>>
>> Hi Stephen,
>>  
>> Thanks for looking at the patch. I understand your concern however
>> unfortunately I haven't been able to identify a way to achieve the
>> desired outcome by using your suggestions of irqbalance and smp_affinity.
>> Did you have something specific in mind or are aware of any generic way
>> of retrieving interrupt numbers for NICs regardless of vendor or range?
>>  
>> I think this feature is really important for the usability of this PMD.
>> Without it, to configure the IRQs the user has to open up
>> /proc/interrupts, trawl through it and identify the correct IRQ number
>> for their given NIC and qid (the format for which is unlikely to be known
>> off-hand), and manually pin them by writing the appropriate values in the
>> appropriate format to the appropriate file - prone to error if not
>> automated IMO.  If the user fails to set the affinity it's probably fine
>> for a single pmd, however with multiple pmds all irqs will by default
>> land on core 0 and lead to terrible performance.
>>
>> It should be possible to rework the code to remove the regexes and use a
>> direct string compare. Would that make the solution more palatable?
>>  
> 
> Hi Ciara, Stephen,
> 
> is there any way forward on this patch?
> 
> From my experience with using AF_XDP the pinning of interrupts is both
> necessary for performance and sadly rather awkward to implement in
> practice. If we can't find a better way to do this, I think merging this
> patch is the best thing to do. It may be a bit messy, but the overall user
> experience should be far improved over not having it.
> 

Can we have this external to the PMD, like a helper script that runs after you
start the DPDK app?
  
Bruce Richardson Oct. 21, 2019, 3:54 p.m. UTC | #15
On Mon, Oct 21, 2019 at 04:24:26PM +0100, Ferruh Yigit wrote:
> On 10/14/2019 3:43 PM, Bruce Richardson wrote:
> > On Thu, Oct 03, 2019 at 02:23:07PM +0100, Loftus, Ciara wrote:
> >>
> >>
> >>> On Mon, 30 Sep 2019 16:42:04 +0000 Ciara Loftus
> >>> <ciara.loftus@intel.com> wrote:
> >>>
> >>>> +/* drivers supported for the queue_irq option */
> >>>> +enum supported_drivers {
> >>>> +	I40E_DRIVER,
> >>>> +	IXGBE_DRIVER,
> >>>> +	MLX5_DRIVER,
> >>>> +	NUM_DRIVERS
> >>>> +};
> >>>
> >>> Anything device specific like this raises a red flag to me.
> >>>
> >>> This regex etc, seems like a huge hack. Is there a better way  using
> >>> irqbalance and smp_affinity in kernel drivers?
> >>>
> >>> NACK
> >>
> >> Hi Stephen,
> >>  
> >> Thanks for looking at the patch. I understand your concern however
> >> unfortunately I haven't been able to identify a way to achieve the
> >> desired outcome by using your suggestions of irqbalance and smp_affinity.
> >> Did you have something specific in mind or are aware of any generic way
> >> of retrieving interrupt numbers for NICs regardless of vendor or range?
> >>  
> >> I think this feature is really important for the usability of this PMD.
> >> Without it, to configure the IRQs the user has to open up
> >> /proc/interrupts, trawl through it and identify the correct IRQ number
> >> for their given NIC and qid (the format for which is unlikely to be known
> >> off-hand), and manually pin them by writing the appropriate values in the
> >> appropriate format to the appropriate file - prone to error if not
> >> automated IMO.  If the user fails to set the affinity it's probably fine
> >> for a single pmd, however with multiple pmds all irqs will by default
> >> land on core 0 and lead to terrible performance.
> >>
> >> It should be possible to rework the code to remove the regexes and use a
> >> direct string compare. Would that make the solution more palatable?
> >>  
> > 
> > Hi Ciara, Stephen,
> > 
> > is there any way forward on this patch?
> > 
> > From my experience with using AF_XDP the pinning of interrupts is both
> > necessary for performance and sadly rather awkward to implement in
> > practice. If we can't find a better way to do this, I think merging this
> > patch is the best thing to do. It may be a bit messy, but the overall user
> > experience should be far improved over not having it.
> > 
> 
> Can we have this external to the PMD, like a helper script that run after you
> start the DPDK app?

It could be, but the main objection here seems to be with the method used
to find the interrupts, which would not change based on a script version,
not to mention the fact that the usability would be far less in that case.
For ease of use, this needs to be part of the driver, and the current
implementation is the best we can do right now, and nobody has suggested a
concrete alternative to it.

/Bruce
  
Ferruh Yigit Oct. 21, 2019, 4:02 p.m. UTC | #16
On 10/21/2019 4:54 PM, Bruce Richardson wrote:
> On Mon, Oct 21, 2019 at 04:24:26PM +0100, Ferruh Yigit wrote:
>> On 10/14/2019 3:43 PM, Bruce Richardson wrote:
>>> On Thu, Oct 03, 2019 at 02:23:07PM +0100, Loftus, Ciara wrote:
>>>>
>>>>
>>>>> On Mon, 30 Sep 2019 16:42:04 +0000 Ciara Loftus
>>>>> <ciara.loftus@intel.com> wrote:
>>>>>
>>>>>> +/* drivers supported for the queue_irq option */
>>>>>> +enum supported_drivers {
>>>>>> +	I40E_DRIVER,
>>>>>> +	IXGBE_DRIVER,
>>>>>> +	MLX5_DRIVER,
>>>>>> +	NUM_DRIVERS
>>>>>> +};
>>>>>
>>>>> Anything device specific like this raises a red flag to me.
>>>>>
>>>>> This regex etc, seems like a huge hack. Is there a better way  using
>>>>> irqbalance and smp_affinity in kernel drivers?
>>>>>
>>>>> NACK
>>>>
>>>> Hi Stephen,
>>>>  
>>>> Thanks for looking at the patch. I understand your concern; however,
>>>> unfortunately I haven't been able to identify a way to achieve the
>>>> desired outcome by using your suggestions of irqbalance and smp_affinity.
>>>> Did you have something specific in mind, or are you aware of any generic
>>>> way of retrieving interrupt numbers for NICs regardless of vendor or range?
>>>>  
>>>> I think this feature is really important for the usability of this PMD.
>>>> Without it, to configure the IRQs the user has to open up
>>>> /proc/interrupts, trawl through it and identify the correct IRQ number
>>>> for their given NIC and qid (the format for which is unlikely to be known
>>>> off-hand), and manually pin them by writing the appropriate values in the
>>>> appropriate format to the appropriate file - prone to error if not
>>>> automated IMO. If the user fails to set the affinity it's probably fine
>>>> for a single PMD; however, with multiple PMDs all IRQs will by default
>>>> land on core 0 and lead to terrible performance.
>>>>
>>>> It should be possible to rework the code to remove the regexes and use a
>>>> direct string compare. Would that make the solution more palatable?
>>>>  
>>>
>>> Hi Ciara, Stephen,
>>>
>>> is there any way forward on this patch?
>>>
>>> From my experience with using AF_XDP the pinning of interrupts is both
>>> necessary for performance and sadly rather awkward to implement in
>>> practice. If we can't find a better way to do this, I think merging this
>>> patch is the best thing to do. It may be a bit messy, but the overall user
>>> experience should be far improved over not having it.
>>>
>>
>> Can we have this external to the PMD, like a helper script that runs after you
>> start the DPDK app?
> 
> It could be, but the main objection here seems to be with the method used
> to find the interrupts, which would not change based on a script version,
> not to mention the fact that the usability would be far less in that case.
> For ease of use, this needs to be part of the driver, and the current
> implementation is the best we can do right now, and nobody has suggested a
> concrete alternative to it.
> 

The method used to find the interrupts can be the same, but embedding it in a
specific PMD with a hardcoded list of supported drivers is different from
putting this information into a script, I think.

I tend to agree that this looks like a hack in the PMD.

And I am not sure the usability will be far less: instead of providing
parameters to the PMD, you will provide them to the script. And the script
becomes something more generic.
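Whether the option stays in the PMD or moves into a helper as suggested above, the <queue>:<core> value still has to be parsed. A minimal C sketch follows; parse_queue_core() is a hypothetical name, and it deliberately leaves the range checks (queue against the port's queue count, core against the number of online CPUs) to the caller.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "<queue>:<core>", e.g. "1:5".
 * Returns 0 and stores both values on success, -1 on malformed input.
 * Range checks (queue count, online CPUs) are left to the caller.
 */
static int parse_queue_core(const char *value, long *queue, long *core)
{
	char *end;
	long q, c;

	errno = 0;
	q = strtol(value, &end, 10);
	/* Need at least one digit, then the ':' separator. */
	if (end == value || *end != ':' || errno != 0 || q < 0)
		return -1;

	value = end + 1;
	c = strtol(value, &end, 10);
	/* Need at least one digit for the core and nothing after it. */
	if (end == value || *end != '\0' || errno != 0 || c < 0)
		return -1;

	*queue = q;
	*core = c;
	return 0;
}
```

With this approach "1:5" yields queue 1 and core 5, while "0:", "0:2x" and "3" are all rejected. Parsing both numbers with strtol() and checking the end pointer, rather than tokenizing with strtok(), means a missing core field is simply an error.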
  
Bruce Richardson Oct. 21, 2019, 4:14 p.m. UTC | #17
On Mon, Oct 21, 2019 at 05:02:25PM +0100, Ferruh Yigit wrote:
> On 10/21/2019 4:54 PM, Bruce Richardson wrote:
> > On Mon, Oct 21, 2019 at 04:24:26PM +0100, Ferruh Yigit wrote:
> >> Can we have this external to the PMD, like a helper script that runs after you
> >> start the DPDK app?
> > 
> > It could be, but the main objection here seems to be with the method used
> > to find the interrupts, which would not change based on a script version,
> > not to mention the fact that the usability would be far less in that case.
> > For ease of use, this needs to be part of the driver, and the current
> > implementation is the best we can do right now, and nobody has suggested a
> > concrete alternative to it.
> > 
> 
> The method used to find the interrupts can be the same, but embedding it in a
> specific PMD with a hardcoded list of supported drivers is different from
> putting this information into a script, I think.
> 
> I tend to agree that this looks like a hack in the PMD.
> 
We are happy to work on making it less of a hack. However, no better methods
have been forthcoming.

> And I am not sure the usability will be far less: instead of providing
> parameters to the PMD, you will provide them to the script. And the script
> becomes something more generic.

It is unlikely to be generic; it will still be very much tied to the AF_XDP
PMD, I think. A script would be a workable option, except for the fact that
the script has to be run each time *after* the application starts up. This is
very different from e.g. dev_bind, which only needs to be done once, and can
be done ahead of application startup. Using a script in this case means that
the user needs to either start their app in the background or use a separate
terminal session to run the script once app startup has occurred. This also
makes integration into a CI automation system somewhat more complex, I
believe.

Beyond this, this patch solves an immediate, visible user problem,
illustrated with measurable performance data above, while the objections to
it concern possible future maintenance issues which may never arise.

/Bruce
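For reference, the final step of the pinning, whoever performs it, is writing a CPU mask to /proc/irq/<N>/smp_affinity. The C sketch below mirrors the mask construction in the patch's set_irq_affinity() (the hex word for the core's 32-bit group, then one ",00000000" per lower group), but uses an unsigned shift so that core 31 does not shift into the sign bit. format_affinity_mask() is a hypothetical name, and writing the result to /proc still requires sufficient privileges (typically root).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write into `buf` the smp_affinity mask that selects
 * only `core`: the hex word for the core's 32-bit group, followed by one
 * ",00000000" per lower group. Returns 0, or -1 if `buf` is too small.
 */
static int format_affinity_mask(unsigned int core, char *buf, size_t len)
{
	static const char zero_group[] = ",00000000";
	unsigned int i;
	int n = snprintf(buf, len, "%x", 1U << (core % 32));

	if (n < 0 || (size_t)n >= len)
		return -1;

	for (i = 0; i < core / 32; i++) {
		/* sizeof(zero_group) includes the terminating NUL. */
		if (strlen(buf) + sizeof(zero_group) > len)
			return -1;
		strcat(buf, zero_group);
	}
	return 0;
}
```

Core 5 gives "20", core 33 gives "2,00000000" and core 64 gives "1,00000000,00000000".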
  

Patch

diff --git a/doc/guides/nics/af_xdp.rst b/doc/guides/nics/af_xdp.rst
index ec46f08f0..a255ba4e7 100644
--- a/doc/guides/nics/af_xdp.rst
+++ b/doc/guides/nics/af_xdp.rst
@@ -36,6 +36,11 @@  The following options can be provided to set up an af_xdp port in DPDK.
 *   ``start_queue`` - starting netdev queue id (optional, default 0);
 *   ``queue_count`` - total netdev queue number (optional, default 1);
 *   ``pmd_zero_copy`` - enable zero copy or not (optional, default 0);
+*   ``queue_irq`` - pin queue IRQs to the specified core <queue:core>
+    (optional, default no pinning). The queue argument refers to the ethdev
+    queue as opposed to the netdev queue. These values are the same unless a
+    value greater than 0 is specified for start_queue. Supported for the
+    ixgbe, i40e and mlx5 drivers;
 
 Prerequisites
 -------------
@@ -57,3 +62,13 @@  The following example will set up an af_xdp interface in DPDK:
 .. code-block:: console
 
     --vdev net_af_xdp,iface=ens786f1
+
+Pin queue IRQs
+--------------
+The following example will pin queue 0 interrupts to core 2 and queue 1
+interrupts to core 5.
+
+.. code-block:: console
+
+      --vdev=net_af_xdp1,iface=eth0,queue_count=2,
+               queue_irq=0:2,queue_irq=1:5
diff --git a/doc/guides/rel_notes/release_19_11.rst b/doc/guides/rel_notes/release_19_11.rst
index 27cfbd9e3..06bf57c42 100644
--- a/doc/guides/rel_notes/release_19_11.rst
+++ b/doc/guides/rel_notes/release_19_11.rst
@@ -56,6 +56,13 @@  New Features
      Also, make sure to start the actual text at the margin.
      =========================================================
 
+* **Updated the AF_XDP PMD.**
+
+  Updated the AF_XDP PMD. The new features include:
+
+  * Support for pinning netdev queue IRQs to cores specified by the user.
+    Available for ixgbe, i40e and mlx5 drivers.
+
 
 Removed Items
 -------------
diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
index e496e9aaa..dfb2845cb 100644
--- a/drivers/net/af_xdp/rte_eth_af_xdp.c
+++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
@@ -3,6 +3,7 @@ 
  */
 #include <unistd.h>
 #include <errno.h>
+#include <regex.h>
 #include <stdlib.h>
 #include <string.h>
 #include <poll.h>
@@ -10,6 +11,7 @@ 
 #include <net/if.h>
 #include <sys/socket.h>
 #include <sys/ioctl.h>
+#include <sys/sysinfo.h>
 #include <linux/if_ether.h>
 #include <linux/if_xdp.h>
 #include <linux/if_link.h>
@@ -17,6 +19,8 @@ 
 #include <linux/sockios.h>
 #include "af_xdp_deps.h"
 #include <bpf/xsk.h>
+#include <sys/stat.h>
+#include <libgen.h>
 
 #include <rte_ethdev.h>
 #include <rte_ethdev_driver.h>
@@ -116,6 +120,7 @@  struct pmd_internals {
 	int queue_cnt;
 	int max_queue_cnt;
 	int combined_queue_cnt;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
 
 	int pmd_zc;
 	struct rte_ether_addr eth_addr;
@@ -128,12 +133,14 @@  struct pmd_internals {
 #define ETH_AF_XDP_START_QUEUE_ARG		"start_queue"
 #define ETH_AF_XDP_QUEUE_COUNT_ARG		"queue_count"
 #define ETH_AF_XDP_PMD_ZC_ARG			"pmd_zero_copy"
+#define ETH_AF_XDP_QUEUE_IRQ_ARG		"queue_irq"
 
 static const char * const valid_arguments[] = {
 	ETH_AF_XDP_IFACE_ARG,
 	ETH_AF_XDP_START_QUEUE_ARG,
 	ETH_AF_XDP_QUEUE_COUNT_ARG,
 	ETH_AF_XDP_PMD_ZC_ARG,
+	ETH_AF_XDP_QUEUE_IRQ_ARG,
 	NULL
 };
 
@@ -144,6 +151,26 @@  static const struct rte_eth_link pmd_link = {
 	.link_autoneg = ETH_LINK_AUTONEG
 };
 
+/* drivers supported for the queue_irq option */
+enum supported_drivers {
+	I40E_DRIVER,
+	IXGBE_DRIVER,
+	MLX5_DRIVER,
+	NUM_DRIVERS
+};
+const char driver_array[NUM_DRIVERS][NAME_MAX] = {"i40e", "ixgbe", "mlx5_core"};
+
+/*
+ * function pointer template to be implemented for each driver in 'driver_array'
+ * to generate the appropriate regular expression to search for in
+ * /proc/interrupts in order to identify the IRQ number for the netdev_qid of
+ * the given interface.
+ */
+typedef
+int (*generate_driver_regex_func)(struct pmd_internals *internals,
+				  uint16_t netdev_qid,
+				  char *iface_regex_str);
+
 static inline int
 reserve_fill_queue(struct xsk_umem_info *umem, uint16_t reserve_size)
 {
@@ -660,6 +687,287 @@  xsk_configure(struct pmd_internals *internals, struct pkt_rx_queue *rxq,
 	return ret;
 }
 
+/** get interface's driver name to determine /proc/interrupts entry format */
+static int
+get_driver_name(struct pmd_internals *internals, char *driver)
+{
+	char driver_path[PATH_MAX];
+	struct stat s;
+	char link[PATH_MAX];
+	int len;
+
+	snprintf(driver_path, sizeof(driver_path),
+			"/sys/class/net/%s/device/driver", internals->if_name);
+	if (lstat(driver_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+					driver_path, strerror(errno));
+		return -errno;
+	}
+
+	/* driver_path should link to /sys/bus/pci/drivers/<driver_name> */
+	len = readlink(driver_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+					driver_path, strerror(errno));
+		return -errno;
+	}
+
+	link[len] = '\0';
+	strlcpy(driver, basename(link), NAME_MAX);
+	if (!strncmp(driver, ".", strlen(driver))) {
+		AF_XDP_LOG(ERR, "Error getting driver name from %s: %s\n",
+					link, strerror(errno));
+		return -errno;
+	}
+
+	return 0;
+}
+
+static int
+generate_ixgbe_i40e_regex(struct pmd_internals *internals, uint16_t netdev_qid,
+			  char *iface_regex_str)
+{
+	if (snprintf(iface_regex_str, 128,
+			"-%s.*-%d", internals->if_name, netdev_qid) >= 128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+					internals->if_name, netdev_qid);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+generate_mlx5_regex(struct pmd_internals *internals, uint16_t netdev_qid,
+		    char *iface_regex_str)
+{
+	char pci_path[PATH_MAX];
+	char *pci;
+	int ret = -1;
+	struct stat s;
+	char *link;
+	int len;
+
+	snprintf(pci_path, sizeof(pci_path),
+			"/sys/class/net/%s/device", internals->if_name);
+	if (lstat(pci_path, &s)) {
+		AF_XDP_LOG(ERR, "Error reading %s: %s\n",
+					pci_path, strerror(errno));
+		return -errno;
+	}
+
+	/* pci_path should link to a directory whose name is the pci addr */
+	link = malloc(PATH_MAX);
+	len = readlink(pci_path, link, PATH_MAX - 1);
+	if (len == -1) {
+		AF_XDP_LOG(ERR, "Error reading symbolic link %s: %s\n",
+					pci_path, strerror(errno));
+		ret = -errno;
+		goto out;
+	}
+
+	link[len] = '\0';
+	pci = basename(link);
+	if (!strncmp(pci, ".", strlen(pci))) {
+		AF_XDP_LOG(ERR, "Error getting pci from %s\n", link);
+		goto out;
+	}
+
+	if (snprintf(iface_regex_str, 128, ".*p%i@pci:%s", netdev_qid, pci) >=
+			128) {
+		AF_XDP_LOG(INFO, "Cannot get interrupt for %s q %i\n",
+					internals->if_name, netdev_qid);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	if (link)
+		free(link);
+
+	return ret;
+}
+
+/*
+ * array of handlers for different drivers for generating appropriate regex
+ * format for searching /proc/interrupts
+ */
+generate_driver_regex_func driver_handlers[NUM_DRIVERS] = {
+					generate_ixgbe_i40e_regex,
+					generate_ixgbe_i40e_regex,
+					generate_mlx5_regex};
+
+/*
+ * function for getting the index into driver_handlers array that corresponds
+ * to 'driver'
+ */
+static int
+get_driver_idx(const char *driver)
+{
+	for (int i = 0; i < NUM_DRIVERS; i++) {
+		if (strncmp(driver, driver_array[i], strlen(driver_array[i])))
+			continue;
+		return i;
+	}
+
+	return -1;
+}
+
+/** generate /proc/interrupts search regex based on driver type */
+static int
+generate_search_regex(struct pmd_internals *internals, const char *driver,
+		      uint16_t netdev_qid, regex_t *r)
+{
+	char iface_regex_str[128];
+	int ret = -1;
+	int idx = get_driver_idx(driver);
+
+	if (idx == -1 || driver_handlers[idx] == NULL) {
+		AF_XDP_LOG(ERR, "Error getting driver handler for %s\n",
+					internals->if_name);
+		goto out;
+	}
+
+	if (driver_handlers[idx](internals, netdev_qid, iface_regex_str)) {
+		AF_XDP_LOG(ERR, "Error getting regex string for %s\n",
+					internals->if_name);
+		goto out;
+	}
+
+	if (regcomp(r, iface_regex_str, 0)) {
+		AF_XDP_LOG(ERR, "Error computing regex %s\n", iface_regex_str);
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	return ret;
+}
+
+/** get interrupt number associated with the given interface qid */
+static int
+get_interrupt_number(struct pmd_internals *internals, regex_t *r,
+		     int *interrupt)
+{
+	FILE *f_int_proc;
+	int found = 0;
+	char line[4096];
+	int ret = 0;
+
+	f_int_proc = fopen("/proc/interrupts", "r");
+	if (f_int_proc == NULL) {
+		AF_XDP_LOG(ERR, "Failed to open /proc/interrupts.\n");
+		return -1;
+	}
+
+	while (!feof(f_int_proc) && !found) {
+		/* Make sure to read a full line at a time */
+		if (fgets(line, sizeof(line), f_int_proc) == NULL ||
+				line[strlen(line) - 1] != '\n') {
+			AF_XDP_LOG(ERR, "Error reading from interrupts file\n");
+			ret = -1;
+			break;
+		}
+
+		/* Extract interrupt number from line */
+		if (regexec(r, line, 0, NULL, 0) == 0) {
+			*interrupt = atoi(line);
+			found = true;
+			AF_XDP_LOG(INFO, "Got interrupt %d for %s\n",
+						*interrupt, internals->if_name);
+		}
+	}
+
+	fclose(f_int_proc);
+
+	return ret;
+}
+
+/** affinitise interrupts for the given qid to the given coreid */
+static int
+set_irq_affinity(struct pmd_internals *internals, int coreid,
+		 uint16_t rx_queue_id, uint16_t netdev_qid, int interrupt)
+{
+	char bitmask[128];
+	char smp_affinity_filename[NAME_MAX];
+	FILE *f_int_smp_affinity;
+	int i, ret = 0;
+
+	/* Create affinity bitmask. Every 32 bits are separated by a comma */
+	snprintf(bitmask, sizeof(bitmask), "%x", 1U << (coreid % 32));
+	for (i = 0; i < coreid / 32; i++)
+		strlcat(bitmask, ",00000000", sizeof(bitmask));
+
+	/* Write the new affinity bitmask */
+	snprintf(smp_affinity_filename, sizeof(smp_affinity_filename),
+			"/proc/irq/%d/smp_affinity", interrupt);
+	f_int_smp_affinity = fopen(smp_affinity_filename, "w");
+	if (f_int_smp_affinity == NULL) {
+		AF_XDP_LOG(ERR, "Error opening %s\n", smp_affinity_filename);
+		return -1;
+	}
+	if (fwrite(bitmask, strlen(bitmask), 1, f_int_smp_affinity) != 1) {
+		AF_XDP_LOG(ERR, "Error writing to %s\n", smp_affinity_filename);
+		ret = -1;
+		goto out;
+	}
+
+	AF_XDP_LOG(INFO, "IRQs for %s ethdev queue %i (netdev queue %i)"
+				" affinitised to core %i\n",
+				internals->if_name, rx_queue_id,
+				netdev_qid, coreid);
+out:
+	fclose(f_int_smp_affinity);
+
+	return ret;
+}
+
+static void
+configure_irqs(struct pmd_internals *internals, uint16_t rx_queue_id)
+{
+	int coreid = internals->queue_irqs[rx_queue_id];
+	char driver[NAME_MAX];
+	uint16_t netdev_qid = rx_queue_id + internals->start_queue_idx;
+	regex_t r;
+	int interrupt;
+
+	if (coreid < 0)
+		return;
+
+	if (coreid > (get_nprocs() - 1)) {
+		AF_XDP_LOG(ERR, "Affinitisation failed - invalid coreid %i\n",
+					coreid);
+		return;
+	}
+
+	if (get_driver_name(internals, driver)) {
+		AF_XDP_LOG(ERR, "Error retrieving driver name for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (generate_search_regex(internals, driver, netdev_qid, &r)) {
+		AF_XDP_LOG(ERR, "Error generating search regex for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (get_interrupt_number(internals, &r, &interrupt)) {
+		AF_XDP_LOG(ERR, "Error getting interrupt number for %s\n",
+					internals->if_name);
+		return;
+	}
+
+	if (set_irq_affinity(internals, coreid, rx_queue_id, netdev_qid,
+				interrupt)) {
+		AF_XDP_LOG(ERR, "Error setting interrupt affinity for %s\n",
+					internals->if_name);
+		return;
+	}
+}
+
 static int
 eth_rx_queue_setup(struct rte_eth_dev *dev,
 		   uint16_t rx_queue_id,
@@ -697,6 +1005,8 @@  eth_rx_queue_setup(struct rte_eth_dev *dev,
 		goto err;
 	}
 
+	configure_irqs(internals, rx_queue_id);
+
 	rxq->fds[0].fd = xsk_socket__fd(rxq->xsk);
 	rxq->fds[0].events = POLLIN;
 
@@ -834,6 +1144,39 @@  parse_name_arg(const char *key __rte_unused,
 	return 0;
 }
 
+/** parse queue irq argument */
+static int
+parse_queue_irq_arg(const char *key __rte_unused,
+		   const char *value, void *extra_args)
+{
+	int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT] = extra_args;
+	char *parse_str = strdup(value);
+	char delimiter[] = ":";
+	char *queue_str;
+
+	queue_str = strtok(parse_str, delimiter);
+	if (queue_str != NULL && strncmp(queue_str, value, strlen(value))) {
+		char *end;
+		long queue = strtol(queue_str, &end, 10);
+
+		if (*end == '\0' && queue >= 0 &&
+				queue < RTE_MAX_QUEUES_PER_PORT) {
+			char *core_str = strtok(NULL, delimiter);
+			long core = core_str ? strtol(core_str, &end, 10) : -1;
+
+			if (*end == '\0' && core >= 0 && core < get_nprocs()) {
+				(*queue_irqs)[queue] = core;
+				free(parse_str);
+				return 0;
+			}
+		}
+	}
+
+	AF_XDP_LOG(ERR, "Invalid queue_irq argument.\n");
+	free(parse_str);
+	return -1;
+}
+
 static int
 xdp_get_channels_info(const char *if_name, int *max_queues,
 				int *combined_queues)
@@ -877,7 +1220,8 @@  xdp_get_channels_info(const char *if_name, int *max_queues,
 
 static int
 parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
-			int *queue_cnt, int *pmd_zc)
+			int *queue_cnt, int *pmd_zc,
+			int (*queue_irqs)[RTE_MAX_QUEUES_PER_PORT])
 {
 	int ret;
 
@@ -903,6 +1247,11 @@  parse_parameters(struct rte_kvargs *kvlist, char *if_name, int *start_queue,
 	if (ret < 0)
 		goto free_kvlist;
 
+	ret = rte_kvargs_process(kvlist, ETH_AF_XDP_QUEUE_IRQ_ARG,
+				 &parse_queue_irq_arg, queue_irqs);
+	if (ret < 0)
+		goto free_kvlist;
+
 free_kvlist:
 	rte_kvargs_free(kvlist);
 	return ret;
@@ -940,7 +1289,8 @@  get_iface_info(const char *if_name,
 
 static struct rte_eth_dev *
 init_internals(struct rte_vdev_device *dev, const char *if_name,
-			int start_queue_idx, int queue_cnt, int pmd_zc)
+			int start_queue_idx, int queue_cnt, int pmd_zc,
+			int queue_irqs[RTE_MAX_QUEUES_PER_PORT])
 {
 	const char *name = rte_vdev_device_name(dev);
 	const unsigned int numa_node = dev->device.numa_node;
@@ -957,6 +1307,8 @@  init_internals(struct rte_vdev_device *dev, const char *if_name,
 	internals->queue_cnt = queue_cnt;
 	internals->pmd_zc = pmd_zc;
 	strlcpy(internals->if_name, if_name, IFNAMSIZ);
+	rte_memcpy(internals->queue_irqs, queue_irqs,
+			sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
 
 	if (xdp_get_channels_info(if_name, &internals->max_queue_cnt,
 				  &internals->combined_queue_cnt)) {
@@ -1035,6 +1387,9 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	struct rte_eth_dev *eth_dev = NULL;
 	const char *name;
 	int pmd_zc = 0;
+	int queue_irqs[RTE_MAX_QUEUES_PER_PORT];
+
+	memset(queue_irqs, -1, sizeof(int) * RTE_MAX_QUEUES_PER_PORT);
 
 	AF_XDP_LOG(INFO, "Initializing pmd_af_xdp for %s\n",
 		rte_vdev_device_name(dev));
@@ -1062,7 +1417,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 		dev->device.numa_node = rte_socket_id();
 
 	if (parse_parameters(kvlist, if_name, &xsk_start_queue_idx,
-			     &xsk_queue_cnt, &pmd_zc) < 0) {
+			     &xsk_queue_cnt, &pmd_zc, &queue_irqs) < 0) {
 		AF_XDP_LOG(ERR, "Invalid kvargs value\n");
 		return -EINVAL;
 	}
@@ -1073,7 +1428,7 @@  rte_pmd_af_xdp_probe(struct rte_vdev_device *dev)
 	}
 
 	eth_dev = init_internals(dev, if_name, xsk_start_queue_idx,
-					xsk_queue_cnt, pmd_zc);
+					xsk_queue_cnt, pmd_zc, queue_irqs);
 	if (eth_dev == NULL) {
 		AF_XDP_LOG(ERR, "Failed to init internals\n");
 		return -1;
@@ -1117,7 +1472,8 @@  RTE_PMD_REGISTER_PARAM_STRING(net_af_xdp,
 			      "iface=<string> "
 			      "start_queue=<int> "
 			      "queue_count=<int> "
-			      "pmd_zero_copy=<0|1>");
+			      "pmd_zero_copy=<0|1> "
+			      "queue_irq=<int>:<int>");
 
 RTE_INIT(af_xdp_init_log)
 {