[dpdk-dev] net/mlx5: fix number of segment calculation

Message ID 1510243472-24872-1-git-send-email-orika@mellanox.com (mailing list archive)
State Not Applicable, archived
Delegated to: Ferruh Yigit
Headers

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Ori Kam Nov. 9, 2017, 4:04 p.m. UTC
  The CRC size should be taken into consideration when computing
the number of mbuf segments for packet on the receive path.
Large packets can be dropped due to extra CRC length.

Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
Cc: stable@dpdk.org
Cc: nelio.laranjeiro@6wind.com

Signed-off-by: Ori Kam <orika@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxq.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)
  

Comments

Yongseok Koh Nov. 9, 2017, 10:30 p.m. UTC | #1
On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> The CRC size should be taken into consideration when computing
> the number of mbuf segments for packet on the receive path.
> Large packets can be dropped due to extra CRC length.
> 
> Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
> Cc: stable@dpdk.org
> Cc: nelio.laranjeiro@6wind.com
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_rxq.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 6b29aae..701925b 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -887,6 +887,8 @@ struct mlx5_rxq_ctrl*
>  	const uint16_t desc_n =
>  		desc + priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP;
>  	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
> +	uint8_t crc_size =
> +			!!(dev->data->dev_conf.rxmode.hw_strip_crc == 0) << 2;

How about making it more explicit with ETHER_CRC_LEN? E.g.
	uint8_t crc_size = ETHER_CRC_LEN * 
			   (dev->data->dev_conf.rxmode.hw_strip_crc == 0);

>  
>  	tmpl = rte_calloc_socket("RXQ", 1,
>  				 sizeof(*tmpl) +
> @@ -900,12 +902,13 @@ struct mlx5_rxq_ctrl*
>  	/* Enable scattered packets support for this queue if necessary. */
>  	assert(mb_len >= RTE_PKTMBUF_HEADROOM);

You might want to make the same change for this assert?

>  	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
> -	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
> +	    (mb_len - RTE_PKTMBUF_HEADROOM - crc_size)) {
>  		tmpl->rxq.sges_n = 0;
>  	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
>  		unsigned int size =
>  			RTE_PKTMBUF_HEADROOM +
> -			dev->data->dev_conf.rxmode.max_rx_pkt_len;
> +			dev->data->dev_conf.rxmode.max_rx_pkt_len +
> +			crc_size;

I think there's another bugs we didn't know. If scatter is required,
RTE_PKTMBUF_HEADROOM is also reserved per every chained mbufs. So, it looks like
mb_len should be "rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM" when it
is declared in the beginning. Make sense?

> 		/*
> 		 * Determine the number of SGEs needed for a full packet
> 		 * and round it to the next power of two.
> 		 */
> 		sges_n = log2above((size / mb_len) + !!(size % mb_len));
> 		tmpl->rxq.sges_n = sges_n;

rxq.sges_n is 2bits, which means the max value is 3. So, if sges_n is larger
than 3, it would just take the last 2bits and it will result in false error
below. As we can't use sizeof() for bit-fields, this should be changed like:
		
 		/* Check the maximum value of the bit-field. */
 		tmpl->rxq.sges_n = -1;
 		tmpl->rxq.sges_n = RTE_MIN(tmpl->rxq.sges_n, sges_n);

> 		/* Make sure rxq.sges_n did not overflow. */
> 		size = mb_len * (1 << tmpl->rxq.sges_n);
> 		size -= RTE_PKTMBUF_HEADROOM;
> 		if (size < dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> 			ERROR("%p: too many SGEs (%u) needed to handle"
> 			      " requested maximum packet size %u",
> 			      (void *)dev,
> 			      1 << sges_n,
> 			      dev->data->dev_conf.rxmode.max_rx_pkt_len);
> 			goto error;
> 		}

This may be unnecessary if we make right changes?


Thanks,
Yongseok
  
Adrien Mazarguil Nov. 10, 2017, 10:06 a.m. UTC | #2
Hi Ori,

On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> The CRC size should be taken into consideration when computing
> the number of mbuf segments for packet on the receive path.
> Large packets can be dropped due to extra CRC length.
> 
> Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
> Cc: stable@dpdk.org
> Cc: nelio.laranjeiro@6wind.com
> 
> Signed-off-by: Ori Kam <orika@mellanox.com>

I don't think there's an issue to fix, there's actually a reason it's done
that way, perhaps I'm wrong but let me elaborate.

When applications request CRC to be written to mbuf (more precisely not to
be stripped), its extra 4 bytes are neither part of mbuf->pkt_len nor
mbuf->data_len. It just happens to be written past mbuf data if there's room
for it, where applications knowingly expect it based on how they configured
the PMD. That's the API.

This implies applications also size mbufs accordingly; if they don't provide
room for the CRC, it can't be written. This extra room is assumed to be part
of max_rx_pkt_len. When CRC stripping is requested, they do not have to
provide such room (IBV_WQ_FLAGS_SCATTER_FCS is not set on mlx5 Rx queues).

One problem with your proposal is assuming all segments are consumed
entirely during Rx and max_rx_pkt_len is reached, another segment with zero
data length gets appended just to hold the CRC. Applications may interpret
this as a bug.

Another problem is this doesn't solve the issue when Rx scatter is disabled
although it's no different from when packet data consumes all segments
entirely and there's no room left for the CRC. If it's that important, the
PMD should fail to create the Rx queue in that case as well.

> ---
>  drivers/net/mlx5/mlx5_rxq.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> index 6b29aae..701925b 100644
> --- a/drivers/net/mlx5/mlx5_rxq.c
> +++ b/drivers/net/mlx5/mlx5_rxq.c
> @@ -887,6 +887,8 @@ struct mlx5_rxq_ctrl*
>  	const uint16_t desc_n =
>  		desc + priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP;
>  	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
> +	uint8_t crc_size =
> +			!!(dev->data->dev_conf.rxmode.hw_strip_crc == 0) << 2;
>  
>  	tmpl = rte_calloc_socket("RXQ", 1,
>  				 sizeof(*tmpl) +
> @@ -900,12 +902,13 @@ struct mlx5_rxq_ctrl*
>  	/* Enable scattered packets support for this queue if necessary. */
>  	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
>  	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
> -	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
> +	    (mb_len - RTE_PKTMBUF_HEADROOM - crc_size)) {
>  		tmpl->rxq.sges_n = 0;
>  	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
>  		unsigned int size =
>  			RTE_PKTMBUF_HEADROOM +
> -			dev->data->dev_conf.rxmode.max_rx_pkt_len;
> +			dev->data->dev_conf.rxmode.max_rx_pkt_len +
> +			crc_size;
>  		unsigned int sges_n;
>  
>  		/*
> -- 
> 1.7.1
>
  
Adrien Mazarguil Nov. 10, 2017, 10:22 a.m. UTC | #3
Hi Yongseok,

On Thu, Nov 09, 2017 at 02:30:30PM -0800, Yongseok Koh wrote:
> On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> > The CRC size should be taken into consideration when computing
> > the number of mbuf segments for packet on the receive path.
> > Large packets can be dropped due to extra CRC length.
> > 
> > Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
> > Cc: stable@dpdk.org
> > Cc: nelio.laranjeiro@6wind.com
> > 
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> > ---
> >  drivers/net/mlx5/mlx5_rxq.c |    7 +++++--
> >  1 files changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > index 6b29aae..701925b 100644
> > --- a/drivers/net/mlx5/mlx5_rxq.c
> > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > @@ -887,6 +887,8 @@ struct mlx5_rxq_ctrl*
> >  	const uint16_t desc_n =
> >  		desc + priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP;
> >  	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
> > +	uint8_t crc_size =
> > +			!!(dev->data->dev_conf.rxmode.hw_strip_crc == 0) << 2;
> 
> How about making it more explicit with ETHER_CRC_LEN? E.g.
> 	uint8_t crc_size = ETHER_CRC_LEN * 
> 			   (dev->data->dev_conf.rxmode.hw_strip_crc == 0);
> 
> >  
> >  	tmpl = rte_calloc_socket("RXQ", 1,
> >  				 sizeof(*tmpl) +
> > @@ -900,12 +902,13 @@ struct mlx5_rxq_ctrl*
> >  	/* Enable scattered packets support for this queue if necessary. */
> >  	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
> 
> You might want to make the same change for this assert?
> 
> >  	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
> > -	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
> > +	    (mb_len - RTE_PKTMBUF_HEADROOM - crc_size)) {
> >  		tmpl->rxq.sges_n = 0;
> >  	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
> >  		unsigned int size =
> >  			RTE_PKTMBUF_HEADROOM +
> > -			dev->data->dev_conf.rxmode.max_rx_pkt_len;
> > +			dev->data->dev_conf.rxmode.max_rx_pkt_len +
> > +			crc_size;
> 
> I think there's another bugs we didn't know. If scatter is required,
> RTE_PKTMBUF_HEADROOM is also reserved per every chained mbufs. So, it looks like
> mb_len should be "rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM" when it
> is declared in the beginning. Make sense?

RTE_PKTMBUF_HEADROOM is actually only reserved on the first segment,
i.e. once per mbuf chain, it should be fine.

> > 		/*
> > 		 * Determine the number of SGEs needed for a full packet
> > 		 * and round it to the next power of two.
> > 		 */
> > 		sges_n = log2above((size / mb_len) + !!(size % mb_len));
> > 		tmpl->rxq.sges_n = sges_n;
> 
> rxq.sges_n is 2bits, which means the max value is 3. So, if sges_n is larger
> than 3, it would just take the last 2bits and it will result in false error
> below. As we can't use sizeof() for bit-fields, this should be changed like:

The name is perhaps confusing, sges_n is documented as a log 2 value, 1 << 3
means 8 segments at most. Assuming default mbuf size, this allows up to
17280 bytes per packet excluding headroom.

You're right exceeding 3 will remove the extra bits and since sizeof() can't
be used, that's precisely the reason for the subsequent check, which makes
sure the stored value is enough for a max_rx_pkt_len-sized packet after
converting it back to a number of bytes.

> 		
>  		/* Check the maximum value of the bit-field. */
>  		tmpl->rxq.sges_n = -1;
>  		tmpl->rxq.sges_n = RTE_MIN(tmpl->rxq.sges_n, sges_n);
> 
> > 		/* Make sure rxq.sges_n did not overflow. */
> > 		size = mb_len * (1 << tmpl->rxq.sges_n);
> > 		size -= RTE_PKTMBUF_HEADROOM;
> > 		if (size < dev->data->dev_conf.rxmode.max_rx_pkt_len) {
> > 			ERROR("%p: too many SGEs (%u) needed to handle"
> > 			      " requested maximum packet size %u",
> > 			      (void *)dev,
> > 			      1 << sges_n,
> > 			      dev->data->dev_conf.rxmode.max_rx_pkt_len);
> > 			goto error;
> > 		}
> 
> This may be unnecessary if we make right changes?

I think it has to be kept as a safety check even if the max number of SGEs
is increased, at least as long as it's stored as a bit-field value.
  
Yongseok Koh Nov. 10, 2017, 9:22 p.m. UTC | #4
On Fri, Nov 10, 2017 at 11:06:25AM +0100, Adrien Mazarguil wrote:
> Hi Ori,
> 
> On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> > The CRC size should be taken into consideration when computing
> > the number of mbuf segments for packet on the receive path.
> > Large packets can be dropped due to extra CRC length.
> > 
> > Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
> > Cc: stable@dpdk.org
> > Cc: nelio.laranjeiro@6wind.com
> > 
> > Signed-off-by: Ori Kam <orika@mellanox.com>
> 
> I don't think there's an issue to fix, there's actually a reason it's done
> that way, perhaps I'm wrong but let me elaborate.
> 
> When applications request CRC to be written to mbuf (more precisely not to
> be stripped), its extra 4 bytes are neither part of mbuf->pkt_len nor
> mbuf->data_len. It just happens to be written past mbuf data if there's room
> for it, where applications knowingly expect it based on how they configured
> the PMD. That's the API.
>
> This implies applications also size mbufs accordingly; if they don't provide
> room for the CRC, it can't be written. This extra room is assumed to be part
> of max_rx_pkt_len. When CRC stripping is requested, they do not have to
> provide such room (IBV_WQ_FLAGS_SCATTER_FCS is not set on mlx5 Rx queues).

I looked around other driver/example codes as it is not documented (or too
obvious to do?), it looks there's consensus that max_rx_pkt_len includes 4B FCS.
Then, I agree that PMD doesn't need to care about this.

> One problem with your proposal is assuming all segments are consumed
> entirely during Rx and max_rx_pkt_len is reached, another segment with zero
> data length gets appended just to hold the CRC. Applications may interpret
> this as a bug.

I don't think this patch causes the issue. It just unnecessarily reserves extra
4B room if CRC strip is disabled. And even apps should not interpret this as a
bug because apps requested to have CRC.
Currently mlx5_rx_busrt() doesn't allow this situation (putting only 4B CRC in
the last segment) because it subtracts ETHER_CRC_LEN from pkt_len if CRC isn't
stripped. And it is done before looking for the next segment. I think this is a
problem to fix on the contrary - app wanted to see CRC but it's not there.
Right?

Thanks,
Yongseok
  
Yongseok Koh Nov. 10, 2017, 9:42 p.m. UTC | #5
On Fri, Nov 10, 2017 at 11:22:06AM +0100, Adrien Mazarguil wrote:
> Hi Yongseok,
> 
> On Thu, Nov 09, 2017 at 02:30:30PM -0800, Yongseok Koh wrote:
> > On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> > > The CRC size should be taken into consideration when computing
> > > the number of mbuf segments for packet on the receive path.
> > > Large packets can be dropped due to extra CRC length.
> > > 
> > > Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx queues")
> > > Cc: stable@dpdk.org
> > > Cc: nelio.laranjeiro@6wind.com
> > > 
> > > Signed-off-by: Ori Kam <orika@mellanox.com>
> > > ---
> > >  drivers/net/mlx5/mlx5_rxq.c |    7 +++++--
> > >  1 files changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
> > > index 6b29aae..701925b 100644
> > > --- a/drivers/net/mlx5/mlx5_rxq.c
> > > +++ b/drivers/net/mlx5/mlx5_rxq.c
> > > @@ -887,6 +887,8 @@ struct mlx5_rxq_ctrl*
> > >  	const uint16_t desc_n =
> > >  		desc + priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP;
> > >  	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
> > > +	uint8_t crc_size =
> > > +			!!(dev->data->dev_conf.rxmode.hw_strip_crc == 0) << 2;
> > 
> > How about making it more explicit with ETHER_CRC_LEN? E.g.
> > 	uint8_t crc_size = ETHER_CRC_LEN * 
> > 			   (dev->data->dev_conf.rxmode.hw_strip_crc == 0);
> > 
> > >  
> > >  	tmpl = rte_calloc_socket("RXQ", 1,
> > >  				 sizeof(*tmpl) +
> > > @@ -900,12 +902,13 @@ struct mlx5_rxq_ctrl*
> > >  	/* Enable scattered packets support for this queue if necessary. */
> > >  	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
> > 
> > You might want to make the same change for this assert?
> > 
> > >  	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
> > > -	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
> > > +	    (mb_len - RTE_PKTMBUF_HEADROOM - crc_size)) {
> > >  		tmpl->rxq.sges_n = 0;
> > >  	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
> > >  		unsigned int size =
> > >  			RTE_PKTMBUF_HEADROOM +
> > > -			dev->data->dev_conf.rxmode.max_rx_pkt_len;
> > > +			dev->data->dev_conf.rxmode.max_rx_pkt_len +
> > > +			crc_size;
> > 
> > I think there's another bugs we didn't know. If scatter is required,
> > RTE_PKTMBUF_HEADROOM is also reserved per every chained mbufs. So, it looks like
> > mb_len should be "rte_pktmbuf_data_room_size(mp) - RTE_PKTMBUF_HEADROOM" when it
> > is declared in the beginning. Make sense?
> 
> RTE_PKTMBUF_HEADROOM is actually only reserved on the first segment,
> i.e. once per mbuf chain, it should be fine.

Right, I got confused with Tx. mlx5's Rx overwrites m->data_offset when it
injects mbufs for extra segments.

> > > 		/*
> > > 		 * Determine the number of SGEs needed for a full packet
> > > 		 * and round it to the next power of two.
> > > 		 */
> > > 		sges_n = log2above((size / mb_len) + !!(size % mb_len));
> > > 		tmpl->rxq.sges_n = sges_n;
> > 
> > rxq.sges_n is 2bits, which means the max value is 3. So, if sges_n is larger
> > than 3, it would just take the last 2bits and it will result in false error
> > below. As we can't use sizeof() for bit-fields, this should be changed like:
> 
> The name is perhaps confusing, sges_n is documented as a log 2 value, 1 << 3
> means 8 segments at most. Assuming default mbuf size, this allows up to
> 17280 bytes per packet excluding headroom.
> 
> You're right exceeding 3 will remove the extra bits and since sizeof() can't
> be used, that's precisely the reason for the subsequent check, which makes
> sure the stored value is enough for a max_rx_pkt_len-sized packet after
> converting it back to a number of bytes.

The name wasn't confusing, I wanted to make it clearer as I thought it could
have some false negatives. But, I misread something. The sanity check can
correctly filter those cases. No bug here!

Thanks,
Yongseok
  
Ori Kam Nov. 12, 2017, 7:08 a.m. UTC | #6
Hi,

Thanks Adrien and Yongseok for your review.

Due to Adrien comment I'm withdrawing the patch.


Thanks again,
Ori Kam

> -----Original Message-----
> From: Yongseok Koh
> Sent: Friday, November 10, 2017 11:23 PM
> To: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Cc: Ori Kam <orika@mellanox.com>; Nélio Laranjeiro
> <nelio.laranjeiro@6wind.com>; dev@dpdk.org; stable@dpdk.org
> Subject: Re: [PATCH] net/mlx5: fix number of segment calculation
> 
> On Fri, Nov 10, 2017 at 11:06:25AM +0100, Adrien Mazarguil wrote:
> > Hi Ori,
> >
> > On Thu, Nov 09, 2017 at 06:04:32PM +0200, Ori Kam wrote:
> > > The CRC size should be taken into consideration when computing the
> > > number of mbuf segments for packet on the receive path.
> > > Large packets can be dropped due to extra CRC length.
> > >
> > > Fixes: a1366b1a2be3 ("net/mlx5: add reference counter on DPDK Rx
> > > queues")
> > > Cc: stable@dpdk.org
> > > Cc: nelio.laranjeiro@6wind.com
> > >
> > > Signed-off-by: Ori Kam <orika@mellanox.com>
> >
> > I don't think there's an issue to fix, there's actually a reason it's
> > done that way, perhaps I'm wrong but let me elaborate.
> >
> > When applications request CRC to be written to mbuf (more precisely
> > not to be stripped), its extra 4 bytes are neither part of
> > mbuf->pkt_len nor
> > mbuf->data_len. It just happens to be written past mbuf data if
> > mbuf->there's room
> > for it, where applications knowingly expect it based on how they
> > configured the PMD. That's the API.
> >
> > This implies applications also size mbufs accordingly; if they don't
> > provide room for the CRC, it can't be written. This extra room is
> > assumed to be part of max_rx_pkt_len. When CRC stripping is requested,
> > they do not have to provide such room (IBV_WQ_FLAGS_SCATTER_FCS is
> not set on mlx5 Rx queues).
> 
> I looked around other driver/example codes as it is not documented (or too
> obvious to do?), it looks there's consensus that max_rx_pkt_len includes 4B
> FCS.
> Then, I agree that PMD doesn't need to care about this.
> 
> > One problem with your proposal is assuming all segments are consumed
> > entirely during Rx and max_rx_pkt_len is reached, another segment with
> > zero data length gets appended just to hold the CRC. Applications may
> > interpret this as a bug.
> 
> I don't think this patch causes the issue. It just unnecessarily reserves extra
> 4B room if CRC strip is disabled. And even apps should not interpret this as a
> bug because apps requested to have CRC.
> Currently mlx5_rx_busrt() doesn't allow this situation (putting only 4B CRC in
> the last segment) because it subtracts ETHER_CRC_LEN from pkt_len if CRC
> isn't stripped. And it is done before looking for the next segment. I think this
> is a problem to fix on the contrary - app wanted to see CRC but it's not there.
> Right?
> 
> Thanks,
> Yongseok
  

Patch

diff --git a/drivers/net/mlx5/mlx5_rxq.c b/drivers/net/mlx5/mlx5_rxq.c
index 6b29aae..701925b 100644
--- a/drivers/net/mlx5/mlx5_rxq.c
+++ b/drivers/net/mlx5/mlx5_rxq.c
@@ -887,6 +887,8 @@  struct mlx5_rxq_ctrl*
 	const uint16_t desc_n =
 		desc + priv->rx_vec_en * MLX5_VPMD_DESCS_PER_LOOP;
 	unsigned int mb_len = rte_pktmbuf_data_room_size(mp);
+	uint8_t crc_size =
+			!!(dev->data->dev_conf.rxmode.hw_strip_crc == 0) << 2;
 
 	tmpl = rte_calloc_socket("RXQ", 1,
 				 sizeof(*tmpl) +
@@ -900,12 +902,13 @@  struct mlx5_rxq_ctrl*
 	/* Enable scattered packets support for this queue if necessary. */
 	assert(mb_len >= RTE_PKTMBUF_HEADROOM);
 	if (dev->data->dev_conf.rxmode.max_rx_pkt_len <=
-	    (mb_len - RTE_PKTMBUF_HEADROOM)) {
+	    (mb_len - RTE_PKTMBUF_HEADROOM - crc_size)) {
 		tmpl->rxq.sges_n = 0;
 	} else if (dev->data->dev_conf.rxmode.enable_scatter) {
 		unsigned int size =
 			RTE_PKTMBUF_HEADROOM +
-			dev->data->dev_conf.rxmode.max_rx_pkt_len;
+			dev->data->dev_conf.rxmode.max_rx_pkt_len +
+			crc_size;
 		unsigned int sges_n;
 
 		/*