[PATCH] net/mlx5: fix Rx queue recovery mechanism

Amiya Mohakud amohakud at paloaltonetworks.com
Fri Sep 16 20:20:29 CEST 2022


Thank you for the clarification.

On Sun, Sep 11, 2022 at 3:18 AM Asaf Penso <asafp at nvidia.com> wrote:

> Hello Amiya,
>
>
>
> This fix is for an issue with the error recovery mechanism.
>
>
>
> I assume the use case you’re referring to is working with rx_vec burst +
> cqe zipping enabled + error recovery.
>
> If that is the case, our mlx5.rst notes that the recovery is not
> supported with this combination.
>
> Since there is no way to disable the recovery in our PMD, we suggested
> disabling cqe zipping instead.
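>
> For illustration, a minimal sketch (the PCI address and app name are
> hypothetical) of disabling CQE zipping through the mlx5 devarg
> rxq_cqe_comp_en=0 at EAL initialization:
>
> #include <rte_eal.h>
>
> int
> main(void)
> {
>         /* Hypothetical device; rxq_cqe_comp_en=0 turns CQE zipping off. */
>         char *argv[] = {"app", "-a", "0000:03:00.0,rxq_cqe_comp_en=0"};
>
>         if (rte_eal_init(3, argv) < 0)
>                 return -1;
>         return 0;
> }
>
> The same devarg can be appended to the -a (allow) option of any DPDK
> application, e.g. dpdk-testpmd.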
>
>
>
> However, we are currently working on a patch to allow the above
> combination.
>
> It should be sent to the ML in a week or two, once we have fully tested
> it.
>
>
>
> Regards,
>
> Asaf Penso
>
>
>
> *From:* Amiya Mohakud <amohakud at paloaltonetworks.com>
> *Sent:* Wednesday, September 7, 2022 9:31 AM
> *To:* NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>
> *Cc:* dev <dev at dpdk.org>; security at dpdk.org; Matan Azrad <matan at nvidia.com>;
> stable at dpdk.org; Alexander Kozyrev <akozyrev at nvidia.com>; Slava Ovsiienko
> <viacheslavo at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> *Subject:* Re: [PATCH] net/mlx5: fix Rx queue recovery mechanism
>
>
>
> Hi All,
>
>
>
> I would need some confirmation on this patch.
>
> For some earlier issues encountered on mlx5, we have disabled cqe_comp in
> the mlx5 driver. In that case, do we still need this fix, or will
> disabling cqe_comp take care of it as well?
>
>
>
> Regards
>
> Amiya
>
>
>
> On Mon, Aug 29, 2022 at 8:45 PM Thomas Monjalon <thomas at monjalon.net>
> wrote:
>
> From: Matan Azrad <matan at nvidia.com>
>
> The local variables become inconsistent in the data receiving routines
> after queue error recovery.
> The receive queue consumer index becomes wrong and needs to be reset to
> the size of the queue (as the RQ was fully replenished in the recovery
> procedure).
>
> In the MPRQ case, the local consumed_strd variable should also be reset.
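>
> For illustration, a minimal standalone sketch (simplified names and a
> stubbed poll function; not the driver code) of the pattern being fixed:
> the burst routine caches the consumer index in a local variable, so once
> recovery replenishes the RQ mid-burst, the cached copy must be resynced
> from the queue structure before it is written back:
>
> #include <stdint.h>
>
> struct rxq {
>         uint32_t rq_ci;      /* authoritative consumer index */
>         unsigned int sges_n; /* log2 of segments per WQE */
> };
>
> /* Stub standing in for mlx5_rx_poll_len(); < 0 means an error CQE. */
> static int poll_len(struct rxq *q) { (void)q; return -1; }
>
> static void
> burst(struct rxq *q, unsigned int pkts_n)
> {
>         uint32_t rq_ci = q->rq_ci << q->sges_n; /* local cached copy */
>
>         while (pkts_n--) {
>                 int len = poll_len(q);
>
>                 if (len < 0) {
>                         /* Recovery refilled the RQ: the cached index is
>                          * stale, so resync it before the writeback below. */
>                         rq_ci = q->rq_ci << q->sges_n;
>                         break;
>                 }
>                 rq_ci++; /* normally: consume one segment */
>         }
>         q->rq_ci = rq_ci >> q->sges_n; /* later written back to hardware */
> }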
>
> CVE-2022-28199
> Fixes: 88c0733535d6 ("net/mlx5: extend Rx completion with error handling")
> Cc: stable at dpdk.org
>
> Signed-off-by: Alexander Kozyrev <akozyrev at nvidia.com>
> Signed-off-by: Matan Azrad <matan at nvidia.com>
> ---
>
> Already applied in main branch as part of the public disclosure process.
>
> ---
>  drivers/net/mlx5/mlx5_rx.c | 34 ++++++++++++++++++++++++----------
>  1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index bb3ccc36e5..917c517b83 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -408,6 +408,11 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
>         *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci);
>  }
>
> +/* Must be negative. */
> +#define MLX5_ERROR_CQE_RET (-1)
> +/* Must not be negative. */
> +#define MLX5_RECOVERY_ERROR_RET 0
> +
>  /**
>   * Handle a Rx error.
>   * The function inserts the RQ state to reset when the first error CQE is
> @@ -422,7 +427,7 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
>   *   0 when called from non-vectorized Rx burst.
>   *
>   * @return
> - *   -1 in case of recovery error, otherwise the CQE status.
> + *   MLX5_RECOVERY_ERROR_RET in case of recovery error, otherwise the CQE status.
>   */
>  int
>  mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> @@ -451,7 +456,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>                 sm.queue_id = rxq->idx;
>                 sm.state = IBV_WQS_RESET;
>                 if (mlx5_queue_state_modify(RXQ_DEV(rxq_ctrl), &sm))
> -                       return -1;
> +                       return MLX5_RECOVERY_ERROR_RET;
>                 if (rxq_ctrl->dump_file_n <
>                     RXQ_PORT(rxq_ctrl)->config.max_dump_files_num) {
>                         MKSTR(err_str, "Unexpected CQE error syndrome "
> @@ -491,7 +496,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>                         sm.queue_id = rxq->idx;
>                         sm.state = IBV_WQS_RDY;
>                         if (mlx5_queue_state_modify(RXQ_DEV(rxq_ctrl), &sm))
> -                               return -1;
> +                               return MLX5_RECOVERY_ERROR_RET;
>                         if (vec) {
>                                 const uint32_t elts_n =
>                                         mlx5_rxq_mprq_enabled(rxq) ?
> @@ -519,7 +524,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>                                                          rte_pktmbuf_free_seg
>                                                                 (*elt);
>                                                 }
> -                                               return -1;
> +                                               return MLX5_RECOVERY_ERROR_RET;
>                                         }
>                                 }
>                                 for (i = 0; i < (int)elts_n; ++i) {
> @@ -538,7 +543,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>                 }
>                 return ret;
>         default:
> -               return -1;
> +               return MLX5_RECOVERY_ERROR_RET;
>         }
>  }
>
> @@ -556,7 +561,9 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>   *   written.
>   *
>   * @return
> - *   0 in case of empty CQE, otherwise the packet size in bytes.
> + *   0 in case of empty CQE, MLX5_ERROR_CQE_RET in case of error CQE,
> + *   otherwise the packet size in regular RxQ, and striding byte
> + *   count format in mprq case.
>   */
>  static inline int
>  mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
> @@ -623,8 +630,8 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
>                                              rxq->err_state)) {
>                                         ret = mlx5_rx_err_handle(rxq, 0);
>                                         if (ret == MLX5_CQE_STATUS_HW_OWN ||
> -                                           ret == -1)
> -                                               return 0;
> +                                           ret == MLX5_RECOVERY_ERROR_RET)
> +                                               return MLX5_ERROR_CQE_RET;
>                                 } else {
>                                         return 0;
>                                 }
> @@ -869,8 +876,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
>                 if (!pkt) {
>                         cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_cnt];
>                         len = mlx5_rx_poll_len(rxq, cqe, cqe_cnt, &mcqe);
> -                       if (!len) {
> +                       if (len <= 0) {
>                                 rte_mbuf_raw_free(rep);
> +                               if (unlikely(len == MLX5_ERROR_CQE_RET))
> +                                       rq_ci = rxq->rq_ci << sges_n;
>                                 break;
>                         }
>                         pkt = seg;
> @@ -1093,8 +1102,13 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
>                 }
>                 cqe = &(*rxq->cqes)[rxq->cq_ci & cq_mask];
>                 ret = mlx5_rx_poll_len(rxq, cqe, cq_mask, &mcqe);
> -               if (!ret)
> +               if (ret == 0)
>                         break;
> +               if (unlikely(ret == MLX5_ERROR_CQE_RET)) {
> +                       rq_ci = rxq->rq_ci;
> +                       consumed_strd = rxq->consumed_strd;
> +                       break;
> +               }
>                 byte_cnt = ret;
>                 len = (byte_cnt & MLX5_MPRQ_LEN_MASK) >> MLX5_MPRQ_LEN_SHIFT;
>                 MLX5_ASSERT((int)len >= (rxq->crc_present << 2));
> --
> 2.36.1
>
>