[PATCH] net/mlx5: fix Rx queue recovery mechanism
Amiya Mohakud
amohakud at paloaltonetworks.com
Fri Sep 16 20:20:29 CEST 2022
Thank you for the clarification.
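As a quick reference for anyone else following this thread: CQE zipping (compression) can be turned off per device through the mlx5 devargs. This is only a sketch — the PCI address below is a placeholder, and `rxq_cqe_comp_en` is the devarg name documented in mlx5.rst:

```shell
# Illustrative only: start testpmd with CQE compression (zipping) disabled
# on the mlx5 device; replace 0000:03:00.0 with the real PCI address.
dpdk-testpmd -a 0000:03:00.0,rxq_cqe_comp_en=0 -- -i
```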
On Sun, Sep 11, 2022 at 3:18 AM Asaf Penso <asafp at nvidia.com> wrote:
> Hello Amiya,
>
>
>
> This fix is for an issue with the error recovery mechanism.
>
>
>
> I assume the use case you’re referring to is working with rx_vec burst +
> cqe zipping enabled + error recovery.
>
> If that’s the case, we mentioned in our mlx5.rst that the recovery is not
> supported.
>
> Since there is no way to disable the recovery in our PMD, we suggested
> disabling CQE zipping.
>
>
>
> However, these days, we are working on a patch to allow the above
> combination.
>
> It should be sent to the ML in a week or two, once we fully finish testing
> it.
>
>
>
> Regards,
>
> Asaf Penso
>
>
>
> *From:* Amiya Mohakud <amohakud at paloaltonetworks.com>
> *Sent:* Wednesday, September 7, 2022 9:31 AM
> *To:* NBU-Contact-Thomas Monjalon (EXTERNAL) <thomas at monjalon.net>
> *Cc:* dev <dev at dpdk.org>; security at dpdk.org; Matan Azrad <matan at nvidia.com>;
> stable at dpdk.org; Alexander Kozyrev <akozyrev at nvidia.com>; Slava Ovsiienko
> <viacheslavo at nvidia.com>; Shahaf Shuler <shahafs at nvidia.com>
> *Subject:* Re: [PATCH] net/mlx5: fix Rx queue recovery mechanism
>
>
>
> Hi All,
>
>
>
> I would need some confirmation on this patch.
>
> For some earlier issues encountered on mlx5, we have disabled cqe_comp in
> the mlx5 driver. In that case, do we still need this fix, or will
> disabling cqe_comp take care of it as well?
>
>
>
> Regards
>
> Amiya
>
>
>
> On Mon, Aug 29, 2022 at 8:45 PM Thomas Monjalon <thomas at monjalon.net>
> wrote:
>
> From: Matan Azrad <matan at nvidia.com>
>
> The local variables become inconsistent in the data receiving routines
> after queue error recovery.
> The receive queue consumer index gets a wrong value and needs to be reset
> to the size of the queue (as the RQ was fully replenished in the recovery
> procedure).
>
> In the MPRQ case, the local consumed_strd variable should also be reset.
>
> CVE-2022-28199
> Fixes: 88c0733535d6 ("net/mlx5: extend Rx completion with error handling")
> Cc: stable at dpdk.org
>
> Signed-off-by: Alexander Kozyrev <akozyrev at nvidia.com>
> Signed-off-by: Matan Azrad <matan at nvidia.com>
> ---
>
> Already applied in main branch as part of the public disclosure process.
>
> ---
> drivers/net/mlx5/mlx5_rx.c | 34 ++++++++++++++++++++++++----------
> 1 file changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index bb3ccc36e5..917c517b83 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -408,6 +408,11 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
> *rxq->rq_db = rte_cpu_to_be_32(rxq->rq_ci);
> }
>
> +/* Must be negative. */
> +#define MLX5_ERROR_CQE_RET (-1)
> +/* Must not be negative. */
> +#define MLX5_RECOVERY_ERROR_RET 0
> +
> /**
> * Handle a Rx error.
> * The function inserts the RQ state to reset when the first error CQE is
> @@ -422,7 +427,7 @@ mlx5_rxq_initialize(struct mlx5_rxq_data *rxq)
> * 0 when called from non-vectorized Rx burst.
> *
> * @return
> - * -1 in case of recovery error, otherwise the CQE status.
> + * MLX5_RECOVERY_ERROR_RET in case of recovery error, otherwise the CQE status.
> */
> int
> mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> @@ -451,7 +456,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> sm.queue_id = rxq->idx;
> sm.state = IBV_WQS_RESET;
> if (mlx5_queue_state_modify(RXQ_DEV(rxq_ctrl), &sm))
> - return -1;
> + return MLX5_RECOVERY_ERROR_RET;
> if (rxq_ctrl->dump_file_n <
> RXQ_PORT(rxq_ctrl)->config.max_dump_files_num) {
> MKSTR(err_str, "Unexpected CQE error syndrome "
> @@ -491,7 +496,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> sm.queue_id = rxq->idx;
> sm.state = IBV_WQS_RDY;
> if (mlx5_queue_state_modify(RXQ_DEV(rxq_ctrl),
> &sm))
> - return -1;
> + return MLX5_RECOVERY_ERROR_RET;
> if (vec) {
> const uint32_t elts_n =
> mlx5_rxq_mprq_enabled(rxq) ?
> @@ -519,7 +524,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
>
> rte_pktmbuf_free_seg
> (*elt);
> }
> - return -1;
> + return MLX5_RECOVERY_ERROR_RET;
> }
> }
> for (i = 0; i < (int)elts_n; ++i) {
> @@ -538,7 +543,7 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> }
> return ret;
> default:
> - return -1;
> + return MLX5_RECOVERY_ERROR_RET;
> }
> }
>
> @@ -556,7 +561,9 @@ mlx5_rx_err_handle(struct mlx5_rxq_data *rxq, uint8_t vec)
> * written.
> *
> * @return
> - * 0 in case of empty CQE, otherwise the packet size in bytes.
> + * 0 in case of empty CQE, MLX5_ERROR_CQE_RET in case of error CQE,
> + * otherwise the packet size in regular RxQ, and striding byte
> + * count format in mprq case.
> */
> static inline int
> mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
> @@ -623,8 +630,8 @@ mlx5_rx_poll_len(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cqe,
> rxq->err_state)) {
> ret = mlx5_rx_err_handle(rxq, 0);
> if (ret == MLX5_CQE_STATUS_HW_OWN ||
> - ret == -1)
> - return 0;
> + ret == MLX5_RECOVERY_ERROR_RET)
> + return MLX5_ERROR_CQE_RET;
> } else {
> return 0;
> }
> @@ -869,8 +876,10 @@ mlx5_rx_burst(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> if (!pkt) {
> cqe = &(*rxq->cqes)[rxq->cq_ci & cqe_cnt];
> len = mlx5_rx_poll_len(rxq, cqe, cqe_cnt, &mcqe);
> - if (!len) {
> + if (len <= 0) {
> rte_mbuf_raw_free(rep);
> + if (unlikely(len == MLX5_ERROR_CQE_RET))
> + rq_ci = rxq->rq_ci << sges_n;
> break;
> }
> pkt = seg;
> @@ -1093,8 +1102,13 @@ mlx5_rx_burst_mprq(void *dpdk_rxq, struct rte_mbuf **pkts, uint16_t pkts_n)
> }
> cqe = &(*rxq->cqes)[rxq->cq_ci & cq_mask];
> ret = mlx5_rx_poll_len(rxq, cqe, cq_mask, &mcqe);
> - if (!ret)
> + if (ret == 0)
> break;
> + if (unlikely(ret == MLX5_ERROR_CQE_RET)) {
> + rq_ci = rxq->rq_ci;
> + consumed_strd = rxq->consumed_strd;
> + break;
> + }
> byte_cnt = ret;
> len = (byte_cnt & MLX5_MPRQ_LEN_MASK) >> MLX5_MPRQ_LEN_SHIFT;
> MLX5_ASSERT((int)len >= (rxq->crc_present << 2));
> --
> 2.36.1
>
>
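To make the mechanics of the fix concrete, here is a minimal self-contained model of the idea. All names below (`toy_rxq`, `toy_recover`, `toy_handle_poll_ret`, `TOY_ERROR_CQE_RET`) are illustrative stand-ins, not the real mlx5 structures or APIs: recovery replenishes the RQ and rewrites the shared consumer indexes, so when the poll routine reports an error CQE the burst loop must re-read them instead of reusing its stale local copies.

```c
#include <assert.h>
#include <stdint.h>

/* Must be negative so a "len <= 0" check catches both the empty-CQE
 * case (0) and the error-CQE case, mirroring MLX5_ERROR_CQE_RET. */
#define TOY_ERROR_CQE_RET (-1)

struct toy_rxq {
	uint32_t rq_ci;         /* shared receive queue consumer index */
	uint32_t consumed_strd; /* shared consumed strides (MPRQ case) */
};

/* Recovery: the RQ is fully replenished, so the shared consumer index
 * is reset to the queue size and the consumed strides count to zero. */
static void toy_recover(struct toy_rxq *rxq, uint32_t elts_n)
{
	rxq->rq_ci = elts_n;
	rxq->consumed_strd = 0;
}

/* Burst-side reaction to the poll routine's return value: on an error
 * CQE, resynchronize the cached locals from the shared queue state and
 * stop the loop; on an empty CQE just stop; otherwise pass the byte
 * count through. */
static int toy_handle_poll_ret(const struct toy_rxq *rxq, int ret,
			       uint32_t *rq_ci, uint32_t *consumed_strd)
{
	if (ret <= 0) {
		if (ret == TOY_ERROR_CQE_RET) {
			*rq_ci = rxq->rq_ci;
			*consumed_strd = rxq->consumed_strd;
		}
		return 0; /* break out of the burst loop */
	}
	return ret; /* packet size / striding byte count */
}
```

The key design point modeled here is the split between the two sentinel values in the patch: `MLX5_RECOVERY_ERROR_RET` is the non-negative internal code of the error handler, while `MLX5_ERROR_CQE_RET` is the negative value surfaced to the burst routines so their existing zero/positive length handling stays untouched.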