[PATCH] event/dlb2: add support for single 512B write of 4 QEs

McDaniel, Timothy timothy.mcdaniel at intel.com
Mon May 16 19:00:37 CEST 2022



> -----Original Message-----
> From: Jerin Jacob <jerinjacobk at gmail.com>
> Sent: Saturday, May 14, 2022 7:08 AM
> To: McDaniel, Timothy <timothy.mcdaniel at intel.com>; Richardson, Bruce
> <bruce.richardson at intel.com>; konstantin.v.ananyev at yandex.ru
> Cc: Jerin Jacob <jerinj at marvell.com>; dpdk-dev <dev at dpdk.org>
> Subject: Re: [PATCH] event/dlb2: add support for single 512B write of 4 QEs
> 
> On Sat, Apr 9, 2022 at 8:48 PM Timothy McDaniel
> <timothy.mcdaniel at intel.com> wrote:
> >
> > On Xeon, where 512b accesses are available, the movdir64b instruction can
> > perform a 512b read and write to the DLB producer port. For movdir64b to
> > pull its data from the store buffers (store-buffer forwarding) before the
> > actual write, the data should be in single 512b write format.
> > When the code is built for Xeon with AVX-512 support, this commit makes a
> > single 512b write of all 4 QEs instead of 4x64b writes.
> >
> > Signed-off-by: Timothy McDaniel <timothy.mcdaniel at intel.com>
> > ---
> >  drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++---------
> >  1 file changed, 67 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
> > index 36f07d0061..e2a5303310 100644
> > --- a/drivers/event/dlb2/dlb2.c
> > +++ b/drivers/event/dlb2/dlb2.c
> > @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port,
> >                                                 ev[3].event_type,
> >                                              DLB2_QE_EV_TYPE_WORD + 4);
> >
> > -               /* Store the metadata to memory (use the double-precision
> > -                * _mm_storeh_pd because there is no integer function for
> > -                * storing the upper 64b):
> > -                * qe[0] metadata = sse_qe[0][63:0]
> > -                * qe[1] metadata = sse_qe[0][127:64]
> > -                * qe[2] metadata = sse_qe[1][63:0]
> > -                * qe[3] metadata = sse_qe[1][127:64]
> > -                */
> > -               _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]);
> > -               _mm_storeh_pd((double *)&qe[1].u.opaque_data,
> > -                             (__m128d)sse_qe[0]);
> > -               _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]);
> > -               _mm_storeh_pd((double *)&qe[3].u.opaque_data,
> > -                             (__m128d)sse_qe[1]);
> > -
> > -               qe[0].data = ev[0].u64;
> > -               qe[1].data = ev[1].u64;
> > -               qe[2].data = ev[2].u64;
> > -               qe[3].data = ev[3].u64;
> > + #ifdef __AVX512VL__
> 
> + x86 maintainers
> 
> We need a runtime check based on CPU flags. Right? As the build and
> run machine can be different?

Thanks Jerin. I will convert to a runtime check.
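
For reference, a minimal sketch of what such a runtime check could look like.
This is not the driver change itself. The names cpu_has_avx512vl,
select_qe_store_mode and the qe_store_mode enum are hypothetical, and the
sketch uses the GCC/Clang builtin __builtin_cpu_supports() so that it stays
self-contained. Inside DPDK the equivalent query would normally be
rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL).

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not taken from dlb2.c. */
enum qe_store_mode {
	QE_STORE_4X_SCALAR, /* original path: separate stores per QE */
	QE_STORE_1X_512     /* new path: one 512b store covering all 4 QEs */
};

/*
 * Ask the CPU we are running on (not the build machine) whether it
 * reports AVX512VL. On non-x86 targets or other compilers this sketch
 * conservatively reports false.
 */
static bool cpu_has_avx512vl(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
	(defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("avx512vl") != 0;
#else
	return false;
#endif
}

/*
 * Pick the store path once (for example at port setup) and cache the
 * result, so the enqueue fast path does not query CPU flags per burst.
 */
static enum qe_store_mode select_qe_store_mode(bool has_avx512vl)
{
	return has_avx512vl ? QE_STORE_1X_512 : QE_STORE_4X_SCALAR;
}
```

The intent is that the #ifdef __AVX512VL__ decision moves from build time to
a one-time check such as select_qe_store_mode(cpu_has_avx512vl()), with the
AVX-512 code itself compiled separately with the appropriate target flags.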

