[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Wang, Zhihong zhihong.wang at intel.com
Fri Sep 23 04:56:25 CEST 2016



> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> Sent: Thursday, September 22, 2016 10:42 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>
> Cc: Yuanhan Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> <maxime.coquelin at redhat.com>; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> 
> On 22 September 2016 at 18:04, Wang, Zhihong <zhihong.wang at intel.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> >> Sent: Thursday, September 22, 2016 5:02 PM
> >> To: Wang, Zhihong <zhihong.wang at intel.com>
> >> Cc: Yuanhan Liu <yuanhan.liu at linux.intel.com>; Maxime Coquelin
> >> <maxime.coquelin at redhat.com>; dev at dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> On 22 September 2016 at 14:58, Wang, Zhihong <zhihong.wang at intel.com>
> >> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Jianbo Liu [mailto:jianbo.liu at linaro.org]
> >> >> Sent: Thursday, September 22, 2016 1:48 PM
> >> >> To: Yuanhan Liu <yuanhan.liu at linux.intel.com>
> >> >> Cc: Wang, Zhihong <zhihong.wang at intel.com>; Maxime Coquelin
> >> >> <maxime.coquelin at redhat.com>; dev at dpdk.org
> >> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >> >>
> >> >> On 22 September 2016 at 10:29, Yuanhan Liu
> >> <yuanhan.liu at linux.intel.com>
> >> >> wrote:
> >> >> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> >> >> >> >> > My setup consists of one host running a guest.
> >> >> >> >> > The guest generates as many 64-byte packets as possible using
> >> >> >> >>
> >> >> >> >> Have you tested with other packet sizes?
> >> >> >> >> My testing shows that performance drops when the packet size is
> >> >> >> >> more than 256.
> >> >> >> >
> >> >> >> >
> >> >> >> > Hi Jianbo,
> >> >> >> >
> >> >> >> > Thanks for reporting this.
> >> >> >> >
> >> >> >> >  1. Are you running the vector frontend with mrg_rxbuf=off?
> >> >> >> >
> >> >> Yes, my testing uses mrg_rxbuf=off, but not the vector frontend PMD.
> >> >>
> >> >> >> >  2. Could you please specify what CPU you're running? Is it Haswell
> >> >> >> >     or Ivy Bridge?
> >> >> >> >
> >> >> It's an ARM server.
> >> >>
> >> >> >> >  3. What percentage of drop are you seeing?
> >> >> The testing results:
> >> >> size (bytes)     improvement (%)
> >> >> 64                    3.92
> >> >> 128                  11.51
> >> >> 256                  24.16
> >> >> 512                 -13.79
> >> >> 1024                -22.51
> >> >> 1500                -12.22
> >> >> A correction: performance drops when the packet size is 512 bytes or larger.
> >> >
> >> >
> >> > Jianbo,
> >> >
> >> > Could you please verify whether this patch really causes the enqueue perf drop?
> >> >
> >> > You can test the enqueue path alone by setting the guest to rxonly, and
> >> > comparing the mpps reported by "show port stats all" in the guest.
> >> >
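> >> > For reference, a minimal sketch of that measurement with testpmd in
> >> > interactive mode (EAL options omitted; adjust to your setup):
> >> >
> >> >     guest testpmd> set fwd rxonly
> >> >     guest testpmd> start
> >> >     host  testpmd> set fwd txonly
> >> >     host  testpmd> start
> >> >     guest testpmd> show port stats all    (read the Rx mpps here)
> >> >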
> >> >
> >> Tested with testpmd, host: txonly, guest: rxonly
> >> size (bytes)     improvement (%)
> >> 64                    4.12
> >> 128                   6.00
> >> 256                   2.65
> >> 512                  -1.12
> >> 1024                 -7.02
> >
> >
> >
> > Your numbers are a bit hard for me to interpret. This patch's
> > optimization contains 2 parts:
> >
> >  1. ring operation: works for both mrg_rxbuf on and off
> >
> >  2. remote write ordering: works for mrg_rxbuf=on only
> >
> > So, for mrg_rxbuf=off, if this patch is good for 64B packets, then it
> > shouldn't do anything bad for larger packets.
> >
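> > As an illustration of part 2 (a simplified sketch of the idea, not the
> > actual patch code; field names follow the virtio used-ring layout):
> >
> >     /* Fill all used-ring entries first, then publish them to the
> >      * remote reader with a single barrier + index update. */
> >     for (i = 0; i < count; i++) {
> >         uint16_t idx = (used_idx + i) & (vq->size - 1);
> >         vq->used->ring[idx].id  = desc_ids[i];
> >         vq->used->ring[idx].len = pkt_lens[i];
> >     }
> >     rte_smp_wmb();                     /* order entry writes before idx */
> >     vq->used->idx = used_idx + count;  /* one remote write for the batch */
> >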
> > This is the gain on x86 platform: host iofwd between nic and vhost,
> > guest rxonly.
> >
> > size (bytes)    enhancement (nic2vm)
> > 64              21.83%
> > 128             16.97%
> > 256              6.34%
> > 512              0.01%
> > 1024             0.00%
> >
> I booted up a VM with 2 virtual ports and stressed the traffic between them.
> First, I stressed it with pktgen-dpdk in the VM and did iofwd in the host.
> Then, as you suggested, I did rxonly in the VM and txonly in the host.
> 
> > I suspect there's some complication in ARM's micro-arch.
> >
> > Could you try v6 and apply all patches except the last one:
> > [PATCH v6 6/6] vhost: optimize cache access
> >
> > And see if there's still perf drop?
> >
> The last patch can improve the performance. The drop is actually
> caused by the second patch.


This is expected: the 2nd patch is just a baseline, and the actual
optimizations are in the rest of this patch set.

I think you can do a bottleneck analysis on ARM to see what's slowing the
perf down. There might be some micro-arch complications there, most likely
in memcpy.
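
For example, with Linux perf on the host (assuming testpmd is the vhost
application; adjust the pid and duration to your setup):

    perf record -g -p $(pidof testpmd) -- sleep 10
    perf report --stdio

That should show whether the hot cycles land in the copy routine.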

Do you use glibc's memcpy? I suggest hand-crafting your own.
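
For instance, a minimal NEON sketch (the function name and the
multiple-of-16 size assumption are mine, just to illustrate the idea):

    #include <arm_neon.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Copy in 16-byte vector chunks; assumes n is a multiple of 16. */
    static inline void copy16x(uint8_t *dst, const uint8_t *src, size_t n)
    {
        for (size_t i = 0; i < n; i += 16)
            vst1q_u8(dst + i, vld1q_u8(src + i));
    }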

Could you also publish the mrg_rxbuf=on data? It's the more widely used
configuration in terms of spec integrity.
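
For reference, mrg_rxbuf can be toggled on the QEMU command line
(the netdev id here is just a placeholder):

    -device virtio-net-pci,netdev=net0,mrg_rxbuf=on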


Thanks
Zhihong


> 
> Jianbo

