[dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue

Thomas Monjalon thomas.monjalon at 6wind.com
Wed Oct 12 17:31:12 CEST 2016


Sorry guys, you lost me in the discussion.

Is there some regression only on ARM?
Does it need some work specifically on memcpy for ARM,
or vhost for ARM?
Who can work on ARM optimization?

More comments below.

2016-10-12 12:22, Wang, Zhihong:
> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> > > It's an ARM server.
> > >
> > > >> >  3. How many percentage of drop are you seeing?
> > > The testing result:
> > > size (bytes)     improvement (%)
> > > 64                   3.92
> > > 128                 11.51
> > > 256                  24.16
> > > 512                  -13.79
> > > 1024                -22.51
> > > 1500                -12.22
> > > A correction is that performance is dropping if byte size is larger than 512.
> > 
> > I have thought of this twice. Unfortunately, I think I may need NACK this
> > series.
> > 
> > Merging two code path into one is really good: as you stated, it improves
> > the maintainability. But only if we see no performance regression on both
> > path after the refactor. Unfortunately, that's not the case here: it hurts
> > the performance for one code path (non-mrg Rx).

+1

> > That makes me think we may should not do the code path merge at all. I think
> > that also aligns with what you have said before (internally): we could do the
> > merge if it gives comparable performance before and after that.
> > 
> > Besides that, I don't quite like the way you did in patch 2 (rewrite enqueue):
> > you made a lot of changes in one patch. That means if something wrong
> > happened,
> > it is hard to narrow down which change introduces that regression. Badly,
> > that's exactly what we met here. Weeks have been passed, I see no progress.

+1, it is important to have simple patches making changes step by step.

> > That's the reason we like the idea of "one patch only does one thing, an
> > atomic thing".
> 
> 
> Yuanhan, folks,
> 
> Thanks for the analysis. I disagree here though.
> 
> I analyze, develop, benchmark on x86 platforms, where this patch
> works great.
> 
> I've been trying to analyze on ARM too but it takes time and I've
> had a schedule. Also since the ARM perf issue comes when it's
> v6 already, I might not be able to make it in time. However
> that's what I have to do for this patch to be merged in this
> or the next release.
> 
> In the meantime, may I suggest we consider the possibility to
> have dedicated codes for **perf critical paths** for different
> kinds of architecture?

Yes that's what we do in several parts of DPDK.

> It can be hard for a person to have both the knowledge and the
> development environment for multiple archs at the same time.

Yes we do not expect you work on ARM.
So if nobody work on the ARM issue, you could make 2 code paths
in order to allow your optimization for x86 only.
But that's not the preferred way.
And you must split your rework to better identify which part is
a regression on ARM.

> Moreover, different optimization techniques might be required for
> different archs, so it's hard and unnecessary to make a function
> works for all archs, sometimes it's just not the right thing to do.

Yes sometimes. Please help us to be convinced for this case.

> > So I will apply the first patch (it's a bug fixing patch) and ask you to
> > refactor the rest, without the code path merge.
> > 
> > I think we could still have a good maintainability code base if we introduce
> > more common helper functions that can be used on both Rx path, or even on
> > Tx path (such as update_used_ring, or shadow_used_ring).

Yes it is a good step.
And the code path merge could be reconsidered later.

> > It's a bit late for too many changes for v16.11. I think you could just
> > grab patch 6 (vhost: optimize cache access) to the old mrg-Rx code path,
> > if that also helps the performance? Let us handle the left in next release,
> > such as shadow used ring.

Thank you


More information about the dev mailing list