[dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support to the TX path

Wang, Zhihong zhihong.wang at intel.com
Fri Nov 4 11:43:16 CET 2016



> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> Sent: Friday, November 4, 2016 4:00 PM
> To: Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
> <yuanhan.liu at linux.intel.com>
> Cc: stephen at networkplumber.org; Pierre Pfister (ppfister)
> <ppfister at cisco.com>; Xie, Huawei <huawei.xie at intel.com>; dev at dpdk.org;
> vkaplans at redhat.com; mst at redhat.com
> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors support
> to the TX path
> 
> 
> 
> On 11/04/2016 08:57 AM, Maxime Coquelin wrote:
> > Hi Zhihong,
> >
> > On 11/04/2016 08:20 AM, Wang, Zhihong wrote:
> >>
> >>
> >>> -----Original Message-----
> >>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> >>> Sent: Thursday, November 3, 2016 4:11 PM
> >>> To: Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
> >>> <yuanhan.liu at linux.intel.com>
> >>> Cc: stephen at networkplumber.org; Pierre Pfister (ppfister)
> >>> <ppfister at cisco.com>; Xie, Huawei <huawei.xie at intel.com>;
> >>> dev at dpdk.org;
> >>> vkaplans at redhat.com; mst at redhat.com
> >>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
> >>> support to the TX path
> >>>
> >>>
> >>>
> >>> On 11/02/2016 11:51 AM, Maxime Coquelin wrote:
> >>>>
> >>>>
> >>>> On 10/31/2016 11:01 AM, Wang, Zhihong wrote:
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Maxime Coquelin [mailto:maxime.coquelin at redhat.com]
> >>>>>> Sent: Friday, October 28, 2016 3:42 PM
> >>>>>> To: Wang, Zhihong <zhihong.wang at intel.com>; Yuanhan Liu
> >>>>>> <yuanhan.liu at linux.intel.com>
> >>>>>> Cc: stephen at networkplumber.org; Pierre Pfister (ppfister)
> >>>>>> <ppfister at cisco.com>; Xie, Huawei <huawei.xie at intel.com>;
> >>>>>> dev at dpdk.org;
> >>>>>> vkaplans at redhat.com; mst at redhat.com
> >>>>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
> >>>>>> support to the TX path
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 10/28/2016 02:49 AM, Wang, Zhihong wrote:
> >>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Yuanhan Liu [mailto:yuanhan.liu at linux.intel.com]
> >>>>>>>>> Sent: Thursday, October 27, 2016 6:46 PM
> >>>>>>>>> To: Maxime Coquelin <maxime.coquelin at redhat.com>
> >>>>>>>>> Cc: Wang, Zhihong <zhihong.wang at intel.com>;
> >>>>>>>>> stephen at networkplumber.org; Pierre Pfister (ppfister)
> >>>>>>>>> <ppfister at cisco.com>; Xie, Huawei <huawei.xie at intel.com>;
> >>>>>>>>> dev at dpdk.org;
> >>>>>>>>> vkaplans at redhat.com; mst at redhat.com
> >>>>>>>>> Subject: Re: [dpdk-dev] [PATCH v4] vhost: Add indirect descriptors
> >>>>>>>>> support to the TX path
> >>>>>>>>>
> >>>>>>>>> On Thu, Oct 27, 2016 at 12:35:11PM +0200, Maxime Coquelin wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 10/27/2016 12:33 PM, Yuanhan Liu wrote:
> >>>>>>>>>>>>> On Thu, Oct 27, 2016 at 11:10:34AM +0200, Maxime Coquelin wrote:
> >>>>>>>>>>>>>>> Hi Zhihong,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 10/27/2016 11:00 AM, Wang, Zhihong wrote:
> >>>>>>>>>>>>>>>>> Hi Maxime,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> It seems the indirect desc feature is causing serious performance
> >>>>>>>>>>>>>>>>> degradation on the Haswell platform, about a 20% drop for both
> >>>>>>>>>>>>>>>>> mrg=on and mrg=off (--txqflags=0xf00, non-vector version),
> >>>>>>>>>>>>>>>>> both iofwd and macfwd.
> >>>>>>>>>>>>>>> I tested PVP (with macswap on guest) and Txonly/Rxonly on an Ivy
> >>>>>>>>>>>>>>> Bridge platform, and didn't face such a drop.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I was actually wondering whether that may be the cause. I tested it
> >>>>>>>>>>>>> with my IvyBridge server as well and saw no drop.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Maybe you should find a similar platform (Haswell) and give it
> >>>>>>>>>>>>> a try?
> >>>>>>>>>>> Yes, that's why I asked Zhihong whether he could test Txonly in
> >>>>>>>>>>> the guest to see if the issue is reproducible that way.
> >>>>>>>>>
> >>>>>>>>> I have no Haswell box, otherwise I could do a quick test for you.
> >>>>>>>>> IIRC, he tried disabling the indirect_desc feature, and the
> >>>>>>>>> performance recovered. So it's likely that indirect_desc is the
> >>>>>>>>> culprit here.
> >>>>>>>>>
> >>>>>>>>>>> It will be easier for me to find a Haswell machine if it doesn't
> >>>>>>>>>>> have to be connected back to back to a HW/SW packet generator.
> >>>>>>> In fact, a simple loopback test will also do, without pktgen.
> >>>>>>>
> >>>>>>> Start testpmd in both host and guest, and do "start" in one
> >>>>>>> and "start tx_first 32" in the other.
> >>>>>>>
> >>>>>>> Perf drop is about 24% in my test.
> >>>>>>>
> >>>>>>
> >>>>>> Thanks, I had never tried this test.
> >>>>>> I managed to find a Haswell platform (Intel(R) Xeon(R) CPU
> >>>>>> E5-2699 v3 @ 2.30GHz), and can reproduce the problem with the loopback
> >>>>>> test you mention. I see a performance drop of about 10% (8.94Mpps/8.08Mpps).
> >>>>>> Out of curiosity, what are the numbers you get with your setup?
> >>>>>
> >>>>> Hi Maxime,
> >>>>>
> >>>>> Let's align our test case to RC2, mrg=on, loopback, on Haswell.
> >>>>> My results below:
> >>>>>  1. indirect=1: 5.26 Mpps
> >>>>>  2. indirect=0: 6.54 Mpps
> >>>>>
> >>>>> It's about a 24% drop.
> >>>> OK, so on my side, same setup on Haswell:
> >>>> 1. indirect=1: 7.44 Mpps
> >>>> 2. indirect=0: 8.18 Mpps
> >>>>
> >>>> Still a 10% drop in my case with mrg=on.
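The percentages in this exchange depend on which figure is taken as the baseline. The Python sketch below (illustrative only; `drop_pct` and `speedup_pct` are hypothetical helper names) recomputes them from the Mpps values quoted in the thread. Relative to indirect=0, Zhihong's 6.54 vs. 5.26 Mpps is a drop of about 19.6%, while "about 24%" corresponds to indirect=0 being about 24.3% faster than indirect=1. For the earlier 8.94/8.08 Mpps run, the thread does not say which figure is which; the sketch assumes 8.94 Mpps is indirect=0.

```python
def drop_pct(baseline_mpps, test_mpps):
    """Throughput lost by `test` relative to `baseline`, in percent."""
    return (baseline_mpps - test_mpps) / baseline_mpps * 100.0


def speedup_pct(slow_mpps, fast_mpps):
    """How much faster `fast` is than `slow`, in percent."""
    return (fast_mpps - slow_mpps) / slow_mpps * 100.0


# (indirect=0, indirect=1) throughput pairs quoted in this thread, in Mpps.
# Assumption: in the 8.94/8.08 run, 8.94 Mpps is the indirect=0 figure.
RESULTS = {
    "Maxime, Haswell loopback": (8.94, 8.08),
    "Zhihong, RC2 mrg=on loopback": (6.54, 5.26),
    "Maxime, RC2 mrg=on loopback": (8.18, 7.44),
}

if __name__ == "__main__":
    for name, (direct, indirect) in RESULTS.items():
        print(f"{name}: drop {drop_pct(direct, indirect):.1f}%, "
              f"indirect=0 is {speedup_pct(indirect, direct):.1f}% faster")
```

Stating the baseline explicitly avoids comparing a drop measured against indirect=0 with a speedup measured against indirect=1 as if they were the same number.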
> >>>>
> >>>> The strange thing about both of our figures is that they are below
> >>>> what I obtain with my SandyBridge machine. The SB CPU frequency is 4%
> >>>> higher, but that doesn't explain the gap between the measurements.
> >>>>
> >>>> I'm continuing the investigations on my side.
> >>>> Maybe we should set a deadline, and decide to disable indirect in the
> >>>> Virtio PMD if the root cause is not identified/fixed by then?
> >>>>
> >>>> Yuanhan, what do you think?
> >>>
> >>> I have done some measurements using perf, and now understand better
> >>> what happens.
> >>>
> >>> With indirect descriptors, I can see a cache miss when fetching the
> >>> descriptors in the indirect table. Actually, this is expected, so
> >>> we prefetch the first desc as soon as possible, but still not soon
> >>> enough to make it transparent.
> >>> In the direct descriptors case, the desc in the virtqueue seems to
> >>> remain in the cache from its previous use, so we have a hit.
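The extra cost described above comes from one more dependent load: with an indirect head, the location of the descriptor table is only known once the ring descriptor itself has been read. The Python sketch below models a split virtqueue in a byte array standing in for guest memory. The 16-byte descriptor layout (le64 addr, le32 len, le16 flags, le16 next) and the flag values follow the virtio specification; `walk_chain`, the addresses and the chain contents are hypothetical, and the sketch omits the bounds and loop checks a real vhost implementation needs.

```python
import struct

# Split-virtqueue descriptor layout from the virtio spec (little-endian):
# le64 addr, le32 len, le16 flags, le16 next -> 16 bytes.
DESC = struct.Struct("<QIHH")
VRING_DESC_F_NEXT = 1
VRING_DESC_F_INDIRECT = 4


def walk_chain(mem, ring_addr, head):
    """Follow the descriptor chain starting at ring[head].

    Returns (buffers, loads): the (addr, len) data buffers, and the
    address of every descriptor read, in order. Hypothetical helper
    with no validation of malformed chains.
    """
    table, idx = ring_addr, head
    buffers, loads = [], []
    while True:
        desc_addr = table + idx * DESC.size
        loads.append(desc_addr)
        addr, length, flags, nxt = DESC.unpack_from(mem, desc_addr)
        if flags & VRING_DESC_F_INDIRECT:
            # Dependent load: the table address comes from this descriptor.
            table, idx = addr, 0
            continue
        buffers.append((addr, length))
        if not flags & VRING_DESC_F_NEXT:
            return buffers, loads
        idx = nxt


# Hypothetical guest memory: ring at 0x0, indirect table at 0x400.
mem = bytearray(4096)
# Head 0: indirect descriptor pointing to a 2-entry table.
DESC.pack_into(mem, 0x000, 0x400, 2 * DESC.size, VRING_DESC_F_INDIRECT, 0)
DESC.pack_into(mem, 0x400, 0x800, 12, VRING_DESC_F_NEXT, 1)
DESC.pack_into(mem, 0x410, 0x900, 64, 0, 0)
# Heads 1 and 2: the same two buffers as a direct chain in the ring.
DESC.pack_into(mem, 0x010, 0x800, 12, VRING_DESC_F_NEXT, 2)
DESC.pack_into(mem, 0x020, 0x900, 64, 0, 0)

indirect_bufs, indirect_loads = walk_chain(mem, 0x0, 0)
direct_bufs, direct_loads = walk_chain(mem, 0x0, 1)
```

Both heads yield the same buffers, but the indirect head costs three descriptor reads instead of two, and the last two land in a separate table (0x400 here) that the direct path never touches. If that table is not in cache, those reads are the kind of miss reported above.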
> >>>
> >>> That said, in a realistic use case, I think we should not have a hit,
> >>> even with direct descriptors.
> >>> Indeed, the test case uses testpmd on the guest side with forwarding set
> >>> to IO mode, which means the packet content is never accessed by the guest.
> >>>
> >>> In my experiments, I usually set the "macswap" forwarding mode, which
> >>> swaps the src and dest MAC addresses in the packet. I find it more
> >>> realistic, because I don't see the point in sending packets to the guest
> >>> if they are not accessed (not even their header).
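As a rough illustration of what this forwarding mode does to each packet, the Python sketch below swaps the destination (bytes 0-5) and source (bytes 6-11) MAC addresses of an Ethernet frame in place. testpmd's actual macswap engine is C code operating on mbufs; this sketch, with the hypothetical name `macswap` and a made-up example frame, only shows the header access that io forwarding skips.

```python
ETHER_ADDR_LEN = 6


def macswap(frame):
    """Swap the destination and source MAC addresses of `frame` in place.

    Reading and writing these 12 bytes pulls the start of the packet
    into the cache of the CPU running the guest; io forwarding never
    touches packet data.
    """
    dst = frame[0:ETHER_ADDR_LEN]
    src = frame[ETHER_ADDR_LEN:2 * ETHER_ADDR_LEN]
    frame[0:ETHER_ADDR_LEN] = src
    frame[ETHER_ADDR_LEN:2 * ETHER_ADDR_LEN] = dst


# Example 64-byte frame: dst 02:00:00:00:00:01, src 02:00:00:00:00:02, IPv4.
frame = bytearray(bytes.fromhex("020000000001" "020000000002" "0800") + bytes(50))
macswap(frame)
```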
> >>>
> >>> I tried the test case again, this time setting the forwarding mode
> >>> to macswap in the guest. This time, I get the same performance with both
> >>> direct and indirect (indirect is even a little better with a small
> >>> optimization, consisting of systematically prefetching the first 2
> >>> descs, as we know they are contiguous).
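Because entries in an indirect table are contiguous, the first two 16-byte descriptors span 32 bytes, and whether that is one cache line or two depends on where the table starts. The Python sketch below shows that arithmetic, assuming 64-byte cache lines as on the x86 CPUs discussed here; `lines_touched` and the example addresses are hypothetical, and the thread does not say how many prefetch instructions the optimization actually issues.

```python
CACHE_LINE = 64  # bytes; assumption: x86 64-byte cache lines
DESC_SIZE = 16   # size of one split-virtqueue descriptor per the virtio spec


def lines_touched(table_addr, n_descs):
    """Indices of the cache lines spanned by the first n_descs table entries."""
    first = table_addr // CACHE_LINE
    last = (table_addr + n_descs * DESC_SIZE - 1) // CACHE_LINE
    return list(range(first, last + 1))


aligned = lines_touched(0x1000, 2)   # table starts on a line boundary
straddle = lines_touched(0x1030, 2)  # table starts 48 bytes into a line
```

In the aligned case one prefetch covers both descriptors; in the straddling case the second descriptor sits in the next line, so prefetching only the first one would leave a demand miss on the second.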
> >>
> >>
> >> Hi Maxime,
> >>
> >> I did a few more macswap tests and found out more here:
> > Thanks for doing more tests.
> >
> >>
> >>  1. I did a loopback test on another HSW machine with the same H/W,
> >>     and indirect_desc on and off seem to have close perf
> >>
> >>  2. So I checked the gcc version:
> >>
> >>      *  Previous: gcc version 6.2.1 20160916 (Fedora 24)
> >>
> >>      *  New: gcc version 5.4.0 20160609 (Ubuntu 16.04.1 LTS)
> >
> > On my side, I tested with RHEL7.3:
> >  - gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
> >
> > It certainly contains some backports from newer GCC versions.
> >
> >>
> >>     On the previous one, indirect_desc has a 20% drop
> >>
> >>  3. Then I compiled the binary on Ubuntu and scp'd it to Fedora, and as
> >>     expected I got the same perf as on Ubuntu, and the perf gap
> >>     disappeared, so gcc is definitely one factor here
> >>
> >>  4. Then I used the Ubuntu binary on Fedora for the PVP test, and the
> >>     perf gap came back: as with the Fedora binary, indirect_desc
> >>     causes about a 20% drop
> >
> > Let me know if I understand correctly:

Yes, and it's hard to break it down further at this time.

Also, we may need to check whether it's caused by a certain NIC
model. Unfortunately I don't have the right setup right now.

> > Loopback test with macswap:
> >  - gcc version 6.2.1 : 20% perf drop
> >  - gcc version 5.4.0 : No drop
> >
> > PVP test with macswap:
> >  - gcc version 6.2.1 : 20% perf drop
> >  - gcc version 5.4.0 : 20% perf drop
> 
> I forgot to ask: did you recompile only the host, or both the host and
> guest testpmd's in your test?

Both.

> 
> >
> >>
> >> So in all, could you try PVP traffic on HSW to see how it works?
> > Sadly, the HSW machine I borrowed does not have another device connected
> > back to back on its 10G port. I can only test PVP with SNB machines
> > currently.
> >
> >>
> >>
> >>>
> >>> Do you agree we should assume that the packet (header and/or buf) will
> >>> always be accessed by the guest application?
> >>> If so, do you agree we should keep indirect descs enabled, and maybe
> >>> update the test cases?
> >>
> >>
> >> I agree with you that mac/macswap test is more realistic and makes
> >> more sense for real applications.
> >
> > Thanks,
> > Maxime

