[dpdk-dev] Could not achieve wire speed for 40GE with any DPDK version on XL710 NIC's

Anuj Kalia anujkaliaiitd at gmail.com
Wed Jul 1 16:22:35 CEST 2015


Vladimir,

A few possible corrections to your PCIe analysis (let me know if I'm wrong):
- ECRC is probably disabled (check using sudo lspci -vvv | grep CGenEn-),
so the per-TLP overhead is 26 bytes (30 minus the 4-byte ECRC)
- Descriptor writeback can be batched using a high value of WTHRESH,
which is what DPDK uses by default
- The read request carries the full TLP overhead (26 bytes), not just the
16-byte header

Assuming WTHRESH = 4, bytes transferred from NIC to host per packet =
26 + 64 (packet itself) +
(26 + 32) / 4 (batched descriptor writeback) +
(26 / 4) (read request for new descriptors) =
111 bytes / packet

This corresponds to 70.9 Mpps over PCIe 3.0 x8. Assuming 5% DLLP
overhead, rate = 67.4 Mpps
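
For anyone who wants to re-run the numbers with different assumptions, here
is a small Python sketch of the arithmetic above. It is only a model of the
estimate in this thread, not a measurement. The function names are made up
for illustration, and the link rate assumes PCIe 3.0 at 8 GT/s per lane with
128b/130b encoding.

```python
# Back-of-the-envelope PCIe budget for 64-byte RX on a Gen3 x8 link.
# Constants come from this thread or are labeled assumptions.

def link_bytes_per_sec(lanes=8, gt_per_sec=8e9, encoding=128 / 130):
    """One-direction PCIe bandwidth (assumed: 8 GT/s, 128b/130b encoding)."""
    return lanes * gt_per_sec * encoding / 8


def bytes_per_packet(tlp_overhead, read_req_bytes, wthresh,
                     pkt_len=64, desc_len=32, read_batch=4):
    """PCIe bytes moved per received packet under the model in this thread."""
    payload_tlp = tlp_overhead + pkt_len             # packet data write
    writeback = (tlp_overhead + desc_len) / wthresh  # batched descriptor writeback
    read_req = read_req_bytes / read_batch           # read request for new descriptors
    return payload_tlp + writeback + read_req


def mpps(bytes_pkt, bandwidth, dllp_overhead=0.0):
    """Packet rate in Mpps for a given per-packet byte cost and link bandwidth."""
    return bandwidth * (1.0 - dllp_overhead) / bytes_pkt / 1e6


if __name__ == "__main__":
    bw = link_bytes_per_sec()
    # This mail: ECRC off (26-byte overhead), WTHRESH = 4, full-size read request.
    batched = bytes_per_packet(tlp_overhead=26, read_req_bytes=26, wthresh=4)
    # Vladimir's mail: ECRC on (30 bytes), unbatched writeback, 16-byte read request.
    unbatched = bytes_per_packet(tlp_overhead=30, read_req_bytes=16, wthresh=1)
    print(f"batched:   {batched:.1f} B/pkt, {mpps(batched, bw):.2f} Mpps, "
          f"{mpps(batched, bw, 0.05):.2f} Mpps with 5% DLLP overhead")
    print(f"unbatched: {unbatched:.1f} B/pkt, {mpps(unbatched, 8e9):.2f} Mpps "
          f"at a rounded 8 bytes/ns")
```

With the defaults this gives the 111 bytes per packet above and, within
rounding, the 70.9 and 67.4 Mpps figures. The second case reproduces the
160 bytes and 50 Mpps from Vladimir's mail.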

--Anuj



On Wed, Jul 1, 2015 at 9:40 AM, Vladimir Medvedkin <medvedkinv at gmail.com> wrote:
> In the case of a SYN flood you should also take into account the return
> SYN-ACK traffic, which generates PCIe DLLPs from NIC to host, so PCIe
> bandwidth is exhausted sooner. And don't forget about the DLLPs generated
> by RX traffic, which saturate the host-to-NIC bus.
>
> 2015-07-01 16:05 GMT+03:00 Pavel Odintsov <pavel.odintsov at gmail.com>:
>
>> Yes, Bruce, we understand this. But we are processing huge SYN
>> attacks, and those packets are only 64 bytes :(
>>
>> On Wed, Jul 1, 2015 at 3:59 PM, Bruce Richardson
>> <bruce.richardson at intel.com> wrote:
>> > On Wed, Jul 01, 2015 at 03:44:57PM +0300, Pavel Odintsov wrote:
>> >> Thanks for the answer, Vladimir! So we need to look for an x16 NIC if
>> >> we want to achieve 40GE line rate...
>> >>
>> > Note that this would only apply for your minimal, i.e. 64-byte, packet
>> > sizes. Once you go up to larger, e.g. 128B, packets, your PCI bandwidth
>> > requirements are lower and you can more easily achieve line rate.
>> >
>> > /Bruce
>> >
>> >> On Wed, Jul 1, 2015 at 3:06 PM, Vladimir Medvedkin <medvedkinv at gmail.com> wrote:
>> >> > Hi Pavel,
>> >> >
>> >> > Looks like you ran into a PCIe bottleneck. So let's calculate the
>> >> > XL710 RX-only case.
>> >> > Assume we have 32-byte descriptors (if we want more offloads).
>> >> > For every packet, DMA makes one PCIe transaction with the packet
>> >> > payload and one descriptor writeback, plus one memory read request
>> >> > for free descriptors for every 4 packets. Each Transaction Layer
>> >> > Packet (TLP) has 30 bytes of overhead (4 PHY + 6 DLL + 16 header +
>> >> > 4 ECRC). So for 1 RX packet, DMA sends 30 + 64 (packet itself) +
>> >> > 30 + 32 (writeback descriptor) + (16 / 4) (read request for new
>> >> > descriptors). Note that we do not take into account PCIe
>> >> > ACK/NACK/FC Update DLLPs. So we have 160 bytes per packet. One PCIe
>> >> > 3.0 lane transmits 1 byte in 1 ns, so x8 transmits 8 bytes in 1 ns,
>> >> > and 1 packet takes 20 ns. Thus, in theory, PCIe 3.0 x8 can transfer
>> >> > no more than 50 Mpps.
>> >> > Correct me if I'm wrong.
>> >> >
>> >> > Regards,
>> >> > Vladimir
>> >> >
>> >> >
>>
>>
>>
>> --
>> Sincerely yours, Pavel Odintsov
>>

