
Why is ABI Stability Important?

By Ray Kinsella & Thomas Monjalon

Most open-source software projects follow distinct life-cycle patterns as they evolve from the genesis of an idea, all the way through to a mature, stable, well-defined project. You can see this evolution reflected in all aspects of the project. Upstreaming rules, for instance, are usually permissive in the early days and then become gradually more conservative and risk averse over time. Similarly, you expect lots of bugs in the early days; then things become more tested and stable over time, and so on.

This life cycle is also evident in a project’s management of its application binary interface, usually shortened to ABI. Before continuing, two things should be pointed out:

  1. If you are a bit fuzzy on what we mean by ABI, also known as “binary compatibility”, you should read through the Wikipedia entry on it.
  2. We have made quite a bit of use of the datasets and tools from the ABI Laboratory project. It is worth following the links to that project below to review the datasets; we have annotated those links with a †.

In the early days of a project, ABI changes are rarely given any consideration, as you are usually way too busy trying to change the world! Then time goes on and the project becomes popular. The more users join the project and become dependent on it, the more time you spend making sure that you don’t break their software.

At the extreme end of this cycle are very mature projects like the Linux Kernel. Linus Torvalds explains Linux’s commitment to maintaining a stable ABI in his own words.

“We care about user-space interfaces to an insane degree. We go to extreme lengths to maintain even badly designed or unintentional interfaces. Breaking user programs simply isn’t acceptable.” – Linus Torvalds, 2005

There are a few distinct patterns of ABI management between fledgling projects and very mature projects like Linux.

Patterns of ABI management

In some software libraries, an evolutionary pattern is very clear: they follow the common pattern of an unstable ABI in their early days, and then, after some period of settling, they declare a 1.0 release, and their ABI is more or less set in stone from that point onwards. The GStreamer (†) project is a good example of this form of evolution.

Some software projects, particularly programming languages and operating systems, change very rarely, by virtue of being governed either by a strict set of standards and/or by the requirement to offer very strong guarantees about backward compatibility. That is, they have a well-defined ABI from the start and it very rarely changes thereafter, with no period of stabilization as such. LibC++ (†) and GlibC (†) are good examples of these sorts of projects.

Other software projects will support a stable ABI version for some period of time, usually months or more often years, with planned periodic ABI breakages to introduce new features or to facilitate refactoring. These breakages are often timed to coordinate with the life cycle of consuming software, such as operating system distributions (Debian etc.) or higher-level applications. LibAV (†) and FFmpeg (†) are clear examples of this kind of project.

Finally, some software projects, by virtue of a design philosophy or simply because they are that bit earlier in their life cycle, choose to offer fewer guarantees of ABI compatibility. The DPDK and Boost projects are both good examples of this kind of project.

Why are ABI Breakages considered bad?

Modern software ecosystems are built on a hard commitment that binary interfaces will be carefully managed. When this commitment does not hold, things fall apart rapidly, with applications failing to start or randomly crashing.

Imagine a world in which there was no guarantee that applications installed from an ‘app’ store or repository would just work; imagine how frustrating that would be for users. Today this all just works, and we take for granted that, behind the scenes, engineers are working hard to ensure that updates don’t break ABIs and therefore do not break applications. However, many will remember a time when such guarantees either didn’t exist or were hard to enforce.

And the consequences? Naturally defensive behaviours follow: developers start to statically link their dependencies, and become slow to pick up the latest versions of those dependencies, viewing upgrades as too risky. In the worst case, some developers might start looking for another ecosystem that doesn’t break their code and their applications quite so much.

And this worst case happens more often than you might think…

Miguel de Icaza is one of the fathers of the GNOME Project, one of the best desktop environments for Linux. For a few years in the late 90s and early 00s, it looked like Linux desktop distributions based on GNOME had a real shot at competing with Microsoft Windows to become a popular desktop operating system. However, despite all the excitement, huge community effort, and commercial support from major Linux vendors, it never really happened. Miguel explains why in his blog post What Killed the Linux Desktop (worth a read).

“Backwards compatibility is not a sexy problem. It is not even remotely an interesting problem to solve. Nobody wants to do that work, everyone wants to innovate, and be responsible for the next big feature in Linux.

So Linux was left with idealists that wanted to design the best possible system without having to worry about boring details like support and backwards compatibility.

Meanwhile, you can still run the 2001 Photoshop that came when XP was launched on Windows 8. And you can still run your old OSX apps on Mountain Lion…” – Miguel De Icaza, 2012

It’s a sobering message. ABI stability done right helped contribute to Linux’s vast success as an operating system; done wrong, it hurt Linux’s popularity as a desktop operating system. The risk for projects with an unstable ABI is clear: eventually, your consumers will start looking for something else that doesn’t break their code quite so much.

DPDK has had an ABI policy committing the community to preserving the DPDK ABI since 2015.

Note that the above process for ABI deprecation should not be undertaken lightly. ABI stability is extremely important for downstream consumers of the DPDK, especially when distributed in shared object form. Every effort should be made to preserve the ABI whenever possible. The ABI should only be changed for significant reasons, such as performance enhancements. ABI breakage due to changes such as reorganizing public structure fields for aesthetic or readability purposes should be avoided. – DPDK ABI Policy, 19.08

The DPDK ABI policy encourages contributors to be mindful of consumers when making ABI changes. What is changing in DPDK is that this policy is now evolving to offer consumers more guarantees of future compatibility.

How are we changing DPDK?

Recently, the 6th revision of a new ABI policy was posted to the community, intended to start the process of moving DPDK out of the last category of projects described above, and to provide its consumers with more certainty around future ABI compatibility. This policy has been approved in principle by the DPDK Technical Board, and will become the new policy following the DPDK 19.11 LTS release.

The intention is to continue to provide DPDK’s consumers the best possible features and performance for building dataplane applications, now with the addition of clearer upgrade paths and a stronger commitment to backward compatibility.

The change means that DPDK will now follow a pattern similar to that described for the LibAV and FFmpeg projects above: a pattern characterized by periods of ABI stability, with periodic ABI breakages to facilitate change. In this way, a DPDK “major” ABI version will be declared, aligned with the DPDK LTS release, and then supported in all the quarterly releases over the year following the LTS release.

What does this mean for Contributors?

At a high-level, it means that the community will become more deliberate about how the DPDK ABI is managed. Any new features will be required to maintain existing interfaces between LTS releases, and in general ABI changes will receive more scrutiny than has been the case in the past.

To be absolutely clear, the DPDK ABI can change while ABI compatibility is being maintained.

This means that the DPDK community will guarantee that applications built and dynamically linked against the most recent LTS release will continue to work, without requiring a rebuild, through the quarterly releases for the year following the LTS release. The DPDK ABI can and will continue to evolve during this period, adding great new features and improvements, so long as ABI compatibility with the LTS release is preserved.
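One mechanism that makes this possible is symbol versioning, which lets a shared library carry both the old and the new versions of a changed function, so that binaries linked against the old version keep working while newly built binaries pick up the new one. Below is a minimal sketch using GNU symbol versioning; the function and version-node names are hypothetical, and a matching linker version script defining the LIB_1.0 and LIB_2.0 nodes is also required:

    /* Both implementations live in the library: binaries linked against
     * LIB_1.0 keep resolving do_work@LIB_1.0, while the LIB_2.0 symbol
     * is the default for newly linked binaries. */
    int do_work_v1(void *ctx)
    {
        (void)ctx;
        return 0;
    }
    __asm__(".symver do_work_v1, do_work@LIB_1.0");

    int do_work_v2(void *ctx, unsigned int flags)
    {
        (void)ctx;
        (void)flags;
        return 0;
    }
    __asm__(".symver do_work_v2, do_work@@LIB_2.0");

DPDK provides helper macros for this kind of symbol versioning, so that changes to a function can be staged without breaking existing binaries.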

Changes that are so dramatic as to require an ABI compatibility breakage will now need to wait until the next ABI breakage window at the next LTS release.

How do we prepare for this change?

The initial period of ABI stability will run for one year following the v19.11 release. This was designed to minimize disruption to the community, as most contributors target the LTS release with their changes. Currently, ABI breakage windows are aligned with LTS releases, meaning that even in the worst-case event of an unavoidable ABI-breaking change, the impact of the new policy will be minimal.

This has been designed to familiarize the community with the requirements of ABI compatibility, while still permitting ABI breakages for the next LTS release. The ABI policy will then be reviewed after this initial year, with the intention of lengthening the stability period, and the period between ABI breakages, to two years.

If you are interested in the next level of detail on how the new policy will work, you can review the patch.

Second-Annual DPDK Community Awards Recognize Hard Work & Collaboration

The DPDK developer community convenes each year at the DPDK “Userspace” event to share knowledge, discuss best practices, and further align the community. During the event, we take some time to reflect upon the successes of the past year and to recognize some of the amazing contributions from across the project with the DPDK Community Awards.

Winners were recognized September 19 at the DPDK Userspace event in Bordeaux, France. Details about each award category and its winners appear below.

Please join us in congratulating all of our nominees and winners!

DPDK Top Ambassador:  Tim O’Driscoll
Tim has been involved with DPDK for many years on the management and marketing side, and will be recognized by most from his previous “world tours” attending DPDK events in China, India, Europe, and the States. He’s very approachable and always has a friendly word, which, coupled with his knowledge of DPDK and packet processing, makes him an excellent DPDK ambassador.

Innovation: Arm team
Congratulations to the Arm team for their innovative work on RCU, MCS Lock, and Ticket Lock. They spend a lot of time working to improve DPDK, and in just two years they have improved performance and introduced new alternatives.

Contribution (Code): David Marchand
A long-time contributor to DPDK, David is meticulous and accurate, taking the time to ensure the results of his work are perfect. His recent contributions to DPDK (log fixes, Environment Abstraction Layer (EAL) fixes, test fixes) show his love for a job well done. He’s also a very friendly and funny guy, and it’s a real pleasure to work with him.

Contribution (Maintainer): Andrew Rybchenko
Andrew does very valuable work in DPDK; he’s a great help with reviews in the mempool and mbuf areas. He’s rigorous and always brings a constructive and positive perspective. He has the maintainer mindset, and ensures the APIs fit every case, now and in the future.

Contribution (Operations):  Ferruh Yigit
Ferruh spends a lot of time ensuring things are progressing. He’s the engine powering a fast-paced release and improvement cycle. In addition to his numerous contributions (code, reviews, docs, etc.), Ferruh does an essential job of pinging people to ensure that everything is getting done on time.

Contribution (Testing): Aaron Conole
Aaron has been a regular DPDK contributor since 2015. In the past few years, he has invested a lot of effort in setting up Travis CI for the DPDK project, working closely with the UNH InterOperability Lab team. He is especially strong on testing in open source workflows. On top of that, Aaron is always available to provide constructive feedback, be it about technical topics or about beer brewing!

Memory in DPDK Part 2: Deep Dive into IOVA

By Anatoly Burakov

This post is Part 2 of a 4-part blog series that was originally published on the Intel Developer Zone blog.

Introduction

In the previous article, we covered the main concepts and principles behind Data Plane Development Kit (DPDK) memory management and how they contribute to DPDK’s unparalleled performance. However, DPDK is a complex beast that needs to be configured correctly to make the most out of it. In particular, picking the right kernel driver and IOVA mode may be crucial, depending on the application, as well as the environment in which said application is intended to run. This article discusses various options available and makes recommendations on what should be used.

Environment Abstraction Layer (EAL) Parameters

At the heart of DPDK lies the Environment Abstraction Layer (EAL). The EAL is a DPDK library that, as its name suggests, abstracts away the environment (hardware, OS, and so on) and presents a unified interface to software. EAL handles a great many things and is easily the single most complex part of DPDK. Some of the things EAL is responsible for include:

  • Managing CPU cores and non-uniform memory access (NUMA) nodes
  • Making hardware devices available to DPDK poll-mode drivers (PMDs) by mapping their registers into memory
  • Managing hardware and software interrupts
  • Abstracting away platform differences such as endianness, cache line size, and so on
  • Managing memory and multiprocess synchronization
  • Providing platform- and OS-independent ways of working with atomics, memory barriers, and other synchronization primitives
  • Loading and enumerating hardware buses, devices, and PMDs

The above list is by no means exhaustive, but it gives an idea of how vital the EAL is to DPDK. It is therefore no surprise that a lot of configuration in DPDK has to do with configuring the EAL. Currently, this is (directly or indirectly) done by specifying command-line parameters to the DPDK initialization routine. Usually, a DPDK application command line would look something like the following (the binary name and parameter values here are illustrative):
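    ./dpdk-app -l 0-3 -n 4 --socket-mem 1024,1024 -- --app-specific-args

Everything before the -- separator is consumed by the EAL (here, a core list, the number of memory channels, and a per-NUMA-node memory reservation); everything after it is passed through to the application itself.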

Some applications using DPDK may hide this step from the user (such as OvS-DPDK), so there may be no need to specify EAL command-line parameters explicitly, but it is nevertheless always happening in the background.

IO Virtual Addresses (IOVA) Modes

DPDK is a user space application framework, so software using DPDK works with regular virtual addresses, like any other software. However, DPDK also provides user space PMDs and a set of APIs to perform IO operations entirely from user space. As was discussed in the previous article in this series, the hardware does not understand user space virtual addresses; instead, it uses IO addresses—either physical addresses (PA), or IO virtual addresses (IOVA).

The DPDK API does not distinguish between physical and IO virtual addresses, and always refers to either as IOVA, even if no IO memory management unit (IOMMU) is involved to provide the VA part. However, DPDK does distinguish between cases where physical addresses are used as IOVA, and cases where IOVA matches user space virtual addresses. These cases are referred to as IOVA modes in the DPDK API, and there are two of them: IOVA as PA, and IOVA as VA.

IOVA as Physical Addresses (PA) Mode

When IOVA as PA mode is used, the IOVA addresses assigned to all DPDK memory areas are actual physical addresses, and virtual memory layout matches the physical memory layout. The good thing about this approach is that it is simple: it works with all hardware (that is, does not require IOMMU), and it works well with kernel space (it is trivial to convert a real physical address to a kernel space address). This is in fact how DPDK has worked for a long time, and it is in many ways considered the default.

There are certain disadvantages associated with using IOVA as PA mode, however. One of them is that it requires privileges—DPDK cannot get a memory region’s real physical address without having access to the system’s page map. Thus, it is not possible to run in IOVA as PA mode without root privileges on the system.

Another notable limitation of IOVA as PA mode is that virtual memory layout follows physical memory layout. This means that if physical memory space is fragmented (that is, there are lots of small segments instead of a few large ones), the virtual memory space follows that fragmentation. In extreme cases, the fragmentation can be so severe that the number of standalone, physically contiguous segments exhausts DPDK’s internal data structures used to store information about those segments, and DPDK initialization simply fails.

The DPDK community has come up with workarounds to address these issues. For example, one way to reduce the impact of fragmentation is to use bigger page sizes—the problem is not fixed, but a standalone 1-gigabyte (GB) segment is way more useful than a standalone 2-megabyte (MB) segment. Rebooting the system and reserving huge pages at boot time instead of at run time is another widely used workaround. None of the above workarounds fix the underlying problem though, and the DPDK community is so used to dealing with it that every DPDK user (knowingly or unknowingly) ends up following the same thought process of “I need X MB of memory, but I’ll reserve X+Y MB just in case!”

IOVA as Virtual Addresses (VA) Mode

IOVA as VA mode, in contrast, is a mode in which the underlying physical memory layout is not followed. Instead, the physical memory is reshuffled in such a way as to match the virtual memory layout. DPDK EAL does so by relying on kernel infrastructure, which in turn uses IOMMU to remap physical memory.

The advantage of this approach is obvious: in the case of IOVA as VA mode, all memory is both VA- and IOVA-contiguous. This means that any memory allocation that requires lots of IOVA-contiguous memory is more likely to succeed, because the memory looks IOVA-contiguous to the hardware, even though the underlying physical memory may not be. Because of the remapping, the problem of fragmented IOVA space becomes irrelevant; however heavily fragmented the physical memory may be, it is always remapped to appear as an IOVA-contiguous chunk of memory.

Another advantage of using IOVA as VA mode is that it does not require any privileges, because it does not need access to the system page map. This allows running DPDK as a non-root user, and makes it easier to use DPDK in environments where privileged access is undesirable, such as cloud-native environments.

There is of course one disadvantage to using IOVA as VA mode. For various reasons, using the IOMMU may not be an option. Such circumstances may include:

  • The hardware does not support using the IOMMU
  • The platform may not have an IOMMU in the first place (for example, a VM without IOMMU emulation)
  • Software devices (for example, DPDK’s Kernel Network Interface (KNI) PMD) do not support IOVA as VA mode
  • Some IOMMUs (generally emulated ones) have a limited address width, which, while not preventing the use of IOVA as VA mode, limits its usefulness
  • DPDK is being used on an OS other than Linux*

However, such cases are relatively rare, and in the great majority of scenarios, IOVA as VA mode will work just fine.

Which IOVA Mode to Use

In many cases, DPDK chooses IOVA as PA mode as the default, as it is the safest mode to use from the hardware perspective. Any given hardware (or software) PMD is all but guaranteed to support at least IOVA as PA mode. Nevertheless, all DPDK users are highly encouraged to use IOVA as VA mode whenever possible, as there are undeniable advantages to using this mode.

The user, however, does not have to pick one over the other. The most suitable IOVA mode is detected automatically, and the default value works for the majority of cases, so no user interaction is required to make this choice. If the default is not suitable, the user can attempt to override the IOVA mode (applicable to DPDK 17.11 and later) by using the --iova-mode EAL command-line parameter, for example (binary name again illustrative):
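    ./dpdk-app --iova-mode=va    # request IOVA as VA mode
    ./dpdk-app --iova-mode=pa    # request IOVA as PA mode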

In most cases, VA and PA modes do not exclude each other and either one can be used, but there are some circumstances where IOVA as PA mode will be the only available option. If using IOVA as VA mode is not available, DPDK automatically switches over to IOVA as PA mode, even if it was requested to use IOVA as VA mode through an EAL parameter.

DPDK also provides an API to query which particular IOVA mode is in use at run time, but generally it is not used in user applications, as such information is usually only required by entities like DPDK PMDs and bus drivers.
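For completeness, here is a minimal sketch of that query using the rte_eal_iova_mode() API (to be called after rte_eal_init()):

    #include <rte_eal.h>

    /* Report which IOVA mode the EAL selected at initialization. */
    static const char *
    iova_mode_str(void)
    {
        switch (rte_eal_iova_mode()) {
        case RTE_IOVA_PA:
            return "IOVA as PA";
        case RTE_IOVA_VA:
            return "IOVA as VA";
        default:
            return "not yet decided";
        }
    }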

IOVA Mode and DPDK PCI Drivers

DPDK does not do all hardware device register and interrupt mapping by itself; it needs a little help from the kernel. To accomplish that, all hardware devices that are to be used by DPDK need to be bound to a generic Peripheral Component Interconnect (PCI) kernel driver. The generic part means that this driver is not locked to a specific set of PCI IDs like regular drivers, but can instead be used with any PCI device.

To bind a device to a generic driver, DPDK users are encouraged to refer to DPDK documentation, which describes this process for all supported OSes. However, a few words need to be said about various user space IO drivers supported by DPDK, and which IOVA modes they support. It may seem like there would be a 1:1 correspondence between a kernel driver and supported IOVA modes, but that is not actually the case. The following section discusses available drivers on Linux.

User Space IO (UIO) Drivers

The oldest kernel driver in the DPDK codebase is the igb_uio driver. It has been there pretty much since the beginning of DPDK, and it is thus the most widely used and the most familiar driver to DPDK developers.

This driver relies on kernel user space IO (UIO) infrastructure to work, and provides support for all interrupt types (legacy, message signaled interrupts (MSI), and MSI-X), as well as creating virtual functions. It also exposes hardware devices’ registers and interrupt handles through the /dev/uio file system, which DPDK EAL then uses to map them into user space and make them available for DPDK PMDs.

The igb_uio driver is very simple and does not do very much. It is therefore no surprise that it does not support using IOMMU. Or, to be more precise, it does support IOMMU, but only in pass-through mode, which sets up a 1:1 mapping between IOVA and the physical address. Using full IOMMU mode is not supported by igb_uio. As a consequence, the igb_uio driver only supports IOVA as PA mode and cannot work in IOVA as VA mode at all.

A driver similar to igb_uio is available in the kernel: uio_pci_generic. It works pretty much the same way as igb_uio, except that it is more limited in what it can do. For example, igb_uio supports all interrupt types (legacy, MSI, and MSI-X), while uio_pci_generic only supports legacy interrupts. Perhaps more importantly, igb_uio can also create virtual functions, while uio_pci_generic cannot; so, if creating virtual functions while using a DPDK physical function driver is a requirement, igb_uio is the only option.

Thus, in most cases, igb_uio would be either equivalent or preferable to uio_pci_generic. All of the limitations with regard to using IOMMU apply equally to both igb_uio and uio_pci_generic drivers—they cannot use full IOMMU functionality, and thus only support IOVA as PA mode.

VFIO Kernel Driver

An alternative to the above drivers is the vfio-pci driver. It is part of the Virtual Function I/O (VFIO) kernel infrastructure and was introduced in Linux version 3.6. The VFIO infrastructure makes both device registers and device interrupts available to user space applications, and can use the IOMMU to set up IOVA mappings to perform IO from user space. The latter part is crucial—this driver was developed specifically for use with the IOMMU and, on older kernels, will not even work without the IOMMU enabled.

Contrary to what might seem intuitive, using the VFIO driver allows using both IOVA as PA and IOVA as VA modes. This is because, while it is recommended to use IOVA as VA mode to avail of all the benefits of that mode, nothing stops DPDK’s EAL from setting up IOMMU maps in such a way as to follow the physical memory layout 1:1; the IOVA mappings are arbitrary, after all. In that case, even though the IOMMU is used, DPDK will work in IOVA as PA mode, thereby allowing things like DPDK KNI to work. It does, however, still require root privileges to use IOVA as PA mode.

On more recent kernels (4.5+, backported to some older versions), there is an enable_unsafe_noiommu_mode option available that allows using VFIO without IOMMU. This mode is for all intents and purposes identical to UIO-based drivers, and shares all of the same advantages and limitations they have.

Which Kernel Driver to Use

Generally speaking, it is not a choice that has to be made. More often than not, the situation dictates the appropriate driver to use. The flowchart in Figure 5 is helpful in deciding which driver can be used in a particular circumstance.

As is clear from Figure 5, it is highly recommended to use the VFIO driver in just about all cases, especially in production environments. Using IOMMU provides device isolation at a hardware level, which makes applications using DPDK more secure, and using IOVA as VA mode allows better use of memory through remapping, as well as not requiring root privileges to run DPDK applications. However, certain use cases will require either igb_uio or uio_pci_generic drivers.
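In practice, whichever driver is chosen, binding a device to it is typically done with the dpdk-devbind.py script shipped with DPDK, for example (the PCI address below is a placeholder):

    # Load the VFIO driver, then bind a NIC to it
    modprobe vfio-pci
    ./usertools/dpdk-devbind.py --bind=vfio-pci 0000:01:00.0

    # Verify the current bindings
    ./usertools/dpdk-devbind.py --status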

Software Poll Mode Drivers (PMD)

In addition to the above, DPDK also comes with a range of software PMDs that do not require a generic kernel PCI driver, and instead rely on standard kernel infrastructure to provide hardware support. This enables DPDK to work with almost any hardware, even if it is not natively supported by DPDK.

Currently, DPDK has PMDs for the PCAP library, which is a widely used and supported packet capture library for network hardware. DPDK also supports Linux networking with an AF_PACKET PMD, and there is also ongoing work to support AF_XDP natively in DPDK. Using these PMDs comes with a (sometimes considerable) performance cost, but the flipside is that the setup is easy, and these PMDs usually do not care about IOVA mode at all.

Summary

This article provided an in-depth view of how DPDK deals with physical memory, as well as outlined physical addressing features available in DPDK when using various Linux* kernel drivers.

This is the second article in the series of articles about memory management in DPDK. The first article outlined key principles that lie at the foundation of DPDK’s memory management subsystem. The following articles in this series provide a historical perspective on memory management features available in DPDK long term support (LTS) releases 17.11 and earlier, as well as describe the changes and new features available in 18.11 and later DPDK versions.

Memory in DPDK, Part 1: General Concepts

By Anatoly Burakov

This post is Part 1 of a 4-part blog series that was originally published on the Intel Developer Zone blog.

Introduction

Memory management is a core aspect of the Data Plane Development Kit (DPDK). It provides a solid foundation upon which both other parts of DPDK and user applications are built to perform their best. In this series of articles, we take a close look at the various memory management features provided by DPDK.

However, before we get into detail on the various memory-related features provided by DPDK, it is important to provide some perspective on why memory management in DPDK works the way it does, and what the principles are that lie behind it. This article covers these principles and explains how they help in achieving DPDK’s high performance.

Note: While DPDK supports FreeBSD*, and there is also a work-in-progress Windows* port, the majority of memory-related features are currently only available on Linux*.

Huge Pages

In modern CPU architectures, memory is not managed as individual bytes, but rather using pages—virtually and physically contiguous blocks of memory. These blocks of memory are usually (but not necessarily) stored in RAM. On Intel® 64 and IA-32 architectures, standard system page size is 4 kilobytes.

When code is run, page addresses for accessing memory locations need to be translated from virtual addresses used by software applications to physical addresses used by the hardware. This translation is done by way of page tables, which map virtual to physical addresses, on a page level of granularity. To improve performance, the most recently used page addresses for accessed memory locations are kept in a cache called the translation lookaside buffer (TLB). Each page occupies an entry in the TLB. If your code accesses (or has recently accessed) 16 kilobytes of memory—that is, four pages—then there is a good chance that these pages will be in the TLB cache.

If one of those pages is not in the TLB cache, any attempt to access addresses contained within that page will cause a TLB miss; that is, the operating system (OS) will have to fetch the page address from its global page table into the TLB. TLB misses are therefore relatively expensive (and can get really expensive in some cases), so it is preferable to have as few TLB misses as possible by having all currently active pages in the TLB.

However, the TLB is not infinite in size; it is actually quite small, and the amount of memory covered by the TLB for standard page sizes at any given moment is pretty insignificant (a few megabytes) compared to the amount of data DPDK usually deals with (sometimes up to tens of gigabytes). This means that, were DPDK to use regular memory, applications using DPDK would experience a significant performance degradation due to the high rate of TLB misses.
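To put rough numbers on it (TLB sizes vary by CPU; 1,536 entries is purely an illustrative figure): with 4-kilobyte pages, 1,536 TLB entries cover only 1536 × 4 KB = 6 MB of memory, whereas with 2 MB huge pages the same 1,536 entries cover 1536 × 2 MB = 3 GB.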

To address this problem, DPDK relies on huge pages. It is easy to guess from their name that huge pages are like regular pages, only bigger. How much bigger? On Intel 64 and IA-32 architectures, the two currently available HugePage sizes are 2 megabyte (MB) and 1 gigabyte (GB). That means a single page can cover an entire 2 MB or 1 GB physically and virtually contiguous memory area.

DPDK supports both of these page sizes. With those page sizes, it is much easier to cover large memory areas without (as many) TLB misses. Fewer TLB misses, in turn, leads to better performance when working with large memory areas, as is customary for DPDK use cases.
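On Linux, huge pages must be reserved before DPDK can use them. A common way to do so is shown below (the count is illustrative, and the exact procedure depends on the system):

    # Reserve 1024 2 MB huge pages at run time
    echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

1 GB huge pages generally have to be reserved at boot time instead, via kernel parameters such as default_hugepagesz=1G hugepagesz=1G hugepages=4.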

Pinning Memory to NUMA Nodes

When regular memory is allocated, it can, in theory, be physically located anywhere in RAM. This is not an issue on a single-CPU system, but many DPDK consumers run their applications on multi-CPU systems with non-uniform memory access (NUMA) support. With NUMA, all memory is not equal: some memory accesses will take longer than others due to their physical location in relation to the CPU doing said memory accesses. When using regular memory allocation there often is no control over where this memory gets allocated, so if DPDK uses regular memory on such a system, it is possible to end up in a situation where a thread executing on one CPU unintentionally accesses memory belonging to a non-local NUMA node.

Admittedly, such cross-NUMA node accesses would be rare on any modern OS, as they are all NUMA-aware, and there are ways to enforce NUMA locality for memory without DPDK. However, what DPDK brings to the table is not just NUMA awareness; rather, it is the fact that the entirety of DPDK’s API is structured around explicit NUMA awareness for every operation. There is often no way to allocate a given DPDK data structure without explicitly specifying the NUMA node on which said structure will have to be located in memory.

Such explicit NUMA awareness throughout the DPDK API helps to ensure that NUMA awareness is always a consideration in every operation performed by a user application; in other words, the DPDK API makes it harder to write poorly performing code.
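As a brief illustration of what this looks like in code, DPDK’s allocator takes the target NUMA node directly as a parameter; a minimal sketch using rte_malloc_socket() (the buffer name and size are illustrative):

    #include <rte_lcore.h>
    #include <rte_malloc.h>

    /* Allocate a 4 KB buffer on the NUMA node local to the calling
     * lcore; an alignment of 0 requests the default alignment. */
    static void *
    alloc_local_buffer(void)
    {
        return rte_malloc_socket("example_buf", 4096, 0, rte_socket_id());
    }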

Hardware, Physical Addresses, and DMA

DPDK was conceived as a set of user space packet I/O libraries, and to this day it largely stays true to its original mission statement. However, hardware does not work with user space virtual addresses—it is unaware of any user space processes, and thus lacks the context required to understand where user space virtual addresses point. Instead, it works using real physical addresses; that is, the addresses that the CPU, RAM, and all other parts of the system use to communicate with each other.

Modern hardware almost always uses direct memory access (DMA) transactions for efficiency reasons. Normally, in order to perform a DMA transaction, the kernel would need to be involved to create a DMA-enabled memory area, translate the in-process virtual address to a real physical address that can be understood by the hardware, and to initiate the DMA transaction. This is how I/O works in most modern operating systems; however, this is a time-consuming process that requires context switching and translation and lookup operations that are not conducive to high-performance I/O.

DPDK’s memory management addresses this problem in a simple way. Whenever a memory area is made available for DPDK to use, DPDK figures out its physical address by asking the kernel at that time. Since DPDK uses pinned memory, generally in the form of huge pages, the physical address of the underlying memory area is not expected to change, so the hardware can rely on those physical addresses to be valid at all times, even if the memory itself is not used for some time. DPDK then uses these physical addresses when preparing I/O transactions to be done by the hardware, and configures the hardware in such a way that the hardware is allowed to initiate DMA transactions itself. This allows DPDK to avoid needless overhead and to perform I/O entirely from user space.

IOMMU and IOVA

By default, any hardware has access to the entire system, so it can perform DMA transactions anywhere. This has a number of security implications. For example, a rogue and/or untrusted process (including one running inside a virtual machine (VM)) could potentially use a hardware device to read from and write to kernel space, and just about any other memory location. To address this problem, modern systems come equipped with an input-output memory management unit (IOMMU). This is a hardware device that provides DMA address translation and device isolation facilities, so that a particular device is only allowed to perform DMA transactions to and from certain memory areas (designated by the IOMMU), and cannot access the rest of the system memory address space.

Due to the involvement of IOMMU, the physical address the hardware uses may not be the real physical address, but instead a (completely arbitrary) input-output virtual address (IOVA) assigned to the hardware by the IOMMU. Generally, the DPDK community uses the terms physical address and IOVA interchangeably, but, depending on context, the difference between the two might matter. For example, DPDK 17.11 and the newer long-term support (LTS) versions of DPDK may not use actual physical addresses at all in certain circumstances, and may instead use user space virtual addresses (or even completely arbitrary addresses) for DMA purposes. The IOMMU takes care of address translation, so the hardware never notices the difference between the two.

Depending on how DPDK was initialized, IOVA addresses may or may not represent actual physical addresses, but one thing is always true: DPDK is aware of the underlying memory layout, and can therefore take advantage of that. For example, it can map pages in such a way as to create IOVA-contiguous virtual areas, or even make use of IOMMU to rearrange the memory maps to make memory appear IOVA-contiguous, even though the underlying physical memory may not be.

As a result, this awareness of underlying physical memory areas is one more tool in DPDK’s tool belt. Most data structures do not care about IOVA addresses, but when they do, DPDK provides the facilities for software and hardware to take advantage of physical memory layout, and optimize for different use cases.

Note: the IOMMU will not set up any mappings by itself. Rather, the platform, the hardware, and the OS must be configured to use the IOMMU. Such configuration instructions are out of scope for this series of articles, but there are instructions available in the DPDK documentation and elsewhere. Once the system and the hardware are set up to use the IOMMU, DPDK is able to use it to set up DMA mappings for any memory areas allocated by DPDK. Making use of the IOMMU is the recommended way to run DPDK, as doing so is more secure, and it provides usability advantages.

Memory Allocation and Management

DPDK does not use regular memory allocation functions such as malloc(). Instead, DPDK manages its own memory. More specifically, DPDK allocates huge pages and creates a heap out of this memory, to give out to user applications and to use for internal data structures.

Using a custom memory allocator has a number of advantages. The most obvious one is the performance benefit for the end applications: DPDK creates memory areas to be used by the application, and the application can take advantage of huge page support, NUMA node affinity, access to DMA addresses, IOVA contiguousness, and so on, without any additional effort.

DPDK memory allocations are always aligned on CPU cache line boundaries—the start address of each allocation will be a multiple of the cache line size for the system. Such an approach prevents many common performance problems such as unaligned accesses and false sharing of data, where a single cache line inadvertently contains (possibly unrelated) data being accessed by multiple cores at once. Alignment by any other power-of-two value (>= cache line size, of course) is also supported for use cases that require such alignment (for example, allocating hardware ring structures).
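For instance (names illustrative), requesting a larger power-of-two alignment is just another argument to the allocator:

    #include <rte_malloc.h>

    /* Allocate a descriptor ring area aligned to 4 KB; rte_malloc()
     * is cache-line aligned by default, and any power-of-two
     * alignment >= the cache line size is accepted. */
    static void *
    alloc_ring_area(size_t ring_size)
    {
        return rte_malloc("hw_ring", ring_size, 4096);
    }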

Any memory allocation in DPDK is also thread-safe. This means that any allocation taking place on any core will be atomic, and will not interfere with any other allocations. This may seem like a triviality (after all, regular glibc memory allocation routines are generally thread-safe as well), but its significance becomes clearer once it is considered in the context of multiprocessing.

DPDK supports a specific flavor of cooperative multiprocessing, where a primary process manages all DPDK resources, and multiple secondary processes can attach to the primary process and have shared access to resources managed by the primary process.

DPDK’s shared memory implementation works by not only mapping the same resources in different processes (similar to mechanisms like shmget()), but by also duplicating the primary process’s address space inside another process. Therefore, since everything is located at the same addresses within both processes, any pointers to DPDK memory objects will work across processes, without any address translation necessary. This is very important for performance when passing data across processes.

Table 1. Comparison between OS and DPDK allocators.

The shared nature of DPDK’s memory is also why thread safety of the DPDK heap is hugely important; not only can any thread allocate and deallocate data concurrently with any other thread, but any process can allocate and deallocate memory concurrently with multiple other processes, without any race conditions. Because the entire DPDK memory heap is shared across processes, it is also perfectly safe to allocate memory in one process and reference or free it in another.

Memory Pools

DPDK also has a memory pool manager that is widely used throughout DPDK to manage large pools of objects of fixed size. Its uses are many—packet I/O, crypto operations, event scheduling, and many other use cases that need to quickly allocate or deallocate fixed-sized buffers. DPDK memory pools are highly optimized for performance, and support optional thread safety (users do not pay for thread safety if they don’t need it) and bulk operations, all of which result in allocation or free operation cycle counts per buffer reaching low double-digit values.
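As a quick illustration, the packet-buffer flavor of the pool manager is typically created along these lines (the pool name and sizing values are illustrative):

    #include <rte_lcore.h>
    #include <rte_mbuf.h>

    /* Create a pool of 8191 packet buffers with a 256-buffer per-lcore
     * cache, placed on the caller's NUMA node. */
    static struct rte_mempool *
    create_pktmbuf_pool(void)
    {
        return rte_pktmbuf_pool_create("mbuf_pool", 8191, 256, 0,
                                       RTE_MBUF_DEFAULT_BUF_SIZE,
                                       rte_socket_id());
    }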

That said, even though the subject of DPDK memory pools pops up in just about every discussion on memory management in DPDK, the memory pool manager is technically a library built on top of the regular DPDK memory allocator. It is not part of standard DPDK memory allocation facilities, and its internal workings are completely separate from (and very different than) the DPDK memory management routines. For this reason, it is out of scope for this article series. However, more information about the DPDK memory pool manager library can be found in the DPDK documentation.

Conclusion

This article covered many of the core principles that lie at the foundation of DPDK’s memory management subsystem, and demonstrated that high performance of DPDK is not an accident, but rather a deliberate consequence of its architecture.

The following articles in this series present a deep dive into IOVA addressing and its use in DPDK, provide a historical perspective on memory management features available in DPDK long term support (LTS) releases 17.11 and earlier, and describe the changes and new features available in 18.11 and later DPDK versions.

DPDK 19.08, Biggest Release of 2019, is Now Available

The latest major release of DPDK is now available, DPDK 19.08: https://fast.dpdk.org/rel/dpdk-19.08.tar.xz. Arguably the biggest release of the year, DPDK 19.08 was a phenomenal community effort.

The statistics:

  • 1327 commits from 171 authors
  • 1631 files changed, 138797 insertions(+), 97285 deletions(-)

A list of new features, grouped by category, is included below: 

General:

  • IOVA mode defaults to VA if IOMMU is available
  • MCS lock
  • better pseudo-random number generator
  • Intel QuickData Technology (ioat) PMD
  • non-transparent bridge (ntb) PMD

Networking:

  • actions for TCP and GRE in flow API
  • Broadcom Thor support in bnxt PMD
  • Huawei (hinic) PMD
  • Marvell OCTEON TX2 PMD
  • shared memory (memif) PMD
  • zero copy and multi-queues in AF_XDP PMD

Baseband:

  • Intel FPGA LTE FEC PMD

More details available in the release notes: http://doc.dpdk.org/guides/rel_notes/release_19_08.html

There are 70 new contributors (including authors, reviewers and testers). Welcome to Abraham Tovar, Adam Dybkowski, Adham Masarwah, Aideen McLoughlin, Amit Gupta, Amrutha Sampath, Anirudh Venkataramanan, Artur Trybula, Ashijeet Acharya, Ashish Shah, Brett Creeley, Christopher Reder, Dave Ertman, Dilshod Urazov, Eli Britstein, Flavia Musatescu, Georgiy Levashov, Gosia Bakota, Grishma Kotecha, Grzegorz Nitka, Hariprasad Govindharajan, Henry Tieman, Jacek Naczyk, Jacob Keller, Jaroslaw Ilgiewicz, Jeb Cramer, Jesse Brandeburg, Jingzhao Ni, Johan Källström, John OLoughlin, Július Milan, Kalesh AP, Kanaka Durga Kotamarthy, Karol Kolacinski, Kevin Lampis, Kevin Scott, Lance Richardson, Lavanya Govindarajan, Lev Faerman, Lukasz Bartosik, Lukasz Gosiewski, Maciej Bielski, Marcin Zapolski, Mariusz Drost, Marta Plantykow, Mesut Ali Ergin, Michel Machado, Mohsin Mazhar Shaikh, Naresh Kumar PBS, Nicolas Chautru, Radu Bulie, Santoshkumar Karanappa Rastapur, Satha Rao, Sean Morrissey, Shivanshu Shukla, Sriharsha Basavapatna, Srinivas Narayan, Suanming Mou, Suyang Ju, Tao Zhu, Tarun Singh, Thinh Tran, Ting Xu, Tummala Sivaprasad, Vamsi Attunuru, Wenjie Li, William Tu, Xiao Zhang, Yuri Chipchev and Ziyang Xuan.

Below is the number of patches per company (with authors count):

    435     Intel (60)

   239     Marvell (19)

    164     Mellanox (15)

    109     Red Hat (5)

     84     Broadcom (10)

     77     Microsoft (2)

     51     Solarflare (7)

     28     ARM (5)

     25     NXP (6)

     24     6WIND (4)

     20     Huawei (4)

     12     OKTET Labs (3)

     11     Cisco (4)

      8     IBM (4)

      5     Semihalf (3)

      4     Netcope (2)

      4     Ericsson (1)

Based on Reviewed-by and Acked-by tags, the top reviewers are:

    134     Ferruh Yigit <ferruh.yigit@intel.com>

    101     Qi Zhang <qi.z.zhang@intel.com>

     67     Jerin Jacob <jerinj@marvell.com>

     63     Viacheslav Ovsiienko <viacheslavo@mellanox.com>

     44     David Marchand <david.marchand@redhat.com>

     43     Maxime Coquelin <maxime.coquelin@redhat.com>

     38     Bruce Richardson <bruce.richardson@intel.com>

     37     Matan Azrad <matan@mellanox.com>

     34     Stephen Hemminger <stephen@networkplumber.org>

     31     Konstantin Ananyev <konstantin.ananyev@intel.com>

     31     Anatoly Burakov <anatoly.burakov@intel.com>

     31     Ajit Khaparde <ajit.khaparde@broadcom.com>

     29     Akhil Goyal <akhil.goyal@nxp.com>

     27     Luca Boccassi <bluca@debian.org>

     25     Shahaf Shuler <shahafs@mellanox.com>

     23     Xiaolong Ye <xiaolong.ye@intel.com>

     23     Fiona Trahe <fiona.trahe@intel.com>

     22     Andrew Rybchenko <arybchenko@solarflare.com>

     20     Somnath Kotur <somnath.kotur@broadcom.com>

     17     Gavin Hu <gavin.hu@arm.com>

     16     Shally Verma <shallyv@marvell.com>

     16     Olivier Matz <olivier.matz@6wind.com>

     16     Hemant Agrawal <hemant.agrawal@nxp.com>

     15     Yongseok Koh <yskoh@mellanox.com>

 

What’s Next

New features for the 19.11 release may be submitted during the next four weeks, in order to be reviewed and integrated during September. DPDK 19.11 should be released at the beginning of November: http://core.dpdk.org/roadmap#dates

In memory of Rami Rosen, we would like to encourage everybody to clean up and carefully review the DPDK documentation.

We’d love to see you at the DPDK Userspace event in Bordeaux, France, September 19-20. If you haven’t already, please register. More details here: https://events.linuxfoundation.org/events/dpdk-userspace-2019-bordeaux

Thanks to everyone in the broader DPDK community for your participation and contributions.

DPDK releases v19.05, introduces Windows Support!

This post originally appeared on the Microsoft Tech Community (Networking) blog, here: https://techcommunity.microsoft.com/t5/Networking-Blog/DPDK-releases-v19-05-introduces-Windows-Support/ba-p/633927

By Harini Ramakrishnan 

Data Plane Development Kit recently issued the release of DPDK v19.05.

We are thrilled to share that this release marks the introduction of Windows support in the community-maintained upstream repository! This exciting development paves the way for more core libraries and networking hardware to be supported on Windows, lighting up new use cases.

DPDK is a set of fast packet-processing libraries and drivers for user-mode applications looking to optimize network performance.

DPDK logo.PNG

The Linux Foundation-hosted DPDK project is a vibrant, thriving community of developers from over 25 organizations, spanning networking hardware vendors, independent software vendors, OS distros, and consuming open source projects.

For over a year now, we’ve had the ability to run DPDK on Windows through libraries available in the DPDK Windows draft repository. However, this meant that the Windows port needed a separate development, build, and testing pipeline, consequently trailing behind the DPDK community project by multiple releases.

With the initiation of the merge, DPDK libraries for Windows will benefit from the participation, contribution, and leadership of the DPDK community. For instance, as part of this integration, DPDK libraries for Windows moved away from a dependency on a proprietary toolchain to using the Clang/LLVM C compiler and the Meson build system.

What Next?

Wait, does this mean we can retire the DPDK Windows draft repository? Not quite, yet!

The draft repository will continue to be the development vehicle for all contributions, until we attain parity in features at the main repository. The integration of Windows Platform support has been initiated with the release of DPDK v19.05 and is expected to continue through 2019.

Watch the Roadmap page for announcements on core libraries, poll mode drivers, and features that will be added in subsequent releases. As shared before, we are partnering with multiple networking vendors to expand the hardware ecosystem for DPDK on Windows.

Eventually, when the integration is complete, DPDK on Windows can remain stable and up to date, enjoying the same quality baseline as other platforms.

Ways to Contribute

Interested in participating? Help us make DPDK on Windows more stable!

Test the DPDK libraries on Windows and share your feedback! Head over to the getting started guide.

But wait, this is just a Hello World! Looking for more? Try the Windows port at the DPDK-draft-Windows repository, with the v18.08 branch and readme.

Do you have questions, feedback to share or want to report bugs? Do you have new use cases to support or want to make feature requests?

Write to us by registering for the DPDK development mailing list, dev@dpdk.org. Contribute patches under these guidelines, referencing “dpdk-draft-windows” in the contribution.

While we do our best to follow the forums used by the DPDK community, for quick direct access to the Microsoft Windows DPDK team, drop us an email at dpdkwin@microsoft.com.

Join us in the DPDK Windows Community call, under the guidance of the DPDK Technical Board to help shape the future of DPDK on Windows!

Thanks to our partners at Intel for their contributions, and to the DPDK Technical Board for the guidance and leadership.

Looking forward to hearing from you, and thanks for reading!

DPDK Community Lab Publishes Relative Performance Testing Results

By Jeremy Plsek, Lincoln LaVoie and Patrick MacArthur

The DPDK Community Lab is an open, independent testing resource for the DPDK project. Its purpose is to perform automated testing on incoming patch submissions, to ensure the performance and quality of DPDK is maintained. Participation in the lab is open to all DPDK project participants.

For some time now, the DPDK Community Lab has been gathering performance deltas, using the single-core packet I/O layer 2 throughput test from DTS, for each patch series submitted to DPDK, compared to the master branch. We are pleased to announce that the Lab has recently been allowed to make these results public. The results are also now published to Patchwork as they are automatically generated. They currently cover Mellanox and Intel devices, and the lab is able to support hardware from any DPDK participants wishing to support these testing efforts.

To view these results, go to the DPDK Community Lab Dashboard: https://lab.dpdk.org. The dashboard lists an overview of all active patch series and their results. Detailed results can be viewed by clicking on a patch series. If a patch fails to merge into master, a build log is shown to help identify the issue. If a patch merges cleanly into master, performance delta results are shown for each participating member.

The Lab is hosted by the University of New Hampshire InterOperability Laboratory as a neutral, third-party location. This provides a secure environment for hosting equipment and generating unbiased results for all participating vendors. Lab participants, i.e., companies hosting equipment for the testing, can securely access their equipment through a VPN, allowing for maintenance and performance tuning as the DPDK project progresses.

The Lab works by polling the Patchwork API. When new patches are submitted, the CI server merges them with the master branch and generates a tarball. Each participating system unpacks and installs the DPDK tarball and then runs the performance testing against this DPDK build. When all systems have finished testing, the CI gathers the results into our internal database to be shown on the Dashboard, and sends final reports to Patchwork to show up on the submitted patch. This allows patch submitters to utilize Patchwork to view their individual results, while also allowing anyone to quickly see an overview of results on the Dashboard. The system provides maintainers with positive confirmation of the stability and performance of the overall project.

In the future, we plan to open the Lab to more testing scenarios, such as performance testing of other features, beyond single-core packet I/O layer 2 throughput, and possibly running Unit Tests for DPDK. Additional features will be added to the Dashboard, such as showing graphs of the performance changes of master over time.

If your company would like to be involved, email the Continuous Integration group at ci@dpdk.org and dpdklab@iol.unh.edu.

First DPDK Community Awards Shine Spotlight on Teamwork, Collaboration

As the DPDK community continues to make strides, we’d like to take some time to reflect upon successes of the past year and announce the winners of the inaugural DPDK Community Awards, acknowledging individual and team contributions to the success of the project. We have an amazing community that has been working hard to ensure DPDK’s success, so please join us in taking a moment to thank and congratulate each of our winners, and the entire developer community at large.  

Winners were recognized September 5th at the DPDK Userspace event in Dublin, Ireland. Details about each award category and its winners appear below. Please join us in congratulating all of our nominees and winners!

DPDK Project Service Award: Thomas Monjalon
The community would like to recognize Thomas for his tireless work across many groups through the entire DPDK community. Thomas has been DPDK’s primary maintainer since the open source project was established in 2013, and works in the background to keep the project’s CI/CD infrastructure moving smoothly. Additionally, Thomas played a crucial role in designing the updated DPDK website.

DPDK Top Ambassador: Jim St. Leger
Jim’s passion for the project is unparalleled. He continues to champion and evangelize DPDK across a variety of mediums, regularly speaks on behalf of the project, and recently briefed a handful of industry media and analysts about the project.  

Innovation: Berkeley Packet Filter library (BPF)
Konstantin Ananyev showed great initiative in creating an eBPF library for DPDK. This represents another step towards combining the best of DPDK and the kernel.  

Innovation: Compression API
Thanks to Shally Verma, Fiona Trahe, Lee Daly, Pablo de Lara Guarch, and Ahmed Mansour. The Compression API was a great collaborative, cross-vendor initiative to create a new acceleration API, which helps expand DPDK’s reach into new use cases such as storage.

Innovation: Virtio 1.1
Congratulations to Tiwei Bie, Maxime Coquelin, Jens Freimann, Yuanhan Liu, and Jason Wang. This was another great collaborative effort to adopt the new Virtio 1.1 standard in DPDK, leading to a significant boost in performance in virtualized environments.

Contribution (Code):  Anatoly Burakov
Not only does Anatoly regularly and consistently contribute high-quality code, but his work in developing memory hotplug support resulted in a significant improvement to the project’s memory management subsystem.

Contribution (Documentation):  John McNamara
John’s work with DPDK documentation has not gone unnoticed by the community. He has taken on the job of main documentation maintainer, and does a lot of crucial organization and clean-up of the docs for each release.

Contribution (Maintainer): Thomas Monjalon
Thomas has been DPDK’s primary maintainer since the open source project was established in 2013, and works in the background to keep the project’s CI/CD infrastructure moving smoothly.

Contribution (Reviews):  Ferruh Yigit
Ferruh is known throughout the DPDK community for his deep review work, which is consistently efficient and beyond helpful.

Contribution (Testing): Intel Validation Team
Thank you to the Intel Validation team for testing each major DPDK release, and for open sourcing and maintaining the DPDK Test Suite (DTS).