In today’s rapidly evolving digital landscape, the demand for high-speed, reliable, and scalable network solutions is greater than ever. Enterprises are constantly seeking ways to optimize their network performance to handle increasingly complex workloads. The integration of the Data Plane Development Kit (DPDK) with Microsoft Azure’s Network Adapter (MANA) is a groundbreaking development in this domain.
Building on our recent user story, “Unleashing Network Performance with Microsoft Azure MANA and DPDK,” this blog post delves deeper into how this integration is revolutionizing network performance for virtual machines on Azure. DPDK’s high-performance packet processing capabilities, combined with MANA’s advanced hardware offloading and acceleration features, enable users to achieve unprecedented levels of throughput and reliability.
In this technical Q&A, Brian Denton, Senior Program Manager at Microsoft Azure Core, further illuminates the technical intricacies of DPDK and MANA, including the specific optimizations implemented to ensure seamless compatibility and high performance. He also elaborates on the tools and processes Microsoft provides to help developers leverage this powerful integration, simplifying the deployment of network functions virtualization (NFV) and other network-centric applications.
1. How does Microsoft’s MANA integrate with DPDK to enhance the packet processing capabilities of virtual machines on Azure, and what specific optimizations are implemented to ensure compatibility and high performance?
[Brian]: MANA is a critical part of our hardware offloading and acceleration effort. The end goal is to run as much of the workload as possible in hardware and to minimize the host resources needed to service virtual machines. Network Virtual Appliance (NVA) partner products and large customers leverage DPDK to achieve the highest possible network performance in Azure. We are working closely with these partners and customers to ensure their products and services take advantage of DPDK on our new hardware platforms.
2. In what ways does the integration of DPDK with Microsoft’s Azure services improve the scalability and efficiency of network-intensive applications, and what are the measurable impacts on latency and throughput?
[Brian]: Network Virtual Appliances are choke points in customers’ networks and are often chained together to protect, deliver, and scale applications. Every application in the network path adds processing and latency between the communicating endpoints. Therefore, NVA products are heavily focused on speeds and feeds and are designed to run as close to wire speed as possible. DPDK is the primary tool used by firewalls, WAFs, routers, Application Delivery Controllers (ADCs), and other networking applications to reduce the impact of their products on network latency. In a virtualized environment, this becomes even more critical.
3. What tools and processes has Microsoft provided for developers to leverage DPDK within the Azure ecosystem, and how does this integration simplify the deployment of network functions virtualization (NFV) and other network-centric applications?
[Brian]: We provide documentation on running testpmd in Azure: https://aka.ms/manadpdk. Most NVA products are built on older LTS Linux kernels and require backporting kernel drivers, so having a working starting point is crucial for integrating DPDK applications with new Azure hardware.
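As context for readers who have not used testpmd before, the sketch below (not taken from the Microsoft documentation) shows the kind of minimal port bring-up that testpmd performs at startup. It assumes a recent DPDK release with the MANA, mlx4/mlx5, and netvsc PMDs compiled in; the mbuf pool and queue sizes are illustrative only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rte_eal.h>
#include <rte_debug.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

int main(int argc, char **argv)
{
    /* EAL options (core mask, --vdev, etc.) are passed through, as with testpmd. */
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL initialization failed\n");

    printf("%u port(s) probed\n", rte_eth_dev_count_avail());

    /* One mbuf pool shared by all ports; sizes are illustrative. */
    struct rte_mempool *pool = rte_pktmbuf_pool_create(
        "mbufs", 8192, 256, 0, RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
    if (pool == NULL)
        rte_exit(EXIT_FAILURE, "mbuf pool creation failed\n");

    uint16_t port;
    RTE_ETH_FOREACH_DEV(port) {
        struct rte_eth_conf conf;

        memset(&conf, 0, sizeof(conf));
        /* One RX and one TX queue per port with default settings. */
        if (rte_eth_dev_configure(port, 1, 1, &conf) < 0 ||
            rte_eth_rx_queue_setup(port, 0, 1024,
                                   rte_eth_dev_socket_id(port), NULL, pool) < 0 ||
            rte_eth_tx_queue_setup(port, 0, 1024,
                                   rte_eth_dev_socket_id(port), NULL) < 0 ||
            rte_eth_dev_start(port) < 0)
            rte_exit(EXIT_FAILURE, "port %u setup failed\n", port);
        printf("port %u started\n", port);
    }
    return 0;
}
```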
4. How does DPDK integrate with the MANA hardware and software, especially considering the need for stable forward-compatible device drivers in Windows and Linux?
[Brian]: The push for hardware acceleration in a virtualized environment comes with the drawback that I/O devices are exposed to virtual machine guests through SR-IOV. Introducing the next generation of network card often requires the adoption of new network drivers in the guest. For DPDK, this depends on the Linux kernel, which may not have drivers available for new hardware, especially in older long-term support versions of Linux distros. Our goal with the MANA driver is to have a common, long-lived driver interface that will be compatible with future networking hardware in Azure. This means that DPDK applications will be forward-compatible and long-lived in Azure.
5. What steps were taken to ensure DPDK’s compatibility with both Mellanox and MANA NICs in Azure environments?
[Brian]: We introduced SR-IOV through Accelerated Networking in early 2018 with the Mellanox ConnectX-3 card. Since then, we’ve added ConnectX-4 Lx, ConnectX-5, and now the Microsoft Azure Network Adapter (MANA). All of these network cards still exist in the Azure fleet, and we will continue to support DPDK products leveraging Azure hardware. The introduction of new hardware does not impact the functionality of prior generations, so it’s a matter of ensuring new hardware and drivers are supported and tested prior to release.
6. How does DPDK contribute to the optimization of TCP/IP performance and VM network throughput in Azure?
[Brian]: See the answer to #2. DPDK is necessary to maximize network performance for applications in Azure, especially for latency-sensitive applications and heavy network processing.
7. How does DPDK interact with different operating systems supported by Azure MANA, particularly with the requirement of updating kernels in Linux distros for RDMA/InfiniBand support?
[Brian]: DPDK applications require a combination of supported kernel and user space drivers including both Ethernet and RDMA/InfiniBand. Therefore, the underlying Linux kernel must include MANA drivers to support DPDK. The latest versions of Red Hat and Ubuntu support both the Ethernet and InfiniBand Linux kernel drivers required for DPDK.
8. Can you provide some examples or case studies of real-world deployments where DPDK has been used effectively with Azure MANA?
[Brian]: DPDK applications in Azure are primarily firewall, network security, routing, and ADC products provided by our third-party Network Virtual Appliance (NVA) partners through the Marketplace. With our most recent Azure Boost preview running on MANA, we’ve seen additional interest by some of our large customers in integrating DPDK into their own proprietary services.
9. How do users typically manage the balance between using the hypervisor’s virtual switch and DPDK for network connectivity in scenarios where the operating system doesn’t support MANA?
[Brian]: In the case where the guest does not have the appropriate network drivers for the VF, the netvsc driver will automatically forward traffic to the software vmbus. The DPDK application developer needs to ensure that they support the netvsc PMD to make this work.
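To make the fallback concrete, here is a small sketch (assuming the port has already been configured and started, as in the earlier example) of a polling loop over the netvsc port. The key point is that the same receive call is used whether packets arrive over the MANA/Mellanox VF or the synthetic vmbus path; the switch is handled inside the netvsc PMD.

```c
#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* Poll RX queue 0 of the netvsc port. When the VF is present, packets are
 * delivered through it; when the VF is revoked, the netvsc PMD falls back
 * to the synthetic vmbus path with no change in the application code. */
static void poll_netvsc_port(uint16_t port_id)
{
    struct rte_mbuf *bufs[32];

    for (;;) {
        uint16_t nb_rx = rte_eth_rx_burst(port_id, 0, bufs, 32);

        for (uint16_t i = 0; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);   /* a real application would process or forward */
    }
}
```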
10. What future enhancements or features are being considered for DPDK in the context of Azure MANA, especially with ongoing updates and improvements in Azure’s cloud networking technology?
[Brian]: The supported feature list is published in the DPDK documentation: Overview of Networking Drivers — Data Plane Development Kit 24.03.0-rc4 documentation (dpdk.org). We will release with the current set of features and get feedback from partners and customers on demand for any new features.
11. How does Microsoft plan to address the evolving needs of network performance and scalability in Azure with the continued development of DPDK and MANA?
[Brian]: We are focused on hardware acceleration to drive the future performance and scalability in Azure. DPDK is critical for the most demanding networking customers and we will continue to ensure that it’s supported on the next generations of hardware in Azure.
12. How does Microsoft support the community and provide documentation regarding the use of DPDK with Azure MANA, especially for new users or those transitioning from other systems?
[Brian]: Feature documentation is generated out of the codebase and results in the following:
- Overview of Networking Drivers — Data Plane Development Kit 24.03.0-rc4 documentation (dpdk.org)
- MANA poll mode driver library — Data Plane Development Kit 24.03.0-rc4 documentation (dpdk.org)
- Netvsc poll mode driver — Data Plane Development Kit 24.03.0-rc4 documentation (dpdk.org)
Documentation for MANA DPDK, including running testpmd, can be found here: https://aka.ms/manadpdk
13. Are there specific resources or training modules that focus on the effective use of DPDK in Azure MANA environments?
[Brian]: We do not have specific training resources for customers to use DPDK in Azure, but that’s a good idea. Typically, DPDK is used by key partners and large customers that work directly with our development teams.
14. Will MANA provide functionality for starting and stopping queues?
[Brian]: TBD. What’s the use case and have you seen a need for this? Customers will be able to change the number of queues, but I will have to find out whether they can be stopped/started individually.
15. Is live configuration of Receive Side Scaling (RSS) possible with MANA?
[Brian]: Yes. RSS is supported by MANA.
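As an illustration only (standard DPDK usage rather than MANA-specific guidance), this is the usual way an application requests RSS at configure time; the requested hash types are masked against what the device actually reports so the call remains valid across NIC generations.

```c
#include <string.h>
#include <rte_ethdev.h>

/* Spread receive traffic across nb_rx_queues queues using RSS. */
static int configure_rss(uint16_t port_id, uint16_t nb_rx_queues)
{
    struct rte_eth_dev_info info;
    struct rte_eth_conf conf;
    int ret;

    ret = rte_eth_dev_info_get(port_id, &info);
    if (ret != 0)
        return ret;

    memset(&conf, 0, sizeof(conf));
    conf.rxmode.mq_mode = RTE_ETH_MQ_RX_RSS;
    /* Ask for IP/TCP/UDP hashing, limited to what the device supports. */
    conf.rx_adv_conf.rss_conf.rss_hf =
        (RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP | RTE_ETH_RSS_UDP) &
        info.flow_type_rss_offloads;

    return rte_eth_dev_configure(port_id, nb_rx_queues, nb_rx_queues, &conf);
}
```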
16. Does MANA support jumbo frames?
[Brian]: Jumbo frames and MTU size tuning are available as of DPDK 24.03 and rdma-core v49.1.
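For example, a minimal sketch of raising the MTU for jumbo frames; 9000 bytes is an illustrative value, the mbuf data room (or scattered RX) must be sized to match, and whether the MTU can be changed while the port is running depends on the PMD.

```c
#include <rte_ethdev.h>

/* Request a jumbo MTU on the given port; returns 0 on success. */
static int enable_jumbo_frames(uint16_t port_id)
{
    return rte_eth_dev_set_mtu(port_id, 9000);
}
```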
17. Will Large Receive Offload (LRO) and TCP Segmentation Offload (TSO) be enabled with MANA?
[Brian]: LRO in hardware (also referred to as Receive Segment Coalescing) is not supported; LRO in software should work fine.
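Because offload support differs across NIC generations, a defensive application can query what the port actually advertises before relying on TSO or hardware LRO. This is generic DPDK usage, not MANA-specific:

```c
#include <stdbool.h>
#include <stdio.h>
#include <rte_ethdev.h>

/* Report whether the port advertises TSO and hardware LRO. */
static void report_tcp_offloads(uint16_t port_id)
{
    struct rte_eth_dev_info info;

    if (rte_eth_dev_info_get(port_id, &info) != 0)
        return;

    bool tso = (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_TSO) != 0;
    bool lro = (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_TCP_LRO) != 0;

    printf("port %u: TSO %s, hardware LRO %s\n",
           port_id, tso ? "available" : "not available",
           lro ? "available" : "not available");
}
```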
18. Are there specific flow offloads that MANA will implement? If so, which ones?
[Brian]: MANA does not initially support DPDK flows. We will evaluate the need as customers request it.
19. How is low migration downtime achieved with DPDK?
[Brian]: This is a matter of reducing the amount of downtime during servicing events and supporting hot plugging. Applications will need to support the netvsc PMD so they can continue to service traffic while the VF is revoked, falling back to the synthetic vmbus.
20. How will you ensure feature parity with mlx4/mlx5, which support a broader range of features?
[Brian]: Mellanox creates network cards for a broad customer base that includes all the major public cloud platforms as well as retail. Microsoft does not sell the MANA NIC to retail customers and does not have to support features that are not relevant to Azure. One of the primary benefits of MANA is we can keep functionality specific to the needs of Azure and iterate quickly.
21. Is it possible to select which NIC is used in the VM (MANA or mlx), and for how long will mlx support be available?
[Brian]: No, you will never see both MANA and Mellanox NICs on the same VM instance. Additionally, when a VM is allocated (started), it selects a node from a pool of hardware configurations available for that VM size. Depending on the VM size, you could be allocated on ConnectX-3, ConnectX-4 Lx, ConnectX-5, or eventually MANA. VMs will need to support the mlx4, mlx5, and mana drivers until that hardware is retired from the fleet to ensure they remain compatible with Accelerated Networking.
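Since the application cannot choose which NIC generation it lands on, one practical pattern (a sketch, assuming a build that includes the mlx4, mlx5, mana, and netvsc PMDs) is to log the driver behind each probed port at startup:

```c
#include <stdio.h>
#include <rte_ethdev.h>

/* Print the PMD driving each probed port (netvsc, mana, mlx4, mlx5, ...). */
static void print_port_drivers(void)
{
    uint16_t port;

    RTE_ETH_FOREACH_DEV(port) {
        struct rte_eth_dev_info info;

        if (rte_eth_dev_info_get(port, &info) != 0)
            continue;
        printf("port %u: driver %s\n", port, info.driver_name);
    }
}
```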
22. Will there be support for Windows and FreeBSD with DPDK for MANA?
[Brian]: There are currently no plans to support DPDK on Windows or FreeBSD. However, there is interest within Microsoft to run DPDK on Windows.
23. What applications are running on the SoC?
[Brian]: The SoC is used for hardware offloading of host agents that formerly ran in software on the host and hypervisor. This ultimately frees up host memory and CPU resources that can be used for VMs, and reduces the impact of noisy neighbors, jitter, and blackout times during servicing events.
24. What applications are running on the FPGA?
[Brian]: This is initially restricted to I/O hardware acceleration, such as RDMA and the MANA NIC, as well as host-side security features.
Read the full user story, ‘Unleashing Network Performance with Microsoft Azure MANA and DPDK’.