DPDK blog posts

Explore the latest Innovations and Insights from this year’s DPDK North America Summit

This year’s DPDK North America Summit highlighted the project’s ongoing technical excellence and innovation in high-performance networking. The event gathered experts and enthusiasts from around the globe, including project pioneers and new community contributors. They engaged in discussions and demonstrations focused on recent code developments, the technical board’s plans, and, notably, exciting use cases and applications.

Over the past 14 years, DPDK has methodically developed an open stack that meets a broad variety of user requirements. The project’s open development approach and adaptability to community needs have been invaluable and showcase its commitment to open source principles. However, these practices have also led to a more expansive and less streamlined code base. Nevertheless, the technical board skillfully manages the necessary compromises for both core developers and end users, and several exciting developments were highlighted at the summit.

The review of new technology integrations and their organic adoption alongside the community’s evolution has been impressive. The project’s development, impact, and extensive application across global infrastructure have been significant. This includes not just data centers, enterprise cloud services, and network equipment, but also transportation networks, telecom systems, financial trading platforms, industrial control systems, and even particle processors and astronomical data processing!

The project has reached a pivotal stage of maturity, with use cases and applications expanding dynamically. This evolution presents an opportune moment to explore and showcase real-world applications, extending far beyond its conventional roles in routers and firewalls.

One highlight of the summit was the presentation by Robin Jarry and David Marchand from Red Hat, who introduced “Grout,” a graph router based on DPDK. Grout is designed to simulate network functions and physical routers, replicating the behavior of typically closed-source VNFs and CNFs with an open source tool. They provided a detailed explanation of the rte_graph library’s role in data path processing and showcased Grout’s capabilities.

Another notable session was led by Dr. John Romein from the Netherlands Institute for Radio Astronomy (ASTRON). He discussed how advanced GPU technologies, specifically the NVIDIA Grace Hopper Superchip, are being utilized to process vast amounts of data from radio telescopes. This session not only emphasized the integration of DPDK with GPU technology but also demonstrated its real-world applications in astronomical data processing, pushing the boundaries of modern hardware capabilities.

Each session, from discussions on the challenges of implementing DPDK on non-cache coherent platforms by Hemant Agrawal and Gagandeep Singh from NXP, to insights into machine learning inference within network processing by Srikanth Yalavarthi from Marvell, highlighted the versatility and robustness of DPDK. These discussions underscored its ability to meet the increasingly complex demands of network performance solutions.

In summary, the latest DPDK summit provided a platform for learning and sharing, reinforcing the community’s commitment to driving innovation in network performance through open development and governance, as highlighted by the project’s ongoing initiatives.

Watch all the presentations on the DPDK YouTube channel here.

Have a use case you’d like to share and feature on the website? Email marketing@dpdk.org

Highlights from DPDK Summit APAC

Welcome & Opening Remarks – Thomas Monjalon, Maintainer, NVIDIA

Thomas Monjalon, a maintainer at NVIDIA, opened the summit in Bangkok by emphasizing the importance of the community in the project. He highlighted the role of contributors like Ben Thomas, who handles marketing, and encouraged attendees to share their stories and get involved.

Thomas provided logistical details about the event and discussed the project’s history, noting its growth since its inception in 2010 and the support from the Linux Foundation. Looking ahead, Monjalon outlined priorities such as better public cloud integration, enhanced security protocols, and contributions to AI.

He stressed the long-term benefits of contributing to open source, including thorough documentation and community support, and noted that being part of the community can help individuals find new job opportunities.

Introducing UACCE Bus of DPDK – Feng Chengwen, Huawei Technologies Co., Ltd

Feng Chengwen from Huawei Technologies presented on the UACCE (Unified Accelerator Framework) integrated into DPDK. UACCE was designed to simplify usage and enhance performance and security for user space I/O and DMA operations without system involvement. It was upstreamed in version 5.7 of the Linux kernel and the latest DPDK release 24.03, allowing accelerators to access memory regions directly and eliminating address translation. 

Key objectives include high performance, simplified usage, and security, with support for multiprocess memory acceleration and on-demand resource usage. UACCE addresses performance issues such as page faults and NUMA balancing using techniques like CPU pre-access and memory binding. 

It is used in both host systems and virtual machines, though some features for virtual machines are still in development. Feng encouraged other developers to adopt UACCE, highlighting its broader application potential, and discussed future enhancements to integrate more devices into DPDK using the UACCE framework.

ZXDH DPU Adapter and Its Application – Lijie Shan & Wang Junlong, ZTE

The presentation introduces the ZXDH DPU driver, highlighting its features, applications, and product portfolio. The DPU system framework includes modules for high-speed network interfaces, PCI connectivity, and advanced packet processing, supporting RDMA, NVMe protocols, and multiple accelerators for security and storage. 

It enhances network and storage performance by offloading tasks from the host CPU, supporting virtualization, AI, and edge computing, with capabilities like TLS encryption. An example of offloading security group functions to the DPU demonstrates reduced CPU load and increased processing efficiency. 

The product portfolio supports up to 5 million IOPS and 100 million packets per second, with ongoing development to improve TCP protocol handling and storage acceleration.

Libtpa Introduction – Yuanhan Liu, Bytedance

Yuanhan Liu from ByteDance introduces Libtpa, a user-space TCP/IP stack developed from scratch. The presentation discusses its background, design, testing, and performance. 

Traditional kernel-based TCP/IP stacks have inefficiencies and overhead, and existing user-space TCP stacks face problems like breaking the kernel stack, limited zero-copy support, and inadequate testing and debugging tools. 

Libtpa addresses these issues by allowing coexistence with the kernel stack, supporting multiple user-space instances, optimizing performance with zero-copy, and providing extensive testing and debugging capabilities. 

Its architecture supports high throughput and low latency, achieving significant performance improvements. Libtpa includes over 200 unit tests and advanced debugging tools to ensure stability and ease of troubleshooting in production environments.

Telecom Packet Processing and Correlation Engine Using DPDK – Ilan Raman, Aviz Networks

Ilan Raman and his colleagues from Aviz Networks developed 5G packet processing applications using DPDK to manage complex 4G and 5G traffic on commodity hardware efficiently. Aviz Networks, founded in 2019 and operating in the USA, India, and Japan, focuses on providing open-source solutions for telecom operators. 

They address challenges in monitoring evolving mobile technologies, scaling solutions horizontally, and reducing TCO through software-driven approaches. A primary use case is 5G correlation, which enhances network performance monitoring by correlating control and user traffic. Deployment involves DPDK-based applications on commodity hardware, processing high-bandwidth traffic, and extracting valuable metadata for insights and capacity planning. 

The architecture uses a run-to-completion model, distributing functions across dedicated cores to handle various traffic types, with scalability achieved through RSS functionality in NICs. Practical learnings include configuring RSS for different packet types, ensuring symmetric load balancing, using per-core hash tables, isolating DPDK cores from the Linux kernel, and performing deep packet parsing in software. 

Aviz Networks leveraged DPDK’s packet manipulation libraries for handling custom headers and achieved better performance through memory optimizations and CPU isolation.
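
As an aside, one of those learnings, configuring symmetric RSS so that both directions of a flow land on the same worker core, can be sketched in a few lines of DPDK C. The snippet below is illustrative only and not Aviz Networks’ actual configuration: the repeated 0x6d5a key is a commonly used trick for symmetric Toeplitz hashing, and the rss_hf bits and key length a NIC accepts vary by driver.

#include <rte_ethdev.h>

/* 40-byte key of repeated 0x6d5a: hashes A->B and B->A packets to the same
 * queue, which symmetric load balancing across worker cores relies on. */
static uint8_t sym_rss_key[40] = {
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
    0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a, 0x6d, 0x5a,
};

static int configure_symmetric_rss(uint16_t port_id, uint16_t nb_rx_queues)
{
    struct rte_eth_conf conf = {
        .rxmode = { .mq_mode = RTE_ETH_MQ_RX_RSS },
        .rx_adv_conf = {
            .rss_conf = {
                .rss_key     = sym_rss_key,
                .rss_key_len = sizeof(sym_rss_key),
                /* Spread IP/TCP/UDP traffic across the RX queues; hashing on
                 * inner GTP-U headers needs driver-specific support. */
                .rss_hf = RTE_ETH_RSS_IP | RTE_ETH_RSS_TCP | RTE_ETH_RSS_UDP,
            },
        },
    };

    /* One RX queue per worker core; a single TX queue here for brevity. */
    return rte_eth_dev_configure(port_id, nb_rx_queues, 1, &conf);
}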

Unified Representor with Large Scale Ports – Suanming Mou, NVIDIA Semiconductor

The presentation by Suanming Mou from NVIDIA focused on optimizing the unified representor for large-scale ports within the DPDK switch model. Initially, the high memory and CPU usage caused by the need to poll all representor ports when packets missed hardware flow rules was a significant challenge.

The optimization approach involved setting “representor matching” to zero, directing all packets to a single uplink representor port, and copying the source port ID into packet metadata for identification in the hypervisor. This change reduced the need for extensive memory allocation and CPU polling, as traffic was handled through a single proxy port.

The implementation of new flow rules for this setup resulted in substantial memory savings, decreasing from over 800 MB to around 332 MB, and improved packet-per-second performance, increasing from 20 Mpps to 27.5 Mpps thanks to optimized polling and reduced cache misses. Overall, the optimization streamlined the polling process and significantly enhanced resource efficiency and performance in managing large-scale port traffic.

Troubleshooting Low Latency Application on CNF Deployment – Vipin Varghese & Sivaprasad Tummala, AMD

The presentation addresses the challenges encountered when transitioning applications from bare metal to container environments, emphasizing issues like reduced throughput, increased packet processing time, fluctuating latencies, and unpredictable performance with multiple container instances. 

Root causes of these problems include limited access to hardware resources, library and compiler version mismatches, lack of specific patches, and performance variations based on hardware architecture and deployment models. Through several case studies, Vipin and Sivaprasad underscore the importance of profiling applications on bare metal before deployment, using tools like flame graphs and perf, and understanding hardware details such as cache domains and PCI bus partitioning for optimization. 

They call for enhanced telemetry and observability in DPDK for containerized environments, noting that current tools and documentation are inadequate for complex troubleshooting. Recommendations include extending DPDK’s telemetry infrastructure, utilizing eBPF hooks for improved runtime data collection, and ensuring consistent performance through better documentation, custom plugins for CPU isolation, and awareness of hardware-specific optimizations.

Suggestions to Enhance DPDK to Enable Migration of User Space Networking Applications to DPDK – Vivek Gupta, Benison Technologies Pvt Ltd

The presentation by Vivek Gupta delves into enhancing DPDK to facilitate the migration of various user space networking applications, pinpointing a critical issue: advancements in CPU, IO, and memory technologies are not benefiting these applications. Despite significant improvements in infrastructure, user space networking applications often fail to utilize these advancements effectively. This gap highlights the need for a framework that can bring the benefits of these technological improvements to user space frameworks, ensuring better performance and efficiency.

Customers face numerous challenges when attempting to migrate their existing user space applications to DPDK or VPP environments without rewriting them. These applications, which traditionally rely on Linux kernel methods, encounter significant hurdles during migration. The proposed solution is to create a unified framework that integrates various technologies, such as EF VI and VPP, to enhance the performance of these applications. This framework would support different levels of packet processing, from L2 to L4, and provide essential mechanisms for encryption, decryption, deep packet inspection, and proxy functions.

To meet customer needs, the framework should enable applications to capture and inject packets at various levels, from the interface to higher layers. It should support state management and route updates from control applications, ensuring that applications always operate with the most current data. Additionally, the framework must offer accelerators for cryptographic and AI/ML-based processing to handle the complex requirements of modern applications. By addressing issues related to threading, caching, and reducing contention, the framework aims to significantly improve the performance of user space applications.

Practical examples underscore the potential benefits of this framework. For instance, enhancing web servers, proxy servers, and video streaming applications using the proposed framework could lead to substantial performance gains. By tackling issues such as blocking operations and optimizing thread management, applications can achieve higher throughput and better resource utilization. The framework should also cater to the needs of high-speed applications and support flexible application architectures, enabling user space applications to become more efficient and faster.

In conclusion, the proposed enhancements to DPDK aim to bridge the gap between advancements in infrastructure and the performance of user space networking applications. By providing a comprehensive framework that supports various processing levels, state management, and cryptographic acceleration, the solution promises to improve application performance, reduce contention, and enhance resource utilization. This approach will help customers migrate their applications more effectively and realize the full benefits of technological advancements in CPU, IO, and memory technologies.

Welcome Back – Prasun Kapoor, Associate Vice President, Marvell

The Asia Pacific (APAC) region, particularly India and China, has established a strong and dynamic community around the Data Plane Development Kit (DPDK). Recognizing this, the decision was made to hold the DPDK APAC Summit in Thailand, a geopolitically neutral location that facilitates easy participation from various APAC countries without visa complications.

The DPDK project is witnessing robust growth in multiple areas, including technical contributions, marketing outreach, and the number of active contributors. This growth is further bolstered by increasing interest from new prospective corporate members, indicating a healthy and expanding ecosystem.

Significant updates have been made to the University of New Hampshire (UNH) lab, which has recently incorporated the Marvell CN10K Data Processing Unit (DPU) into its testing suite. The lab now reports DPDK Test Suite (DTS) results for a variety of tests and has established a community dashboard for code coverage, releasing monthly reports. Additionally, the lab has been proactive in submitting patches and bug fixes and is running compilation tests for Open vSwitch (OVS) with each DPDK patch, with future plans to include performance testing.

Marketing efforts for DPDK have seen a considerable boost, with increased engagement on platforms like LinkedIn and a notable rise in YouTube views, which is seen as a leading indicator of the project’s growing interest. The steady increase in DPDK downloads further underscores the project’s rising popularity.

Enhancements to the DPDK documentation have also been a focal point, with updates to the Poll Mode Driver (PMD) guidelines, security protocol documentation, and multiple sections of the programmer’s guide and contributor guidelines. Financially, the DPDK project is in a strong position with a healthy budget and substantial reserves. This financial stability ensures that key activities such as summits, community labs, and marketing efforts are well-supported for the foreseeable future.

Coupling Eventdev Usage with Traffic Metering & Policing (QoS) – Sachin Saxena & Apeksha Gupta, NXP

Sachin Saxena and Apeksha Gupta from NXP presented on integrating Eventdev with Traffic Metering and Policing to enhance Quality of Service (QoS). They discussed the various requirements from customers and the comprehensive solution they developed to meet these demands. Their goal was to share their extensive work and experiences with the community, offering insights into how similar challenges can be addressed effectively.

They highlighted different customer requirements, such as the need for traffic classification and scheduling in hardware, reducing CPU cycle usage, and implementing custom schedulers. By leveraging the DPDK framework, they managed to consolidate these varied needs into a generic solution. This approach not only met the specific requirements but also provided a reference for others in the community who might face similar challenges.

The technical approach of their solution involved utilizing DPDK’s metering, policing, and Eventdev frameworks. They explained how these components interact to meet the specified use cases, enhancing overall efficiency and performance. By breaking down complex use cases into manageable components and mapping these to the corresponding rte library elements, they ensured robust end-to-end functionality.

In their implementation details, they described the method of segmenting use cases into multiple components and aligning these with the appropriate rte library components. This strategy ensured that each part of the system worked seamlessly together, achieving the desired outcomes effectively and efficiently.

To illustrate their points, they shared practical use cases, including the management of scheduling priorities, grouping multiple ports, and applying markers and policers at the priority group level. These examples demonstrated how to optimize CPU cycles and prevent data loss, showcasing the practical applications of their solution in real-world scenarios.

GRO Library Enhancements – Kumara Parameshwaran Rathinavel, Microsoft

Kumara Parameshwaran Rathinavel from Microsoft has been working on enhancing the Generic Receive Offload (GRO) library. GRO is a widely used software technique that optimizes packet processing by merging multiple TCP segments into a single large segment. Kumara has been contributing to this project since his time at VMware and continues to do so at Microsoft. His work aims to improve the efficiency and performance of GRO, particularly in the context of network traffic handling.

The current implementation of GRO, which involves iteratively checking a table for flow matches, has been identified as suboptimal for handling packets received in multiple bursts. This method can lead to inefficiencies, especially as the timeout intervals increase. Kumara highlighted that the existing approach struggles with scalability and performance under these conditions, necessitating a more efficient solution.

To address these limitations, Kumara proposed a hash-based method for flow matching. This new approach significantly enhances the efficiency of the GRO process. In tests, the hash-based method demonstrated substantial performance improvements, reducing the CPU utilization of the GRO reassemble function. This method not only optimizes the flow matching process but also ensures better handling of packet bursts, leading to overall improved performance.
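
To make the idea concrete, here is a generic sketch of hash-based flow matching; it is an illustration of the approach rather than Kumara’s patch or the actual rte_gro code. The 4-tuple is hashed to pick a bucket directly instead of iterating over the whole reassembly table for every segment.

#include <stdint.h>
#include <string.h>

#define GRO_TABLE_SIZE 1024  /* power of two so the hash can be masked */

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct gro_flow {
    struct flow_key key;
    int in_use;
    /* ... merged segment chain, next expected sequence number, etc. */
};

static struct gro_flow table[GRO_TABLE_SIZE];

/* Any reasonable hash works; a multiplicative mix is enough for a sketch. */
static uint32_t hash_key(const struct flow_key *k)
{
    uint32_t h = k->src_ip * 2654435761u;
    h ^= k->dst_ip * 2246822519u;
    h ^= ((uint32_t)k->src_port << 16) | k->dst_port;
    return h & (GRO_TABLE_SIZE - 1);
}

/* O(1) expected lookup with linear probing, instead of scanning every
 * entry in the reassembly table for each incoming packet. */
static struct gro_flow *gro_flow_lookup(const struct flow_key *k)
{
    uint32_t idx = hash_key(k);

    for (uint32_t probe = 0; probe < GRO_TABLE_SIZE; probe++) {
        struct gro_flow *f = &table[(idx + probe) & (GRO_TABLE_SIZE - 1)];

        if (!f->in_use) {           /* free slot: start a new flow here */
            f->key = *k;
            f->in_use = 1;
            return f;
        }
        if (memcmp(&f->key, k, sizeof(*k)) == 0)
            return f;               /* existing flow: merge into it */
    }
    return NULL;                    /* table full */
}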

Recognizing the varying latency requirements of different applications, Kumara suggested implementing tuple-specific timeouts within the GRO framework. This approach allows for more flexible and optimized GRO settings tailored to the specific needs of various applications. For instance, applications with low latency requirements, such as banking transactions, can have shorter timeouts, while those with less stringent latency needs can benefit from longer timeouts. This customization ensures that all applications can operate efficiently without compromising on performance.

To validate these enhancements, Kumara used a setup involving a virtual machine as a test proxy, demonstrating notable performance gains. The improvements in GRO are particularly beneficial for network applications like load balancers, where reducing CPU utilization and improving packet processing efficiency are critical. Kumara’s work on GRO library enhancements showcases significant advancements in optimizing network traffic handling, contributing to more efficient and scalable network performance.

Refactor Power Library for Vendor Agnostic Uncore APIs – Sivaprasad Tummala & Vipin Varghese, AMD

The presentation focuses on the critical need for improved power management and efficiency for Telco operators, emphasizing the importance of vendor-agnostic solutions for scalability across different platforms. This is particularly relevant as power has become a significant concern, with the need to optimize performance per watt and manage power effectively.

AMD’s power library within the DPDK (Data Plane Development Kit) aims to address these concerns by balancing power consumption and performance. The library optimizes core and uncore frequency management and introduces adaptive algorithms for real-time monitoring and idle state management. This ensures that while cores are busy polling, they consume power efficiently without compromising on performance.

Currently, the power library is tightly coupled with Linux and requires specific modifications to accommodate new drivers, leading to inefficiencies and increased code size. Each new driver introduction necessitates changes to the core library, increasing the complexity and effort required for maintenance and updates. This approach is not scalable as the number of drivers and their capabilities grow.

To address these challenges, the refactoring efforts aim to modularize the power library, enabling plug-and-play capabilities for new drivers and reducing dependencies. This modular approach will simplify the addition of new drivers, improve performance, and enhance scalability by minimizing the library’s footprint and code complexity.

The proposed enhancements include vendor-agnostic uncore APIs to manage interconnect bus frequencies and dynamic link width management. These APIs promote a standardized interface for power management across different hardware vendors, making it easier for applications to develop power management solutions without being tied to specific vendors. This approach not only reduces complexity but also ensures compatibility and scalability across various platforms.

Q&A with the Governing Board & Technical Board – Wang Yong, Thomas Monjalon, Jerin Jacob

The Technical Board (TBoard) and Governing Board (GBoard) of the project play distinct but complementary roles in steering the community. The TBoard consists of 11 members who meet bi-weekly to discuss and resolve technical issues. These meetings, conducted via Zoom, involve all community members and focus on consensus-driven decision-making. When consensus cannot be reached, the TBoard votes on issues, requiring prior email submissions for agenda inclusion. This structured approach ensures thorough consideration and discussion before decisions are made.

The GBoard, on the other hand, sets the project’s broad direction, encompassing administrative tasks, marketing strategies, and budgeting. This board meets monthly and includes a permanent chairperson along with representatives from the Linux Foundation. The GBoard comprises 12 members: 10 from gold member companies, one from a silver member, and one from the TBoard. Every six weeks, the GBoard convenes, and every three months it holds joint meetings with the TBoard to align on financial plans, marketing efforts, and major project decisions.

Membership in the GBoard is tiered, with gold members contributing $50,000 annually and silver members contributing $10,000 annually. These funds are crucial for project initiatives, such as summits and acquiring new servers for the lab. Gold members play a significant role in decision-making due to their financial contributions, ensuring their interests and investments are aligned with project goals.

Community involvement is a cornerstone of both boards’ operations. TBoard meetings are open to all, fostering transparency and inclusivity in technical discussions. Issues are raised via email, ensuring that all voices can be heard. The GBoard, while more focused on strategic direction, includes representatives from various companies to bring diverse perspectives to the table. This collaborative approach allows for comprehensive planning and execution of project initiatives.

Currently, the boards are prioritizing several key areas: enhancing security protocols and documentation, improving continuous integration (CI) performance testing, and integrating more functional testing into the DPDK Test Suite (DTS). Future plans include creating a performance dashboard and requiring contributors to add tests for new features. These efforts aim to maintain high standards of performance and security, ensuring the project’s robustness and reliability for all users.

Rte_flow Match with Comparison Result – Suanming Mou, NVIDIA Semiconductor

The presentation introduces a new feature for rte_flow, which focuses on comparison operations to enhance the flexibility of flow rules. This feature allows for comparisons between fields or between a field and an immediate value, providing more dynamic and versatile rule configurations. The presenter assumes familiarity with rte_flow from previous sessions and emphasizes the advantages of this new capability.

Traditional rte_flow rules are limited to matching immediate values, which can be restrictive in certain scenarios. For instance, in TCP connection tracking, the termination of connections often goes unnoticed by software if the reset packet is handled by hardware directly. Similarly, for packet payload evaluation, users may want to skip cryptographic operations on packets without payloads. These examples highlight the need for more advanced comparison methods in flow rules.

The new feature supports a range of comparison operations, including greater than, less than, and equal comparisons. It has been initially implemented in ConnectX-7 and BlueField-3 NICs, specifically within the template API. However, there are limitations, such as the inability to mix comparison items with other items and restricted field support. The feature is designed to be flexible but currently has hardware constraints that limit its full potential.

Users can configure these comparison rules using a new `item compare` structure in the API. This involves specifying the fields to compare, the immediate values, and the desired operations, such as equal, not equal, greater than, and so forth. The configuration also supports specific bit-width comparisons, providing detailed control over how comparisons are executed. This structure aims to offer a robust framework for implementing dynamic and complex flow rules.

Several examples demonstrate the use of the new comparison item in flow rules, illustrating its practical application. Despite its benefits, the feature currently supports only single comparison rules within flow tables and a limited range of fields. The requirement for both spec and mask in the configuration is due to the template API structure, which mandates these elements even if they might not be necessary for all comparisons. Suanming Mou concludes by encouraging other developers to integrate support for this feature in their PMDs, recognizing its potential to significantly enhance rte_flow’s capabilities.

DPDK PMD Live Upgrade – Rongwei Liu, Nvidia

The DPDK PMD live upgrade process is designed to meet the critical need for upgrading or downgrading PMD versions seamlessly without disrupting ongoing services. This process ensures the transfer of user configurations while minimizing downtime to nearly zero, making it essential for applications requiring continuous operation.

Two primary approaches are detailed for conducting these upgrades. The first approach involves a graceful exit of the old PMD followed by the restart of the new PMD, during which there is a brief period of service unavailability. The second approach utilizes a standby mode, where the new PMD is prepared with the necessary configurations but remains inactive until the old PMD exits. This method ensures that there is no service disruption as the traffic seamlessly switches to the new PMD once the old one exits.

To facilitate this process, two modes are introduced: active and standby. In the active mode, the PMD manages traffic and hardware configuration directly. In standby mode, configurations are set up but do not affect traffic immediately. Instead, they become active only when the old PMD gracefully exits, ensuring that the traffic handling transitions smoothly without any interruption.

A crucial aspect of the upgrade process is the use of group zero as a dispatcher for traffic processing rules. This mechanism ensures that all configurations are synchronized and become effective immediately, eliminating any downtime or disruption in traffic flow. By inserting and managing these rules efficiently, the system can transition from the old PMD to the new one seamlessly.

Finally, the process is designed to be highly scalable, allowing for adaptable resource usage to accommodate various deployment scales. It also emphasizes the importance of a user-friendly API, ensuring that users can access and utilize the upgrade features quickly and easily, thus enhancing the overall efficiency and effectiveness of the live upgrade process.

Monitoring 400G Traffic in DPDK Using FPGA-Based SmartNIC with RTE Flow – David Vodák, Cesnet

David Vodák from Cesnet presented their journey to enable 400G traffic monitoring using DPDK and FPGA-based SmartNICs, a project initiated due to the lack of suitable FPGA cards in the market. Cesnet, a national research and educational network, designed the FPGA SmartNIC which utilizes the Intel HX I7 chip with 400G Ethernet support and PCIe gen 4/5 compatibility. This card is engineered for high-speed processing, making it ideal for their needs.

The cornerstone of their solution is the NDK platform, an open-source framework that supports up to 400G throughput. NDK facilitates parallel processing, filtering, and metadata export, which are crucial for handling high-speed network traffic. It is designed to be highly adaptable, allowing users to create new components or use existing ones to build custom firmware for various applications.

NDK’s versatility extends beyond monitoring; it is also used for high-frequency trading and CL testing. One of the open source tools developed by Cesnet, the IPFIXPROBE, supports DPDK and is employed to create detailed traffic flows from input packets. This probe exemplifies the practical applications of NDK in real-world scenarios, demonstrating its robustness and flexibility.

To ensure the reliability of their solutions, Cesnet employs rigorous testing and verification methods. Functional testing is conducted using tools like testpmd or custom DPDK applications in loopback setups. For benchmarking, they utilize external traffic generators such as Spirent and FlowTest, with the latter capable of simulating realistic network traffic to provide more accurate testing results.

Looking ahead, Cesnet plans to expand the capabilities of the NDK platform to support various cards and use cases beyond traffic monitoring. Their commitment to open source development is evident, as they provide extensive resources on GitHub for the community to collaborate and innovate further. This open source approach not only fosters community involvement but also drives continuous improvement and adaptation of their technology.

Lessons Learnt from Reusing QDMA NIC to Base Band PMD – Vipin Varghese & Sivaprasad Tummala, AMD

AMD undertook a project to repurpose its QDMA NIC for Forward Error Correction (FEC) offloading in virtual RAN environments using an FPGA-based prototype. The goal was to support LDPC encode/decode functionalities without developing the BBDEV PMD from scratch. Instead, the team adapted existing QDMA NIC code, incorporating necessary modifications to create a functional BB PMD. This approach allowed for rapid prototyping and integration within a short time frame.

Throughout the project, several challenges arose, including mismatched use cases, software latencies, and inadequate thread handling. To address these, the team implemented solutions such as using selective builds and applying compiler pragmas. These strategies helped optimize the RX/TX burst functionalities and reduce instruction cache misses, which in turn minimized overall latency.

Significant efforts were made to reduce latencies, which were initially high due to multiple factors including software and PMD-related overheads. By minimizing instruction cache misses and optimizing critical functions to fit within smaller memory pages, the team achieved notable latency reductions. Further improvements were realized by implementing lockless mechanisms using rte rings, ensuring efficient enqueue/dequeue operations.

To meet the specific requirements of the customer for low-latency and high-throughput, the team had to go beyond simple test scenarios and adapt the software implementations to better match real-world use cases. This involved modifying the BBDEV PMD to handle multiple threads and ensuring proper mapping and distribution of LDPC encode and decode requests, which significantly improved performance and reliability.

The project highlighted several important lessons. It underscored the value of reusing existing PMDs when feasible, as well as the need to reduce code bloating and align PMD examples with actual customer use cases. The team recommended updates for the DPDK community to focus more on low latency and stress testing, and to improve lockless implementations. These insights and improvements contributed to a more robust and efficient solution, ultimately enhancing the overall performance of the system.

Closing Remarks – Nathan Southern, Sr. Projects Coordinator, The Linux Foundation

The DPDK conference in Bangkok marked a significant milestone as the first APAC event since COVID-19. With a total of 63 participants, including 30 in person and 33 online, the event exceeded expectations. This conference, considered an experiment in a geopolitically neutral location, was deemed successful and has paved the way for potential future APAC events in various locations.

DPDK, a project now 14 years old, has shown remarkable growth and resilience, countering previous perceptions of being in its sunset phase. Since Nathan joined the Linux Foundation in April 2022, the project has maintained nearly perfect member retention and continued technological advancement. This longevity and sustained momentum underscore the project’s vitality and relevance in the tech community.

Strategic efforts in marketing and documentation have significantly enhanced the project’s visibility and usability. Under the direction of Ben Thomas, marketing initiatives have been robust, and tech writer Nini Purad has overhauled the project’s documentation. These efforts aim to foster community growth and engagement, ensuring that DPDK remains a valuable resource for its users.

The DPDK project is evolving in critical areas such as security, cloud hyperscaling, and AI. This evolution is driven by community input and the guidance of the tech board, including leaders like Thomas Monjalon. Continuous community involvement is essential for future advancements, highlighting the importance of active participation from all stakeholders.

Nathan emphasized the importance of community engagement in driving DPDK’s development forward. He encouraged attendees to participate through Slack, tech board calls, and contributions to the OSS code. Additionally, the project is actively creating dynamic content, including end-user stories and developer spotlights, to promote mutual growth and expand the membership base. This focus on community and content creation is key to sustaining and growing the DPDK project.

Watch all the summit videos here

Microsoft Azure MANA DPDK Q&A

In today’s rapidly evolving digital landscape, the demand for high-speed, reliable, and scalable network solutions is greater than ever. Enterprises are constantly seeking ways to optimize their network performance to handle increasingly complex workloads. The integration of the Data Plane Development Kit (DPDK) with Microsoft Azure’s Network Adapter (MANA) is a groundbreaking development in this domain.

Building on our recent user story, “Unleashing Network Performance with Microsoft Azure MANA and DPDK,” this blog post delves deeper into how this integration is revolutionizing network performance for virtual machines on Azure. DPDK’s high-performance packet processing capabilities, combined with MANA’s advanced hardware offloading and acceleration features, enable users to achieve unprecedented levels of throughput and reliability.

In this technical Q&A, Brian Denton, Senior Program Manager at Microsoft Azure Core, further illuminates the technical intricacies of DPDK and MANA, including the specific optimizations implemented to ensure seamless compatibility and high performance. He also elaborates on the tools and processes provided by Microsoft to help developers leverage this powerful integration, simplifying the deployment of network functions virtualization (NFV) and other network-centric applications.

1. How does Microsoft’s MANA integrate with DPDK to enhance the packet processing capabilities of virtual machines on Azure, and what specific optimizations are implemented to ensure compatibility and high performance?

[Brian]: MANA is a critical part of our hardware offloading and acceleration effort. The end goal is to maximize workloads in hardware and minimize the host resources needed to service virtual machines. Network Virtual Appliance (NVA) partner products and large customers leverage DPDK to achieve the highest possible network performance in Azure. We are working closely with these partners and customers to ensure their products and services take advantage of DPDK on our new hardware platforms.

2. In what ways does the integration of DPDK with Microsoft’s Azure services improve the scalability and efficiency of network-intensive applications, and what are the measurable impacts on latency and throughput?

[Brian]: Network Virtual Appliances are choke points in customers’ networks and are often chained together to protect, deliver, and scale applications. Every application in the network path adds processing and latency between the endpoints communicating. Therefore, NVA products are heavily focused on speeds and feeds and designed to be as close to wire-speed as possible. DPDK is the primary tool used by firewalls, WAFs, routers, Application Delivery Controllers (ADC), and other networking applications to reduce the impact of their products on network latency. In a virtualized environment, this becomes even more critical.

3. What tools and processes has Microsoft provided for developers to leverage DPDK within the Azure ecosystem, and how does this integration simplify the deployment of network functions virtualization (NFV) and other network-centric applications? 

[Brian]: We provide documentation on running testpmd in Azure: https://aka.ms/manadpdk. Most NVA products are on older LTS Linux kernels and require backporting kernel drivers, so having a working starting point is crucial for integrating DPDK applications with new Azure hardware.

4. How does DPDK integrate with the MANA hardware and software, especially considering the need for stable forward-compatible device drivers in Windows and Linux?

[Brian]: The push for hardware acceleration in a virtualized environment comes with the drawback that I/O devices are exposed to the virtual machine guests through SR-IOV. Introducing the next generation of network card often requires the adoption of new network drivers in the guest. For DPDK, this depends on the Linux kernel which may not have drivers available for new hardware, especially in older long-term support versions of Linux distros. Our goal with the MANA driver is to have a common, long-lived driver interface that will be compatible with future networking hardware in Azure. This means that DPDK applications will be forward-compatible and long-lived in Azure.

5. What steps were taken to ensure DPDK’s compatibility with both Mellanox and MANA NICs in Azure environments?

[Brian]: We introduced SR-IOV through Accelerated Networking in early 2018 with the Mellanox ConnectX-3 card. Since then, we’ve added ConnectX-4 Lx, ConnectX-5, and now the Microsoft Azure Network Adapter (MANA). All these network cards still exist in the Azure fleet, and we will continue to support DPDK products leveraging Azure hardware. The introduction of new hardware does not impact the functionality of prior generations of hardware, so it’s a matter of ensuring new hardware and drivers are supported and tested prior to release.

6. How does DPDK contribute to the optimization of TCP/IP performance and VM network throughput in Azure?

[Brian]: See answer to #2. DPDK is necessary to maximize network performance for applications in Azure, especially for latency sensitive applications and heavy network processing.

7. How does DPDK interact with different operating systems supported by Azure MANA, particularly with the requirement of updating kernels in Linux distros for RDMA/InfiniBand support?

[Brian]: DPDK applications require a combination of supported kernel and user space drivers including both Ethernet and RDMA/InfiniBand. Therefore, the underlying Linux kernel must include MANA drivers to support DPDK. The latest versions of Red Hat and Ubuntu support both the Ethernet and InfiniBand Linux kernel drivers required for DPDK.

8. Can you provide some examples or case studies of real-world deployments where DPDK has been used effectively with Azure MANA?

[Brian]: DPDK applications in Azure are primarily firewall, network security, routing, and ADC products provided by our third-party Network Virtual Appliance (NVA) partners through the Marketplace.  With our most recent Azure Boost preview running on MANA, we’ve seen additional interest by some of our large customers in integrating DPDK into their own proprietary services.

9. How do users typically manage the balance between using the hypervisor’s virtual switch and DPDK for network connectivity in scenarios where the operating system doesn’t support MANA?

[Brian]: In the case where the guest does not have the appropriate network drivers for the VF, the netvsc driver will automatically forward traffic to the software vmbus. The DPDK application developer needs to ensure that they support the netvsc PMD to make this work.
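
For context, the sketch below shows roughly what that looks like on the application side, assuming the vdev_netvsc approach documented for Hyper-V/Azure. It is a minimal illustration, not Microsoft-provided code; the vdev name and iface argument follow the DPDK vdev_netvsc documentation and may need adjusting for a given VM.

#include <rte_eal.h>
#include <rte_ethdev.h>
#include <stdio.h>

int main(void)
{
    /* The vdev_netvsc driver pairs the synthetic NetVSC interface with the
     * SR-IOV VF behind it, so traffic keeps flowing over the vmbus path
     * whenever the VF is revoked (e.g. during servicing events). */
    static char arg0[] = "app";
    static char arg1[] = "--vdev=net_vdev_netvsc0,iface=eth1"; /* adjust iface */
    char *eal_args[] = { arg0, arg1 };

    if (rte_eal_init(2, eal_args) < 0) {
        fprintf(stderr, "EAL init failed\n");
        return -1;
    }

    printf("%u ethdev port(s) available\n", rte_eth_dev_count_avail());

    /* ... normal port setup: rte_eth_dev_configure(), queue setup, start ... */

    rte_eal_cleanup();
    return 0;
}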

10.What future enhancements or features are being considered for DPDK in the context of Azure MANA, especially with ongoing updates and improvements in Azure’s cloud networking technology?

[Brian]: The supported feature list is published in the DPDK documentation: “Overview of Networking Drivers,” Data Plane Development Kit 24.03.0-rc4 documentation (dpdk.org). We will release with the current set of features and get feedback from partners and customers on demand for any new features.

11. How does Microsoft plan to address the evolving needs of network performance and scalability in Azure with the continued development of DPDK and MANA?

[Brian]: We are focused on hardware acceleration to drive the future performance and scalability in Azure. DPDK is critical for the most demanding networking customers and we will continue to ensure that it’s supported on the next generations of hardware in Azure.

12. How does Microsoft support the community and provide documentation regarding the use of DPDK with Azure MANA, especially for new users or those transitioning from other systems?

[Brian]: Feature documentation is generated out of the codebase and results in the following:

Documentation for MANA DPDK, including running testpmd, can be found here: https://aka.ms/manadpdk

13. Are there specific resources or training modules that focus on the effective use of DPDK in Azure MANA environments?

[Brian]: We do not have specific training resources for customers to use DPDK in Azure, but that’s a good idea. Typically, DPDK is used by key partners and large customers that work directly with our development teams.

14. Will MANA provide functionality for starting and stopping queues?

[Brian]: TBD. What’s the use case and have you seen a need for this? Customers will be able to change the number of queues, but I will have to find out whether they can be stopped/started individually.

15. Is live configuration of Receive Side Scaling (RSS) possible with MANA?

[Brian]: Yes. RSS is supported by MANA.

16. Does MANA support jumbo frames?

[Brian]: Jumbo frames and MTU size tuning are available as of DPDK 24.03 and rdma-core v49.1
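
For reference, requesting a jumbo MTU from a DPDK application is a single call; whether a particular size is accepted depends on the MANA driver and the DPDK/rdma-core versions noted above. A minimal sketch:

#include <rte_ethdev.h>
#include <stdio.h>

/* Ask the port for a jumbo MTU; the driver rejects values outside the
 * range it advertises via rte_eth_dev_info_get(). */
static int enable_jumbo(uint16_t port_id, uint16_t mtu)
{
    int ret = rte_eth_dev_set_mtu(port_id, mtu);

    if (ret != 0)
        fprintf(stderr, "port %u: MTU %u rejected (%d)\n", port_id, mtu, ret);
    return ret;
}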

17. Will Large Receive Offload (LRO) and TCP Segmentation Offload (TSO) be enabled with MANA?

[Brian]: LRO in hardware (also referred to as Receive Segment Coalescing) is not supported (software should work fine)

18. Are there specific flow offloads that MANA will implement? If so, which ones?

[Brian]: MANA does not initially support DPDK flows. We will evaluate the need as customers request it.

19. How is low migration downtime achieved with DPDK?

[Brian]: This is a matter of reducing the amount of downtime during servicing events and supporting hotplugging. Applications will need to implement the netvsc PMD to service traffic while the VF is revoked and fall back to the synthetic vmbus.

20. How will you ensure feature parity with mlx4/mlx5, which support a broader range of features?

[Brian]: Mellanox creates network cards for a broad customer base that includes all the major public cloud platforms as well as retail.  Microsoft does not sell the MANA NIC to retail customers and does not have to support features that are not relevant to Azure. One of the primary benefits of MANA is we can keep functionality specific to the needs of Azure and iterate quickly.

21. Is it possible to select which NIC is used in the VM (MANA or mlx), and for how long will mlx support be available?

[Brian]: No, you will never see both MANA and Mellanox NICs on the same VM instance. Additionally, when a VM is allocated (started), it will select a node from a pool of hardware configurations available for that VM size. Depending on the VM size, you could get allocated on ConnectX-3, ConnectX-4 Lx, ConnectX-5, or eventually MANA. VMs will need to support the mlx4, mlx5, and mana drivers until hardware is retired from the fleet to ensure they are compatible with Accelerated Networking.

22. Will there be support for Windows and FreeBSD with DPDK for MANA?

[Brian]: There are currently no plans to support DPDK on Windows or FreeBSD. However, there is interest within Microsoft to run DPDK on Windows.

23. What applications are running on the SoC?

[Brian]: The SoC is used for hardware offloading of host agents that were formerly run in software on the host and hypervisor. This ultimately frees up memory and CPU resources from the host that can be utilized for VMs and reduces the impact of neighbor noise, jitter, and blackout times for servicing events.

24. What applications are running on the FPGA?

[Brian]: This is initially restricted to I/O hardware acceleration such as RDMA, the MANA NIC, as well as host-side security features.

Read the full user story ‘Unleashing Network Performance with Microsoft Azure MANA and DPDK’

Cache Awareness in DPDK Mempool

Author: Kamalakshitha Aligeri – Senior Software Engineer at Arm

The objective of DPDK is to accelerate packet processing by transferring the packets from the NIC  to the application directly, bypassing the kernel. The performance of DPDK relies on various factors such as memory access latency, I/O throughput, CPU performance, etc.

Efficient packet processing relies on ensuring that packets are readily accessible in the hardware  cache. Additionally, since the memory access latency of the cache is small, the packet processing  performance increases if more packets can fit into the hardware cache. Therefore, it is important  to know how the packet buffers are allocated in hardware cache and how it can be utilized to get  the maximum performance. 

With the default buffer size in DPDK, the hardware cache is utilized to its full capacity, but it is not clear whether this is being done intentionally. This blog therefore helps in understanding how the buffer size can have an impact on performance and what to keep in mind when changing the default buffer size in DPDK in the future.

In this blog, I will describe:

1. Problem with contiguous buffers 

2. Allocation of buffers with cache awareness 

3. Cache awareness in DPDK mempool 

4. l3fwd performance results with and without cache awareness 

Problem with contiguous buffers 

The mempool in DPDK is created from a large chunk of contiguous memory. Packets from the network are stored in packet buffers of a fixed size (objects in the mempool). The problem with contiguous buffers arises when the CPU accesses only a portion of each buffer, as in DPDK’s L3 forwarding application, where only the metadata and packet headers are accessed. The rest of the buffer is not brought into the cache, which results in inefficient cache utilization. To gain a better understanding of this problem, it’s essential to understand how buffers are allocated in the hardware cache.

How are buffers mapped in Hardware Cache? 

Consider a 1KB, 4-way set-associative cache with 64 bytes cache line size. The total number of  cache lines would be 1KB/64B = 16. For a 4-way cache, each set will have 4 cache lines. Therefore, there will be a total of 16/4 = 4 sets. 

As shown in Figure 1, each memory address is divided into three parts: tag, set, and offset.

• The offset bits specify the position of a byte within a cache line (since each cache line is 64 bytes, 6 bits are needed to select a byte within a line).

• The set bits determine which set the cache line belongs to (2 bits are needed to identify one of the 4 sets).

• The tag bits uniquely identify the memory block. Once the set is identified with the set bits, the tag bits of the 4 ways in that set are compared against the tag bits of the memory address to check whether the address is already present in the cache.

Figure 1 Memory Address 

In Figure 2, each square represents a cache line of 64 bytes. Each row represents a set. Since it’s a  4-way cache, each set contains 4 cache lines in it – C0 to C3. 

Figure 2 Hardware Cache 

Let’s consider a memory area that can be used to create a pool of buffers. Each buffer is 128 bytes and hence occupies 2 cache lines. Assuming the first buffer address starts at 0x0, the addresses of the buffers are as shown below.

Figure 3 Contiguous buffers in memory

In the above figure, the offset bits are highlighted in orange, the set bits in green, and the tag bits in blue. Consider buffer 1’s address, where the set bits “00” mean the buffer maps to set 0. Assuming all the sets are initially empty, buffer 1 occupies the first cache line of 2 contiguous sets.

Since buffer 1’s address is 0x0 and the cache line size is 64 bytes, the first 64 bytes of the buffer occupy a cache line in set 0. For the next 64 bytes, the address becomes 0x40 (0b01000000), indicating set 1 because the set bits are “01”. As a result, the last 64 bytes of the buffer occupy a cache line in set 1. Thus, the buffer is mapped into cache lines (S0, C0) and (S1, C0).
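
The same address arithmetic can be written as a small standalone C snippet (an illustration only, not DPDK code) that decodes each 128-byte buffer’s start address from Figure 3 into its offset, set, and tag for this toy 1KB, 4-way cache.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE  64    /* bytes per cache line -> 6 offset bits        */
#define NUM_WAYS   4
#define CACHE_SIZE 1024  /* 1KB toy cache from the example               */
#define NUM_SETS   (CACHE_SIZE / (LINE_SIZE * NUM_WAYS))  /* = 4 sets    */

int main(void)
{
    /* Buffer start addresses from Figure 3: 128-byte buffers at 0x0, 0x80, ... */
    for (uintptr_t addr = 0x0; addr < 0x400; addr += 0x80) {
        unsigned offset   = addr % LINE_SIZE;              /* byte within the line */
        unsigned set      = (addr / LINE_SIZE) % NUM_SETS; /* which set            */
        unsigned long tag = addr / (LINE_SIZE * NUM_SETS); /* identifies the block */

        printf("buffer @ 0x%03lx -> set %u, offset %u, tag 0x%lx\n",
               (unsigned long)addr, set, offset, tag);
    }
    return 0;
}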

Figure 4 Hardware cache with buffer 1 

Similarly, buffer 2 will occupy the first cache line of next two sets (S2, C0) and (S3, C0).

Figure 5 Hardware cache with 2 buffers 

The set bits in buffer 3’s address, “00”, show that buffer 3 maps to set 0 again. Since the first cache line of set 0 and set 1 is occupied, buffer 3 occupies the second cache line of sets 0 and 1: (S0, C1) and (S1, C1).

Figure 6 Hardware cache with 3 buffers 

Similarly, buffer 4 occupies the second cache line of sets 2 and 3, and so on. Each buffer is represented with a different color, and a total of 8 buffers can occupy the hardware cache without any evictions.

Figure 7 Allocation of buffers in hardware cache 

Although the buffer size is 128 bytes, the CPU might not access all of those bytes. For example, for 64-byte packets, only the first 64 bytes of the buffer are consumed by the CPU (i.e., one cache line worth of data).

Since the buffers are two cache lines long and contiguous, and only the first 64 bytes of each buffer are accessed, only sets 0 and 2 are populated with data. Sets 1 and 3 go unused (unused sets are shown with a pattern in Figure 8).

Figure 8 Unused sets in hardware cache 

When buffer 9 needs to be cached, it maps to set 0, since its set bits are “00”. Assuming an LRU replacement policy, the least recently used of the 4 cache lines in set 0 (buffer 1, 3, 5, or 7) will be evicted to accommodate buffer 9, even though sets 1 and 3 are empty.

This is highly inefficient, as we are not utilizing the full cache capacity.

Solution – Allocation of buffers with Cache awareness 

In the above example, if the unused cache sets can be utilized to allocate the subsequent buffers (buffers 9 to 16), we would utilize the cache in a more efficient manner.

To accomplish this, the memory addresses of the buffers can be manipulated during the creation of the mempool. This can be achieved by inserting one cache line of padding after every 8 buffers, effectively aligning the buffer addresses in a way that utilizes the cache more efficiently. Let’s take the above example of contiguous buffer addresses and compare it with the same buffers but with cache line padding.

Figure 9 Without cache line padding

Figure 10 With cache line padding

From Figures 9 and 10, we can see that buffer 9’s address has changed from 0x400 to 0x440. With the 0x440 address, buffer 9 maps to set 1, so there is no need to evict any cache line from set 0, and we are utilizing the previously unused set 1.

Similarly, buffer 10 maps to set 3 instead of set 2, and so on. This way, buffers 9 to 16 can occupy sets 1 and 3, which are unused by buffers 1 to 8.
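
The same toy cache can be simulated in a few lines of C (again, an illustration only) to confirm where each buffer starts once one cache line of padding is inserted after every 8 buffers: buffers 9 to 16 start in sets 1 and 3 instead of colliding with buffers 1 to 8.

#include <stdint.h>
#include <stdio.h>

#define LINE_SIZE 64
#define NUM_SETS  4      /* toy 1KB, 4-way cache from the example        */
#define BUF_SIZE  128
#define PAD_EVERY 8      /* one extra cache line after every 8 buffers   */

int main(void)
{
    uintptr_t addr = 0x0;

    for (int i = 1; i <= 16; i++) {
        unsigned set = (unsigned)((addr / LINE_SIZE) % NUM_SETS);

        printf("buffer %2d @ 0x%03lx -> starts in set %u\n",
               i, (unsigned long)addr, set);

        addr += BUF_SIZE;
        if (i % PAD_EVERY == 0)
            addr += LINE_SIZE;  /* cache line padding shifts later buffers */
    }
    return 0;
}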

Figure 11 Hardware cache with cache awareness 

This approach effectively distributes the allocation of buffers to better utilize the hardware cache. Since for 64-byte packets, only the first cache line of each buffer contains useful data, we are  effectively utilizing the hardware cache capacity by accommodating useful packet data from 16  buffers instead of 8. This doubles the cache utilization, enhancing the overall performance of the  system. 

Padding of cache lines is necessary primarily when the cache size is exactly divisible by the buffer size (which means the buffer size is a power of 2). In cases where the buffer size does not divide evenly into the cache size, part of the buffer is left unmapped, and this residual portion effectively introduces an offset like the one achieved through padding.

Cache Awareness in DPDK Mempool 

In a DPDK mempool, each buffer typically has a size of 2368 bytes and consists of several distinct fields: a header, the object, and a trailer. Let’s look at each one of them.

Figure 13 Mempool buffer fields 

Header: This portion of the buffer contains metadata and control information needed by DPDK to manage the buffer efficiently. It includes information such as the buffer length and buffer state or type, and it helps to iterate over mempool objects. The size of the object header is 64 bytes.

Object: This section contains the actual payload or data. Within the object section there are additional fields such as the mbuf, headroom, and packet data. The mbuf of 128 bytes contains metadata such as the message type, the offset to the start of the packet data, and a pointer to additional mbuf structures. Then there is a headroom of 128 bytes. The packet data area is 2048 bytes and contains the packet headers and payload.

Trailer: The object trailer is 0 bytes, but an 8-byte cookie is added in debug mode. This cookie acts as a marker to detect memory corruption.

With a buffer size of 2368 bytes (not a power of 2), the buffers are inherently laid out in a cache-aware manner: the buffer size alone ensures good cache utilization without any additional cache line padding.

The buffer size of 2368 bytes does not include the padding added to distribute buffers across  memory channels. 
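
For reference, here is a minimal sketch of how a pool with this element layout is typically created through the standard rte_pktmbuf_pool_create() API. The pool name, element count, per-lcore cache size and socket are example values, not taken from this article; the size arithmetic in the comments follows the field sizes given above and DPDK's default 128-byte headroom.

    #include <stdio.h>
    #include <stdlib.h>
    #include <rte_eal.h>
    #include <rte_debug.h>
    #include <rte_lcore.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    int main(int argc, char **argv)
    {
        if (rte_eal_init(argc, argv) < 0)
            rte_exit(EXIT_FAILURE, "EAL init failed\n");

        /* data_room_size = 128 B headroom + 2048 B packet data = 2176 B.
         * Together with the 64 B object header and the 128 B struct rte_mbuf,
         * each element works out to 64 + 128 + 128 + 2048 = 2368 bytes. */
        struct rte_mempool *mp = rte_pktmbuf_pool_create(
            "pktmbuf_pool",            /* pool name (example) */
            8192,                      /* number of mbufs (example) */
            256,                       /* per-lcore cache size (example) */
            0,                         /* no per-mbuf private area */
            RTE_MBUF_DEFAULT_BUF_SIZE, /* 128 + 2048 = 2176 bytes */
            (int)rte_socket_id());
        if (mp == NULL)
            rte_exit(EXIT_FAILURE, "mempool creation failed\n");

        /* 64-byte header + 2304-byte object = 2368 bytes; the trailer and any
         * memory-channel spreading padding are accounted for separately. */
        printf("header + object size: %u bytes\n",
               mp->header_size + mp->elt_size);

        rte_mempool_free(mp);
        rte_eal_cleanup();
        return 0;
    }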

To show how performance can vary with a buffer size that is a power of 2, I ran an experiment with a 2048-byte buffer size and compared it against the default mempool buffer size in DPDK. In the experiment, 8192 buffers are allocated in the mempool and a histogram of the cache sets for all the buffers is plotted. The histogram shows the number of buffers that map to each cache set.

Figure 14 Histogram of buffers – 2048 bytes 

With a buffer size of 2048 bytes, the same sets in the hardware cache are hit repeatedly, whereas other sets are not utilized at all (visible as gaps in the histogram).

Figure 15 Histogram of buffers – 2368 bytes

With a buffer size of 2368 bytes, each set is accessed only around 400 times. There are no gaps in the histogram, indicating that the cache is being utilized efficiently.
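
A histogram like the ones above can be gathered by walking the mempool and bucketing each object's address by cache set. The sketch below uses DPDK's rte_mempool_obj_iter() iterator; the set arithmetic assumes a 64KB, 4-way cache with 64-byte lines (256 sets), matching the system described in the next section, and the helper names are my own. Using virtual addresses is an approximation, but a reasonable one for hugepage-backed mempools, where the low-order address bits that select the set are unchanged by translation.

    #include <stdio.h>
    #include <stdint.h>
    #include <rte_mempool.h>

    #define CACHE_LINE_BYTES 64
    #define NUM_SETS         256   /* 64KB cache / 4 ways / 64 B per line */

    static unsigned int set_histogram[NUM_SETS];

    /* rte_mempool_obj_iter() callback: bucket each object by the cache set
     * its first cache line maps to (only that line holds useful data for
     * 64-byte packets, as discussed earlier). */
    static void
    count_buffer_set(struct rte_mempool *mp, void *opaque, void *obj,
                     unsigned int obj_idx)
    {
        (void)mp; (void)opaque; (void)obj_idx;
        uintptr_t addr = (uintptr_t)obj;
        set_histogram[(addr / CACHE_LINE_BYTES) % NUM_SETS]++;
    }

    /* Walk every object in the pool and print the per-set counts. */
    void
    print_set_histogram(struct rte_mempool *mp)
    {
        rte_mempool_obj_iter(mp, count_buffer_set, NULL);
        for (unsigned int s = 0; s < NUM_SETS; s++)
            printf("set %3u: %u buffers\n", s, set_histogram[s]);
    }

print_set_histogram() can be called on a pool such as the one created in the earlier sketch, right after rte_pktmbuf_pool_create().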

DPDK l3fwd Performance 

The improved cache utilization observed in the histogram, attributed to cache awareness, is further corroborated by the throughput numbers of the l3fwd application. The application is run on a system with a 64KB 4-way set-associative cache.

The chart below shows the throughput in Mpps for a single-core l3fwd test with 2048-byte and 2368-byte buffer sizes.

Figure 16 l3fwd throughput comparison

There is a 17% performance increase with the 2368-byte buffer size.

Conclusion 

Contiguous buffer allocation in memory with cache awareness enhances performance by minimizing cache evictions and maximizing hardware cache utilization. In scenarios where the buffer size divides evenly into the cache size (e.g., 2048-byte buffers), padding cache lines creates an offset in the memory addresses and a better distribution of buffers in the cache. This led to a 17% increase in performance for the DPDK l3fwd application.

However, with buffer sizes that do not divide evenly into the cache size, as is the default in DPDK, the equivalent of cache line padding already occurs because of the natural offset in the buffer addresses, resulting in improved performance without any extra work.

For more information, visit the programmer's guide.

DPDK Long Term Stable (LTS) Release 22.11.05

By Blog

The latest DPDK Long Term Stable (LTS) Release 22.11.05 includes several updates and enhancements across various components of the DPDK framework. Significant changes in this release involve numerous file modifications, which are indicated by a large number of insertions and deletions across the codebase. The release notes document extensive additions, suggesting improvements and new features in the areas of network interface controllers (NICs), cryptographic devices, event devices, and baseband processing.

Notable adjustments were made to the build system, documentation, and driver support for various hardware. Improvements in error handling, memory management, and device operation stability are also reflected in the release notes. The release also addresses various bug fixes and performance enhancements to ensure better stability and efficiency.

Contributors

The update involved 246 files with a total of 3,235 insertions and 2,053 deletions. A big shout out to all the contributors to this release including:

Ajit Khaparde, Akhil Goyal, Akshay Dorwat, Alan Elder, Alex Vesker, Ali Alnubani, Andrew Boyer, Anoob Joseph, Arkadiusz Kusztal, Bing Zhao, Bruce Richardson, Chaoyong He, Chengwen Feng, Ciara Power, Dariusz Sosnowski, David Marchand, Dengdui Huang, Edwin Brossette, Eli Britstein, Emi Aoki, Erez Shitrit, Ferruh Yigit, Fidel Castro, Flore Norceide, Ganapati Kundapura, Gregory Etelson, Hamdan Igbaria, Hanumanth Pothula, Hao Chen, Harman Kalra, Hernan Vargas, Holly Nichols, Huisong Li, Jie Hai, Jonathan Erb, Joyce Kong, Kaiwen Deng, Kalesh AP, Kevin Traynor, Kiran Kumar K, Kishore Padmanabha, Kommula Shiva Shankar, Konstantin Ananyev, Kumara Parameshwaran, Long Li, Luca Boccassi, Maayan Kashani, Masoumeh Farhadi Nia, Maxime Coquelin, Michael Baum, Mingjin Ye, Morten Brørup, Mário Kuka, Neel Patel, Nithin Dabilpuram, Pavan Nikhilesh, Pengfei Sun, Qi Zhang, Qian Hao, Radu Nicolau, Rahul Bhansali, Rakesh Kudurumalla, Robin Jarry, Rongwei Liu, Satheesh Paul, Shai Brandes, Shaowei Sun, Shihong Wang, Shun Hao, Simei Su, Sivaprasad Tummala, Sivaramakrishnan Venkat, Stephen Hemminger, Suanming Mou, Sunil Kumar Kori, Sunyang Wu, Tom Jones, Viacheslav Ovsiienko, Wathsala Vithanage, Weiguo Li, Yajun Wu, and Yunjian Wang.

These contributors addressed various aspects from software fixes and performance enhancements to security improvements across multiple components of the system.

Download it here: DPDK 22.11.5

The git tree for this version can be accessed here: DPDK Stable 22.11

DPDK’s Role in Hyperscaling

By Blog

In the rapidly evolving digital landscape, hyperscaling in the cloud has emerged as a critical strategy for businesses aiming to scale their operations efficiently. The webinar, “Hyperscaling in the Cloud,” hosted by Honnappa Nagarahalli (Arm), from the DPDK Tech Board, brings together industry experts to discuss how the Data Plane Development Kit (DPDK) is revolutionizing hyperscale cloud environments.

The Webinar Panelists

The webinar featured three distinguished panelists:

1. Brian Denton: A Senior Program Manager at Microsoft Azure, Brian brings a wealth of experience in Azure’s host networking. He shared insights into Azure’s implementation of DPDK, emphasizing its use in enhancing Ethernet and overall network performance.

2. Rushil Gupta: As a Senior Software Engineer at Google, Rushil highlighted the critical role of DPDK in financial technology (Fintech) applications on Google Cloud Platform (GCP). His discussion focused on achieving consistency, performance, and reliability in high-frequency trading platforms.

3. Jim Thompson: Co-founder of Netgate, Jim delved into the use of DPDK in networking applications outside the traditional cloud domain. His contribution illuminated the versatility of DPDK across different cloud environments and its impact on virtual private networks (VPNs).

Insights from the Webinar

DPDK in Azure’s Cloud Networking

Brian Denton’s presentation offered a glimpse into how Microsoft Azure leverages DPDK to offload packet processing from the CPU to dedicated hardware. This approach significantly reduces latency and improves throughput, enabling Azure to offer enhanced performance for virtual machines (VMs) and networking services.

Brian shared valuable insights into how DPDK has been instrumental in Azure’s network infrastructure, particularly highlighting its impact on Azure’s host networking and the broader ecosystem of partners and customers. He explained that Azure has integrated DPDK to address the need for high-speed packet processing, which is crucial for a wide range of applications, from basic web services to complex, latency-sensitive tasks like real-time analytics and high-frequency trading.

One of the key points Brian made was about the technical architecture that enables Azure to leverage DPDK’s capabilities. He detailed how DPDK is used in conjunction with Azure’s hardware, such as SmartNICs, to offload and accelerate network functions traditionally handled by software. This hardware-software synergy, as Denton explained, not only reduces CPU overhead but also significantly decreases latency, providing Azure customers with improved network performance and efficiency.

Furthermore, Brian highlighted real-world applications of DPDK in Azure, illustrating how partners and customers utilize DPDK for scenarios that require minimal latency and maximum throughput. He also discussed the continuous evolution of Azure’s networking stack, underscored by the introduction of new hardware and the ongoing optimization of DPDK to meet the growing demands of cloud computing.

Some examples: 

Clearent by Xplor used Azure SQL Database Hyperscale to revamp its merchant transaction reporting system. Previously operating on in-house systems, Clearent, which handles over 500 million transactions annually, shifted to a cloud-based setup. This move significantly boosted their ability to process and report data.

Protocall Services, a provider of telephonic crisis and behavioral health digital tools, embarked on a cloud migration journey to enhance the reliability, security, and scalability of its IT infrastructure. 

DPDK’s Impact on Fintech Applications

Rushil Gupta’s discussion on the use of DPDK in financial technology (Fintech) applications, particularly in the realm of high-frequency trading (HFT) platforms on Google Cloud Platform (GCP), sheds light on how bleeding-edge network processing technologies are evolving financial markets. 

In the fast-paced world of HFT, where milliseconds can equate to millions of dollars, the need for ultra-low latency and high throughput is paramount. Traditional cloud networking approaches may falter under such demanding requirements due to the involvement of kernel-based networking stacks that introduce additional latency. Here, DPDK’s bypass of the kernel networking stack, allowing direct access to network hardware, presents a compelling solution. This direct path significantly reduces latency and increases packet processing speed, enabling HFT platforms to operate at the speed required to capitalize on fleeting market opportunities.

Rushil illustrates how Google leverages DPDK to empower fintech customers on GCP, providing them with the infrastructure necessary to achieve the high throughput and low-latency communication essential for HFT platforms. One notable application is in the construction of complex event processing (CEP) systems, which are at the heart of many trading platforms. These systems analyze and act upon market data in real-time, necessitating the rapid processing capabilities that DPDK facilitates.

Rushil discusses the role of DPDK in enhancing data replication and recovery processes within fintech applications. In an industry where data integrity and availability are critical, DPDK’s efficiency in handling large volumes of data packets ensures that financial institutions can maintain robust data replication frameworks. This capability not only supports the high availability demands of trading platforms but also aids in achieving regulatory compliance related to data persistence and recovery.

Rushil explained how DPDK’s application in fintech on GCP demonstrates the technology’s pivotal role in enabling HFT and other financial services to meet their stringent performance and reliability criteria. With DPDK, Google provides a competitive edge to fintech applications, facilitating new levels of speed and efficiency in financial markets. 

Some examples: 

1. CME Group. As one of the world’s leading derivatives marketplaces, CME Group leverages GCP and DPDK for enhanced market data analytics and to facilitate high-speed trading. Their partnership aims to accelerate CME Group’s move to the cloud, transforming the global markets ecosystem with cloud-based innovation and scaling capacity dynamically to meet market demands.

2. Talos. Specializing in digital asset trading technology, Talos utilizes GCP’s infrastructure to support its trading platform. With DPDK, Talos benefits from reduced latency and increased throughput, essential for executing trades and managing orders across multiple exchanges and liquidity pools efficiently.

3. Clowd9. This cloud-based trading technology provider uses GCP to offer a scalable and secure platform for trading firms and financial institutions. DPDK supports Clowd9’s need for high performance and low latency in executing trades, managing risk, and processing real-time market data.

4. Freetrade. Freetrade, an investment platform, leverages GCP to power its app, offering users commission-free trading. GCP’s global infrastructure and DPDK’s network optimization capabilities ensure that Freetrade can manage high volumes of transactions and data analysis. 

5. TD Securities Automated Trading (TDSAT): TDSAT uses GCP for trading fixed-income bonds, benefiting from DPDK’s high-performance packet processing capabilities. This enables TDSAT to execute trades at high speed and with precision, critical for maintaining competitiveness in the fixed income market.

These customers and use cases underscore the importance of DPDK in enhancing network performance on GCP, making it an ideal platform for capital market applications that demand high throughput, low latency, and scalability. By leveraging GCP and DPDK, capital market firms innovate and adapt quickly to market changes, manage risks more effectively, and unlock new opportunities for growth.

Broadening DPDK’s Application Scope in VPNs and Software Routers

Jim Thompson’s insights during the DPDK webinar shed light on how DPDK is leveraged in cloud networking through the lens of Netgate’s product, TNSR (pronounced ‘Tensor’). This serves as a case study of DPDK’s implementation outside its traditional use cases. TNSR, a virtual router developed by Netgate, underscores the adaptability and robustness of DPDK in addressing specific cloud networking challenges.

In cloud environments, networking demands can quickly escalate due to the sheer volume of data transfer and the need for secure connections. Traditional VPN solutions often fall short due to bandwidth limitations and the number of tunnels they can support. Jim highlighted how these constraints could hinder the scalability and performance of cloud-based services. This scenario is particularly relevant for large organizations that require extensive interconnectivity across various cloud environments.

The introduction of TNSR as a DPDK-powered solution exemplifies how DPDK’s high-performance packet processing capabilities can be extended beyond typical use cases to solve complex cloud networking problems. By utilizing DPDK’s efficient polling mode drivers (PMDs) for network and cryptography offload, TNSR significantly enhances throughput and reduces latency in VPN connections. 

Jim explained how TNSR facilitates seamless connectivity between on-premise networks and cloud regions, highlighting the importance of VPN connections for secure data transfer. He underscored the limitations of existing cloud VPN solutions, such as bandwidth caps and tunnel number restrictions, which can significantly hamper large organizations’ networking needs. By leveraging DPDK, TNSR bypasses these limitations, providing a more flexible and scalable solution for cloud-based networking.

Take a look at Netgate’s customer stories here

Conclusion

The webinar underscored DPDK’s pivotal role in enabling hyperscaling in the cloud. By providing a high-performance packet processing framework, DPDK not only enhances network efficiency but also opens new avenues for application development across various industries. As cloud architectures continue to evolve, the collaboration between cloud providers, technology firms, and the open source community will be vital in harnessing the full potential of DPDK.

Join the hyperscaling discussion and the community on Slack here.

DPDK LTS 22.11.4

By Blog

The latest DPDK release, version 22.11.4, brings several important updates, bug fixes, and improvements across various components of the DPDK framework. In this blog post, we’ll provide a brief summary of the key changes and highlights in this release.

Release Highlights:

1. Updated Git Tree: You can access the latest DPDK source code on the official Git repository at DPDK Stable Git Tree

2. Bug Fixes and Backports: This release includes numerous bug fixes and patches, thanks to the efforts of the community, which contribute to the stability and reliability of the DPDK framework.

3. Security Improvements: The release addresses security-related issues and provides enhancements to improve the overall security of the DPDK.

4. Documentation Updates: The DPDK documentation has been updated, including improvements to guides for various NICs and platforms. The release also includes updates to the Security Guide and VDPA (Virtio Data Path Acceleration) documentation.

5. Performance Enhancements: DPDK is known for its high-performance networking capabilities, and this release continues to optimize and improve performance across different components.

6. Driver Updates: Multiple network drivers have been updated and improved in this release, including fixes for checksum offloading, packet handling, and performance tuning.

7. Eventdev Improvements: The eventdev subsystem has seen enhancements, including fixes related to device pointer management and driver names in the info struct.

8. Crypto Libraries: Various crypto libraries have been updated and fixed, including the addition of missing documentation for security context and memory leak fixes in the OpenSSL PMD.

9. Mempool Fixes: Several fixes have been applied to the mempool component, improving memory allocation and thread safety.

10. Other Component Updates: This release also includes fixes and improvements to other DPDK components, such as the test suite, Ethernet device drivers, and examples.

Overall, DPDK 22.11.4 is a stable release that brings a wide range of improvements, ensuring the continued reliability and performance of the DPDK framework. Users are encouraged to upgrade to this latest release to benefit from the bug fixes and enhancements provided by the DPDK community.

For detailed information about specific changes, you can refer to the official release notes here.

As always, it’s important to thoroughly test any new DPDK release in your specific networking environment to ensure compatibility and performance before deploying it in a production environment.

A Big Shoutout to Our Dedicated Contributors

We want to take this opportunity to express our gratitude to the hardworking individuals who contributed to this release. Their dedication and expertise have made DPDK even more robust and efficient.

Xueming Li, Aakash Sasidharan, Abdullah Sevincer, Akhil Goyal, Alex Vesker, Alexander Kozyrev, Amit Prakash Shukla, Anatoly Burakov, Anoob Joseph, Artemy Kovalyov, Bing Zhao, Brian Dooley, Bruce Richardson, Chaoyong He, Christian Ehrhardt, Ciara Power, David Christensen, David Marchand, Dengdui Huang, Ed Czeck, Eli Britstein, Feifei Wang, Fengjiang Liu, Ferruh Yigit, Gagandeep Singh, Ganapati Kundapura, Gregory Etelson, Harman Kalra, Harry van Haaren, Hemant Agrawal, Hernan Vargas, Huisong Li.

DPDK Long Term Stable Release Guide

By Blog

Navigating the Data Plane Development Kit (DPDK) release landscape requires a thorough understanding of what a Long-Term Stable (LTS) release entails and why it may be the preferred choice for certain network environments. This guide dives into the nuances of DPDK LTS, its suitability for production, and the level of active support it receives.

Understanding DPDK Long Term Stable (LTS) Releases

DPDK LTS releases stand out in the networking world for their reliability over an extended period. Here’s what sets them apart:

Longevity of Support: DPDK LTS offers a commitment of three years’ worth of fixes, ensuring that a chosen release remains robust against issues found long after its initial deployment.

Consistent Improvements: A DPDK LTS release isn’t static. It evolves with a series of API/ABI compatible drop-in replacements that incorporate the latest fixes discovered in the subsequent years. For instance, a series based on a 2022 DPDK release will be refined with fixes identified during 2023-2025.

Sustainability: This approach guarantees that production environments can rely on a consistent, stable platform without the need to constantly adapt to new feature changes.

DPDK LTS releases are tailored for specific scenarios within the networking domain:

Ideal for Production: LTS releases are the go-to for production environments where stability is paramount and the latest features are less of a priority.

Focus on Stability Over Novelty: Organizations that value long-term reliability over cutting-edge features will find DPDK LTS releases more suitable.

Active Maintenance and Support

The vibrancy of the DPDK LTS ecosystem is reflected in the following statistics:

Multiple Active Releases: As of now, three LTS releases are being actively maintained: 21.11, 22.11, and 23.11.

Frequent Updates: 2023 saw 9 releases across these maintained LTS versions.

Volume of Fixes: Approximately 1800 fixes have been backported to the active DPDK LTS releases in the last year alone.

Maintenance Span for DPDK LTS

The commitment to maintain a DPDK LTS release is clear and long-term:

General Fixes: All identified fixes will be backported to the LTS releases for a full three years from their release date.

Security Patches: Security-related updates may even extend beyond the three-year window, ensuring that LTS releases maintain a strong defense against vulnerabilities.

Choosing a DPDK LTS Release

When deciding on a DPDK LTS release, consider the following:

Maintenance Timeline: Evaluate whether a longer support window aligns with your deployment cycle and update capacity.

Active Maintenance Record: The volume and frequency of backported fixes provide an indication of the LTS version’s vitality and the community’s dedication to its upkeep.

Security Commitment: With a promise of three-plus years of security fixes, assess whether this meets your organization’s security and compliance requirements.

Preparing for DPDK LTS Deployment

Transitioning to or between DPDK LTS releases requires an organization to:

Stay Informed: Keep abreast of the DPDK LTS release and maintenance schedules to time updates strategically.

Test Thoroughly: Allocate resources for detailed testing to ensure the LTS version integrates seamlessly with your environment.

Anticipate Adjustments: Be prepared for any necessary changes that might arise from the introduction of backported fixes.

Conclusion

Selecting a DPDK LTS release is a strategic decision influenced by the need for stability, long-term support, and a maintenance schedule that ensures network applications remain secure and performant. 

With the extended support and backporting of fixes, DPDK LTS releases offer a dependable foundation for organizations seeking a stable networking stack. This maintenance model continues to be a cornerstone of network reliability, allowing organizations to leverage stable and secure networking functions without the churn of constant feature updates.

For more information visit: https://doc.dpdk.org/guides/contributing/stable.html

Which DPDK LTS Release Should I Pick?: The LTS (Upstream) Maintainer’s Guide

By Blog

Authors: Ben Thomas (Linux Foundation Marketing Lead) & Kevin Traynor (LTS Maintainer & Software Engineer @ Red Hat)

Navigating the complex landscape of Data Plane Development Kit (DPDK) releases, particularly Long-Term Support (LTS) versions, is often accompanied by pressing questions: “Which release is most suitable for my needs?” “What are the advantages of choosing an LTS release?” “How does an LTS release impact the stability and life span of my network applications?” 

This detailed blog aims to shed light on these queries and steer you towards the appropriate DPDK LTS release for your specific requirements.

Understanding DPDK LTS Releases

DPDK is a collection of libraries and drivers that enable rapid packet processing, which is essential in network functions virtualization (NFV), cloud computing, and other high-speed networking environments. 

LTS releases are special in that they are designated to receive ongoing maintenance updates, including bug fixes and security patches, for a longer duration than standard releases.

DPDK LTS vs. Standard Releases

DPDK standard releases are known for their rapid development and inclusion of bleeding-edge features. In contrast, LTS releases follow a more systematic and stable approach, focusing on a consistent cadence of fixes and releases. 

This distinction is important for organizations deciding between adopting the latest DPDK features and prioritizing long-term stability. LTS adopters can count on a solid three-year cycle of maintenance, with security patches often extending beyond this period.

Volume of Fixes in Each Release

A notable characteristic of DPDK LTS releases is the substantial number of fixes each version receives. This high volume of fixes is a testament to the active maintenance and commitment to ensuring the reliability and stability of each LTS version. It indicates the ongoing effort to address a wide range of issues, from minor bugs to critical vulnerabilities. 

Simultaneous Maintenance of Multiple Versions

DPDK’s maintenance strategy includes managing three LTS versions simultaneously. This approach ensures that organizations using different versions of DPDK LTS receive the necessary support and updates. It exemplifies the dedication of the DPDK community to cater to a diverse range of users and their varying adoption timelines.

Differences in Fixes Across Versions

The number of fixes in LTS releases varies based on the age and lifecycle of the release. For instance, an older release like 20.11 tends to have fewer fixes as it matures and stabilizes over time. In contrast, a newer release like 22.11, which was released late in 2022, has not yet completed a full year of fixes. This discrepancy in the number of fixes reflects the evolving nature of each release and the continuous effort to enhance stability and performance.

Impact of Release Timelines on Maintenance

The timing of a release plays a critical role in its maintenance cycle. A newer release like 22.11, having been in the market for a shorter duration, might not have accumulated as many fixes as an older release. This variation underscores the importance of understanding the release timelines and their implications on the maintenance and support cycles of DPDK LTS releases.

Enhanced Focus on NIC and Driver Fixes in LTS Releases

One of the key aspects of DPDK LTS releases is their focus on Network Interface Controller (NIC) and driver fixes. Hardware vendors are deeply invested in driver functionality, making them some of the most active contributors to the DPDK LTS ecosystem. Their involvement is crucial, as they provide the expertise and timely updates necessary to keep the drivers—and thus the network—running smoothly. 

Trends in Bug Fixes and Maintenance

An analysis of recent LTS versions, such as version 22.11.3, reveals a trend of decreasing bug fixes. This trend tracks the number of bugs found in the main branch, in contrast with previous versions where fixes averaged around 300. The rate of fixes is not constant; as a release ages, the number of fixes tends to decrease, indicating increased stability over time.

Longevity of LTS Maintenance

The question of why LTS releases are not maintained for extended periods, like 20 years, is addressed by the trend of diminishing fixes over time. As LTS versions become more stable, the need for frequent fixes decreases, justifying the typical LTS support duration.

Future Code Integrations and Impact on LTS

Looking ahead, future code integrations in DPDK may not significantly impact LTS releases. While library fixes might increase, the core stability of LTS releases is expected to be maintained.

The Process Behind DPDK LTS Releases

The philosophy guiding LTS maintenance encapsulates a straightforward principle: “Don’t make it worse.” Key aspects of this philosophy include:

LTS Selection: Annually, one DPDK release is chosen to become an LTS version. Typically this is the November DPDK release. This release ceases to receive new features but is maintained to address critical issues.

Maintenance Workflow: Fixes from subsequent releases are ported back to the LTS release. For instance, if a significant bug is fixed in a March release, that fix is usually also applied to the LTS version.

Vendor Contributions: Updates for drivers specific to various hardware vendors are included in LTS releases, ensuring that common components remain stable across different platforms.

Validation: LTS releases are validated through a mix of CI and vendors dedicating validation resources.

Distribution Adoption: Prominent Linux distributions such as Red Hat, Ubuntu and Debian favor LTS releases due to their longer support window and relative feature stability.

The Role of LTS in Upstream Maintenance

The DPDK upstream maintainers are committed to ensuring that LTS releases receive the necessary fixes without introducing new issues. This careful balance underscores their dedication to the core LTS premise.

Statistical analysis of LTS releases provides insight into several key areas:

Fixes: The quantity and significance of the bug fixes an LTS release receives.

Bug Age: The duration that bugs existed before being fixed, indicating the codebase’s stability.

Code Areas: The sections of code that were most frequently fixed, highlighting areas of potential vulnerability or critical importance.

Why Opt for DPDK LTS?

Opting for a DPDK LTS release over a standard one is akin to choosing a well-established airline for travel—while it may not boast the newest features, its track record for safety is impeccable.

Industry Adoption

Companies like Red Hat and Canonical (Ubuntu) use LTS releases because they trust the extended support period will provide a stable foundation for their network infrastructure. This trust stems from the methodical maintenance and broad community endorsement of LTS releases, which adds another layer of testing and quality assurance.

Community and Self-Support

While LTS releases are backed by community support, there is nothing to prevent organizations from taking on their support initiatives. For example, if a company needs to support a particular network card with custom features, they can take an LTS release and integrate their changes while still benefiting from the core stability that LTS provides.

Update Overhead

LTS offers flexibility in deciding when to update to the next LTS. If a new feature of interest is available in the next LTS, it can be worth the effort to update, which also extends the longevity of running a maintained release.

If not, then it is fine to skip and wait for a later LTS. Staying with an LTS series means less effort to get fixes. It also means API/ABI compatibility, so there are no application code changes needed and less frequent product integration testing.

What if there is an interesting new feature that is not yet in an LTS release? With one LTS released per year, there is always another one coming soon.

Example Integration into Other Projects

Open source projects like Open vSwitch (OVS) integrate DPDK LTS to enhance their performance. Each year the OVS project takes the newest DPDK LTS and integrates it into its next release. This means that OVS users benefit from fixes and stability in the underlying DPDK drivers that it uses.

Which LTS Release Should You Pick?

The choice of the right DPDK LTS release hinges on various factors:

Support Window: Assess the duration of support your deployment requires.

Feature Set: Determine whether the LTS release contains the necessary features for your network applications, keeping in mind that it takes time for new features to become stable.

Vendor Compatibility: Check if the LTS release supports your hardware and if vendor-specific drivers are maintained.

Preparing for Transition: From LTS to LTS

Transitioning from one LTS release to the next requires careful planning. As exemplified by maintainers like Kevin Traynor in the 21.11 release, the process is detailed but manageable. Organizations should:

  • Stay informed about the DPDK stable release schedule and plan their updates accordingly.
  • Dedicate time to comprehensive testing when updating versions.
  • Anticipate and prepare to address possible integration issues.

The Future of DPDK LTS Releases

Looking forward, DPDK LTS releases will continue to be a cornerstone for networks that value stability. The DPDK community, in conjunction with hardware vendors, is committed to ensuring that LTS releases are equipped to handle the demands of modern networking environments. 

As we navigate this terrain, the feedback loop between users and maintainers will remain vital. Each fix, each update, and each LTS release is a product of collective effort and shared knowledge.

Call to Action: Join the Effort

As the DPDK LTS ecosystem thrives, we extend an open invitation to more companies and contributors to provide their input and expertise. Whether you’re a hardware vendor with an eye on driver optimizations or an enterprise leveraging DPDK for high-performance networking, your experiences and contributions are invaluable. The strength of an LTS release is not just in its code—it’s in the community that molds and shapes it.

Learn more about DPDK LTS releases here

Inside DPDK 23.11: An Overview of the Latest Release

By Blog

Introduction:
The latest DPDK version, 23.11, is now available. This version includes the latest updates and new features in network processing.

Download Link: https://fast.dpdk.org/rel/dpdk-23.11.tar.xz

Release Statistics:

Commits: This release comprises 1161 commits from 161 authors.

File Changes: There were modifications to 1647 files, which include 97,078 insertions and 44,688 deletions.

Support Duration:
The 23.11 version will be supported for three years, indicating its reliability for system integration and deployment.

ABI Versioning: This release introduces a new major ABI version, 24.

Future Releases: The forthcoming 24.03 and 24.07 versions will remain compatible with the 23.11 ABI.


Main Features of the Release:

  • C11 Compiler Requirement: The build now mandates a C11 compatible compiler.
  • MSVC Build Support: Initial support is added for Microsoft Visual C++ (MSVC) builds.
  • New Atomic Operations API: An enhanced API for atomic operations is included.
  • AMD CPU Power Management: Advanced power management features for AMD CPUs.
  • MBuf Recycling: Enhancements in memory buffer recycling.
  • RSS Algorithm Management: Upgraded management of Receive Side Scaling (RSS) algorithms.
  • Max Rx Buffer Size Adjustment: Modifications to the maximum receive buffer size.
  • New Flow Action Types: Introduction of novel flow action types.
  • Flow Group Miss Action: A new flow group miss action feature is added.
  • Packet Type Matching: Improved packet type matching in flow items.
  • TLS Record Offload: New offloading capability for TLS record processing.
  • Security Rx Inject: Advanced features for security reception injection.
  • Eventdev Enhancements: Updates in eventdev link profiles, adapter for dmadev, and the event dispatcher library.
  • NFP vDPA Driver: A new driver is introduced.
  • Graph Application: A novel application feature for graph processing.
  • Deprecated Libraries: Certain libraries and drivers have been removed.

Detailed Release Notes: https://doc.dpdk.org/guides/rel_notes/release_23_11.html

Contributor Acknowledgments:
The release includes contributions from 40 new individuals across various roles like authors, reviewers, and testers.

A big shoutout to Alan Brady, Ales Musil, Andrey Ignatov, Artemy Kovalyov, Chang Miao, Fengjiang Liu, Igor de Paula, Jayaprakash Shanmugam, John Romein, Jonathan Erb, Jonathan Tsai, Josh Hay, Julian Grajkowski, Karen Kelly, Kuan Xu, Madhu Chittim, Mahesh Adulla, Matthew Dirba, Paul Szczepanek, Peter Nilsson, Sam Andrew, Sampath Peechu, Saurabh Singhal, Shailendra Bhatnagar, Shihong Wang, Shubham Rohila, Shujing Dong, Sibaranjan Pattnayak, Sinan Kaya, Sivaprasad Tummala, Sivaramakrishnan Venkat, Timothy Miskell, Tomer Shmilovich, Trevor Tao, Vamsi Krishna Attunuru, Wajeeh Atrash, Wei Hu, Xiaoming Jiang, Zhenning Xiao.