Unveiling io_uring: Linux’s Next-Generation Asynchronous I/O Interface

Introduction

In the ever-evolving landscape of operating systems, efficient Input/Output (I/O) operations are paramount for achieving high performance and responsiveness. For decades, Linux has offered various mechanisms for handling I/O, each with its own strengths and limitations. However, high-speed storage devices and increasingly demanding applications made the need for a more efficient I/O interface apparent. This led to the development of io_uring, an asynchronous I/O interface introduced in Linux kernel version 5.1.

This blog post explores io_uring's core concepts, architectural advantages, and practical applications. We will analyze the foundational paper that introduced io_uring [1], examine its evolution and recent advancements, discuss its security implications, and walk through a hands-on example that demonstrates its basic usage.

The Genesis of io_uring: Addressing I/O Bottlenecks

Before io_uring emerged, Linux provided several I/O interfaces, each presenting its own set of challenges that hindered optimal performance, especially in modern, high-throughput environments. Understanding these limitations is crucial to appreciating the innovations brought forth by io_uring.

Limitations of Traditional I/O Mechanisms

Traditional synchronous I/O system calls, such as read(2) and write(2), operate by blocking the calling process until the I/O operation is complete. While simple to use, this blocking behavior can lead to significant performance degradation in applications that require concurrent I/O operations or need to maintain responsiveness while waiting for data. Even variations like pread(2), pwrite(2), and their vector-based counterparts (preadv(2), pwritev(2), preadv2(2), pwritev2(2)) remained fundamentally synchronous, offering little relief for high-concurrency scenarios.

The POSIX Asynchronous I/O (AIO) interface, aio_read(3) and aio_write(3), was an attempt to address the need for non-blocking I/O. However, its implementations were often criticized as inefficient. glibc, for example, implements POSIX AIO in user space with a pool of helper threads that issue ordinary blocking system calls, so it did not deliver the expected performance gains.

The native Linux aio interface (io_setup(2), io_submit(2), io_getevents(2)), while more robust than POSIX AIO, suffered from several critical deficiencies:

  • O_DIRECT Dependency: A major drawback was its primary reliance on O_DIRECT for asynchronous operations. O_DIRECT bypasses the kernel’s page cache, which can be beneficial for certain workloads but comes with strict alignment and size constraints. For most common buffered I/O operations, native aio would revert to synchronous behavior, negating its asynchronous promise.
  • Unpredictable Blocking: Even when O_DIRECT was used, aio submissions could still block. This might occur if metadata was required, or if the internal request queues were saturated. Such unpredictable blocking made it difficult for applications to rely on aio for truly non-blocking I/O, often forcing developers to offload I/O to separate threads.
  • Inefficient API Design: The aio API incurred significant overhead from memory copying. Each I/O required copying 104 bytes in total (64 + 8 bytes on submission and 32 bytes on completion), a substantial cost for I/O that was supposed to be zero-copy. Its exposed completion event ring buffer also tended to slow completions down and was difficult, if not impossible, for applications to use correctly. Furthermore, every I/O operation required at least two system calls (one to submit and one to wait for completion), a serious slowdown in the post-Spectre/Meltdown era, when system calls became more expensive.

These limitations meant that despite the existence of asynchronous I/O interfaces, many applications resorted to creating their own private I/O offload thread pools to achieve decent asynchronous I/O, a pattern that io_uring aims to eliminate by providing a more efficient kernel-level solution.

The Design Philosophy Behind io_uring

Recognizing the shortcomings of existing I/O interfaces, the developers of io_uring, led by Jens Axboe, embarked on a mission to create a new API from the ground up. This fresh start allowed for a design unburdened by the constraints of previous implementations, focusing on several key goals:

  • Ease of Use, Hard to Misuse: A primary objective was to create an intuitive and straightforward API that developers could easily adopt and use correctly, minimizing the potential for common errors.
  • Extendability: While initially conceived for block-oriented storage I/O, io_uring was designed with future expansion in mind. This foresight ensures its applicability to a broader range of I/O types, including networking and emerging storage technologies.
  • Feature Richness: Unlike its predecessors, io_uring aimed to provide a comprehensive set of features, eliminating the need for applications to constantly re-implement common I/O functionalities or rely on external thread pools.
  • Efficiency: This was a non-negotiable design principle. io_uring was engineered to minimize per-request overhead, with a strong emphasis on avoiding memory copies and indirections for both submission and completion events. This is crucial for high-performance devices with sub-10 microsecond latencies and very high IOPS.
  • Scalability: The design sought to expose the kernel’s inherent I/O scalability directly to applications, enabling them to achieve peak performance even under heavy loads.

The Power of Ring Buffers

The fundamental innovation that underpins io_uring's efficiency and scalability is its use of ring buffers shared between the kernel and user space. Requests and results are written directly into this shared memory, so no request structures are copied between user space and the kernel. Instead of making a system call for every request, applications place multiple I/O requests in a submission queue (SQ) ring and hand the whole batch to the kernel with a single io_uring_enter(2) call (or with no call at all when kernel-side SQ polling is enabled). The kernel processes these requests asynchronously and posts their results to a completion queue (CQ) ring, which the application reads directly from shared memory.

This ring-based architecture significantly reduces the per-operation overhead of I/O, which is what makes io_uring attractive for applications demanding high I/O performance.

The Evolution of io_uring: Recent Advancements and Emerging Concerns

Since its introduction, io_uring has undergone continuous development, expanding its capabilities beyond traditional file I/O to encompass networking and other domains. This growth in capability has also brought new security considerations to light.

Advancements in Networking I/O

One of the most significant areas of io_uring’s evolution is its application to networking. While traditional network applications often rely on readiness-based models like epoll to be notified of data availability, io_uring offers a more efficient, completion-based approach. Key advancements in this area include [2]:

  • Batching: io_uring excels at batching multiple I/O operations into a single system call. This is particularly beneficial for network applications, where many small I/O operations can be combined, reducing system call overhead. The io_uring_submit_and_wait() function, for instance, lets applications submit new requests and wait for completions in a single system call, streamlining the I/O event loop.
  • Multi-shot Requests: Normally an application must re-submit a request after each event (e.g., after accepting a new connection or receiving data). A multi-shot request is instead submitted once and keeps generating completion events as new occurrences arise; while it remains armed, each of its completions carries the IORING_CQE_F_MORE flag. This is supported for operations like accept() (via io_uring_prep_multishot_accept()), recv() (via io_uring_prep_recv_multishot()), and poll() (via io_uring_prep_poll_multishot()), significantly reducing the housekeeping overhead for applications.

  • Provided Buffers: A completion-based model raises a buffer-management problem: a receive request normally needs its buffer at submission time, so an application with many idle connections ties up one buffer per connection. With "provided buffers," the application instead hands the kernel a pool of buffers, and the kernel picks one only when data actually arrives, reporting the chosen buffer's ID in the completion's flags. This improves memory utilization, and multi-shot receive depends on it.

Security Implications

While io_uring offers significant performance benefits, its breadth and its direct path into kernel functionality have also raised security concerns. Recent research has highlighted potential vulnerabilities and the emergence of new attack vectors:

  • Rootkit Bypass Capabilities: Security researchers have demonstrated that io_uring can be exploited to create stealthy rootkits that bypass traditional Linux security tools. This is primarily because operations submitted through io_uring (such as file reads and writes or network connections) are carried out by the kernel on the application's behalf, without the dedicated system calls (read(2), write(2), connect(2), and so on) that many monitoring solutions intercept. This creates a “blind spot” for tools that rely on system call interception [3]. The “Curing” rootkit, for instance, showcased how malicious activities could be performed undetected [4].
  • Increased Attack Surface: The complexity and power of io_uring contribute to an expanded attack surface. Its history includes several security vulnerabilities, such as use-after-free bugs (e.g., CVE-2024-0582) [5] and privilege escalation flaws (e.g., CVE-2025-21333) [6]. These vulnerabilities underscore the need for continuous vigilance and prompt patching of systems.

The dual nature of io_uring as both a performance enhancer and a potential security risk necessitates a balanced approach. While leveraging its benefits, developers and system administrators must also implement robust security practices, including regular updates, careful monitoring, and potentially specialized security solutions designed to detect io_uring-based exploits.

Practical Example: Reading a File with io_uring

To illustrate the basic usage of io_uring, let’s walk through a simple C program that reads the content of a file asynchronously. This example uses the liburing library, which simplifies interaction with the io_uring kernel interface.

Prerequisites

Before compiling and running the example, make sure the liburing development package is installed. On Debian or Ubuntu, you can install it with:

sudo apt-get update
sudo apt-get install -y liburing-dev build-essential

The Code

Here’s the C code (io_uring_read_file.c):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>

#define QUEUE_DEPTH 1
#define BUFFER_SIZE 1024

int main(int argc, char *argv[]) {
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int fd;
    char buffer[BUFFER_SIZE + 1]; // +1 leaves room for the terminating '\0'
    int ret;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // Initialize io_uring with a queue depth of 1
    ret = io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
    if (ret < 0) {
        fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
        close(fd);
        return 1;
    }

    // Get a Submission Queue Entry (SQE)
    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "io_uring_get_sqe failed\n");
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }

    // Prepare a read operation: read from fd into buffer, size BUFFER_SIZE, at offset 0
    io_uring_prep_read(sqe, fd, buffer, BUFFER_SIZE, 0);
    
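    // Submit the prepared SQE to the kernel (returns the number submitted or -errno)
    ret = io_uring_submit(&ring);
    if (ret < 0) {
        fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }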

    // Wait for a Completion Queue Event (CQE)
    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret < 0) {
        fprintf(stderr, "io_uring_wait_cqe: %s\n", strerror(-ret));
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }

    // Check the result of the operation
    if (cqe->res < 0) {
        fprintf(stderr, "io_uring read failed: %s\n", strerror(-cqe->res));
    } else {
        // Null-terminate the buffer and print the content
        buffer[cqe->res] = '\0';
        printf("Read %d bytes: %s\n", cqe->res, buffer);
    }

    // Mark the CQE as seen by the application
    io_uring_cqe_seen(&ring, cqe);
    
    // Clean up io_uring resources
    io_uring_queue_exit(&ring);
    close(fd);

    return 0;
}

Explanation

  1. Initialization: io_uring_queue_init() sets up the io_uring instance and maps the shared submission and completion rings. QUEUE_DEPTH sets the number of submission queue entries; by default, the completion ring is twice that size.
  2. Getting an SQE: io_uring_get_sqe() retrieves an available Submission Queue Entry (SQE) from the submission queue. This SQE is where we describe the I/O operation we want to perform.
  3. Preparing the Operation: io_uring_prep_read() populates the SQE with the details of a read operation: the file descriptor (fd), the buffer to read into (buffer), the number of bytes to read (BUFFER_SIZE), and the offset (0 for the beginning of the file).
  4. Submitting the Request: io_uring_submit() tells the kernel that there are new requests in the submission queue that need processing. This is where the asynchronous operation begins.
  5. Waiting for Completion: io_uring_wait_cqe() blocks until a Completion Queue Event (CQE) is available in the completion queue. The CQE contains the result of the completed I/O operation.
  6. Processing the Result: The cqe->res field holds the result of the operation. A non-negative value is the number of bytes read (0 means end of file), while a negative value is a negated errno code. The data is then null-terminated and printed to the console.
  7. Cleanup: io_uring_cqe_seen() marks the CQE as processed, and io_uring_queue_exit() cleans up the io_uring instance.

Compilation and Execution

To compile the program, use gcc and link against liburing:

gcc -Wall -O2 -D_GNU_SOURCE -o io_uring_read_file io_uring_read_file.c -luring

Before running, create a simple text file named test_file.txt:

echo "Hello, io_uring! This is a test file to demonstrate reading with io_uring." > test_file.txt

Now, execute the program with the test file:

./io_uring_read_file test_file.txt

You should see output similar to this (the byte count includes the trailing newline that echo appends):

Read 75 bytes: Hello, io_uring! This is a test file to demonstrate reading with io_uring.

This simple example demonstrates the fundamental steps involved in performing a file read with io_uring. Because it waits for the completion immediately after submitting, it behaves much like a plain read(2) call. The benefits appear when an application keeps many requests in flight, batches its submissions, or continues with other work while the kernel processes its I/O.

Conclusion

io_uring represents a significant leap forward in Linux I/O. By providing a highly efficient, asynchronous, and flexible interface, it addresses the long-standing limitations of traditional I/O mechanisms. Its use of shared ring buffers minimizes per-operation overhead, enabling applications to achieve high levels of performance and scalability, particularly in demanding scenarios like high-throughput networking and storage.

However, the power and low-level access offered by io_uring also introduce new security challenges. As demonstrated by recent research, the interface can be a target for sophisticated attacks, necessitating a proactive approach to security, including continuous monitoring and timely patching. Developers and system administrators must remain vigilant, balancing the performance benefits with the inherent security risks.

As io_uring continues to evolve, it promises to unlock even greater potential for high-performance computing on Linux. Understanding its principles and mastering its usage will be crucial for developers building the next generation of efficient and responsive applications.

📺 Presentation: io_uring: So Fast. It’s Scary. - Paul Moore, Microsoft

References

[1] Axboe, J. (n.d.). Efficient IO with io_uring. Retrieved from https://kernel.dk/io_uring.pdf
[2] Axboe, J. (2023, February 14). io_uring and networking in 2023. Retrieved from https://kernel.dk/io_uring%20and%20networking%20in%202023.pdf
[3] ARMO. (2025, April 24). io_uring Rootkit Bypasses Linux Security Tools. Retrieved from https://www.armosec.io/blog/io_uring-rootkit-bypasses-linux-security/
[4] The Hacker News. (2025, April 24). Linux io_uring PoC Rootkit Bypasses System Call-Based Threat Detection. Retrieved from https://thehackernews.com/2025/04/linux-iouring-poc-rootkit-bypasses.html
[5] Exodus Intelligence. (2024, March 27). Mind the Patch Gap: Exploiting an io_uring Vulnerability in Ubuntu. Retrieved from https://blog.exodusintel.com/2024/03/27/mind-the-patch-gap-exploiting-an-io_uring-vulnerability-in-ubuntu/
[6] DataStackHub. (2025, April 27). CVE-2025-21333: Linux Io_uring Escalation Vulnerability. Retrieved from https://www.datastackhub.com/cve/cve-2025-21333/
