io_uring is an asynchronous I/O interface introduced in Linux 5.1 (May 2019), designed by Jens Axboe to replace the older aio and POSIX AIO interfaces. This post covers what problems it was built to solve, the ring-buffer design at its core, recent additions in networking, the security concerns that have since surfaced, and a small liburing example.
Why io_uring exists
Each existing Linux I/O interface had limits that made it unsuitable for high-throughput async workloads.
Limitations of the older interfaces
Traditional synchronous I/O system calls, such as read(2) and write(2), operate by blocking the calling process until the I/O operation is complete. While simple to use, this blocking behavior can lead to significant performance degradation in applications that require concurrent I/O operations or need to maintain responsiveness while waiting for data. Even variations like pread(2), pwrite(2), and their vector-based counterparts (preadv(2), pwritev(2), preadv2(2), pwritev2(2)) remained fundamentally synchronous, offering little relief for high-concurrency scenarios.
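The POSIX Asynchronous I/O (AIO) interface, aio_read(3) and aio_write(3), was an attempt to address the need for non-blocking I/O. However, its implementations were often criticized for being inefficient and failing to deliver the expected performance gains. On Linux, glibc implements it in userspace with threads that issue ordinary blocking calls, so it adds thread overhead rather than providing a truly asynchronous kernel path.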
The native Linux aio interface, while more robust than POSIX AIO, suffered from several critical deficiencies:
- O_DIRECT dependency: A major drawback was its reliance on O_DIRECT for asynchronous operations. O_DIRECT bypasses the kernel's page cache, which can be beneficial for certain workloads but comes with strict alignment and size constraints. For most common buffered I/O operations, native aio would revert to synchronous behavior, negating its asynchronous promise.
- Unpredictable blocking: Even when O_DIRECT was used, aio submissions could still block. This might occur if metadata was required, or if the internal request queues were saturated. Such unpredictable blocking made it difficult for applications to rely on aio for truly non-blocking I/O, often forcing developers to offload I/O to separate threads.
- Inefficient API design: The aio API incurred significant overhead due to excessive memory copying. Each I/O submission and completion involved copying a total of 104 bytes, a substantial cost for an interface intended for high performance. The design of its completion event ring buffer was also problematic, being difficult, if not impossible, for applications to use correctly. Furthermore, every I/O operation necessitated at least two system calls (one for submission and one for waiting for completion), which, especially in the post-Spectre/Meltdown era, introduced noticeable performance penalties.
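The 104-byte figure can be reproduced from the kernel's aio ABI structs. The sketch below assumes a 64-bit build and the breakdown given in Axboe's paper: the iocb plus the pointer to it on submission, and one io_event on completion. The helper function names are illustrative and not part of any library.

```c
#include <linux/aio_abi.h>   /* struct iocb, struct io_event */
#include <stddef.h>

/* Bytes copied into the kernel for one submission: the iocb itself plus
 * the pointer to it in the array handed to io_submit(2). */
size_t aio_submit_copy_bytes(void)
{
    return sizeof(struct iocb) + sizeof(struct iocb *);
}

/* Bytes copied back to userspace for one completion via io_getevents(2). */
size_t aio_complete_copy_bytes(void)
{
    return sizeof(struct io_event);
}

/* Total per I/O: 64 + 8 + 32 = 104 bytes on a 64-bit build. */
size_t aio_copy_bytes_per_io(void)
{
    return aio_submit_copy_bytes() + aio_complete_copy_bytes();
}
```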
Because of these limitations, many high-performance applications resorted to building their own thread pools to fake async I/O on top of synchronous syscalls. io_uring was designed to make that workaround unnecessary.
Design goals
The io_uring paper lists five design goals:
- Easy to use, hard to misuse. Developers should not need to re-learn an exotic API.
- Extensible. Initially block-storage focused, but designed to grow into networking and other I/O types.
- Feature-rich. Common I/O patterns should be expressible without external thread pools.
- Efficient. Per-request overhead must be minimal — no memory copies or extra indirection on submission and completion paths. Required to keep up with sub-10µs storage devices.
- Scalable. Expose the kernel’s I/O scalability directly to userspace.
Ring buffers
The core mechanism is a pair of ring buffers shared between the kernel and userspace via mmap: a submission queue (SQ) and a completion queue (CQ). Userspace fills SQ entries, the kernel processes them asynchronously and posts results into the CQ. This avoids a syscall per request: applications can batch many requests and submit them with a single io_uring_enter call, or with no syscall at all when a kernel thread polls the SQ (IORING_SETUP_SQPOLL).
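To make the head/tail mechanics concrete, here is a toy single-producer, single-consumer ring in plain C11. It is only a model of the idea: the struct toy_ring type and both functions are hypothetical, and the real SQ and CQ layouts in the kernel differ. It does show the two ingredients such rings rely on: free-running indices masked by a power-of-two size, and release/acquire ordering so an entry is visible before the index that publishes it.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                  /* must be a power of two */
#define RING_MASK    (RING_ENTRIES - 1u)

/* Toy ring: one side only ever writes tail, the other only writes head. */
struct toy_ring {
    _Atomic uint32_t head;               /* advanced by the consumer */
    _Atomic uint32_t tail;               /* advanced by the producer */
    uint64_t entries[RING_ENTRIES];
};

/* Producer side. Returns false when the ring is full. */
bool toy_ring_push(struct toy_ring *r, uint64_t value)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (tail - head == RING_ENTRIES)     /* unsigned wrap-around is intended */
        return false;
    r->entries[tail & RING_MASK] = value;
    /* Release: the entry must be visible before the new tail is. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool toy_ring_pop(struct toy_ring *r, uint64_t *value)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head == tail)
        return false;
    *value = r->entries[head & RING_MASK];
    /* Release: the slot is free for reuse only after it has been read. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

In io_uring terms, userspace plays the producer on the SQ and the consumer on the CQ, with the kernel taking the opposite role on each.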
Recent developments
io_uring has expanded beyond block I/O into networking, and has surfaced new security concerns along the way.
Networking
Traditional network applications use readiness-based interfaces like epoll. io_uring offers a completion-based model, with several features tailored to networking:
- Batching: io_uring excels at batching multiple I/O operations into a single system call. This is particularly beneficial for network applications, where many small I/O operations can be combined, reducing system call overhead. The io_uring_submit_and_wait() function, for instance, submits new requests and waits for completions in a single system call, streamlining the I/O event loop.
- Multi-shot Requests: Instead of requiring an application to re-submit a request after each event (e.g., after accepting a new connection or receiving data), a multi-shot request is submitted once and keeps generating completion events as new occurrences arise. This is supported for accept() (via io_uring_prep_multishot_accept()), recv() (via io_uring_prep_recv_multishot()), and poll() (via io_uring_prep_poll_multishot()), and significantly reduces housekeeping overhead for applications.
- Provided Buffers: To address the challenge of buffer management in a completion-based model, io_uring introduces "provided buffers." Instead of dedicating a buffer to every pending receive, an application hands a pool of buffers to the kernel. The kernel picks a buffer from the pool only when data actually arrives and reports which one it used in the completion, so memory is not tied up by idle connections.
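With multi-shot requests and provided buffers, most of the application-side work moves into interpreting each completion. The sketch below decodes one CQE using the flag constants from linux/io_uring.h; struct cqe_info and decode_cqe() are hypothetical helpers invented for illustration, not liburing API, and the sketch assumes kernel headers recent enough to define IORING_CQE_F_MORE.

```c
#include <errno.h>
#include <linux/io_uring.h>   /* struct io_uring_cqe, IORING_CQE_* flags */
#include <stdbool.h>

/* Hypothetical summary of a single completion (not a liburing type). */
struct cqe_info {
    int      error;           /* errno value, or 0 on success */
    int      result;          /* bytes received, or the new fd for accept */
    bool     has_buffer;      /* kernel picked one of the provided buffers */
    unsigned buffer_id;       /* valid only when has_buffer is true */
    bool     still_armed;     /* multi-shot request will post more CQEs */
    bool     out_of_buffers;  /* request failed because the pool was empty */
};

struct cqe_info decode_cqe(const struct io_uring_cqe *cqe)
{
    struct cqe_info info = {0};

    if (cqe->res < 0)
        info.error = -cqe->res;          /* res carries a negated errno */
    else
        info.result = cqe->res;

    if (cqe->flags & IORING_CQE_F_BUFFER) {
        info.has_buffer = true;
        /* The buffer ID lives in the upper 16 bits of the flags word. */
        info.buffer_id = cqe->flags >> IORING_CQE_BUFFER_SHIFT;
    }

    /* Without F_MORE the multi-shot request has ended and must be re-armed. */
    info.still_armed = (cqe->flags & IORING_CQE_F_MORE) != 0;
    info.out_of_buffers = (cqe->res == -ENOBUFS);
    return info;
}
```

An event loop would typically call something like this for every CQE, return the buffer identified by buffer_id to the pool once the data has been consumed, and re-submit the request whenever still_armed is false.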
Security implications
The same properties that make io_uring fast — direct access to a queue of kernel work, fewer syscalls — also create new attack surfaces:
- Rootkit blind spots. Because io_uring can perform I/O without issuing a conventional system call per operation, security tools that rely on seccomp or syscall interception miss it. The "Curing" PoC rootkit demonstrated this, performing malicious actions invisibly to syscall-based detection.
- Vulnerability history. Several CVEs have surfaced, including use-after-free bugs (e.g. CVE-2024-0582) and privilege-escalation flaws (e.g. CVE-2025-21333). Some vendors and container runtimes (Google, Docker) disable or restrict io_uring by default for this reason.
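Since io_uring may be switched off in a given environment, a program can probe for it before committing to an io_uring code path. The sketch below calls io_uring_setup(2) directly through syscall(2) with a one-entry ring; probe_io_uring() is a hypothetical helper name. Which error comes back depends on how the restriction is implemented: a seccomp filter or the kernel.io_uring_disabled sysctl typically yields EPERM, and kernels without io_uring yield ENOSYS.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE           /* exposes syscall(2) under strict -std modes */
#endif
#include <errno.h>
#include <linux/io_uring.h>   /* struct io_uring_params */
#include <string.h>
#include <sys/syscall.h>      /* SYS_io_uring_setup */
#include <unistd.h>

/* Returns 0 if an io_uring instance can be created, otherwise the errno
 * value reported by io_uring_setup(2). */
int probe_io_uring(void)
{
    struct io_uring_params params;
    long fd;

    memset(&params, 0, sizeof(params));
    fd = syscall(SYS_io_uring_setup, 1u, &params);
    if (fd < 0)
        return errno;

    close((int)fd);           /* the ring was only needed for the probe */
    return 0;
}
```

An application could call this once at startup and fall back to epoll or a thread pool when the result is nonzero.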
Example: reading a file with io_uring
A small C program using liburing.
Prerequisites
Install liburing-dev:
sudo apt-get update
sudo apt-get install -y liburing-dev build-essential
The Code
Here’s the C code (io_uring_read_file.c):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <liburing.h>
#include <fcntl.h>
#include <unistd.h>

#define QUEUE_DEPTH 1
#define BUFFER_SIZE 1024

int main(int argc, char *argv[]) {
    struct io_uring ring;
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int fd;
    char buffer[BUFFER_SIZE + 1];  // one extra byte for the terminating '\0'
    int ret;

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        return 1;
    }

    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    // Initialize io_uring with a queue depth of 1
    ret = io_uring_queue_init(QUEUE_DEPTH, &ring, 0);
    if (ret < 0) {
        fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret));
        close(fd);
        return 1;
    }

    // Get a Submission Queue Entry (SQE)
    sqe = io_uring_get_sqe(&ring);
    if (!sqe) {
        fprintf(stderr, "io_uring_get_sqe failed\n");
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }

    // Prepare a read operation: read up to BUFFER_SIZE bytes from fd into buffer, at offset 0
    io_uring_prep_read(sqe, fd, buffer, BUFFER_SIZE, 0);

    // Submit the prepared SQE to the kernel (returns the number submitted, or -errno)
    ret = io_uring_submit(&ring);
    if (ret < 0) {
        fprintf(stderr, "io_uring_submit: %s\n", strerror(-ret));
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }

    // Wait for a Completion Queue Event (CQE)
    ret = io_uring_wait_cqe(&ring, &cqe);
    if (ret < 0) {
        fprintf(stderr, "io_uring_wait_cqe: %s\n", strerror(-ret));
        io_uring_queue_exit(&ring);
        close(fd);
        return 1;
    }

    // Check the result of the operation
    if (cqe->res < 0) {
        fprintf(stderr, "io_uring read failed: %s\n", strerror(-cqe->res));
    } else {
        // Null-terminate the buffer (res is at most BUFFER_SIZE) and print the content
        buffer[cqe->res] = '\0';
        printf("Read %d bytes: %s\n", cqe->res, buffer);
    }

    // Mark the CQE as seen by the application
    io_uring_cqe_seen(&ring, cqe);

    // Clean up io_uring resources
    io_uring_queue_exit(&ring);
    close(fd);
    return 0;
}
Explanation
- Initialization: io_uring_queue_init() sets up the io_uring instance, creating the submission and completion queues. QUEUE_DEPTH sets the number of submission queue entries; the completion queue is sized at twice that by default.
- Getting an SQE: io_uring_get_sqe() retrieves an available Submission Queue Entry (SQE) from the submission queue. This SQE is where we describe the I/O operation we want to perform.
- Preparing the Operation: io_uring_prep_read() populates the SQE with the details of a read operation: the file descriptor (fd), the buffer to read into (buffer), the number of bytes to read (BUFFER_SIZE), and the offset (0 for the beginning of the file).
- Submitting the Request: io_uring_submit() tells the kernel that there are new requests in the submission queue that need processing. This is where the asynchronous operation begins.
- Waiting for Completion: io_uring_wait_cqe() blocks until a Completion Queue Event (CQE) is available in the completion queue. The CQE contains the result of the completed I/O operation.
- Processing the Result: The cqe->res field indicates the result of the operation. A non-negative value is the number of bytes read (0 at end of file), while a negative value is a negated errno code. The read data is then printed to the console.
- Cleanup: io_uring_cqe_seen() marks the CQE as processed, and io_uring_queue_exit() cleans up the io_uring instance.
Compilation and Execution
To compile the program, use gcc and link against liburing:
gcc -Wall -O2 -D_GNU_SOURCE -o io_uring_read_file io_uring_read_file.c -luring
Before running, create a simple text file named test_file.txt:
echo "Hello, io_uring! This is a test file to demonstrate reading with io_uring." > test_file.txt
Now, execute the program with the test file:
./io_uring_read_file test_file.txt
Output:
Read 75 bytes: Hello, io_uring! This is a test file to demonstrate reading with io_uring.
Talk: io_uring: So Fast. It’s Scary. — Paul Moore, Microsoft
References
- Jens Axboe — Efficient IO with io_uring (the original paper)
- Jens Axboe — io_uring and networking in 2023
- ARMO — io_uring Rootkit Bypasses Linux Security Tools
- The Hacker News — Linux io_uring PoC Rootkit Bypasses System Call-Based Threat Detection
- Exodus Intelligence — Mind the Patch Gap: Exploiting an io_uring Vulnerability in Ubuntu
- DataStackHub — CVE-2025-21333: Linux io_uring Escalation Vulnerability


