I/O Models
How your program waits for I/O is one of the most consequential architectural decisions you can make. The difference between blocking and async I/O is the difference between serving 100 vs. 100,000 connections.
Blocking I/O
The default. A read() call blocks the calling thread until data is available.
Simple to reason about, but each blocked thread consumes a kernel stack (8–16 KB on Linux) plus its user-space stack.
Thousands of concurrent connections = thousands of blocked threads = memory pressure and scheduling overhead.
// This thread does nothing while waiting
ssize_t n = read(fd, buf, sizeof(buf)); // blocks until data arrives
Non-blocking I/O
Set O_NONBLOCK on an fd. read() returns immediately with EAGAIN if no data is available.
The program must poll or check readiness another way. Avoids blocking, but busy-waiting wastes CPU.
select and poll
The traditional multiplexing solution: pass a set of fds; the kernel tells you which are ready.
- select — limited to FD_SETSIZE (1024) fds, O(n) scan, fd_set must be rebuilt each call
- poll — removes the 1024 limit, still O(n) per call; not scalable to 10k+ fds
epoll — The Linux Scalable Solution
epoll uses an event-driven model. Register interest once; get notified only when an fd is ready.
O(1) per event regardless of how many fds are watched. The basis for Node.js, nginx, Redis.
int epfd = epoll_create1(0);

struct epoll_event ev, events[MAX_EVENTS];
ev.events = EPOLLIN;      // notify when readable
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev); // register once

// Event loop
while (1) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1); // blocks efficiently
    for (int i = 0; i < n; i++) {
        handle(events[i].data.fd);
    }
}
Edge-triggered vs Level-triggered: Edge-triggered (EPOLLET) fires once when state changes;
level-triggered fires as long as the fd is ready. Edge-triggered is more efficient but trickier to use correctly.
io_uring — The Modern Async Interface
Introduced in Linux 5.1. Uses two shared ring buffers (a submission queue and a completion queue) to submit and collect I/O operations with minimal syscall overhead in the hot path: many operations can be batched behind a single syscall. Supports truly async I/O for files, sockets, and more.
// Queue a read on the submission ring (no syscall yet)
struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
io_uring_prep_read(sqe, fd, buf, len, 0);
io_uring_submit(&ring); // one syscall to kick the kernel
io_uring can operate in polling mode (zero syscalls) on high-performance storage.
It’s rapidly becoming the preferred async I/O interface for latency-sensitive systems.
Open Questions
- How does io_uring compare to epoll for network I/O at scale?
- What is IORING_FEAT_FAST_POLL and how does it reduce context switches?
- How does Windows IOCP compare architecturally to io_uring?