I built a Wayland compositor in Zig in one day. Here's what I learned.
How teru went from a terminal emulator to a full Wayland compositor with XWayland, tiling, and a built-in launcher — in 3,700 lines of Zig and under 1MB.
I’ve been daily-driving XMonad on X11 for years. It’s a beautiful piece of software — a window manager that’s actually a Haskell library you extend with real code. But X11 is dying, and I already had a terminal emulator (teru) with a built-in tiling engine, 8 layout algorithms, and 10 workspaces.
The question was obvious: what if the terminal was the window manager?
The pitch
teru already does most of what a tiling WM does:
- 8 tiling layouts (master-stack, grid, monocle, spiral, dishes, three-col, columns, accordion)
- 10 workspaces with per-workspace layout cycling
- SIMD CPU rendering at under 50μs per frame
- Config-driven keybinds, session persistence, AI agent integration
What it doesn’t do is manage other programs’ windows. Firefox, Chromium, Electron apps — they’re all Wayland clients that need a compositor to display them. That’s what wlroots provides.
One day, 3,700 lines
The compositor — teruwm — is 3,700 lines of Zig across 14 files. It links libteru (the shared library) for terminal rendering and wlroots for Wayland compositing. The entire binary is under 1MB stripped.
Here’s what it took:
Phase 1: wlroots bootstrap (~400 lines)
Server init, backend autocreate, output handling, seat management. Zig calls C with zero overhead — no bindings generator, no FFI wrapper. You write extern "wlroots-0.18" fn wlr_backend_autocreate(...) and it’s a direct function call.
Phase 2: terminal panes (~300 lines)
Each terminal pane wraps a libteru Pane (PTY + Grid + VtParser) with a SoftwareRenderer and a wlr_scene_buffer. The renderer writes ARGB pixels directly into the wlr_buffer’s memory — zero copies.
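To make the "write pixels straight into the buffer" idea concrete, here is a minimal sketch. It is not teruwm's code: `PixelBuffer` and `fillCell` are hypothetical names, and a plain slice stands in for the mapped memory of a wlr_buffer. The point is only that the renderer's destination is the buffer itself, so nothing is copied afterwards.

```zig
const std = @import("std");

/// Hypothetical stand-in for the mapped pixel memory of a wlr_buffer.
const PixelBuffer = struct {
    pixels: []u32, // ARGB8888, one u32 per pixel
    stride: usize, // pixels per row (not bytes)

    /// Paint the background of one terminal cell in place.
    fn fillCell(
        self: PixelBuffer,
        col: usize,
        row: usize,
        cell_w: usize,
        cell_h: usize,
        argb: u32,
    ) void {
        const x0 = col * cell_w;
        const y0 = row * cell_h;
        var y: usize = 0;
        while (y < cell_h) : (y += 1) {
            // Each scanline of the cell is a contiguous run in the buffer.
            const start = (y0 + y) * self.stride + x0;
            @memset(self.pixels[start .. start + cell_w], argb);
        }
    }
};
```

A real renderer would blit glyph bitmaps rather than flat colors, but the addressing (row times stride plus column offset) would look the same.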
Phase 3: everything else (~3,000 lines)
XDG shell (external Wayland clients), XWayland (X11 apps), keyboard input routing, cursor handling, dual configurable status bars, a built-in application launcher, numbered scratchpads, window rules, clipboard, fullscreen, media keys, screenshot, notifications, float toggle with mouse drag-to-move/resize, smooth border-drag tiling resize, VT switching.
That last paragraph sounds absurd. But most of these features are 20-50 lines each because libteru already does the heavy lifting. The launcher is 200 lines. Scratchpads are 60 lines. Media keys are 10 lines.
Why Zig is perfect for this
I’ve written Zig daily for over a year (teru is 37K lines). Here’s what makes it ideal for a compositor:
C interop is native. wlroots is a C library. In Rust, you’d spend weeks writing safe wrappers or fighting unsafe blocks. In Zig, you declare extern "wlroots-0.18" fn wlr_scene_create() callconv(.c) ?*wlr_scene and call it. Same calling convention, same memory layout, zero overhead.
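The "same memory layout" half of that claim can be shown without linking anything. The sketch below mirrors libwayland's `struct wl_list` (two pointers, `prev` then `next`), which wlroots uses for its intrusive lists. The names `WlList` and `listInit` are my own for illustration, not taken from teruwm, and `listInit` simply reimplements what `wl_list_init` does.

```zig
const std = @import("std");

/// Mirrors libwayland's `struct wl_list { struct wl_list *prev, *next; }`.
/// `extern struct` guarantees C field order and padding.
const WlList = extern struct {
    prev: ?*WlList,
    next: ?*WlList,
};

/// Same behavior as wl_list_init: an empty list points at itself.
fn listInit(list: *WlList) void {
    list.prev = list;
    list.next = list;
}
```

Because the layout is the C layout, a pointer to a `WlList` could be handed to a C function expecting `struct wl_list *` without any conversion step.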
@Vector is first-class SIMD. @Vector(4, u32) operates on 4 pixels per instruction. On AVX2 targets the compiler can widen adjacent operations to 256-bit. No intrinsics headers, no platform-specific code. The same line compiles for SSE2, AVX2, and ARM NEON.
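As a small illustration of what that looks like in practice, here is a sketch that inverts the RGB channels of a row of ARGB pixels four at a time (the kind of thing a block cursor might need). The function name `invertRgb` is hypothetical; only `@Vector` and `@splat` are Zig builtins.

```zig
const std = @import("std");

const Px4 = @Vector(4, u32);

/// Invert the RGB channels of each ARGB pixel, leaving alpha untouched.
fn invertRgb(row: []u32) void {
    const mask: Px4 = @splat(0x00FFFFFF);
    var i: usize = 0;
    // Vector body: four pixels per iteration.
    while (i + 4 <= row.len) : (i += 4) {
        const px: Px4 = row[i..][0..4].*;
        row[i..][0..4].* = px ^ mask;
    }
    // Scalar tail for rows whose length is not a multiple of four.
    while (i < row.len) : (i += 1) row[i] ^= 0x00FFFFFF;
}
```

The XOR on `Px4` is an ordinary operator; which instruction it becomes is decided by the target, not by the source code.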
Comptime eliminates dead code. When you build teru without -Dcompositor=true, all 3,700 lines of compositor code vanish. The terminal binary is 1.6MB with zero wlroots dependency. The compositor binary is under 1MB with wlroots linked. Both from the same codebase.
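A minimal sketch of the mechanism, under stated assumptions: the flag is hard-coded here, whereas a real build would receive it from a build option, and `hypothetical_compositor_run` is an invented symbol that exists nowhere. The file still compiles and links, because a branch guarded by a comptime-known `false` is never analyzed.

```zig
const std = @import("std");

// Assumption: hard-coded for the sketch. In a real project this value
// would come from the build system (for example a -Dcompositor option).
const enable_compositor = false;

// Hypothetical symbol standing in for the compositor entry point.
// Nothing defines it; that is fine as long as no analyzed code calls it.
extern fn hypothetical_compositor_run() void;

fn run() []const u8 {
    if (enable_compositor) {
        // Never analyzed when the flag is false, so no undefined symbol.
        hypothetical_compositor_run();
        return "compositor";
    }
    return "terminal";
}
```

Flipping the constant to `true` would make the build fail at link time, which is exactly the dependency the disabled branch avoids.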
Explicit allocators. The allocator is a parameter, not a global. You can prove the render loop doesn’t allocate by reading the function signature. Try that in C++ with hidden copies and implicit new.
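Here is what that looks like at the signature level. Both functions are hypothetical examples rather than teru's API: the one that can allocate takes a `std.mem.Allocator` and returns an error union, while the one meant for the hot path takes only a slice and therefore has no way to request memory.

```zig
const std = @import("std");

/// Setup path: takes an allocator, so it may allocate (and may fail).
fn allocGrid(allocator: std.mem.Allocator, cols: usize, rows: usize) ![]u32 {
    const cells = try allocator.alloc(u32, cols * rows);
    @memset(cells, 0);
    return cells;
}

/// Hot path: no allocator parameter, so it cannot allocate.
fn clearGrid(cells: []u32, argb: u32) void {
    @memset(cells, argb);
}
```

The guarantee is by convention rather than enforced by the compiler (a function could still reach for a global allocator), but idiomatic Zig code does not.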
The hybrid rendering trick
Here’s the architectural decision that makes teruwm faster than Sway or Hyprland for terminal-heavy workflows:
Terminal panes use CPU rendering (SIMD, zero-copy into wlr_buffer). External windows use GPU compositing (wlroots scene graph). Each does what it’s best at.
GPU terminals upload textures 60 times per second. For a monospace grid where 2 cells change per frame, that’s absurd. CPU rendering avoids the upload entirely — the pixels are already in the right memory.
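Some back-of-the-envelope numbers help here. The figures below are illustrative assumptions, not measurements: a 2560×1080 ARGB framebuffer, 7×14-pixel cells, and the worst case where a whole frame is re-uploaded every refresh. Real GPU terminals vary in how much they upload per frame.

```zig
const std = @import("std");

// Assumptions for illustration only.
const width = 2560;
const height = 1080;
const bytes_per_pixel = 4; // ARGB8888
const cell_w = 7;
const cell_h = 14;
const fps = 60;

// One full frame: 2560 * 1080 * 4 = 11,059,200 bytes (about 10.5 MiB).
const full_frame_bytes = width * height * bytes_per_pixel;

// Re-uploading every frame at 60 Hz: 663,552,000 bytes per second.
const full_upload_per_sec = full_frame_bytes * fps;

// Pixels actually touched when two cells change: 2 * 7 * 14 * 4 = 784 bytes.
const two_cells_bytes = 2 * cell_w * cell_h * bytes_per_pixel;

pub fn main() void {
    std.debug.print("full frame: {d} B, per second: {d} B, two cells: {d} B\n", .{
        full_frame_bytes, full_upload_per_sec, two_cells_bytes,
    });
}
```

Under these assumptions the changed region is roughly four orders of magnitude smaller than the frame, which is the gap the CPU path exploits.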
For compositing (blending 5-10 surfaces), GPU is better. wlroots handles this with damage tracking — only re-composite changed regions.
When a terminal pane is fullscreen (monocle layout), the compositor can bypass the GPU entirely: the DRM subsystem displays the CPU buffer directly. Zero GPU involvement. This is the fastest possible path from keystroke to photon.
What I’m using right now
I tested teruwm on my ultrawide (2560×1080) from a TTY. It starts in ~50ms, renders 365×77 terminal cells, and handles workspace switching, pane spawning, and layout cycling without a single crash (after fixing some bounds errors on the first attempt).
The XMonad config I’m replacing has 1,242 lines of Haskell. The teruwm config is a plain key=value file with a few INI-style sections:
# ~/.config/teruwm/config
gap = 8
[bar.top]
left = {workspaces}
center = {title}
right = {clock}
[bar.bottom]
left = {mem}
right = {exec:2:sensors | grep Tctl | awk '{print $2}'}
[rules]
Chromium = 2
Steam = 7
No recompile. Hot-reloadable. And the built-in launcher (Super+D) scans $PATH once on startup and filters 3,600 executables in real-time as you type — no rofi, no dmenu, no external process.
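The filtering step can be sketched in a few lines. This is an assumption about the approach, not the launcher's actual code: I'm assuming a case-sensitive substring match over a name list that was cached at startup, and `filterNames` is a hypothetical name. Results go into a caller-provided buffer, so the per-keystroke path does not allocate.

```zig
const std = @import("std");

/// Copy every name containing `query` into `out` (up to its capacity)
/// and return the filled prefix of `out`.
fn filterNames(
    names: []const []const u8,
    query: []const u8,
    out: [][]const u8,
) [][]const u8 {
    var n: usize = 0;
    for (names) |name| {
        if (n == out.len) break; // result buffer is full
        if (std.mem.indexOf(u8, name, query) != null) {
            out[n] = name;
            n += 1;
        }
    }
    return out[0..n];
}
```

A linear scan over a few thousand short strings is cheap enough to run on every keystroke, which is presumably why no index is needed.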
What’s next
teruwm is usable but not daily-drivable yet. I still need:
- Reliable keyboard layout switching (Dvorak + Ukrainian via XKB)
- Layer shell for third-party panels (if anyone insists on waybar)
- io_uring for the standalone terminal’s event loop
- Row-level dirty tracking (currently re-renders entire grid on any cell change)
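For the last item, one possible shape of row-level dirty tracking is sketched below. Everything in it is hypothetical (`DirtyRows`, `mark`, `flush`, and the 256-row cap are my assumptions), since this feature does not exist yet: writes mark their row, and the renderer asks once per frame which rows need repainting.

```zig
const std = @import("std");

const max_rows = 256; // assumption: upper bound on grid height

const DirtyRows = struct {
    flags: [max_rows]bool = [_]bool{false} ** max_rows,

    /// Called whenever a cell in `row` changes.
    fn mark(self: *DirtyRows, row: usize) void {
        self.flags[row] = true;
    }

    /// Called once per frame: returns how many of the visible rows need
    /// repainting and resets their flags. A real renderer would repaint
    /// each flagged row here instead of only counting it.
    fn flush(self: *DirtyRows, visible_rows: usize) usize {
        var count: usize = 0;
        for (self.flags[0..visible_rows]) |*flag| {
            if (flag.*) {
                count += 1;
                flag.* = false;
            }
        }
        return count;
    }
};
```

Marking the same row twice costs nothing extra, so a burst of writes to one line still results in a single repaint of that line.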
The terminal (teru) and compositor (teruwm) are two packages from one repo. Install just the terminal if you don’t need the WM. Install teruwm if you want the whole stack. Both are MIT-licensed.