Voxels That Scale and Break

The siege-room demo: drywall breached back to the bare metal frame, with a metal-cored pillar at the centre.

I’ve been building a voxel-based shooter where every wall is destructible — the kind of game where you breach a room by blowing a hole through the drywall instead of opening the door¹. Voxels are the natural fit: each cubic cell is solid or not, can be removed individually, and a crater is just the cells inside the radius going empty.

The obvious version of that doesn’t work. A building at a gameplay-meaningful resolution (5 cm cubes) is hundreds of millions of cells. Re-meshing on every shot stalls the frame.

A handful of techniques fix that, and most aren’t novel on their own: sparse storage, greedy meshing, off-thread re-meshing. What’s interesting is how they compose. The one piece I haven’t seen elsewhere is how explosions decide what’s in the way: each blast bakes a depth cubemap once, replacing tens of thousands of occlusion rays with an O(1) lookup per cell. That’s the last technique covered.

Don’t store the air

Most of a voxel world is empty. A 50 m × 50 m × 30 m building at 5 cm resolution is around 600 million cells stored densely, but only a thin shell of those is ever occupied — walls are thin, floors are thin, the inside of a room is air.

The storage answer is a brick-map: divide the world into 8³ bricks (512 voxels each), and allocate a brick’s block of memory only when at least one of its cells turns solid. An empty brick costs nothing but a slot in the index. When destruction empties a brick’s last voxel, the block is freed back to a pool; when a previously-empty brick gains a voxel, a fresh one is allocated.
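
A minimal sketch of the allocate-on-first-solid, free-on-last-empty rule, in Python rather than the project's own language. The class name, the dictionary index, and the one-byte-per-voxel layout are assumptions for illustration; the real storage is native arrays.

```python
BRICK = 8  # brick edge in voxels, so 8**3 = 512 cells per brick


class BrickMap:
    """Sparse voxel store: a brick owns memory only while it holds a solid cell."""

    def __init__(self):
        self.bricks = {}   # (bx, by, bz) -> bytearray(512); a missing key is all air
        self.counts = {}   # (bx, by, bz) -> number of solid cells in that brick
        self.pool = []     # freed blocks waiting for reuse

    @staticmethod
    def _split(x, y, z):
        key = (x // BRICK, y // BRICK, z // BRICK)
        idx = (x % BRICK) + BRICK * ((y % BRICK) + BRICK * (z % BRICK))
        return key, idx

    def get(self, x, y, z):
        key, idx = self._split(x, y, z)
        block = self.bricks.get(key)
        return block[idx] if block is not None else 0

    def set(self, x, y, z, value):
        """value 0 means air; any other byte is a solid material index."""
        key, idx = self._split(x, y, z)
        block = self.bricks.get(key)
        if block is None:
            if value == 0:
                return                      # air written into air: allocate nothing
            block = self.pool.pop() if self.pool else bytearray(BRICK ** 3)
            self.bricks[key] = block
            self.counts[key] = 0
        old = block[idx]
        block[idx] = value
        self.counts[key] += (value != 0) - (old != 0)
        if self.counts[key] == 0:           # last solid cell gone: free the block
            del self.bricks[key], self.counts[key]
            self.pool.append(block)
```

A block only returns to the pool when every cell in it is zero, so a recycled block needs no clearing before reuse.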

The resident footprint scales with the solid material — the count of non-empty bricks — not with the bounding box of the world. A town’s worth of voxels fits in memory because most of the town is air: in our benchmark, an eight-by-eight block of buildings drops from 251 MB dense to 34 MB sparse, 7.4× less.

Streaming is the other half. Bricks far from the player are dropped from memory and the near ones re-stamped from their source, so the working set is bounded by view distance, not by world size. (Paging the dropped bricks to disk, rather than rebuilding them, is the next step for genuinely massive maps.) Sparse storage plus a bounded resident set is what lets the world grow past a single building.

Greedy meshing

One quad per exposed face is the straightforward mesh, and it’s wasteful: a flat 10 m × 3 m wall at 5 cm voxels is 12,000 quads, nearly all of them coplanar and identical.

Greedy meshing merges any run of coplanar faces that share the same shading into a single rectangle. A uniform wall collapses to one quad; a varied one to a handful. On realistic mixed content that’s about a 25× drop in triangle count — far more on a pristine flat wall, far less on a fragmented or curved surface.

The merge key is what keeps it correct. Each face vertex carries a color from the voxel’s index, ambient occlusion baked per-vertex from the surrounding solids, and a per-voxel damage value that drives crack shading. Two adjacent voxels with the same color but different damage can’t share a quad — the damage would interpolate across the merged face and bleed into the neighbour. So the key is (color, AO, damage): the mesher sweeps each axis-aligned slab, grows the largest rectangle whose cells all share a key, emits one quad, marks them done, and moves on.
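
Here is a rough Python sketch of the sweep over one slab. It uses the common grow-right-then-grow-down variant, which is not guaranteed to find the largest possible rectangle, and it treats the key as one opaque value per face (per-vertex AO would have to be folded into that key). The function name and the slab layout are assumptions.

```python
def greedy_rects(slab):
    """Merge one slab of exposed faces into rectangles.

    slab[y][x] is a hashable merge key such as (color, ao, damage),
    or None where no face is exposed. Returns (x, y, w, h, key) tuples.
    """
    height = len(slab)
    width = len(slab[0]) if height else 0
    done = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            key = slab[y][x]
            if key is None or done[y][x]:
                continue
            # Grow right while the key matches.
            w = 1
            while x + w < width and not done[y][x + w] and slab[y][x + w] == key:
                w += 1
            # Grow down while every cell of the next row matches.
            h = 1
            while y + h < height and all(
                not done[y + h][x + i] and slab[y + h][x + i] == key
                for i in range(w)
            ):
                h += 1
            for dy in range(h):
                for dx in range(w):
                    done[y + dy][x + dx] = True
            rects.append((x, y, w, h, key))
    return rects
```

On a pristine 200 × 60 slab (the 10 m × 3 m wall) this returns a single rectangle. Giving one cell a different damage value splits the slab into a handful of rectangles around it, with the damaged cell left as its own 1 × 1 quad.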

The result is pixel-identical to the per-face mesh, and a test confirms it, checking that the greedy mesh covers the exact same exposed unit faces, winds them the same way, and never merges two cells across a shading discontinuity. (A live toggle in the demo swaps the two meshers so you can A/B them by eye, too.)

The cost is more CPU per chunk on the mesher side. But meshing runs off the main thread on a per-frame budget, so the extra work never touches the frame. The shadow pass alone pays for it — the depth-only shadow caster runs through 25× fewer triangles.

Re-mesh off the main thread

Re-meshing has to happen off the main thread. A single shot dirties one chunk; a grenade dirties several; a wave of destruction dirties dozens at once — and rebuilding that many chunks synchronously hitches the frame past 50 ms, which is unplayable.

The meshing runs on Unity’s DOTS stack, though only the part of it that earns its keep here. Burst-compiled jobs over NativeArrays do the work, reading the brick storage directly on worker threads. There’s no full ECS: these hot loops just walk flat native arrays, and nothing is gained by modelling individual voxels or chunks as entities.

Each frame does three things in a strict order: finish the previous batch of mesh jobs and upload their buffers to the GPU, let gameplay mutate the voxels (impact and detachment), then schedule the next batch. The order matters: complete before mutate. The mesh jobs read the bricks as their input, so if gameplay writes a brick a job is still reading, the result is garbage. Completing first, mutating second, and scheduling third keeps the readers and writers from overlapping; because the jobs only ever read, many chunks mesh in parallel.
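
The complete, mutate, schedule order can be illustrated with a thread pool standing in for Burst jobs. This is a Python analogy, not the Unity code: `MeshScheduler`, `mesh_fn`, and the `mutate` callback are hypothetical names, and the per-frame budget is left out.

```python
from concurrent.futures import ThreadPoolExecutor


class MeshScheduler:
    """Runs read-only mesh jobs between frames; gameplay writes only when none are in flight."""

    def __init__(self, mesh_fn, workers=4):
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.mesh_fn = mesh_fn     # chunk -> mesh data; must only read the voxels
        self.pending = {}          # chunk -> Future for a job still in flight
        self.uploaded = {}         # chunk -> latest finished mesh ("GPU" side)

    def frame(self, mutate):
        # 1. Complete: wait for last frame's jobs and take their results.
        for chunk, future in self.pending.items():
            self.uploaded[chunk] = future.result()
        self.pending.clear()
        # 2. Mutate: no job is reading now, so gameplay may write voxels.
        dirty = set(mutate())
        # 3. Schedule: new jobs read the voxels until the next frame's step 1.
        for chunk in dirty:
            self.pending[chunk] = self.pool.submit(self.mesh_fn, chunk)
```

A mesh lands one frame after the hit that dirtied it. That delay is the price of never letting a writer overlap a reader.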

The payoff: a destruction storm that used to spike a single frame to 58 ms now drains over several frames, worst case 2.5 ms — a 23× lower peak. The total meshing work didn’t change; it moved off the main thread and across cores.

One detail bites if you skip it: the dirty marker has to pad one voxel on every face. The mesher reads a chunk’s neighbours to decide whether to emit a face along the shared boundary, so a hit at the edge of a chunk has to dirty the neighbour too. Skip the pad and you get sliver holes at chunk seams; they don’t show in testing but appear under load.
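
A small Python sketch of the padded dirty marker. The chunk edge of 32 voxels is an assumed value, not one stated in this post, and the function name is hypothetical.

```python
CHUNK = 32  # hypothetical chunk edge in voxels


def dirty_chunks(x, y, z, pad=1):
    """Chunks to re-mesh after voxel (x, y, z) changes, padded by `pad` voxels."""
    lo = [(c - pad) // CHUNK for c in (x, y, z)]
    hi = [(c + pad) // CHUNK for c in (x, y, z)]
    return {
        (cx, cy, cz)
        for cx in range(lo[0], hi[0] + 1)
        for cy in range(lo[1], hi[1] + 1)
        for cz in range(lo[2], hi[2] + 1)
    }
```

An interior hit dirties one chunk, a hit on a chunk face dirties two, and a hit on a chunk corner dirties eight. Padding the box rather than the six faces also catches diagonal neighbours, which is more than the face rule strictly asks for.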

Hit detection without colliders

A voxel grid is aligned, finite, and indexable, so you don’t need a physics collider to trace a ray against it. The Amanatides–Woo DDA walks a ray cell by cell, visiting each cell it crosses exactly once with no per-ray allocation. A shot is just worldRay → DDA → first solid cell.
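
For reference, a compact Python version of the Amanatides–Woo walk. It assumes a unit-length direction and an `is_solid(i, j, k)` callback, both of which are choices for this sketch; the function name is hypothetical.

```python
import math


def dda_first_solid(origin, direction, is_solid, max_dist, cell=1.0):
    """Walk a ray through a uniform grid; return (cell, distance) of the first solid cell, or None."""
    pos = [o / cell for o in origin]
    ijk = [math.floor(p) for p in pos]
    step, t_max, t_delta = [], [], []
    for p, i, d in zip(pos, ijk, direction):
        if d > 0:
            step.append(1)
            t_max.append((i + 1 - p) / d)   # distance to the next boundary on this axis
            t_delta.append(1 / d)           # distance between boundaries on this axis
        elif d < 0:
            step.append(-1)
            t_max.append((i - p) / d)
            t_delta.append(-1 / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    t = 0.0
    limit = max_dist / cell
    while t <= limit:
        if is_solid(*ijk):
            return tuple(ijk), t * cell     # distance at which the ray entered the cell
        axis = t_max.index(min(t_max))      # cross whichever boundary comes first
        t = t_max[axis]
        ijk[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return None
```

The loop keeps three running distances and steps one integer index at a time, so nothing is allocated per cell. The `max_dist` cutoff is what keeps a shooter's short rays cheap.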

That removes a whole category of cost: no PhysX, no broadphase or narrowphase, and (the part that matters for a destructible world) no collider to rebuild every time a voxel disappears. A single core clears several million rays per second when they are the short, range-limited kind a shooter actually fires, and it scales across cores from there.

The grid isn’t the whole physics story, though. Static geometry answers to DDA; dynamic things — props, characters — sit in PhysX as usual, separate from the grid. What DDA doesn’t give you is a swept capsule against the world, so character-vs-world collision is a separate solver: generated local colliders, or a custom capsule-vs-grid sweep. The grid makes that solver cheap, but it still has to be written.

Cracks in the shader

Minecraft destruction is binary: a block is whole, or it’s gone. A tactical shooter wants the in-between — a wall that visibly cracks before it breaks, a hole that ragged-edges as it takes hits.

The cheap way is to keep the damage in data, not geometry. Every voxel has a damage byte beside its color index. A hit doesn’t carve the mesh; it stamps a static greyscale crack image into the damage bytes of the surface voxels around the impact. The shader then darkens each voxel in proportion to its damage, so the crack you see is a spatial pattern of darkened cells in the shape of that stamp — not a crack texture sampled per pixel, just damaged voxels rendered dark. Only when a voxel’s damage crosses the lethal threshold does the gameplay layer remove it outright, and only then does the geometry change.²
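
A sketch of the stamp step on one wall face, in Python. The saturating add, the threshold value of 200, and the 2D face layout are all assumptions made for the example; the post only says that a greyscale image is stamped into the damage bytes and that a lethal threshold exists.

```python
LETHAL = 200  # hypothetical threshold on the 0..255 damage byte


def stamp_damage(damage, stamp, cx, cy):
    """Add a greyscale stamp into a face's damage bytes, centred on (cx, cy).

    damage: list of bytearray rows for one wall face (mutated in place).
    stamp:  list of rows of 0..255 crack intensities.
    Returns the (x, y) cells whose damage reached LETHAL on this hit.
    """
    killed = []
    oy, ox = cy - len(stamp) // 2, cx - len(stamp[0]) // 2
    for sy, row in enumerate(stamp):
        for sx, amount in enumerate(row):
            x, y = ox + sx, oy + sy
            if amount == 0 or not (0 <= y < len(damage) and 0 <= x < len(damage[0])):
                continue
            before = damage[y][x]
            damage[y][x] = min(255, before + amount)   # saturating add
            if before < LETHAL <= damage[y][x]:
                killed.append((x, y))
    return killed
```

The first hit only darkens cells. The mesh is untouched until a later hit pushes some of them over the threshold, and only the returned cells go to the removal path.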

I tried generating the crack shape procedurally first — straight cracks, rule-based branches, erratic noise. None of it looked right, so I went back to a single hand-authored image. A pool of artist-drawn patterns, picked per shot, is the eventual plan.

Cracks have a plane. They’re stamped along the face the bullet entered, in that face’s plane, never extruded through the wall’s interior. Each surface voxel carries a precomputed plane — baked offline by a flatness check over its neighbourhood — and the bullet reads it at hit time. A grazing shot that crosses many voxels of one face cracks it once, not once per voxel; a clean through-shot cracks the front face and the back face independently, each in its own plane.

What stays standing

Cracks are sub-lethal. When a hit is lethal and voxels actually leave, the question becomes what holds the rest up.

Some voxels are anchors — structural metal, the frame of a wall. A destructible voxel survives only while it connects, through some chain of solid neighbours, to an anchor. After an impact removes voxels, the cells that just lost their support have to detach and vanish. The straightforward test is a flood-fill out from each emptied region: if the flood reaches an anchor, the region holds; if it doesn’t, it falls.

In practice that’s too slow. Real walls anchor at the perimeter, so a hole punched in the centre floods all the way out to the frame — thousands of cells per shot, and under auto-fire it stacks.

The fix is an anchor early-out. The instant a flood touches an anchor it stops, marks every cell it walked as part of an anchored region, and returns; a later flood that runs into one of those marks stops too. The cost per hit drops to the distance to the nearest anchor, instead of the size of the whole structure.

There’s a correctness trap in the marking. It needs three states, not two — unvisited, anchored, and visited-this-flood. With only two, a later flood can wall itself off behind a sibling flood’s visited cells and wrongly declare itself detached. Keeping visited-this-flood separate from anchored — the first is cleared between hits, the second persists — keeps a flood from being walled off from its own anchor. On a single wall a centre hit lands around 1–2 ms, down from a number that wouldn’t fit in a frame at all.
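
The early-out and the three states can be sketched like this in Python. One deliberate simplification: the anchored marks here live for a single call (one hit), because how stale marks are invalidated across hits isn't described above. The function name and the callback signatures are assumptions.

```python
from collections import deque

ANCHORED = 1  # persistent mark; "unvisited" is simply the absence of any mark


def find_detached(seeds, is_solid, is_anchor, neighbours):
    """Return the solid cells reachable from `seeds` that no longer reach an anchor."""
    marks = {}          # cell -> ANCHORED, shared by every flood of this hit
    detached = set()
    for seed in seeds:
        if not is_solid(seed) or seed in detached or marks.get(seed) == ANCHORED:
            continue
        visited = {seed}            # visited-this-flood: private to this flood
        queue = deque([seed])
        anchored = is_anchor(seed)
        while queue and not anchored:
            cell = queue.popleft()
            for n in neighbours(cell):
                if not is_solid(n) or n in visited:
                    continue
                if is_anchor(n) or marks.get(n) == ANCHORED:
                    anchored = True  # early-out: stop at the first anchor or mark
                    break
                visited.add(n)
                queue.append(n)
        if anchored:
            for cell in visited:     # everything walked is connected to an anchor
                marks[cell] = ANCHORED
        else:
            detached |= visited      # flood exhausted its component: it falls
    return detached
```

Because `visited` is private to each flood, a sibling flood's cells can never wall off a later flood. Only an `ANCHORED` mark stops a flood early, and that stop is always the safe answer.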

Explosions cast shadows

This is the piece I haven’t seen elsewhere. An area blast — a grenade, a breaching charge — damages a sphere of cells around its centre. But it shouldn’t punch through a solid obstacle in the way: a charge going off beside a steel pillar shouldn’t scar the wall behind it.

The naive test is to trace a ray from the blast centre to every cell in the radius — tens of thousands of rays per blast. It’s doable, but it pays per cell for a question whose answer mostly varies by direction, not by cell.

So treat the explosion as a point light and bake a depth shadow map from it. Fire a fixed pattern of rays out of the blast centre — a cubemap, six faces of M × M texels — and store, per texel, the distance to the first occluder in that direction. At M = 32 that’s 6 × 32 × 32 ≈ 6,000 rays, fired as one batched off-thread job in under a millisecond. The damage gate is then exactly a shadow test: for each cell in the radius, take its direction from the centre, read that texel’s stored depth, and damage the cell only if it sits nearer than the occluder. Closer than the depth, it’s lit; past it, it’s in shadow.

Casting it as a shadow map buys several things at once. A thin obstacle simply falls between rays — a two-voxel pole subtends less than one texel at blast range, so it casts no shadow at all, and the “only real cover blocks a blast” rule drops out of the texel resolution with no special case for it. A real occluder’s silhouette gets quantized to texel boundaries, so its shadow edge is ragged by about one texel; the natural jitter of the blast’s own shape hides that. And because the rays are cast against ordinary physics colliders, the occluder can be anything — a deployable shield, a body, a piece of non-grid mesh — caught by a layer mask with no code change. The bake produces nothing but a grid of depths; every policy decision (exempt the centre, bias the near field, the comparison itself) lives in the cell loop that reads it, which means that loop can be unit-tested against a hand-filled depth grid with no colliders in the scene at all. One bake per blast, then an O(1) lookup per cell.
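
The bake and the lookup can be sketched as follows, in Python. The face ordering, the texel mapping, the `cast(origin, direction, max_dist)` callback, and the `bias` parameter are all assumptions for illustration; in the real thing the cast would be a batched physics raycast.

```python
import math

M = 32  # texels per cube-face edge


def texel_of(d):
    """Map a non-zero direction to (face, u, v) on a 6 x M x M cubemap."""
    ax = max(range(3), key=lambda i: abs(d[i]))           # dominant axis picks the face
    face = 2 * ax + (0 if d[ax] > 0 else 1)
    a, b = [d[i] / abs(d[ax]) for i in range(3) if i != ax]  # each in [-1, 1]
    u = min(M - 1, int((a + 1) * 0.5 * M))
    v = min(M - 1, int((b + 1) * 0.5 * M))
    return face, u, v


def texel_dir(face, u, v):
    """Unit direction through the centre of a texel (inverse of texel_of)."""
    ax, sign = face // 2, (1.0 if face % 2 == 0 else -1.0)
    d = [(u + 0.5) / M * 2 - 1, (v + 0.5) / M * 2 - 1]
    d.insert(ax, sign)
    n = math.sqrt(sum(c * c for c in d))
    return tuple(c / n for c in d)


def bake(centre, cast, radius):
    """One ray per texel; cast returns the hit distance, or None for no occluder."""
    depth = {}
    for f in range(6):
        for u in range(M):
            for v in range(M):
                hit = cast(centre, texel_dir(f, u, v), radius)
                depth[f, u, v] = math.inf if hit is None else hit
    return depth


def is_lit(depth, centre, cell_pos, bias=0.0):
    """Shadow test: the cell takes damage only if it is nearer than the stored depth."""
    d = [c - o for c, o in zip(cell_pos, centre)]
    dist = math.sqrt(sum(c * c for c in d))
    if dist == 0.0:
        return True                     # the centre cell is exempt
    return dist < depth[texel_of(d)] + bias
```

`is_lit` needs nothing but the depth grid, so it can be exercised with a hand-filled dictionary and no scene at all. The bake is the only part that touches geometry.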

What it adds up to

Put together: a chunk spans tens of thousands of cells, almost all of them air, and only the chunks near the player stay resident. A realistic scene holds 90 Hz on a laptop RTX 3070 — with enough headroom that heavy fragmentation is where you reach for LOD, and a real headset is the final word on the numbers. A shot is a DDA hit, a damage byte, and a deferred re-mesh; an explosion adds a cubemap bake. A wall takes hundreds of hits, and the expensive part of any one of them is the structural flood, not the cosmetics.

I’ve left out about as much as I covered: character-vs-world collision (the separate solver, still being written), the bake passes that tag flat surfaces for crack orientation, the VR stereo path, and the LOD that town-scale scenes will need. Each is its own post.

None of these layers does more than one job: sparse storage for the empty space, off-thread meshing for the geometry, DDA for hits, the anchor flood for structure. That separation is what turns voxels from a stylistic choice into a material you can take apart in real time.