First Hit

Ray tracing reduces, in description, to three steps: shoot a ray, hit a thing, shade it. The first step is solved geometry. The third opens onto the rest of the renderer. The middle step — the one that sounds trivial — is what this chapter is about, because “hit” is not a single idea. It is four different geometric arguments wearing the same name, and getting any of them subtly wrong produces an image where everything looks right except the one thing that isn’t.

The first version of this renderer is a CPU ray caster: a Scene::TraceImage that walks every pixel, generates a primary ray, asks the scene for the closest hit, and writes whatever that hit reports as color. No light, no bounces, no global illumination — just the question of what each ray touches, answered as cleanly as possible. Everything that comes later in the series — Monte Carlo path tracing, microfacet BRDFs, transmission, image-based lighting, full San Miguel — is bolted onto this skeleton. If the closest-hit query is wrong here, every later chapter is wrong too.

The whole renderer is a machine for finding one number

Every ray in the scene is the same parametric line:

P(t) = origin + t * direction;

The whole renderer, at this stage, exists to find the right t. Which surface a ray hits, how far away that surface is, what the surface looks like at the hit point, whether a second ray launched from there reaches a light — all of these reduce to a t value, and the question of which one to keep.

Frame the work that way and a lot of incidental decisions fall out for free. The intersection record carries t, the position, the normal, and a pointer back to the shape. Closest-hit selection is a comparison on t. Self-intersection avoidance becomes “offset the new ray’s origin by a small t along the normal.” The whole bookkeeping of the chapter is t flowing through the system at the right precision and being compared against the right things.

One question, four geometries

The scene has spheres, axis-aligned boxes, cylinders, and triangle meshes. Each implements the same Shape::intersect(Ray r) interface, but the math behind each one is unrelated, and the failure modes differ enough that they’re worth treating individually.

Sphere

Substituting P(t) = O + t·D into the implicit sphere equation |P − C|² = r² gives a scalar quadratic in t:

vec3 qBar = r.origin - center;
float a = dot(r.direction, r.direction);     // = 1 if D is normalized
float b = 2.0f * dot(qBar, r.direction);
float c = dot(qBar, qBar) - radius * radius;

float discriminant = b*b - 4.0f*a*c;
if (discriminant < 0.0f) return std::nullopt;

float sqrtDisc = sqrt(discriminant);
float t_minus = (-b - sqrtDisc) / (2.0f * a);
float t_plus  = (-b + sqrtDisc) / (2.0f * a);

The two roots bracket an interval — t_minus is the entry point, t_plus the exit. Picking the closest visible hit isn’t quite “smallest root”: if the ray origin is inside the sphere, t_minus is negative and the visible hit is t_plus (the exit through the back wall). The intersection routine returns whichever is the smallest positive root, with the surface normal computed as normalize(P(t) − C).

Axis-aligned box

A box is the intersection of three slabs — pairs of parallel planes perpendicular to the X, Y, and Z axes. For each axis, the ray crosses the two slab planes at parameters t0 and t1, defining an interval [t0, t1] along the ray. The ray hits the box if and only if the three per-axis intervals share a common overlap.

Interval interval;  // initialized to [0, ∞]
for (int i = 0; i < 3; ++i) {
    Interval slab = intersectSlab(r, i);
    if (slab.tmin > slab.tmax) return std::nullopt;  // miss
    interval.intersect(slab);                        // narrow accumulated overlap
    if (interval.tmin > interval.tmax) return std::nullopt;
}

The clean part is that the accumulated interval.tmin is automatically the entry point on the box surface, and its associated normal — tracked alongside the t values — points along the axis whose slab clipped the entry. The annoying part is the parallel-ray case: when D[axis] = 0, the slab equation degenerates and the test becomes “is the ray’s origin between the two planes.” That case has to be coded separately or it produces NaNs on grazing rays.

Cylinder

The cylinder is the most involved primitive. It’s the intersection of two surfaces — an infinite cylindrical body and a slab clipping it to a finite length — so the intersection routine combines a 2D quadratic with a 1D slab and returns the overlap of their t-intervals.

The simplification is to first transform the ray into the cylinder’s local frame, where its axis aligns with Z. Then the body test is the sphere’s quadratic restricted to XY:

float a = dirT.x*dirT.x + dirT.y*dirT.y;
float b = 2.0f * (qBarT.x*dirT.x + qBarT.y*dirT.y);
float c = qBarT.x*qBarT.x + qBarT.y*qBarT.y - radius*radius;

…and the cap test is a Z-axis slab between z = 0 and z = ‖axis‖. Each test produces an interval; their intersection is the visible portion of the ray inside the cylinder. After picking the smallest positive t, the surface normal is rotated back into world space — for a body hit the normal is normalize(vec3(P.xy, 0)) in local frame, for a cap hit it’s ±Z.

What earns its keep here is keeping the rotation as a quaternion built from FromTwoVectors(zAxis, axisNormalized), then converting once to a mat3 and using its transpose as the inverse. This avoids ever having to construct the cylinder coordinate frame by hand, and the inverse-transpose move makes the world-to-local and local-to-world transforms cheap and symmetric.

Triangle

Möller–Trumbore reformulates the triangle hit test as a 3×3 linear system in (t, u, v), where u, v are barycentrics. Cramer’s rule and a precomputed cross product collapse the whole thing to a few dot products with early-outs after each one:

vec3 e1 = v1 - v0, e2 = v2 - v0;
vec3 h  = cross(r.direction, e2);
float a = dot(e1, h);
if (abs(a) < EPSILON) return std::nullopt;       // ray ‖ triangle plane

float f = 1.0f / a;
vec3 s  = r.origin - v0;
float u = f * dot(s, h);
if (u < 0.0f || u > 1.0f) return std::nullopt;   // outside edge u

vec3 q  = cross(s, e1);
float v = f * dot(r.direction, q);
if (v < 0.0f || u + v > 1.0f) return std::nullopt;  // outside edge v / hypotenuse

float t = f * dot(e2, q);

The algorithm is fast precisely because each barycentric is computed and tested before the next is even started. A ray that misses on u never pays for the second cross product. With a few hundred thousand triangles in a mesh, that early-out is the only reason brute-force triangle tests are remotely tractable — and it’s also why the BVH, which culls whole subtrees of triangles before any of them are tested individually, becomes essential a moment later.

What didn’t generalize

What stands out is how little the four implementations actually share. Each one has its own degenerate cases (D parallel to slab, discriminant < 0, a near zero), its own conventions for “front face,” and its own numerical traps near grazing angles. The abstract Shape::intersect interface promises uniformity, but the bodies behind it have almost nothing in common. The dispatch is the only place they look alike.

The image is the debugger

In the early stages of the renderer, the output image is the only meaningful diagnostic surface. Rather than stepping through code, intermediate values are rendered directly as color.

t as grayscale depth. Surface normals as RGB through glm::abs(closest.normal). Material diffuse colors as flat fill. The bounding box of each primitive in place of the actual geometry. Each visualization is a different lens on the same intersection record, and a wrong calculation almost always shows up as a visual glitch the eye catches before any printed log would.

Rendering material color first is the simplest sanity check — every primitive should appear in its assigned Kd, with no ghost geometry and no missing pieces. A missing object means an intersection function is returning false where it shouldn’t.

The same scene rendered as t / maxDistance gives a smooth grayscale gradient — closer surfaces dark, farther ones bright. Discontinuities mean the wrong ray won the closest-hit comparison. The transitions across smooth surfaces have to be continuous; if they aren’t, your normalization is wrong or your ray direction isn’t actually unit-length.

Rendering normals as color is the strictest test of all. Sphere normals radiate outward, producing a smooth gradient. Box normals are perpendicular to each face, producing three flat-colored regions. Triangles either show flat per-face normals or smoothly interpolated ones depending on whether you computed vertex normals. A normal that points the wrong way through some boundary is immediately visible as a misaligned color step — a kind of debugging that no breakpoint replicates.

Six faces, six colors

Before any spatial structure can earn its keep, the bounding boxes themselves have to be honest. An AABB that doesn’t fully contain its primitive will cause rays to skip the primitive entirely; one that’s wildly too large will defeat the acceleration. To verify before trusting the hierarchy, each shape’s actual intersect() is temporarily swapped for its bounding-box intersection:

// In acceleration.cpp, deliberately stand in for the geometric intersect
auto result = shape->intersectBoundingBox(ray);

Each boundingBox() is small, but easy to get wrong on the more complex primitives. The cylinder, in particular, has to enclose both end caps and the full radial extent — box.extend(base ± vec3(radius)) and box.extend(top ± vec3(radius)). The triangle just extends a fresh AABB through its three vertices.

Rendering normals at this level is the strongest visual confirmation: every face of an AABB is parallel to a coordinate plane, so the normals collapse to exactly six colors. If the boxes are correctly sized and positioned, the scene resolves into clean blocks of those six colors, each block enclosing one primitive. Anything else — an off-axis tint, a stray gradient, a primitive bleeding through its box — points directly at the constructor that needs revisiting.

Most of the scene, ignored

With four spheres, the renderer is already fast. Add the Stanford Bunny and a Dwarf model — together about a hundred thousand triangles — and a single frame takes 89 seconds. The reason is brute force: for every primary ray, for every pixel, every shape in the scene is asked whether it’s hit. With a 1 megapixel image and 100 000 triangles, that’s 10¹¹ intersection tests per frame. The math doesn’t scale.

A Bounding Volume Hierarchy collapses that into a logarithm. Shapes are recursively partitioned into a binary tree of AABBs; each interior node’s box bounds all its descendants. A ray descends the tree, but at every node it first tests against the node’s box. If the ray misses the box, the entire subtree is skipped — not one of the primitives below is ever queried. The asymptotic cost per ray drops from O(n) to roughly O(log n), with a small constant overhead from box-test failures along the way.

The renderer pulls in madmann91’s bvh_v2 library rather than rolling its own — its traversal is well-tuned, supports both robust and fast slab tests, and accepts arbitrary leaf intersection through a callback. Plugging it in requires bridging two coordinate worlds: the library uses its own Bvh::BBox<float, 3> for boxes and bvh::v2::Ray for rays, while the rest of the codebase is built on glm::vec3. A pair of thin adapters does the translation. SimpleBox extends the library’s BBox with GLM constructors, and BvhShape wraps a Shape* so that the BVH can hold any primitive uniformly:

struct BvhShape {
    Shape* shape;
    SimpleBox getBoundingBox() const { return shape->boundingBox(); }
    // Called by the traversal when a leaf's AABB is hit
    std::optional<Intersection> intersect(const bvh::v2::Ray<float, 3>& ray) const;
};

The traversal calls back into BvhShape::intersect only after the leaf’s AABB has already been hit, which is what makes the whole thing pay off — the per-shape geometric test runs only when there’s a real chance of hitting it.

Building the tree itself is a separate decision. The library exposes two builders for the same hierarchy, with very different cost models:

SweepSahBuilder uses the Surface Area Heuristic: at each split, it evaluates the cost of placing the partition along every candidate position on every axis, choosing the one that minimizes expected ray traversal cost. It produces high-quality trees but builds single-threaded.
MiniTreeBuilder trades some tree quality for parallelism: shapes are bucketed into mini-trees built on independent threads, then stitched into a top-level structure.

Both produce trees that drop the same dense scene from 89 seconds to roughly 130 ms. With the multi-threaded builder, 110.

Method	Trial 1	Trial 2	Trial 3	Notes
Non-BVH	89423.45 ms	89623.23 ms	89784.13 ms	Brute-force, every ray
SweepSahBuilder	130.44ms	132.91ms	136.96ms	Single-threaded, SAH
MiniTreeBuilder	111.41ms	109.23ms	110.55ms	Threaded build

A roughly 700× improvement. The math has not changed — every primitive that gets tested is tested with exactly the same intersection routine as before. What changed is how many of them get tested. The BVH lets the renderer ignore most of the scene most of the time, and the gap between mathematically correct and computationally tractable becomes very real.

The trade-off is real at the other end too. For trivial scenes — a handful of primitives — the build cost dominates, the tree is shallower than the constant-overhead penalty, and a flat-list brute force is faster. The BVH only earns its keep once the geometry is dense enough that the savings on traversal outweigh the cost of building the tree. The crossover, in this scene, is somewhere around a few hundred primitives. For path tracing, where the same tree is queried tens of millions of times per frame across primary rays, shadow rays, and bounce continuations, that crossover comes very fast.

What the image doesn’t yet say

The output of this chapter looks convincing in one narrow sense: shapes appear in the right place, depth is correct, normals point outward, the BVH delivers its speedup. But there is no light in the picture. Surfaces are filled with their material’s flat diffuse color, uniformly bright from nowhere. The renderer knows what each ray hit — it doesn’t yet know what that hit means.

That is the next chapter. With intersections solved, the question becomes how light moves between them. And the moment you ask that, a new fact arrives: a hit is not a color. It’s the start of a path.