The deferred pipeline at the end of chapter one paints every visible pixel with the right BRDF, but it does so as if every light reached every surface unobstructed. The brass lion head sitting on the wooden table casts no shadow onto the wood. The boulder doesn’t block the floor light from reaching the cabinet behind it. Light passes through every solid object as if it weren’t there.
The path tracer answered occlusion the honest way: cast a ray from the surface toward the light and check whether anything is in between. The rasterizer cannot afford that — there is no scene structure to query, no BVH to traverse, no per-pixel ray budget. What it can do is render the scene a second time, from the light’s point of view, capture the depth of the closest surface in each direction, and then later — during the lighting pass — ask whether the depth recorded in that map matches the depth of the fragment currently being shaded. If they match, the fragment is the closest thing to the light along that direction; it is lit. If the recorded depth is closer, something is between the fragment and the light; it is shadowed.
That second render is the shadow map. The whole chapter is what the rasterizer has to do to make the comparison tolerable to look at.
Light as a camera
A shadow pass is a depth-only render of the scene with the light’s transform substituted for the camera’s. The two common light types map to two projection matrices.
A directional light — sun, moon, anything effectively at infinity — has parallel rays and an orthographic projection. The shadow volume is a box that should encompass the geometry casting and receiving shadows. Outside that box, no shadowing is computed.
A spot light — a flashlight, a stage spotlight, anything with a finite source and a cone of influence — uses a perspective projection with a configurable field of view and aspect. Shadows from the light’s edge stretch outward, the way real point sources behave when their light grazes a surface near the cone boundary.
Both projections write the same kind of depth into the same kind of texture, but the depth value’s relationship to world distance is different. For ortho, depth is linear in world space; for perspective, depth is 1/z-distorted. The shadow shader has to know which projection it’s looking at to decode it correctly.
Independent of projection, frustum fitting is the parameter that controls shadow quality at fixed map resolution. Tighter fitting concentrates the texture’s texels onto the geometry that needs them; looser fitting spreads them across mostly-empty world space. The shadow map’s resolution is fixed, so every doubling of the orthographic frustum size halves the effective texel density on the geometry — and the shadows pixelate.
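A back-of-the-envelope check makes the trade-off concrete. This small Python sketch (function name and numbers are illustrative, not the renderer's) computes the world-space footprint of one shadow-map texel for an orthographic light:

```python
def ortho_texel_size(frustum_width_world, shadow_map_resolution):
    """World-space width covered by a single shadow-map texel (ortho light)."""
    return frustum_width_world / shadow_map_resolution

# A 2048-texel map fitted to a 20-unit frustum: ~0.01 world units per texel.
tight = ortho_texel_size(20.0, 2048)
# Doubling the frustum to 40 units doubles each texel's footprint, i.e. it
# halves the effective texel density on the geometry, and shadows pixelate.
loose = ortho_texel_size(40.0, 2048)
```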
Binary shadows and their failure modes
The simplest shadow comparison is a strict less-than test. For each shaded fragment, transform its world position into light-space coordinates, sample the shadow map at the resulting UV, and ask whether the fragment’s light-space depth exceeds the recorded depth.
if (lightDepth < pixelDepth) {
    inShadow = true; // something is between this fragment and the light
}
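The surrounding steps — transform, sample, compare — can be modeled on the CPU. This is an illustrative Python sketch, not the renderer's shader code; the matrix convention and all names are assumptions:

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (nested lists) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def shadow_test(world_pos, light_view_proj, shadow_map, bias=0.0):
    """Binary shadow test: transform the fragment's world position into light
    clip space, sample the depth map at the resulting UV, compare depths."""
    p = mat_vec(light_view_proj, [world_pos[0], world_pos[1], world_pos[2], 1.0])
    ndc = [p[i] / p[3] for i in range(3)]        # perspective divide
    u = ndc[0] * 0.5 + 0.5                       # [-1,1] -> [0,1] texture coords
    v = ndc[1] * 0.5 + 0.5
    depth = ndc[2] * 0.5 + 0.5                   # fragment depth in light space
    h, w = len(shadow_map), len(shadow_map[0])
    x = min(int(u * w), w - 1)
    y = min(int(v * h), h - 1)
    return shadow_map[y][x] < depth - bias       # True => occluder is closer
```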
This binary test is mathematically clean and visually disastrous. Two artifacts dominate.
Shadow acne is a striped self-shadowing pattern that appears wherever a surface is almost parallel to the light. Numerical precision in the shadow map is finite, so the recorded depth quantizes into bands. The fragment’s continuous depth crosses those bands sample by sample, alternately falling above and below — every other pixel reports itself as shadowed by itself.
Peter-panning is the opposite mistake. The traditional fix for acne is to push the comparison depth slightly inward (a bias) so it never accidentally compares a surface to itself. Push it too far and the fix becomes the new bug — objects appear to float above the surfaces they sit on, their shadow detached from their feet.
A working shadow renderer needs two biases working together. A normal bias offsets the world position along the surface normal before the comparison, lifting it slightly off the surface in the same direction the surface is facing:
biasedPos = worldPos + normal * normalBias;
A slope-dependent depth bias scales the depth offset by the angle between the surface and the light, so steeply-grazing surfaces — which produce the worst acne — get a larger bias than head-on surfaces:
bias = depthBias * (1.0 - dot(normal, lightDir));
biasedPixelDepth = pixelDepth - bias;
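Putting the two biases together, a minimal Python sketch (the bias magnitudes are illustrative defaults, not the renderer's tuned values):

```python
def apply_shadow_biases(world_pos, normal, light_dir, pixel_depth,
                        normal_bias=0.02, depth_bias=0.005):
    """Normal bias lifts the position off the surface; the depth bias is
    scaled by the slope term so grazing surfaces get a larger offset."""
    dot_nl = sum(n * l for n, l in zip(normal, light_dir))
    biased_pos = [p + n * normal_bias for p, n in zip(world_pos, normal)]
    slope = 1.0 - dot_nl                  # 0 when head-on, -> 1 when grazing
    return biased_pos, pixel_depth - depth_bias * slope

# Head-on surface (normal parallel to the light): the slope term vanishes,
# so only the normal bias applies.
pos, d = apply_shadow_biases([0.0, 0.0, 0.0], [0.0, 1.0, 0.0],
                             [0.0, 1.0, 0.0], 0.5)
```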
With both biases tuned, the binary shadow looks correct from a distance — silhouettes line up, occluding objects darken what they should. Up close, the shadow’s edges are jagged: the shadow map’s texels project onto the world as hard-edged blocks, with no filtering between them.
The penumbra is an average
Percentage-Closer Filtering (PCF) softens shadow edges by sampling a neighborhood of the shadow map and averaging the binary results. A fragment that’s clearly inside or outside the shadow returns 1.0 or 0.0 from every tap. A fragment near the boundary returns a mix — and the resulting fractional shadow value produces a soft transition.
A regular grid of taps produces visible banding at the boundary. Distributing samples in a stratified Poisson disk pattern converts the banding into noise, which the eye reads as a more natural penumbra:
vec2 texelSize = 1.0 / vec2(textureSize(shadowMap, 0));
float shadowAccumulator = 0.0;
for (int i = 0; i < 16; i++) {
    vec2 offset = poissonDisk[i] * texelSize * 2.0;
    float sampledDepth = texture(shadowMap, shadowIndex + offset).r;
    if (sampledDepth < biasedPixelDepth) shadowAccumulator += 1.0;
}
float visibility = 1.0 - (shadowAccumulator / 16.0);
Sixteen samples is a typical PCF kernel. Fewer produce visibly grainy edges; more yields mostly diminishing returns at this scale.
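The loop is easy to model on the CPU. A Python sketch of the PCF average, assuming the shadow map is a plain 2D array of depths and the tap offsets are given in texel units:

```python
def pcf_visibility(shadow_map, uv, pixel_depth, taps, texel_size):
    """Percentage-closer filtering: average the binary depth test over a
    set of offset taps (e.g. a stratified Poisson-disk pattern)."""
    h, w = len(shadow_map), len(shadow_map[0])
    occluded = 0
    for ox, oy in taps:
        x = min(max(int((uv[0] + ox * texel_size) * w), 0), w - 1)
        y = min(max(int((uv[1] + oy * texel_size) * h), 0), h - 1)
        if shadow_map[y][x] < pixel_depth:
            occluded += 1
    return 1.0 - occluded / len(taps)  # fractional visibility in [0, 1]
```

A fragment straddling a shadow boundary returns a fraction — that fraction is the soft edge.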
PCF resolves aliasing but introduces its own characteristic look: a slightly hazy shadow that loses some of the crisp definition where surfaces actually contact each other. The averaging is uniform, so the filter cannot tell which side of the boundary it should bias toward — sharp contact and soft penumbra come out at the same softness.
Depth as a distribution
Variance Shadow Maps (VSM) treat the recorded depth as a random variable whose distribution can be summarized by its first two moments: the mean depth and the mean of squared depth. Pre-filtering these moments — blurring them in advance — is mathematically equivalent to evaluating the filter on a continuous distribution rather than a single sample. Once pre-filtered, the visibility query is constant-time per fragment.
The shadow pass writes both moments. The lighting pass reads them and applies Chebyshev’s inequality to estimate visibility:
float m1 = texture(blurredShadowMap, shadowIndex).r; // mean depth
float m2 = texture(blurredShadowMap, shadowIndex).g; // mean depth²
if (biasedPixelDepth <= m1) {
    visibility = 1.0; // definitely lit
} else {
    float sigma2 = m2 - m1 * m1; // variance
    visibility = sigma2 / (sigma2 + pow(biasedPixelDepth - m1, 2.0));
}
The result is a smoothly-varying visibility estimate. VSM produces shadows softer than PCF with more definition, because the pre-filter does the smoothing once on the moment buffer — not per-fragment with sample noise.
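The branch reduces to a few lines of scalar math. A Python sketch of the Chebyshev visibility estimate (the variance floor is a common practical addition, assumed here rather than taken from the renderer):

```python
def chebyshev_visibility(m1, m2, depth, min_variance=1e-5):
    """Upper bound on the fraction of occluders at or beyond `depth`,
    from the first two moments of the depth distribution (VSM)."""
    if depth <= m1:
        return 1.0                        # at or in front of the mean occluder
    variance = max(m2 - m1 * m1, min_variance)
    d = depth - m1
    return variance / (variance + d * d)  # Chebyshev's one-sided inequality
```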
The cost is a specific failure mode: light bleeding. Where two layers of geometry cast shadows over each other (a chair leg’s shadow falling onto a wall’s shadow, or a marble bust’s shadow on the table beneath it), Chebyshev’s inequality is loose — it conservatively overestimates visibility. The result is a faint ghost of the upper geometry visible inside the lower geometry’s shadow.
For a single isolated occluder, VSM is excellent. For complex layered scenes, the artifact is impossible to miss.
Four moments and a Cholesky
Moment Shadow Maps (MSM) push the same idea further: instead of two moments, store four — the depth raised to powers 1 through 4. With more moments, the visibility bound is tighter, and the light-bleeding artifact that plagued VSM largely vanishes.
The mathematics is no longer Chebyshev. The visibility query solves a 3×3 linear system via Cholesky decomposition to extract a tighter upper bound on the visibility distribution. The shader implementation is a few dozen lines of solver, but the result is a four-channel RGBA32F shadow buffer that can be pre-filtered the same way VSM is.
else if (shadowMapMethod == 3) {
    vec4 moments = texture(blurredShadowMap, shadowIndex);
    moments *= momentShadow.scale;
    visibility = computeMomentShadow(moments, biasedPixelDepth, N, L);
    visibility = 1.0 - visibility;
    // bleed reduction + visibility curve are tuned per-scene
    float p = momentShadow.bleedReduction;
    visibility = clamp((visibility - p) / (1.0 - p), 0.0, 1.0);
    visibility = pow(visibility, momentShadow.visScale);
}
MSM trades VSM’s light-bleed for a different artifact: thin contact shadows can disappear. Geometry that occupies very little of the shadow map — the legs of a chair, the feet of an elephant figurine — produces faint moments that the Cholesky solver’s tight bound interprets as nearly-fully-visible. Three tuning knobs trade these effects off against one another:
- Moment scale — pre-multiplies the moments before the solver, thickening contact shadows uniformly.
- Bleed reduction — clamps the visibility’s lower tail, sharpening shadow boundaries at the cost of any remaining bleed leaking back in if pushed too far.
- Visibility scale — a power curve applied to the output, useful for darkening or lightening the final shadow without re-tuning the moments.
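The last two knobs amount to a remap of the solver’s output. A Python sketch (the parameter defaults are illustrative, not the renderer’s tuned values):

```python
def shape_visibility(raw_visibility, bleed_reduction=0.2, vis_scale=1.0):
    """Post-solver shaping: clamp away the visibility's lower tail
    (bleed reduction), then apply a power curve (visibility scale)."""
    p = bleed_reduction
    v = min(max((raw_visibility - p) / (1.0 - p), 0.0), 1.0)
    return v ** vis_scale

# Faint residual bleed (raw visibility 0.1) is clamped to fully shadowed;
# a vis_scale above 1.0 darkens everything between the extremes.
```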
The honest summary is that MSM is the highest-quality of the four methods and the most parameter-sensitive. The other methods have failure modes; MSM has dials.
Blurring the moment buffer in compute
VSM and MSM only pay off if the moment buffer is pre-filtered. A two-pass separable Gaussian blur is the standard tool, and it’s the kind of work compute shaders were designed for.
The blur dispatches in two passes — a horizontal pass with workgroup size (128, 1, 1), then a vertical pass with (1, 128, 1). Both passes use shared memory to amortize the cost of texture reads: each thread group preloads a tile of the input image into a shared vec4 array, synchronizes via barrier(), then walks the shared array to compute the blur sum. The texture read happens once per pixel, even though each pixel contributes to up to 2 × radius + 1 outputs.
shared vec4 v[128 + 2 * 64]; // workgroup tile + two-sided apron, blur radius up to 64
void main() {
    uint lpos = gl_LocalInvocationID.x;
    ivec2 gpos = ivec2(gl_GlobalInvocationID);
    // Cooperative load of the tile + halo into shared memory: each thread
    // reads its pixel shifted left by the radius, and the first
    // 2 * blurWidth threads also fill the right-hand halo
    v[lpos] = imageLoad(srcImg, gpos - ivec2(blurWidth, 0));
    if (lpos < 2 * blurWidth) {
        v[lpos + 128] = imageLoad(srcImg, gpos + ivec2(128 - blurWidth, 0));
    }
    barrier();
    // Walk the shared array applying Gaussian weights, centered on this pixel
    vec4 sum = vec4(0);
    for (int i = -blurWidth; i <= blurWidth; ++i) {
        sum += weights[abs(i)] * v[int(lpos) + blurWidth + i];
    }
    imageStore(dstImg, gpos, sum);
}
The Gaussian weights themselves are uploaded once per blur radius via an SSBO — kernel shape is a uniform parameter, so re-tuning the blur radius doesn’t recompile the shader, just rebinds the buffer. With shared memory and proper boundary apron handling, the blur sustains a 64-radius kernel at full resolution comfortably within real-time budget.
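The weight table itself is just a normalized half-kernel. A Python sketch of what would get uploaded, assuming sigma is tied to the radius (a common convention, not necessarily this renderer’s):

```python
import math

def gaussian_weights(radius, sigma=None):
    """Half-kernel weights [w0 .. w_radius] for a separable Gaussian,
    normalized so the full (2 * radius + 1)-tap kernel sums to 1."""
    sigma = sigma if sigma is not None else max(radius / 3.0, 1e-6)
    half = [math.exp(-(i * i) / (2.0 * sigma * sigma))
            for i in range(radius + 1)]
    total = half[0] + 2.0 * sum(half[1:])  # off-center taps are mirrored
    return [w / total for w in half]
```

Because the kernel is symmetric, the shader only needs `radius + 1` entries and indexes them with `abs(i)`.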
The blur radius is a per-frame trade-off: small kernels keep the small geometry’s contact shadows crisp at the cost of grainier macro shadows; large kernels smooth large shadows at the cost of dissolving small ones. There is no correct answer — the right value depends on the scene’s geometry distribution.
What’s still hand-placed
The renderer at the end of this chapter draws PBR materials with directional and spot lights and knows where each light’s shadows fall, with soft penumbras driven by whichever filter is selected. The artifacts of binary shadow mapping — acne, peter-panning, jagged silhouettes — are gone, replaced by smooth shadow gradients that respond correctly to light movement.
What’s still missing is every other source of light in the world. The sky overhead is not lighting the scene. The walls of a room don’t bounce light onto each other. Direct illumination from a spotlight is now correctly shadowed; everything else is either the same flat ambient term or simply absent.
The next chapter replaces analytic light sources with the environment itself — an HDR panorama treated as a continuous light source, pre-filtered into representations that the lighting shader can sample at one texture lookup per pixel. The whole sky becomes a light, and the rendering equation gets evaluated against that integral, in real time, every frame.