Tutorial 5: Tiled Rendering

Prerequisites

Make sure you have completed Tutorial 4: Shadows.

Setting up the project

In MTMetalTutorialsApp.swift, set MT5ContentView() as the root view and run:

import SwiftUI

@main
struct MetalTutorialsApp: App {
    var body: some Scene {
        WindowGroup {
            MT5ContentView()
        }
    }
}

Tiled rendering result

What changes from Tutorial 4

The scene itself (bunny, plane, shadows) is unchanged from Tutorial 4. This tutorial only changes how the GBuffer is stored and read, in order to improve performance. The only changes are:

  1. GBuffer textures use .memoryless storage mode, so they live in GPU tile memory and are never written to DRAM
  2. The GBuffer pass and the Lighting pass are merged into a single MTLRenderPassDescriptor derived from view.currentRenderPassDescriptor
  3. The lighting fragment shader receives the GBuffer as a direct GBuffer struct parameter instead of reading from textures

The result: the GBuffer data never leaves the tile, so it costs no DRAM bandwidth.
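As a rough illustration of the first change, a memoryless GBuffer texture could be created along these lines. This is a minimal sketch, not the tutorial project's code: the function name makeMemorylessGBufferTexture and its parameters are hypothetical, and it assumes an Apple GPU (on the Mac, macOS 11 or later).

```swift
import Metal

/// Creates a render-target-only texture that has no DRAM backing.
/// The name and parameters are illustrative, not taken from the tutorial project.
func makeMemorylessGBufferTexture(device: MTLDevice,
                                  pixelFormat: MTLPixelFormat,
                                  width: Int,
                                  height: Int) -> MTLTexture? {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: pixelFormat,
        width: width,
        height: height,
        mipmapped: false)
    // Memoryless textures can only be used as render pass attachments.
    descriptor.usage = .renderTarget
    // Assumption: the device is an Apple GPU that supports memoryless storage.
    descriptor.storageMode = .memoryless
    return device.makeTexture(descriptor: descriptor)
}
```

A caller would typically invoke this once per GBuffer attachment (albedo, normal, position) and recreate the textures whenever the drawable size changes.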


Files

| File Name | Description |
|---|---|
| MT5ContentView.swift | Sets up Metal view and starts rendering. |
| MT5DeferredMetalView.swift | Manages Metal layer, frame timing, and render loop. |
| MT5TiledDeferredRenderer.swift | Handles GBuffer creation, texture setup, and command buffer encoding. |
| MT5DeferredRendering.metal | Contains vertex and fragment shaders for geometry pass and lighting pass. |
| MT5RenderTargets.h | Defines constants for render targets (e.g., albedo, normal, position). |
| MT5Uniforms.h | Declares uniform structs used in the Metal shaders. |

Code

GBuffer fragment: unchanged

The geometry-pass shaders (vertex_main, gbuffer_fragment) are identical to Tutorial 4. Metal infers the tile memory write from the [[color(N)]] attributes on the GBuffer struct.

Lighting fragment: new GBuffer struct parameter

In Tutorial 4 the lighting shader read from three texture2d bindings. Tutorial 5 replaces that with a direct GBuffer parameter:

struct GBuffer {
    float4 albedo    [[color(MT5RenderTargetAlbedo)]];
    float4 normal    [[color(MT5RenderTargetNormal)]];
    float4 position  [[color(MT5RenderTargetPosition)]];
};

fragment float4 deferred_lighting_fragment(
    QuadInOut                   in       [[ stage_in ]],
    GBuffer                     gBuffer,                  // ← reads from tile, not DRAM
    constant MT5FragmentUniforms &uniforms [[buffer(1)]])
{
    float4 albedo_vis_at_pix = gBuffer.albedo;
    float4 normal_at_pix     = gBuffer.normal;
    float4 position_at_pix   = gBuffer.position;
    // … same GGX lighting calculation as Tutorial 3/4
}

The GBuffer parameter with [[color(N)]] attributes reads directly from the tile memory render targets. No texture.read() call, no DRAM access.

Note: This tile-memory read only works inside a single merged render pass. If the passes are split (as in Tutorial 3), tile memory is flushed between passes and DRAM textures must be used instead.
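On the Swift side, the merged pass amounts to encoding both pipelines with one render command encoder. The following is a minimal sketch under stated assumptions: the function name, the parameter names, and the drawGeometry closure are hypothetical, depth-stencil state and resource bindings are left out, and the lighting step is assumed to draw a full-screen quad made of two triangles (six vertices).

```swift
import Metal

/// Encodes the geometry and lighting steps into a single render pass so the
/// GBuffer stays in tile memory between them. All names here are illustrative.
func encodeMergedPass(commandBuffer: MTLCommandBuffer,
                      descriptor: MTLRenderPassDescriptor,
                      geometryPipeline: MTLRenderPipelineState,
                      lightingPipeline: MTLRenderPipelineState,
                      drawGeometry: (MTLRenderCommandEncoder) -> Void) {
    guard let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor) else {
        return
    }

    // Step 1: fill the GBuffer attachments.
    encoder.setRenderPipelineState(geometryPipeline)
    drawGeometry(encoder)

    // Step 2: lighting, in the same encoder. The fragment shader receives the
    // GBuffer values for its own pixel through the [[color(N)]] inputs.
    encoder.setRenderPipelineState(lightingPipeline)
    // Assumption: the quad's vertices are generated or bound elsewhere.
    encoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 6)

    encoder.endEncoding()
}
```

The important point is that endEncoding() is called only once, after the lighting draw. Ending the encoder after the geometry step would start a new pass and force the GBuffer out to DRAM.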


What .storeAction = .dontCare actually means

| Store action | What happens | When to use |
|---|---|---|
| .store | GPU writes tile → DRAM | When you need the texture in a later pass or on the CPU |
| .dontCare | GPU discards tile contents | When the attachment is intermediate (GBuffer, intermediate depth) |
| .multisampleResolve | Resolve MSAA | For MSAA render targets |

Using .dontCare on the GBuffer is not just “safe”; it’s required for .memoryless textures. If you set .store on a memoryless texture, Metal will error at validation time.
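A minimal sketch of how the GBuffer attachments might be configured on the merged pass descriptor is shown below. The function name and the firstIndex parameter are hypothetical. The actual attachment indices come from MT5RenderTargets.h, whose values are not shown here, so the sketch takes the starting index as an argument.

```swift
import Metal

/// Attaches intermediate GBuffer textures to a render pass descriptor.
/// `firstIndex` is an illustrative parameter: the real indices are the
/// render target constants defined by the project.
func configureGBufferAttachments(on descriptor: MTLRenderPassDescriptor,
                                 textures: [MTLTexture],
                                 firstIndex: Int) {
    for (offset, texture) in textures.enumerated() {
        guard let attachment = descriptor.colorAttachments[firstIndex + offset] else {
            continue
        }
        attachment.texture = texture
        // A memoryless texture has no previous contents to load, so clear it.
        attachment.loadAction = .clear
        attachment.clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 0)
        // Discard the tile contents at the end of the pass instead of writing to DRAM.
        attachment.storeAction = .dontCare
    }
}
```

The drawable's own color attachment keeps its .store action, since that is the one image that must reach memory to be presented.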


Performance comparison

On Apple Silicon, the bandwidth saving from .memoryless GBuffers is significant:

| Scenario | T3 GBuffer bandwidth | T5 GBuffer bandwidth |
|---|---|---|
| 1080p, rgba8 albedo | ~8 MB/frame written + ~8 MB/frame read | 0 |
| 1080p, rgba16f normals + positions (each) | ~16 MB/frame written + ~16 MB/frame read | 0 |
| 60 fps total | ~5 GB/s for GBuffer alone | 0 |

The saved bandwidth also reduces power consumption, which is important for mobile (iPhone/iPad) targets.
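The per-attachment sizes behind these estimates follow from width × height × bytes per pixel. The sketch below works through that arithmetic. The helper names are hypothetical, and it assumes uncompressed attachments that are each written once and read once per frame.

```swift
/// Bytes occupied by one full-screen attachment, assuming no compression or padding.
func attachmentBytes(width: Int, height: Int, bytesPerPixel: Int) -> Int {
    return width * height * bytesPerPixel
}

/// DRAM traffic in bytes per second when every attachment is written once
/// and read once per frame.
func gBufferTrafficPerSecond(attachmentSizes: [Int], framesPerSecond: Int) -> Int {
    let bytesPerFrame = attachmentSizes.reduce(0, +) * 2   // one write + one read
    return bytesPerFrame * framesPerSecond
}

// 1080p, 4 bytes per pixel (rgba8): 8,294,400 bytes, roughly 8 MB.
let rgba8Bytes = attachmentBytes(width: 1920, height: 1080, bytesPerPixel: 4)

// 1080p, 8 bytes per pixel (rgba16f): 16,588,800 bytes, roughly 16 MB.
let rgba16fBytes = attachmentBytes(width: 1920, height: 1080, bytesPerPixel: 8)
```

Passing the sizes of all attachments that would otherwise live in DRAM to gBufferTrafficPerSecond gives the per-second figure for a given frame rate.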


When NOT to use .memoryless

Memoryless textures are a great default for intermediate attachments, but they can’t be used when:

  • You need to read the texture in a later separate pass (e.g., shadow map reads in the geometry pass)
  • You need to read the texture on the CPU (e.g., for image capture, screenshots)
  • You’re running on a non-tile GPU (Mac with AMD/NVIDIA): memoryless storage targets Apple GPUs, so do not rely on it there; allocate the texture with .private instead

Metal provides device.supportsFamily(.apple1) to detect tile GPU support if you want to conditionally enable memoryless storage.
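One way such a conditional check might look is sketched below. The function name is hypothetical, the fallback to .private is an assumption about what a non-tile code path would want, and the availability check reflects that memoryless storage on the Mac requires macOS 11 or later.

```swift
import Metal

/// Picks a storage mode for intermediate attachments such as the GBuffer.
/// Illustrative helper, not part of the tutorial project.
func intermediateAttachmentStorageMode(for device: MTLDevice) -> MTLStorageMode {
    if #available(macOS 11.0, iOS 13.0, *), device.supportsFamily(.apple1) {
        // Apple GPU: keep the attachment in tile memory only.
        return .memoryless
    }
    // Assumed fallback: a regular GPU-private texture backed by DRAM.
    return .private
}
```

Note that a fallback path using .private textures cannot rely on the merged-pass tile read either, so it would also need the split-pass shaders that sample the GBuffer as textures.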


Key concepts recap

| Concept | What it is |
|---|---|
| TBDR | GPU renders one tile at a time using fast on-chip memory |
| .memoryless | Texture that exists only in tile memory, with no DRAM backing |
| .dontCare store action | Discards tile contents after the pass; required for memoryless |
| Merged render pass | GBuffer + Lighting in one MTLRenderCommandEncoder |
| GBuffer [[color(N)]] input | Lighting shader reads GBuffer from tile memory directly |
| Tile memory | Ultra-fast GPU-local memory, ~10–100× faster than DRAM access |

GPU Render Pipeline

The GPU render pipeline processes each frame through a series of stages, starting with vertex fetch. Each stage is responsible for processing data in preparation for the next step:

flowchart TD
    A["Vertex Fetch"] --> B["Vertex Shader"]
    B --> C["Primitive Assembly"]
    C --> D["Rasterization"]
    D --> E["Fragment Shader"]
    E --> F["Blending"]
    F --> G["Final Framebuffer Write"]

CPU → GPU Data Flow

flowchart LR
    A["MTLCommandBuffer"] --> B["MTLRenderPassDescriptor"]
    B --> C["GBufferTextures"]
    B --> D["DepthTexture"]
    B --> E["LightingPSO"]
    B --> F["GeometryPSO"]

Concept Summary

flowchart TD
    A["MTLDevice"] --> B["MTLLibrary"]
    B --> C["Vertex Shader"]
    B --> D["Fragment Shader"]
    C --> E["Render Pipeline State Object (PSO)"]
    D --> E
    A --> F["Depth Stencil State"]
    A --> G["GBuffer Textures"]
    A --> H["Depth Texture"]

Next: Tutorial 6 (GPU Rendering) moves draw call generation onto the GPU with MTLIndirectCommandBuffer.


Congratulations 🎉

You’ve successfully implemented Tile-Based Deferred Rendering (TBDR) in Metal, optimizing your rendering pipeline for Apple Silicon GPUs. Now that you have a solid foundation, try experimenting with different tile sizes and observe how they affect performance and memory usage.

Happy coding!