Tutorial 6: GPU Rendering

Prerequisites

Make sure you have completed Tutorial 5: Tiled Rendering.

Setting up the project

In MTMetalTutorialsApp.swift, set MT6ContentView() as the window's content view and run the app:

@main
struct MetalTutorialsApp: App {
    var body: some Scene {
        WindowGroup {
            MT6ContentView()
        }
    }
}

[Image: GPU rendering result]

New concepts in this tutorial

Scene separation: MT6Scene

Tutorial 6 introduces the MT6Scene class, which separates scene data from renderer logic. This makes it easier to share the same scene across multiple renderers (e.g., forward vs. deferred) without reloading assets.

classDiagram
    class MT6Scene {
        +staticGPUHeap: MTLHeap
        +mtkMeshes: [MTKMesh]
        +init(device, commandQueue)
    }
    class MT6DeferredRenderer {
        -meshesBuffer: MTLBuffer
        -shadowsArgBuffer: MTLBuffer
        -indirectCommandBuffer: MTLIndirectCommandBuffer
        +draw(in: MTKView)
    }
    class IndirectMesh["MT6::Indirect::Mesh"] {
        +vertexBuffer: constant float*
        +texCoordsBuffer: constant float*
        +indexBuffer: constant uint*
        +materialArgBuffer: constant MaterialArgument*
    }
    class MaterialArgument {
        +baseColorTexture: texture2d
        +specularTexture: texture2d
        +normalTexture: texture2d
    }
    MT6DeferredRenderer --> MT6Scene : reads heap + meshes
    MT6DeferredRenderer --> IndirectMesh : encodes meshesBuffer
    IndirectMesh --> MaterialArgument : materialArgBuffer
    MT6Scene --> MTLHeap : staticGPUHeap holds textures

class MT6Scene {
    let staticGPUHeap: MTLHeap   // all scene textures packed here
    let mtkMeshes: [MTKMesh]     // MetalKit meshes loaded from the USD asset

    init(device: MTLDevice, commandQueue: MTLCommandQueue) {
        // 1. Load USD meshes into mtkMeshes
        // 2. Pack textures into staticGPUHeap
    }
}

The renderer, not MT6Scene, owns and encodes the argument buffers. This keeps scene data separate from the GPU-driven draw machinery.
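
The separation can be pictured with a small, Metal-free sketch. Everything below (SceneSketch, RendererSketch, the two renderer structs, the mesh names) is a hypothetical stand-in rather than the tutorial's actual types; it only shows two renderers holding a reference to one shared scene object instead of each loading its own copy.

```swift
// Hypothetical stand-in for MT6Scene: the real class holds an MTLHeap and [MTKMesh].
final class SceneSketch {
    let meshNames: [String]
    init(meshNames: [String]) { self.meshNames = meshNames }
}

// Hypothetical renderer interface: a renderer reads the scene but does not own its assets.
protocol RendererSketch {
    var scene: SceneSketch { get }
    func describeFrame() -> String
}

struct ForwardRendererSketch: RendererSketch {
    let scene: SceneSketch
    func describeFrame() -> String { "forward pass over \(scene.meshNames.count) meshes" }
}

struct DeferredRendererSketch: RendererSketch {
    let scene: SceneSketch
    func describeFrame() -> String { "deferred pass over \(scene.meshNames.count) meshes" }
}

// The scene is created once and handed to both renderers.
let sharedScene = SceneSketch(meshNames: ["fuselage", "propeller"])
let forward = ForwardRendererSketch(scene: sharedScene)
let deferred = DeferredRendererSketch(scene: sharedScene)
```

Because SceneSketch is a class, both renderers see the same instance, which is the property that lets assets be loaded only once.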

USD asset loading

Tutorial 6 uses USD (Universal Scene Description) format assets:

let assetURL = Bundle.main.url(forResource: "toy_biplane_idle", withExtension: "usdz")!
let allocator = MTKMeshBufferAllocator(device: device)
let asset = MDLAsset(url: assetURL,
                     vertexDescriptor: vertexDescriptor,
                     bufferAllocator: allocator)
let (mdlMeshes, mtkMeshes) = try! MTKMesh.newMeshes(asset: asset, device: device)

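A .usdz file packages the scene hierarchy, the materials, and the textures they reference in a single archive, so texture references can be read from each submesh's material:

// Read texture references from the first submesh's material
if let submesh = mdlMesh.submeshes?.firstObject as? MDLSubmesh,
   let material = submesh.material {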

    let baseColor = material.property(with: .baseColor)
    if let url = baseColor?.urlValue {
        // load texture from URL
    }

    let normalMap = material.property(with: .tangentSpaceNormal)
    let specular  = material.property(with: .specular)
}

The vertex descriptor for Tutorial 6 adds tangent and bitangent attributes to support normal mapping:

// Position:  float3 @ attribute 0
// Normal:    float3 @ attribute 1
// Tangent:   float3 @ attribute 2
// Bitangent: float3 @ attribute 3
// TexCoords: float2 @ attribute 4
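
As a rough illustration of what this layout implies in memory, the sketch below computes byte offsets and strides. It assumes tightly packed 32-bit floats, and it assumes position, normal, tangent, and bitangent are interleaved in buffer 0 with texture coordinates in buffer 1 (as the buffer index comments later in this tutorial suggest). AttributeSketch is a hypothetical helper type, not part of Model I/O.

```swift
// Hypothetical description of one vertex attribute (not a Model I/O type).
struct AttributeSketch {
    let name: String
    let floatCount: Int    // 3 for float3, 2 for float2
    let bufferIndex: Int   // which vertex buffer the attribute lives in
}

let attributes = [
    AttributeSketch(name: "position",  floatCount: 3, bufferIndex: 0),
    AttributeSketch(name: "normal",    floatCount: 3, bufferIndex: 0),
    AttributeSketch(name: "tangent",   floatCount: 3, bufferIndex: 0),
    AttributeSketch(name: "bitangent", floatCount: 3, bufferIndex: 0),
    AttributeSketch(name: "texCoords", floatCount: 2, bufferIndex: 1),
]

// Walk the attributes in order, keeping a running byte offset per buffer.
// After the loop, the running offset of each buffer is that buffer's stride.
var strides: [Int: Int] = [:]
var offsets: [String: Int] = [:]
for attribute in attributes {
    let offset = strides[attribute.bufferIndex, default: 0]
    offsets[attribute.name] = offset
    strides[attribute.bufferIndex] = offset + attribute.floatCount * MemoryLayout<Float>.size
}
```

Under these assumptions buffer 0 has a 48-byte stride and buffer 1 an 8-byte stride; a descriptor that pads float3 to 16 bytes would give different numbers.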

MTLHeap: packing all textures into one allocation

An MTLHeap is a block of GPU memory that you allocate once and then sub-allocate textures from. Benefits:

  • One backing allocation for the whole texture set instead of a separate allocation per texture
  • Potentially better memory locality → better cache behavior
  • A single useHeap call makes every texture in the heap resident for a pass, which suits argument buffers that reference many textures (without a heap, each resource needs its own useResource call)

// 1. Measure the size each texture would need
var heapSize = 0
for url in textureURLs {
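    // textureDescriptor(url:) is not an MTKTextureLoader method; it is assumed here to be a project helper
    let desc = textureLoader.textureDescriptor(url: url)
    let sizeAlign = device.heapTextureSizeAndAlign(descriptor: desc)
    // Adding the full alignment per texture over-reserves slightly but always leaves room for padding
    heapSize += sizeAlign.size + sizeAlign.align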
}

// 2. Create the heap
let heapDesc = MTLHeapDescriptor()
heapDesc.size = heapSize
heapDesc.storageMode = .private
let heap = device.makeHeap(descriptor: heapDesc)!

// 3. Allocate each texture inside the heap (for example with heap.makeTexture(descriptor:))
// 4. Use a blit encoder to copy from the staging textures into the heap textures
let blit = commandBuffer.makeBlitCommandEncoder()!
for (staging, heapTex) in zip(stagingTextures, heapTextures) {
    blit.copy(from: staging, to: heapTex)
}
blit.endEncoding()
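
The sizing step can be illustrated without a GPU. The sketch below compares the conservative "size plus alignment" estimate used above with a tighter estimate that aligns a running offset. The (size, align) pairs are made-up values standing in for what heapTextureSizeAndAlign would return, and alignUp is a hypothetical helper, so treat this as arithmetic only, not as a statement of how Metal places resources inside a heap.

```swift
// Rounds `value` up to the next multiple of `alignment` (which must be a power of two).
func alignUp(_ value: Int, to alignment: Int) -> Int {
    precondition(alignment > 0 && (alignment & (alignment - 1)) == 0,
                 "alignment must be a power of two")
    return (value + alignment - 1) & ~(alignment - 1)
}

// Made-up requirements standing in for heapTextureSizeAndAlign results.
let requirements: [(size: Int, align: Int)] = [
    (size: 1_000, align: 256),
    (size: 4_096, align: 256),
    (size: 300,   align: 256),
]

// Tight estimate: align the running offset before placing each texture.
var tightSize = 0
for requirement in requirements {
    tightSize = alignUp(tightSize, to: requirement.align) + requirement.size
}

// Conservative estimate: reserve a full alignment's worth of padding per texture.
let paddedSize = requirements.reduce(0) { $0 + $1.size + $1.align }
```

The padded estimate is never smaller than the tight one, which is why it is a safe default when the exact placement inside the heap is not under your control.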

Argument Buffers

An argument buffer is an MTLBuffer that contains resource handles (textures, buffers, samplers). Shaders access them through struct members tagged with [[id(n)]], much like a C struct of pointers.

In this tutorial, MTLArgumentEncoder fills three argument buffers:

Mesh argument buffer: MT6::Indirect::Mesh

The Mesh struct (inside the MT6::Indirect namespace) holds pointers to a single submesh’s vertex data plus a pointer to its material argument buffer. The [[id(...)]] value of each member comes from the MT6VertexBufferIndeces enum:

namespace MT6 {
namespace Indirect {
    struct Mesh {
        constant float           *vertexBuffer    [[id(MT6VertexBuffer)]];
        constant float           *texCoordsBuffer [[id(MT6TextureCoordinatesBuffer)]];
        constant uint            *indexBuffer     [[id(MT6IndecesBuffer)]];
        constant MaterialArgument *materialArgBuffer [[id(MT6MaterialArgBuffer)]];
    };
} // namespace Indirect
} // namespace MT6

The renderer encodes one Mesh entry per submesh into an MTLBuffer at index MT6MeshesBuffer (14).

Material argument buffer: MaterialArgument

Contains only three heap textures and no float constants:

struct MaterialArgument {
    texture2d<float> baseColorTexture [[id(MT6BaseColorTexture)]];
    texture2d<float> specularTexture  [[id(MT6SpecularTexture)]];
    texture2d<float> normalTexture    [[id(MT6NormalTexture)]];
};

Shadow argument buffer: ShadowsArgBuffer

A thin wrapper that lets the ICB kernel bind the shadow depth texture by index:

struct ShadowsArgBuffer {
    depth2d<float> shadowTexture [[id(0)]];
};

Encoding on the Swift side

// Encode the meshes buffer (array of Indirect::Mesh)
let meshEncoder = meshFunctionArguments.makeArgumentEncoder(bufferIndex: Int(MT6MeshesBuffer.rawValue))
for (i, mtkMesh) in scene.mtkMeshes.enumerated() {
    // Point the encoder at entry i; with a fixed offset of 0 every iteration would overwrite entry 0
    meshEncoder.setArgumentBuffer(meshesBuffer, offset: i * meshEncoder.encodedLength)
    meshEncoder.setBuffer(mtkMesh.vertexBuffers[Int(MT6VertexBuffer.rawValue)].buffer,
                          offset: 0,
                          index: Int(MT6VertexBuffer.rawValue))
    meshEncoder.setBuffer(mtkMesh.vertexBuffers[Int(MT6TextureCoordinatesBuffer.rawValue)].buffer,
                          offset: 0,
                          index: Int(MT6TextureCoordinatesBuffer.rawValue))
    // … index buffer, material argument buffer …
}
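
One way to picture the array of Mesh entries is as fixed-size records laid out back to back, each starting at its index times the record size. The sketch below models that with a plain array of 64-bit values. The record size of four slots and the handle values are illustrative assumptions; real entries are opaque, and MTLArgumentEncoder.encodedLength reports their actual size.

```swift
// Assumption for illustration: each entry holds four 64-bit handles
// (vertex buffer, texture coordinates, index buffer, material argument buffer).
let slotsPerEntry = 4
let entryCount = 3
var argumentBufferSketch = [UInt64](repeating: 0, count: slotsPerEntry * entryCount)

// Writes one entry at its own position, the same way a per-entry byte offset of
// index * encodedLength selects one record inside the real buffer.
func encodeEntry(at index: Int, handles: [UInt64], into buffer: inout [UInt64]) {
    precondition(handles.count == slotsPerEntry, "each entry needs exactly four handles")
    let base = index * slotsPerEntry
    for (slot, handle) in handles.enumerated() {
        buffer[base + slot] = handle
    }
}

// Fill every entry with recognisable fake handles: entry i gets i*10, i*10+1, ...
for i in 0..<entryCount {
    let fakeHandles = (0..<slotsPerEntry).map { UInt64(i * 10 + $0) }
    encodeEntry(at: i, handles: fakeHandles, into: &argumentBufferSketch)
}
```

After the loop, each entry occupies its own four slots and no entry has been overwritten by a later one.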

MTLIndirectCommandBuffer

An indirect command buffer (ICB) is an array of command slots that the GPU itself can write. Each entry in the ICB is one draw call.

CPU setup

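let icbDesc = MTLIndirectCommandBufferDescriptor()
icbDesc.commandTypes = [.drawIndexed]
icbDesc.inheritBuffers = false
// The kernel below never encodes a pipeline state, so commands inherit it from the render encoder
icbDesc.inheritPipelineState = true
// Bind counts must exceed the highest buffer index the kernel binds in each stage
icbDesc.maxVertexBufferBindCount = Int(MT6VertexUniformsBuffer.rawValue) + 1      // indices up to 11
icbDesc.maxFragmentBufferBindCount = Int(MT6ShadowsArgumentsBuffer.rawValue) + 1  // indices up to 16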

let indirectCommandBuffer = device.makeIndirectCommandBuffer(
    descriptor: icbDesc,
    maxCommandCount: meshes.count,
    options: [])!

GPU compute kernel fills the ICB

The drawKernel receives the per-mesh and per-frame data it needs, reaches the ICB through the CommandArgBuffer argument buffer, and encodes one render_command per thread:

namespace MT6 {
namespace Indirect {
    struct CommandArgBuffer {
        command_buffer indirectCommandBuffer [[id(0)]];
    };

    kernel void drawKernel(
        uint                                              threadId              [[thread_position_in_grid]],
        device CommandArgBuffer                          *pCommandBuffer        [[buffer(MT6IndirectCommandBuffer)]],
        constant Mesh                                    *pMeshes               [[buffer(MT6MeshesBuffer)]],
        constant MT6VertexUniforms                       *pVertexUniformsArray  [[buffer(MT6VertexUniformsBuffer)]],
        constant MT6FragmentUniforms                     &fragmentUniforms      [[buffer(MT6FragmentUniformsBuffer)]],
        constant ShadowsArgBuffer                        *pShadowArgBuffer      [[buffer(MT6ShadowsArgumentsBuffer)]],
        constant MTLDrawIndexedPrimitivesIndirectArguments *drawArgumentsBuffer [[buffer(MT6DrawArgumentsBuffer)]])
    {
        render_command cmd(pCommandBuffer->indirectCommandBuffer, threadId);

        constant Mesh &mesh = pMeshes[threadId];

        cmd.set_vertex_buffer(mesh.vertexBuffer,     MT6VertexBuffer);
        cmd.set_vertex_buffer(mesh.texCoordsBuffer,  MT6TextureCoordinatesBuffer);
        cmd.set_vertex_buffer(mesh.materialArgBuffer, MT6MaterialArgBuffer);
        cmd.set_vertex_buffer(&pVertexUniformsArray[threadId], MT6VertexUniformsBuffer);

        cmd.set_fragment_buffer(&fragmentUniforms,            MT6FragmentUniformsBuffer);
        cmd.set_fragment_buffer(pShadowArgBuffer,             MT6ShadowsArgumentsBuffer);
        cmd.set_fragment_buffer(mesh.materialArgBuffer,       MT6MaterialArgBuffer);

        // A reference into buffer memory needs an explicit address space qualifier
        constant MTLDrawIndexedPrimitivesIndirectArguments &drawArgs = drawArgumentsBuffer[threadId];
        cmd.draw_indexed_primitives(
            primitive_type::triangle,
            drawArgs.indexCount,
            mesh.indexBuffer,
            drawArgs.instanceCount,
            drawArgs.baseVertex,
            drawArgs.baseInstance);
    }
} // namespace Indirect
} // namespace MT6

CPU dispatches compute, then executes ICB

// Step 1: compute pass (the GPU fills the ICB)
let computeEncoder = commandBuffer.makeComputeCommandEncoder()!
computeEncoder.setComputePipelineState(_drawKernelPSO)
computeEncoder.setBuffer(_commandArgBuffer,    offset: 0, index: Int(MT6IndirectCommandBuffer.rawValue))
computeEncoder.setBuffer(_meshesBuffer,        offset: 0, index: Int(MT6MeshesBuffer.rawValue))
computeEncoder.setBuffer(_vertexUniformsBuffer,offset: 0, index: Int(MT6VertexUniformsBuffer.rawValue))
computeEncoder.setBuffer(_fragmentUniformsBuffer, offset: 0, index: Int(MT6FragmentUniformsBuffer.rawValue))
computeEncoder.setBuffer(_shadowsArgBuffer,    offset: 0, index: Int(MT6ShadowsArgumentsBuffer.rawValue))
computeEncoder.setBuffer(_drawArgumentsBuffer, offset: 0, index: Int(MT6DrawArgumentsBuffer.rawValue))
computeEncoder.useHeap(scene.staticGPUHeap)
computeEncoder.dispatchThreads(
    MTLSize(width: submeshCount, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: 1, height: 1, depth: 1))
computeEncoder.endEncoding()

// Step 2: render pass executes the ICB
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
renderEncoder.useHeap(scene.staticGPUHeap)
renderEncoder.executeCommandsInBuffer(_indirectCommandBuffer, range: 0..<submeshCount)
renderEncoder.endEncoding()

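The useHeap call is critical: Metal cannot see resources that are reached only through an argument buffer, so useHeap is what makes every texture in the heap resident for the pass. For the same reason, indirectly referenced resources that live outside the heap (such as the ICB written by the compute pass) need a useResource call on the encoder that touches them.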

The full CPU→GPU flow for one frame:

sequenceDiagram
    participant CPU
    participant ComputePass as Compute Pass (GPU)
    participant ICB as MTLIndirectCommandBuffer
    participant RenderPass as Render Pass (GPU)

    CPU->>ComputePass: encode drawKernel dispatch
    CPU->>ComputePass: setBuffer(meshesBuffer, MT6MeshesBuffer)
    CPU->>ComputePass: setBuffer(shadowsArgBuffer, MT6ShadowsArgumentsBuffer)
    CPU->>ComputePass: useHeap(staticGPUHeap)
    CPU->>ComputePass: endEncoding
    ComputePass->>ICB: render_command cmd(commandArgBuffer.icb, threadId)
    ComputePass->>ICB: cmd.set_vertex_buffer(mesh.vertexBuffer, ...)
    ComputePass->>ICB: cmd.draw_indexed_primitives(...)
    CPU->>RenderPass: encode executeCommandsInBuffer(ICB)
    CPU->>RenderPass: useHeap(staticGPUHeap)
    CPU->>RenderPass: endEncoding, then commit the command buffer
    RenderPass->>RenderPass: execute all N draw calls from ICB
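
The same flow can be mimicked on the CPU to make the roles concrete. The sketch below is an analogy only: DrawCommandSketch, drawKernelSketch, and executeCommandsSketch are hypothetical names, the index counts are made up, and the loop runs sequentially where the real kernel runs one GPU thread per command.

```swift
// Hypothetical stand-in for one encoded draw command.
struct DrawCommandSketch {
    let meshIndex: Int
    let indexCount: Int
}

// "Compute pass": each thread ID fills exactly one slot, mirroring
// render_command cmd(icb, threadId) in the kernel.
func drawKernelSketch(threadId: Int, indexCounts: [Int], icb: inout [DrawCommandSketch?]) {
    icb[threadId] = DrawCommandSketch(meshIndex: threadId, indexCount: indexCounts[threadId])
}

// "Render pass": replay the commands in a range and return the total index count drawn.
func executeCommandsSketch(in icb: [DrawCommandSketch?], range: Range<Int>) -> Int {
    var total = 0
    for case let command? in icb[range] {
        total += command.indexCount
    }
    return total
}

let indexCounts = [36, 1_200, 84]   // made-up per-submesh index counts
var icb = [DrawCommandSketch?](repeating: nil, count: indexCounts.count)

// Stand-in for dispatchThreads: one "thread" per command slot.
for threadId in icb.indices {
    drawKernelSketch(threadId: threadId, indexCounts: indexCounts, icb: &icb)
}

// Stand-in for executeCommandsInBuffer(_:range:).
let totalIndicesDrawn = executeCommandsSketch(in: icb, range: 0..<icb.count)
```

The point of the analogy is the ownership of work: the code that fills the slots never draws, and the code that draws never decides what goes in the slots.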

Buffer index layout

The buffer index constants are defined in MT6Input.h and split into two enums: one for per-mesh vertex data (low indices, packed tightly) and one for pass-level data (starting at 10):

// Per-vertex attribute buffers: used directly in vertex shaders and ICB commands
typedef enum MT6VertexBufferIndeces {
    MT6VertexBuffer             = 0,   // position + normal + tangent + bitangent (interleaved)
    MT6TextureCoordinatesBuffer = 1,   // UV coords (separate buffer)
    MT6IndecesBuffer            = 2,   // index buffer
    MT6MaterialArgBuffer        = 3,   // per-submesh material argument buffer
} MT6VertexBufferIndeces;

// Pass-level buffers: uniforms and GPU-driven drawing structures
typedef enum MT6BufferIndices {
    MT6BufferIndexMeshPositions = 10,  // full-screen quad vertex buffer
    MT6VertexUniformsBuffer     = 11,  // array of per-mesh MT6VertexUniforms
    MT6FragmentUniformsBuffer   = 12,  // MT6FragmentUniforms (light position)
    MT6IndirectCommandBuffer    = 13,  // CommandArgBuffer wrapping the ICB
    MT6MeshesBuffer             = 14,  // array of Mesh argument structs
    MT6DrawArgumentsBuffer      = 15,  // MTLDrawIndexedPrimitivesIndirectArguments[]
    MT6ShadowsArgumentsBuffer   = 16,  // ShadowsArgBuffer (depth2d shadow texture)
} MT6BufferIndices;
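
These indices also constrain the ICB descriptor. Assuming the bind counts on MTLIndirectCommandBufferDescriptor must be greater than the highest buffer index a command binds in that stage, the sketch below derives the minimum values from the indices that drawKernel uses. The index lists are copied by hand from the enums above, and requiredBindCount is a hypothetical helper.

```swift
// Indices bound by drawKernel per stage, copied by hand from the enums above.
let vertexIndicesUsedByICB = [
    0,   // MT6VertexBuffer
    1,   // MT6TextureCoordinatesBuffer
    3,   // MT6MaterialArgBuffer
    11,  // MT6VertexUniformsBuffer
]
let fragmentIndicesUsedByICB = [
    12,  // MT6FragmentUniformsBuffer
    16,  // MT6ShadowsArgumentsBuffer
    3,   // MT6MaterialArgBuffer
]

// Smallest bind count that still covers the highest index in the list.
func requiredBindCount(for indices: [Int]) -> Int {
    (indices.max() ?? -1) + 1
}

let minVertexBindCount = requiredBindCount(for: vertexIndicesUsedByICB)
let minFragmentBindCount = requiredBindCount(for: fragmentIndicesUsedByICB)

// Metal exposes 31 buffer slots per stage, so both values must stay within that limit.
let fitsInBufferTable = minVertexBindCount <= 31 && minFragmentBindCount <= 31
```

If the pass-level enum were renumbered, recomputing these two values is a quick way to check the descriptor still matches the kernel.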

Render pass architecture

Tutorial 6 keeps the same 3-pass structure as Tutorial 4 (Shadow → GBuffer → Lighting), but:

  • Shadow pass and GBuffer pass are now driven by the ICB (GPU encodes the draw calls)
  • Each of those passes replays the GPU-encoded commands with executeCommandsInBuffer instead of issuing draw calls from the CPU

Performance Tips for Apple Silicon

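GPU-driven rendering moves draw-call generation off the CPU: the compute kernel builds the draw commands in parallel, one per thread, and the CPU only encodes a dispatch and an executeCommandsInBuffer call per pass. That leaves the CPU free for other work, such as physics or AI calculations. Whether this yields a net speedup depends on the scene, so profile before and after.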

Key concepts recap

  • MTLIndirectCommandBuffer: A buffer that holds GPU-writable command objects.
  • Compute Kernel: Used to generate draw commands in parallel.
  • Argument Buffers: Structs of resources passed to shaders via buffers.
  • Heap: Contiguous block of memory for efficient texture management.

What GPU-driven rendering enables

  1. Efficient use of CPU and GPU resources by offloading work from the CPU to the GPU.
  2. Draw commands generated in parallel on the GPU, one thread per submesh, which suits the wide parallelism of Apple Silicon GPUs.
  3. Simplified scene management through separation of concerns between the renderer and scene data.

🎉 Congrats, you’ve completed all Metal Tutorials!

All source code is on GitHub.