Sample Object | 🌵 Federico Forti (fe0437)

In this tutorial we graduate from a hardcoded triangle to a real 3D mesh with physically-based lighting. The goal is a shaded, rotating object that occludes correctly:

Load a mesh from disk with Model I/O into GPU-backed MTLBuffers
Add a depth stencil state so closer surfaces hide farther ones
Upload camera and light data each frame via uniform buffers
Shade each fragment with a GGX physically-based BRDF

Prerequisite: Make sure you have completed Tutorial 1 — Hello Triangle.

New concepts in this tutorial:

Loading a mesh with Model I/O (MDLAsset → MTKMesh)
Depth stencil state (MTLDepthStencilState) to handle occlusion correctly
Uniform buffers (MTLBuffer) for MVP matrices and light position
GGX BRDF physically-based lighting in the fragment shader
Moving the renderer into its own class (MT2ObjRenderer)

🔗 Metal API in this tutorial

Object	Docs	Scope	Role
`MDLAsset`	↗	Scene loading	Loads 3D files (OBJ, USD, ABC, …) into a scene graph
`MDLVertexDescriptor`	↗	Pipeline setup	Describes per-vertex attribute layout for Model I/O
`MTKMeshBufferAllocator`	↗	Scene loading	Makes Model I/O allocate directly into `MTLBuffer` objects
`MTKMesh`	↗	Scene lifetime	GPU-ready mesh: vertex + index `MTLBuffer`s
`MTLDepthStencilDescriptor`	↗	Pipeline setup	Mutable config for depth/stencil test
`MTLDepthStencilState`	↗	Renderer lifetime	Immutable compiled depth/stencil test
`MTLVertexDescriptor`	↗	Pipeline setup	Tells the pipeline how to read vertex buffers

⚙️ Setting up the project

In MTMetalTutorialsApp.swift, make MT2ContentView() the root view of the WindowGroup and run:

@main
struct MetalTutorialsApp: App {
    var body: some Scene {
        WindowGroup {
            // substitute here to choose the tutorial
            MT2ContentView()
        }
    }
}

📁 Files

File	Purpose
`MT2ContentView.swift`	SwiftUI entry point
`MT2SampleObjectMetalView.swift`	`UIViewRepresentable`, creates the renderer
`MT2ObjRenderer.swift`	Standalone `MTKViewDelegate` class
`MT2SampleObjectShaders.metal`	Vertex shader + GGX fragment shader
`MT2Uniforms.h`	Shared uniform structs

The key structural difference from Tutorial 1 is that the renderer is now a standalone class (MT2ObjRenderer) rather than an inline coordinator:

classDiagram
    class MT2ContentView {
        +body: some View
    }
    class MT2SampleObjectMetalView {
        +makeUIView() MTKView
        +makeCoordinator() MT2ObjRenderer
    }
    class MT2ObjRenderer {
        -_device: MTLDevice
        -_commandQueue: MTLCommandQueue
        -_pipelineState: MTLRenderPipelineState
        -_depthStencilState: MTLDepthStencilState
        -_meshes: [MTKMesh]
        +draw(in: MTKView)
        +mtkView(_:drawableSizeWillChange:)
    }
    class MTKViewDelegate {
        <<protocol>>
        +draw(in:)
        +mtkView(_:drawableSizeWillChange:)
    }

    MT2ContentView --> MT2SampleObjectMetalView : contains
    MT2SampleObjectMetalView --> MT2ObjRenderer : creates
    MT2ObjRenderer ..|> MTKViewDelegate : implements

📦 Loading the mesh with Model I/O

Before we can draw anything real, we need geometry on the GPU. We use Apple’s Model I/O framework to load the .obj file and land its vertex and index data directly in MTLBuffer-backed memory — no extra copy at draw time.

MDLAsset: cross-format 3D file loader. Model I/O (ModelIO.framework) is Apple’s framework for importing and describing 3D assets, separate from Metal but closely integrated with MetalKit. MDLAsset loads OBJ, USD/USDZ, Alembic, and STL files into a device-independent in-memory scene graph of MDLObject, MDLMesh, and MDLSubmesh nodes, including the material data the file carries. The key insight: by passing an MTKMeshBufferAllocator, you tell Model I/O to allocate vertex and index data directly in MTLBuffers, so the mesh needs no extra copy before it can be drawn.

MTKMesh: GPU-ready mesh wrapper. MTKMesh.newMeshes(asset:device:) converts the MDLMesh objects in an asset into MTKMesh objects whose vertexBuffers and submeshes reference actual MTLBuffers. An MTKSubmesh carries the indexBuffer, indexCount, indexType, and primitiveType that you pass directly to drawIndexedPrimitives, so no manual buffer management is needed.

In _loadObj we first describe the vertex layout we want with an MDLVertexDescriptor (position, normal, and UV interleaved in one buffer), then hand that descriptor and a buffer allocator to MDLAsset:

// MT2ObjRenderer._loadObj
let modelURL = Bundle.main.url(forResource: objName, withExtension: "obj")!

let vertexDescriptor = MDLVertexDescriptor()
// attribute 0: position (float3)
vertexDescriptor.attributes[0] = MDLVertexAttribute(
    name: MDLVertexAttributePosition, format: .float3, offset: 0, bufferIndex: 0)
// attribute 1: normal (float3, right after position)
vertexDescriptor.attributes[1] = MDLVertexAttribute(
    name: MDLVertexAttributeNormal, format: .float3,
    offset: MemoryLayout<Float>.size * 3, bufferIndex: 0)
// attribute 2: UV (float2)
vertexDescriptor.attributes[2] = MDLVertexAttribute(
    name: MDLVertexAttributeTextureCoordinate, format: .float2,
    offset: MemoryLayout<Float>.size * 6, bufferIndex: 0)
vertexDescriptor.layouts[0] = MDLVertexBufferLayout(stride: MemoryLayout<Float>.size * 8)

let bufferAllocator = MTKMeshBufferAllocator(device: device)
let meshAsset = MDLAsset(url: modelURL,
                         vertexDescriptor: vertexDescriptor,
                         bufferAllocator: bufferAllocator)
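
The offsets and stride in the descriptor above follow from packing the three attributes back to back in a single interleaved buffer. A minimal sketch of that arithmetic, using a hypothetical `packedLayout` helper (an illustration, not part of Model I/O or the tutorial's code):

```swift
// Hypothetical helper (not a Model I/O API): computes byte offsets and the
// stride for float attributes packed back to back in one interleaved buffer.
func packedLayout(componentCounts: [Int]) -> (offsets: [Int], stride: Int) {
    let floatSize = MemoryLayout<Float>.size   // 4 bytes
    var offsets: [Int] = []
    var cursor = 0
    for count in componentCounts {
        offsets.append(cursor)        // attribute starts where the previous one ended
        cursor += count * floatSize
    }
    return (offsets, cursor)          // cursor is now the size of one whole vertex
}

// position (float3), normal (float3), UV (float2)
let layout = packedLayout(componentCounts: [3, 3, 2])
print(layout.offsets, layout.stride)   // prints [0, 12, 24] 32
```

These are the same values as `MemoryLayout<Float>.size * 3`, `* 6`, and `* 8` in the descriptor.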

We then extract MTKMesh objects from the asset and convert the vertex descriptor into a Metal-native one for the pipeline:

(_, _meshes) = try MTKMesh.newMeshes(asset: meshAsset, device: _device)
// convert Model I/O descriptor → Metal vertex descriptor (used in the pipeline)
_vertexDescriptor = MTKMetalVertexDescriptorFromModelIO(meshAsset.vertexDescriptor!)

The full loading pipeline:

flowchart LR
    A[".obj file\n(URL)"] --> B["MDLVertexDescriptor\n(position, normal, UV)"]
    B --> C["MDLAsset\n(meshAsset)"]
    C --> D["MTKMesh.newMeshes\n→ [MTKMesh]"]
    D --> E["MTKMetalVertexDescriptorFromModelIO\n→ MTLVertexDescriptor"]
    E --> F["MTLRenderPipelineDescriptor\n→ MTLRenderPipelineState"]

🎯 Depth stencil state

The problem: draw order without depth testing

Without a depth buffer the GPU uses a simple rule: last draw wins. Whatever triangle is submitted last overwrites the pixel, regardless of how far it is from the camera. For a rotating 3D mesh this is catastrophic — back faces paint over front faces just because they happened to be submitted last.

The fix is a depth buffer (also called a Z-buffer): a texture the same resolution as the framebuffer where each pixel stores the depth value (z) of the closest fragment rendered so far. Before writing a new fragment’s color the GPU checks: is this fragment closer than what’s already there? If yes, draw it and update the depth buffer. If no, discard it.
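
To make the rule concrete, here is a small CPU-side sketch of the per-pixel test. The `Fragment` type and `render` function are hypothetical stand-ins for what the GPU does in hardware, not Metal API:

```swift
// CPU sketch of a per-pixel depth test (hypothetical types, not Metal API).
struct Fragment {
    let pixel: Int      // index into the framebuffer
    let depth: Float    // 0 = near plane, 1 = far plane
    let color: String
}

func render(_ fragments: [Fragment], pixelCount: Int, depthTest: Bool) -> [String] {
    var color = [String](repeating: "clear", count: pixelCount)
    var depth = [Float](repeating: 1.0, count: pixelCount)   // cleared to the far plane
    for f in fragments {
        // Equivalent of depthCompareFunction = .less
        if depthTest && !(f.depth < depth[f.pixel]) { continue }
        color[f.pixel] = f.color
        depth[f.pixel] = f.depth    // equivalent of isDepthWriteEnabled = true
    }
    return color
}

// The near (front) fragment is submitted first, the far (back) one last.
let fragments = [
    Fragment(pixel: 0, depth: 0.2, color: "front"),
    Fragment(pixel: 0, depth: 0.8, color: "back"),
]
let withoutDepth = render(fragments, pixelCount: 1, depthTest: false)
let withDepth = render(fragments, pixelCount: 1, depthTest: true)
print(withoutDepth, withDepth)   // prints ["back"] ["front"]
```

Without the test the last submitted fragment wins; with it, submission order no longer matters.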

MTLDepthStencilState: immutable occlusion test. Like MTLRenderPipelineState, this is a compiled, immutable object. Describe your requirements in an MTLDepthStencilDescriptor, compile once with device.makeDepthStencilState(descriptor:), and bind it on the render command encoder with setDepthStencilState(_:) before issuing draw calls. The binding stays in effect for that encoder until you set a different state.

depthCompareFunction controls the per-fragment test. .less keeps the fragment with the smaller z (closer to the camera), which is the right choice when the depth buffer is cleared to 1.0, as MTKView does by default. Other options cover special cases: .greater for reversed-depth buffers (better precision in large scenes), .lessEqual for geometry placed at the far plane such as a skybox drawn last, .always to let every fragment pass regardless of depth, and .never to reject all fragments.

isDepthWriteEnabled controls whether a passing fragment updates the depth buffer. Use true for opaque geometry. Use false for transparent objects: they still read the depth buffer (so they’re correctly hidden behind opaque occluders), but don’t write to it, so several translucent layers can blend over each other when drawn back to front.

In code, the descriptor asks for the nearest-fragment test with depth writes enabled:

let depthStencilDescriptor = MTLDepthStencilDescriptor()
depthStencilDescriptor.depthCompareFunction = .less   // keep nearer fragment
depthStencilDescriptor.isDepthWriteEnabled  = true    // update depth buffer after passing
_depthStencilState = device.makeDepthStencilState(descriptor: depthStencilDescriptor)!

The MTKView also needs matching pixel formats:

metalView.depthStencilPixelFormat = .depth32Float
// in the pipeline descriptor:
pipelineDescriptor.depthAttachmentPixelFormat = .depth32Float

Why .depth32Float? Each pixel stores one 32-bit float. Its 24-bit significand gives about 16.7 million distinct values within every power-of-two interval of the [0, 1] depth range, far more precision than a 16-bit format can offer. A narrower format (.depth16Unorm) uses half the memory but has only 65,536 depth levels, which can cause z-fighting: a flickering artifact where two nearly coplanar surfaces alternate winning the depth test from frame to frame because their z values round to the same 16-bit value.
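
A quick way to see the precision gap is to quantize two nearby depths on the CPU. The `quantize16` function below is an illustrative assumption about how a 16-bit unorm value is stored (round to the nearest of 65,536 levels), not a Metal call:

```swift
// Quantize a normalized depth to 16 bits, the way a unorm format stores it.
func quantize16(_ z: Float) -> UInt16 {
    let clamped = min(max(z, 0), 1)
    return UInt16((clamped * 65535).rounded())
}

// Two surfaces that are very close in depth.
let zA: Float = 0.50001
let zB: Float = 0.500012

let collide16 = quantize16(zA) == quantize16(zB)   // both land in the same 16-bit level
let distinct32 = zA != zB                          // still distinguishable as 32-bit floats
print(collide16, distinct32)   // prints true true
```

With a 16-bit buffer these two surfaces would tie in the depth test, which is exactly the situation that produces z-fighting.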

Uniforms — MVP matrices and light

With geometry loaded and depth testing in place, we need to tell the GPU where the camera is, how to transform each vertex, and where the light is. All of that goes into uniform buffers — small structs uploaded to the GPU once per frame.

What is a surface normal?

A normal is a unit vector (length = 1) pointing perpendicular to a surface at a given point. On a flat floor the normal points straight up: (0, 1, 0). On a sphere each point has a different normal pointing away from the center. Normals are what makes lighting work: the angle between the normal and the light direction determines how bright a surface is (dot(N, L) — the classic Lambert term). Without normals, every part of an object would be equally lit and look flat.
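
The Lambert term is easy to try on the CPU. The sketch below uses the standard library's `SIMD3<Float>` with small local `dot` and `normalize` helpers (defined here for illustration; the shader uses Metal's built-ins):

```swift
// Lambert term: brightness is the clamped cosine between normal and light direction.
func dot(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> Float {
    (a * b).sum()
}

func normalize(_ v: SIMD3<Float>) -> SIMD3<Float> {
    v / dot(v, v).squareRoot()
}

func lambert(normal: SIMD3<Float>, toLight: SIMD3<Float>) -> Float {
    max(0, dot(normalize(normal), normalize(toLight)))
}

let floorNormal = SIMD3<Float>(0, 1, 0)
let overhead = lambert(normal: floorNormal, toLight: SIMD3<Float>(0, 1, 0))   // light straight above
let grazing  = lambert(normal: floorNormal, toLight: SIMD3<Float>(1, 0, 0))   // light along the floor
let below    = lambert(normal: floorNormal, toLight: SIMD3<Float>(0, -1, 0))  // light under the floor
print(overhead, grazing, below)
```

The floor is fully lit from above, unlit at a grazing angle, and clamped to zero when the light is behind the surface.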

What are MVP matrices?

A 4×4 matrix is a compact way to represent any combination of translation (moving), rotation, scaling, and projection — applied to a point with a single multiplication. Chaining three matrices gives you the full transformation from a local model to a screen pixel:

Vertex position (model space)
    × Model matrix        → world space   (places the object in the scene)
    × View matrix         → camera space  (re-centers the world on the camera)
    × Projection matrix   → clip space    (applies perspective + prepares for NDC)

Model matrix — positions, rotates, and scales the object in world space. In this tutorial it’s a rotation around Y so the bunny spins.

View matrix (also called “look-at” matrix) — simulates a camera. It transforms the entire world so the camera sits at the origin looking down -Z. After this, anything in front of the camera has a negative Z coordinate.

Projection matrix — applies perspective foreshortening (closer objects appear larger). It outputs a clip-space position whose w component grows with depth. When the GPU divides (x, y, z) by w, farther objects shrink toward the center — giving you the 3D depth effect.
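
The divide by w can be sketched without any matrix at all. The `project` function below is a simplified pinhole model under stated assumptions (focal length 1, camera looking down -Z, w set to the distance in front of the camera), not the tutorial's projection matrix:

```swift
// Simplified pinhole projection (assumptions: focal length 1, camera looks down -Z).
// Clip-space w is the distance in front of the camera, as a perspective matrix
// would produce; dividing by w yields the projected coordinates.
func project(x: Float, y: Float, z: Float) -> (x: Float, y: Float) {
    let w = -z                 // positive for points in front of the camera
    return (x / w, y / w)      // perspective divide
}

// The same point offset (1, 1), at two different distances.
let near = project(x: 1, y: 1, z: -2)
let far  = project(x: 1, y: 1, z: -10)
print(near, far)
```

The farther point lands closer to the center of the screen, which is the foreshortening described above.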

Why combine them? The three matrices are multiplied once per frame on the CPU into a single modelViewProjectionMatrix. The vertex shader then does just one matrix multiply per vertex: MVP * position. This is much cheaper than doing three separate multiplies on the GPU for every vertex.
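
A small sketch shows that pre-multiplying the matrices gives the same result as applying them one at a time. It uses plain row-major arrays and hypothetical `mul`, `translation`, and `scale` helpers rather than simd, and leaves out the projection matrix for brevity:

```swift
// Minimal row-major 4x4 helpers (illustrative; the tutorial itself uses simd).
typealias Mat4 = [[Float]]
typealias Vec4 = [Float]

func mul(_ a: Mat4, _ b: Mat4) -> Mat4 {
    var r = Mat4(repeating: [Float](repeating: 0, count: 4), count: 4)
    for i in 0..<4 { for j in 0..<4 { for k in 0..<4 {
        r[i][j] += a[i][k] * b[k][j]
    } } }
    return r
}

func mul(_ m: Mat4, _ v: Vec4) -> Vec4 {
    var r = Vec4(repeating: 0, count: 4)
    for i in 0..<4 { for k in 0..<4 { r[i] += m[i][k] * v[k] } }
    return r
}

func translation(_ x: Float, _ y: Float, _ z: Float) -> Mat4 {
    [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]
}

func scale(_ s: Float) -> Mat4 {
    [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]
}

let model = scale(2)                 // model -> world
let view  = translation(0, 0, -5)    // world -> camera (camera sits 5 units back)
let point: Vec4 = [1, 1, 0, 1]

let stepByStep = mul(view, mul(model, point))   // two multiplies per vertex
let modelView  = mul(view, model)               // combined once on the CPU
let combined   = mul(modelView, point)          // one multiply per vertex
print(stepByStep, combined)
```

Both paths give the same view-space point, so the per-vertex work can be reduced to a single multiply.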

Why the inverse-transpose for normals? A normal isn’t a position — it’s a direction. If you scale an object non-uniformly (e.g., stretch it 2× in X), applying the regular model matrix would squash the normals so they no longer point perpendicular to the stretched surface. The inverse-transpose of the upper-left 3×3 of the modelView matrix is the mathematically correct transform for directions under non-uniform scale. For uniform scaling (same factor on all axes) the regular matrix works fine, but the inverse-transpose is always correct.
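
A worked example with a 2× stretch along X makes the difference visible. For a diagonal scale matrix the inverse-transpose is just the reciprocal of each scale factor, so the sketch below represents both matrices by their diagonals (a simplification chosen for illustration):

```swift
// Non-uniform scale S = diag(2, 1, 1). For a diagonal matrix the
// inverse-transpose is simply diag(1/2, 1, 1).
func scaled(_ v: SIMD3<Float>, by s: SIMD3<Float>) -> SIMD3<Float> { v * s }
func dot(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> Float { (a * b).sum() }

let s = SIMD3<Float>(2, 1, 1)
let sInverseTranspose = SIMD3<Float>(1, 1, 1) / s

let tangent = SIMD3<Float>(1, -1, 0)   // lies in the surface
let normal  = SIMD3<Float>(1, 1, 0)    // perpendicular to the tangent (dot == 0)

let newTangent  = scaled(tangent, by: s)                  // (2, -1, 0)
let naiveNormal = scaled(normal, by: s)                   // (2, 1, 0)
let fixedNormal = scaled(normal, by: sInverseTranspose)   // (0.5, 1, 0)

// A valid normal must stay perpendicular to the transformed surface.
print(dot(newTangent, naiveNormal), dot(newTangent, fixedNormal))   // prints 3.0 0.0
```

Only the inverse-transpose keeps the dot product with the stretched tangent at zero.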

The shared header defines two structs (used on both the Swift and Metal side):

// MT2Uniforms.h
struct MT2VertexUniforms {
    matrix_float4x4 modelViewMatrix;
    matrix_float3x3 modelViewInverseTransposeMatrix;
    matrix_float4x4 modelViewProjectionMatrix;
};

struct MT2FragmentUniforms {
    simd_float4 viewLightPosition;
};

modelViewInverseTransposeMatrix holds the normal transform discussed above. It is a 3×3 matrix because normals are directions and ignore translation.

The matrices are recomputed every frame in _buildUniforms:

// model → world: rotate around bounding box center
let modelMatrix = float4x4(rotationAbout: SIMD3<Float>(0, 1, 0), by: Float(_currentAngle))
                * float4x4(translationBy: -center)

// world → camera (look-at)
let viewMatrix = float4x4(origin: origin, target: target, up: SIMD3<Float>(0, 1, 0))

// perspective projection
let projectionMatrix = float4x4(perspectiveProjectionFov: _camera.fov,
                                aspectRatio: aspectRatio,
                                nearZ: _camera.nearZ, farZ: _camera.farZ)

let modelView           = viewMatrix * modelMatrix
let modelViewProjection = projectionMatrix * modelView

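They are passed to the shaders at buffer index 1 of both the vertex and fragment stages (vertex buffer index 0 holds the mesh data). setVertexBytes and setFragmentBytes copy the struct into memory managed by Metal, so data this small needs no explicit MTLBuffer: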

commandEncoder.setVertexBytes(&uniforms.0,
    length: MemoryLayout<MT2VertexUniforms>.size, index: 1)
commandEncoder.setFragmentBytes(&uniforms.1,
    length: MemoryLayout<MT2FragmentUniforms>.size, index: 1)

🖌️ Drawing the mesh

What is an index buffer?

In Tutorial 1 we used drawPrimitives with 3 vertices for 1 triangle. A real mesh has thousands of triangles, and many triangles share vertices. Drawing a simple quad (two triangles) without indices would duplicate vertex data:

Without index buffer — 6 vertices, 2 duplicated:
  V0(TL)  V1(TR)  V2(BL)   V2(BL)  V1(TR)  V3(BR)
  ────────────────────────────────────────────────
  Triangle 1        Triangle 2

With index buffer — 4 unique vertices + 6 indices:
  Vertices: V0(TL)  V1(TR)  V2(BL)  V3(BR)
  Indices:  [0, 1, 2,   2, 1, 3]
             ───────    ─────────
             Triangle1  Triangle2

For the Stanford Bunny (69,000+ triangles) the savings are enormous: instead of 207,000 vertex entries you store ~35,000 unique vertices and ~207,000 integer indices. Indices are small (2–4 bytes each); full vertices can be 32+ bytes. drawIndexedPrimitives reads the index buffer first, then looks up each indexed vertex.
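
The quad example can be reproduced with a few lines of Swift. `buildIndexed` is a hypothetical helper written for this illustration (Model I/O already delivers indexed data for the bunny); it uses `UInt16` indices, so it assumes fewer than 65,536 unique vertices:

```swift
// Hypothetical helper: turns a "triangle soup" (3 vertices per triangle, with
// repeats) into unique vertices plus an index list.
struct Vertex: Hashable {
    let x: Float
    let y: Float
}

func buildIndexed(_ soup: [Vertex]) -> (vertices: [Vertex], indices: [UInt16]) {
    var vertices: [Vertex] = []
    var indices: [UInt16] = []
    var lookup: [Vertex: UInt16] = [:]
    for v in soup {
        if let existing = lookup[v] {
            indices.append(existing)          // reuse the shared vertex
        } else {
            let index = UInt16(vertices.count)
            lookup[v] = index
            vertices.append(v)
            indices.append(index)
        }
    }
    return (vertices, indices)
}

let tl = Vertex(x: 0, y: 1), tr = Vertex(x: 1, y: 1)
let bl = Vertex(x: 0, y: 0), br = Vertex(x: 1, y: 0)
let quad = buildIndexed([tl, tr, bl,  bl, tr, br])
print(quad.vertices.count, quad.indices)   // prints 4 [0, 1, 2, 2, 1, 3]
```

Six input vertices collapse to four unique ones, and the index list matches the diagram above.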

Instead of a hardcoded vertex array we iterate over the loaded meshes and their submeshes:

for mesh in _meshes {
    let vertexBuffer = mesh.vertexBuffers.first!
    commandEncoder.setVertexBuffer(vertexBuffer.buffer,
                                   offset: vertexBuffer.offset,
                                   index: 0)
    for submesh in mesh.submeshes {
        let indexBuffer = submesh.indexBuffer
        commandEncoder.drawIndexedPrimitives(
            type:              submesh.primitiveType,
            indexCount:        submesh.indexCount,
            indexType:         submesh.indexType,
            indexBuffer:       indexBuffer.buffer,
            indexBufferOffset: indexBuffer.offset)
    }
}

Indexed drawing reuses shared vertices instead of duplicating them for every triangle — essential for real meshes with thousands of polygons.

🎸 Metal shaders — `MT2SampleObjectShaders.metal`

With the Swift side complete, let’s look at the shaders. The vertex shader transforms each vertex into clip space and passes the normal and position through to the fragment shader, which runs a full GGX physically-based BRDF.

Vertex shader

The vertex and fragment data structs use Metal’s [[stage_in]] attribute, which means Metal loads and unpacks vertex buffer attributes automatically according to the vertex descriptor. The [[attribute(N)]] tag links each struct field to the Nth attribute defined in the MTLVertexDescriptor (the same attribute indices set on MDLVertexDescriptor during mesh loading):

namespace MT2 {
    struct VertexIn {
        float3 position  [[attribute(0)]];   // maps to MDLVertexAttributePosition  at offset 0
        float3 normal    [[attribute(1)]];   // maps to MDLVertexAttributeNormal    at offset 12
        float2 texCoords [[attribute(2)]];   // maps to MDLVertexAttributeTextureCoordinate at offset 24
    };

    struct VertexOut {
        float4 clipSpacePosition [[position]];
        float3 viewNormal;
        float4 viewPosition;
        float2 texCoords;
    };

    vertex VertexOut vertex_main(VertexIn vertexIn [[stage_in]],
                                     constant MT2VertexUniforms &uniforms [[buffer(1)]])
    {
        VertexOut vertexOut;
        vertexOut.clipSpacePosition = uniforms.modelViewProjectionMatrix * float4(vertexIn.position, 1);
        vertexOut.viewNormal = uniforms.modelViewInverseTransposeMatrix * vertexIn.normal;
        vertexOut.viewPosition = uniforms.modelViewMatrix * float4(vertexIn.position, 1);
        // forward the UVs so the field is not left uninitialized
        vertexOut.texCoords = vertexIn.texCoords;
        return vertexOut;
    }

Fragment shader — GGX BRDF

The fragment shader computes physically-based reflectance using real-world microfacet lighting formulas.

GGX / Trowbridge-Reitz normal distribution — models how surface microfacets are oriented given a roughness value:

// GGX / Trowbridge-Reitz
// [Walter et al. 2007, "Microfacet models for refraction through rough surfaces"]
float D_GGX( float a2, float NoH )
{
    if(NoH<=0)
    {
        return 0;
    }
    float d = ( NoH * a2 - NoH ) * NoH + 1;    // 2 mad
    return a2 / ( M_PI_F*d*d );                // 4 mul, 1 rcp
}
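
For experimenting outside the shader, the same formula can be ported to Swift. This is a CPU sketch for exploration, with the roughness-to-α mapping taken from fragment_main (α = roughness², a2 = α²):

```swift
// CPU port of D_GGX for experimentation (same formula as the shader above).
func dGGX(a2: Float, nDotH: Float) -> Float {
    if nDotH <= 0 { return 0 }
    let d = (nDotH * a2 - nDotH) * nDotH + 1
    return a2 / (Float.pi * d * d)
}

let roughness: Float = 0.2
let a = roughness * roughness      // alpha = roughness^2, as in fragment_main
let a2 = a * a

let peak = dGGX(a2: a2, nDotH: 1)        // half vector aligned with the normal
let offPeak = dGGX(a2: a2, nDotH: 0.9)   // half vector about 26 degrees away
print(peak, offPeak)
```

At N·H = 1 the expression reduces to 1 / (π · a2), so a low roughness yields a very tall peak that falls off sharply a few degrees away.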

Joint Smith visibility term — accounts for mutual masking and shadowing between microfacets:

// Approximation of joint Smith term for GGX
// [Heitz 2014, "Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs"]
float Vis_SmithJointApprox( float a2, float NoV, float NoL )
{
    NoV = abs(NoV);
    NoL = abs(NoL);
    float a = sqrt(a2);
    float x = 2 * NoV * NoL;
    float y = NoV + NoL;
    return 0.5 / mix(x, y, a);   // plain division; rcp() is an HLSL intrinsic, not a Metal built-in
}
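
A Swift port helps build intuition for this term. Note that in this formulation the visibility term already includes the 1 / (4 · N·L · N·V) denominator of the microfacet BRDF, which is why its value grows at grazing angles. The port below is a sketch with `mix` written out by hand:

```swift
// CPU port of Vis_SmithJointApprox (same arithmetic as the shader above).
func visSmithJointApprox(a2: Float, nDotV: Float, nDotL: Float) -> Float {
    let nv = abs(nDotV)
    let nl = abs(nDotL)
    let a = a2.squareRoot()
    let x = 2 * nv * nl
    let y = nv + nl
    let mixed = x + (y - x) * a     // mix(x, y, a)
    return 0.5 / mixed
}

let headOn  = visSmithJointApprox(a2: 0.0016, nDotV: 1, nDotL: 1)
let grazing = visSmithJointApprox(a2: 0.0016, nDotV: 0.1, nDotL: 0.1)
print(headOn, grazing)
```

Head-on (N·V = N·L = 1) the result is exactly 0.25, that is, the 1/4 from the BRDF denominator with no masking or shadowing.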

Schlick Fresnel — models the increase in reflectance at glancing angles:

// [Schlick 1994, "An Inexpensive BRDF Model for Physically-Based Rendering"]
float3 F_Schlick( float3 SpecularColor, float VoH )
{
    float Fc = pow(( 1 - VoH ),5);                 // 1 sub, 3 mul
    return Fc + (1 - Fc) * SpecularColor;
}
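
The two extremes of the Schlick curve are easy to check on the CPU. The port below writes the fifth power as repeated multiplication; the base reflectance of 0.04 is an assumed example value (the shader in this tutorial uses a specular color of 1):

```swift
// CPU port of F_Schlick (same formula as the shader above).
func fSchlick(specularColor: SIMD3<Float>, vDotH: Float) -> SIMD3<Float> {
    let t = 1 - vDotH
    let fc = t * t * t * t * t      // (1 - VoH)^5
    return SIMD3<Float>(repeating: fc) + (1 - fc) * specularColor
}

let f0 = SIMD3<Float>(0.04, 0.04, 0.04)   // assumed example base reflectance
let facing  = fSchlick(specularColor: f0, vDotH: 1)   // view direction aligned with H
let grazing = fSchlick(specularColor: f0, vDotH: 0)   // view direction 90 degrees from H
print(facing, grazing)
```

Facing the surface, reflectance equals the base specular color; at a grazing angle it rises to 1 regardless of the material.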

Combining into the full BRDF: fragment_main computes the dot products, multiplies the three terms into the specular lobe, and adds a constant Lambertian diffuse term:

fragment float4 fragment_main(VertexOut fragmentIn [[stage_in]],
                                  constant MT2FragmentUniforms &uniforms [[buffer(1)]]) {
  
    const float3 V = normalize(-float3(fragmentIn.viewPosition));
    const float3 N = normalize(fragmentIn.viewNormal);
    const float3 L = normalize(float3(uniforms.viewLightPosition));
  
    const float3 specColor = float3(1);
    const float3 Lcolor = float3(10);
    const float  roughness = 0.2;
    const float3 rho(0.01);
    const float  sqrRoughness = roughness*roughness;
  
    float3 H = normalize(V+L);
    float NdotL = saturate(dot(N,L));
    float NdotV = saturate(dot(N,V));
    float NdotH = saturate(dot(N,H));
    float VdotH = saturate(dot(V,H));
  
    float a2 = sqrRoughness*sqrRoughness;
    float Vis = Vis_SmithJointApprox(a2, NdotV, NdotL);
    float D =  D_GGX(a2, NdotH);
    float3 F = F_Schlick(specColor, VdotH);
  
    const float3 f_reflection = (D * Vis) * F;
    const float3 f_diffuse = rho / M_PI_F;
    const float3 L_o = M_PI_F * NdotL * Lcolor * (f_reflection + f_diffuse);
    return float4(L_o,1);
}

📊 GGX specular lobe — polar diagram

The D_GGX term controls the shape of the specular highlight. Low roughness concentrates reflected light into a tight spike (sharp, mirror-like highlight). High roughness spreads it across a wide lobe (diffuse-like). The polar diagram below shows this directly: the radial distance at each angle from the surface normal equals D_GGX(cos θ) · cos θ. Ghost curves show reference roughness values for comparison.

[Interactive polar diagram: GGX specular lobe, roughness α = 0.20]

📚 Key concepts recap

Concept	Apple Docs	What it does
Model I/O (`MDLAsset`)	↗	Loads `.obj`/USD files, describes vertex layout Metal can consume directly
`MTKMesh`	↗	GPU-ready mesh with vertex + index `MTLBuffer`s; maps directly to draw calls
`MTLDepthStencilState`	↗	Immutable depth/stencil test — keeps nearest fragment, discards occluded ones
`MTLBuffer` (uniforms)	↗	Ships MVP matrices and light position to the GPU each frame via `setVertexBytes`
`MTLVertexDescriptor`	↗	Tells the pipeline how to fetch per-vertex attributes from `MTLBuffer`s
GGX BRDF	—	D (distribution) × Vis (masking) × F (Fresnel) = physically-based specular reflectance
Indexed drawing	—	Reuse shared vertices across triangles via an index buffer; use `drawIndexedPrimitives`