Skip to content

Index

mps

Apple GPU Backend.

This module provides a complete, system-level compiler backend targeting Apple's proprietary AIR (Apple Intermediate Representation) bitcode format.

It functions as the final code generation stage in an MLIR-based pipeline, serving as a "native" GPU backend for Apple Silicon (M-series) processors, analogous to NVPTX for NVIDIA or AMDGPU for AMD.

1. Architecture: The MPS Dialect

Unlike standard approaches, this backend introduces a dedicated MPS Dialect that acts as a structured, in-memory representation of Apple's proprietary IR.

[ High-Level MLIR ]   (Linalg, GPU, Vector)
        |
        v
[   MPS Dialect   ]   (Reverse-Engineered AIR Model)
        |
        v
[  AIR Bitcode   ]    (Serialized LLVM IR + Metadata)
        .
        . (External Linkage via xcrun)
        v
[  Metallib      ]    (Standard Metal Library Resource)

2. Rationale: Why Bypass MSL?

The standard Metal toolchain enforces a "black box" compilation step (MSL -> AIR) that obscures the hardware reality from the compiler frontend. By generating AIR directly, this backend reclaims control:

  • (1) Instruction Selection: We can emit specific opcodes (e.g., simd_shuffle, threadgroup_barrier) that might not have direct MSL equivalents or whose generation from MSL is unpredictable.
  • (2) Predictability: Bypassing the aggressive high-level MSL optimizer guarantees that the code structure defined in MLIR is preserved in the final binary, crucial for performance tuning.
  • (3) Compilation Latency: Skipping the C++ parsing and frontend phases of the Metal compiler significantly reduces JIT compilation time for dynamic workloads.

3. The Target: Apple Intermediate Representation (AIR)

AIR is an undocumented format based on LLVM IR. The backend emits this bitcode directly. The user is responsible for packaging it into a .metallib using Apple's command-line tools: xcrun metallib or xcrun metal.

It differs from standard LLVM in three critical ways:

  • (1) Intrinsics: It relies heavily on llvm.air.* intrinsics for GPU-specific operations.
  • (2) Metadata: It uses a complex schema of named metadata nodes to describe kernel signatures, threadgroup dimensions and argument bindings.
  • (3) Conventions: It enforces strict ABI rules regarding address spaces and types.

This module implements the logic to synthesize correct AIR modules, including all necessary metadata and bitcode encoding, based on clean-room reverse engineering.