Kernel Programming Basics
This chapter covers the fundamentals of writing Tileon kernels.
Kernel Functions
Tileon kernels are Python functions decorated with @tileon.jit.
The decorator tells Tileon to compile the decorated function into optimized kernel code.
Program ID
Each kernel instance is identified by a program ID:
axis=0: First dimension
axis=1: Second dimension
axis=2: Third dimension
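To make the idea concrete, here is a plain-Python model (not Tileon code) of a launch running the same kernel body once per program ID. The helpers `program_ids` and `launch_model` are hypothetical names used only for this illustration. The model runs instances one after another, whereas real program instances may run in parallel.

```python
from itertools import product

def program_ids(grid):
    """Enumerate every program ID tuple for a grid with one entry per axis."""
    return list(product(*(range(n) for n in grid)))

def launch_model(kernel_body, grid):
    """Call kernel_body once per program instance, passing its ID tuple.

    This is a sequential stand-in for illustration only.
    """
    return [kernel_body(pid) for pid in program_ids(grid)]

# A 2 x 3 grid: axis 0 has 2 programs, axis 1 has 3 programs.
ids = launch_model(lambda pid: pid, grid=(2, 3))
print(ids)
```

Each tuple in `ids` plays the role of the values a kernel instance would see for axis 0 and axis 1.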
Index Calculation
Calculate which data elements to process:
# Simple 1D case
pid = tl.program_id(axis=0)
block_start = pid * BLOCK_SIZE
offsets = block_start + tl.arange(0, BLOCK_SIZE)
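# 2D case: split the linear program ID into block-row and block-column IDs
# (num_pid_n is the number of blocks along the N dimension)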
pid_m = pid // num_pid_n
pid_n = pid % num_pid_n
offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
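The arithmetic above can be checked with a small worked example. This is a plain-Python model rather than Tileon code: lists stand in for tl.arange blocks, and the helper names `block_offsets` and `split_pid` are hypothetical.

```python
def block_offsets(pid, block_size):
    """Model of pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE) as a Python list."""
    block_start = pid * block_size
    return [block_start + i for i in range(block_size)]

def split_pid(pid, num_pid_n):
    """Map a linear program ID to (pid_m, pid_n) in row-major order."""
    return pid // num_pid_n, pid % num_pid_n

# 1D: with BLOCK_SIZE = 4, program 2 handles elements 8..11.
offsets_1d = block_offsets(pid=2, block_size=4)

# 2D: with 3 blocks along N, linear ID 5 is block row 1, block column 2.
pid_m, pid_n = split_pid(pid=5, num_pid_n=3)
offs_m = block_offsets(pid_m, block_size=2)  # BLOCK_M = 2 -> rows 2..3
offs_n = block_offsets(pid_n, block_size=4)  # BLOCK_N = 4 -> columns 8..11
```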
Memory Operations
Load and store data:
# Load with masking; lanes where mask is false take the value of other (0.0 here)
x = tl.load(x_ptr + offsets, mask=mask, other=0.0)
# Store with masking
tl.store(output_ptr + offsets, output, mask=mask)
Masking
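Masks ensure safe access to boundary elements. When the data size is not a multiple of the block size, the last block extends past the end of the data, and a mask disables the out-of-range lanes.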
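As a rough illustration of the boundary case, the sketch below models masked loads and stores in plain Python (it is not Tileon code). It assumes the mask is built by comparing offsets against the element count. The name `n_elements` and the helpers `masked_load` and `masked_store` are hypothetical.

```python
def masked_load(data, offsets, mask, other=0.0):
    """Model of a masked load: read data[o] where mask is true, else use other."""
    return [data[o] if m else other for o, m in zip(offsets, mask)]

def masked_store(out, offsets, values, mask):
    """Model of a masked store: write only the lanes whose mask is true."""
    for o, v, m in zip(offsets, values, mask):
        if m:
            out[o] = v

n_elements = 10   # data length, deliberately not a multiple of BLOCK_SIZE
BLOCK_SIZE = 4
pid = 2           # the last block covers offsets 8..11

offsets = [pid * BLOCK_SIZE + i for i in range(BLOCK_SIZE)]
mask = [o < n_elements for o in offsets]  # only offsets 8 and 9 are in range

x = [float(i) for i in range(n_elements)]
loaded = masked_load(x, offsets, mask)    # out-of-range lanes become 0.0

out = [0.0] * n_elements
masked_store(out, offsets, [v * 2 for v in loaded], mask)
```

Without the mask, the last block would try to touch offsets 10 and 11, which lie past the end of the data.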
Constants
Use tl.constexpr for compile-time constants such as BLOCK_SIZE.
Because these values are known when the kernel is compiled, Tileon can optimize the kernel at compile time.
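One way to picture why a value known ahead of time helps is the plain-Python analogy below. It is not Tileon code and does not use tl.constexpr. It builds one specialized function per block size and precomputes the lane pattern once, when the function is built. Whether Tileon keeps one compiled variant per constant value is an assumption of this sketch, and `specialize` is a hypothetical name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def specialize(block_size):
    """Build a function with block_size fixed, once per distinct value."""
    # Computed a single time per block_size, at "compile time" in this analogy.
    lane_ids = tuple(range(block_size))

    def kernel_body(pid):
        # block_size and lane_ids are fixed for this specialized variant.
        return [pid * block_size + i for i in lane_ids]

    return kernel_body

k4 = specialize(4)   # variant specialized for a block size of 4
k8 = specialize(8)   # a separate variant for a block size of 8
```

Calling `specialize(4)` again returns the cached variant instead of rebuilding it.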
Grid Launch
Launch kernels with a grid.
The grid function computes how many program instances to launch.
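For a 1D kernel, a common choice is the element count divided by the block size, rounded up, so that the final partial block is still covered. The sketch below shows that arithmetic in plain Python. It is not Tileon's launch API, and `cdiv` and `grid_1d` are hypothetical helper names.

```python
def cdiv(a, b):
    """Ceiling division for positive integers."""
    return (a + b - 1) // b

def grid_1d(n_elements, block_size):
    """Return a one-axis grid with enough programs to cover n_elements."""
    return (cdiv(n_elements, block_size),)

# 1000 elements in blocks of 256 need 4 programs (4 * 256 = 1024 >= 1000).
grid = grid_1d(n_elements=1000, block_size=256)
```

Rounding up means the last program may cover indices past the end of the data, which is why boundary masks are needed.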
Data Types
Common data types:
Vector Operations
Element-wise operations:
# Arithmetic
z = x + y
z = x - y
z = x * y
z = x / y
# Selection and element-wise min/max
z = tl.where(cond, x, y)
z = tl.maximum(x, y)
z = tl.minimum(x, y)
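The selection and min/max operations above work lane by lane. The following plain-Python model (not Tileon code) shows the expected element-wise behavior on small lists. The functions `where`, `maximum`, and `minimum` are local stand-ins written for this illustration.

```python
def where(cond, x, y):
    """Pick x[i] where cond[i] is true, otherwise y[i]."""
    return [a if c else b for c, a, b in zip(cond, x, y)]

def maximum(x, y):
    """Element-wise maximum of two equal-length lists."""
    return [max(a, b) for a, b in zip(x, y)]

def minimum(x, y):
    """Element-wise minimum of two equal-length lists."""
    return [min(a, b) for a, b in zip(x, y)]

x = [1.0, -2.0, 3.0]
y = [0.5, 4.0, 3.0]
cond = [a > b for a, b in zip(x, y)]   # True only in the first lane

selected = where(cond, x, y)
largest = maximum(x, y)
smallest = minimum(x, y)
```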
Reduction
Reduce across a dimension:
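As a plain-Python illustration of what reducing across a dimension means (not Tileon code), the sketch below sums a small 2D block along each axis. The helper `reduce_sum` is hypothetical. Reducing along axis 0 collapses the rows and leaves one value per column, and reducing along axis 1 leaves one value per row.

```python
def reduce_sum(block, axis):
    """Sum a 2D block (a list of equal-length rows) along the given axis."""
    if axis == 0:
        # Collapse rows: one result per column.
        return [sum(col) for col in zip(*block)]
    if axis == 1:
        # Collapse columns: one result per row.
        return [sum(row) for row in block]
    raise ValueError("axis must be 0 or 1")

block = [[1, 2, 3],
         [4, 5, 6]]

col_sums = reduce_sum(block, axis=0)
row_sums = reduce_sum(block, axis=1)
```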