Performance Optimization

This page describes the low-level optimizations that make the framework efficient on CPU-bound workloads.

Core optimizations

Fused kernels

The C backend exposes fused primitives such as:

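Fused kernels combine several elementwise operations into a single pass. This reduces memory traffic, because intermediate results are never written out and re-read, and lowers the FFI call count, because one call into the C backend replaces several.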
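As an illustration of the idea, the sketch below contrasts an unfused scale-then-add sequence with a fused equivalent. The function names (scale, add, fused_scale_add) are hypothetical and chosen for this example only; they are not taken from the framework's C backend, whose actual primitives may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unfused kernels: each one walks the whole array, and the
   intermediate result of scale() has to be stored and then re-read by add(). */
void scale(double *out, const double *x, double a, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i];
    }
}

void add(double *out, const double *x, const double *y, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = x[i] + y[i];
    }
}

/* Hypothetical fused kernel: computes out[i] = a * x[i] + y[i] in one pass.
   No temporary buffer is needed, and a caller crossing an FFI boundary
   makes one call instead of two. */
void fused_scale_add(double *out, const double *x, const double *y,
                     double a, size_t n) {
    assert(out != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

In the unfused form, a caller needs a temporary buffer of n doubles and two calls across the FFI boundary. In the fused form, each input element is read once, each output element is written once, and a single call suffices.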

BLAS and LAPACKE

Hot path minimization

When to use

When not to use