Performance

This framework is built around C-level compute kernels and zero-copy data movement.

Why it is fast

  • C computation: All heavy math is implemented in src/Lib/*.c and exposed through FFI.
  • OpenBLAS / LAPACKE: Linear algebra operations such as matrix multiplication and decompositions use optimized native libraries.
  • AVX2 and fused kernels: The backend includes fused kernels for common deep learning patterns such as linear, addRelu, and mulAdd.
  • Zero-copy views: Tensor slicing and batching create views instead of copying data when possible.
  • SafeTensors mmap: Model weights can be loaded directly from disk without copying into the PHP heap.
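
To make the mmap point concrete, here is a minimal C sketch of mapping a file read-only instead of reading it into a heap buffer. It is an illustration under assumptions, not the framework's loader: it targets POSIX systems, the names map_file_readonly and unmap_file are hypothetical, and SafeTensors header parsing (dtypes, shapes, offsets) is left out entirely.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and return its address.
 * The kernel faults pages in lazily on first access, so nothing is read
 * into a heap buffer here. Returns NULL on failure or for an empty file. */
const void *map_file_readonly(const char *path, size_t *out_len) {
    assert(path != NULL && out_len != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) return NULL;

    *out_len = (size_t)st.st_size;
    return addr;
}

/* Release a mapping obtained from map_file_readonly(). */
void unmap_file(const void *addr, size_t len) {
    munmap((void *)addr, len);
}
```

A tensor built on such a mapping would point its data field into the mapped region and call unmap_file() when the last reference goes away.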

CPU optimizations

  • OpenMP: The C runtime is compiled with -fopenmp and exposes tensor_configure_threading().
  • BLAS threading: Tensor::configureThreading() lets you control BLAS and OpenMP threading independently.
  • Vectorized kernels: Fused operations are vectorized with AVX2 and make fewer passes over memory than the equivalent sequence of separate operations.
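
The difference between a fused and an unfused kernel can be sketched in a few lines of C. This is an illustrative version of an add-then-ReLU pattern, not the framework's actual addRelu kernel; the function names are hypothetical. The vector path is compiled only when the compiler defines __AVX2__, and the float intrinsics it uses come from the AVX subset that every AVX2-capable CPU supports.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Unfused reference: two full passes over the output buffer. */
void add_then_relu_unfused(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];                   /* pass 1: add  */
    for (size_t i = 0; i < n; i++) out[i] = out[i] > 0.0f ? out[i] : 0.0f; /* pass 2: ReLU */
}

/* Fused: out[i] = max(a[i] + b[i], 0) computed in a single pass. */
void add_relu_fused(const float *a, const float *b, float *out, size_t n) {
    assert(a != NULL && b != NULL && out != NULL);
    size_t i = 0;
#ifdef __AVX2__
    /* Process 8 floats per iteration; unaligned loads keep the sketch simple. */
    const __m256 zero = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_max_ps(_mm256_add_ps(va, vb), zero));
    }
#endif
    /* Scalar tail, which is also the whole loop when AVX2 is unavailable. */
    for (; i < n; i++) {
        float s = a[i] + b[i];
        out[i] = s > 0.0f ? s : 0.0f;
    }
}
```

Both functions produce the same values; the fused one writes each output element once instead of writing it and then reading it back.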

Zero-copy design

  • Tensor::slice() and Dataset::slice() use C views that share the same underlying buffer.
  • Views retain a parent reference to prevent the original memory from being freed.
  • SafeTensorsIO::load() returns mmap-backed tensors: the weights are mapped into the process address space and paged in on demand rather than copied into the PHP heap.
  • Dataset::batches() yields zero-copy tensor slices.
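
The view-plus-parent-reference idea above can be sketched as follows. This is a simplified model under assumptions, not the framework's data structure: the Tensor struct, its fields, and the tensor_alloc, tensor_slice, and tensor_release functions are hypothetical, tensors are one-dimensional, and reference counting is not thread-safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical tensor handle: either owns its buffer or is a view. */
typedef struct Tensor {
    float *data;            /* first element visible through this handle */
    size_t len;             /* number of elements */
    int refcount;           /* references keeping this handle alive */
    struct Tensor *parent;  /* NULL for an owner, set for a view */
} Tensor;

/* Allocate an owning, zero-filled tensor. Returns NULL on failure. */
Tensor *tensor_alloc(size_t len) {
    assert(len > 0);
    Tensor *t = malloc(sizeof *t);
    if (!t) return NULL;
    t->data = calloc(len, sizeof *t->data);
    if (!t->data) { free(t); return NULL; }
    t->len = len;
    t->refcount = 1;
    t->parent = NULL;
    return t;
}

/* Create a view over [start, start + len) without copying any elements. */
Tensor *tensor_slice(Tensor *src, size_t start, size_t len) {
    assert(src != NULL && start <= src->len && len <= src->len - start);
    Tensor *v = malloc(sizeof *v);
    if (!v) return NULL;
    v->data = src->data + start; /* same buffer, shifted pointer */
    v->len = len;
    v->refcount = 1;
    v->parent = src;
    src->refcount++;             /* the view keeps its parent alive */
    return v;
}

/* Drop one reference; the buffer is freed only when the owner's count hits zero. */
void tensor_release(Tensor *t) {
    if (!t || --t->refcount > 0) return;
    if (t->parent) tensor_release(t->parent);
    else free(t->data);
    free(t);
}
```

Releasing the original handle while a view is still alive only decrements the count, so the view's pointer stays valid until the view itself is released.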

Data ingestion performance

  • Dataset::fromCSV() uses tensor_dataset_from_csv() for numeric CSVs, bypassing PHP array allocation.
  • For mixed-type CSVs, Dataset::load() uses the ETL-mode C DataFrame to parse and transform data.
  • Tensor::fromArray() packs nested PHP arrays into a single binary string and passes it to C in one FFI call, instead of crossing the FFI boundary once per element.
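
To illustrate why parsing numeric CSV data natively avoids intermediate arrays, here is a small C sketch that converts CSV text straight into a flat row-major buffer. It is not the source of tensor_dataset_from_csv(); the function name parse_numeric_csv and its signature are hypothetical, and it handles only LF line endings, no quoting, no header row, and basic validation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parser: read comma-separated numbers from `text` into the
 * flat row-major buffer `out` (capacity `cap` values). Returns the number
 * of values written and stores the column count in *cols. Returns 0 if a
 * field is empty or non-numeric, rows differ in length, or `out` is full. */
size_t parse_numeric_csv(const char *text, double *out, size_t cap, size_t *cols) {
    assert(text != NULL && out != NULL && cols != NULL);
    size_t n = 0, row_len = 0;
    const char *p = text;
    *cols = 0;

    while (*p != '\0') {
        if (*p == ',' || *p == '\n') return 0;  /* empty field or blank line */
        char *end;
        double v = strtod(p, &end);
        if (end == p || n == cap) return 0;     /* not a number, or no space left */
        out[n++] = v;                           /* written in place, no per-row array */
        row_len++;
        p = end;

        if (*p == ',') {
            p++;                                /* next field in the same row */
        } else if (*p == '\n' || *p == '\0') {
            if (*cols == 0) *cols = row_len;    /* first row fixes the width */
            else if (row_len != *cols) return 0;
            row_len = 0;
            if (*p == '\n') p++;
        } else {
            return 0;                           /* unexpected character */
        }
    }
    return n;
}
```

The resulting buffer can back a tensor of shape [n / cols, cols] directly, which is the property that lets a native loader skip building one array per row.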

Practical tips

  • Call Tensor::configureThreading() once at startup to avoid oversubscription, where the BLAS and OpenMP thread pools together run more threads than there are CPU cores.
  • Prefer Dataset::randomize() over manual PHP shuffling.
  • Avoid toArray() or toFlatArray() in hot loops, since each call copies tensor data back into PHP arrays.
  • Use Tensor::copyFrom() and matmulInto() to reuse pre-allocated buffers.
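
The buffer-reuse tip can be illustrated with a C sketch of an "into" style kernel that writes to caller-provided memory. This is an assumption-laden illustration, not the code behind matmulInto(): the function matmul_into and its signature are hypothetical, and the plain triple loop stands in for the optimized BLAS call a real backend would make.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "into" kernel: C = A (m x k) * B (k x n), all row-major.
 * The caller owns `c`, so calling this repeatedly allocates nothing. */
void matmul_into(const float *a, const float *b, float *c,
                 size_t m, size_t k, size_t n) {
    assert(a != NULL && b != NULL && c != NULL);
    assert(c != a && c != b); /* the output must not alias an input */

    for (size_t i = 0; i < m; i++) {
        /* Overwrite the output row so stale values from earlier calls vanish. */
        for (size_t j = 0; j < n; j++) c[i * n + j] = 0.0f;
        for (size_t p = 0; p < k; p++) {
            const float aip = a[i * k + p];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aip * b[p * n + j];
        }
    }
}
```

In a training or inference loop the output buffer is allocated once before the loop and passed to every call, so per-iteration cost is the arithmetic alone rather than arithmetic plus allocation and deallocation.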

Example

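use Pml\Tensor;

// Configure threading once, before any heavy computation.
Tensor::configureThreading(8, 2);

// Multiply two 1024 x 1024 random matrices; matmul runs in the native backend.
$x = Tensor::randomNormal([1024, 1024]);
$y = Tensor::randomNormal([1024, 1024]);
$z = $x->matmul($y);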