Vision Module

A complete computer vision pipeline: image I/O, transforms, augmentation, classification (MobileNetV3), object detection (YOLO11n, NanoDet, PicoDet, SSDLite) and segmentation (FastSAM). It is backed by libvision.so, which exposes 106 C functions across 8 source files.

Backend & Enums

Pml\Lib\VisionEngine is the FFI singleton for libvision.so. All Vision classes call it via VisionEngine::get(). Build: see Getting Started → Building libvision.so.

| Enum class | Constants | Description |
|---|---|---|
| Interp | NEAREST, BILINEAR, BICUBIC, AREA | Interpolation modes for resize/rotate |
| ColorSpace | RGB, BGR, GRAY, HSV, LAB, YUV | Color space identifiers |
| PixelFormat | UINT8, FLOAT32 | Pixel data type |
| Layout | HWC, CHW | Memory layout for tensor export |
| Border | CONSTANT, REFLECT, REPLICATE | Padding border mode |
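
The interpolation modes differ in how a destination pixel is sampled from the source. NEAREST takes the closest source pixel, BICUBIC fits a cubic over a 4×4 neighbourhood, and AREA averages the source pixels a destination pixel covers. As an illustration only (a plain-PHP sketch, not libvision's C code; `bilinearSample()` is a hypothetical helper), BILINEAR blends the four nearest pixels by their fractional distances:

```php
<?php
/**
 * Bilinear sample of a single-channel image at fractional coordinates.
 * $img is an array of rows: $img[$y][$x]. The point is clamped to the
 * image so sampling at the right/bottom edge stays in bounds.
 */
function bilinearSample(array $img, float $x, float $y): float
{
    $h = count($img);
    $w = count($img[0]);

    // Clamp the sample point to the valid pixel range.
    $x = max(0.0, min($x, $w - 1.0));
    $y = max(0.0, min($y, $h - 1.0));

    $x0 = (int) floor($x);
    $y0 = (int) floor($y);
    $x1 = min($x0 + 1, $w - 1);
    $y1 = min($y0 + 1, $h - 1);
    $fx = $x - $x0; // weight toward column $x1
    $fy = $y - $y0; // weight toward row $y1

    // Interpolate along x on the two neighbouring rows, then along y.
    $top    = $img[$y0][$x0] * (1 - $fx) + $img[$y0][$x1] * $fx;
    $bottom = $img[$y1][$x0] * (1 - $fx) + $img[$y1][$x1] * $fx;
    return $top * (1 - $fy) + $bottom * $fy;
}

$img = [[0, 10], [20, 30]];
echo bilinearSample($img, 0.5, 0.5), "\n"; // 15
```

Sampling exactly on a pixel center returns that pixel unchanged, so bilinear resizing only smooths values between source pixels.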

Image

Pml\Vision\Image wraps a VisionImage* C struct. Transforms return $this for fluent chaining, which means they modify the image in place (zero-copy where possible) rather than producing a new Image.

Loading

```php
use Pml\Vision\Image;

$img = Image::fromFile('photo.jpg');       // JPEG/PNG/BMP/TIFF
$img = Image::fromTensor($tensor);          // wrap a [H, W, C] or [C, H, W] Tensor
$img = Image::blank(width: 224, height: 224, channels: 3);
```

Metadata

| Method | Returns |
|---|---|
| width(): int | Pixel width |
| height(): int | Pixel height |
| channels(): int | Number of channels (1, 3, or 4) |
| format(): int | PixelFormat constant |
| layout(): int | Layout constant (HWC or CHW) |
| colorSpace(): int | ColorSpace constant |
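
layout() matters whenever pixels are addressed directly. For an image with height H, width W and C channels in one flat buffer, the element at row y, column x, channel c lives at a different offset in each layout. The helpers below are a hypothetical plain-PHP illustration (not part of PML) of that arithmetic and of the copy a toCHW()-style transpose has to perform:

```php
<?php
// Offset of (row $y, column $x, channel $c) in an HWC buffer: channels interleaved per pixel.
function offsetHWC(int $y, int $x, int $c, int $w, int $ch): int
{
    return ($y * $w + $x) * $ch + $c;
}

// Offset of the same element in a CHW buffer: one full H×W plane per channel.
function offsetCHW(int $y, int $x, int $c, int $h, int $w): int
{
    return ($c * $h + $y) * $w + $x;
}

// Reference HWC -> CHW transpose over a flat PHP array.
function hwcToChw(array $flat, int $h, int $w, int $ch): array
{
    $out = array_fill(0, $h * $w * $ch, 0);
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            for ($c = 0; $c < $ch; $c++) {
                $out[offsetCHW($y, $x, $c, $h, $w)] = $flat[offsetHWC($y, $x, $c, $w, $ch)];
            }
        }
    }
    return $out;
}

// A 1×2 RGB image in HWC order: [R0, G0, B0, R1, G1, B1]
echo implode(', ', hwcToChw([1, 2, 3, 4, 5, 6], 1, 2, 3)), "\n"; // 1, 4, 2, 5, 3, 6
```

HWC keeps each pixel's channels adjacent (convenient for image I/O), while CHW keeps each channel contiguous, which is the layout PyTorch-style models expect.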

Transforms (all return self)

| Method | Description |
|---|---|
| resize(int $w, int $h, int $interp) | Resize to exact dimensions |
| resizeLongEdge(int $edge, int $interp) | Resize so the longer edge equals $edge, preserving aspect ratio |
| crop(int $x, int $y, int $w, int $h) | Crop the $w × $h region starting at ($x, $y) |
| centerCrop(int $w, int $h) | Crop a $w × $h region from the center |
| pad(int $top, int $bot, int $left, int $right, array $color, int $mode) | Add border padding; $mode is a Border constant and $color the fill for CONSTANT |
| flipHorizontal() | Mirror left-right |
| flipVertical() | Mirror top-bottom |
| rotate90(int $k = 1) | Rotate by 90° × $k |
| rotate(float $angle, int $interp, array $border) | Rotate by an arbitrary angle |
| toGrayscale() | Convert to a single gray channel |
| rgbToBgr() | Swap the R and B channels |
| toFloat32(float $scale = 1/255.0) | Convert pixels to float32 and multiply by $scale (the default maps [0, 255] to [0, 1]) |
| toUint8(float $scale = 255.0) | Multiply by $scale and convert back to uint8 |
| toHWC() | Ensure HWC memory layout |
| toCHW() | Transpose to CHW layout (PyTorch-style) |
| normalize(array $mean, array $stdDev) | Per-channel (x − μ) / σ; $mean and $stdDev are float arrays of length C |
| adjustBrightness(float $delta) | Add $delta to every pixel |
| adjustContrast(float $factor) | Scale each pixel's distance from 128 by $factor |
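
resizeLongEdge() followed by centerCrop() turns an arbitrary photo into a fixed-size patch without distorting it. The sketch below shows only the geometry in plain PHP; the helper names are hypothetical, and rounding the short edge to the nearest integer and placing an odd leftover pixel on the right/bottom are assumptions rather than documented libvision behaviour:

```php
<?php
// Output size after a resize whose longer edge becomes $edge (aspect ratio kept).
// Rounding to the nearest integer is an assumption.
function longEdgeSize(int $w, int $h, int $edge): array
{
    $scale = $edge / max($w, $h);
    return [(int) round($w * $scale), (int) round($h * $scale)];
}

// Top-left corner of a centered $cw × $ch crop inside a $w × $h image.
// Assumes the crop fits; intdiv() floors, so an odd leftover pixel goes right/bottom.
function centerCropOrigin(int $w, int $h, int $cw, int $ch): array
{
    return [intdiv($w - $cw, 2), intdiv($h - $ch, 2)];
}

[$w, $h] = longEdgeSize(640, 480, 256);
[$x, $y] = centerCropOrigin($w, $h, 192, 192);
echo "$w x $h, crop origin ($x, $y)\n"; // 256 x 192, crop origin (32, 0)
```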

Export

| Method | Description |
|---|---|
| toTensor(): Tensor | Export to a Tensor that shares the C buffer (zero-copy) |
| toArray(): array | Copy pixels into a PHP array (slow; avoid in hot paths) |
| save(string $path): void | Save as JPEG or PNG, chosen by the file extension |
| encode(string $ext = '.png'): string | Encode to an in-memory byte string in the format given by $ext |

Augmentation

Pml\Vision\Augmentation chains random transforms. The object is invokable: calling it with an Image (which runs __invoke()) applies the whole chain.

```php
use Pml\Vision\Augmentation;

$aug = (new Augmentation(seed: 42))
    ->randomFlipH(prob: 0.5)
    ->randomResizeCrop(224, 224, minScale: 0.8, maxScale: 1.0)
    ->randomBrightness(maxDelta: 0.2)
    ->randomContrast(0.8, 1.2)
    ->randomHue(maxDelta: 0.1);

foreach ($images as $img) {
    $augmented = $aug($img);   // applies the full chain via __invoke()
    // ... feed $augmented to the training loop
}
```

| Method | Defaults | Description |
|---|---|---|
| randomFlipH(float $prob) | prob = 0.5 | Horizontal flip with probability $prob |
| randomFlipV(float $prob) | prob = 0.5 | Vertical flip with probability $prob |
| randomCrop(int $w, int $h) | | Random crop to $w × $h |
| randomResizeCrop(int $w, int $h, float $minScale, float $maxScale) | | Random scale, then crop to $w × $h (torchvision-style) |
| randomBrightness(float $maxDelta) | 0.2 | Random brightness shift of up to ±$maxDelta |
| randomContrast(float $lo, float $hi) | 0.8, 1.2 | Random contrast factor between $lo and $hi |
| randomHue(float $maxDelta) | 0.1 | Random hue shift of up to ±$maxDelta in HSV space |
| randomRotation(float $maxAngle, int $interp, array $border) | 15° | Random rotation of up to ±$maxAngle |
| cutout(int $nHoles, int $holeSize, float $fill) | 1, 16, 0.0 | CutOut regularization: fills $nHoles square holes of side $holeSize with $fill |
| mixup(Image $a, Image $b, float $alpha) | alpha = 0.2 | MixUp augmentation; returns [Image, float $lambda] |
| cutmix(Image $a, Image $b, float $alpha) | alpha = 1.0 | CutMix augmentation; returns [Image, float $lambda] |
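
mixup() and cutmix() return $lambda because the training labels must be mixed with the same weight as the pixels. The sketch below restates the standard formulas from the MixUp and CutMix papers on plain PHP arrays; it takes λ as an input (both methods sample it from Beta(α, α), which is omitted here), and `mix()` and `cutmixLambda()` are hypothetical helpers, not PML's API:

```php
<?php
// MixUp blend: lambda * a + (1 - lambda) * b, element-wise. The same formula
// mixes flat pixel arrays and one-hot label vectors.
function mix(array $a, array $b, float $lambda): array
{
    return array_map(fn($va, $vb) => $lambda * $va + (1 - $lambda) * $vb, $a, $b);
}

// CutMix: after pasting a $bw × $bh patch from image b into a $w × $h image a,
// lambda is recomputed as the fraction of pixels that still come from image a.
function cutmixLambda(int $w, int $h, int $bw, int $bh): float
{
    return 1.0 - ($bw * $bh) / ($w * $h);
}

$lambda = 0.25; // in practice sampled from Beta(alpha, alpha)
echo implode(', ', mix([0, 100], [100, 0], $lambda)), "\n"; // 75, 25
echo implode(', ', mix([1, 0], [0, 1], $lambda)), "\n";     // 0.25, 0.75
echo cutmixLambda(32, 32, 16, 16), "\n";                    // 0.75
```

The second line shows the label side: a one-hot target for class 0 mixed with one for class 1 becomes the soft target [0.25, 0.75].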

Classification Models

MobileNetV3

```php
use Pml\Vision\{Image, MobileNetV3};

$net = new MobileNetV3(variant: 'large', numClasses: 1000);
$net->loadWeights('models/mobilenetv3_large.safetensors');

// ImageNet preprocessing: 224×224, scale to [0, 1], normalize with ImageNet mean/std, CHW layout
$img = Image::fromFile('cat.jpg')
    ->resize(224, 224)
    ->toFloat32()
    ->normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ->toCHW();

$topK = $net->classify($img);   // returns [['class' => 281, 'score' => 0.92], ...]
```

| Method | Description |
|---|---|
| forward(Tensor $x): Tensor | Full forward pass, returning [batch, numClasses] logits |
| extract(Tensor $x): Tensor | Feature vector before the classification head |
| preprocess(Image $img): Tensor | Resize and normalize in one call |
| classify(Image $img): array | Top-5 class predictions with scores |
| loadWeights(string $path): void | Load a SafeTensors checkpoint |
| parameters(): array | All parameter tensors (for fine-tuning) |
| featDim(): int | Feature vector dimension before the head |
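
preprocess() and classify() bracket forward(). The plain-PHP sketch below spells out the per-pixel arithmetic requested by toFloat32() plus normalize() in the example above, and a softmax + top-k step that turns one row of logits into the ['class' => ..., 'score' => ...] shape classify() returns. That classify() applies a softmax is an assumption (its example scores look like probabilities), and all helper names here are hypothetical:

```php
<?php
// toFloat32() default scale (1/255) followed by normalize(): (x / 255 - mean) / std per channel.
function normalizePixel(array $rgb, array $mean, array $std): array
{
    $out = [];
    foreach ($rgb as $c => $v) {
        $out[] = ($v / 255.0 - $mean[$c]) / $std[$c];
    }
    return $out;
}

// Numerically stable softmax over one row of logits.
function softmax(array $logits): array
{
    $max = max($logits);
    $exp = array_map(fn($z) => exp($z - $max), $logits);
    $sum = array_sum($exp);
    return array_map(fn($e) => $e / $sum, $exp);
}

// Top-k (class index, probability) pairs, highest probability first.
function topK(array $logits, int $k = 5): array
{
    $probs = softmax($logits);
    arsort($probs); // descending by probability, class indices kept as keys
    $top = [];
    foreach (array_slice($probs, 0, $k, true) as $class => $score) {
        $top[] = ['class' => $class, 'score' => $score];
    }
    return $top;
}

$mean = [0.485, 0.456, 0.406];
$std  = [0.229, 0.224, 0.225];
printf("%.3f %.3f %.3f\n", ...normalizePixel([255, 128, 0], $mean, $std)); // 2.249 0.205 -1.804

foreach (topK([1.0, 3.0, 2.0], 2) as $p) {
    printf("class %d: %.3f\n", $p['class'], $p['score']); // class 1 first, then class 2
}
```

Subtracting the maximum logit before exp() leaves the softmax unchanged but avoids overflow for large logits.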

Object Detection

All detectors follow the same interface: detect(Image $img): array returning [['bbox' => [x, y, w, h], 'score' => float, 'class' => int], ...].
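
The detectors run at a fixed input size (for example ~640×640 for Yolo11n and 320×320 for NanoDet). A common way to feed arbitrary photos to such models is letterboxing: scale the image uniformly to fit, pad the rest, and after inference map boxes back to original pixel coordinates. Whether PML's detectors letterbox, and how they split the padding, is an assumption here; the sketch below (hypothetical helpers `letterbox()` and `unletterbox()`) only shows the coordinate arithmetic for a top-left [x, y, w, h] box:

```php
<?php
// Scale and padding for fitting a $w × $h image into a $size × $size input.
// Equal padding on both sides is an assumption.
function letterbox(int $w, int $h, int $size): array
{
    $scale = min($size / $w, $size / $h);
    $padX  = ($size - $w * $scale) / 2;
    $padY  = ($size - $h * $scale) / 2;
    return [$scale, $padX, $padY];
}

// Map a [x, y, w, h] box from letterboxed input space back to the original image.
function unletterbox(array $box, float $scale, float $padX, float $padY): array
{
    [$x, $y, $bw, $bh] = $box;
    return [($x - $padX) / $scale, ($y - $padY) / $scale, $bw / $scale, $bh / $scale];
}

[$scale, $padX, $padY] = letterbox(1280, 720, 640); // scale 0.5, padding (0, 140)
echo implode(', ', unletterbox([100, 240, 50, 50], $scale, $padX, $padY)), "\n"; // 200, 200, 100, 100
```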

| Class | Notes |
|---|---|
| Yolo11n | YOLO11 nano variant, the smallest and fastest YOLO11 model. ~640×640 input. COCO 80 classes. |
| NanoDet | NanoDet-Plus. Very lightweight, mobile-friendly. 320×320 input. |
| PicoDet | PicoDet (Baidu PaddleDetection). Excellent accuracy-per-FLOP for mobile. |
| SSDLite | SSD with a MobileNetV3 backbone. Multi-scale detection with 6 detection heads. |
```php
use Pml\Vision\{Image, Yolo11n};

$yolo = new Yolo11n();
$yolo->loadWeights('models/yolo11n.safetensors');

$img  = Image::fromFile('street.jpg');
$dets = $yolo->detect($img);

foreach ($dets as $d) {
    // bbox is [x, y, w, h]; spreading it fills the last four %d placeholders
    printf("Class %d @ %.1f%% [%d,%d %dx%d]\n",
        $d['class'], $d['score'] * 100,
        ...$d['bbox']
    );
}
```
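
Single-stage detectors emit many overlapping candidates per object, which are normally pruned with non-maximum suppression (NMS). Whether detect() already applies NMS, and with which IoU threshold, is not stated here. The following is a generic greedy, class-aware NMS sketch over the documented ['bbox' => [x, y, w, h], 'score', 'class'] arrays (for example, when merging detections from two models); `iou()` and `nms()` are hypothetical helpers and the 0.5 threshold is an assumed default:

```php
<?php
// Intersection-over-union of two [x, y, w, h] boxes (top-left origin).
function iou(array $a, array $b): float
{
    $ix = max(0, min($a[0] + $a[2], $b[0] + $b[2]) - max($a[0], $b[0]));
    $iy = max(0, min($a[1] + $a[3], $b[1] + $b[3]) - max($a[1], $b[1]));
    $inter = $ix * $iy;
    $union = $a[2] * $a[3] + $b[2] * $b[3] - $inter;
    return $union > 0 ? $inter / $union : 0.0;
}

// Greedy NMS: visit boxes by descending score and drop any box that overlaps an
// already-kept box of the same class by more than $iouThresh.
function nms(array $dets, float $iouThresh = 0.5): array
{
    usort($dets, fn($p, $q) => $q['score'] <=> $p['score']);
    $kept = [];
    foreach ($dets as $d) {
        foreach ($kept as $k) {
            if ($k['class'] === $d['class'] && iou($k['bbox'], $d['bbox']) > $iouThresh) {
                continue 2; // suppressed by a higher-scoring box
            }
        }
        $kept[] = $d;
    }
    return $kept;
}

$dets = [
    ['bbox' => [10, 10, 100, 100], 'score' => 0.9, 'class' => 0],
    ['bbox' => [15, 15, 100, 100], 'score' => 0.8, 'class' => 0], // near-duplicate, IoU ≈ 0.82
    ['bbox' => [300, 300, 50, 50], 'score' => 0.7, 'class' => 0],
];
echo count(nms($dets)), "\n"; // 2
```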

Segmentation

| Class | Notes |
|---|---|
| FastSAM | FastSAM (Fast Segment Anything): a CNN-based, real-time alternative to Meta's Segment Anything Model. Instance segmentation; returns a binary mask per detected object. |
```php
use Pml\Vision\{Image, FastSAM};

$img = Image::fromFile('street.jpg');
$sam = new FastSAM('models/fastsam.safetensors');

// segment() returns a pair: a [N, H, W] mask Tensor and an array of N scores
[$masks, $scores] = $sam->segment($img);
```
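
Each of the N masks is an H×W binary map. A frequent follow-up is to derive a bounding box per mask, for example to crop the object or compare it with detector output. The hypothetical `maskToBox()` below is a plain-PHP sketch on one mask given as rows of 0/1 values; getting a mask out of the Tensor into that form is left out:

```php
<?php
// Tight [x, y, w, h] box around the non-zero pixels of a binary mask given as
// rows of 0/1 values; returns null for an empty mask.
function maskToBox(array $mask): ?array
{
    $minX = $minY = PHP_INT_MAX;
    $maxX = $maxY = -1;
    foreach ($mask as $y => $row) {
        foreach ($row as $x => $v) {
            if ($v) {
                $minX = min($minX, $x);
                $maxX = max($maxX, $x);
                $minY = min($minY, $y);
                $maxY = max($maxY, $y);
            }
        }
    }
    if ($maxX < 0) {
        return null; // no foreground pixels
    }
    return [$minX, $minY, $maxX - $minX + 1, $maxY - $minY + 1];
}

$mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
];
echo implode(', ', maskToBox($mask)), "\n"; // 1, 1, 2, 2
```

The result uses the same [x, y, w, h] convention as the detectors, so it can be passed to the same downstream code.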

Example: Detect + Classify Pipeline

```php
use Pml\Vision\{Image, Yolo11n, MobileNetV3};

$detector = new Yolo11n();
$detector->loadWeights('models/yolo11n.safetensors');

$classifier = new MobileNetV3('small', 1000);
$classifier->loadWeights('models/mobilenetv3_small.safetensors');

$img  = Image::fromFile('scene.jpg');
$dets = $detector->detect($img);

foreach ($dets as $det) {
    [$x, $y, $w, $h] = array_map('intval', $det['bbox']);   // crop() takes ints
    // crop() returns $this and modifies the image in place, so crop a freshly
    // loaded copy rather than $img, which is reused on every iteration.
    $crop  = Image::fromFile('scene.jpg')->crop($x, $y, $w, $h);
    $label = $classifier->classify($crop)[0];   // best of the top-5 predictions
    printf("Detected class %d (fine-grained: %d @ %.2f)\n",
        $det['class'], $label['class'], $label['score']);
}
```
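
Detector boxes can extend past the image edges or carry fractional coordinates, while crop() takes integers. A small guard such as the hypothetical clampBox() below (not part of PML) rounds a box outward to whole pixels and clips it to the image, returning null when nothing is left; in the loop above it could be applied to $det['bbox'] with $img->width() and $img->height() before calling crop():

```php
<?php
// Clip an [x, y, w, h] box (possibly fractional or out of bounds) to a
// $imgW × $imgH image and convert it to integer pixel coordinates.
// Returns null if the clipped box is empty.
function clampBox(array $bbox, int $imgW, int $imgH): ?array
{
    [$x, $y, $w, $h] = $bbox;
    $x0 = max(0, (int) floor($x));
    $y0 = max(0, (int) floor($y));
    $x1 = min($imgW, (int) ceil($x + $w));
    $y1 = min($imgH, (int) ceil($y + $h));
    if ($x1 <= $x0 || $y1 <= $y0) {
        return null;
    }
    return [$x0, $y0, $x1 - $x0, $y1 - $y0];
}

echo implode(', ', clampBox([-5.5, 10.2, 50.0, 30.0], 40, 30)), "\n"; // 0, 10, 40, 20
```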