Vision Module

A computer vision pipeline covering image I/O, transforms, augmentation, classification (MobileNetV3), object detection (YOLO11n, NanoDet, PicoDet, SSDLite), and segmentation (FastSAM). It is backed by libvision.so, which exposes 106 C functions across 8 source files.

Backend & Enums

Pml\Lib\VisionEngine is the FFI singleton that loads libvision.so. Every Vision class reaches the C library through VisionEngine::get(). For build instructions, see Getting Started → Building libvision.so.
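The sketch below shows the lazily initialized singleton pattern that `VisionEngine::get()` implies: the expensive library load runs once, and every later call returns the same object. The class name `EngineSketch` and its `$loads` counter are hypothetical. A real engine would presumably call PHP's `FFI::cdef()` with libvision's C declarations at the marked spot. That call is left out so the sketch runs without the library.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a VisionEngine-style FFI singleton.
final class EngineSketch
{
    private static ?EngineSketch $instance = null;

    /** Counts how many times the (expensive) library load ran. */
    public static int $loads = 0;

    private function __construct()
    {
        // A real engine would load the shared library once here, e.g.
        // FFI::cdef($cHeaderDeclarations, 'libvision.so');
        self::$loads++;
    }

    /** Returns the single shared instance, creating it on first use. */
    public static function get(): EngineSketch
    {
        return self::$instance ??= new EngineSketch();
    }
}

$a = EngineSketch::get();
$b = EngineSketch::get();
var_dump($a === $b, EngineSketch::$loads); // prints bool(true) and int(1)
```

The private constructor makes `get()` the only way to obtain an instance. Every caller therefore shares the same loaded library handle.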

Enum class	Constants	Description
`Interp`	`NEAREST, BILINEAR, BICUBIC, AREA`	Interpolation modes for resize/rotate
`ColorSpace`	`RGB, BGR, GRAY, HSV, LAB, YUV`	Color space identifiers
`PixelFormat`	`UINT8, FLOAT32`	Pixel data type
`Layout`	`HWC, CHW`	Memory layout for tensor export
`Border`	`CONSTANT, REFLECT, REPLICATE`	Padding border mode
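The three `Border` modes differ only in how a coordinate that falls outside the image is mapped back inside it. The sketch below shows that mapping along one axis. The helper `borderIndex()` is hypothetical, and it assumes REFLECT mirrors with the edge pixel repeated, as OpenCV's `BORDER_REFLECT` does. libvision may use a different reflection convention.

```php
<?php
declare(strict_types=1);

/**
 * Map coordinate $i into [0, $n) for a 1-D row of $n pixels.
 * Returns null for CONSTANT mode, meaning "use the fill colour".
 * Hypothetical helper; mode names mirror the Border enum.
 */
function borderIndex(int $i, int $n, string $mode): ?int
{
    if ($i >= 0 && $i < $n) {
        return $i;                                   // in range: unchanged
    }
    switch ($mode) {
        case 'CONSTANT':
            return null;                             // caller substitutes the pad colour
        case 'REPLICATE':
            return max(0, min($n - 1, $i));          // clamp to the nearest edge pixel
        case 'REFLECT':
            // Mirror with the edge repeated: ... b a | a b c d | d c ...
            $period = 2 * $n;
            $j = (($i % $period) + $period) % $period; // wrap into [0, 2n)
            return $j < $n ? $j : $period - 1 - $j;
        default:
            throw new InvalidArgumentException("Unknown border mode: $mode");
    }
}

$row = ['a', 'b', 'c', 'd'];
foreach (['CONSTANT', 'REPLICATE', 'REFLECT'] as $mode) {
    $out = [];
    for ($i = -2; $i < 6; $i++) {
        $k = borderIndex($i, count($row), $mode);
        $out[] = $k === null ? '0' : $row[$k];
    }
    echo str_pad($mode, 10), implode(' ', $out), "\n";
}
// CONSTANT  0 0 a b c d 0 0
// REPLICATE a a a b c d d d
// REFLECT   b a a b c d d c
```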

Image

Pml\Vision\Image wraps a VisionImage* C struct. Every transform returns $this for fluent chaining, which means it modifies the image in place. Transforms also avoid copying pixel data where possible.

Loading

use Pml\Vision\Image;

$img = Image::fromFile('photo.jpg');       // JPEG/PNG/BMP/TIFF
$img = Image::fromTensor($tensor);          // wrap a [H, W, C] or [C, H, W] Tensor
$img = Image::blank(width: 224, height: 224, channels: 3);

Metadata

Method	Returns
`width(): int`	Pixel width
`height(): int`	Pixel height
`channels(): int`	Number of channels (1, 3, or 4)
`format(): int`	PixelFormat constant
`layout(): int`	Layout constant (HWC or CHW)
`colorSpace(): int`	ColorSpace constant

Transforms (all return $this)

Method	Description
`resize(int $w, int $h, int $interp)`	Resize to exact dimensions
`resizeLongEdge(int $edge, int $interp)`	Resize so the longest edge equals $edge, preserving aspect ratio
`crop(int $x, int $y, int $w, int $h)`	Crop a region
`centerCrop(int $w, int $h)`	Center crop
`pad(int $top, int $bot, int $left, int $right, array $color, int $mode)`	Add border padding. $color fills the border in Border::CONSTANT mode.
`flipHorizontal()`	Mirror left-right
`flipVertical()`	Mirror top-bottom
`rotate90(int $k = 1)`	Rotate 90° × k
`rotate(float $angle, int $interp, array $border)`	Arbitrary rotation
`toGrayscale()`	Convert to 1-channel gray
`rgbToBgr()`	Swap R and B channels
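`toFloat32(float $scale = 1/255.0)`	Cast pixels to float32 and multiply by $scale (the default maps [0, 255] to [0, 1])
`toUint8(float $scale = 255.0)`	Multiply by $scale and cast back to uint8 (the inverse of the default toFloat32())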
`toHWC()`	Ensure HWC memory layout
`toCHW()`	Transpose to CHW layout (PyTorch-style)
`normalize(array $mean, array $stdDev)`	Per-channel (x − μ) / σ, where `$mean` and `$stdDev` are float arrays of length C
`adjustBrightness(float $delta)`	Add $delta to every pixel value
`adjustContrast(float $factor)`	Scale each pixel's distance from 128 by $factor
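The geometric transforms reduce to simple integer arithmetic. The sketch below computes two things: the output size of a `resizeLongEdge()`-style resize and the top-left offset of a `centerCrop()`. The helper names are hypothetical. The rounding choices (round to nearest for the scaled sides, floor for the crop offset) are assumptions, so libvision may differ by a pixel.

```php
<?php
declare(strict_types=1);

/** Output [w, h] when the longer side is scaled to $edge (aspect ratio kept). */
function longEdgeSize(int $w, int $h, int $edge): array
{
    $scale = $edge / max($w, $h);
    return [
        max(1, (int) round($w * $scale)),
        max(1, (int) round($h * $scale)),
    ];
}

/** Top-left [x, y] of a $cw x $ch crop centred in a $w x $h image. */
function centerCropOrigin(int $w, int $h, int $cw, int $ch): array
{
    if ($cw > $w || $ch > $h) {
        throw new InvalidArgumentException('Crop is larger than the image');
    }
    return [intdiv($w - $cw, 2), intdiv($h - $ch, 2)];
}

// A 1920x1080 frame resized so its long edge is 640, then centre-cropped to 320x320.
[$rw, $rh] = longEdgeSize(1920, 1080, 640);
[$cx, $cy] = centerCropOrigin($rw, $rh, 320, 320);
printf("resized to %dx%d, crop at (%d, %d)\n", $rw, $rh, $cx, $cy);
// resized to 640x360, crop at (160, 20)
```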

Export

Method	Description
`toTensor(): Tensor`	Export to a Tensor (zero-copy: shares the underlying C memory)
`toArray(): array`	Copy to a PHP array (slow; avoid in hot paths)
`save(string $path): void`	Save as JPEG or PNG, chosen by file extension
`encode(string $ext = '.png'): string`	Encode to an in-memory byte string in the format given by $ext
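`toFloat32()`, `normalize()` and `toCHW()` are usually applied together before `toTensor()`. The sketch below reproduces that arithmetic on a flat PHP array in HWC order:

1. Scale each value by 1/255.
2. Apply (x − μ) / σ per channel.
3. Reorder to CHW, where element (c, y, x) lives at offset c·H·W + y·W + x instead of (y·W + x)·C + c.

The function name is hypothetical, and the sketch is only illustrative. libvision does this work in C on the image buffer.

```php
<?php
declare(strict_types=1);

/**
 * Convert a flat uint8 HWC buffer to a normalized float CHW buffer.
 * @param int[]   $hwc  pixel values, length H*W*C, in HWC order
 * @param float[] $mean per-channel mean (applied after scaling to [0, 1])
 * @param float[] $std  per-channel standard deviation
 * @return float[] values in CHW order
 */
function hwcToNormalizedChw(array $hwc, int $h, int $w, int $c, array $mean, array $std): array
{
    $chw = array_fill(0, $h * $w * $c, 0.0);
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            for ($ch = 0; $ch < $c; $ch++) {
                $v = $hwc[($y * $w + $x) * $c + $ch] / 255.0;  // like toFloat32()
                $v = ($v - $mean[$ch]) / $std[$ch];            // like normalize()
                $chw[$ch * $h * $w + $y * $w + $x] = $v;       // like toCHW()
            }
        }
    }
    return $chw;
}

// A 1x2 RGB image: one pure-red pixel, one mid-grey pixel.
$hwc = [255, 0, 0,   128, 128, 128];
$out = hwcToNormalizedChw($hwc, 1, 2, 3, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]);
print_r(array_map(fn ($v) => round($v, 3), $out));
```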

Augmentation

Pml\Vision\Augmentation chains random transforms. Instances are callable: invoking one with an Image (through __invoke()) applies the whole chain.

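use Pml\Vision\{Augmentation, Image};

$aug = (new Augmentation(seed: 42))
    ->randomFlipH(prob: 0.5)
    ->randomResizeCrop(224, 224, minScale: 0.8, maxScale: 1.0)
    ->randomBrightness(maxDelta: 0.2)
    ->randomContrast(0.8, 1.2)
    ->randomHue(maxDelta: 0.1);

// Example input: every JPEG in a (hypothetical) training directory
$images = array_map([Image::class, 'fromFile'], glob('train/*.jpg'));

$batch = [];
foreach ($images as $img) {
    $batch[] = $aug($img);   // __invoke() applies the whole chain
}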
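The chaining-plus-`__invoke()` design fits in a few lines of plain PHP. `AugChainSketch` below is hypothetical and works on any value rather than an `Image`. Each step is a callable applied with some probability. A fixed seed makes the random decisions reproducible, which is presumably what the `seed:` argument of `Augmentation` is for. The sketch uses PHP 8.2's `Random\Randomizer` with an `Mt19937` engine, so each chain has its own generator and does not disturb the global `mt_rand()` state.

```php
<?php
declare(strict_types=1);

use Random\Engine\Mt19937;
use Random\Randomizer;

/** Hypothetical, Image-agnostic sketch of a seeded, callable transform chain. */
final class AugChainSketch
{
    /** @var list<array{0: callable, 1: float}> */
    private array $steps = [];
    private Randomizer $rng;

    public function __construct(int $seed)
    {
        $this->rng = new Randomizer(new Mt19937($seed));
    }

    /** Append a transform that fires with probability $prob; returns $this for chaining. */
    public function add(callable $fn, float $prob = 1.0): self
    {
        $this->steps[] = [$fn, $prob];
        return $this;
    }

    /** Uniform float in [0, 1) drawn from this chain's own generator. */
    private function uniform(): float
    {
        return $this->rng->getInt(0, 999_999) / 1_000_000.0;
    }

    /** Apply every step, in order, to $x. */
    public function __invoke(mixed $x): mixed
    {
        foreach ($this->steps as [$fn, $prob]) {
            if ($this->uniform() < $prob) {
                $x = $fn($x);
            }
        }
        return $x;
    }
}

// Toy "image": a list of ints. Reverse it half the time, always add 1.
$makeChain = fn (int $seed) => (new AugChainSketch($seed))
    ->add(fn (array $px) => array_reverse($px), 0.5)
    ->add(fn (array $px) => array_map(fn (int $v) => $v + 1, $px));

$a = $makeChain(42);
$b = $makeChain(42);
$runA = array_map($a, [[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
$runB = array_map($b, [[1, 2, 3], [4, 5, 6], [7, 8, 9]]);
var_dump($runA === $runB); // same seed, same decisions
```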

Method	Defaults	Description
`randomFlipH(float $prob)`	prob = 0.5	Horizontal flip with probability $prob
`randomFlipV(float $prob)`	prob = 0.5	Vertical flip with probability $prob
`randomCrop(int $w, int $h)`	none	Random crop to $w × $h
`randomResizeCrop(int $w, int $h, float $minScale, float $maxScale)`	none	Random scale, then crop to $w × $h (torchvision-style)
`randomBrightness(float $maxDelta)`	maxDelta = 0.2	Random brightness shift in [−$maxDelta, +$maxDelta]
`randomContrast(float $lo, float $hi)`	lo = 0.8, hi = 1.2	Random contrast multiplier in [$lo, $hi]
`randomHue(float $maxDelta)`	maxDelta = 0.1	Random hue shift in HSV space
`randomRotation(float $maxAngle, int $interp, array $border)`	maxAngle = 15°	Random rotation in [−$maxAngle, +$maxAngle]
`cutout(int $nHoles, int $holeSize, float $fill)`	nHoles = 1, holeSize = 16, fill = 0.0	CutOut regularization: fills $nHoles square patches of side $holeSize with $fill
`mixup(Image $a, Image $b, float $alpha)`	alpha = 0.2	MixUp; returns [Image, float $lambda]
`cutmix(Image $a, Image $b, float $alpha)`	alpha = 1.0	CutMix; returns [Image, float $lambda]
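`mixup()` and `cutmix()` both return a mixing coefficient λ alongside the image. In the original MixUp and CutMix formulations, λ is drawn from Beta(α, α).

- MixUp blends pixels as λ·a + (1 − λ)·b.
- CutMix pastes a box from b whose sides scale with √(1 − λ). It then recomputes λ from the area actually pasted after clipping.

The sketch below shows both computations for a given λ. Beta sampling is left out, and the CutMix box centre is passed in rather than drawn at random. The function names are hypothetical, and this is not libvision's code.

```php
<?php
declare(strict_types=1);

/** MixUp: element-wise lam * a + (1 - lam) * b. */
function mixupPixels(array $a, array $b, float $lam): array
{
    return array_map(fn ($x, $y) => $lam * $x + (1.0 - $lam) * $y, $a, $b);
}

/**
 * CutMix box for a W x H image: side lengths scale with sqrt(1 - lam),
 * the box is centred at ($cx, $cy) and clipped to the image.
 * Returns [x0, y0, x1, y1, adjustedLam] with x1/y1 exclusive.
 */
function cutmixBox(int $w, int $h, float $lam, int $cx, int $cy): array
{
    $ratio = sqrt(1.0 - $lam);
    $bw = (int) round($w * $ratio);
    $bh = (int) round($h * $ratio);
    $x0 = max(0, $cx - intdiv($bw, 2));
    $y0 = max(0, $cy - intdiv($bh, 2));
    $x1 = min($w, $cx + intdiv($bw, 2));
    $y1 = min($h, $cy + intdiv($bh, 2));
    // Adjusted lambda = fraction of pixels that still come from image a.
    $adjusted = 1.0 - (($x1 - $x0) * ($y1 - $y0)) / ($w * $h);
    return [$x0, $y0, $x1, $y1, $adjusted];
}

print_r(mixupPixels([0, 100], [100, 0], 0.25));  // 75 and 25
print_r(cutmixBox(100, 100, 0.75, 50, 50));      // box 25..75 in both axes, lambda 0.75
print_r(cutmixBox(100, 100, 0.75, 0, 0));        // clipped at the corner, lambda 0.9375
```

Recomputing λ after clipping keeps the label weights consistent with the pixels actually pasted when the box runs off the image.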

Classification Models

MobileNetV3

use Pml\Vision\{Image, MobileNetV3};

$net = new MobileNetV3(variant: 'large', numClasses: 1000);
$net->loadWeights('models/mobilenetv3_large.safetensors');

$img = Image::fromFile('cat.jpg')
    ->resize(224, 224)
    ->toFloat32()                                                 // uint8 [0, 255] -> float [0, 1]
    ->normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])     // ImageNet channel mean / std
    ->toCHW();

$topK = $net->classify($img);   // returns [['class' => 281, 'score' => 0.92], ...]

Method	Description
`forward(Tensor $x): Tensor`	Full forward pass → [batch, numClasses] logits
`extract(Tensor $x): Tensor`	Extract the feature vector produced before the classification head
`preprocess(Image $img): Tensor`	Resize + normalize in one call
`classify(Image $img): array`	Top-5 class predictions with scores
`loadWeights(string $path): void`	Load SafeTensors checkpoint
`parameters(): array`	All parameter tensors (for fine-tuning)
`featDim(): int`	Feature vector dimension before head
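`classify()` reports the top five classes with scores. A common way to get such scores from the `[batch, numClasses]` logits of `forward()` is a softmax followed by a sort. The sketch below does this for a single row of logits in plain PHP. The document does not say whether `classify()` reports softmax probabilities or some other score, so treat this as an illustration. The function name is hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Softmax over one row of logits, then the $k highest-scoring classes.
 * @param float[] $logits
 * @return list<array{class: int, score: float}>
 */
function topKSoftmax(array $logits, int $k = 5): array
{
    $max = max($logits);                                   // subtract max for numerical stability
    $exp = array_map(fn (float $z) => exp($z - $max), $logits);
    $sum = array_sum($exp);
    $probs = array_map(fn (float $e) => $e / $sum, $exp);  // keys are class indices

    arsort($probs);                                        // sort by score, keep keys
    $top = [];
    foreach (array_slice($probs, 0, $k, true) as $class => $score) {
        $top[] = ['class' => $class, 'score' => $score];
    }
    return $top;
}

print_r(topKSoftmax([1.0, 3.0, 0.5, 2.0], k: 2));  // classes 1 then 3
```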

Object Detection

All detectors follow the same interface: detect(Image $img): array returning [['bbox' => [x, y, w, h], 'score' => float, 'class' => int], ...].

Class	Notes
`Yolo11n`	YOLO11 nano variant (Ultralytics). Fastest. ~640×640 input. 80 COCO classes.
`NanoDet`	NanoDet-Plus. Very lightweight, mobile-friendly. 320×320 input.
`PicoDet`	PicoDet (Baidu PaddleDetection). Excellent accuracy-per-FLOP for mobile.
`SSDLite`	SSD with MobileNetV3 backbone. Multi-scale detection with 6 detection heads.
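Detectors like these typically produce many overlapping candidate boxes and prune them with non-maximum suppression (NMS). The document does not say whether `detect()` already applies NMS. The sketch below is therefore only an illustration of greedy per-class NMS over the `['bbox' => [x, y, w, h], 'score', 'class']` format. It assumes (x, y) is the top-left corner. The function names and the 0.45 IoU threshold are hypothetical choices.

```php
<?php
declare(strict_types=1);

/** Intersection-over-union of two [x, y, w, h] boxes with (x, y) at the top-left. */
function iou(array $a, array $b): float
{
    $ix = max(0, min($a[0] + $a[2], $b[0] + $b[2]) - max($a[0], $b[0]));
    $iy = max(0, min($a[1] + $a[3], $b[1] + $b[3]) - max($a[1], $b[1]));
    $inter = $ix * $iy;
    $union = $a[2] * $a[3] + $b[2] * $b[3] - $inter;
    return $union > 0 ? $inter / $union : 0.0;
}

/** Greedy per-class NMS: keep the best box, drop same-class boxes overlapping it. */
function nms(array $dets, float $iouThresh = 0.45): array
{
    usort($dets, fn (array $p, array $q) => $q['score'] <=> $p['score']);
    $kept = [];
    foreach ($dets as $d) {
        foreach ($kept as $k) {
            if ($k['class'] === $d['class'] && iou($k['bbox'], $d['bbox']) > $iouThresh) {
                continue 2;                  // suppressed by a higher-scoring box
            }
        }
        $kept[] = $d;
    }
    return $kept;
}

$candidates = [
    ['bbox' => [10, 10, 100, 100], 'score' => 0.90, 'class' => 0],
    ['bbox' => [15, 12, 100, 100], 'score' => 0.75, 'class' => 0], // near-duplicate
    ['bbox' => [15, 12, 100, 100], 'score' => 0.60, 'class' => 2], // other class
    ['bbox' => [300, 50, 80, 120], 'score' => 0.50, 'class' => 0], // far away
];
printf("%d of %d boxes kept\n", count(nms($candidates)), count($candidates)); // 3 of 4
```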

use Pml\Vision\{Image, Yolo11n};

$yolo = new Yolo11n();
$yolo->loadWeights('models/yolo11n.safetensors');

$img  = Image::fromFile('street.jpg');
$dets = $yolo->detect($img);

foreach ($dets as $d) {
    printf("Class %d @ %.1f%% [%d,%d %dx%d]\n",
        $d['class'], $d['score'] * 100,
        ...$d['bbox']
    );
}
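YOLO-family models are usually fed a letterboxed input: the image is scaled to fit the square input without distortion, and the remainder is padded. Raw predictions must then be mapped back to original pixel coordinates. The document does not say whether `Yolo11n::detect()` does this internally. The pipeline example crops with the returned boxes directly, which suggests the boxes are already in image coordinates. The sketch below only illustrates the arithmetic. The helper names and the centred padding are assumptions.

```php
<?php
declare(strict_types=1);

/** Scale and centred padding that fit a $w x $h image into a $size x $size square. */
function letterboxParams(int $w, int $h, int $size = 640): array
{
    $scale = min($size / $w, $size / $h);
    $padX = ($size - $w * $scale) / 2;
    $padY = ($size - $h * $scale) / 2;
    return [$scale, $padX, $padY];
}

/** Map an [x, y, w, h] box from letterboxed-input space back to the original image. */
function unletterbox(array $box, float $scale, float $padX, float $padY): array
{
    return [
        ($box[0] - $padX) / $scale,
        ($box[1] - $padY) / $scale,
        $box[2] / $scale,
        $box[3] / $scale,
    ];
}

// A 1280x720 frame: scale 0.5, so the 640x360 content is padded by 140 px top and bottom.
[$s, $px, $py] = letterboxParams(1280, 720);
print_r(unletterbox([100, 240, 64, 32], $s, $px, $py));  // x=200, y=200, w=128, h=64
```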

Segmentation

Class	Notes
`FastSAM`	FastSAM (Fast Segment Anything), a CNN-based real-time alternative to Meta's Segment Anything Model (SAM). Instance segmentation; returns a binary mask per detected object.

use Pml\Vision\{FastSAM, Image};

$sam = new FastSAM('models/fastsam.safetensors');
$img = Image::fromFile('street.jpg');

[$masks, $scores] = $sam->segment($img);  // $masks: Tensor [N, H, W]; $scores: array
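Each of the N masks returned by `segment()` is a binary H×W map. A frequent next step is to derive the object's pixel area and tight bounding box from its mask. The sketch below does that for one mask given as a nested PHP array of 0/1 values; in practice the mask would come out of the `[N, H, W]` Tensor. The function name is hypothetical. The box uses the same [x, y, w, h] convention as the detectors.

```php
<?php
declare(strict_types=1);

/**
 * Area and tight [x, y, w, h] bounding box of a binary mask (rows of 0/1).
 * Returns null for an empty mask.
 */
function maskStats(array $mask): ?array
{
    $area = 0;
    $minX = PHP_INT_MAX;
    $minY = PHP_INT_MAX;
    $maxX = -1;
    $maxY = -1;
    foreach ($mask as $y => $row) {
        foreach ($row as $x => $v) {
            if ($v) {
                $area++;
                $minX = min($minX, $x);
                $maxX = max($maxX, $x);
                $minY = min($minY, $y);
                $maxY = max($maxY, $y);
            }
        }
    }
    if ($area === 0) {
        return null;
    }
    return ['area' => $area, 'bbox' => [$minX, $minY, $maxX - $minX + 1, $maxY - $minY + 1]];
}

$mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
];
print_r(maskStats($mask));   // area 5, bbox [1, 1, 3, 2]
```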

Example: Detect + Classify Pipeline

use Pml\Vision\{Image, Yolo11n, MobileNetV3};

$detector = new Yolo11n();
$detector->loadWeights('models/yolo11n.safetensors');

$classifier = new MobileNetV3('small', 1000);
$classifier->loadWeights('models/mobilenetv3_small.safetensors');

$path = 'scene.jpg';
$dets = $detector->detect(Image::fromFile($path));

foreach ($dets as $det) {
    [$x, $y, $w, $h] = array_map('intval', $det['bbox']);   // crop() takes ints
    // Transforms modify the image in place and return $this, so crop a freshly
    // loaded copy; cropping one shared Image would shrink it for the next box.
    $crop  = Image::fromFile($path)->crop($x, $y, $w, $h);
    $label = $classifier->classify($crop)[0];                 // best of the top-5 list
    printf("Detected class %d (fine-grained: %d @ %.2f)\n",
        $det['class'], $label['class'], $label['score']);
}
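Detector boxes can extend past the image border or carry fractional coordinates, while `crop()` takes integer arguments. Rounding each box outward and clipping it to the image before cropping avoids out-of-range crops. The helper below is a hypothetical sketch of that step. It would be called with `$img->width()` and `$img->height()`, and it returns null for boxes that end up empty after clipping.

```php
<?php
declare(strict_types=1);

/**
 * Round an [x, y, w, h] box outward to whole pixels and clip it to a $imgW x $imgH image.
 * Returns null if nothing of the box lies inside the image.
 */
function clampBox(array $bbox, int $imgW, int $imgH): ?array
{
    [$x, $y, $w, $h] = $bbox;
    $x0 = max(0, (int) floor($x));
    $y0 = max(0, (int) floor($y));
    $x1 = min($imgW, (int) ceil($x + $w));
    $y1 = min($imgH, (int) ceil($y + $h));
    if ($x1 <= $x0 || $y1 <= $y0) {
        return null;
    }
    return [$x0, $y0, $x1 - $x0, $y1 - $y0];
}

var_dump(clampBox([-4.2, 10.7, 50.0, 30.0], 640, 480));  // [0, 10, 46, 31]
var_dump(clampBox([700, 20, 30, 30], 640, 480));         // NULL
```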