Vision
Vision Module
A complete computer vision pipeline: image I/O, transforms, augmentation,
classification (MobileNetV3), and object detection (YOLO11n, NanoDet, PicoDet, SSDLite).
Backed by libvision.so — 106 C functions across 8 source files.
Backend & Enums
Pml\Lib\VisionEngine is the FFI singleton for libvision.so.
All Vision classes call it via VisionEngine::get().
Build: see Getting Started → Building libvision.so.
| Enum class | Constants | Description |
Interp | NEAREST, BILINEAR, BICUBIC, AREA | Interpolation modes for resize/rotate |
ColorSpace | RGB, BGR, GRAY, HSV, LAB, YUV | Color space identifiers |
PixelFormat | UINT8, FLOAT32 | Pixel data type |
Layout | HWC, CHW | Memory layout for tensor export |
Border | CONSTANT, REFLECT, REPLICATE | Padding border mode |
Image
Pml\Vision\Image wraps a VisionImage* C struct.
All operations return $this for fluent chaining and are zero-copy where possible.
Loading
use Pml\Vision\Image;
$img = Image::fromFile('photo.jpg'); // JPEG/PNG/BMP/TIFF
$img = Image::fromTensor($tensor); // wrap a [H, W, C] or [C, H, W] Tensor
$img = Image::blank(width: 224, height: 224, channels: 3);
Metadata
| Method | Returns |
width(): int | Pixel width |
height(): int | Pixel height |
channels(): int | Number of channels (1, 3, or 4) |
format(): int | PixelFormat constant |
layout(): int | Layout constant (HWC or CHW) |
colorSpace(): int | ColorSpace constant |
Transforms (all return self)
| Method | Description |
resize(int $w, int $h, int $interp) | Resize to exact dimensions |
resizeLongEdge(int $edge, int $interp) | Resize so longest edge = $edge, preserving aspect ratio |
crop(int $x, int $y, int $w, int $h) | Crop a region |
centerCrop(int $w, int $h) | Center crop |
pad(int $top, int $bot, int $left, int $right, array $color, int $mode) | Add border padding |
flipHorizontal() | Mirror left-right |
flipVertical() | Mirror top-bottom |
rotate90(int $k = 1) | Rotate 90° × k |
rotate(float $angle, int $interp, array $border) | Arbitrary rotation |
toGrayscale() | Convert to 1-channel gray |
rgbToBgr() | Swap R and B channels |
toFloat32(float $scale = 1/255.0) | Cast pixels to float and scale |
toUint8(float $scale = 255.0) | Cast back to uint8 |
toHWC() | Ensure HWC memory layout |
toCHW() | Transpose to CHW layout (PyTorch-style) |
normalize(array $mean, array $stdDev) | Per-channel (x − μ) / σ. $mean and $stdDev are float arrays [C]. |
adjustBrightness(float $delta) | Add constant to all pixels |
adjustContrast(float $factor) | Scale contrast around 128 |
Export
| Method | Description |
toTensor(): Tensor | Export to a Tensor (shares C memory — zero-copy) |
toArray(): array | Copy to PHP array (slow — avoid in hot paths) |
save(string $path): void | Save to JPEG/PNG (by extension) |
encode(string $ext = '.png'): string | Encode to in-memory byte string |
Augmentation
Pml\Vision\Augmentation chains random transforms.
It is callable — pass an Image to __invoke() to apply the chain.
use Pml\Vision\Augmentation;
$aug = (new Augmentation(seed: 42))
->randomFlipH(prob: 0.5)
->randomResizeCrop(224, 224, minScale: 0.8, maxScale: 1.0)
->randomBrightness(maxDelta: 0.2)
->randomContrast(0.8, 1.2)
->randomHue(maxDelta: 0.1);
foreach ($images as $img) {
$augmented = $aug($img);
}
| Method | Parameters | Description |
randomFlipH(float $prob) | prob=0.5 | Horizontal flip with probability |
randomFlipV(float $prob) | prob=0.5 | Vertical flip |
randomCrop(int $w, int $h) | — | Random crop to size |
randomResizeCrop(int $w, int $h, float $minScale, float $maxScale) | — | Random scale then crop (torchvision-style) |
randomBrightness(float $maxDelta) | 0.2 | Random ±brightness |
randomContrast(float $lo, float $hi) | 0.8, 1.2 | Random contrast multiplier |
randomHue(float $maxDelta) | 0.1 | Random hue shift in HSV |
randomRotation(float $maxAngle, int $interp, array $border) | 15° | Random ±rotation |
cutout(int $nHoles, int $holeSize, float $fill) | 1, 16, 0.0 | CutOut regularization |
mixup(Image $a, Image $b, float $alpha) | alpha=0.2 | MixUp augmentation — returns [Image, float $lambda] |
cutmix(Image $a, Image $b, float $alpha) | alpha=1.0 | CutMix — returns [Image, float $lambda] |
Classification Models
MobileNetV3
use Pml\Vision\MobileNetV3;
$net = new MobileNetV3(variant: 'large', numClasses: 1000);
$net->loadWeights('models/mobilenetv3_large.safetensors');
$img = Image::fromFile('cat.jpg')
->resize(224, 224)
->toFloat32()
->normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
->toCHW();
$topK = $net->classify($img); // returns [['class' => 281, 'score' => 0.92], ...]
| Method | Description |
forward(Tensor $x): Tensor | Full forward pass → [batch, numClasses] logits |
extract(Tensor $x): Tensor | Extract feature vector before classification head |
preprocess(Image $img): Tensor | Resize + normalize in one call |
classify(Image $img): array | Top-5 class predictions with scores |
loadWeights(string $path): void | Load SafeTensors checkpoint |
parameters(): array | All parameter tensors (for fine-tuning) |
featDim(): int | Feature vector dimension before head |
Object Detection
All detectors follow the same interface: detect(Image $img): array returning [['bbox' => [x, y, w, h], 'score' => float, 'class' => int], ...].
| Class | Notes |
Yolo11n |
YOLOv11 nano variant. Fastest. ~640×640 input. COCO 80 classes. |
NanoDet |
NanoDet-Plus. Very lightweight, mobile-friendly. 320×320 input. |
PicoDet |
PicoDet (Baidu PaddleDetection). Excellent accuracy-per-FLOP for mobile. |
SSDLite |
SSD with MobileNetV3 backbone. Multi-scale detection with 6 detection heads. |
use Pml\Vision\Yolo11n;
$yolo = new Yolo11n();
$yolo->loadWeights('models/yolo11n.safetensors');
$img = Image::fromFile('street.jpg');
$dets = $yolo->detect($img);
foreach ($dets as $d) {
printf("Class %d @ %.1f%% [%d,%d %dx%d]\n",
$d['class'], $d['score'] * 100,
...$d['bbox']
);
}
Segmentation
| Class | Notes |
FastSAM |
FastSAM (Segment Anything Model, fast variant). Instance segmentation. Returns binary masks per detected object. |
use Pml\Vision\FastSAM;
$sam = new FastSAM('models/fastsam.safetensors');
$masks = $sam->segment($img); // returns [Tensor $masks [N, H, W], array $scores]
Example: Detect + Classify Pipeline
use Pml\Vision\{Image, Yolo11n, MobileNetV3, Augmentation};
$detector = new Yolo11n();
$detector->loadWeights('models/yolo11n.safetensors');
$classifier = new MobileNetV3('small', 1000);
$classifier->loadWeights('models/mobilenetv3_small.safetensors');
$img = Image::fromFile('scene.jpg');
$dets = $detector->detect($img);
foreach ($dets as $det) {
[$x, $y, $w, $h] = $det['bbox'];
$crop = $img->crop($x, $y, $w, $h);
$label = $classifier->classify($crop)[0];
printf("Detected class %d (fine-grained: %d @ %.2f)\n",
$det['class'], $label['class'], $label['score']);
}