Training System

The high-level Trainer + TrainingArguments API orchestrates epoch loops, LR scheduling, early stopping, gradient clipping, checkpointing, and callbacks for Sequential models.

Trainer

Pml\Training\Trainer is the high-level training orchestrator. It wraps a Sequential model and a TrainingArguments config, runs the epoch loop, steps the LR scheduler, handles early stopping, saves checkpoints, and fires callbacks at each lifecycle event.

__construct (Sequential $model, TrainingArguments $args)

Binds the model and args. If the model's optimizer implements LearningRateAware, the LR is synced to $args->learningRate immediately.

train (Dataset $dataset, ?Dataset $validation = null): TrainingResult

Runs the full training loop. Returns a TrainingResult with loss history and best-epoch metadata.

$args = new TrainingArguments(
    epochs:       30,
    batchSize:    128,
    learningRate: 3e-4,
    lrSchedule:   'cosine',
    patience:     5,
    outputDir:    'ckpt/run1',
);

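// $model is a configured Sequential; MyCallback is any class implementing TrainerCallback.
$trainer = new Trainer($model, $args);
$trainer->addCallback(new MyCallback());
// $trainDs and $valDs are Dataset instances; the validation set is optional.
$result = $trainer->train($trainDs, $valDs);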

addCallback (TrainerCallback $callback): void

Registers a callback. Multiple callbacks can be added; they fire in registration order.

TrainingArguments

Value object holding all training hyperparameters. All properties are readonly.

Property	Type	Default	Description
`epochs`	int	10	Training epochs
`batchSize`	int	32	Mini-batch size
`patience`	int	0	Early stopping patience (0 = disabled)
`minDelta`	float	1e-4	Minimum improvement to reset patience counter
`learningRate`	float	0.001	Initial learning rate
`lrSchedule`	string	'none'	`'none'` · `'cosine'` · `'step'` · `'linear'`
`lrDecay`	float	0.1	Decay factor for `'step'` schedule
`lrStepSize`	int	5	Epochs between LR drops (step schedule)
`warmupEpochs`	int	0	Linear warm-up epochs from 0 → learningRate
`mixedPrecision`	bool	false	Scaffold for future AMP support
`outputDir`	?string	null	Directory for checkpoints (null = no auto-save)

$args = new TrainingArguments(
    epochs:        100,
    batchSize:     64,
    learningRate:  1e-3,
    lrSchedule:    'cosine',
    warmupEpochs:  5,
    patience:      10,
    outputDir:     'ckpt/mymodel',
);

LRScheduler

Pml\Training\LRScheduler adjusts the optimizer's learning rate each epoch based on the schedule set in TrainingArguments. Used internally by Trainer; you can also use it directly in custom loops.

__construct (Sequential $model, TrainingArguments $args)

Binds to the model's optimizer. Only works if the optimizer implements LearningRateAware.

step (int $epoch, int $totalEpochs): void

Computes the LR for the current epoch and sets it on the optimizer. Call at the start of each epoch.

Schedule formulas

Schedule	Formula
`none`	LR unchanged
`cosine`	lr × ½(1 + cos(π × epoch/totalEpochs))
`step`	lr × decay^⌊epoch / stepSize⌋
`linear`	lr × (1 − epoch/totalEpochs)

All schedules are preceded by a linear warm-up if warmupEpochs > 0.
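
The schedule table and the warm-up rule can be sketched as a single pure function. This is an illustration rather than LRScheduler's actual code: the function name scheduledLr, the 0-based epoch index, the (epoch + 1) / warmupEpochs ramp, and the use of integer division in the step schedule are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper mirroring the schedule table; not part of Pml.
 * Assumes $epoch is 0-based and $totalEpochs > 0.
 */
function scheduledLr(
    float $baseLr,
    string $schedule,
    int $epoch,
    int $totalEpochs,
    int $warmupEpochs = 0,
    float $decay = 0.1,
    int $stepSize = 5
): float {
    // Assumed warm-up shape: ramp linearly up to $baseLr over the first epochs.
    if ($warmupEpochs > 0 && $epoch < $warmupEpochs) {
        return $baseLr * ($epoch + 1) / $warmupEpochs;
    }

    $progress = $epoch / $totalEpochs; // fraction of training completed

    return match ($schedule) {
        'none'   => $baseLr,
        'cosine' => $baseLr * 0.5 * (1.0 + cos(M_PI * $progress)),
        // Assumed: the LR drops once every $stepSize epochs (integer division).
        'step'   => $baseLr * $decay ** intdiv($epoch, $stepSize),
        'linear' => $baseLr * (1.0 - $progress),
        default  => throw new InvalidArgumentException("Unknown schedule: {$schedule}"),
    };
}

// Print the cosine schedule at a few epochs of a 100-epoch run with 5 warm-up epochs.
foreach ([0, 4, 25, 50, 99] as $epoch) {
    printf("epoch %2d: lr = %.6f\n", $epoch, scheduledLr(1e-3, 'cosine', $epoch, 100, 5));
}
```

Keeping the computation free of optimizer state makes it easy to print or plot a schedule before committing to a long run.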

GradScaler

Pml\Training\GradScaler implements dynamic loss scaling for mixed-precision training. It is currently a scaffold, useful for fp32 training on systems that benefit from gradient scaling to prevent underflow. Full fp16/bf16 support requires a GPU backend.

__construct (bool $enabled = false, float $initScale = 65536.0, float $growthFactor = 2.0, float $backoffFactor = 0.5, int $growthInterval = 2000)

Loss scaling is disabled by default. Enable for mixed-precision training.

Method	Description
`scale(Tensor $lossGrad): Tensor`	Multiplies gradient by current scale factor
`unscaleAndStep(Optimizer $opt, Layer[] $layers): void`	Unscales all gradients, checks for NaN/Inf, then calls optimizer step if valid
`update(): void`	Adjusts scale factor: grows if no overflow for `growthInterval` steps, backs off on overflow
`currentScale(): float`	Current scale value
`isEnabled(): bool`	Whether loss scaling is enabled

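$scaler = new GradScaler(enabled: true);

foreach ($loader->batches() as $batch) {
    $preds = $model->forward($batch->samples());
    // Gradient of the loss with respect to the predictions.
    $grad  = $model->getLoss()->differentiate($preds, $batch->labels());
    // Scale up before backprop so small gradients do not underflow.
    $grad  = $scaler->scale($grad);
    $model->backward($grad);
    // Unscale the gradients; the optimizer steps only if no NaN/Inf was found.
    $scaler->unscaleAndStep($model->getOptimizer(), $model->getLayers());
    // Grow or back off the scale factor for the next batch.
    $scaler->update();
}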
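
The grow/back-off rule behind update() can be sketched without any tensors. The class below is a hypothetical stand-in, not GradScaler itself: LossScaleSketch, hasOverflow(), and passing the overflow flag directly into update() are assumptions made to keep the example self-contained. The defaults match the constructor signature above.

```php
<?php
declare(strict_types=1);

/** Hypothetical illustration of dynamic loss scaling; not the Pml implementation. */
final class LossScaleSketch
{
    private int $goodSteps = 0; // consecutive steps without overflow

    public function __construct(
        private float $scale = 65536.0,
        private float $growthFactor = 2.0,
        private float $backoffFactor = 0.5,
        private int $growthInterval = 2000,
    ) {}

    /**
     * True if any gradient value is NaN or infinite.
     * @param float[] $grads
     */
    public static function hasOverflow(array $grads): bool
    {
        foreach ($grads as $g) {
            if (!is_finite($g)) {
                return true;
            }
        }
        return false;
    }

    /** Backs off on overflow; grows after $growthInterval clean steps in a row. */
    public function update(bool $overflow): void
    {
        if ($overflow) {
            $this->scale *= $this->backoffFactor;
            $this->goodSteps = 0;
            return;
        }
        if (++$this->goodSteps >= $this->growthInterval) {
            $this->scale *= $this->growthFactor;
            $this->goodSteps = 0;
        }
    }

    public function currentScale(): float
    {
        return $this->scale;
    }
}

// One overflowing step halves the scale; three clean steps then double it again.
$sketch = new LossScaleSketch(growthInterval: 3);
$sketch->update(LossScaleSketch::hasOverflow([0.5, INF]));
for ($i = 0; $i < 3; $i++) {
    $sketch->update(LossScaleSketch::hasOverflow([0.5, -0.25]));
}
printf("scale after recovery: %.1f\n", $sketch->currentScale());
```

Resetting the clean-step counter on every overflow means the scale grows only after a sustained run of finite gradients, which is the usual design for this kind of scaler.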

EarlyStopping

Pml\NeuralNetwork\EarlyStopping monitors a metric (typically validation loss) and signals when training should stop. Used internally by Sequential::train() and Trainer. Can also be used standalone in custom loops.

__construct (int $patience, string $mode = 'min', float $minDelta = 1e-4)

mode: 'min' for metrics where lower is better (loss), 'max' for metrics where higher is better (accuracy).

update (float $metric): int

Returns one of three constants:

Constant	Value	Meaning
`EarlyStopping::IMPROVED`	1	Metric improved — save a checkpoint
`EarlyStopping::CONTINUE`	0	No improvement, but still within patience
`EarlyStopping::STOP`	-1	Patience exhausted — stop training

Method	Description
`getBestMetric(): float`	Best metric value seen so far
`getCounter(): int`	Epochs since last improvement
`reset(): void`	Resets counter and best metric
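
The three signals are easiest to follow in a small loop. The class below is a hypothetical stand-in that mimics the documented constructor and constants so the example runs on its own. It is not the library's EarlyStopping, and the exact rules it uses are assumptions: an improvement must exceed minDelta, and STOP is returned once the counter reaches patience.

```php
<?php
declare(strict_types=1);

/** Hypothetical illustration of patience-based early stopping; not the Pml class. */
final class EarlyStoppingSketch
{
    public const IMPROVED = 1;
    public const CONTINUE = 0;
    public const STOP = -1;

    private ?float $best = null; // best metric seen so far
    private int $counter = 0;    // updates since the last improvement

    public function __construct(
        private int $patience,
        private string $mode = 'min',
        private float $minDelta = 1e-4,
    ) {}

    public function update(float $metric): int
    {
        // The first value always counts as an improvement.
        $improved = $this->best === null
            || ($this->mode === 'min'
                ? $metric < $this->best - $this->minDelta
                : $metric > $this->best + $this->minDelta);

        if ($improved) {
            $this->best = $metric;
            $this->counter = 0;
            return self::IMPROVED;
        }

        $this->counter++;
        return $this->counter >= $this->patience ? self::STOP : self::CONTINUE;
    }
}

// Custom loop over made-up validation losses, one per epoch.
$stopper = new EarlyStoppingSketch(patience: 2);
foreach ([0.90, 0.80, 0.81, 0.80, 0.79] as $epoch => $valLoss) {
    $signal = $stopper->update($valLoss);
    if ($signal === EarlyStoppingSketch::IMPROVED) {
        echo "epoch {$epoch}: improved, save a checkpoint here\n";
    } elseif ($signal === EarlyStoppingSketch::STOP) {
        echo "epoch {$epoch}: patience exhausted, stopping\n";
        break;
    }
}
```

In this run the loss of 0.80 at the fourth epoch does not beat the earlier 0.80 by more than minDelta, so it counts as a second epoch without improvement and the loop stops before reaching 0.79.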

Callbacks

Implement Pml\Training\TrainerCallback to hook into training lifecycle events:

interface TrainerCallback
{
    public function onTrainBegin(TrainingArguments $args, int $steps): void;
    public function onEpochBegin(int $epoch, int $epochs): void;
    public function onBatchEnd(int $step, float $batchLoss): void;
    public function onEpochEnd(int $epoch, float $trainLoss, ?float $valLoss): void;
    public function onTrainEnd(TrainingResult $result): void;
}

class WandbCallback implements TrainerCallback
{
    public function onEpochEnd(int $epoch, float $trainLoss, ?float $valLoss): void
    {
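        // Appends one JSON line per epoch; swap in W&B, TensorBoard, Redis, etc.
        // file_put_contents() does not create directories, so logs/ must already exist.
        file_put_contents('logs/loss.ndjson',
            json_encode(['epoch' => $epoch, 'train' => $trainLoss, 'val' => $valLoss]) . "\n",
            FILE_APPEND
        );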
    }

    public function onTrainBegin(TrainingArguments $args, int $steps): void {}
    public function onEpochBegin(int $e, int $t): void {}
    public function onBatchEnd(int $s, float $l): void {}
    public function onTrainEnd(TrainingResult $r): void {}
}

TrainingResult

Returned by Trainer::train().

Property	Type	Description
`trainLossHistory`	`float[]`	Per-epoch average training loss
`valLossHistory`	`float[]`	Per-epoch validation loss (if validation provided)
`bestEpoch`	`int`	Epoch where best validation loss occurred
`bestValLoss`	`float`	Best validation loss value
`stoppedEarly`	`bool`	Whether early stopping triggered
`totalEpochs`	`int`	Actual epochs run
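
As a rough guide to how bestEpoch and bestValLoss relate to valLossHistory, the sketch below recovers both from a plain array of losses. The helper name bestFromHistory is hypothetical, and the 0-based epoch index is an assumption: the source does not say whether Trainer numbers epochs from 0 or 1.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: finds the lowest validation loss and where it occurred.
 *
 * @param float[] $valLossHistory per-epoch validation loss, in epoch order
 * @return array{bestEpoch: int, bestValLoss: float}
 */
function bestFromHistory(array $valLossHistory): array
{
    if ($valLossHistory === []) {
        throw new InvalidArgumentException('Validation history is empty.');
    }

    $bestValLoss = min($valLossHistory);
    // array_search() returns the first matching key, so ties go to the earliest epoch.
    $bestEpoch = array_search($bestValLoss, $valLossHistory, true);

    return ['bestEpoch' => $bestEpoch, 'bestValLoss' => $bestValLoss];
}

$summary = bestFromHistory([0.92, 0.71, 0.64, 0.66, 0.69]);
printf("best epoch: %d, best val loss: %.2f\n", $summary['bestEpoch'], $summary['bestValLoss']);
```

With a non-zero minDelta, the epoch that Trainer reports may differ from this plain minimum, since a tiny improvement might not count as a new best.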

Full Example

use Pml\NeuralNetwork\Sequential;
use Pml\NeuralNetwork\Layers\{Dense, LayerNorm, Gelu, Dropout};
use Pml\NeuralNetwork\Optimizers\AdamW;
use Pml\Losses\CategoricalCrossEntropy;
use Pml\Training\{Trainer, TrainingArguments};
use Pml\Dataset;

$model = new Sequential(
    layers: [
        new Dense(512, 256),
        new LayerNorm(256),
        new Gelu(),
        new Dropout(0.1),
        new Dense(256, 10),
    ],
    lossFn:    new CategoricalCrossEntropy(),
    optimizer: new AdamW(learningRate: 3e-4, weightDecay: 0.01),
);

$args = new TrainingArguments(
    epochs:       50,
    batchSize:    256,
    learningRate: 3e-4,
    lrSchedule:   'cosine',
    warmupEpochs: 3,
    patience:     8,
    outputDir:    'ckpt/mnist',
);

$trainer = new Trainer($model, $args);
$trainer->addCallback(new WandbCallback());

// $trainDs and $valDs are Dataset instances prepared elsewhere.
$result = $trainer->train($trainDs, $valDs);

printf("Best epoch: %d, Best val loss: %.4f, Stopped early: %s\n",
    $result->bestEpoch,
    $result->bestValLoss,
    $result->stoppedEarly ? 'yes' : 'no'
);