Training System
The high-level Trainer + TrainingArguments API orchestrates
epoch loops, LR scheduling, early stopping, gradient clipping, checkpointing, and callbacks
for Sequential models.
Trainer
Pml\Training\Trainer is the high-level training orchestrator.
It wraps a Sequential model and a TrainingArguments config,
runs the epoch loop, steps the LR scheduler, handles early stopping, saves checkpoints,
and fires callbacks at each lifecycle event.
The constructor binds the model and args. If the model's optimizer implements LearningRateAware, its learning rate is synced to $args->learningRate immediately.
Trainer::train() runs the full training loop and returns a TrainingResult with the loss history and best-epoch metadata.
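// $model is a Sequential instance; $trainDs and $valDs are the training and validation datasets.
$args = new TrainingArguments(
    epochs: 30,
    batchSize: 128,
    learningRate: 3e-4,
    lrSchedule: 'cosine',
    patience: 5,
    outputDir: 'ckpt/run1',
);

$trainer = new Trainer($model, $args);
$trainer->addCallback(new MyCallback()); // any class implementing TrainerCallback
$result = $trainer->train($trainDs, $valDs);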
Trainer::addCallback() registers a callback. Multiple callbacks can be added; they fire in registration order.
TrainingArguments
Value object holding all training hyperparameters. All properties are readonly.
| Property | Type | Default | Description |
|---|---|---|---|
| epochs | int | 10 | Number of training epochs |
| batchSize | int | 32 | Mini-batch size |
| patience | int | 0 | Early stopping patience in epochs (0 = disabled) |
| minDelta | float | 1e-4 | Minimum improvement to reset patience counter |
| learningRate | float | 0.001 | Initial learning rate |
| lrSchedule | string | 'none' | 'none' · 'cosine' · 'step' · 'linear' |
| lrDecay | float | 0.1 | Decay factor for 'step' schedule |
| lrStepSize | int | 5 | Epochs between LR drops (step schedule) |
| warmupEpochs | int | 0 | Linear warm-up epochs from 0 → learningRate |
| mixedPrecision | bool | false | Scaffold for future AMP support |
| outputDir | ?string | null | Directory for checkpoints (null = no auto-save) |
$args = new TrainingArguments(
    epochs: 100,
    batchSize: 64,
    learningRate: 1e-3,
    lrSchedule: 'cosine',
    warmupEpochs: 5,
    patience: 10,
    outputDir: 'ckpt/mymodel',
);
LRScheduler
Pml\Training\LRScheduler adjusts the optimizer's learning rate each epoch
based on the schedule set in TrainingArguments.
Used internally by Trainer; you can also use it directly in custom loops.
The constructor binds the scheduler to the model's optimizer. This only works if the optimizer implements LearningRateAware.
The per-epoch update computes the LR for the current epoch and sets it on the optimizer. Call it at the start of each epoch.
Schedule formulas
| Schedule | Formula |
|---|---|
| none | LR unchanged |
| cosine | lr × ½(1 + cos(π × epoch/totalEpochs)) |
| step | lr × decay^(epoch / stepSize) |
| linear | lr × (1 − epoch/totalEpochs) |
All schedules are preceded by a linear warm-up if warmupEpochs > 0.
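
The formulas above can be sketched as a single standalone function. This is a minimal illustration, not the library's LRScheduler: the function name sketchLearningRate and its parameters are hypothetical. It assumes a 0-based epoch index, a warm-up that ramps as epoch / warmupEpochs, integer division in the step schedule (so the LR drops once every stepSize epochs), and that the post-warm-up formulas use the raw epoch index exactly as written in the table.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative only: hypothetical helper, not part of Pml.
 * Assumes $epoch is 0-based and $totalEpochs > 0.
 */
function sketchLearningRate(
    float $baseLr,
    int $epoch,
    int $totalEpochs,
    string $schedule = 'none',
    int $warmupEpochs = 0,
    float $decay = 0.1,
    int $stepSize = 5,
): float {
    // Linear warm-up: ramp from 0 toward $baseLr during the first $warmupEpochs epochs.
    if ($warmupEpochs > 0 && $epoch < $warmupEpochs) {
        return $baseLr * $epoch / $warmupEpochs;
    }

    return match ($schedule) {
        // Half-cosine decay from $baseLr toward 0.
        'cosine' => $baseLr * 0.5 * (1.0 + cos(M_PI * $epoch / $totalEpochs)),
        // Assumption: integer division, so the LR is multiplied by $decay every $stepSize epochs.
        'step'   => $baseLr * $decay ** intdiv($epoch, $stepSize),
        // Straight line from $baseLr toward 0.
        'linear' => $baseLr * (1.0 - $epoch / $totalEpochs),
        // 'none' or any unrecognised name: leave the LR unchanged.
        default  => $baseLr,
    };
}

// Print the LR at a few epochs of a 50-epoch cosine run with 3 warm-up epochs.
foreach ([0, 1, 2, 3, 25, 49] as $epoch) {
    printf("epoch %2d: lr = %.6f\n", $epoch, sketchLearningRate(3e-4, $epoch, 50, 'cosine', 3));
}
```

Whether the real scheduler offsets the epoch by warmupEpochs after the warm-up phase is not stated on this page, so treat the transition at the end of warm-up as an assumption of the sketch.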
GradScaler
Pml\Training\GradScaler implements dynamic loss scaling for mixed-precision training.
It is currently a scaffold: it can be used in fp32 training on systems that benefit from gradient scaling
to prevent underflow, but full fp16/bf16 support requires a GPU backend.
Loss scaling is disabled by default; pass enabled: true to the constructor to turn it on for mixed-precision training.
| Method | Description |
|---|---|
| scale(Tensor $lossGrad): Tensor | Multiplies the gradient by the current scale factor |
| unscaleAndStep(Optimizer $opt, Layer[] $layers): void | Unscales all gradients, checks for NaN/Inf, then calls the optimizer step if the gradients are valid |
| update(): void | Adjusts the scale factor: grows it if there was no overflow for growthInterval steps, backs off on overflow |
| currentScale(): float | Current scale value |
| isEnabled(): bool | Whether loss scaling is enabled |
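$scaler = new GradScaler(enabled: true);

foreach ($loader->batches() as $batch) {
    $preds = $model->forward($batch->samples());
    // Gradient of the loss with respect to the predictions.
    $grad = $model->getLoss()->differentiate($preds, $batch->labels());
    // Scale before backprop so small gradients do not underflow.
    $grad = $scaler->scale($grad);
    $model->backward($grad);
    // Unscale, skip the optimizer step on NaN/Inf, then adjust the scale factor.
    $scaler->unscaleAndStep($model->getOptimizer(), $model->getLayers());
    $scaler->update();
}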
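
The grow/back-off behaviour described for update() can be sketched in isolation. This is a hedged illustration of dynamic loss scaling in general, not the library's GradScaler: the class SketchLossScale, its constructor defaults, and the explicit $overflow argument are all assumptions made for the example (the real update() takes no arguments and presumably tracks overflow internally).

```php
<?php
declare(strict_types=1);

/**
 * Illustrative loss-scale bookkeeping; hypothetical class, not part of Pml.
 * All constructor defaults are assumptions.
 */
final class SketchLossScale
{
    /** Consecutive steps without overflow since the last scale change. */
    private int $cleanSteps = 0;

    public function __construct(
        private float $scale = 65536.0,
        private float $growthFactor = 2.0,
        private float $backoffFactor = 0.5,
        private int $growthInterval = 2000,
    ) {}

    /** Returns true if every gradient value is finite (no NaN/Inf). */
    public static function allFinite(array $grads): bool
    {
        foreach ($grads as $g) {
            if (!is_finite($g)) {
                return false;
            }
        }
        return true;
    }

    /** Call once per step with the result of the overflow check. */
    public function update(bool $overflow): void
    {
        if ($overflow) {
            // Back off immediately and restart the clean-step count.
            $this->scale *= $this->backoffFactor;
            $this->cleanSteps = 0;
            return;
        }
        // Grow only after $growthInterval consecutive clean steps.
        if (++$this->cleanSteps >= $this->growthInterval) {
            $this->scale *= $this->growthFactor;
            $this->cleanSteps = 0;
        }
    }

    public function currentScale(): float
    {
        return $this->scale;
    }
}

// Usage: a step counts as an overflow when any unscaled gradient is NaN or Inf.
$scale = new SketchLossScale(scale: 8.0, growthInterval: 2);
$scale->update(!SketchLossScale::allFinite([0.25, -1.5]));
```

In a real loop the optimizer step is skipped whenever the overflow check fails, which is the role unscaleAndStep() plays in the table above.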
EarlyStopping
Pml\NeuralNetwork\EarlyStopping monitors a metric (typically validation loss)
and signals when training should stop. Used internally by Sequential::train()
and Trainer. Can also be used standalone in custom loops.
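The mode argument is 'min' for metrics where lower is better (such as loss) and 'max' for metrics where higher is better (such as accuracy).
Each update with a new metric value returns one of three constants: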
| Constant | Value | Meaning |
|---|---|---|
| EarlyStopping::IMPROVED | 1 | Metric improved; save a checkpoint |
| EarlyStopping::CONTINUE | 0 | No improvement, but still within patience |
| EarlyStopping::STOP | -1 | Patience exhausted; stop training |
| Method | Description |
|---|---|
| getBestMetric(): float | Best metric value seen so far |
| getCounter(): int | Epochs since last improvement |
| reset(): void | Resets counter and best metric |
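
The patience logic behind those three return values can be sketched as follows. This is a minimal standalone illustration, not the library's EarlyStopping: the class SketchEarlyStopping and its check() method are hypothetical names, and the sketch assumes patience is at least 1 and that an improvement must exceed minDelta to count.

```php
<?php
declare(strict_types=1);

/** Illustrative patience tracker; hypothetical class, not part of Pml. */
final class SketchEarlyStopping
{
    public const IMPROVED = 1;
    public const CONTINUE = 0;
    public const STOP = -1;

    private ?float $best = null;
    private int $counter = 0;

    public function __construct(
        private int $patience,          // assumed >= 1 in this sketch
        private float $minDelta = 1e-4,
        private string $mode = 'min',   // 'min' for loss, 'max' for accuracy
    ) {}

    /** Feed one metric value per epoch; returns IMPROVED, CONTINUE, or STOP. */
    public function check(float $metric): int
    {
        // The first value always counts as an improvement; afterwards the
        // metric must beat the best value by more than $minDelta.
        $improved = $this->best === null
            || ($this->mode === 'min'
                ? $metric < $this->best - $this->minDelta
                : $metric > $this->best + $this->minDelta);

        if ($improved) {
            $this->best = $metric;
            $this->counter = 0;
            return self::IMPROVED;
        }

        $this->counter++;
        return $this->counter >= $this->patience ? self::STOP : self::CONTINUE;
    }
}

// Usage in a custom loop: stop once validation loss stalls for 2 epochs.
$stopper = new SketchEarlyStopping(patience: 2);
foreach ([1.0, 0.9, 0.95, 0.90001] as $epoch => $valLoss) {
    $signal = $stopper->check($valLoss);
    if ($signal === SketchEarlyStopping::STOP) {
        break;
    }
}
```

A custom loop would typically save a checkpoint on IMPROVED and break out of the epoch loop on STOP, mirroring the meanings listed in the constants table.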
Callbacks
Implement Pml\Training\TrainerCallback to hook into training lifecycle events:
interface TrainerCallback
{
    public function onTrainBegin(TrainingArguments $args, int $steps): void;
    public function onEpochBegin(int $epoch, int $epochs): void;
    public function onBatchEnd(int $step, float $batchLoss): void;
    public function onEpochEnd(int $epoch, float $trainLoss, ?float $valLoss): void;
    public function onTrainEnd(TrainingResult $result): void;
}
class WandbCallback implements TrainerCallback
{
    public function onEpochEnd(int $epoch, float $trainLoss, ?float $valLoss): void
    {
        // Appends one NDJSON record per epoch; swap in W&B, TensorBoard, Redis, etc.
        // The logs/ directory must already exist.
        file_put_contents('logs/loss.ndjson',
            json_encode(['epoch' => $epoch, 'train' => $trainLoss, 'val' => $valLoss]) . "\n",
            FILE_APPEND
        );
    }

    // The remaining hooks are required by the interface but left as no-ops.
    public function onTrainBegin(TrainingArguments $args, int $steps): void {}
    public function onEpochBegin(int $e, int $t): void {}
    public function onBatchEnd(int $s, float $l): void {}
    public function onTrainEnd(TrainingResult $r): void {}
}
TrainingResult
Returned by Trainer::train().
| Property | Type | Description |
|---|---|---|
| trainLossHistory | float[] | Per-epoch average training loss |
| valLossHistory | float[] | Per-epoch validation loss (if validation provided) |
| bestEpoch | int | Epoch where best validation loss occurred |
| bestValLoss | float | Best validation loss value |
| stoppedEarly | bool | Whether early stopping triggered |
| totalEpochs | int | Actual epochs run |
Full Example
use Pml\NeuralNetwork\Sequential;
use Pml\NeuralNetwork\Layers\{Dense, LayerNorm, Gelu, Dropout};
use Pml\NeuralNetwork\Optimizers\AdamW;
use Pml\Losses\CategoricalCrossEntropy;
use Pml\Training\{Trainer, TrainingArguments};
use Pml\Dataset;

// 512 input features -> 256 hidden units -> 10 output classes.
$model = new Sequential(
    layers: [
        new Dense(512, 256),
        new LayerNorm(256),
        new Gelu(),
        new Dropout(0.1),
        new Dense(256, 10),
    ],
    lossFn: new CategoricalCrossEntropy(),
    optimizer: new AdamW(learningRate: 3e-4, weightDecay: 0.01),
);

$args = new TrainingArguments(
    epochs: 50,
    batchSize: 256,
    learningRate: 3e-4,
    lrSchedule: 'cosine',
    warmupEpochs: 3,
    patience: 8,
    outputDir: 'ckpt/mnist',
);

// $trainDs and $valDs are the training and validation datasets, prepared beforehand.
// WandbCallback is the callback class from the Callbacks section.
$trainer = new Trainer($model, $args);
$trainer->addCallback(new WandbCallback());
$result = $trainer->train($trainDs, $valDs);

printf("Best epoch: %d, Best val loss: %.4f, Stopped early: %s\n",
    $result->bestEpoch,
    $result->bestValLoss,
    $result->stoppedEarly ? 'yes' : 'no'
);