Regression Estimators

Regression models in PML predict continuous targets from numeric tensor inputs. They are designed for low-latency inference and efficient training.

Model structure

Input features are represented as [N × D] float32 tensors, where N is the number of samples and D is the number of features per sample.
Targets are [N] float32 tensors.
Regression training optimizes a differentiable loss or tree-based objective.
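
The shape contract above can be checked before training. The following is a minimal sketch in plain PHP, assuming features arrive as a zero-indexed nested array; checkRegressionShapes is a hypothetical helper for illustration and is not part of PML.

```php
<?php
/**
 * Hypothetical helper (not part of PML): verifies that a nested array of
 * features forms an [N × D] matrix and that there is one target per sample.
 * Assumes $features is a zero-indexed list of rows. Returns [N, D].
 */
function checkRegressionShapes(array $features, array $targets): array
{
    $n = count($features);
    if ($n === 0) {
        throw new InvalidArgumentException('Feature matrix is empty.');
    }
    $d = count($features[0]);
    foreach ($features as $i => $row) {
        if (count($row) !== $d) {
            throw new InvalidArgumentException(
                "Row $i has " . count($row) . " features, expected $d."
            );
        }
    }
    if (count($targets) !== $n) {
        throw new InvalidArgumentException(
            "Expected $n targets, got " . count($targets) . '.'
        );
    }
    return [$n, $d];
}

// Three samples with two features each, and three targets.
[$n, $d] = checkRegressionShapes(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [10.0, 20.0, 30.0]
);
```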

Data flow

Tensor samples [N × D]
   ├─ preprocessing / normalization
   └─ estimator.train() → loss / gradient computation
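
As an illustration of the normalization step in the flow above, here is a minimal sketch of per-column standardization (zero mean, unit variance) in plain PHP. It is an assumption about what such a step could look like, not PML's own preprocessing; standardizeColumns is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper (not part of PML): standardizes each column of a
 * zero-indexed [N × D] nested array to zero mean and unit variance.
 */
function standardizeColumns(array $rows): array
{
    $n = count($rows);
    $d = count($rows[0]);
    $out = $rows;
    for ($j = 0; $j < $d; $j++) {
        $column = array_column($rows, $j);
        $mean = array_sum($column) / $n;
        $variance = 0.0;
        foreach ($column as $value) {
            $variance += ($value - $mean) ** 2;
        }
        $std = sqrt($variance / $n); // Population standard deviation.
        // A constant column has zero spread; only center it to avoid dividing by zero.
        $scale = $std > 0.0 ? $std : 1.0;
        for ($i = 0; $i < $n; $i++) {
            $out[$i][$j] = ($rows[$i][$j] - $mean) / $scale;
        }
    }
    return $out;
}

// Column 0 varies and is rescaled; column 1 is constant and becomes all zeros.
$scaled = standardizeColumns([[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]);
```

In practice the mean and scale would be computed on the training set only and reused unchanged for inference inputs.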

Example API usage

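// Load the training data; column 0 holds the regression target.
$dataset = Dataset::fromCSV('datasets/housing/train.csv', labelColumn: 0);
$model = new Pml\Estimators\Regression\GBDTRegressor();
$model->train($dataset);
// This predicts on the training set itself; use held-out data to estimate generalization error.
$predictions = $model->predict($dataset);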

Internal behavior

Training may use native tensor operations for matrix algebra.
Tree-based regressors may use C-backed histograms and split scoring.
Weights and trees are persisted separately from the PHP object structure.
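
To make "histograms and split scoring" concrete, the sketch below shows the general technique for a single feature in plain PHP: bucket the feature into fixed-width bins, accumulate target sums and counts per bin, then score each bin boundary by its reduction in squared error. This is a simplified illustration, not PML's C-backed implementation; bestHistogramSplit and its parameters are hypothetical.

```php
<?php
/**
 * Hypothetical sketch (not PML's implementation): finds the best split of one
 * feature by scanning a fixed-width histogram of target sums and counts.
 * Samples with a value below the threshold go to the left child.
 * Returns ['threshold' => float, 'gain' => float], or null if no split exists.
 */
function bestHistogramSplit(array $feature, array $targets, int $bins = 16): ?array
{
    $min = min($feature);
    $max = max($feature);
    if ($max <= $min) {
        return null; // A constant feature cannot be split.
    }
    $width = ($max - $min) / $bins;

    // Accumulate per-bin target sums and sample counts.
    $sum = array_fill(0, $bins, 0.0);
    $count = array_fill(0, $bins, 0);
    foreach ($feature as $i => $x) {
        $b = min($bins - 1, (int) floor(($x - $min) / $width));
        $sum[$b] += $targets[$i];
        $count[$b]++;
    }

    $totalSum = array_sum($sum);
    $totalCount = array_sum($count);
    $best = null;
    $leftSum = 0.0;
    $leftCount = 0;
    // Try each bin boundary; the gain is the reduction in summed squared error.
    for ($b = 0; $b < $bins - 1; $b++) {
        $leftSum += $sum[$b];
        $leftCount += $count[$b];
        $rightCount = $totalCount - $leftCount;
        if ($leftCount === 0 || $rightCount === 0) {
            continue; // Skip boundaries that leave one side empty.
        }
        $rightSum = $totalSum - $leftSum;
        $gain = $leftSum ** 2 / $leftCount
              + $rightSum ** 2 / $rightCount
              - $totalSum ** 2 / $totalCount;
        if ($best === null || $gain > $best['gain']) {
            $best = ['threshold' => $min + ($b + 1) * $width, 'gain' => $gain];
        }
    }
    return $best;
}

// Two well-separated groups: low feature values map to 5.0, high ones to 20.0.
$split = bestHistogramSplit(
    [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
    [5.0, 5.0, 5.0, 20.0, 20.0, 20.0],
    4
);
```

Scanning B bin boundaries instead of every distinct feature value is what makes the histogram approach cheap: the cost per feature is one pass over the samples plus one pass over the bins.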

Performance considerations

Regression workloads benefit from dense tensors and contiguous layout.
Use Dataset::materialize() before training to avoid repeated dataset conversion.
Avoid broadcasting large tensors in the training loop unless required.
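
To show what a dense, contiguous layout means in practice, the following sketch packs a nested PHP array into a single row-major float32 buffer using the built-in pack() and unpack() functions. It illustrates the layout only and makes no claim about how PML stores tensors internally; toFloat32Buffer and readElement are hypothetical helpers.

```php
<?php
/**
 * Hypothetical helper (not part of PML): flattens a nested [N × D] array
 * into one contiguous, row-major float32 buffer.
 */
function toFloat32Buffer(array $rows): string
{
    $flat = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $flat[] = (float) $value;
        }
    }
    // 'g' packs each value as a little-endian single-precision float.
    return pack('g*', ...$flat);
}

/**
 * Hypothetical helper: reads element (i, j) from a buffer with D columns.
 */
function readElement(string $buffer, int $d, int $i, int $j): float
{
    // Row-major offset: element (i, j) sits at index i * D + j, 4 bytes each.
    return unpack('g', $buffer, ($i * $d + $j) * 4)[1];
}

// A 3 × 2 matrix occupies 3 * 2 * 4 = 24 bytes with no per-element overhead.
$buffer = toFloat32Buffer([[1.5, 2.0], [3.25, 4.0], [5.0, 6.5]]);
$value = readElement($buffer, 2, 1, 0);
```

A nested PHP array stores every element as a separate value with hash-table overhead, whereas the packed buffer keeps each row adjacent in memory, which is the property native tensor code relies on.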

When to use

Use regression estimators for real-valued prediction targets.
Prefer tree-based regressors when you want to skip feature normalization: tree splits depend on the ordering of feature values, not on their scale.

When not to use

Do not use regression estimators for categorical classification targets.
Do not use dense regression models on extremely sparse high-dimensional data without prior dimensionality reduction.