Estimators

The tables below list 55 estimators covering classification, regression, anomaly detection, clustering, dimensionality reduction, and meta-estimation. All implement the Learner interface: train(Dataset) and predict(Dataset).

Estimator Interfaces

Interface	Methods	Description
`Estimator`	`type(): EstimatorType`	Root interface for all estimators
`Learner`	`train(Dataset): void` · `predict(Dataset): Tensor`	Supervised and unsupervised learners
`Probabilistic`	`predictProba(Dataset): Tensor`	Returns class probability matrix [N, C]
`Scoring`	`score(Dataset): float`	Returns accuracy (classifiers) or R² (regressors)
`Online`	`partialFit(Dataset): void`	Incremental / streaming training
`RanksFeatures`	`featureImportances(): Tensor`	Returns per-feature importance scores
`Verbose`	`setLogger(LoggerInterface): void`	PSR-3 training progress logging
`Persistable`	`save(string): void` · `load(string): static`	Checkpoint save/load via SafeTensors
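
To make the two Scoring metrics concrete, here is a minimal sketch in plain PHP that computes accuracy and R² on ordinary arrays. The helper names accuracy and rSquared are hypothetical and not part of Pml, which works on Dataset and Tensor objects; the library's own computation may differ in detail.

```php
<?php
declare(strict_types=1);

/** Fraction of predictions that equal the true labels. */
function accuracy(array $yTrue, array $yPred): float
{
    $hits = 0;
    foreach ($yTrue as $i => $label) {
        if ($label === $yPred[$i]) {
            $hits++;
        }
    }
    return $hits / count($yTrue);
}

/** Coefficient of determination: 1 - SS_res / SS_tot. */
function rSquared(array $yTrue, array $yPred): float
{
    $mean  = array_sum($yTrue) / count($yTrue);
    $ssRes = 0.0;   // squared error of the predictions
    $ssTot = 0.0;   // squared error of always predicting the mean
    foreach ($yTrue as $i => $y) {
        $ssRes += ($y - $yPred[$i]) ** 2;
        $ssTot += ($y - $mean) ** 2;
    }
    if ($ssTot == 0.0) {
        throw new InvalidArgumentException('R² is undefined for constant targets.');
    }
    return 1.0 - $ssRes / $ssTot;
}

$acc = accuracy([0, 1, 1, 2], [0, 1, 2, 2]);                  // 3 of 4 labels match
$r2  = rSquared([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]);  // SS_res = 1, SS_tot = 5
printf("accuracy=%.2f r2=%.2f\n", $acc, $r2);                 // accuracy=0.75 r2=0.80
```

An R² of 1.0 means perfect predictions, 0.0 matches a model that always predicts the mean, and negative values are worse than that baseline.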

Classifiers

Namespace: Pml\Estimators\Classifiers\

Class	Constructor Parameters	Interfaces	Notes
`GBDTClassifier`	`nEstimators=100, maxDepth=4, numBins=255, lr=0.1, lambda=1.0, minChildWeight=1, subSample=1.0, colSampleByTree=1.0`	Scoring RanksFeatures	Histogram-based leaf-wise GBDT (LightGBM-style). C-level trees.
`RandomForestClassifier`	`nEstimators=100, maxDepth=10, minSamplesSplit=2`	Scoring	Bagged CART trees, random feature subsets. Parallelized over trees.
`ExtraTreesClassifier`	`nEstimators=100, maxDepth=10, minSamplesSplit=2`	Scoring	Extremely randomized trees: split thresholds are drawn at random, which typically lowers variance relative to a random forest.
`DecisionTreeClassifier`	`maxDepth=10, minSamplesSplit=2, maxFeatures=null`	Scoring	CART with Gini impurity. C-level split search.
`LogisticRegression`	`epochs=100, learningRate=0.01, batchSize=32`	Scoring Probabilistic	Binary or multi-class via softmax. Mini-batch SGD.
`SoftmaxClassifier`	`epochs, learningRate, batchSize, lambda`	Probabilistic	Multi-class linear classifier with L2 regularization.
`SVC`	`c=1.0, epochs=100, learningRate=0.01, batchSize=32`	Scoring	Linear SVM with hinge loss. SGD optimization.
`MLPClassifier`	`hidden=[100], epochs=100, batchSize=32, learningRate=0.001, dropout=0.0`	Scoring Probabilistic	Multi-layer perceptron using Sequential internally. ReLU activations.
`AdaBoostClassifier`	`nEstimators=50, learningRate=1.0`	Scoring	SAMME algorithm with DecisionTree weak learners.
`LogitBoost`	`nEstimators, learningRate, maxDepth`	Probabilistic	Gradient boosting with logistic loss. Probabilistic outputs.
`GaussianNB`	—	Probabilistic	Gaussian Naive Bayes. C-level log-likelihood computation.
`BernoulliNB`	`alpha=1.0`	Probabilistic	For binary feature vectors. Laplace smoothing.
`MultinomialNB`	`alpha=1.0`	Probabilistic	For count features (e.g., bag-of-words term counts; fractional TF-IDF weights often work in practice too). Laplace smoothing.
`CategoricalNB`	`alpha=1.0`	Probabilistic	For categorical integer-encoded features.
`KNNClassifier`	`k=5`	Scoring Probabilistic	Brute-force KNN via C pairwise-L2. O(N·D) per query.
`KDNeighborsClassifier`	`k=5, leafSize=30`	Scoring	KD-Tree based KNN. Faster for low-D (<20 features).
`RadiusNeighborsClassifier`	`radius=1.0`	Scoring	Classifies using all neighbors within a fixed radius.
`OneVsRest`	`prototype: Learner`	Scoring	Wraps any binary classifier for multi-class OVR decomposition.
`VotingClassifier`	`estimators: Learner[]`	Scoring	Majority-vote ensemble. Optionally soft-vote with Probabilistic members.
`DummyClassifier`	—		Returns the most frequent class label. Baseline.
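
The difference between majority voting and soft voting, as mentioned for VotingClassifier, can be sketched in plain PHP. This is an illustration on bare arrays with hypothetical helper names (hardVote, softVote), not Pml's implementation; in particular, the tie-breaking rule here is an assumption.

```php
<?php
declare(strict_types=1);

/** Hard vote: the most frequent class index (ties go to the lowest index). */
function hardVote(array $votes): int
{
    $counts = array_count_values($votes);
    ksort($counts);                       // visit classes in ascending order
    $best      = 0;
    $bestCount = -1;
    foreach ($counts as $class => $count) {
        if ($count > $bestCount) {
            $best      = (int) $class;
            $bestCount = $count;
        }
    }
    return $best;
}

/** Soft vote: sum the members' class probabilities, then take the argmax. */
function softVote(array $probas): int
{
    $sum = array_fill(0, count($probas[0]), 0.0);
    foreach ($probas as $row) {
        foreach ($row as $class => $p) {
            $sum[$class] += $p;
        }
    }
    return (int) array_search(max($sum), $sum, true);
}

// Three ensemble members, one sample, two classes.
$probas = [[0.45, 0.55], [0.45, 0.55], [0.90, 0.10]];
$votes  = array_map(
    fn (array $row): int => (int) array_search(max($row), $row, true),
    $probas
);

$hard = hardVote($votes);    // two of three members pick class 1
$soft = softVote($probas);   // summed probabilities are [1.80, 1.20], so class 0
printf("hard=%d soft=%d\n", $hard, $soft);
```

The two rules can disagree when one member is much more confident than the others, which is why soft voting needs members that expose probabilities.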

Regressors

Namespace: Pml\Estimators\Regression\

Class	Constructor Parameters	Notes
`GBDTRegressor`	`nEstimators=100, maxDepth=4, numBins=255, lr=0.1, lambda=1.0`	Histogram leaf-wise GBDT with MSE loss. Same C kernel as GBDTClassifier.
`GradientBoostingRegressor`	`nEstimators, learningRate, maxDepth, loss`	Stage-wise gradient boosting with pluggable loss (MSE, Huber, MAE).
`RandomForestRegressor`	— (see classifier)	Alias for ExtraTreeRegressor ensemble.
`ExtraTreeRegressor`	`nEstimators, maxDepth, minSamplesSplit`	Extremely randomized regression trees.
`DecisionTreeRegressor`	`maxDepth=3, minSamplesSplit=2, maxFeatures=null`	CART with MSE criterion.
`LinearRegression`	—	Closed-form OLS via LAPACKE SGELSD. O(D³) — for small D only.
`Ridge`	`alpha=1.0, epochs=100, learningRate=0.01, batchSize=32`	L2-regularized linear regression. Mini-batch SGD.
`Lasso`	`alpha=1.0, epochs=100, learningRate=0.01, batchSize=32`	L1-regularized via subgradient SGD. Produces sparse weights.
`ElasticNet`	`alpha=1.0, l1Ratio=0.5, epochs=100, learningRate=0.01, batchSize=32`	Combines L1 + L2 regularization.
`Adaline`	`epochs, learningRate, batchSize`	Adaptive linear neuron. Mini-batch gradient descent, MSE loss.
`MLPRegressor`	`hidden=[100], epochs=100, batchSize=32, learningRate=0.001`	Deep MLP for regression. Outputs continuous values via linear head.
`SVR`	`c, epsilon, kernel, epochs, learningRate`	ε-insensitive SVR. Minimizes the ε-insensitive loss by SGD, with RBF or linear kernel.
`KNNRegressor`	`k=5`	KNN regression — average of k nearest target values.
`KDNeighborsRegressor`	`k=5`	KD-Tree KNN regression.
`VotingRegressor`	`estimators: Learner[]`	Average predictions from an ensemble of regressors.
`DummyRegressor`	—	Returns training label mean. Baseline.

Anomaly Detectors

Namespace: Pml\Estimators\AnomalyDetectors\

Anomaly detectors implement Learner. predict() returns a Tensor of labels: 1.0 = inlier, -1.0 = outlier.

Class	Constructor Parameters	Notes
`IsolationForest`	`nEstimators=100, sampleSize=256, contamination=0.1`	Builds random isolation trees. Anomaly score via average path length. C-accelerated.
`LocalOutlierFactor`	`k=20`	LOF score based on local reachability density vs k neighbors.
`GaussianMLE`	`threshold=-10.0`	Fits a multivariate Gaussian. Points below log-likelihood threshold are anomalies.
`OneClassSVM`	`nu=0.1, kernel=RBF(0.1), epochs=100, learningRate=0.01`	One-class SVM with RBF or linear kernel. SGD optimization.
`Loda`	`nProjections=100, bins=10`	Lightweight Online Detector of Anomalies. Random projections + histogram density.
`RobustZScore`	`threshold=3.5`	Modified Z-Score using the median absolute deviation (MAD). Robust to outliers in the training data used as the reference.
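
As an illustration of the modified Z-Score behind RobustZScore, the sketch below uses the common definition 0.6745 · (x − median) / MAD with the 3.5 cutoff from the table. It is plain PHP on a single feature with hypothetical helper names; whether Pml uses the same 0.6745 constant or handles multiple features this way is an assumption.

```php
<?php
declare(strict_types=1);

/** Median of a non-empty list of numbers. */
function median(array $values): float
{
    sort($values);
    $n   = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 === 1
        ? (float) $values[$mid]
        : ($values[$mid - 1] + $values[$mid]) / 2.0;
}

/** Modified z-scores: 0.6745 * (x - median) / MAD. Illustrative, not Pml code. */
function modifiedZScores(array $xs): array
{
    $med = median($xs);
    $mad = median(array_map(fn (float $x): float => abs($x - $med), $xs));
    if ($mad == 0.0) {
        throw new InvalidArgumentException('MAD is zero; scores are undefined.');
    }
    return array_map(fn (float $x): float => 0.6745 * ($x - $med) / $mad, $xs);
}

$data   = [10.0, 11.0, 9.0, 10.5, 9.5, 30.0];
$scores = modifiedZScores($data);

// Label convention from this section: 1.0 = inlier, -1.0 = outlier.
$labels = array_map(fn (float $z): float => abs($z) > 3.5 ? -1.0 : 1.0, $scores);
print_r($labels);   // only the last point (30.0) is labeled -1.0
```

Because the median and MAD barely move when a few extreme values are present, the value 30.0 does not inflate its own reference scale the way it would with a mean and standard deviation.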

Clusterers

Namespace: Pml\Estimators\Clusterers\

Clusterers implement Learner. predict() returns cluster index labels.

Class	Constructor Parameters	Notes
`KMeans`	`k=3, maxIter=300, tolerance=1e-4`	Lloyd's algorithm. Centroid init via KMeans++ seeder. C-level assignment and update.
`GaussianMixture`	`k=3, maxIter=100, tolerance=1e-4`	EM algorithm for GMM. Soft cluster assignments. Uses LAPACKE for covariance inversion.
`FuzzyCMeans`	`k=3, fuzziness=2.0, maxIter=300, tolerance=1e-4`	Fuzzy / soft clustering. Fuzziness m > 1; higher m = softer boundaries.
`DBSCAN`	`epsilon=0.5, minSamples=5`	Density-based. No pre-specified k. Marks noise points as label `-1`.
`MeanShift`	`bandwidth=1.0, maxIter=100, tolerance=1e-4`	Non-parametric; finds cluster centers as density peaks. Automatically determines k.

KMeans Seeders

Pass one of the following via the seeder: parameter of KMeans or GaussianMixture:

Seeder	Description
`PlusPlus`	KMeans++ — default, reduces poor initializations
`KMC2`	Fast KMeans++ approximation — O(k) vs O(Nk)
`Random`	Uniform random seeding — fast but prone to bad local minima
`Preset`	Fixed user-supplied centroids
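
To show what KMeans++ seeding does, here is a minimal sketch in plain PHP. The first centroid is picked uniformly at random, and each later one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The function names are hypothetical and this is not the PlusPlus class itself, whose interface is not shown on this page.

```php
<?php
declare(strict_types=1);

/** Squared Euclidean distance between two equal-length vectors. */
function sqDist(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $i => $v) {
        $sum += ($v - $b[$i]) ** 2;
    }
    return $sum;
}

/** KMeans++ seeding over a zero-indexed list of points (illustrative only). */
function plusPlusSeed(array $points, int $k): array
{
    $n = count($points);
    if ($k < 1 || $k > $n) {
        throw new InvalidArgumentException('k must be between 1 and the number of points.');
    }
    $centroids = [$points[mt_rand(0, $n - 1)]];
    $d2        = array_fill(0, $n, INF);   // squared distance to nearest centroid

    while (count($centroids) < $k) {
        $last  = $centroids[count($centroids) - 1];
        $total = 0.0;
        foreach ($points as $i => $p) {
            $d2[$i] = min($d2[$i], sqDist($p, $last));
            $total += $d2[$i];
        }
        // Roulette-wheel draw with weights $d2.
        $r    = $total * mt_rand() / mt_getrandmax();
        $pick = 0;
        foreach ($d2 as $i => $w) {
            if ($w <= 0.0) {
                continue;                  // already-chosen points carry zero weight
            }
            $pick = $i;
            $r   -= $w;
            if ($r <= 0.0) {
                break;
            }
        }
        $centroids[] = $points[$pick];
    }
    return $centroids;
}

mt_srand(7);   // fixed seed so repeated runs give the same draw
$points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]];
$seeds  = plusPlusSeed($points, 2);
print_r($seeds);
```

With two well-separated groups like these, the second seed almost always lands in the group the first seed missed, which is the behavior that reduces poor initializations compared with uniform random seeding.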

Decomposition & Manifold

Class	Namespace	Parameters	Notes
`PrincipalComponentAnalysis`	`Pml\Estimators\Decomposition\`	`nComponents=2`	Full SVD via LAPACKE. Returns loadings, explained variance ratio.
`TSNE`	`Pml\Estimators\Manifold\`	`nComponents=2, perplexity=30, lr=200, maxIter=1000`	t-SNE via Barnes-Hut approximation. For visualization. O(N log N) per iter.

Meta Estimators

Class	Namespace	Description
`GridSearch`	`Pml\Estimators\Meta\`	Exhaustive hyperparameter search over a parameter grid. Trains one model per combination.
`RandomSearch`	`Pml\Estimators\Meta\`	Random hyperparameter sampling; faster than grid search for large spaces.
`PlattScaler`	`Pml\Estimators\Meta\`	Wraps any classifier and calibrates probability outputs via Platt scaling (logistic regression on decision scores).
`BootstrapAggregator`	`Pml\`	Bagging meta-estimator. Wraps any Learner, trains N copies on bootstrap samples, averages/votes predictions.
`CommitteeMachine`	`Pml\`	Weighted ensemble of experts. Weights are updated based on per-expert validation error.
`StackingRegressor`	`Pml\Ensemble\`	Stacked generalization: base estimators → meta-regressor trained on OOF predictions.
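
Platt scaling, as used by PlattScaler, maps a raw decision score s to a probability through a fitted sigmoid, P(y = 1 | s) = 1 / (1 + exp(A·s + B)). The sketch below fits A and B by plain gradient descent on toy data. It is a simplification under stated assumptions: Platt's original method also smooths the targets and uses a Newton-style solver, and the function names and data here are illustrative, not Pml's API.

```php
<?php
declare(strict_types=1);

/** Platt's sigmoid: P(y = 1 | s) = 1 / (1 + exp(A * s + B)). */
function plattProb(float $s, float $a, float $b): float
{
    return 1.0 / (1.0 + exp($a * $s + $b));
}

/** Fits A and B by gradient descent on the mean negative log-likelihood. */
function fitPlatt(array $scores, array $labels, int $iters = 2000, float $lr = 0.1): array
{
    $a = 0.0;
    $b = 0.0;
    $n = count($scores);
    for ($t = 0; $t < $iters; $t++) {
        $gradA = 0.0;
        $gradB = 0.0;
        foreach ($scores as $i => $s) {
            // For this parameterization the gradients are (y - p) * s and (y - p).
            $err    = $labels[$i] - plattProb($s, $a, $b);
            $gradA += $err * $s;
            $gradB += $err;
        }
        $a -= $lr * $gradA / $n;
        $b -= $lr * $gradB / $n;
    }
    return [$a, $b];
}

// Toy decision scores with 0/1 labels (illustrative values only).
$scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0];
$labels = [0, 0, 1, 0, 1, 1];

[$a, $b] = fitPlatt($scores, $labels);
printf("A=%.3f B=%.3f P(y=1|s=2)=%.3f\n", $a, $b, plattProb(2.0, $a, $b));
```

A negative A means larger scores map to higher probabilities. In practice the sigmoid is usually fitted on held-out scores rather than on the data the classifier was trained on.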

Usage Example

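use Pml\Estimators\Classifiers\GBDTClassifier;
use Pml\Dataset;

// $trainSet, $testSet, and $dataset below are Pml\Dataset instances built elsewhere.
// Named arguments (nEstimators: ...) require PHP 8.0 or later.

// Train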
$clf = new GBDTClassifier(
    nEstimators: 300,
    maxDepth:    6,
    lr:          0.05,
    lambda:      1.0,
    subSample:   0.8,
);
$clf->train($trainSet);

// Predict
$preds = $clf->predict($testSet);   // Tensor [N] of class indices
$score = $clf->score($testSet);     // float accuracy

// Feature importance
$fi = $clf->featureImportances();   // Tensor [D]

// Save / load
$clf->save('models/gbdt/');
$clf2 = GBDTClassifier::load('models/gbdt/');

// Cross-validation with any estimator
use Pml\CrossValidation\StratifiedKFold;

$cv     = new StratifiedKFold(k: 5);
$scores = $cv->score(new GBDTClassifier(), $dataset);
echo "Mean CV accuracy: " . (array_sum($scores) / count($scores)) . PHP_EOL;

// Hyperparameter search
use Pml\Estimators\Meta\GridSearch;

$gs = new GridSearch(
    estimator: new GBDTClassifier(),
    grid: [
        'nEstimators' => [100, 200, 300],
        'maxDepth'    => [4, 6, 8],
        'lr'          => [0.05, 0.1],
    ],
    metric: 'accuracy',
    cv:     3,
);
$gs->train($dataset);
echo "Best params: "; print_r($gs->bestParams());
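
To get a feel for the cost of the grid above, the following plain-PHP sketch expands a parameter grid into its individual combinations. The helper expandGrid is hypothetical and only mirrors what an exhaustive search has to enumerate; it is not how GridSearch is necessarily implemented.

```php
<?php
declare(strict_types=1);

/** Expands ['name' => [values...], ...] into a list of every combination. */
function expandGrid(array $grid): array
{
    $combos = [[]];
    foreach ($grid as $name => $values) {
        $next = [];
        foreach ($combos as $combo) {
            foreach ($values as $value) {
                $next[] = $combo + [$name => $value];
            }
        }
        $combos = $next;
    }
    return $combos;
}

$grid = [
    'nEstimators' => [100, 200, 300],
    'maxDepth'    => [4, 6, 8],
    'lr'          => [0.05, 0.1],
];

$combos = expandGrid($grid);
echo count($combos), " combinations", PHP_EOL;   // 3 * 3 * 2 = 18
print_r($combos[0]);                             // nEstimators=100, maxDepth=4, lr=0.05
```

If cv: 3 means each combination is evaluated on three folds (an assumption about GridSearch), this grid would require 54 model fits, which is where RandomSearch becomes attractive for larger spaces.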
