Non-linear regression — second-derivative SG + SNV feeding a 400-tree random forest.
Status: unvalidated
A non-linear regression pipeline for when PLS underfits.
window_length=15, polyorder=2; resolves overlapping absorption bands and removes baseline curvature.
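To make the preprocessing concrete, here is a minimal sketch of the two steps using SciPy and NumPy rather than the nirs4all operators themselves. The helper names `sg_second_derivative` and `snv` are illustrative, the SNV here uses the population standard deviation (an assumption; the nirs4all implementation may differ), and the spectra are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_second_derivative(spectra, window_length=15, polyorder=2):
    """Second-derivative Savitzky-Golay filter along the wavelength axis (axis 1)."""
    return savgol_filter(spectra, window_length=window_length,
                         polyorder=polyorder, deriv=2, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)  # assumption: population std (ddof=0)
    return (spectra - mean) / std


# Synthetic data: two overlapping Gaussian bands plus a per-sample quadratic baseline.
x = np.linspace(0.0, 1.0, 200)
rng = np.random.default_rng(0)
bands = np.exp(-((x - 0.45) / 0.05) ** 2) + 0.8 * np.exp(-((x - 0.55) / 0.05) ** 2)
baseline = rng.uniform(0.5, 2.0, size=(10, 1)) * x**2
spectra = bands + baseline

processed = snv(sg_second_derivative(spectra))
```

In this toy case the second derivative turns the quadratic baseline into a constant offset, and the centring step of SNV removes that offset, so all ten processed spectra coincide even though their baselines differ.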
Try this when the target responds non-linearly to the spectrum, or when you want out-of-the-box feature-importance diagnostics. Expect higher variance than PLS; keep an eye on the validation/test gap.
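One way to watch the validation/test gap and read feature importances is sketched here with plain scikit-learn. It is not the nirs4all runner: the helper names `make_forest` and `validation_test_gap` are hypothetical, the data are synthetic stand-ins for preprocessed spectra, and the forest defaults mirror this pipeline's settings (400 trees, max_depth 12).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split


def make_forest(n_estimators=400, max_depth=12, seed=0):
    return RandomForestRegressor(n_estimators=n_estimators, max_depth=max_depth,
                                 n_jobs=-1, random_state=seed)


def validation_test_gap(X, y, n_estimators=400, max_depth=12, seed=0):
    """Return (mean validation R^2, test R^2, feature importances)."""
    # Hold out a test set that the cross-validation never sees.
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=seed)
    # With no scoring argument, cross_val_score uses the regressor's R^2 score.
    val_scores = cross_val_score(
        make_forest(n_estimators, max_depth, seed), X_dev, y_dev, cv=cv)
    final = make_forest(n_estimators, max_depth, seed).fit(X_dev, y_dev)
    return float(val_scores.mean()), float(final.score(X_test, y_test)), \
        final.feature_importances_


# Synthetic example: the target depends non-linearly on feature 3 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = np.sin(2.0 * X[:, 3]) + 0.1 * rng.normal(size=200)

# Fewer trees than the pipeline default, only to keep the demo quick.
val_r2, test_r2, importances = validation_test_gap(X, y, n_estimators=100)
print(f"validation R2={val_r2:.2f}  test R2={test_r2:.2f}  gap={val_r2 - test_r2:.2f}")
print("most important feature index:", int(np.argmax(importances)))
```

A validation score well above the test score suggests the forest is fitting structure that does not generalise. With spectra, the importances index wavelength channels, so peaks in `importances` point at the bands the model relies on.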
{
"name": "Savitzky-Golay (2nd derivative) + Random Forest",
"pipeline": [
{"class": "nirs4all.operators.transforms.SavitzkyGolay", "params": {"window_length": 15, "polyorder": 2, "deriv": 2}},
{"class": "nirs4all.operators.transforms.StandardNormalVariate"},
{"class": "sklearn.model_selection.ShuffleSplit", "params": {"n_splits": 5, "test_size": 0.2, "random_state": 0}},
{"model": {"class": "sklearn.ensemble.RandomForestRegressor", "params": {"n_estimators": 400, "max_depth": 12, "n_jobs": -1, "random_state": 0}}, "name": "RF-400"}
]
}
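The configuration above is plain JSON, so it can be inspected with the standard library alone. The sketch below embeds a copy of it and lists each entry; the helper `describe_steps` and the "step"/"model" role labels are illustrative, not part of nirs4all.

```python
import json

CONFIG_TEXT = """
{
  "name": "Savitzky-Golay (2nd derivative) + Random Forest",
  "pipeline": [
    {"class": "nirs4all.operators.transforms.SavitzkyGolay",
     "params": {"window_length": 15, "polyorder": 2, "deriv": 2}},
    {"class": "nirs4all.operators.transforms.StandardNormalVariate"},
    {"class": "sklearn.model_selection.ShuffleSplit",
     "params": {"n_splits": 5, "test_size": 0.2, "random_state": 0}},
    {"model": {"class": "sklearn.ensemble.RandomForestRegressor",
               "params": {"n_estimators": 400, "max_depth": 12,
                          "n_jobs": -1, "random_state": 0}},
     "name": "RF-400"}
  ]
}
"""


def describe_steps(config):
    """Return (role, dotted class path, params) for every pipeline entry."""
    steps = []
    for entry in config["pipeline"]:
        # Model entries nest their spec under a "model" key; other entries are flat.
        if "model" in entry:
            role, spec = "model", entry["model"]
        else:
            role, spec = "step", entry
        steps.append((role, spec["class"], spec.get("params", {})))
    return steps


config = json.loads(CONFIG_TEXT)
steps = describe_steps(config)
for role, cls, params in steps:
    print(f"{role:5s} {cls}  {params}")
```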
# Python
import nirs4all_repository as n4r
pipe = n4r.get("savgol_rf")
config = pipe.to_nirs4all()  # ready for nirs4all.run() / predict()
# any language: read the index, fetch + verify
curl https://repository.nirs4all.org/data/index.json
curl https://repository.nirs4all.org/data/pipelines/savgol_rf/pipeline.json
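The same two requests can be made from Python with the standard library. This is a sketch under assumptions: the repository's own verification scheme is not described here, so `looks_like_pipeline` only checks that the fetched document has the same shape as the configuration shown earlier, and both helper names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE = "https://repository.nirs4all.org/data"


def fetch_json(url, timeout=10):
    """Download a URL and parse the body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def looks_like_pipeline(doc):
    """Structural check only: a non-empty "pipeline" list of class/model entries."""
    steps = doc.get("pipeline") if isinstance(doc, dict) else None
    if not isinstance(steps, list) or not steps:
        return False
    return all(isinstance(s, dict) and ("class" in s or "model" in s) for s in steps)


def main():
    index = fetch_json(f"{BASE}/index.json")  # index layout is not assumed here
    doc = fetch_json(f"{BASE}/pipelines/savgol_rf/pipeline.json")
    print("index type:", type(index).__name__)
    print("pipeline looks valid:", looks_like_pipeline(doc))


# main()  # uncomment to run against the live repository (needs network access)
```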