Experiments
Experiments are the high-level path for reproducible benchmark runs. They connect a model, task, dataset, configuration, result backend, and artifact backend.
Basic Usage
from torch_timeseries import Experiment

results = Experiment(
    model="DLinear",
    task="Forecast",
    dataset="ETTh1",
    windows=96,
    pred_len=96,
    save_dir="./results",
).run(seeds=[1, 2, 3])

print(results[0].metrics)
The public API is flat on purpose: users can pass task, model, and runtime settings in one constructor. Internally, settings are split into validated task, model, and runtime configuration objects before data or model construction.
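As a rough illustration of that split, the sketch below routes flat keyword arguments into three small dataclasses. The class names TaskConfig, ModelConfig, and RuntimeConfig, the helper split_settings, and the assignment of fields to each group are all assumptions made for this example; the library's real configuration classes may look different.

```python
from dataclasses import dataclass, fields


# Hypothetical config groups; the field assignment is an assumption.
@dataclass(frozen=True)
class TaskConfig:
    task: str
    dataset: str
    windows: int
    pred_len: int


@dataclass(frozen=True)
class ModelConfig:
    model: str
    individual: bool = False


@dataclass(frozen=True)
class RuntimeConfig:
    save_dir: str = "./results"


def split_settings(**flat):
    """Route each flat keyword to the config class that declares it."""
    remaining = dict(flat)
    built = []
    for cls in (TaskConfig, ModelConfig, RuntimeConfig):
        names = {f.name for f in fields(cls)}
        # Move the keys this class declares out of the flat dict.
        kwargs = {k: remaining.pop(k) for k in list(remaining) if k in names}
        built.append(cls(**kwargs))
    if remaining:
        # Anything left over was not declared by any group.
        raise TypeError(f"unknown settings: {sorted(remaining)}")
    return tuple(built)


task_cfg, model_cfg, runtime_cfg = split_settings(
    model="DLinear",
    task="Forecast",
    dataset="ETTh1",
    windows=96,
    pred_len=96,
    save_dir="./results",
)
```

Because every key must be claimed by one of the groups, a misspelled or irrelevant setting fails here, before any dataset or model would be constructed.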
Supported Task Names
The standard task names are:
- Forecast
- Imputation
- AnomalyDetection
- UEAClassification
The canonical engine path is being migrated model by model. Migrated forecast models use the v2 DataModule and typed configuration path. Legacy experiment classes remain available through compatibility shims while the migration continues.
Multiple Seeds
Pass multiple seeds to produce multiple RunResult records under the
same run configuration:
Experiment(
    model="DLinear",
    task="Forecast",
    dataset="ETTh1",
    windows=96,
    pred_len=96,
    save_dir="./results",
).run(seeds=[1, 2, 3])
The random seed is not part of the configuration fingerprint. This lets leaderboards aggregate repeated runs while still preserving every seeded run as its own result record and model artifact.
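One way to picture a seed-independent fingerprint is to hash a canonical serialization of the settings with the seed left out. This is only a sketch: the function name config_fingerprint, the use of JSON plus SHA-256, and the 16-character truncation are assumptions for illustration, not the library's actual scheme.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash every setting except the seed (hypothetical scheme)."""
    payload = {k: v for k, v in config.items() if k != "seed"}
    # Sorted keys and fixed separators keep the serialization stable.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


base = {
    "model": "DLinear",
    "task": "Forecast",
    "dataset": "ETTh1",
    "windows": 96,
    "pred_len": 96,
}

# Three seeded runs of the same configuration share one fingerprint.
runs = [dict(base, seed=seed) for seed in (1, 2, 3)]
fingerprints = {config_fingerprint(run) for run in runs}

# Changing a real setting produces a different fingerprint.
longer_horizon = config_fingerprint(dict(base, pred_len=192, seed=1))
```

Under this scheme, grouping records by fingerprint collects all seeds of one configuration, while each record can still carry its own seed.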
Grid Runs
Use Experiment.grid() to run repeated combinations:
Experiment.grid(
    models=["DLinear", "Crossformer"],
    tasks=["Forecast"],
    datasets=["ETTh1", "ETTm1"],
    seeds=[1, 2, 3],
    save_dir="./results",
    windows=96,
    pred_len=96,
).run()
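To get a sense of how many runs such a grid implies, the sketch below expands the same arguments as a full Cartesian product. Whether Experiment.grid() enumerates combinations exactly this way is an assumption, and expand_grid is a hypothetical helper written for this example.

```python
from itertools import product


def expand_grid(models, tasks, datasets, seeds, **shared):
    """List one settings dict per (model, task, dataset, seed) combination."""
    return [
        dict(model=m, task=t, dataset=d, seed=s, **shared)
        for m, t, d, s in product(models, tasks, datasets, seeds)
    ]


runs = expand_grid(
    models=["DLinear", "Crossformer"],
    tasks=["Forecast"],
    datasets=["ETTh1", "ETTm1"],
    seeds=[1, 2, 3],
    windows=96,
    pred_len=96,
)

print(len(runs))  # 2 models x 1 task x 2 datasets x 3 seeds = 12
```

Settings such as windows and pred_len are shared by every combination, which is why they are passed once rather than as lists.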
Compare Results
Local results can be compared from Python:
Experiment.compare(save_dir="./results", task="Forecast")
or from the CLI:
pytexp compare --save_dir ./results --task Forecast
Configuration Validation
Experiment configuration is strict. Unknown or irrelevant keys fail before the dataset or model is built. This protects benchmark results from accidental settings that look meaningful but are not consumed by the selected model or task.
For example, a DLinear forecast run accepts the individual setting but rejects
Crossformer-only settings such as d_model.
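The check can be pictured as comparing the supplied keys against the keys the selected model is known to consume. The sketch below is illustrative only: COMMON_KEYS, MODEL_KEYS, and validate_settings are hypothetical names, and the key sets list just the settings mentioned on this page.

```python
# Hypothetical key tables; the real library defines its own schemas.
COMMON_KEYS = {"model", "task", "dataset", "windows", "pred_len", "save_dir"}
MODEL_KEYS = {
    "DLinear": {"individual"},
    "Crossformer": {"d_model"},
}


def validate_settings(settings):
    """Raise if any key is not consumed by the selected model."""
    model = settings["model"]
    allowed = COMMON_KEYS | MODEL_KEYS[model]
    unknown = sorted(set(settings) - allowed)
    if unknown:
        raise ValueError(f"{model} does not accept: {', '.join(unknown)}")
    return settings


base = {
    "model": "DLinear",
    "task": "Forecast",
    "dataset": "ETTh1",
    "windows": 96,
    "pred_len": 96,
}

# Accepted: individual is a DLinear setting.
validate_settings(dict(base, individual=True))

# Rejected: d_model belongs to Crossformer, not DLinear.
try:
    validate_settings(dict(base, d_model=512))
except ValueError as exc:
    message = str(exc)
```

Failing at this point means a run never records a setting in its configuration that had no effect on the model.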