Architecture

The project is moving toward a small set of explicit boundaries:

  • datasets describe benchmark data;

  • task DataModules prepare batches;

  • experiment engines train, validate, test, and checkpoint models;

  • result backends store result records;

  • artifact backends store downloadable files;

  • benchmark reports and leaderboards read stored results.

This separation keeps task/model code independent from reporting and deployment concerns.

Datasets

Datasets expose raw benchmark data. They may be regular-grid time series, anomaly datasets, UEA classification datasets, or irregular sample datasets. Task code should not assume every dataset is a continuous timestamped feature matrix.

Task DataModules

Task DataModules adapt datasets to a task. They own splitting, scaling, window construction, masks, labels, and named batch objects.

Examples:

  • ForecastDataModule

  • ImputationDataModule

  • AnomalyDataModule

  • UEADataModule

Named Batches

New task code should use named batch objects. This avoids fragile positional tuple unpacking and makes model/task contracts easier to inspect.

Experiment Engine

An experiment engine owns the runtime loop:

  • initialize dataset and DataModule;

  • build the model;

  • train and validate each epoch;

  • apply early stopping and scheduling;

  • load the best checkpoint;

  • evaluate on the test split;

  • expose hyperparameters, history, metrics, and artifact paths.

The canonical forecast engine currently powers migrated forecast models such as DLinear and Crossformer.

Task and Model Configuration

The public Experiment API stays ergonomic and flat:

Experiment(
    model="DLinear",
    task="Forecast",
    dataset="ETTh1",
    windows=96,
    pred_len=96,
    lr=0.001,
)

Internally, these settings are split into:

Task Configuration

Settings that define data shape and task semantics.

Model Configuration

Settings that define model architecture.

Runtime Configuration

Settings that define optimization, storage, device, and execution behavior.

Unknown or irrelevant settings fail early.

Results Boundary

Experiment engines emit RunResult records. Result backends store those records. Benchmark reports and leaderboards read stored records and curated reference entries.

Task and model code should not know about leaderboard rendering, CSV export, or web dashboards.

Compatibility Shims

Legacy experiment names remain importable during migration. They should delegate to the canonical architecture where possible. New development should target:

  • v2 DataModules;

  • named batches;

  • typed task/model/runtime configuration;

  • Experiment as the high-level entrypoint;

  • RunResult as the result record.