SmartBuildSim

Deterministic execution

SmartBuildSim provides deterministic behaviour across all workflows via the smartbuildsim.config module. The module exposes a DeterminismConfig model together with helpers for seeding NumPy, Python’s random module and component-specific pseudo-random number generators.

Sources of randomness

The table below summarises where randomness occurs and how it is controlled.

Area | Source | Seed resolution
Data generation | Latent feature synthesis, sensor-specific noise, anomaly injection | create_rng("data.generator.zone", explicit=config.seed, offset=zone_index)
Benchmarking (regression/anomaly) | Shuffled 5-fold splits | resolve_seed("evaluation.<task>.kfold", explicit=seed, offset=index)
Reinforcement learning | Q-learning sampling, evaluation rollouts, soft Q-learning temperature exploration | create_rng("models.rl.q_learning.train", explicit=config.seed); create_rng("models.rl.q_learning.eval", explicit=config.seed, offset=1); create_rng("evaluation.rl.soft.train", explicit=config.seed)
Model training | IsolationForest randomness | resolve_seed("models.anomaly.isolation_forest", explicit=config.random_state)
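
The pattern in the table (a component name, an explicit seed and an integer offset) can be illustrated with a standalone sketch. The derive_seed function below is hypothetical and is not SmartBuildSim's implementation; it only shows one way a name, a base seed and an offset could be mapped to a stable integer so that each zone or fold gets its own repeatable stream. It uses Python's standard library rather than NumPy to stay dependency-free.

```python
import hashlib
import random


def derive_seed(component: str, base_seed: int, offset: int = 0) -> int:
    """Hypothetical helper: map (component, base_seed, offset) to a 32-bit seed."""
    payload = f"{component}:{base_seed}:{offset}".encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Use the first four bytes of the digest as an unsigned 32-bit integer.
    return int.from_bytes(digest[:4], "big")


# One independent, repeatable generator per zone, distinguished by the offset.
zone_rngs = [
    random.Random(derive_seed("data.generator.zone", 2024, offset=zone_index))
    for zone_index in range(3)
]
samples = [rng.random() for rng in zone_rngs]
```

Re-running the snippet with the same base seed yields the same samples, while changing the offset selects a different stream for the same component.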

All helpers ultimately derive from a single DeterminismConfig instance which can be populated from configuration files or instantiated manually. CLI commands accept an optional determinism section, for example:

scenario: office-small
determinism:
  seed: 2024
  components:
    models.rl.q_learning.train: 321

When supplied, the CLI will call configure_determinism with the parsed payload before any data is generated or models are trained. In Python code the same effect can be achieved programmatically:

from smartbuildsim.config import DeterminismConfig, configure_determinism

configure_determinism(DeterminismConfig(seed=7))

Once determinism is configured, component-specific generators can be requested with create_rng, and raw integer seeds can be obtained with resolve_seed.
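
To make the interplay between a global seed, per-component overrides and explicit arguments concrete, here is a minimal standalone sketch. The SeedPolicy class, its method names and the precedence order (explicit argument first, then component override, then a value derived from the global seed) are assumptions made for illustration; they are not taken from smartbuildsim.config, whose actual resolution rules may differ.

```python
import hashlib
import random
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SeedPolicy:
    """Hypothetical stand-in for a determinism config (not the library's model)."""

    seed: int = 0
    components: Dict[str, int] = field(default_factory=dict)

    def resolve(self, name: str, explicit: Optional[int] = None, offset: int = 0) -> int:
        # Assumed precedence: explicit argument, then per-component override,
        # then a value derived from the global seed and the component name.
        if explicit is not None:
            base = explicit
        elif name in self.components:
            base = self.components[name]
        else:
            digest = hashlib.sha256(f"{name}:{self.seed}".encode("utf-8")).digest()
            base = int.from_bytes(digest[:4], "big")
        return (base + offset) % (2**32)

    def rng(self, name: str, explicit: Optional[int] = None, offset: int = 0) -> random.Random:
        # Build a generator seeded with the resolved value.
        return random.Random(self.resolve(name, explicit, offset))


policy = SeedPolicy(seed=2024, components={"models.rl.q_learning.train": 321})
train_seed = policy.resolve("models.rl.q_learning.train")  # component override applies
eval_seed = policy.resolve("models.rl.q_learning.eval", explicit=7, offset=1)  # 7 + 1
```

Keeping resolution in one place means every consumer that asks for the same name, explicit seed and offset receives the same integer, which is the property the helpers described above rely on.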

Limitations

Reproducibility shortcut

The repository ships with a reproduce.sh script. It re-runs the default CLI data pipeline twice and checks that both outputs match byte-for-byte. This offers a quick regression test for determinism in continuous integration or local development environments.
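
The contents of reproduce.sh are not shown here. As a rough illustration of the same check, the Python sketch below compares two output directories by hashing every file; the function names and the idea of passing two directory paths are assumptions, and SHA-256 digests are used as a practical proxy for byte-for-byte equality.

```python
import hashlib
from pathlib import Path
from typing import Dict


def digest_tree(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def outputs_match(first: Path, second: Path) -> bool:
    """Return True when both trees contain the same files with identical contents."""
    return digest_tree(first) == digest_tree(second)
```

A caller would run the pipeline twice into separate directories and fail the build when outputs_match returns False.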