Simula / OpenSimula

Experimental Simula-style synthetic data pipeline (afterimage.simula).

Checkpointing and export

class afterimage.simula.Checkpointer(checkpoint_root: Path | str, *, validate_taxonomies: bool = True, clear_stale_optional: bool = True)

Bases: object

Collect OpenSimula artifacts under <root>/opensimula/ and write manifest.json on exit.

Typical usage:

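from afterimage.simula import Checkpointer, OpenSimulaRunConfig

# bundle (a TaxonomyBundle) and spec (a SamplingStrategySpec) are assumed to exist already
with Checkpointer("./run") as cp:
    bundle.save(cp)  # required: writes taxonomy_bundle.json
    spec.save(cp)  # optional: writes sampling_strategy.json
    cp.write_run_config(OpenSimulaRunConfig(name="demo", model="gemini-2.5-flash"))
# manifest.json is written when the with block exits, so the upload happens afterwards
url = cp.push_to_hub("org/dataset-repo")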

Call write_taxonomy_bundle() (or bundle.save(cp)) at least once before the context exits. When clear_stale_optional is true, optional files left over from a previous run are removed on entry, so omitting spec.save() or write_run_config() does not leave stale JSON behind.
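The enter/exit contract above can be modeled with a small stdlib context manager. This is a hypothetical sketch, not afterimage's implementation: the class name MiniCheckpointer, the set of optional file names, the RuntimeError raised when no bundle was written, and skipping the manifest when the block raises are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

# Assumed names of the optional artifacts; not read from afterimage itself.
OPTIONAL_FILES = ("sampling_strategy.json", "run_config.json")


class MiniCheckpointer:
    """Toy model of the enter/exit protocol described above (hypothetical)."""

    def __init__(self, root, clear_stale_optional=True):
        self.dir = Path(root) / "opensimula"
        self.clear_stale_optional = clear_stale_optional
        self.written = []

    def __enter__(self):
        self.dir.mkdir(parents=True, exist_ok=True)
        if self.clear_stale_optional:
            # Remove optional JSON left behind by an earlier run in the same root.
            for name in OPTIONAL_FILES:
                (self.dir / name).unlink(missing_ok=True)
        return self

    def write(self, name, payload):
        (self.dir / name).write_text(json.dumps(payload))
        self.written.append(name)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            return False  # assumption: no manifest if the with block raised
        if "taxonomy_bundle.json" not in self.written:
            raise RuntimeError("taxonomy bundle must be written before exit")
        # The manifest is written only once the block finishes cleanly.
        (self.dir / "manifest.json").write_text(json.dumps({"files": self.written}))
        return False


def demo_lifecycle():
    """Run twice in one root; the second run omits run_config.json."""
    with tempfile.TemporaryDirectory() as tmp:
        with MiniCheckpointer(tmp) as cp:
            cp.write("taxonomy_bundle.json", {})
            cp.write("run_config.json", {"name": "demo"})
        with MiniCheckpointer(tmp) as cp:
            cp.write("taxonomy_bundle.json", {})
        out = Path(tmp) / "opensimula"
        stale_left = (out / "run_config.json").exists()
        manifest = json.loads((out / "manifest.json").read_text())
    return stale_left, manifest
```

In demo_lifecycle(), the run_config.json written by the first run is deleted when the second run enters, so the second checkpoint contains only the bundle it actually wrote.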

async apush_to_hub(repo_id: str, *, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', token: str | None = None, commit_message: str | None = None, private: bool = False, path_in_repo: str = 'opensimula', dataset_card: str | None = None) → str

Same as push_to_hub(), but runs blocking Hub I/O in a worker thread.

Prefer this from async code so uploads do not block the event loop.
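The worker-thread pattern can be sketched with asyncio.to_thread (Python 3.9+). Whether apush_to_hub uses asyncio.to_thread or another executor is an assumption; blocking_upload below is a hypothetical stand-in that only sleeps, so no Hub call is made.

```python
import asyncio
import threading
import time


def blocking_upload(repo_id: str, done: threading.Event) -> str:
    """Hypothetical blocking upload; sleeps instead of contacting the Hub."""
    time.sleep(0.2)  # simulated network I/O
    done.set()
    return f"uploaded:{repo_id}"


async def heartbeat(done: threading.Event) -> int:
    # Counts event-loop ticks that happen while the upload is still running.
    ticks = 0
    while not done.is_set():
        ticks += 1
        await asyncio.sleep(0.02)
    return ticks


async def main() -> tuple:
    done = threading.Event()
    # to_thread moves the blocking call off the loop, so heartbeat keeps ticking.
    upload = asyncio.to_thread(blocking_upload, "org/dataset-repo", done)
    return tuple(await asyncio.gather(upload, heartbeat(done)))
```

If blocking_upload ran directly on the event-loop thread instead, nothing else could run during its sleep.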

finalize() → OpenSimulaManifest: Write manifest.json immediately and return the manifest. Usually you can rely on the context exit to do this instead.

property opensimula_dir: Path

The artifact directory, <root>/opensimula/.

push_to_hub(repo_id: str, *, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', token: str | None = None, commit_message: str | None = None, private: bool = False, path_in_repo: str = 'opensimula', dataset_card: str | None = None) → str

Upload <root>/opensimula/ to the Hugging Face Hub (creating the repo if it does not exist) and return the canonical repo URL.

Requires manifest.json on disk, so call it after the with block exits or after finalize().

dataset_card becomes the repository README.md at the Hub root. When it is omitted or blank, a default card is generated: YAML front matter with tags, plus a short introduction linking to AfterImage and to the Simula paper and blog.

write_run_config(config: OpenSimulaRunConfig) → None: Write run_config.json (call after write_taxonomy_bundle()).

write_sampling_strategy(spec: SamplingStrategySpec) → None: Write sampling_strategy.json (call after write_taxonomy_bundle()).

write_taxonomy_bundle(bundle: TaxonomyBundle) → None: Write taxonomy_bundle.json and record digests for the manifest.

class afterimage.simula.SimulaCheckpoint(manifest: OpenSimulaManifest, bundle: TaxonomyBundle, sampling_strategy: SamplingStrategySpec | None, run_config: OpenSimulaRunConfig | None, root: Path)

Bases: object

A loaded checkpoint: the manifest, the parsed taxonomy bundle, the optional sampling strategy and run config, and the checkpoint root.

bundle: TaxonomyBundle

manifest: OpenSimulaManifest

root: Path

run_config: OpenSimulaRunConfig | None

sampling_strategy: SamplingStrategySpec | None

class afterimage.simula.OpenSimulaRunConfig

Bases: BaseModel

Typed metadata and hyperparameters stored in run_config.json beside a checkpoint.

complexify_c: float | None

corpus_excerpt_count: int | None

data_jsonl: str | None

description: str | None

max_children_per_node: int | None

max_concurrency: int | None

max_factors: int | None

max_frontier_per_depth: int | None

meta_prompt_K: int | None

model: str | None

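model_config = {'extra': 'ignore'}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict. With extra='ignore', keys in run_config.json that the model does not declare are dropped during validation instead of raising an error.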

name: str | None

num_choices: int | None

num_samples: int | None

proposal_N: int | None

seed: int | None

target_depth_D: int | None

temperature: float | None
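Because model_config sets extra to 'ignore', unknown keys in run_config.json are dropped rather than rejected. A stdlib-only sketch of that behavior follows; RunConfigSketch and load_run_config are hypothetical, mirror only a few of the fields above, and skip the type validation pydantic would perform.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class RunConfigSketch:
    """Hypothetical mirror of a few OpenSimulaRunConfig fields (all optional)."""

    name: Optional[str] = None
    model: Optional[str] = None
    seed: Optional[int] = None
    temperature: Optional[float] = None
    num_samples: Optional[int] = None


def load_run_config(text: str) -> RunConfigSketch:
    raw = json.loads(text)
    known = {f.name for f in fields(RunConfigSketch)}
    # Drop undeclared keys, mimicking pydantic's extra="ignore" (no type checks here).
    return RunConfigSketch(**{k: v for k, v in raw.items() if k in known})
```

Ignoring extra keys lets an older reader load a run_config.json written by a newer tool that added fields.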

class afterimage.simula.OpenSimulaManifest(*, producer: Literal['afterimage'] = 'afterimage', format: Literal['opensimula'] = 'opensimula', format_version: str = '1.0', created_at: str, afterimage_version: str | None = None, instruction_y_sha256: str, taxonomy_bundle_sha256: str, sampling_strategy_sha256: str | None = None, taxonomy_bundle_file: str = 'taxonomy_bundle.json', sampling_strategy_file: str | None = None, run_config_file: str | None = None)

Bases: BaseModel

Versioned checkpoint manifest (portable across tools that understand the opensimula format).

afterimage_version: str | None

created_at: str

format: Literal['opensimula']

format_version: str

instruction_y_sha256: str

model_config = {}: Configuration for the model, a dictionary conforming to pydantic.config.ConfigDict.

producer: Literal['afterimage']

run_config_file: str | None

sampling_strategy_file: str | None

sampling_strategy_sha256: str | None

taxonomy_bundle_file: str

taxonomy_bundle_sha256: str
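For orientation, a manifest can be sketched as a stdlib dataclass carrying the field names and defaults from the OpenSimulaManifest signature. ManifestSketch and manifest_json are hypothetical; the real class is a pydantic model, and the created_at format used here (ISO 8601 in UTC) is an assumption.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ManifestSketch:
    """Stdlib mirror of the OpenSimulaManifest fields and defaults (hypothetical)."""

    created_at: str
    instruction_y_sha256: str
    taxonomy_bundle_sha256: str
    producer: str = "afterimage"
    format: str = "opensimula"
    format_version: str = "1.0"
    afterimage_version: Optional[str] = None
    sampling_strategy_sha256: Optional[str] = None
    taxonomy_bundle_file: str = "taxonomy_bundle.json"
    sampling_strategy_file: Optional[str] = None
    run_config_file: Optional[str] = None


def manifest_json(bundle_digest: str, instruction_digest: str) -> str:
    m = ManifestSketch(
        created_at=datetime.now(timezone.utc).isoformat(),  # assumed timestamp format
        instruction_y_sha256=instruction_digest,
        taxonomy_bundle_sha256=bundle_digest,
    )
    # Optional entries stay null when the matching file was not written.
    return json.dumps(asdict(m), indent=2)
```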

afterimage.simula.save_checkpoint(checkpoint_root: Path | str, *, bundle: TaxonomyBundle, sampling_strategy: SamplingStrategySpec | None = None, run_config: OpenSimulaRunConfig | None = None, validate_taxonomies: bool = True) → OpenSimulaManifest

Write opensimula/ under checkpoint_root and return the manifest.

Equivalent to using Checkpointer with bundle.save / spec.save / Checkpointer.write_run_config().

afterimage.simula.load_checkpoint(checkpoint_root: Path | str, *, verify_digests: bool = True, validate_taxonomies: bool = True) → SimulaCheckpoint: Load opensimula/ from checkpoint_root.
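The kind of check verify_digests enables can be sketched with hashlib. This assumes each *_sha256 manifest field is the hex SHA-256 of the referenced file's raw bytes, which the documentation above does not state; verify_bundle_digest, demo_tamper, and the toy bundle contents are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_bundle_digest(opensimula_dir: Path) -> None:
    """Hypothetical check: compare the bundle's hash with the manifest entry."""
    manifest = json.loads((opensimula_dir / "manifest.json").read_text())
    bundle_path = opensimula_dir / manifest["taxonomy_bundle_file"]
    if sha256_file(bundle_path) != manifest["taxonomy_bundle_sha256"]:
        raise ValueError(f"digest mismatch for {bundle_path.name}")


def demo_tamper() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp) / "opensimula"
        d.mkdir()
        bundle = d / "taxonomy_bundle.json"
        bundle.write_text(json.dumps({"taxonomies": []}))  # toy contents
        (d / "manifest.json").write_text(json.dumps({
            "taxonomy_bundle_file": bundle.name,
            "taxonomy_bundle_sha256": sha256_file(bundle),
        }))
        results = []
        for _ in range(2):
            try:
                verify_bundle_digest(d)
                results.append("ok")
            except ValueError:
                results.append("mismatch")
            bundle.write_text("{}")  # edit the bundle after the first check
        return results
```

The first check passes; after the bundle is edited on disk, the recorded digest no longer matches.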

afterimage.simula.push_checkpoint_to_hub(checkpoint_root: Path | str, repo_id: str, *, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', token: str | None = None, commit_message: str | None = None, private: bool = False, path_in_repo: str = 'opensimula', dataset_card: str | None = None) → str

Upload local opensimula/ to the Hub under path_in_repo (default opensimula).

Same as Checkpointer(checkpoint_root).push_to_hub(...). Returns the canonical repo URL.

afterimage.simula.pull_checkpoint_from_hub(repo_id: str, checkpoint_root: Path | str, *, repo_type: Literal['dataset', 'model', 'space'] = 'dataset', revision: str | None = None, token: str | None = None, path_in_repo: str = 'opensimula') → Path

Download path_in_repo/** from the Hub into checkpoint_root, merging the downloaded files into it via snapshot_download.

Returns opensimula_dir(checkpoint_root), the local opensimula/ directory.

afterimage.simula.append_datapoints_jsonl(path: Path | str, records: Iterable[DataPointRecord], *, mkdir: bool = True) → int

Append each record as one JSON line. Creates parent directories when mkdir is true.

Returns the number of lines written.
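The append semantics can be mirrored with the standard library. append_jsonl and demo_append are hypothetical analogues that take plain dicts; the real function accepts DataPointRecord objects, and how it serializes them is not shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Iterable, Mapping


def append_jsonl(path, records: Iterable[Mapping], *, mkdir: bool = True) -> int:
    """Hypothetical stdlib analogue of append_datapoints_jsonl for plain dicts."""
    path = Path(path)
    if mkdir:
        path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    # "a" mode appends, so repeated calls extend the same file.
    with path.open("a", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def demo_append() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "runs" / "data.jsonl"  # parent created on demand
        first = append_jsonl(out, [{"id": 1}, {"id": 2}])
        second = append_jsonl(out, [{"id": 3}])
        lines = out.read_text(encoding="utf-8").splitlines()
    return first, second, [json.loads(line)["id"] for line in lines]
```

Each call returns only the count it wrote, so callers can total records across batches while the file keeps growing.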

afterimage.simula.configure_example_console(*, simula_level: int = 30, root_level: int = 30) → None

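One-call logging setup for example scripts: keeps the root logger quiet, optionally shows more afterimage.simula detail, and suppresses httpx request logging. Both levels default to 30 (logging.WARNING).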

Pass simula_level=logging.INFO (or logging.DEBUG for more detail) to see afterimage.simula log messages when tqdm progress bars are turned off, e.g. show_progress=False on build_taxonomy.
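The level arguments are standard logging levels, and 30 is logging.WARNING. A rough stdlib equivalent of this setup is sketched below; configure_console is hypothetical, and the third-party logger names in NOISY_LOGGERS are assumptions rather than the list afterimage actually uses.

```python
import logging

# Assumed third-party logger names; the real function's list may differ.
NOISY_LOGGERS = ("httpx", "httpcore", "google_genai")


def configure_console(simula_level: int = logging.WARNING,
                      root_level: int = logging.WARNING) -> None:
    """Hypothetical equivalent of configure_example_console."""
    # force=True (Python 3.8+) replaces handlers installed by earlier calls.
    logging.basicConfig(level=root_level, force=True)
    # Only the package logger gets the more verbose level.
    logging.getLogger("afterimage.simula").setLevel(simula_level)
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(logging.WARNING)
```

Setting the level on the afterimage.simula logger rather than the root lets its INFO messages through while every other library stays at WARNING.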

afterimage.simula.silence_noisy_third_party_loggers(level: int = 30) → None: Turn down chatty HTTP and google-genai loggers during OpenSimula runs (the default level 30 is logging.WARNING).