Generators

class afterimage.ConversationGenerator(respondent_prompt: str, api_key: str | SmartKeyPool, correspondent_prompt: str | None = None, model_name: str | None = None, safety_settings: List[Dict[str, str]] | None = None, auto_improve: bool = False, evaluator_model_name: str | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', embedding_provider: EmbeddingProvider | None = None, embedding_provider_config: dict[str, Any] | None = None, judge_config: ConversationJudgeConfig | None = None, storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None)[source]

Bases: BaseGenerator

Generates conversations between a correspondent (question generator) and a respondent (answer generator) asynchronously.

Parameters:
  • respondent_prompt – System prompt for the respondent, e.g., the assistant that you want to fine-tune on this dataset

  • api_key – Either a single API key string or a SmartKeyPool instance for LLM use

  • correspondent_prompt – System prompt for the correspondent, e.g., a model that roleplays a user of the assistant that you want to fine-tune on this dataset

  • model_name – Model name to use

  • safety_settings – Safety settings for the model

  • auto_improve – Whether to try to improve low-quality generations

  • evaluator_model_name – Model name for the evaluator LLM when auto_improve is True.

  • embedding_provider – Optional shared EmbeddingProvider for embedding metrics.

  • embedding_provider_config – JSON-style config for EmbeddingProviderFactory, used when embedding_provider is omitted; defaults are chosen based on the chat provider.

  • judge_config – Optional ConversationJudgeConfig (aggregation and grade thresholds).

  • model_provider_name – Provider used for accessing LLMs. Supported values are “gemini”, “openai”, and “deepseek”.

  • storage – Storage implementation for saving conversations. If None, creates JSONLStorage with datetime-based filename.

  • monitor – GenerationMonitor instance for tracking generation metrics. If None, a default one is created.

  • instruction_generator_callback – Callback for instruction generation. Can also be passed to generate() method (deprecated).

  • respondent_prompt_modifier – Callback to modify respondent prompts. Can also be passed to generate() method (deprecated).
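
For orientation, a minimal construction sketch follows. The prompt text and key are placeholders rather than values from this library, and the commented lines require afterimage to be installed with a valid API key:

```python
# Placeholder configuration for a ConversationGenerator (sketch only).
generator_kwargs = dict(
    respondent_prompt="You are a concise cooking assistant.",
    api_key="YOUR_API_KEY",        # or a SmartKeyPool instance
    model_provider_name="gemini",  # "gemini", "openai", or "deepseek"
    auto_improve=True,             # attempt to improve low-quality generations
)
# from afterimage import ConversationGenerator
# generator = ConversationGenerator(**generator_kwargs)
```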

async answer(respondent: ChatSession, question: str | ConversationEntry) ConversationEntry[source]

Generates an answer from the respondent based on the given question.

async ask(correspondent: ChatSession, answer: str | ConversationEntry) str[source]

Generates a question from the correspondent based on the given answer.

async create_correspondent_prompt(assistant_prompt: str) str[source]

Creates a correspondent prompt based on the assistant prompt.

async create_model(prompt: str) ChatSession[source]

Creates and initializes a chat model with the given prompt.

async generate(num_dialogs: int | None = None, max_turns: int = 1, stopping_criteria: List[BaseStoppingCallback] | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, max_concurrency: int | None = None) None[source]

Generates multiple conversation dialogs until the stopping criteria are met.

Parameters:
  • num_dialogs – Number of dialogs to generate. Defaults to 5 if no other stopping criterion is specified.

  • max_turns – Maximum number of turns per dialog. The actual number of turns is randomly sampled from 1 to max_turns.

  • stopping_criteria – A list of callbacks that determine when to stop generation. If num_dialogs is specified, a FixedNumberStoppingCallback is added to this list automatically.

  • instruction_generator_callback – Callback for instruction generation. Deprecated: Pass this to the constructor instead. Defaults to None.

  • respondent_prompt_modifier – Callback to modify respondent prompts. Deprecated: Pass this to the constructor instead. Defaults to None.

  • max_concurrency – Number of concurrent generations. Defaults to 8 for DeepSeek and 4 for other providers.
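
The turn-sampling and concurrency defaults described above can be sketched as follows (a hypothetical illustration of the documented behaviour, not the library's actual implementation):

```python
import random

def sample_turns(max_turns: int) -> int:
    # The actual number of turns is drawn from 1..max_turns (inclusive).
    return random.randint(1, max_turns)

def default_concurrency(provider: str) -> int:
    # Per the parameter description: 8 for DeepSeek, 4 for other providers.
    return 8 if provider == "deepseek" else 4
```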

async generate_single(max_turns: int, check_for_near_duplicates: bool = False, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None) AsyncGenerator[EvaluatedConversationWithContext | Conversation, None][source]

Generates conversations for a single session and yields them.

async go(turns: int = 1, first_question: str | None = None, check_for_near_duplicates: bool = False, correspondent_prompt: str | None = None, respondent_prompt: str | None = None) List[ConversationEntry][source]

Simulates a multi-turn conversation between the correspondent and the respondent.
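
The alternation that go() performs between ask() and answer() can be illustrated with a simplified stand-in. This is a hypothetical sketch: the real methods take ChatSession objects and call an LLM, whereas the stubs below return fixed strings:

```python
import asyncio

# Hypothetical sketch of a question/answer loop; not the library's code.
async def dialog_sketch(ask, answer, turns: int) -> list[tuple[str, str]]:
    history = []
    last_answer = ""
    for _ in range(turns):
        question = await ask(last_answer)   # correspondent asks
        last_answer = await answer(question)  # respondent answers
        history.append((question, last_answer))
    return history

# Stub correspondent/respondent for illustration only:
async def stub_ask(prev_answer):
    return "Q"

async def stub_answer(question):
    return "A"
```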

class afterimage.StructuredGenerator(output_schema: Type[T], respondent_prompt: str, api_key: str | SmartKeyPool, model_name: str | None = None, safety_settings: List[Dict[str, str]] | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, correspondent_prompt: str | None = None)[source]

Bases: BaseGenerator

Generates structured datasets where outputs strictly conform to a Pydantic schema.

async create_correspondent_prompt(respondent_prompt: str) str[source]
async generate(num_samples: int | None = None, stopping_criteria: list[BaseStoppingCallback] | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, max_concurrency: int | None = None) None[source]

Generates structured samples and saves them to storage.

Parameters:
  • num_samples – Total number of samples to generate. Defaults to 5 if no other stopping criterion is specified.

  • stopping_criteria – A list of callbacks that determine when to stop generation. If num_samples is specified, a FixedNumberStoppingCallback is added to this list automatically.

  • instruction_generator_callback – Callback for instruction generation. Deprecated: Pass this to the constructor instead. Defaults to None.

  • respondent_prompt_modifier – Callback to modify respondent prompts. Deprecated: Pass this to the constructor instead. Defaults to None.

  • max_concurrency – Maximum number of concurrent tasks. Defaults to 8 for DeepSeek and 4 for other providers.
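
As an illustration, a StructuredGenerator's outputs conform to a Pydantic schema such as the one below. Recipe is a hypothetical example schema, and the commented construction assumes afterimage is installed with a valid API key:

```python
from pydantic import BaseModel

# Hypothetical example schema; generated outputs would conform to it.
class Recipe(BaseModel):
    title: str
    ingredients: list[str]
    steps: list[str]

# generator = StructuredGenerator(
#     output_schema=Recipe,
#     respondent_prompt="Generate diverse home-cooking recipes.",
#     api_key="YOUR_API_KEY",
# )
# await generator.generate(num_samples=20)

# Each generated row's payload validates against the schema, e.g.:
sample = Recipe(
    title="Pancakes",
    ingredients=["flour", "milk", "egg"],
    steps=["Mix the batter", "Fry on both sides"],
)
```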

async generate_single(instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None) AsyncGenerator[StructuredGenerationRow[T], None][source]

Generates structured outputs for a single batch of instructions.

class afterimage.PersonaGenerator(api_key: str | SmartKeyPool, model_name: str | None = None, safety_settings: list[dict[str, str]] | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, max_concurrency: int | None = None)[source]

Bases: object

async agenerate_from_persona(persona: str, generation: int = 1) list[str][source]
async agenerate_from_text(text: str) list[str][source]
expected_persona_count(n_iterations: int) int[source]
async generate_from_documents(documents: DocumentProvider | list[str], max_docs: int | None = None, n_iterations: int | None = None, target_data_count: int | None = None, num_random_contexts: int = 1)[source]
generate_from_persona(persona: str, generation: int = 1) list[str][source]
generate_from_text(text: str) list[str][source]
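
A usage sketch for PersonaGenerator follows. The seed texts are hypothetical placeholders, and the commented calls require afterimage to be installed with a valid API key:

```python
# Placeholder seed documents for illustration only.
seed_documents = [
    "A blog post about urban beekeeping.",
    "Release notes for a command-line backup tool.",
]

# from afterimage import PersonaGenerator
# gen = PersonaGenerator(api_key="YOUR_API_KEY", model_provider_name="openai")
#
# personas = gen.generate_from_text(seed_documents[0])  # sync variant
# expanded = gen.generate_from_persona(personas[0], generation=2)
#
# generate_from_documents also accepts a plain list of strings:
# await gen.generate_from_documents(seed_documents, target_data_count=100)
```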