Generators
- class afterimage.ConversationGenerator(respondent_prompt: str, api_key: str | SmartKeyPool, correspondent_prompt: str | None = None, model_name: str | None = None, safety_settings: List[Dict[str, str]] | None = None, auto_improve: bool = False, evaluator_model_name: str | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', embedding_provider: EmbeddingProvider | None = None, embedding_provider_config: dict[str, Any] | None = None, judge_config: ConversationJudgeConfig | None = None, storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None)[source]
Bases: BaseGenerator

Generates conversations between a correspondent (question generator) and a respondent (answer generator) asynchronously.
- Parameters:
respondent_prompt – System prompt for the respondent, e.g., the assistant that you want to fine-tune on this dataset
api_key – Either a single API key string or a SmartKeyPool instance for LLM access
correspondent_prompt – System prompt for the correspondent, e.g., a model that role-plays a user of the assistant that you want to fine-tune on this dataset
model_name – Model name to use
safety_settings – Safety settings for the model
auto_improve – Whether to try to improve low-quality generations
evaluator_model_name – Model name for the evaluator LLM when auto_improve is True.
embedding_provider – Optional shared EmbeddingProvider for embedding metrics.
embedding_provider_config – JSON-style config for EmbeddingProviderFactory when embedding_provider is omitted (defaults by chat provider).
judge_config – Optional ConversationJudgeConfig (aggregation and grade thresholds).
model_provider_name – Provider used for accessing LLMs. Supported values are “gemini”, “openai”, and “deepseek”.
storage – Storage implementation for saving conversations. If None, creates JSONLStorage with datetime-based filename.
monitor – GenerationMonitor instance for tracking generation metrics. If None, a default one is created.
instruction_generator_callback – Callback for instruction generation. Can also be passed to generate() method (deprecated).
respondent_prompt_modifier – Callback to modify respondent prompts. Can also be passed to generate() method (deprecated).
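Based on the constructor signature above, instantiation and a basic run might look like the following sketch. The prompts, API key, and argument values are placeholders, and `afterimage` must be installed:

```python
import asyncio

async def main() -> None:
    # Sketch only: the prompts and API key below are placeholders.
    from afterimage import ConversationGenerator

    generator = ConversationGenerator(
        respondent_prompt="You are a helpful cooking assistant.",
        correspondent_prompt="You role-play a home cook asking for recipe advice.",
        api_key="YOUR_API_KEY",
        model_provider_name="gemini",
    )
    # Generate five dialogs of up to three turns each.
    await generator.generate(num_dialogs=5, max_turns=3)

# To run: asyncio.run(main())
```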
- async answer(respondent: ChatSession, question: str | ConversationEntry) ConversationEntry[source]
Generates an answer from the respondent based on the given question.
- async ask(correspondent: ChatSession, answer: str | ConversationEntry) str[source]
Generates a question from the correspondent based on the given answer.
- async create_correspondent_prompt(assistant_prompt: str) str[source]
Create a correspondent prompt based on the assistant prompt.
- async create_model(prompt: str) ChatSession[source]
Creates and initializes a chat model with the given prompt.
- async generate(num_dialogs: int | None = None, max_turns: int = 1, stopping_criteria: List[BaseStoppingCallback] | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, max_concurrency: int | None = None) None[source]
Generates multiple conversation dialogs until the stopping criteria are met.
- Parameters:
num_dialogs – Number of dialogs to generate. Defaults to 5 if no other stopping criteria is specified.
max_turns – Maximum number of turns per dialog. Actual number of turns is randomly sampled from 1 .. max_turns.
stopping_criteria – A list of callbacks to determine when to stop generation. If num_dialogs is specified, FixedNumberStoppingCallback is added to this list automatically.
instruction_generator_callback – Callback for instruction generation. Deprecated: Pass this to the constructor instead. Defaults to None.
respondent_prompt_modifier – Callback to modify respondent prompts. Deprecated: Pass this to the constructor instead. Defaults to None.
max_concurrency – Number of concurrent generations. Defaults to 8 for DeepSeek and 4 for other providers.
- async generate_single(max_turns: int, check_for_near_duplicates: bool = False, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None) AsyncGenerator[EvaluatedConversationWithContext | Conversation, None][source]
Generates conversations for a single session and yields them.
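Because `generate_single` is an async generator, conversations can be consumed as they are produced rather than waiting for a full batch. A hedged sketch (the prompt and key are placeholders):

```python
import asyncio

async def stream_conversations() -> None:
    # Sketch only: the prompt and API key are placeholders.
    from afterimage import ConversationGenerator

    generator = ConversationGenerator(
        respondent_prompt="You are a concise math tutor.",
        api_key="YOUR_API_KEY",
    )
    # Yields each conversation as soon as it is generated.
    async for conversation in generator.generate_single(max_turns=2):
        print(conversation)

# To run: asyncio.run(stream_conversations())
```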
- async go(turns: int = 1, first_question: str | None = None, check_for_near_duplicates: bool = False, correspondent_prompt: str | None = None, respondent_prompt: str | None = None) List[ConversationEntry][source]
Simulates a multi-turn conversation between the correspondent and respondent.
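For a single dialog, `go` can be seeded with an explicit opening question and returns the resulting entries directly. A sketch under the documented signature (prompt, key, and question are placeholders):

```python
import asyncio

async def run_one_dialog() -> None:
    # Sketch only: the prompt, API key, and opening question are placeholders.
    from afterimage import ConversationGenerator

    generator = ConversationGenerator(
        respondent_prompt="You are a concise math tutor.",
        api_key="YOUR_API_KEY",
    )
    # Simulate a two-turn dialog seeded with an explicit first question.
    entries = await generator.go(turns=2, first_question="What is a derivative?")
    for entry in entries:
        print(entry)

# To run: asyncio.run(run_one_dialog())
```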
- class afterimage.StructuredGenerator(output_schema: Type[T], respondent_prompt: str, api_key: str | SmartKeyPool, model_name: str | None = None, safety_settings: List[Dict[str, str]] | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, correspondent_prompt: str | None = None)[source]
Bases: BaseGenerator

Generates structured datasets where outputs strictly conform to a Pydantic schema.
- async generate(num_samples: int | None = None, stopping_criteria: list[BaseStoppingCallback] | None = None, instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None, max_concurrency: int | None = None) None[source]
Generates structured samples and saves them to storage.
- Parameters:
num_samples – Total number of samples to generate. Defaults to 5 if no other stopping criteria is specified.
stopping_criteria – A list of callbacks to determine when to stop generation. If num_samples is specified, FixedNumberStoppingCallback is added to this list.
instruction_generator_callback – Callback for instruction generation. Deprecated: Pass this to the constructor instead. Defaults to None.
respondent_prompt_modifier – Callback to modify respondent prompts. Deprecated: Pass this to the constructor instead. Defaults to None.
max_concurrency – Maximum number of concurrent tasks. Defaults to 8 for DeepSeek and 4 for other providers.
- async generate_single(instruction_generator_callback: BaseInstructionGeneratorCallback | None = None, respondent_prompt_modifier: BaseRespondentPromptModifierCallback | None = None) AsyncGenerator[StructuredGenerationRow[T], None][source]
Generates structured outputs for a single batch of instructions.
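Putting the class together: a sketch of structured generation against a Pydantic schema. The schema fields, prompt, and API key are placeholders, and both `afterimage` and `pydantic` must be installed:

```python
import asyncio

async def main() -> None:
    # Sketch only: the schema fields, prompt, and API key are placeholders.
    from pydantic import BaseModel
    from afterimage import StructuredGenerator

    class QAPair(BaseModel):
        question: str
        answer: str

    generator = StructuredGenerator(
        output_schema=QAPair,
        respondent_prompt="Produce factual question-answer pairs about astronomy.",
        api_key="YOUR_API_KEY",
    )
    # Each saved row conforms to the QAPair schema.
    await generator.generate(num_samples=10)

# To run: asyncio.run(main())
```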
- class afterimage.PersonaGenerator(api_key: str | SmartKeyPool, model_name: str | None = None, safety_settings: list[dict[str, str]] | None = None, model_provider_name: Literal['gemini', 'openai', 'deepseek'] = 'gemini', storage: BaseStorage | None = None, monitor: GenerationMonitor | None = None, max_concurrency: int | None = None)[source]
Bases: object

- async generate_from_documents(documents: DocumentProvider | list[str], max_docs: int | None = None, n_iterations: int | None = None, target_data_count: int | None = None, num_random_contexts: int = 1)[source]
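Per the signature above, `generate_from_documents` accepts a plain list of strings as well as a DocumentProvider. A hedged sketch (documents, key, and counts are placeholders):

```python
import asyncio

async def build_personas() -> None:
    # Sketch only: the documents and API key are placeholders.
    from afterimage import PersonaGenerator

    generator = PersonaGenerator(api_key="YOUR_API_KEY")
    # Derive persona data from a plain list of source documents.
    await generator.generate_from_documents(
        documents=["First source document.", "Second source document."],
        target_data_count=20,
    )

# To run: asyncio.run(build_personas())
```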