# Conversation Generation

The core capability of Afterimage is generating rigorous synthetic conversations. This process simulates a dialogue between a **Correspondent** (User) and a **Respondent** (Assistant) to create training or evaluation data.

## `ConversationGenerator`

The `ConversationGenerator` class is the primary workhorse for this task. It orchestrates the multi-turn interaction, manages state, handles concurrency, and can even self-correct using an evaluator loop.

### Initialization

To start generating, initialize the generator. The recommended pattern is to configure all strategy callbacks (for instructions and prompt modification) at initialization time.

```python
import os

from afterimage import ConversationGenerator

generator = ConversationGenerator(
    respondent_prompt="You are a helpful assistant.",
    api_key=os.getenv("GEMINI_API_KEY"),
    model_name="gemini-2.0-flash",
    # Strategies are now passed here
    instruction_generator_callback=my_instruction_gen,
    respondent_prompt_modifier=my_prompt_modifier,
    auto_improve=False,  # set to True to enable auto-improvement
    evaluator_model_name="gemini-2.0-flash",
)
```

**Key Parameters:**

* `respondent_prompt` (str): The system prompt that defines the behavior of the assistant.
* `api_key` (str | SmartKeyPool): Your API key or a pool of keys for rotation.
* `instruction_generator_callback` (BaseInstructionGeneratorCallback): Controls **what** the user asks (e.g., questions based on docs or personas).
* `respondent_prompt_modifier` (BaseRespondentPromptModifierCallback, optional): Controls **context** (e.g., injecting RAG data into the system prompt).
* `correspondent_prompt` (str, optional): A static system prompt for the user simulator. If neither this nor a callback is provided, one is auto-generated.
* `auto_improve` (bool): If `True`, an internal evaluator checks each conversation. If quality is low, the conversation is regenerated automatically (up to a limit).
* `storage` (BaseStorage, optional): Where to save the results. Defaults to `JSONLStorage`.

### Generating Conversations

Use the `generate` method to start the simulation.

```python
from afterimage.callbacks import PersonaUsageStoppingCallback

await generator.generate(
    num_dialogs=100,
    max_turns=3,
    max_concurrency=4,
    stopping_criteria=[
        # Stop once 50 unique personas have been used
        PersonaUsageStoppingCallback(n_personas=50)
    ],
)
```

**Parameters:**

* `num_dialogs` (int, optional): Number of conversations to generate.
* `max_turns` (int): Maximum exchanges per conversation.
* `max_concurrency` (int): Parallel generation limit.
* `stopping_criteria` (List[BaseStoppingCallback], optional): Custom logic for when to stop generating (e.g., when all personas are covered). If `num_dialogs` is set, a `FixedNumberStoppingCallback` is automatically added.

## Strategies & Callbacks

Afterimage uses a callback system to modularize "User Behavior" and "Assistant Knowledge".

### 1. Instruction Generators (The "User")

These determine what the simulated user wants to talk about.

* **`ContextualInstructionGeneratorCallback`**: Samples a document and generates a question based on it.
* **`PersonaInstructionGeneratorCallback`**: Samples a document-aware persona ("Angry Customer", "Novice") and a document to generate a styled question. It prunes deeper persona layers when supply exceeds demand and uses depth-weighted reuse when more rows are needed than unique personas.
* **`ToolCallingInstructionGeneratorCallback`**: Generates instructions specifically designed to trigger tool/function calls (requires a list of tools).

Persona-based generations also carry `persona_generation_depth` in row metadata, so downstream analysis can see whether the selected persona came from the seed layer or an evolved layer.

### 2. Prompt Modifiers (The "Assistant")

These modify the assistant's system prompt at runtime, usually to inject context.
* **`WithContextRespondentPromptModifier`**: Injects the text of the document selected by the instruction generator into the assistant's system prompt.
* **`WithRAGRespondentPromptModifier`**: Uses a retriever to fetch relevant chunks based on the user's generated question (simulating a real RAG pipeline).

## Complete Example

Here is a full example showing how to generate a dataset for a technical support bot.

```python
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    ContextualInstructionGeneratorCallback,
    InMemoryDocumentProvider,
    WithContextRespondentPromptModifier,
)


async def main():
    api_key = os.getenv("GEMINI_API_KEY")

    # 1. Your Knowledge Base
    docs = InMemoryDocumentProvider([
        "Error 503 means the service is unavailable. Retry after 5 minutes.",
        "To reset your password, click 'Forgot Password' on the login screen.",
    ])

    # 2. Configure User Behavior (ask questions about the docs)
    #    We pass this to the generator constructor
    instruction_gen = ContextualInstructionGeneratorCallback(
        api_key=api_key,
        documents=docs,
    )

    # 3. Configure Assistant Behavior (give it access to the docs)
    prompt_modifier = WithContextRespondentPromptModifier()

    # 4. Initialize Generator
    generator = ConversationGenerator(
        respondent_prompt="You are a Tier 1 Technical Support agent.",
        api_key=api_key,
        instruction_generator_callback=instruction_gen,
        respondent_prompt_modifier=prompt_modifier,
        auto_improve=True,  # ensure high quality
    )

    # 5. Run Generation
    print("Starting generation...")
    await generator.generate(
        num_dialogs=10,
        max_turns=3,
    )
    print("Done. Conversation data saved to JSONL.")


if __name__ == "__main__":
    asyncio.run(main())
```
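Since the default `JSONLStorage` writes one JSON object per line, a quick standard-library pass is enough to inspect the output. The snippet below is only a sketch: the `conversations.jsonl` filename and the sample records are hypothetical (the actual path and record schema depend on your storage configuration), so it writes two placeholder records first to stay self-contained.

```python
import json
from pathlib import Path

# Hypothetical output path; the real default depends on your JSONLStorage setup.
path = Path("conversations.jsonl")

# Write two placeholder records so this snippet runs on its own.
sample = [{"turns": 3}, {"turns": 2}]
path.write_text("\n".join(json.dumps(r) for r in sample), encoding="utf-8")

# JSONL means one JSON object per line: parse each non-empty line independently.
records = [
    json.loads(line)
    for line in path.read_text(encoding="utf-8").splitlines()
    if line.strip()
]
print(f"Loaded {len(records)} conversations")
```

Parsing line by line (rather than the whole file at once) also lets you inspect partial output while a long generation run is still in progress.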