Quickstart Guide for Afterimage
afterimage is a powerful Python library for generating synthetic conversation datasets using Large Language Models (LLMs). It allows you to simulate multi-turn conversations between a “Correspondent” (User) and a “Respondent” (Assistant) to create high-quality training or evaluation datasets.
Installation
pip install git+https://github.com/altaidevorg/afterimage.git
Optional extras:
embeddings-local: SentenceTransformer for local/in-process embeddings, Qdrant retriever by model name, quality checks.
server: FastAPI server.
training: demo UI fine-tuning scripts.
Example: pip install "afterimage[embeddings-local]@git+https://github.com/altaidevorg/afterimage.git"
Core Concepts
Generator: The core engine (ConversationGenerator) that orchestrates the conversation flow. It manages the LLM sessions for both the user and the assistant.
Correspondent: The simulated user who asks questions. Its behavior is driven by an Instruction Generator.
Respondent: The assistant who answers questions. Its behavior is defined by a System Prompt and optional Prompt Modifiers.
Document Provider: A source of knowledge (text files, JSONL, memory) used to ground the conversation or generate relevant questions.
Persona: A specific character or role that the Correspondent adopts to make conversations more diverse and realistic.
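The interplay of these components can be pictured as a simple turn-taking loop. The sketch below is conceptual only, not the library's actual implementation; the stub functions stand in for real LLM calls:

```python
# Conceptual sketch of one generated dialog: a Correspondent and a
# Respondent take alternating turns. Stubs stand in for real LLM calls.

def correspondent(history):
    # In afterimage, an Instruction Generator (optionally grounded in a
    # Document Provider and a Persona) drives what the user asks next.
    return f"user question #{len(history) // 2 + 1}"

def respondent(history, system_prompt):
    # The assistant answers under its System Prompt, possibly rewritten
    # by Prompt Modifiers (e.g. to inject retrieved context).
    return f"answer to: {history[-1]}"

def generate_dialog(max_turns, system_prompt="You are a helpful assistant."):
    history = []
    for _ in range(max_turns):
        history.append(correspondent(history))
        history.append(respondent(history, system_prompt))
    return history

print(generate_dialog(max_turns=2))
```

Each "turn" here is one user/assistant exchange pair, matching the max_turns parameter used throughout the examples below.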
1. Basic Usage
The simplest way to use afterimage is to define a system prompt for the assistant (Respondent) and let the library automatically generate a persona for the user (Correspondent).
import asyncio
import os

from afterimage import ConversationGenerator

# Ensure your API key is set
api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Define the Assistant's Persona (Respondent)
    respondent_prompt = """
    You are a helpful and polite customer support agent for a tech company.
    You answer questions about laptops, smartphones, and accessories.
    Be concise and professional.
    """

    # 2. Initialize the Generator
    # If you don't provide a correspondent_prompt, one is auto-generated
    # from the respondent_prompt. We encourage trying the auto-generated
    # correspondent prompt first; you can supply your own later if needed.
    generator = ConversationGenerator(
        respondent_prompt=respondent_prompt,
        api_key=api_key,
        model_name="gemini-2.0-flash",  # Default model
    )

    # 3. Generate Conversations
    print("Generating conversations...")
    await generator.generate(
        num_dialogs=3,      # Number of separate conversations to generate
        max_turns=1,        # Maximum turns (exchange pairs) per conversation; 1 is enough for most cases
        max_concurrency=2,  # Number of parallel generations
    )

    # 4. Access Generated Data (saved to JSONL by default)
    # You can also access the storage directly if needed, but the default storage saves to disk.
    print("Done! Check the generated .jsonl file in your directory.")

if __name__ == "__main__":
    asyncio.run(main())
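The default storage writes one JSON object per line (JSONL). A minimal sketch of loading such a file with the standard library — the filename and record fields are illustrative, so check your output directory for the actual file:

```python
import json

def load_dialogs(path):
    # Each line of a JSONL file is an independent JSON document.
    dialogs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dialogs.append(json.loads(line))
    return dialogs

# dialogs = load_dialogs("conversations.jsonl")  # illustrative filename
```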
2. Context-Aware Generation (RAG-like)
To generate high-quality domain-specific datasets, you often want the “User” to ask questions based on specific documents. You can achieve this using ContextualInstructionGeneratorCallback.
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    ContextualInstructionGeneratorCallback,
    InMemoryDocumentProvider,
    WithContextRespondentPromptModifier,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Provide Context Documents
    # You can use InMemoryDocumentProvider, JSONLDocumentProvider, or DirectoryDocumentProvider
    documents = InMemoryDocumentProvider([
        "The 'Afterimage' library is used for synthetic data generation.",
        "It supports both synchronous and asynchronous generation modes.",
        "Key components include Generators, Callbacks, and Document Providers.",
    ])

    # 2. Setup Instruction Generator (The User's Brain)
    # This callback picks random documents and instructs the User to ask questions about them.
    instruction_callback = ContextualInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,  # How many docs to pick per conversation
        n_instructions=3,       # How many questions to brainstorm internally
    )

    # 3. Setup Respondent Modifier (The Assistant's Context)
    # This injects the SAME context the user sees into the assistant's system prompt,
    # ensuring the assistant has the knowledge to answer correctly.
    prompt_modifier = WithContextRespondentPromptModifier()

    # 4. Initialize Generator
    generator = ConversationGenerator(
        respondent_prompt="You are an expert on the Afterimage library. Answer questions based on the provided context.",
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
        respondent_prompt_modifier=prompt_modifier,
    )

    # 5. Generate
    await generator.generate(num_dialogs=5, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
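Conceptually, a respondent prompt modifier appends the selected context to the assistant's system prompt before the session starts, so both sides see the same documents. A rough sketch of the idea — not the library's actual code, and the template wording is invented:

```python
def with_context(system_prompt, contexts):
    # Append retrieved documents so the assistant can ground its answers.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        f"{system_prompt}\n\n"
        f"Use the following context when answering:\n{context_block}"
    )

modified = with_context(
    "You are an expert on the Afterimage library.",
    ["The 'Afterimage' library is used for synthetic data generation."],
)
print(modified)
```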
3. Persona-Based Generation
To add variety, you can generate specific “Personas” for your documents (e.g., “A confused beginner”, “A skeptical expert”) and have the User adopt these personas.
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    PersonaInstructionGeneratorCallback,
    PersonaGenerator,
    InMemoryDocumentProvider,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Setup Documents
    texts = [
        "Espresso is brewed by forcing hot water under pressure through finely-ground coffee beans.",
        "Cold brew is made by steeping coarse grounds in cold water for 12-24 hours.",
    ]
    documents = InMemoryDocumentProvider(texts)

    # 2. Generate Personas for Documents
    # This step analyzes the documents and creates suitable user personas
    # (e.g., "Coffee Enthusiast", "Barista Student")
    print("Generating personas...")
    persona_gen = PersonaGenerator(api_key=api_key)
    await persona_gen.generate_from_documents(documents)

    # 3. Setup Persona-Aware Instruction Generator
    # This callback will now select a random persona along with the document
    instruction_callback = PersonaInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,
    )

    # 4. Initialize Generator
    generator = ConversationGenerator(
        respondent_prompt="You are a professional barista. Answer questions about coffee brewing methods.",
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
    )

    # 5. Generate
    await generator.generate(num_dialogs=4, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
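Conceptually, a persona-aware instruction generator combines a sampled persona with a sampled document into the instruction that steers the user's next question. A rough sketch of that combination step — the prompt wording here is invented, not the library's actual template:

```python
import random

personas = ["A curious coffee enthusiast", "A barista student preparing for an exam"]
documents = [
    "Espresso is brewed by forcing hot water under pressure through finely-ground coffee beans.",
    "Cold brew is made by steeping coarse grounds in cold water for 12-24 hours.",
]

def build_instruction(rng=random):
    # Sample one persona and one document, then instruct the correspondent
    # LLM to stay in character while questioning the document's content.
    persona = rng.choice(personas)
    doc = rng.choice(documents)
    return (
        f"Adopt this persona: {persona}.\n"
        f"Ask a question grounded in this document:\n{doc}"
    )

print(build_instruction())
```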