Quickstart Guide for Afterimage
afterimage is a powerful Python library for generating synthetic conversation datasets using Large Language Models (LLMs). It allows you to simulate multi-turn conversations between a “Correspondent” (User) and a “Respondent” (Assistant) to create high-quality training or evaluation datasets.
Installation
pip install git+https://github.com/altaidevorg/afterimage.git
Optional extras:
embeddings-local: SentenceTransformer for local/in-process embeddings, Qdrant retriever by model name, quality checks.
server: FastAPI server.
training: demo UI fine-tuning scripts.
Example: pip install "afterimage[embeddings-local]@git+https://github.com/altaidevorg/afterimage.git"
Core Concepts
Generator: The core engine (ConversationGenerator) that orchestrates the conversation flow. It manages the LLM sessions for both the user and the assistant.
Correspondent: The simulated user who asks questions. Its behavior is driven by an Instruction Generator.
Respondent: The assistant who answers questions. Its behavior is defined by a System Prompt and optional Prompt Modifiers.
Document Provider: A source of knowledge (text files, JSONL, memory) used to ground the conversation or generate relevant questions.
Persona: A specific character or role that the Correspondent adopts to make conversations more diverse and realistic.
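The interplay of these components can be pictured as a simple turn-taking loop. The sketch below is conceptual only, not the library's actual implementation; the stub functions stand in for real LLM calls:

```python
# Conceptual sketch of one generated dialog: a Correspondent and a
# Respondent take alternating turns. Stubs stand in for real LLM calls.

def correspondent(history):
    # In afterimage, an Instruction Generator (optionally grounded in a
    # Document Provider and a Persona) drives what the user asks next.
    return f"user question #{len(history) // 2 + 1}"

def respondent(history, system_prompt):
    # The assistant answers under its System Prompt, possibly rewritten
    # by Prompt Modifiers (e.g. to inject retrieved context).
    return f"answer to: {history[-1]}"

def generate_dialog(max_turns, system_prompt="You are a helpful assistant."):
    history = []
    for _ in range(max_turns):
        history.append(correspondent(history))
        history.append(respondent(history, system_prompt))
    return history

print(generate_dialog(max_turns=2))
```

Each "turn" here is one user/assistant exchange pair, matching the max_turns parameter used throughout the examples below.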
1. Basic Usage
The simplest way to use afterimage is to define a system prompt for the assistant (Respondent) and let the library automatically generate a persona for the user (Correspondent).
import asyncio
import os

from afterimage import ConversationGenerator

# Ensure your API key is set
api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Define the Assistant's Persona (Respondent)
    respondent_prompt = """
    You are a helpful and polite customer support agent for a tech company.
    You answer questions about laptops, smartphones, and accessories.
    Be concise and professional.
    """

    # 2. Initialize the Generator
    # If you don't provide a correspondent_prompt, one is auto-generated
    # from the respondent_prompt. We encourage trying the auto-generated
    # correspondent prompt first; you can supply your own later if needed.
    generator = ConversationGenerator(
        respondent_prompt=respondent_prompt,
        api_key=api_key,
        model_name="gemini-2.0-flash",  # Default model
    )

    # 3. Generate Conversations
    print("Generating conversations...")
    await generator.generate(
        num_dialogs=3,      # Number of separate conversations to generate
        max_turns=1,        # Maximum turns (exchange pairs) per conversation; 1 is enough for most cases
        max_concurrency=2,  # Number of parallel generations
    )

    # 4. Access Generated Data (saved to JSONL by default)
    # You can also access the storage directly if needed, but the default storage saves to disk.
    print("Done! Check the generated .jsonl file in your directory.")

if __name__ == "__main__":
    asyncio.run(main())
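The default storage writes one JSON object per line (JSONL). A minimal sketch of loading such a file with the standard library — the filename and record fields are illustrative, so check your output directory for the actual file:

```python
import json

def load_dialogs(path):
    # Each line of a JSONL file is an independent JSON document.
    dialogs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                dialogs.append(json.loads(line))
    return dialogs

# dialogs = load_dialogs("conversations.jsonl")  # illustrative filename
```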
2. Context-Aware Generation (RAG-like)
To generate high-quality domain-specific datasets, you often want the “User” to ask questions based on specific documents. You can achieve this using ContextualInstructionGeneratorCallback.
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    ContextualInstructionGeneratorCallback,
    InMemoryDocumentProvider,
    WithContextRespondentPromptModifier,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Provide Context Documents
    # You can use InMemoryDocumentProvider, JSONLDocumentProvider, or DirectoryDocumentProvider
    documents = InMemoryDocumentProvider([
        "The 'Afterimage' library is used for synthetic data generation.",
        "It supports both synchronous and asynchronous generation modes.",
        "Key components include Generators, Callbacks, and Document Providers.",
    ])

    # 2. Setup Instruction Generator (The User's Brain)
    # This callback picks random documents and instructs the User to ask questions about them.
    instruction_callback = ContextualInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,  # How many docs to pick per conversation
        n_instructions=3,       # How many questions to brainstorm internally
    )

    # 3. Setup Respondent Modifier (The Assistant's Context)
    # This injects the SAME context the user sees into the assistant's system prompt,
    # ensuring the assistant has the knowledge to answer correctly.
    prompt_modifier = WithContextRespondentPromptModifier()

    # 4. Initialize Generator
    generator = ConversationGenerator(
        respondent_prompt="You are an expert on the Afterimage library. Answer questions based on the provided context.",
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
        respondent_prompt_modifier=prompt_modifier,
    )

    # 5. Generate
    await generator.generate(num_dialogs=5, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
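Conceptually, a respondent prompt modifier appends the selected context to the assistant's system prompt before the session starts, so both sides see the same documents. A rough sketch of the idea — not the library's actual code, and the template wording is invented:

```python
def with_context(system_prompt, contexts):
    # Append retrieved documents so the assistant can ground its answers.
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        f"{system_prompt}\n\n"
        f"Use the following context when answering:\n{context_block}"
    )

modified = with_context(
    "You are an expert on the Afterimage library.",
    ["The 'Afterimage' library is used for synthetic data generation."],
)
print(modified)
```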
3. Persona-Based Generation
To add variety, you can generate specific “Personas” for your documents (e.g., “A confused beginner”, “A skeptical expert”) and have the User adopt these personas.
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    PersonaInstructionGeneratorCallback,
    PersonaGenerator,
    InMemoryDocumentProvider,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Setup Documents
    texts = [
        "Espresso is brewed by forcing hot water under pressure through finely-ground coffee beans.",
        "Cold brew is made by steeping coarse grounds in cold water for 12-24 hours.",
    ]
    documents = InMemoryDocumentProvider(texts)

    # 2. Generate Personas for Documents
    # This step analyzes the documents and creates suitable user personas
    # (e.g., "Coffee Enthusiast", "Barista Student")
    print("Generating personas...")
    persona_gen = PersonaGenerator(api_key=api_key)
    await persona_gen.generate_from_documents(documents)

    # 3. Setup Persona-Aware Instruction Generator
    # This callback will now select a random persona along with the document
    instruction_callback = PersonaInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,
    )

    # 4. Initialize Generator
    generator = ConversationGenerator(
        respondent_prompt="You are a professional barista. Answer questions about coffee brewing methods.",
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
    )

    # 5. Generate
    await generator.generate(num_dialogs=4, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
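Conceptually, a persona-aware instruction generator combines a sampled persona with a sampled document into the instruction that steers the user's next question. A rough sketch of that combination step — the prompt wording here is invented, not the library's actual template:

```python
import random

personas = ["A curious coffee enthusiast", "A barista student preparing for an exam"]
documents = [
    "Espresso is brewed by forcing hot water under pressure through finely-ground coffee beans.",
    "Cold brew is made by steeping coarse grounds in cold water for 12-24 hours.",
]

def build_instruction(rng=random):
    # Sample one persona and one document, then instruct the correspondent
    # LLM to stay in character while questioning the document's content.
    persona = rng.choice(personas)
    doc = rng.choice(documents)
    return (
        f"Adopt this persona: {persona}.\n"
        f"Ask a question grounded in this document:\n{doc}"
    )

print(build_instruction())
```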