# Quickstart Guide for Afterimage

`afterimage` is a Python library for generating synthetic conversation datasets using Large Language Models (LLMs). It simulates multi-turn conversations between a "Correspondent" (User) and a "Respondent" (Assistant) to create high-quality training or evaluation datasets.

## Installation

```bash
pip install git+https://github.com/altaidevorg/afterimage.git
```

Optional extras:

* `embeddings-local`: SentenceTransformer for local, in-process embeddings, a Qdrant retriever by model name, and quality checks
* `server`: FastAPI server
* `training`: demo UI fine-tuning scripts

Example: `pip install "afterimage[embeddings-local]@git+https://github.com/altaidevorg/afterimage.git"`.

## Core Concepts

* **Generator**: The core engine (`ConversationGenerator`) that orchestrates the conversation flow. It manages the LLM sessions for both the user and the assistant.
* **Correspondent**: The simulated user who asks questions. Its behavior is driven by an **Instruction Generator**.
* **Respondent**: The assistant who answers questions. Its behavior is defined by a **System Prompt** and optional **Prompt Modifiers**.
* **Document Provider**: A source of knowledge (text files, JSONL, memory) used to ground the conversation or generate relevant questions.
* **Persona**: A specific character or role that the Correspondent adopts to make conversations more diverse and realistic.

---

## 1. Basic Usage

The simplest way to use `afterimage` is to define a system prompt for the assistant (Respondent) and let the library automatically generate a persona for the user (Correspondent).

```python
import asyncio
import os

from afterimage import ConversationGenerator

# Ensure your API key is set
api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Define the Assistant's Persona (Respondent)
    respondent_prompt = """
    You are a helpful and polite customer support agent for a tech company.
    You answer questions about laptops, smartphones, and accessories.
    Be concise and professional.
    """

    # 2. Initialize the Generator
    # If you don't provide a correspondent_prompt, one is auto-generated
    # from the respondent_prompt. We recommend trying the auto-generated
    # correspondent prompt first before supplying your own.
    generator = ConversationGenerator(
        respondent_prompt=respondent_prompt,
        api_key=api_key,
        model_name="gemini-2.0-flash",  # Default model
    )

    # 3. Generate Conversations
    print("Generating conversations...")
    await generator.generate(
        num_dialogs=3,      # Number of separate conversations to generate
        max_turns=1,        # Maximum turns (exchange pairs) per conversation; 1 is enough for most cases
        max_concurrency=2,  # Number of parallel generations
    )

    # 4. Access Generated Data (saved to JSONL by default)
    # You can also access the storage directly, but the default storage saves to disk.
    print("Done! Check the generated .jsonl file in your directory.")

if __name__ == "__main__":
    asyncio.run(main())
```

## 2. Context-Aware Generation (RAG-like)

To generate high-quality domain-specific datasets, you often want the "User" to ask questions based on specific documents. You can achieve this using `ContextualInstructionGeneratorCallback`.

```python
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    ContextualInstructionGeneratorCallback,
    InMemoryDocumentProvider,
    WithContextRespondentPromptModifier,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Provide Context Documents
    # You can use InMemoryDocumentProvider, JSONLDocumentProvider, or DirectoryDocumentProvider
    documents = InMemoryDocumentProvider([
        "The 'Afterimage' library is used for synthetic data generation.",
        "It supports both synchronous and asynchronous generation modes.",
        "Key components include Generators, Callbacks, and Document Providers.",
    ])

    # 2. Set up the Instruction Generator (the User's brain)
    # This callback picks random documents and instructs the User to ask questions about them.
    instruction_callback = ContextualInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,  # How many docs to pick per conversation
        n_instructions=3,       # How many questions to brainstorm internally
    )

    # 3. Set up the Respondent Modifier (the Assistant's context)
    # This injects the SAME context the user sees into the assistant's system prompt,
    # ensuring the assistant has the knowledge to answer correctly.
    prompt_modifier = WithContextRespondentPromptModifier()

    # 4. Initialize the Generator
    generator = ConversationGenerator(
        respondent_prompt=(
            "You are an expert on the Afterimage library. "
            "Answer questions based on the provided context."
        ),
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
        respondent_prompt_modifier=prompt_modifier,
    )

    # 5. Generate
    await generator.generate(num_dialogs=5, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
```

## 3. Persona-Based Generation

To add variety, you can generate specific "Personas" for your documents (e.g., "A confused beginner", "A skeptical expert") and have the User adopt these personas.

```python
import asyncio
import os

from afterimage import (
    ConversationGenerator,
    PersonaInstructionGeneratorCallback,
    PersonaGenerator,
    InMemoryDocumentProvider,
)

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    # 1. Set up Documents
    texts = [
        "Espresso is brewed by forcing hot water under pressure through finely-ground coffee beans.",
        "Cold brew is made by steeping coarse grounds in cold water for 12-24 hours.",
    ]
    documents = InMemoryDocumentProvider(texts)

    # 2. Generate Personas for the Documents
    # This step analyzes the documents and creates suitable user personas
    # (e.g., "Coffee Enthusiast", "Barista Student").
    print("Generating personas...")
    persona_gen = PersonaGenerator(api_key=api_key)
    await persona_gen.generate_from_documents(documents)

    # 3. Set up the Persona-Aware Instruction Generator
    # This callback selects a random persona along with the document.
    instruction_callback = PersonaInstructionGeneratorCallback(
        api_key=api_key,
        documents=documents,
        num_random_contexts=1,
    )

    # 4. Initialize the Generator
    generator = ConversationGenerator(
        respondent_prompt="You are a professional barista. Answer questions about coffee brewing methods.",
        api_key=api_key,
        instruction_generator_callback=instruction_callback,
    )

    # 5. Generate
    await generator.generate(num_dialogs=4, max_turns=2)

if __name__ == "__main__":
    asyncio.run(main())
```
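The examples above write their results to a JSONL file (one conversation per line), which you can post-process with the standard library alone. The field names below (`turns`, `role`, `content`) are illustrative assumptions, not the library's documented schema — inspect your own generated file and adjust. A minimal sketch, using an inline sample instead of a real file:

```python
import json

# Two illustrative JSONL records. The actual field names produced by
# afterimage's default storage may differ -- check your generated file.
sample_jsonl = """\
{"turns": [{"role": "user", "content": "Which laptop has the best battery life?"}, {"role": "assistant", "content": "Our 14-inch model lasts the longest on a charge."}]}
{"turns": [{"role": "user", "content": "Do you sell phone cases?"}, {"role": "assistant", "content": "Yes, for all current models."}]}
"""

# Parse one JSON object per non-empty line; for a real run, replace
# sample_jsonl.splitlines() with iterating over the open file.
conversations = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]

print(len(conversations))                    # 2
print(conversations[0]["turns"][0]["role"])  # user
```

From here, each record can be mapped into whatever chat format your training or evaluation pipeline expects.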
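The `max_concurrency` parameter used in the examples above bounds how many conversations are generated in parallel. As a library-free illustration of that pattern (not afterimage's internals), here is a semaphore-bounded `asyncio.gather`, with `fake_generate` standing in for an LLM call:

```python
import asyncio

async def bounded_gather(coros, limit):
    # The semaphore caps how many coroutines run at once,
    # mirroring what a max_concurrency-style knob does.
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        async with sem:
            return await coro

    # gather preserves the order of its inputs regardless of completion order.
    return await asyncio.gather(*(run(c) for c in coros))

async def fake_generate(i):
    await asyncio.sleep(0)  # stand-in for a slow LLM request
    return f"dialog-{i}"

results = asyncio.run(bounded_gather([fake_generate(i) for i in range(3)], limit=2))
print(results)  # ['dialog-0', 'dialog-1', 'dialog-2']
```

Raising the limit trades higher throughput for more simultaneous API requests, so tune it against your provider's rate limits.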