# Agentic Memory

We've seen how tool calling and iterative searches over a Chroma collection can build context for an agent. While this works well for individual runs, agents start fresh each time—repeating expensive computations, re-learning user preferences, and rediscovering effective strategies they've already found.

Agentic memory solves this by persisting data from agent runs so it can be leveraged in the future. This reduces the cost of LLM interactions, personalizes the user experience, and improves agent performance over time.

## Memory Records

Context engineering is both an art and a science. Your memory schema will ultimately depend on your application's needs. However, in practice, three categories lend themselves well to most use cases:

### Semantic Memory

**Facts** about users, processes, or domain knowledge that inform future interactions:

* User preferences: "Prefers concise responses"
* Context: "Works in marketing, needs quarterly reports"
* Domain facts: "Company fiscal year starts in April"

Storing facts eliminates clarification steps. If a user mentioned they work in marketing last week, the agent shouldn't ask or search for this information again.

### Procedural Memory

Patterns and **instructions** that guide tool selection and execution:

* "If a user asks about sales data, query the sales_summary table first"
* "For date ranges, always confirm timezone before querying"
* "Use the PDF parser for files from the legal department"

Procedural memories help the agent learn how to accomplish tasks more effectively, and specifically how to choose the correct tools for each task.

### Episodic Memory

**Artifacts** and **results** from previous runs that can be reused or referenced:

* Successful query plans
* Expensive computation results
* Search results and their relevance scores
* Previous tool call sequences that worked well

## Memory in an Agentic Harness

Agentic memory integrates naturally with the plan-execute-evaluate architecture we discussed in the [agentic search guide](./agentic-search).

During the planning phase, retrieve memories that will help the agent construct better plans, like examples of successful plans for similar queries and facts about the user or process.

During the execution phase, retrieve memories that guide tool usage:

* Procedural instructions for tool selection
* Parameter patterns that worked before
* Known edge cases to handle

During the evaluation phase, the agent examines the query plan and its execution, and can **write** new memories to persist:

* Did the plan succeed? What made it work?
* What new facts did we learn?
* Should we update existing procedural knowledge?

## Implementation

The best way to implement a memory store for an agent is to dedicate a Chroma collection to memory records. This gives us out-of-the-box search functionality that we can leverage - metadata filtering for types of memories, advanced search over the store, and versioning with collection forking.
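For example, a minimal setup might look like the following sketch. It assumes Chroma Cloud credentials are available in the environment variables used later in this guide, and the collection name is arbitrary:

```python
import os

import chromadb

# Connect to Chroma Cloud using credentials from the environment
client = chromadb.CloudClient(
    api_key=os.getenv("CHROMA_API_KEY"),
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
)

# A dedicated collection for the agent's memory records
memory_collection = client.get_or_create_collection(name="agent-memory")
```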
We can establish a simple interface for interacting with this Chroma collection:

{% TabbedCodeBlock %}

{% Tab label="python" %}
```python
from abc import ABC, abstractmethod


class Memory(ABC):
    # Retrieve memories for each phase of the agent harness
    @abstractmethod
    async def for_planning(self, query: str) -> list[MemoryRecord]:
        pass

    @abstractmethod
    async def for_execution(self, context: Context) -> list[MemoryRecord]:
        pass

    @abstractmethod
    async def for_evaluation(self, context: Context) -> list[MemoryRecord]:
        pass

    # Extract and store new memories
    @abstractmethod
    async def extract_from_run(self, context: Context) -> None:
        pass

    # Expose memory as agent tools
    def get_tools(self) -> list[Tool]:
        pass
```
{% /Tab %}

{% Tab label="typescript" %}
```typescript
interface Memory {
  // Retrieve memories for each phase
  forPlanning(query: string): Promise<MemoryRecord[]>
  forExecution(context: Context): Promise<MemoryRecord[]>
  forEvaluation(context: Context): Promise<MemoryRecord[]>

  // Extract and store new memories
  extractFromRun(context: Context): Promise<void>

  // Expose memory as agent tools
  getTools(): Tool[]
}
```
{% /Tab %}

{% /TabbedCodeBlock %}

With `MemoryRecord`s:

{% TabbedCodeBlock %}

{% Tab label="python" %}
```python
from dataclasses import dataclass
from datetime import datetime
from typing import Literal


@dataclass
class MemoryRecord:
    id: str
    content: str
    type: Literal["semantic", "procedural", "episodic"]
    phase: Literal["planning", "execution", "evaluation"]
    created: datetime
    last_accessed: datetime
    access_count: int
```
{% /Tab %}

{% Tab label="typescript" %}
```typescript
interface MemoryRecord {
  id: string
  content: string
  type: 'semantic' | 'procedural' | 'episodic'
  phase: 'planning' | 'execution' | 'evaluation'
  created: Date
  lastAccessed: Date
  accessCount: number
}
```
{% /Tab %}

{% /TabbedCodeBlock %}

Then we can write the methods for retrieving memories for different phases of our agent harness. For example, in the planning phase, we get a user query. We can search our memory collection against it, and add the results to the planner's prompts. We limit the search to semantic memory records (facts), or episodic records (artifacts) that pertain to the planning phase, like successful previous plans for similar queries.
{% TabbedCodeBlock %}

{% Tab label="python" %}
```python
async def for_planning(self, query: str) -> list[MemoryRecord]:
    records = self.collection.query(
        query_texts=[query],
        where={
            "$or": [
                {"type": "semantic"},
                {"$and": [{"type": "episodic"}, {"phase": "planning"}]}
            ]
        },
        n_results=5
    )
    return [
        MemoryRecord(
            id=id,
            content=records["documents"][0][i],
            type=records["metadatas"][0][i]["type"],
            phase=records["metadatas"][0][i]["phase"],
            created=datetime.fromisoformat(records["metadatas"][0][i]["created"]),
            last_accessed=datetime.fromisoformat(records["metadatas"][0][i]["last_accessed"]),
            access_count=int(records["metadatas"][0][i]["access_count"]),
        )
        for i, id in enumerate(records["ids"][0])
    ]
```
{% /Tab %}

{% Tab label="typescript" %}
```typescript
async forPlanning(query: string): Promise<MemoryRecord[]> {
    const records = await this.collection.query({
        queryTexts: [query],
        where: {
            $or: [
                { type: 'semantic' },
                { $and: [{ type: 'episodic' }, { phase: 'planning' }] }
            ]
        },
        nResults: 5
    });

    return records.rows()[0].map((record) => ({
        id: record.id,
        content: record.document,
        type: record.metadata.type,
        phase: record.metadata.phase,
        created: new Date(record.metadata.created),
        lastAccessed: new Date(record.metadata.lastAccessed),
        accessCount: record.metadata.accessCount
    }));
}
```
{% /Tab %}

{% /TabbedCodeBlock %}

## Memory Writing Strategies

How you write memories should be guided by how the agent will access them. A well-designed writing strategy ensures memories remain useful, accurate, and retrievable over time.

### Extraction Timing

**End-of-run** extraction processes the entire conversation after completion. This gives full context for deciding what's worth remembering, but delays availability until the run finishes.

**Real-time** extraction writes memories as the conversation progresses. This makes memories immediately available for the current run, but risks storing information that later turns out to be incorrect or irrelevant.

**Async** extraction queues memory writing as a background job. This keeps the agent responsive but introduces complexity around consistency—the agent might not have access to memories from very recent runs.

In practice, a hybrid approach often works best: extract high-confidence facts in real-time, and defer nuanced evaluation to end-of-run processing. You can also save memories identified in one step in the agent's context, so they are available for downstream or long-running parallel steps.

### Selectivity

Not everything is worth remembering. Storing too much creates noise that degrades retrieval quality. Consider:

* **Signal strength**: How confident is the agent that this information is correct? User-stated facts ("I work in marketing") are higher signal than inferences ("they seem to prefer detailed responses").
* **Reuse potential**: Will this information be useful in future runs? A user's timezone is broadly applicable; the specific query they ran last Tuesday probably isn't.
* **Redundancy**: Does this duplicate existing memories? Adding "user works in marketing" when you already have "user is a marketing manager" creates clutter without value.

A useful heuristic: if the agent would need to ask about this information again in a future run, it's worth storing.

### Classification

Tag memories at write time to enable filtered retrieval. Key dimensions include:

* **Type**: Is this a fact (semantic), an instruction (procedural), or a past result (episodic)?
* **Phase relevance**: When should this memory surface—during planning, execution, or evaluation?
* **Scope**: Is this user-specific, or does it apply globally across all users?
* **Confidence**: How certain is the agent about this memory's accuracy?
* **Source**: Did this come from the user directly, from a tool result, or from agent inference?

Classification decisions made at write time shape retrieval quality. It's easier to filter by metadata than to rely solely on semantic similarity.

### Conflicts

New information sometimes contradicts existing memories. Your strategy might:

* **Override**: Replace the old memory with new information. Simple, but loses historical context.
* **Version**: Keep both memories with timestamps, surfacing the most recent.
* **Merge**: Combine old and new into a single updated memory. Requires careful prompting to avoid losing important nuance.
* **Flag for review**: Mark conflicting memories for human review before resolution.
* **Fork**: Taking advantage of Chroma's [collection forking](../../cloud/features/collection-forking), create a branch of the memory collection with the new information, keeping the original intact. This is particularly useful when you're uncertain which version will perform better: you can run both branches and measure outcomes. Forking also enables rollback if new memories degrade agent performance, and can support A/B testing different memory strategies across user segments.

The right approach depends on your domain. User preferences might safely override ("actually, I prefer concise responses now"), while factual corrections might warrant versioning for auditability.

### Decay and Relevance

Memories don't stay useful forever. Consider tracking:

* **Access patterns**: Memories that are frequently retrieved are proving their value. Memories never accessed may be candidates for removal.
* **Recency**: Recently created or accessed memories are more likely to be relevant than stale ones.
* **Time-sensitivity**: Some memories have natural expiration. "User is preparing for Q3 review" becomes irrelevant after Q3 ends.

## Example: An Inbox Processing Agent

In the [Chroma Cookbooks](https://github.com/chroma-core/chroma-cookbooks/tree/master/agentic-memory) repo, we feature a simple example using agentic memory. The project includes an inbox-processing agent, which fetches unread emails from a user's inbox and processes each one according to user-defined rules. If the agent does not know how to process a given email, it will prompt the user for instructions. These instructions are then extracted from the run and persisted in the agent's memory collection as procedural memory records, which can be used in future runs.

The project is accompanied by a dataset of mock emails on Chroma Cloud. You can mark an "email" as "unread" by setting a record's `unread` metadata field to `true`.

The project includes an `InboxService` interface, which defines the actions the agent can take on a user's inbox, along with an implementation for interacting with the mock dataset on Chroma Cloud. You can extend the functionality of the agent by providing your own implementation for a real email provider.

The project uses the same generic agentic harness we introduced for the [agentic search](./agentic-search) project. This time, the harness is configured with:

* A planner that simply fetches unread emails and creates a plan step for processing each one.
* Data shapes and prompts to support the inbox-processing functionality.
* An input-handler to get email-processing instructions from the user.
* A memory implementation that exposes search tools over the memory collection, and memory extraction logic for persisting user-defined rules. {% Steps %} {% Step %} [Log in](https://trychroma.com/login) to your Chroma Cloud account. If you don't have one yet, you can [sign up](https://trychroma.com/signup). You will get free credits that should be more than enough for running this project. {% /Step %} {% Step %} Use the "Create Database" button on the top right of the Chroma Cloud dashboard, and name your DB `agentic-memory` (or any name of your choice). If you're a first-time user, you will be greeted with the "Create Database" modal after creating your account. {% /Step %} {% Step %} Choose the "Load sample dataset" option, and then choose the "Personal Inbox" dataset. This will copy the data into a collection in your own Chroma DB. {% /Step %} {% Step %} Once your collection loads, choose the "Settings" tab. At the bottom of the page, choose the `.env` tab. Create an API key, and copy the environment variables you will need for running the project: `CHROMA_API_KEY`, `CHROMA_TENANT`, and `CHROMA_DATABASE`. {% /Step %} {% Step %} Clone the [Chroma Cookbooks](https://github.com/chroma-core/chroma-cookbooks) repo: ```terminal git clone https://github.com/chroma-core/chroma-cookbooks.git ``` {% /Step %} {% Step %} Navigate to the `agentic-memory` directory, and create a `.env` file at its root with the values you obtained in the previous step: ```terminal cd chroma-cookbooks/agentic-memory touch .env ``` {% /Step %} {% Step %} To run this project, you will also need an [OpenAI API key](https://platform.openai.com/api-keys). Set it in your `.env` file: ```text CHROMA_API_KEY= CHROMA_TENANT= CHROMA_DATABASE=agentic-memory OPENAI_API_KEY= ``` {% /Step %} {% Step %} This project uses [pnpm](https://pnpm.io/installation) workspaces. In the root directory, install the dependencies: ```terminal pnpm install ``` {% /Step %} {% /Steps %} The project includes a CLI interface that lets you interact with the inbox-processing agent. You can run it in development mode to get started. From the root directory you can run ```terminal pnpm cli:dev ``` The dataset is configured with two unread emails. Let the agent process them by providing rules. For example: * Archive all GitHub notifications * Label all emails from dad with the "family" label. Then, go to your Chroma Cloud collection and see the results on the processed records. You will also be able to see the memory collection created by the agent, with the extracted rules from the first run. Set more similar emails as unread, and run the agent again to see agentic memory in action. --- # Agentic Search {% Video link="https://www.youtube.com/embed/_VAPVcdow-Q" title="Framework-less Agentic Search" /%} We've seen how retrieval enables LLMs to answer questions over private data and maintain state for AI applications. While this approach works well for simple lookups, it falls short in most real-world scenarios. Consider building an internal chatbot for a business where a user asks: > What were the key factors behind our Q3 sales growth, and how do they compare to industry trends? Suppose you have Chroma collections storing quarterly reports, sales data, and industry research papers. A simple retrieval approach might query the sales-data collection—or even all collections at once—retrieve the top results, and pass them to an LLM for answer generation. 
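As a point of reference, a minimal single-step pipeline might look like the following sketch. The `sales-data` collection name, prompt, and model are illustrative, and we assume Chroma Cloud and OpenAI credentials are available in the environment:

```python
import os

import chromadb
from openai import OpenAI

chroma_client = chromadb.CloudClient(
    api_key=os.getenv("CHROMA_API_KEY"),
    tenant=os.getenv("CHROMA_TENANT"),
    database=os.getenv("CHROMA_DATABASE"),
)
openai_client = OpenAI()

question = (
    "What were the key factors behind our Q3 sales growth, "
    "and how do they compare to industry trends?"
)

# Single-step retrieval: one query against one collection
collection = chroma_client.get_collection(name="sales-data")
results = collection.query(query_texts=[question], n_results=5)
context = "\n\n".join(results["documents"][0])

# Stuff the retrieved chunks into the prompt and generate an answer
response = openai_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```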
However, this single-step retrieval strategy has critical limitations: * **It can't decompose complex questions** - This query contains multiple sub-questions: internal growth factors, external industry trends, and comparative analysis. The information needed may be scattered across different collections and semantically dissimilar documents. * **It can't adapt its search strategy** - If the first retrieval returns insufficient context about industry trends, there's no mechanism to refine the query and search again with a different approach. * **It can't handle ambiguous terms** - "Q3" could refer to different years across your collections, while "sales growth" might mean unit sales, revenue, or profit margins. A single query has no way to disambiguate and search accordingly. **Agentic search** addresses these limitations by enabling your AI application to use retrieval intelligently - planning, reasoning, and iterating much like a human researcher. At its core, an agentic search system uses an LLM to break down a user query and iteratively search for information needed to generate an answer. The system: 1. **Plans** - Breaks down complex queries into a sequence of retrieval steps 2. **Executes** - Performs targeted searches across Chroma collections or using other tools 3. **Evaluates** - Assesses whether the retrieved information answers the question or identifies gaps 4. **Iterates** - Refines the plan and repeats steps 2-3 based on what it has learned so far 5. **Synthesizes** - Combines information from multiple retrievals to form a comprehensive answer Instead of executing a single query and hoping for the best, an agentic approach allows the agent to break down complex questions into manageable sub-queries, execute multiple retrievals across different Chroma collections, evaluate whether the retrieved information is sufficient, and refine its search strategy based on what it discovers. For example, an agentic search system might handle our example question above as follows: {% CollapsibleCodeBlock %} ```terminal [PLAN] Analyzing query: "What were the key factors behind our Q3 sales growth, and how do they compare to industry trends?" [PLAN] Identified information needs: 1. Internal Q3 sales performance metrics 2. Factors driving the growth 3. 
Industry benchmark data for comparison ━━━━━━━━━━━━━━━━━━━━━ ITERATION 1 ━━━━━━━━━━━━━━━━━━━━━ [EXECUTE] Querying collection: sales-data Query: "Q3 2024 sales growth revenue" n_results: 5 [RETRIEVED] Found 5 chunks - Q3 revenue up 15% YoY - Enterprise segment primary growth driver - Consumer segment flat [EVALUATE] ✓ Growth metrics obtained (15%) ✗ Root causes unclear - need more detail on drivers ✗ Industry comparison data missing ━━━━━━━━━━━━━━━━━━━━━ ITERATION 2 ━━━━━━━━━━━━━━━━━━━━━ [EXECUTE] Querying collection: sales-data Query: "Q3 2024 enterprise growth factors drivers" n_results: 5 [RETRIEVED] Found 5 chunks - New AI-powered analytics features launched in Q3 - Sales team expanded by 20% - Three major enterprise deals closed [EVALUATE] ✓ Internal factors identified ✗ Still missing industry benchmarks ━━━━━━━━━━━━━━━━━━━━━ ITERATION 3 ━━━━━━━━━━━━━━━━━━━━━ [EXECUTE] Querying collection: industry-research Query: "Q3 2024 software industry revenue growth benchmarks" n_results: 3 [RETRIEVED] Found 3 chunks - Industry average: 8% growth in Q3 2024 - Market conditions: moderate growth environment - Top performers: 12-18% growth range [EVALUATE] ✓ All information requirements satisfied ✓ Ready to synthesize answer ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ [SYNTHESIZE] Combining findings from 3 retrievals across 2 collections... [ANSWER] Our 15% Q3 growth significantly outperformed the 8% industry average, placing us in the top performer category. This was driven by our AI analytics feature launch and 20% sales team expansion, which enabled us to close three major enterprise deals during the quarter. ``` {% /CollapsibleCodeBlock %} Agentic search is the technique that powers most production AI applications. * Legal assistants search across case law databases, statutes, regulatory documents, and internal firm precedents. * Medical AI systems query across clinical guides, research papers, patient records, and drug databases to support medical reasoning. * Customer support AI agents navigate product documentation, past ticket resolutions, and company knowledge bases, while dynamically adjusting their search based on specific use cases. * Coding assistants search across documentation, code repositories, and issue trackers to help developers solve problems. The common thread across all these systems is that they don't rely on a single retrieval step, but instead use agentic search to orchestrate multiple searches, evaluate results, and iteratively gather the information needed to provide accurate and comprehensive answers. In more technical terms, an agentic search system implements several key capabilities: * **Query Planning** - using the LLM to analyze the user's question and generate a structured plan, breaking the input query down to sub-queries that can be addressed step-by-step. * **Tool Use** - the agent has access to a suite of tools - such as querying Chroma collections, searching the internet, and using other APIs. For each step of the query plan, we ask an LLM to repeatedly call tools to gather information for the current step. * **Reflection and Evaluation** - at each step, we use an LLM to evaluate the retrieved results, determining if they're sufficient, relevant, or if we need to revise the rest of our plan. * **State Management and Memory** - the agent maintains context across all steps, tracking retrieved information, remaining sub-queries, and intermediate findings that inform subsequent retrieval decisions. 
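Stripped of any particular framework, these capabilities compose into a loop. The sketch below shows that control flow only; the LLM-backed planner, executor, evaluator, and synthesizer are passed in as functions and are illustrative assumptions, not the implementation used later in this guide.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class PlanStep:
    description: str
    status: str = "pending"  # pending | success | failure

@dataclass
class AgentState:
    plan: list[PlanStep]
    findings: list[str] = field(default_factory=list)

def agentic_search(
    question: str,
    plan_query: Callable[[str], list[PlanStep]],
    execute_step: Callable[[PlanStep, list[str]], str],
    evaluate: Callable[[AgentState, PlanStep, str], list[PlanStep]],
    synthesize: Callable[[str, list[str]], str],
    max_iterations: int = 10,
) -> str:
    # Query Planning: break the question into sub-queries
    state = AgentState(plan=plan_query(question))

    for _ in range(max_iterations):
        pending = [step for step in state.plan if step.status == "pending"]
        if not pending:
            break

        # Tool Use: the executor calls search tools (e.g. Chroma queries) to solve this step
        step = pending[0]
        outcome = execute_step(step, state.findings)
        state.findings.append(outcome)

        # Reflection and Evaluation: mark the step and possibly revise the remaining plan
        state.plan = evaluate(state, step, outcome)

    # Synthesis: combine everything gathered into a final answer
    return synthesize(question, state.findings)
```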
## BrowseComp-Plus In this guide we will build a Search Agent from scratch. Our agent will be able to answer queries from the [BrowseComp-Plus](https://github.com/texttron/BrowseComp-Plus/tree/main) dataset, which is based on OpenAI's [BrowseComp](https://openai.com/index/browsecomp/) benchmark. The dataset contains challenging questions that need multiple rounds of searching and reasoning to answer correctly. This makes it ideal for demonstrating how to build an agentic search system and how tuning each of its components (retrieval, reasoning, model selection, and more) affects overall performance. Every query in the BrowseComp-Plus dataset has * Gold docs - that are needed to compile the final correct answer for the query. * Evidence docs - are needed to answer the query but may not directly contain the final answer themselves. They provide supporting information required for reasoning through the problem. The gold docs are a subset of the evidence docs. * Negative docs - are included to deliberately make answering the query more difficult. They are introduced to distract the agent, and force it to distinguish between relevant and irrelevant information. For example, here is query `770`: ```terminal Could you provide the name of the individual who: - As of December 2023, the individual was the coordinator of a research group founded in 2009. - Co-edited a book published in 2018 by Routledge. - The individual with whom they co-edited the book was a keynote speaker at a conference in 2019. - Served as the convenor of a panel before 2020. - Published an article in 2012. - Completed their PhD on the writings of an English writer. ``` And the evidence documents in the dataset needed for answering this question: {% TabbedUseCaseCodeBlock language="terminal" scrollable=true %} {% Tab label="6753" %} ```terminal --- title: Laura Lojo-Rodríguez date: 2015-05-01 --- Dr. Laura Lojo-Rodriguez is currently the supervisor of the research group "Discourse and Identity," funded by the Galician Regional Government for the period 2014–2018. Lojo-Rodríguez is Senior Lecturer in English Literature at the Department of English Studies of University of Santiago de Compostela, Spain, where she teaches Literature(s) in English, Literary Theory, and Gender Studies. She is also convenor of the Short Story Panel of the Spanish Association of English and American Studies (AEDEAN). Research interests: Contemporary British fiction; short story; critical theory; comparative literature. Publications 2018. "Magic Realism and Experimental Fiction: From Virginia Woolf to Jeanette Winterson", in Anne Fernald, ed. The Oxford Handbook of Virginia Woolf. Oxford: Oxford University Press. Forthcoming. 2018. '"Thought in American and for the Americans": Victoria Ocampo, Sur and European Modernism', in Falcato A., Cardiello A. eds. The Condition of Modernism. Cham: Palgrave Macmillan, 2018, 167-190. 2017. "Tourism and Identitary Conflicts in Monica Ali's Alentejo Blue". Miscelánea: A Journal of English and American Studies. vol. 56(2017): 73-90 201. 2017. "Writing to Historicize and Contextualize: The Example of Virginia Woolf". The Discipline, Ethics, and Art of Writing about Literature. Ed. Kirilka Stavreva. Gale-Cengage, Gale Researcher British Literature. 2017. Online. 2016. "Virginia Woolf in Spanish-Speaking Countries". The Blackwell Companion to Virginia Woolf. Ed. Jessica Berman. Oxford: Wiley-Blackwell, 2016. 46-480. 2015. 
"La poética del cuento en la primera mitad del siglo XX en Reino Unido: Virgina Woolf y Elizabeth Bowen". Fragmentos de realidad: Los autores y las poéticas del cuento en lengua inglesa. Ed. Santiago Rodríguez Guerrero-Strachan. Valladolid: Servicio de publicaciones de la Universidad de Valladolid, pp. 111-125. 2014. "Unveiling the Past: Éilís Ní Dhuibhne's 'Sex in the Context of Ireland'". Nordic Irish Studies 13.2 (2014): 19–30. 2014. "'The Saving Power of Hallucination': Elizabeth Bowen's "Mysterious Kôr" and Female Romance". Zeitschrift für Anglistik und Amerikanistik 62.4 (2014): 273–289. 2013. "Exilio, historia, e a visión feminina: Éilís Ní Dhuibhne" in Felipe Andrés Aliaga Sáez, ed., Cultura y migraciones: Enfoques multidisciplinarios. Santiago de Compostela: Servicio de publicaciones de la Universidad, 2013, 178–183. 2012. (ed.). Moving across a Century: Women's Short Fiction from Virginia Woolf to Ali Smith. Bern: Peter Lang, 2012. 2012. "Recovering the Maternal Body as Paradise: Michèle Roberts's 'Charity'". Atlantis: A Journal of the Spanish Association of Anglo-American Studies 34.2 (Dec 2012): 33–47. 2011. (with Jorge Sacido-Romero) "Through the Eye of a Postmodernist Child: Ian McEwan's 'Homemade'". Miscelánea: A Journal of English and American Studies 44 (2011): 107–120. 2011. "Voices from the Margins: Éilís Ní Dhuibhne's Female Perspective in The Pale Gold of Alaska and Other Stories". Nordic Irish Studies 10 (2011): 35–40. 2011-2012. "Joyce's Long Shadow: Éilís Ní Dhuibhne's Short Fiction". Papers on Joyce 17.18 (2011-2012): 159–178. 2010. (with Manuela Palacios and Mª Xesús Nogueira). Creation, Publishing, and Criticism: The Advance of Women's Writing. Bern: Peter Lang, 2010. 2009. "The Poetics of Motherhood in Contemporary Irish Women's Verse" in Manuela Palacios and Laura Lojo-Rodríguez, eds., Writing Bonds: Irish and Galician Women Poets. Bern: Peter Lang, 2009, 123-142. 2009. "Making Sense of Wilderness: An Interview with Anne Le Marquand Hartigan" in Manuela Palacios and Laura Lojo-Rodríguez, eds., Writing Bonds: Irish and Galician Women Poets. Bern: Peter Lang, 2009, 195–204. 2008. "Virginia Woolf's Female History in 'The Journal of Mistress Joan Martyn'". Short Story 16.1 (2008): 73–86. ``` {% /Tab %} {% Tab label="68484" %} ```terminal --- title: ABOUT US date: 2019-01-01 --- ABOUT US DISCOURSE AND IDENTITY (D&I) is a Competitive Reference Research Group ((ED431C 2019/01, Xunta de Galicia) located in the Department of English and German Studies at the University of Santiago de Compostela (USC). Coordinated by Laura Lojo-Rodríguez, D&I is integrated into the following research networks: - English Language, Literature and Identity III (ED431D 2017/17) - European Research Network for Short Fiction (ENSFR) - Contrastive Linguistics: Constructional and Functional Approaches (FWO-Flanders) Endowed with an interdisciplinary scope, D&I brings together researchers working in the fields of English Language, Literature and History-Culture. The group includes senior and junior scholars from the USC, support staff and external collaborators from other universities in Spain as well as from Simon Fraser University, University of Notre Dame, Brown University, University of Sussex, University College London or VU University Amsterdam. The research conducted by the members of the group is funded by the University of Santiago de Compostela, the Galician Regional Government (Xunta de Galicia), the Spanish Government as well as by various European entities. 
D&I was founded in 2009 with a two-fold objective: to further interdisciplinary inquiry into the relationship between discourse and identity, and to foster high quality research through a successful partnership between Linguistics, Literature and Cultural Studies. The research conducted within the group looks into the relationship between discourse in its multiple manifestations (i.e. linguistic, literary, aesthetic, cultural, semiotic) and the configuration of gender, ethnic, class and cultural identities, taking into account the potential ideologies underlying the discourse-identity correlation. As foregrounded by such approaches as "Critical Discourse Analysis", "Social Semiotics" or "Cognitive Grammar", there exists an intimate relationship between: - "discourse" (< Lat dis-currere), understood as the semiotic (not simply linguistic) processes and systems that intervene in the production and interpretation of speech acts (Van Dijk 1985), - "identity" (< Lat idem-et-idem), referring both to individual and cultural identity in a given context, as well as to the synergies and antagonisms that might arise between them, - "ideology", a concept that we interpret as a systematic body of ideas organised according to a particular viewpoint, Due to its complexity and broad scope, the critical analysis of the interaction between discourse-identity-ideology needs to be addressed from an interdisciplinary approach, which requires – and at the same time justifies – the collaboration of the different teams working within this research group, to which we should also add the incorporation of the epistemology provided by other disciplines such as psychology, sociology or semiotics. Indeed, the group fosters connections with scholars from other areas who share an interest in the study of discourse and/or identity. Additionally, group members also work in conjunction with a number of scientific and professional societies, scholarly journals, publishing houses and institutions. LINKS Collaborating RESEARCH NETWORKS - Contrastive Linguistics: Constructional and Functional Approaches - European Research Network for Short Fiction Collaborating INSTITUTIONS - AEDEAN (Asociación Española de Estudios Anglo-norteamericanos) - Amergin. Instituto Universitario de Estudios Irlandeses - Asociación Española James Joyce - Asociación de Escritores en Lingua Galega - Celga-ILTEC. Centro de Estudos de Linguística Geral e Aplicada da Universidade de Coimbra - CIPPCE (Centro de Investigación de Procesos e Prácticas Culturais Emerxentes) - Instituto Cervantes (Dublín) - The Richard III Society - SELICUP (Sociedad Española de Estudios Literarios de Cultura Popular) - SITM (Société Internationale pour l'étude du théâtre médiéval) D&I has organized various activities resulting from the interdisciplinary collaboration between different research teams, the various editions of the International Workshop on Discourse Analysis (2011, 2013, 2015, 2016) and the International Conference on 'The Discourse of Identity' (2012, 2016) being prominent examples in this respect. Both events have successively gathered together more than 300 recognized experts in the fields of English Linguistics, Literature and History-Culture, which turns D&I into a leading research group in discourse and identity studies. 
In addition to the organization of conferences, workshops and seminars, the group regularly hosts speakers from universities all over the world, thus contributing to the internationalization of our work and to forging new partnerships and collaborations. Research results have also been transferred through multiple publications in world-leading publishing houses and journals. This academic work has led the D&I Research Group to receive generous funding from a variety of entities. Since its foundation in 2009, group members have participated in more than 10 research projects funded by regional, national and international entities. Currently, the group receives funding from the Galician Regional Government (Xunta de Galicia) as a Competitive Reference Research Group. The group has also proved itself to have a strong teaching and training capacity. In the period since 2009, well over 50 theses have been completed and currently there are more than 20 Ph. D. dissertations in progress. AWARDS - Gómez González, María de los Ángeles. Premio 'Rafael Monroy' para investigadores experimentados, concedido pola Asociación Española de Lingüística Aplicada (AESLA), 2019. - Martínez Ponciano, Regina. Premio de investigación 'Patricia Shaw', concedido pola Asociación Española de Estudios Anglonorteamericanos (AEDEAN), 2016. - Palacios González, Manuela. Premio de Promoción da USC en Destinos Internacionais (1º premio na categoría de Artes e Humanidades) ``` {% /Tab %} {% Tab label="1735" %} ```terminal --- title: Creation, Publishing, and Criticism author: Maria Xesus Nogueira Laura Lojo Rodriguez Manuela Palacios date: 2025-01-01 --- Creation, Publishing, and Criticism The Advance of Women's Writing ©2010 Monographs XX, 230 Pages Series: Galician Studies, Volume 2 Summary Since the 1980s, there has been an unprecedented and unremitting rise in the number of women writers in Galicia and Ireland. Publishers, critics, journals, and women's groups have played a decisive role in this phenomenon. Creation, Publishing, and Criticism provides a plurality of perspectives on the strategies deployed by the various cultural agents in the face of the advance of women authors and brings together a selection of articles by writers, publishers, critics, and theatre professionals who delve into their experiences during this process of cultural change. This collection of essays sets out to show how, departing from comparable circumstances, the Galician and the Irish literary systems explore their respective new paths in ways that are pertinent to each other. This book will be of particular interest to students of Galician and Irish studies, comparative literature, women's studies, and literary criticism. Both specialists in cultural analysis and the common reader will find this an enlightening book. Details - Pages - XX, 230 - Publication Year - 2010 - ISBN (PDF) - 9781453900222 - ISBN (Hardcover) - 9781433109546 - DOI - 10.3726/978-1-4539-0022-2 - Language - English - Publication date - 2010 (November) - Keywords - Irish literature Women Writers Poetry Fiction Theatre Publishing Criticism literary creation. Galician literature - Published - New York, Bern, Berlin, Bruxelles, Frankfurt am Main, Oxford, Wien, 2010. XX, 230 pp. - Product Safety - Peter Lang Group AG ``` {% /Tab %} {% Tab label="60284" %} ```terminal --- title: Publications date: 2018-06-23 --- PUBLICATIONS 2018 - Lojo-Rodríguez, Laura. 
\"'Genealogies of Women': Discourses on Mothering and Motherhood in the Short Fiction of Michèle Roberts\" en Gender and Short Fiction: Women's Tales in Contemporary Britain. London and New York: Routledge, 2018. 102-122. - Lojo-Rodríguez, Laura. \"England's Most Precious Gift: Virginia Woolf's Transformations into Spanish\". A Companion to World Literature. Ed. Kenneth Seigneurie. Oxford: Blackwells, 2018. - Lojo-Rodríguez, Laura. \"Magic Realism and Experimental Fiction: From Virginia Woolf to Jeanette Winterson\", in Anne Fernald, ed. The Oxford Handbook of Virginia Woolf. Oxford: Oxford University Press, 2018 [forthcoming] - Lojo-Rodríguez, Laura. '\"Thought in American and for the Americans\": Victoria Ocampo, Sur and European Modernism', in Ana Falcato, ed. Philosophy in the Condition of Modernism. Londres: Palgrave, 2018: 167-190. - Lojo-Rodríguez, Laura. \"Victorian Male Heroes and Romance in Elizabeth Bowen's Short Fiction\". En Tracing the Heroic through Gender, Monika Mommertz, Thomas Seedorf, Carolin Bahr, Andreas Schlüter, eds. Würzburg. - Sacido-Romero, Jorge and Laura Lojo Rodríguez. Gender & Short Fiction: Women's Tales in Contemporary Britain. Londres: Routledge. - Sacido Romero, Jorge \"Chapter 10: In a Different Voice: Janice Galloway's Short Stories\". Gender and Short Fiction: Women's Tales in Contemporary Britain. Eds. J. Sacido and L. Lojo. New York: Routledge, 2018, pp. 191-214. - Sacido Romero, Jorge y Laura María Lojo Rodríguez. \"Introduction\". Gender and Short Fiction: Women's Tales in Contemporary Britain. Eds. J. Sacido and L. Lojo. New York: Routledge, 2018, pp. 1-14. - Sacido-Romero, Jorge. \"Liminality in Janice Galloway's Short Fiction\". Zeitschrift für und Amerikanistik: A Quarterly of Language, Literature and Culture. 66/4 (2018). [Forthcoming] - Sacido-Romero, Jorge. \"An Interview with Janice Galloway\". The Bottle Imp 23 (June 2018) - Sacido-Romero, Jorge. \"Intertextuality and Intermediality in Janice Galloway's 'Scenes from the Life' (Blood 1991)\". Short Fiction in Theory and Practice 8/1 (2018). PREVIOUS PUBLICATIONS 2017 - Lojo-Rodriguez, Laura. \"Tourism and Identitary Conflicts in Monica Ali's Alentejo Blue\". Miscelánea: A Journal of English and American Studies. vol. 53 (2017): 73-90. - Lojo-Rodriguez, Laura. \"Writing to Historicize and Contextualize: The Example of Virginia Woolf\". The Discipline, Ethics, and Art of Writing about Literature. Ed. Kirilka Stavreva. Gale-Cengage, Gale Researcher British Literature. Online. - Mieszkowksi, Sylvia. \"An Interview with A. L. Kennedy\". The Bottle Imp 22. Online at: 2016 - Lojo-Rodriguez, Laura. \"Virginia Woolf in Spanish-Speaking Countries\" in Jessica Berman, ed., The Blackwell Companion to Virginia Woolf. Oxford: Wiley-Blackwell, 2016, 446-480. - Rallo-Lara, Carmen, J. Sacido-Romero, L. Torres-Zúñiga and I. Andrés Cuevas. \"Women's Tales of Dissent: Exploring Female Experience in the Short Fiction of Helen Simpson, Janice Galloway, A. S. Byatt, and Jeanette Winterson\". On the Move: Glancing Backwards to Build a Future in English Studies. Aitor Ibarrola-Armendariz and Jon Ortiz de Urbina Arruabarrena (eds.). Bilbao: Servicio de Publicaciones de la Universidad de Deusto, 2016, 345-50. - Sacido-Romero, Jorge. \"Ghostly Visitations in Contemporary Short Fiction by Women: Fay Weldon, Janice Galloway and Ali Smith\". Atlantis: A Journal of the Spanish Association for Anglo-American Studies, 38.2 (Dec 2016): 83-102. 2015 - Lojo-Rodriguez, Laura. 
\"La poética del cuento en la primera mitad del siglo XX en Reino Unido: Virgina Woolf y Elizabeth Bowen\". Fragmentos de realidad. Servicio de publicaciones de la Universidad, 2015: 111-125. - Mieszkowksi, Sylvia. \"Kitsch als Kitt: Die 'preposterous history' von Gilbert & Sullivans The Mikado in Mike Leighs Topsy-Turvy\" [fertig gestellt], in: Kitsch und Nation eds. Kathrin Ackermann and Christopher F. Laferl; Bielefeld: [transcript], 2015. - Sacido-Romero, Jorge and Silvia Mieszkowski (eds.). Sound Effects: The Object Voice in Fiction. Leiden: Brill / Rodopi. - Sacido-Romero, Jorge. \"The Voice in Twentieth-Century English Short Fiction: E.M. Forster, V.S. Pritchett and Muriel Spark,\" in J. Sacido-Romero and S. Mieszkowski, eds., Sound Effects: The Object Voice in Fiction. Leiden: Brill / Rodopi, 2015, 185–214. 2014 - Andrés-Cuevas, Isabel Ma, Laura Lojo-Rodríguez and Carmen Lara-Rallo. \"The Short Story and the Verbal-Visual Dialogue\" in E. Álvarez-López (coord. and ed.), E. M. Durán-Almarza and A. Menéndez-Tarrazo, eds., Building International Knowledge. Approaches to English and American Studies in Spain. AEDEAN/Universidad de Oviedo, 2014, 261–266. - Andrés-Cuevas, Isabel M. \"Modernism, Postmodernism, and the Short Story in English, ed. Jorge Sacido\". Miscelánea: Revista de Estudios Ingleses y Norteamericanos 50 (2014): 173–177. - Lara-Rollo, Carmen, Laura Lojo-Rodríguez and Isabel Andrés Cuevas). \"The Short Story and the Verbal-Visual Dialogue\" in Esther Álvarez López et al., eds., Building Interdisciplinary Knowledge. Approaches to English and American Studies in Spain. Oviedo: KRK Ediciones, 2014 261–65. - Lojo-Rodriguez, Laura. \"'The Saving Power of Hallucination': Elizabeth Bowen's \"Mysterious Kôr\" and Female Romance\". Zeitschrift für Anglistik und Amerikanistik 62.4 (2014): 273–289. - Lojo-Rodriguez, Laura. \"Unveiling the Past: Éilís Ní Dhuibhne's 'Sex in the Context of Ireland'\". Nordic Irish Studies 13.2 (2014): 19–30. - Mieszkowksi, Sylvia. \"Feudal Furies: Interpellation and Tragic Irony in Shakespeare's Coriolanus\". Zeitsprünge 18 (2014), Vol. 3/4, 333–348. - Mieszkowksi, Sylvia. \"QueerIng Ads? Imagepflege (in) der heteronormativen Gesellschaft,\" in Jörn Arendt, Lutz Hieber and York Kautt, eds., Kampf um Images: Visuelle Kommunikation in gesellschaftlichen Konfliktlagen. Bielefeld: transcript, 2014, 117–136. - Mieszkowksi, Sylvia. \"Was war und ist Homosexualitätsforschung?\" in Jenniver Evans, Rüdiger Lautmann, Florian Mildenberge and Jakob Pastötter Homosexualität, eds., Spiegel der Wissenschaften. Hamburg: Männerschwarm Verlag, 2014. - Mieszkowksi, Sylvia.Resonant Alterities: Sound, Desire and Anxiety in Non-Realist Fiction. Bielefeld: [transcript], 2014. - Torres-Zúñiga, Laura. \"Autofiction and Jouissance in Tennessee Williams's 'Ten Minute Stop'\" The Tennessee Williams Annual Review (2014). - Torres-Zúñiga, Laura. \"Sea and sun and maybe – Quien sabe! Tennessee Williams and Spain\" in J.S. Bak, ed., Tennessee Williams in Europe: Intercultural Encounters, Transatlantic Exchanges. Rodopi, 2014. 2013 - Andrés-Cuevas, Isabel Ma, Laura Lojo-Rodríguez and Jorge Sacido-Romero. \"Parents Then and Now: Infantile and Parental Crises in the Short Fiction of Katherine Mansfield, Helen Simpson and Hanif Kureishi\" in R. Arias, M. López-Rodríguez, C. Pérez-Hernández and A. Moreno-Ortiz, eds., Hopes and Fears. English and American Studies in Spain. AEDEAN/Universidad de Málaga, 2013, 304–307. - Torres-Zúñiga, Laura. 
\"Comida, mujeres y poder en la obra de Tennessee Williams/Food, Women and Power in the Work of Tennessee Williams\" Dossiers Feministes 17 (2013). - Mieszkowksi, Sylvia. \"Unauthorised Intercourse: Early Modern Bed Tricks and their Under-Lying Ideologies\". Zeitschrift für Anglistik und Amerikanistik 4 (2013): 319–340. - Mieszkowksi, Sylvia. \"Eve Kosofsky Sedgwick\" in Marianne Schmidbaur, Helma Lutz and Ulla Wischermann, KlassikerInnen Feministischer Theorie. Bd III (1986-Gegenwart). Königstein/Taunus: Ulrike Helmer Verlag, 2013, 285–291. - Lojo-Rodriguez, Laura. \"Exilio, historia, e a visión feminina: Éilís Ní Dhuibhne\" in Felipe Andrés Aliaga Sáez, ed., Cultura y migraciones: Enfoques multidisciplinarios. Santiago de Compostela: Servicio de publicaciones de la Universidad, 2013, 178–183. - Lara-Rollo, Carmen. \"Intertextual and Relational Echoes in Contemporary British Short Fiction\". Il Confronto Letterario 60 sup. (2013): 119–133. 2012 - Andrés-Cuevas, Isabel Ma, Laura Lojo-Rodríguez and Carmen Lara-Rallo. \"Escenarios de la memoria: espacio, recuerdo y pasado traumático\" in S. Martín-Alegre, M. Moyer, E. Pladevall and S. Tuvau, eds., At a Time of Crisis: English and American Studies in Spain: Works from the 35th AEDEAN Conference. AEDEAN/Universidad Autónoma de Barcelona, 2012, 242–245. - Torres-Zúñiga, Laura. \"Married Folks They are; And Few Pleasures They Have': Marriage Scenes in O. Henry's Short Stories\" in Mauricio D. Aguilera-Linde, María José de la Torre-Moreno and Laura Torres-Zúñiga, eds., Into Another's Skin: Studies in Honor of Mª Luisa Dañobeitia. Granada: Editorial Universidad de Granada, 2012. - Sacido-Romero, Jorge. (with C. Lara-Rallo and I. Andrés Cuevas). \"Nature in Late-Twentieth-Century English Short Fiction: Angela Carter, Margaret Drabble and A. S. Byatt\". Proceedings of the 38th AEDEAN Conference. - Sacido-Romero, Jorge. \"The Boy's Voice and Voices for the Boy in Joyce's 'The Sisters'\". Papers on Joyce 17.18 (Dec 2012): 203–242. - Sacido-Romero, Jorge. \"Modernism, Postmodernism, and the Short Story\", in Jorge Sacido, ed. Modernism, Postmodernism and the Short Story in English. Amsterdam: Rodopi, 2012, 1-25. - Sacido-Romero, Jorge (ed.). Modernism, Postmodernism, and the Short Story in English. Amsterdam: Rodopi, 2012 - Lojo-Rodriguez, Laura. (ed.). Moving across a Century: Women's Short Fiction from Virginia Woolf to Ali Smith. Bern: Peter Lang, 2012. - Lojo-Rodriguez, Laura. \"Recovering the Maternal Body as Paradise: Michèle Roberts's 'Charity'\". Atlantis: A Journal of the Spanish Association of Anglo-American Studies 34.2 (Dec 2012): 33–47. - Lara-Rollo, Carmen. \"The Rebirth of the Musical Author in Recent Fiction Written in English\". Authorship 1.2 (2012): 1–9. - Lara-Rollo, Carmen. \"The Myth of Pygmalion and the Petrified Woman\" in José Manuel Losada and Marta Guirao, eds., Recent Anglo-American Fiction. Myth and Subversion in the Contemporary Novel. Newcastle upon Tyne: Cambridge Scholars Publishing, 2012, 199–212. 2011 - Andrés-Cuevas, Isabel Ma. \"Virginia Woolf's Ethics of the Short Story, by Christine Reynier\". Miscelánea: Revista de Estudios Ingleses y Norteamericanos 42 (2011): 173–179. - Andrés-Cuevas, Isabel Ma and G. Rodríguez-Salas. The Aesthetic Construction of the Female Grotesque in Katherine Mansfield and Virginia Woolf: A Study of the Interplay of Life and Literature. Edwin Mellen Press: Lampeter, Ceredigion, 2011. - Sacido-Romero, Jorge. 
\"Failed Exorcism: Kurtz Spectral Status and Its Ideological Function in Conrad's 'Heart of Darkness'\". Atlantis: A Journal of the Spanish Association for Anglo-American Studies. 32.2 (Dec 2011): 43–60. - Lojo-Rodriguez, Laura. \"Voices from the Margins: Éilís Ní Dhuibhne's Female Perspective in The Pale Gold of Alaska and Other Stories\". Nordic Irish Studies 10 (2011): 35–40. - Lojo-Rodriguez, Laura and Jorge Sacido-Romero. \"Through the Eye of a Postmodernist Child: Ian McEwan's 'Homemade'\". Miscelánea: A Journal of English and American Studies 44 (2011): 107–120. - Lara-Rollo, Carmen. \"Deep Time and Human Time: The Geological Representation of Ageing in Contemporary Literature\" in Brian Worsfold, ed., Acculturating Age: Approaches to Cultural Gerontology. Lérida: Servicio de Publicaciones de la Universidad de Lérida, 2011, 167–86. - Lara-Rollo, Carmen. \"'She thought human thoughts and stone thoughts': Geology and the Mineral World in A.S. Byatt's Fiction\" in Cedric Barfoot and Valeria Tinkler-Villani, eds., Restoring the Mystery of the Rainbow. Literature's Refraction of Science. Amsterdam and New York: Rodopi, 2011, 487–506. 2010 - Andrés-Cuevas, Isabel Ma, Carmen Lara-Rallo and L. Filardo-Lamas. \"The Shot in the Story: A Roundtable Discussion on Subversion in the Short Story\" in R. Galán-Moya et al., eds., Proceedings of the 33rd Aedean International Conference. Aedean/Universidad De Cádiz, 2010. - Lojo-Rodriguez, Laura, Manuela Palacios and Mª Xesús Nogueira. Creation, Publishing, and Criticism: The Advance of Women's Writing. Bern: Peter Lang, 2010. 2009 - Lojo-Rodriguez, Laura. \"The Poetics of Motherhood in Contemporary Irish Women's Verse\" in Manuela Palacios and Laura Lojo-Rodríguez, eds., Writing Bonds: Irish and Galician Women Poets. Bern: Peter Lang, 2009, 123-142. - Lojo-Rodriguez, Laura. \"Making Sense of Wilderness: An Interview with Anne Le Marquand Hartigan\" in Manuela Palacios and Laura Lojo-Rodríguez, eds., Writing Bonds: Irish and Galician Women Poets. Bern: Peter Lang, 2009, 195–204. - Lara-Rollo, Carmen. \"Pictures Worth a Thousand Words: Metaphorical Images of Textual Interdependence\". Nordic Journal of English Studies. Special issue: \"Intertextuality\" 8.2 (2009): 91–110. - Lara-Rollo, Carmen. \"Museums, Collections and Cabinets: 'Shelf after Shelf after Shelf'\" in Caroline Patey and Laura Scuriatti, eds., The Exhibit in the Text. The Museological Practices of Literature. Bern: Peter Lang, 2009, 219–39. Series: Cultural Interactions. 2008 - Lojo-Rodriguez, Laura. \"Virginia Woolf's Female History in 'The Journal of Mistress Joan Martyn'\". Short Story 16.1 (2008): 73–86. 2007 - Andrés-Cuevas, Isabel Ma. \"The Duplicity of the City in O.Henry: 'Squaring the Circle' and 'The Defeat of the City'\" in G. S. Castillo, M. R. Cabello et al., eds., The Short Story in English: Crossing Boundaries. Universidad de Alcalá de Henares, 2007, 32–42. - Torres-Zúñiga, Laura. \"Tennessee Williams' 'Something About Him' or the Veiled Diagnosis of an Insane Society\" in Mauricio D. Aguilera-Linde et al., eds., Entre la creación y el aula. Granada: Editorial Universidad de Granada, 2007. ``` {% /Tab %} {% /TabbedUseCaseCodeBlock %} For this guide, we prepared a collection with a subset of the BrowseComp-Plus data. It includes the first 10 queries, their associated evidence and negative documents. In this collection there are 10 query records. Each has the following metadata fields: * `query_id`: The BrowseComp-Plus query ID. 
* `query`: Set to `true`, indicating this is a query record.
* `gold_docs`: The list of gold doc IDs needed to answer this query.

Most BrowseComp-Plus documents are too large to embed and store as they are, so we chunked them into discrete pieces. Each document record has the following metadata fields:

* `doc_id`: The original BrowseComp-Plus document ID this record was chunked from.
* `index`: The order in which this chunk appears in the original document. This is useful if we want to reconstruct the original documents.

Chunking the documents not only lets us store them efficiently; it is also good context engineering practice. When the agent issues a search, a smaller relevant chunk is more economical than a very large document.

## Running the Agent

Before we start walking through the implementation, let's run the agent to get a sense of what we're going to build.

{% Steps %}

{% Step %}
[Log in](https://trychroma.com/login) to your Chroma Cloud account. If you don't have one yet, you can [sign up](https://trychroma.com/signup). You will get free credits that should be more than enough for running this project.
{% /Step %}

{% Step %}
Use the "Create Database" button on the top right of the Chroma Cloud dashboard, and name your DB `agentic-search` (or any name of your choice). If you're a first-time user, you will be greeted with the "Create Database" modal after creating your account.
{% /Step %}

{% Step %}
Choose the "Load sample dataset" option, and then choose the BrowseComp-Plus dataset. This will copy the data into a collection in your own Chroma DB.
{% /Step %}

{% Step %}
Once your collection loads, choose the "Settings" tab. At the bottom of the page, choose the `.env` tab. Create an API key, and copy the environment variables you will need for running the project: `CHROMA_API_KEY`, `CHROMA_TENANT`, and `CHROMA_DATABASE`.
{% /Step %}

{% Step %}
Clone the [Chroma Cookbooks](https://github.com/chroma-core/chroma-cookbooks) repo:

```terminal
git clone https://github.com/chroma-core/chroma-cookbooks.git
```
{% /Step %}

{% Step %}
Navigate to the `agentic-search` directory, and create a `.env` file at its root with the values you obtained in the previous step:

```terminal
cd chroma-cookbooks/agentic-search
touch .env
```
{% /Step %}

{% Step %}
To run this project, you will also need an [OpenAI API key](https://platform.openai.com/api-keys). Set it in your `.env` file:

```text
CHROMA_API_KEY=
CHROMA_TENANT=
CHROMA_DATABASE=agentic-search
OPENAI_API_KEY=
```
{% /Step %}

{% Step %}
This project uses [pnpm](https://pnpm.io/installation) workspaces. In the root directory, install the dependencies:

```terminal
pnpm install
```
{% /Step %}

{% /Steps %}

The project includes a CLI interface that lets you interact with the search agent. You can run it in development mode to get started. The CLI expects one argument - the query ID to solve. From the root directory, run the following to see the agent in action:

```terminal
pnpm cli:dev 770
```

The agent will go through the steps for solving query 770 - query planning, tool calling, and outcome evaluation - until it can solve the input query. The tools, in this case, are different search capabilities over the Chroma collection containing the dataset.

Other arguments you can provide:

* `--provider`: The LLM provider you want to use. Defaults to OpenAI (currently only OpenAI is supported).
* `--model`: The model you want the agent to use. Defaults to `gpt-4o-mini`.
* `--max-plan-size`: The maximum number of query plan steps the agent will go through to solve the query.
Defaults to 10. When set to 1, the query planning step is skipped.
* `--max-step-iterations`: The maximum number of tool-call interactions the agent will issue when solving each step. Defaults to 5.

Experiment with different configurations of the agent. For example, stronger reasoning models are slower, but may not need a query plan or many iterations to solve a query correctly. They are more likely to be better at selecting the correct search tools, providing them with the best arguments, and reasoning through the results. Smaller or older models are faster but may not excel at tool calling. However, with a query plan and the intermediate evaluation steps, they might still produce the correct answer.

## Building the Agent

{% Banner type="tip" %}
You can find the full implementation in the [chroma-cookbooks](https://github.com/chroma-core/chroma-cookbooks/tree/master/agentic-search) repo.
{% /Banner %}

We built a simple agent in this project to demonstrate the core concepts in this guide. The `BaseAgent` class orchestrates the agentic workflow described above. It holds references to:

* An `LLMService` - a simple abstraction for interacting with an LLM provider for getting structured outputs and tool calling.
* A `prompts` object, defining the prompts used for different LLM interactions needed for this workflow (for example, generating the query plan, evaluating it, etc.).
* A list of `Tool`s that will be used to solve a user's query.

The project encapsulates different parts of the workflow into their own components.

The `QueryPlanner` generates a query plan for a given user query. This is a list of `PlanStep` objects, each keeping track of its status (`Pending`, `Success`, `Failure`, `Cancelled`, etc.) and its dependencies on other steps in the plan. The planner is an iterator that emits the next batch of `Pending` steps ready for execution. It also exposes methods that let other components override the plan and update the status of completed steps.

The `Executor` solves a single `PlanStep`. It implements a simple tool calling loop with the `LLMService` until the step is solved. Finally, it produces a `StepOutcome` object summarizing the execution and identifying candidate answers and supporting evidence.

The `Evaluator` considers the plan and the history of outcomes to decide how to proceed with the query plan.

The `SearchAgent` class extends `BaseAgent` and provides it with the tools to search over the BrowseComp-Plus collection, using Chroma's [Search API](../../cloud/search-api/overview). It also passes in the prompts needed for this specific search task.

---

# Building with AI

AI is a new type of programming primitive. Large language models (LLMs) let us write software which can process **unstructured** information in a **common sense** way.

Consider the task of writing a program to extract a list of people's names from the following paragraph:

> Now the other princes of the Achaeans slept soundly the whole night through, but Agamemnon son of Atreus was troubled, so that he could get no rest. As when fair Hera's lord flashes his lightning in token of great rain or hail or snow when the snow-flakes whiten the ground, or again as a sign that he will open the wide jaws of hungry war, even so did Agamemnon heave many a heavy sigh, for his soul trembled within him. When he looked upon the plain of Troy he marveled at the many watchfires burning in front of Ilion...
- The Iliad, Scroll 10 Extracting names is easy for humans, but is very difficult using only traditional programming. Writing a general program to extract names from any paragraph is harder still. However, with an LLM the task becomes almost trivial. We can simply provide the following input to an LLM: > List the names of people in the following paragraph, separated by commas: Now the other princes of the Achaeans slept soundly the whole night through, but Agamemnon son of Atreus was troubled, so that he could get no rest. As when fair Hera's lord flashes his lightning in token of great rain or hail or snow when the snow-flakes whiten the ground, or again as a sign that he will open the wide jaws of hungry war, even so did Agamemnon heave many a heavy sigh, for his soul trembled within him. When he looked upon the plain of Troy he marveled at the many watchfires burning in front of Ilion... - The Iliad, Scroll 10 The output would correctly be: > Agamemnon, Atreus, Hera Integrating LLMs into software applications is as simple as calling an API. While the specifics of the API may vary between LLMs, most have converged on some common patterns: - Calls to the API typically consist of parameters including a `model` identifier, and a list of `messages`. - Each `message` has a `role` and `content`. - The `system` role can be thought of as the _instructions_ to the model. - The `user` role can be thought of as the _data_ to process. For example, we can use AI to write a general purpose function that extracts names from input text. {% CustomTabs %} {% Tab label="OpenAI" %} {% TabbedCodeBlock %} {% Tab label="python" %} ```python import json import os import openai openai.api_key = os.getenv("OPENAI_API_KEY") def extract_names(text: str) -> list[str]: system_prompt = "You are a name extractor. The user will give you text, and you must return a JSON array of names mentioned in the text. Do not include any explanation or formatting." response = openai.ChatCompletion.create( model="gpt-4o", messages=[ {"role": "system", "content": system_prompt}, {"role": "user", "content": text} ] ) response = response.choices[0].message["content"] return json.loads(response) ``` {% /Tab %} {% Tab label="typescript" %} ```typescript import { OpenAI } from "openai"; const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY, }); export async function extractNames(text: string): Promise { const systemPrompt = "You are a name extractor. The user will give you text, and you must return a JSON array of names mentioned in the text. Do not include any explanation or formatting."; const chatCompletion = await openai.chat.completions.create({ model: "gpt-4o", messages: [ { role: "system", content: systemPrompt }, { role: "user", content: text }, ], }); const responseText = chatCompletion.choices[0].message?.content ?? "[]"; return JSON.parse(responseText); } ``` {% /Tab %} {% /TabbedCodeBlock %} {% /Tab %} {% Tab label="Anthropic" %} {% TabbedCodeBlock %} {% Tab label="python" %} ```python import json import os import anthropic client = anthropic.Anthropic( api_key=os.getenv("ANTHROPIC_API_KEY") ) def extract_names(text: str) -> list[str]: system_prompt = "You are a name extractor. The user will give you text, and you must return a JSON array of names mentioned in the text. Do not include any explanation or formatting." 
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1000,
        system=system_prompt,
        messages=[
            {"role": "user", "content": text}
        ]
    )

    response_text = response.content[0].text
    return json.loads(response_text)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export async function extractNames(text: string): Promise<string[]> {
  const systemPrompt =
    "You are a name extractor. The user will give you text, and you must return a JSON array of names mentioned in the text. Do not include any explanation or formatting.";

  const message = await anthropic.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1000,
    system: systemPrompt,
    messages: [{ role: "user", content: text }],
  });

  const responseText =
    message.content[0]?.type === "text" ? message.content[0].text : "[]";
  return JSON.parse(responseText);
}
```
{% /Tab %}
{% /TabbedCodeBlock %}
{% /Tab %}
{% /CustomTabs %}

Building with AI allows new types of work to be done by software. LLMs are capable of understanding abstract ideas and taking action. Given access to retrieval systems and tools, LLMs can operate on tasks autonomously in ways that weren't possible with classical software.

---

# Chunking

Retrieval-Augmented Generation (RAG) lets us ground large language models in our own data. The core idea is simple: we store our data in a Chroma collection. Then, before issuing a request to an LLM, we find the relevant parts of our data in the collection and include them in the prompt, so the LLM can answer based on real information rather than its training data alone.

But here's the problem: we can't just throw entire documents at the model. For example, a single PDF from our data might contain 50 pages. A codebase might span thousands of files. Even a modest knowledge base can exceed what fits in a context window — and even when documents do fit, including entire files is wasteful. If someone asks "What's the default timeout?", we don't want to retrieve a 20-page configuration guide; we want the specific paragraph that answers the question.

Beyond the context concerns, we also need to be mindful of how we embed and store data. All embedding models have their own token limits. If we try to embed a document exceeding this limit, the resulting embedding will not represent the parts of the document beyond the model's limit. Additionally, Chroma limits each record document size to 16KB.

This is why RAG systems work with **chunks** — smaller pieces of documents that can be independently retrieved based on relevance to a query. A common **ingestion pipeline** works as follows: we split data into chunks, collect metadata fields we can attach to each chunk, and insert the resulting records into our Chroma collection. Chroma will automatically embed the chunks using the collection's embedding function.

## Choosing Chunking Boundaries

Chunking forces a trade-off: chunks need to be small enough to match specific queries, but large enough to be self-contained and meaningful.

Consider building a chatbot over technical documentation, where we decide to chunk text by sentences. Take the following paragraph:

> The connection timeout controls how long the client waits when establishing a connection to the server. The default value is 30 seconds. For high-latency networks, consider increasing this to 60 seconds.
> Note that this is different from the read timeout, which controls how long the client waits for data after the connection is established.

Chunking it by sentences will produce these chunks:

* **Chunk 1**: "The connection timeout controls how long the client waits when establishing a connection to the server."
* **Chunk 2**: "The default value is 30 seconds."
* **Chunk 3**: "For high-latency networks, consider increasing this to 60 seconds."
* **Chunk 4**: "Note that this is different from the read timeout, which controls how long the client waits for data after the connection is established."

Now a user asks:

> How long is the connection timeout?

Chunk 2 contains "The default value is 30 seconds"—but it never mentions "connection timeout." That phrase only appears in Chunk 1. When we issue this query to the collection, we have no guarantee that both chunks will be retrieved, so an LLM may not be able to compile the correct answer. A better approach keeps full paragraphs together, so the answer and its context share the same embedding and get retrieved as a unit.

The right boundaries depend on what we're chunking. A novel has different natural units than an API reference. Code has different logical boundaries than an email thread.

Poor chunking creates a chain of problems through your pipeline:

1. Retrieval returns partial matches. In the example above, searching for "default connection timeout" might rank Chunk 1 highest (it mentions "connection timeout") even though Chunk 2 has the actual answer. Your relevance scores look reasonable, but the retrieved content doesn't actually answer the question.
2. You compensate by increasing top-k. When individual chunks don't contain complete information, you retrieve 10 or 20 results instead of 3 or 4. This increases token costs, and dilutes the prompt with marginally relevant text—hurting the LLM's ability to focus on what matters.
3. The LLM produces degraded answers. The model can only synthesize what you provide. Fragmentary context leads to hedged answers ("The default value appears to be 30 seconds, but I'm not certain what parameter this refers to..."), hallucinated details, or outright errors.

## Chunking Strategies

**Recursive splitting** — Try to split at the largest structural unit first (e.g., double newlines for paragraphs), but if a resulting chunk exceeds your size limit (token and/or document limit), recursively split it using smaller units (single newlines, then sentences, then words). This balances structure-awareness with size constraints. LangChain's `RecursiveCharacterTextSplitter` is a common implementation.

**Split with Overlap** — Use a chunking strategy (like recursive splitting), but include an overlap between chunks. For example, if splitting a PDF by paragraphs, Chunk 1 contains the first paragraph and the first sentence of the second paragraph, while Chunk 2 contains the second paragraph and the last sentence of the first paragraph. The overlap creates redundancy that helps preserve context across boundaries. The downside: you're storing and embedding duplicate content.

**Structure-aware splitting** — Parse the document's explicit structure: Markdown headers, HTML DOM, or code ASTs. Split at structural boundaries and optionally include hierarchical context in the chunk's content itself. For example, when splitting the code for a class by instance methods, include at the top of each chunk a code comment mentioning the encompassing class, file name, etc.
**Semantic splitting** — Embed sentences or paragraphs, compute similarity between adjacent segments, and place chunk boundaries where similarity drops (indicating a topic shift). Alternatively, this process can be driven by an LLM. This method is more computationally expensive, but can produce more coherent chunks when documents lack clear structural markers.

{% Banner type="tip" %}
Learn more about different strategies in our [chunking research report](https://research.trychroma.com/evaluating-chunking)
{% /Banner %}

## Chunking Text

For most text documents, recursive chunking with some chunk overlap is a good starting point. LangChain's `RecursiveCharacterTextSplitter` is an example implementation for this strategy. It tries to split at natural boundaries (paragraphs first, then sentences, then words) while respecting size limits and adding overlap to preserve context across boundaries.

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " "]
)

chunks = splitter.split_text(document)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,
  chunkOverlap: 50,
  separators: ["\n\n", "\n", ". ", " "]
});

const chunks = await splitter.splitText(document);
```
{% /Tab %}
{% /TabbedCodeBlock %}

When chunking Markdown files, we can take advantage of their structure. For example, we can split by headers: try to split by `h2` headers first, then recursively try inner headers. We can also contextualize each chunk by specifying its place in the document's structure. For example, if we end up with a chunk that sits under an `h3` header, we can prepend the path from the document's `h1` down to this chunk. LangChain's `MarkdownHeaderTextSplitter` splits by section and captures the header hierarchy as metadata.

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
)

chunks = splitter.split_text(markdown_doc)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import { MarkdownHeaderTextSplitter } from "langchain/text_splitter";

const splitter = new MarkdownHeaderTextSplitter({
  headersToSplitOn: [["#", "h1"], ["##", "h2"], ["###", "h3"]]
});

const chunks = await splitter.splitText(markdownDoc);
```
{% /Tab %}
{% /TabbedCodeBlock %}

Each chunk's metadata records the path to it from the document's `h1` header:

```json
{ "h1": "Config", "h2": "Timeouts" }
```

We can leverage this metadata to add context to each chunk:

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
def contextualize(chunk) -> str:
    headers = [chunk.metadata.get(f"h{i}") for i in range(1, 4)]
    path = " > ".join(h for h in headers if h)
    return f"[{path}]\n\n{chunk.page_content}" if path else chunk.page_content
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
function contextualize(chunk: Document): string {
  const headers = [1, 2, 3].map(i => chunk.metadata[`h${i}`]).filter(Boolean);
  const path = headers.join(" > ");
  return path ?
    `[${path}]\n\n${chunk.pageContent}` : chunk.pageContent;
}
```
{% /Tab %}
{% /TabbedCodeBlock %}

## Chunking Code

When chunking text-based files, our split boundaries are often obvious - paragraphs, sentences, Markdown headers, etc. Code is trickier — there's no single obvious unit. Functions? Classes? Files? Instance methods can be too granular, files too large, and the right choice often depends on the codebase and the types of queries you want to answer.

Using the same idea that chunks should be self-contained units of our data, we will choose classes and functions as our chunking boundaries and treat them as atomic units of code that should not be broken down further. This way, if a query like "how is auth handled" is submitted, we can get back a chunk containing a relevant function. If that chunk contains references to other classes or functions, we can subsequently retrieve the chunks where they are represented (via [regex](../../docs/querying-collections/full-text-search.md) search, for example).

A great tool for parsing a code file into these units is `tree-sitter`. It is a fast parsing library that can build an abstract syntax tree, or AST, for input source code. For example, if we parse this code snippet with `tree-sitter`:

```python
class MyClass:
    def say_hello(self, name: str) -> None:
        print(f"Hello {name}")
```

We will get a tree with a `class_definition` node, which encompasses the entire class. Nested under it (inside the class's `block` node) is a `function_definition` node covering the `say_hello` method, and so on. Each node represents a construct of the language we work with, which is exactly the unit we want to have in our collection.

### A Small Example

Let's examine a small example of using `tree-sitter` to parse Python files. To begin, we'll set up `tree-sitter` and a parser for Python files:

{% Tabs %}
{% Tab label="python" %}
{% PythonInstallation packages="tree-sitter tree-sitter-python" / %}
{% /Tab %}
{% Tab label="typescript" %}
{% TypescriptInstallation packages="tree-sitter tree-sitter-python" / %}
{% /Tab %}
{% /Tabs %}

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
from tree_sitter import Language, Parser
import tree_sitter_python as tspython

# Use Python grammar
python_language = Language(tspython.language())

# Set up the parser
parser = Parser(python_language)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import Parser from "tree-sitter";
import Python from "tree-sitter-python";

const parser = new Parser();
parser.setLanguage(Python);
```
{% /Tab %}
{% /TabbedCodeBlock %}

Using the parser, we can process the code snippet from our small example:

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
source_code = b"""
class MyClass:
    def say_hello(self, name: str) -> None:
        print(f"Hello {name}")
"""

tree = parser.parse(source_code)
root = tree.root_node
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
const sourceCode = `
class MyClass:
    def say_hello(self, name: str) -> None:
        print(f"Hello {name}")
`;

const tree = parser.parse(sourceCode);
const root = tree.rootNode;
```
{% /Tab %}
{% /TabbedCodeBlock %}

The root node encompasses the entire source code. Its first child is the `class_definition` node, spanning lines 1-3. If we explore further down the tree, we will find the `function_definition` node, which spans lines 2-3.
{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
print(root.children[0].type)  # class_definition
print(root.children[0].children[3].children[0].type)  # function_definition
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
console.log(root.children[0].type); // class_definition
console.log(root.children[0].children[3].children[0].type); // function_definition
```
{% /Tab %}
{% /TabbedCodeBlock %}

### Recursively Exploring an AST

We can write a function that, given source code, parses it using the `tree-sitter` parser and recursively explores the tree to find the nodes we want represented in our chunks. Recall that we want to treat our "target" nodes as atomic units, so we stop the recursion when we find one. We can also use the nodes' `start_byte` and `end_byte` fields to get back the code each node represents. `tree-sitter` can also give us the line numbers each node spans, which we can save in each chunk's metadata:

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
from dataclasses import dataclass
from uuid import uuid4

from tree_sitter import Node

# A minimal chunk record, assumed from how it is used below
@dataclass
class Chunk:
    id: str
    content: str
    start_line: int
    end_line: int
    path: str

def parse_code(file_path: str) -> list[Chunk]:
    with open(file_path, "rb") as f:
        source_code = f.read()

    tree = parser.parse(source_code)
    root = tree.root_node

    target_types = ['function_definition', 'class_definition']

    def collect_nodes(node: Node) -> list[Node]:
        result: list[Node] = []
        if node.type in target_types:
            result.append(node)
        else:
            for child in node.children:
                result.extend(collect_nodes(child))
        return result

    nodes = collect_nodes(root)

    chunks = []
    for node in nodes:
        name_node = node.child_by_field_name("name")
        if name_node is None:
            continue
        # The symbol (class or function name) can also be stored as chunk metadata
        symbol = source_code[name_node.start_byte:name_node.end_byte].decode()
        chunk = Chunk(
            id=str(uuid4()),
            content=source_code[node.start_byte : node.end_byte].decode("utf-8"),
            start_line=node.start_point[0],
            end_line=node.end_point[0],
            path=file_path,
        )
        chunks.append(chunk)

    return chunks
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import fs from "fs";
import type Parser from "tree-sitter";
import { v4 as uuid } from "uuid";

// A minimal chunk record, assumed from how it is used below
interface Chunk {
  id: string;
  content: string;
  start_line: number;
  end_line: number;
  path: string;
}

export function parseCode(filePath: string, parser: Parser): Chunk[] {
  const sourceCode = fs.readFileSync(filePath, "utf8");

  const tree = parser.parse(sourceCode);
  const root = tree.rootNode;

  const targetTypes = ["function_definition", "class_definition"];

  function collectNodes(node: Parser.SyntaxNode): Parser.SyntaxNode[] {
    const result: Parser.SyntaxNode[] = [];
    if (targetTypes.includes(node.type)) {
      result.push(node);
    } else {
      for (const child of node.children) {
        result.push(...collectNodes(child));
      }
    }
    return result;
  }

  const nodes = collectNodes(root);

  const chunks: Chunk[] = [];
  for (const node of nodes) {
    const nameNode = node.childForFieldName("name");
    if (!nameNode) continue;
    // The symbol (class or function name) can also be stored as chunk metadata
    const symbol = sourceCode.slice(nameNode.startIndex, nameNode.endIndex);
    chunks.push({
      id: uuid(),
      content: sourceCode.slice(node.startIndex, node.endIndex),
      start_line: node.startPosition.row,
      end_line: node.endPosition.row,
      path: filePath,
    });
  }

  return chunks;
}
```
{% /Tab %}
{% /TabbedCodeBlock %}

If the chunks this method produces are still too large, we can default to splitting them by line spans. If we ever need to reconstruct them, we can use the line-number metadata fields.

## Evaluation

To evaluate your chunking strategy, test it against real queries and measure how well the right chunks surface. The goal is retrieval quality: when we issue a query to Chroma, do the top results contain the information needed to answer it?
Create a set of test queries with ground truth: each query maps to the chunk(s) that should be retrieved for it:

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
test_queries = [
    {
        "query": "What's the default connection timeout?",
        "expected_chunks": ["chunk-3"],
    },
    {
        "query": "How do I authenticate with OAuth?",
        "expected_chunks": ["chunk-1", "chunk-2"],
    },
    # ...
]
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
const testQueries = [
  {
    query: "What's the default connection timeout?",
    expected_chunks: ["chunk-3"],
  },
  {
    query: "How do I authenticate with OAuth?",
    expected_chunks: ["chunk-1", "chunk-2"],
  },
  // ...
]
```
{% /Tab %}
{% /TabbedCodeBlock %}

The key metrics you will measure are:

* **Recall@k**: For each test query, what fraction of its expected chunks appear in the top `k` results?

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
def recall_at_k(results: list[str], expected: list[str], k: int) -> float:
    top_k = set(results[:k])
    return len(top_k & set(expected)) / len(expected)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
function recallAtK(results: string[], expected: string[], k: number): number {
  const topK = new Set(results.slice(0, k));
  return [...topK].filter(x => expected.includes(x)).length / expected.length;
}
```
{% /Tab %}
{% /TabbedCodeBlock %}

* **Mean Reciprocal Rank (MRR)** — Where does the first correct chunk appear? (Higher is better)

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
def mrr(results: list[str], expected: list[str]) -> float:
    for i, chunk_id in enumerate(results):
        if chunk_id in expected:
            return 1 / (i + 1)
    return 0
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
function mrr(results: string[], expected: string[]): number {
  for (let i = 0; i < results.length; i++) {
    if (expected.includes(results[i])) {
      return 1 / (i + 1);
    }
  }
  return 0;
}
```
{% /Tab %}
{% /TabbedCodeBlock %}

Then test your queries against the chunks in your collection:

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
k = 10

results = collection.query(
    query_texts=[test_case["query"] for test_case in test_queries],
    n_results=k
)

metrics = [
    {
        "recall": recall_at_k(chunk_ids, test_queries[i]["expected_chunks"], k),
        "mrr": mrr(chunk_ids, test_queries[i]["expected_chunks"])
    }
    for i, chunk_ids in enumerate(results["ids"])
]
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
const k = 10;

const results = await collection.query({
  queryTexts: testQueries.map(testCase => testCase.query),
  nResults: k,
});

const metrics = results.ids.map((chunkIds: string[], i: number) => ({
  recall: recallAtK(chunkIds, testQueries[i].expected_chunks, k),
  mrr: mrr(chunkIds, testQueries[i].expected_chunks),
}));
```
{% /Tab %}
{% /TabbedCodeBlock %}

If you see:

* Low recall (the correct chunks are not in the top-k results) - try smaller chunks with more overlap between them.
* Correct chunks rank low - add context to the chunks themselves and leverage metadata filtering.
* Duplicate results - decrease chunk overlap.
* Irrelevant matches - try larger chunks, structure-aware chunking, or semantic splitting.

---

---
id: intro-to-retrieval
name: Intro to Retrieval
---

# Introduction to Retrieval

Large language models like GPT-5, Claude, or Gemini can write code, answer questions, generate content, and solve complex problems with remarkable sophistication. You can build chatbots, content generators, coding assistants, and analysis tools by crafting good prompts and calling AI APIs.
However, as you start to build more sophisticated applications, especially those that require knowledge not available to an LLM, you'll quickly run into some fundamental limitations.

## The Knowledge Problem

Imagine you're building an AI assistant for a company's customer support team. You want the AI to answer questions about your products, policies, and procedures. Given a simple prompt:

> What is your return policy for tooth brushes?

The AI is likely to return a generic answer, because it doesn't know about your company's specific policies:

> I'm not a retailer or company that sells toothbrushes - I'm Claude, an AI assistant created by Anthropic. I don't have products or a return policy. If you're looking for information about a toothbrush return policy, you'd need to check with the specific store or company where you purchased or are planning to purchase the toothbrush. Most retailers have their return policies listed on their websites or you can contact their customer service directly.

We can get around this problem by providing useful information, or context, for the AI. This is often called "context engineering":

> What is your return policy for tooth brushes? Use the following information for context: Tooth brushes can be returned up to 360 days after purchase if unopened. Tooth brushes are unable to be returned if they are used. Tooth brushes can be returned if they suffer manufacturing defects discovered after opening.

Now the AI has the information it needs to answer the user's question correctly:

> Based on the information provided, our return policy for toothbrushes allows returns up to 360 days after purchase, as long as the toothbrush remains unopened and in its original packaging. This gives you nearly a full year to return an unopened toothbrush if needed. Once a toothbrush has been opened, it would not be eligible for return under this policy unless it has manufacturing defects.

But what happens if we want our AI chatbot to be able to answer questions about shipping, product specifications, troubleshooting, or any other topic? We will have to include our entire knowledge base in our context, which leads to several technical problems.

**Token Limits:** AI models have maximum input lengths. Even the largest models might not be able to fit an entire company's documentation in a single prompt.

**Cost:** AI APIs typically charge per token. Including thousands of words of context in every request becomes expensive quickly.

**Relevance:** When you include too much information, the AI can get confused or focus on irrelevant details instead of what actually matters for answering the user's question.

**Freshness:** Information changes constantly. Product specs update, policies change, new documentation gets written. Keeping everything in prompts means manually updating every prompt whenever anything changes.

**Hallucinations:** Without the correct information or focus for answering a user's question, LLMs may produce a wrong answer with an authoritative voice. For most business applications, where accuracy matters, hallucination is a critical problem.

## Enter Retrieval

Retrieval solves these fundamental challenges by creating a bridge between AI models and your actual data. Instead of trying to cram everything into prompts, a retrieval system **stores your information** in a searchable format.
This allows you to search your knowledge base using natural language: by providing the retrieval system with the user's question itself, you can find the information relevant to answering it. This way, you can build context for the model in a strategic manner.

When the retrieval system returns the results from your knowledge base relevant to the user's question, you can use them to provide context for the AI model, helping it generate an accurate response.

Here's how a typical retrieval pipeline is built:

1. **Converting information into searchable formats** - this is done by using **embedding models**. They create mathematical representations of your data, called "embeddings", that capture the semantic meaning of text, not just keywords.
2. **Storing these representations** in a retrieval system, optimized for quickly finding similar embeddings for an input query.
3. **Processing user queries** into embeddings, so they can be used as inputs to your retrieval system.
4. **Querying and retrieving** results from the database.
5. **Combining the retrieved results** with the original user query to serve to an AI model.

**Chroma** is a powerful retrieval system that handles most of this process out-of-the-box. It also allows you to customize these steps to get the best performance in your AI application.

Let's see it in action for our customer support example.

### Step 1: Embed our Knowledge Base and Store it in a Chroma Collection

{% Tabs %}
{% Tab label="python" %}

Install Chroma:

{% TabbedUseCaseCodeBlock language="Terminal" %}
{% Tab label="pip" %}
```terminal
pip install chromadb
```
{% /Tab %}
{% Tab label="poetry" %}
```terminal
poetry add chromadb
```
{% /Tab %}
{% Tab label="uv" %}
```terminal
uv pip install chromadb
```
{% /Tab %}
{% /TabbedUseCaseCodeBlock %}

Chroma embeds and stores information in a single operation.

```python
import chromadb

client = chromadb.Client()

customer_support_collection = client.create_collection(
    name="customer_support"
)

customer_support_collection.add(
    ids=["1", "2", "3"],
    documents=[
        "Toothbrushes can be returned up to 360 days after purchase if unopened.",
        "Shipping is free of charge for all orders.",
        "Shipping normally takes 2-3 business days"
    ]
)
```

{% /Tab %}
{% Tab label="typescript" %}

Install Chroma:

{% TabbedUseCaseCodeBlock language="Terminal" %}
{% Tab label="npm" %}
```terminal
npm install chromadb @chroma-core/default-embed
```
{% /Tab %}
{% Tab label="pnpm" %}
```terminal
pnpm add chromadb @chroma-core/default-embed
```
{% /Tab %}
{% Tab label="yarn" %}
```terminal
yarn add chromadb @chroma-core/default-embed
```
{% /Tab %}
{% Tab label="bun" %}
```terminal
bun add chromadb @chroma-core/default-embed
```
{% /Tab %}
{% /TabbedUseCaseCodeBlock %}

Run a Chroma server locally:

```terminal
chroma run
```

Chroma embeds and stores information in a single operation.

```typescript
import { ChromaClient } from "chromadb";

const client = new ChromaClient();

const customer_support_collection = await client.createCollection({
  name: "customer_support",
});

await customer_support_collection.add({
  ids: ["1", "2", "3"],
  documents: [
    "Toothbrushes can be returned up to 360 days after purchase if unopened.",
    "Shipping is free of charge for all orders.",
    "Shipping normally takes 2-3 business days",
  ],
});
```

{% /Tab %}
{% /Tabs %}

### Step 2: Process the User's Query

Similarly, Chroma handles the embedding of queries for you out-of-the-box.
{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
user_query = "What is your return policy for tooth brushes?"

context = customer_support_collection.query(
    query_texts=[user_query],
    n_results=1
)['documents'][0]

print(context) # Toothbrushes can be returned up to 360 days after purchase if unopened.
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
const user_query = "What is your return policy for tooth brushes?";

const context = (
  await customer_support_collection.query({
    queryTexts: [user_query],
    nResults: 1,
  })
).documents[0];

console.log(context); // Toothbrushes can be returned up to 360 days after purchase if unopened.
```
{% /Tab %}
{% /TabbedCodeBlock %}

### Step 3: Generate the AI Response

With the result from Chroma, we can build the correct context for an AI model.

{% CustomTabs %}
{% Tab label="OpenAI" %}
{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
import os

from openai import OpenAI

openai_client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

prompt = f"{user_query}. Use this as context for answering: {context}"

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": prompt}
    ]
)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const prompt = `${user_query}. Use this as context for answering: ${context}`;

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [
    { role: "system", content: "You are a helpful assistant" },
    { role: "user", content: prompt },
  ],
});
```
{% /Tab %}
{% /TabbedCodeBlock %}
{% /Tab %}
{% Tab label="Anthropic" %}
{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
import os

import anthropic

client = anthropic.Anthropic(
    api_key=os.getenv("ANTHROPIC_API_KEY")
)

prompt = f"{user_query}. Use this as context for answering: {context}"

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt}
    ]
)
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

const prompt = `${user_query}. Use this as context for answering: ${context}`;

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: prompt,
    },
  ],
});
```
{% /Tab %}
{% /TabbedCodeBlock %}
{% /Tab %}
{% /CustomTabs %}

There's a lot left to consider, but the core building blocks are here. Some next steps:

- **Embedding Model** There are many embedding models on the market, some optimized for code, others for English, and others still for other languages. Embedding model selection plays a big role in retrieval accuracy.
- **Chunking** Chunking strategies are highly specific to your data. Deciding how large or small to make chunks is critical to the performance of the system.
- **n_results** Varying the number of results balances token usage with correctness. The more results you retrieve, the more likely the LLM can produce a good answer, but at the expense of more token usage.

---

# Look at Your Data

Before building our RAG pipelines and inserting data into Chroma collections, it is worth asking ourselves the following questions:

* What types of searches do we want to support? (semantic, regex, keyword, etc.)
* What embedding models should we use for semantic and keyword searches?
* Should chunks live in one Chroma collection, or should we use different collections for different chunk types? * What are the meaningful units of data we want to store as records in our Chroma collections? * What metadata fields can we leverage when querying? The structure of our collections, the granularity of our chunks, and the metadata we capture - all directly impact retrieval quality—and by extension, the quality of the LLM's responses in our AI application. ## Search Modalities Chroma supports various search techniques that are useful for different use cases. **Dense search** (semantic) uses embeddings to find records that are semantically similar to a query. It excels at matching meaning and intent — a query like "how do I return a product" can surface relevant chunks even if they never use the word "return." The weakness? Dense search can struggle with exact terms: product SKUs, part numbers, legal case citations, or domain-specific jargon that didn't appear often in the embedding model's training data. All Chroma collections enable semantic search by default. You can specify the embedding function your collection will use to embed your data when creating a collection: {% TabbedCodeBlock %} {% Tab label="python" %} ```python import chromadb from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction client = chromadb.CloudClient() collection = client.create_collection( name="my-collection", embedding_function=OpenAIEmbeddingFunction( api_key="YOUR_OPENAI_API_KEY", model="text-embedding-3-small" ) ) ``` {% /Tab %} {% Tab label="typescript" %} ```typescript import { CloudClient } from "chromadb"; import { OpenAIEmbeddingFunction } from "@chroma-core/openai"; const client = new CloudClient(); const collection = await client.createCollection({ name: "my-collection", embeddingFunction: new OpenAIEmbeddingFunction({ apiKey: "YOUR_OPENAI_API_KEY", model: "text-embedding-3-small" }) }); ``` {% /Tab %} {% /TabbedCodeBlock %} **Lexical search** (keyword) matches on exact tokens. It shines when you need precision: finding a specific product ID like `SKU-4892-X`, a drug name like `omeprazole`, a legal citation like `Smith v. Jones (2019)`, or a model number in a technical manual. Dense search might miss these entirely or return semantically related but wrong results. The tradeoff is that lexical search can't bridge synonyms or paraphrases — searching "cancel" won't find chunks that only mention "terminate." 
To enable lexical search on your collection, you can enable a sparse vector index on your collection's schema with a sparse embedding function: {% TabbedCodeBlock %} {% Tab label="python" %} ```python import chromadb from chromadb import Schema, SparseVectorIndexConfig, K from chromadb.utils.embedding_functions import ChromaCloudSpladeEmbeddingFunction client = chromadb.CloudClient() schema = Schema() schema.create_index( config=SparseVectorIndexConfig( source_key=K.DOCUMENT, embedding_function=ChromaCloudSpladeEmbeddingFunction() ), key="sparse_embedding" ) collection = client.create_collection( name="my-collection", schema=schema ) ``` {% /Tab %} {% Tab label="typescript" %} ```typescript import { CloudClient, Schema, SparseVectorIndexConfig, K } from 'chromadb'; import { ChromaCloudSpladeEmbeddingFunction } from '@chroma-core/chroma-cloud-splade'; const client = new CloudClient(); const schema = new Schema(); schema.createIndex( new SparseVectorIndexConfig({ sourceKey: K.DOCUMENT, embeddingFunction: new ChromaCloudSpladeEmbeddingFunction() }), "sparse_embedding" ); const collection = await client.createCollection({ name: "my-collection", schema }); ``` {% /Tab %} {% /TabbedCodeBlock %} **Hybrid search** combines both: run dense and lexical searches in parallel, then merge the results. This gives you semantic understanding and precise term matching. For many retrieval tasks — especially over technical or specialized content — hybrid outperforms either approach alone. Chroma's [Search API](../../cloud/search-api/overview) allows you to define how you want to combine dense and sparse (lexical) results. For example, using [RRF](../../cloud/search-api/hybrid-search#understanding-rrf): {% TabbedCodeBlock %} {% Tab label="python" %} ```python from chromadb import Search, K, Knn, Rrf # Dense semantic embeddings dense_rank = Knn( query="machine learning research", # Text query for dense embeddings key="#embedding", # Default embedding field return_rank=True, limit=200 # Consider top 200 candidates ) # Sparse keyword embeddings sparse_rank = Knn( query="machine learning research", # Text query for sparse embeddings key="sparse_embedding", # Metadata field for sparse vectors return_rank=True, limit=200 ) # Combine with RRF hybrid_rank = Rrf( ranks=[dense_rank, sparse_rank], weights=[0.7, 0.3], # 70% semantic, 30% keyword k=60 ) # Use in search search = (Search() .where(K("status") == "published") # Optional filtering .rank(hybrid_rank) .limit(20) .select(K.DOCUMENT, K.SCORE, "title") ) results = collection.search(search) ``` {% /Tab %} {% Tab label="typescript" %} ```typescript import { Search, K, Knn, Rrf } from 'chromadb'; // Dense semantic embeddings const denseRank = Knn({ query: "machine learning research", // Text query for dense embeddings key: "#embedding", // Default embedding field returnRank: true, limit: 200 // Consider top 200 candidates }); // Sparse keyword embeddings const sparseRank = Knn({ query: "machine learning research", // Text query for sparse embeddings key: "sparse_embedding", // Metadata field for sparse vectors returnRank: true, limit: 200 }); // Combine with RRF const hybridRank = Rrf({ ranks: [denseRank, sparseRank], weights: [0.7, 0.3], // 70% semantic, 30% keyword k: 60 }); // Use in search const search = new Search() .where(K("status").eq("published")) // Optional filtering .rank(hybridRank) .limit(20) .select(K.DOCUMENT, K.SCORE, "title"); const results = await collection.search(search); ``` {% /Tab %} {% /TabbedCodeBlock %} Chroma also supports **text filtering** 
on top of your searches via the `where_document` parameter. You can filter results to only include chunks that contain an exact string or match a regex pattern. This is useful for enforcing structural constraints—like ensuring results contain a specific identifier—or for pattern matching on things like email addresses, dates, or phone numbers. ## Embedding Models **Dense embedding models** map text to vectors where semantic similarity is captured by vector distance. Chroma has first-class support for many embedding models. The tradeoffs include cost (API-based vs. local), latency, embedding dimensions (which affect storage and search speed), and quality on your specific domain. General-purpose models work well for most text, but specialized models trained on code, legal documents, or medical text can outperform them on domain-specific tasks. Larger models typically produce better embeddings but cost more and run slower—so the right choice depends on your quality requirements and constraints. * If you're building a customer support bot over general documentation, a model like `text-embedding-3-small` offers a good balance of quality and cost. * For a codebase search tool, code-specific models will better capture the semantics of function names, syntax, and programming patterns. Chroma works with code-specific models from [OpenAI](../../integrations/embedding-models/openai), [Cohere](../../integrations/embedding-models/cohere), [Mistral](../../integrations/embedding-models/mistral), [Morph](../../integrations/embedding-models/morph), and more. * If you need to run entirely locally for privacy or cost reasons, smaller open-source models like `all-MiniLM-L6-v2` are a practical choice, though with some quality tradeoff. **Sparse embedding models** power lexical search. For example, BM25 counts the frequency of tokens in a document and produces a vector representing the counts for each token. When we issue a lexical search query, we will get back the documents whose sparse vectors have a higher count for the tokens in our query. SPLADE is a learned alternative that expands terms—so a document about "dogs" might also get weight on "puppy" and "canine," helping bridge the synonym gap that pure lexical search misses. * If your data contains lots of exact identifiers that must match precisely — SKUs, legal citations, chemical formulas — BM25 is straightforward and effective. * If you want lexical search that's more forgiving of vocabulary mismatches, SPLADE can help. ## Collections in your Chroma Database A Chroma collection indexes records using a specific embedding model and configuration. Whether your records live in one Chroma collection or many depends on your application's access patterns and data types. **Use a single collection when**: * You are using the same embedding model for all of your data. * You want to search across everything at once. * You can distinguish between records using metadata filtering. **Use multiple collections when**: * You have different types of data, requiring different embedding models. For example, you have text data and images, which are embedded using different models. * You have multi-tenant requirements. In this case, establishing a collection per user or organization helps you avoid filtering overhead at query time. ## Chunking Data Chunking is the process of breaking source data into smaller, meaningful units (“chunks”) that are embedded and stored as individual records in a Chroma collection. 
Because embedding models operate on limited context windows and produce a single vector per input, storing entire documents as one record often blurs multiple ideas together and reduces retrieval quality. Chunking allows Chroma to index information at the level users actually search for—paragraphs, sections, functions, or messages—improving both recall and precision. Well-chosen chunks ensure that retrieved results are specific, semantically coherent, and useful on their own, while still allowing larger context to be reconstructed through metadata when needed.

{% Banner type="tip" %}
To learn more about chunking best practices, see our [Chunking Guide](./chunking)
{% /Banner %}

Chroma is flexible enough to support nearly any chunking strategy, so long as each chunk fits in 16KB. Chunking is also the best way to work with large documents, regardless of performance concerns.

When adding chunks to your collection, we recommend using batch operations. Batching increases the number of items sent per operation, acting as a throughput multiplier. Going from one vector to two will generally double the number of vectors per second, with diminishing returns as the batch size increases. Chroma Cloud allows ingesting up to 300 vectors per batch.

{% TabbedCodeBlock %}
{% Tab label="python" %}
```python
# Instead of
for chunk in chunks:
    collection.add(
        ids=[chunk.id],
        documents=[chunk.document],
        metadatas=[chunk.metadata]
    )

# Use batching
BATCH_SIZE = 300

for i in range(0, len(chunks), BATCH_SIZE):
    batch = chunks[i:i + BATCH_SIZE]
    collection.add(
        ids=[chunk.id for chunk in batch],
        documents=[chunk.document for chunk in batch],
        metadatas=[chunk.metadata for chunk in batch]
    )
```
{% /Tab %}
{% Tab label="typescript" %}
```typescript
// Instead of
for (const chunk of chunks) {
  await collection.add({
    ids: [chunk.id],
    documents: [chunk.document],
    metadatas: [chunk.metadata]
  })
}

// Use batching
const BATCH_SIZE = 300;

for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
  const batch = chunks.slice(i, i + BATCH_SIZE);
  await collection.add({
    ids: batch.map((chunk) => chunk.id),
    documents: batch.map((chunk) => chunk.document),
    metadatas: batch.map((chunk) => chunk.metadata)
  });
}
```
{% /Tab %}
{% /TabbedCodeBlock %}

Finally, issuing concurrent requests to the same collection will allow for even more throughput. Internally, requests are batched to give better performance than would be seen issuing requests individually. This batching happens automatically, and at larger batch sizes than the 300 vectors per batch permitted by default. Every Chroma Cloud user can issue up to 10 concurrent requests.

## Metadata

Metadata lets you attach structured information to each chunk, which serves two purposes: filtering at query time and providing context to the LLM.

For filtering, metadata lets you narrow searches without relying on semantic similarity. You might filter by source type (only search FAQs, not legal disclaimers), by date (only recent documents), by author or department, or by access permissions (only return chunks the user is allowed to see). This is often more reliable than hoping the embedding captures these distinctions.

Metadata is also returned with search results, which means you can pass it to the LLM alongside the chunk text. Knowing that a chunk came from "Q3 2024 Financial Report, page 12" or "authored by the legal team" helps the LLM interpret the content and cite sources accurately.
When designing your schema, think about what filters you'll need at query time and what context would help the LLM make sense of each chunk.
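
As a minimal sketch of how this fits together, the example below attaches filterable metadata at ingestion time and narrows a query with a `where` filter. The field names and values (`source`, `department`, `year`) are hypothetical placeholders; substitute the fields from your own schema.

```python
# Attach filterable metadata when adding chunks to the collection.
# Field names ("source", "department", "year") are hypothetical examples.
collection.add(
    ids=["faq-42"],
    documents=["Toothbrushes can be returned up to 360 days after purchase if unopened."],
    metadatas=[{"source": "faq", "department": "support", "year": 2024}],
)

# Narrow the search with a metadata filter at query time.
results = collection.query(
    query_texts=["What is the return policy for toothbrushes?"],
    n_results=5,
    where={"$and": [{"source": "faq"}, {"year": {"$gte": 2023}}]},
)

# Metadata comes back with the results, so it can be passed to the LLM
# alongside each chunk to help it interpret the content and cite sources.
print(results["metadatas"][0])
```

Filtering on structured fields like these constrains the candidate set before semantic ranking, which tends to be more predictable than relying on the embedding alone to capture those distinctions.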