# Frequently Asked Questions

## Can I view the reasoning (thinking) text when using a Reasoning LLM like R1 or o1?

Yes, see this note on [reasoning-content](https://langroid.github.io/langroid/notes/reasoning-content/).

## Does Langroid work with non-OpenAI LLMs?

Yes! Langroid works with practically any LLM, local or remote, closed or open. See these two guides:

- [Using Langroid with local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/)
- [Using Langroid with non-OpenAI proprietary LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/)

## Where can I find out about Langroid's architecture?

There are a few documents that can help:

- A work-in-progress [architecture description](https://langroid.github.io/langroid/blog/2024/08/15/overview-of-langroids-multi-agent-architecture-prelim/) on the Langroid blog.
- The Langroid [Getting Started](https://langroid.github.io/langroid/quick-start/) guide walks you step-by-step through Langroid's features and architecture.
- An article by LanceDB on [Multi-Agent Programming with Langroid](https://lancedb.substack.com/p/langoid-multi-agent-programming-framework)

## How can I limit the number of output tokens generated by the LLM?

You can set the `max_output_tokens` parameter in the `LLMConfig` class, or more commonly, the `OpenAIGPTConfig` class, which is a subclass of `LLMConfig`, for example:

```python
import langroid as lr
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="openai/gpt-3.5-turbo",
    max_output_tokens=100,  # limit output to 100 tokens
)
agent_config = lr.ChatAgentConfig(
    llm=llm_config,
    # ... other configs
)
agent = lr.ChatAgent(agent_config)
```

Then every time the agent's `llm_response` method is called, the LLM's output will be limited to this number of tokens. If you omit `max_output_tokens`, it defaults to 8192. If you wish **not** to limit the output tokens, you can set `max_output_tokens=None`, in which case Langroid uses the model-specific maximum output tokens from the [`langroid/language_models/model_info.py`](https://github.com/langroid/langroid/blob/main/langroid/language_models/model_info.py) file (specifically the `model_max_output_tokens` property of `LLMConfig`). Note, however, that this model-specific maximum may be quite large, so you would generally want to either omit `max_output_tokens` (which defaults to 8192) or set it to another desired value.

## How langroid handles long chat histories

You may encounter an error like this:

```
Error: Tried to shorten prompt history but ... longer than context length
```

This might happen when your chat history bumps against various limits. Here is how Langroid handles long chat histories. Ultimately the LLM API is invoked with two key inputs: the message history $h$, and the desired output length $n$ (defaults to the `max_output_tokens` in the `ChatAgentConfig`). These inputs are determined as follows (see the `ChatAgent._prep_llm_messages` method):

- Let $H$ be the current message history, $M$ the value of `ChatAgentConfig.max_output_tokens`, and $C$ the context length of the LLM.
- If $\text{tokens}(H) + M \leq C$, then langroid uses $h = H$ and $n = M$, since there is enough room to fit both the actual chat history as well as the desired max output length.
- If $\text{tokens}(H) + M > C$, this means the context length is too small to accommodate the message history $H$ and the desired output length $M$. Then langroid tries to use a _shortened_ output length $n' = C - \text{tokens}(H)$, i.e. the output is effectively _truncated_ to fit within the context length. - If $n'$ is at least equal to `min_output_tokens` $m$ (default 10), langroid proceeds with $h = H$ and $n=n'$. - otherwise, this means that the message history $H$ is so long that the remaining space in the LLM's context-length $C$ is unacceptably small (i.e. smaller than the minimum output length $m$). In this case, Langroid tries to shorten the message history by dropping early messages, and updating the message history $h$ as long as $C - \text{tokens}(h) < m$, until there are no more messages to drop (it will not drop the system message or the last message, which is a user message), and throws the error mentioned above. If you are getting this error, you will want to check whether: - you have set the `chat_context_length` too small, if you are setting it manually - you have set the `max_output_tokens` too large - you have set the `min_output_tokens` too large If these look fine, then the next thing to look at is whether you are accumulating too much context into the agent history, for example retrieved passages (which can be very long) in a RAG scenario. One common case is when a query $Q$ is being answered using RAG, the retrieved passages $P$ are added to $Q$ to create a (potentially very long) prompt like > based on the passages P, answer query Q Once the LLM returns an answer (if appropropriate for your context), you should avoid retaining the passages $P$ in the agent history, i.e. the last user message should be simply $Q$, rather than the prompt above. This functionality is exactly what you get when you use `ChatAgent._llm_response_temp_context`, which is used by default in the `DocChatAgent`. Another way to keep chat history tokens from growing too much is to use the `llm_response_forget` method, which erases both the query and response, if that makes sense in your scenario. ## How can I handle large results from Tools? As of version 0.22.0, Langroid allows you to control the size of tool results by setting [optional parameters](https://langroid.github.io/langroid/notes/large-tool-results/) in a `ToolMessage` definition. ## Can I handle a tool without running a task? Yes, if you've enabled an agent to both _use_ (i.e. generate) and _handle_ a tool. See the `test_tool_no_task` for an example of this. The `NabroskiTool` is enabled for the agent, and to get the agent's LLM to generate the tool, you first do something like: ```python response = agent.llm_response("What is Nabroski of 1 and 2?") ``` Now the `response` is a `ChatDocument` that will contain the JSON for the `NabroskiTool`. To _handle_ the tool, you will need to call the agent's `agent_response` method: ```python result = agent.agent_response(response) ``` When you wrap the agent in a task object, and do `task.run()` the above two steps are done for you, since Langroid operates via a loop mechanism, see docs [here](https://langroid.github.io/langroid/quick-start/multi-agent-task-delegation/#task-collaboration-via-sub-tasks). The *advantage* of using `task.run()` instead of doing this yourself, is that this method ensures that tool generation errors are sent back to the LLM so it retries the generation. 
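
To make the two steps above concrete, here is a minimal sketch using a hypothetical `AddTool` (the tool name, its fields, and its `handle` method are illustrative assumptions, not the actual `NabroskiTool` used in the test):

```python
import langroid as lr
import langroid.language_models as lm
from langroid.agent.tool_message import ToolMessage


class AddTool(ToolMessage):
    """Hypothetical tool, for illustration only."""
    request: str = "add"
    purpose: str = "To compute the sum of two numbers <x> and <y>."
    x: int
    y: int

    def handle(self) -> str:
        # stateless handler: invoked when the agent handles this tool
        return str(self.x + self.y)


agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
agent.enable_message(AddTool)  # the LLM can generate, and the agent can handle, this tool

# 1) LLM generates the tool ...
response = agent.llm_response("Use the `add` tool to compute the sum of 1 and 2.")
# 2) ... and the agent handles it (running AddTool.handle under the hood)
result = agent.agent_response(response)

# Equivalently, wrapping the agent in a Task runs this loop for you:
# result = lr.Task(agent, interactive=False).run("What is the sum of 1 and 2?")
```
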
## OpenAI Tools and Function-calling support Langroid supports OpenAI tool-calls API as well as OpenAI function-calls API. Read more [here](https://github.com/langroid/langroid/releases/tag/0.7.0). Langroid has always had its own native tool-calling support as well, which works with **any** LLM -- you can define a subclass of `ToolMessage` (pydantic based) and it is transpiled into system prompt instructions for the tool. In practice, we don't see much difference between using this vs OpenAI fn-calling. Example [here](https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-simple.py). Or search for `ToolMessage` in any of the `tests/` or `examples/` folders. ## Some example scripts appear to return to user input immediately without handling a tool. This is because the `task` has been set up with `interactive=True` (which is the default). With this setting, the task loop waits for user input after either the `llm_response` or `agent_response` (typically a tool-handling response) returns a valid response. If you want to progress through the task, you can simply hit return, unless the prompt indicates that the user needs to enter a response. Alternatively, the `task` can be set up with `interactive=False` -- with this setting, the task loop will _only_ wait for user input when an entity response (`llm_response` or `agent_response`) _explicitly_ addresses the user. Explicit user addressing can be done using either: - an orchestration tool, e.g. `SendTool` (see details in the release notes for [0.9.0](https://github.com/langroid/langroid/releases/tag/0.9.0)), an example script is the [multi-agent-triage.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-triage.py), or - a special addressing prefix, see the example script [1-agent-3-tools-address-user.py](https://github.com/langroid/langroid/blob/main/examples/basic/1-agent-3-tools-address-user.py) ## Can I specify top_k in OpenAIGPTConfig (for LLM API calls)? No; Langroid currently only supports parameters accepted by OpenAI's API, and `top_k` is _not_ one of them. See: - [OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat/create) - [Discussion on top_k, top_p, temperature](https://community.openai.com/t/temperature-top-p-and-top-k-for-chatbot-responses/295542/5) - [Langroid example](https://github.com/langroid/langroid/blob/main/examples/basic/fn-call-local-numerical.py) showing how you can set other OpenAI API parameters, using the `OpenAICallParams` object. ## Can I persist agent state across multiple runs? For example, you may want to stop the current python script, and run it again later, resuming your previous conversation. Currently there is no built-in Langroid mechanism for this, but you can achieve a basic type of persistence by saving the agent's `message_history`: - if you used `Task.run()` in your script, make sure the task is set up with `restart=False` -- this prevents the agent state from being reset when the task is run again. - using python's pickle module, you can save the `agent.message_history` to a file, and load it (if it exists) at the start of your script. See the example script [`chat-persist.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-persist.py) For more complex persistence, you can take advantage of the `GlobalState`, where you can store message histories of multiple agents indexed by their name. 
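
For the simpler pickle-based approach described above, a minimal sketch might look like this (the history file name and the surrounding task setup are illustrative assumptions):

```python
import os
import pickle

import langroid as lr

HISTORY_FILE = "agent_history.pkl"  # hypothetical path

agent = lr.ChatAgent(lr.ChatAgentConfig())

# restore the prior conversation, if any
if os.path.exists(HISTORY_FILE):
    with open(HISTORY_FILE, "rb") as f:
        agent.message_history = pickle.load(f)

task = lr.Task(agent, restart=False)  # don't reset agent state when the task runs
task.run()

# save the conversation for the next run
with open(HISTORY_FILE, "wb") as f:
    pickle.dump(agent.message_history, f)
```
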
Simple examples of `GlobalState` are in the [`chat-tree.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py) example, and the [`test_global_state.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_global_state.py) test. ## Is it possible to share state between agents/tasks? The above-mentioned `GlobalState` mechanism can be used to share state between agents/tasks. See the links mentioned in the previous answer. ## How can I suppress LLM output? You can use the `quiet_mode` context manager for this, see [here](https://langroid.github.io/langroid/notes/quiet-mode/) ## How can I deal with LLMs (especially weak ones) generating bad JSON in tools? Langroid already attempts to repair bad JSON (e.g. unescaped newlines, missing quotes, etc) using the [json-repair](https://github.com/mangiucugna/json_repair) library and other custom methods, before attempting to parse it into a `ToolMessage` object. However this type of repair may not be able to handle all edge cases of bad JSON from weak LLMs. There are two existing ways to deal with this, and one coming soon: - If you are defining your own `ToolMessage` subclass, considering deriving it instead from `XMLToolMessage` instead, see the [XML-based Tools](https://langroid.github.io/langroid/notes/xml-tools/) - If you are using an existing Langroid `ToolMessage`, e.g. `SendTool`, you can define your own subclass of `SendTool`, say `XMLSendTool`, inheriting from both `SendTool` and `XMLToolMessage`; see this [example](https://github.com/langroid/langroid/blob/main/examples/basic/xml_tool.py) - Coming soon: strict decoding to leverage the Structured JSON outputs supported by OpenAI and open LLM providers such as `llama.cpp` and `vllm`. The first two methods instruct the LLM to generate XML instead of JSON, and any field that is designated with a `verbatim=True` will be enclosed within an XML `CDATA` tag, which does *not* require any escaping, and can be far more reliable for tool-use than JSON, especially with weak LLMs. ## How can I handle an LLM "forgetting" to generate a `ToolMessage`? Sometimes the LLM (especially a weak one) forgets to generate a [`ToolMessage`][langroid.agent.tool_message.ToolMessage] (either via OpenAI's tools/functions API, or via Langroid's JSON/XML Tool mechanism), despite being instructed to do so. There are a few remedies Langroid offers for this: **Improve the instructions in the `ToolMessage` definition:** - Improve instructions in the `purpose` field of the `ToolMessage`. - Add an `instructions` class-method to the `ToolMessage`, as in the [`chat-search.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-search.py) script: ```python @classmethod def instructions(cls) -> str: return """ IMPORTANT: You must include an ACTUAL query in the `query` field, """ ``` These instructions are meant to be general instructions on how to use the tool (e.g. how to set the field values), not to specifically about the formatting. - Add a `format_instructions` class-method, e.g. like the one in the [`chat-multi-extract-3.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-3.py) example script. ```python @classmethod def format_instructions(cls, tool: bool = True) -> str: instr = super().format_instructions(tool) instr += """ ------------------------------ ASK ME QUESTIONS ONE BY ONE, to FILL IN THE FIELDS of the `lease_info` function/tool. First ask me for the start date of the lease. DO NOT ASK ANYTHING ELSE UNTIL YOU RECEIVE MY ANSWER. 
""" return instr ``` **Override the `handle_message_fallback` method in the agent:** This method is called when the Agent's `agent_response` method receives a non-tool message as input. The default behavior of this method is to return None, but it is very useful to override the method to handle cases where the LLM has forgotten to use a tool. You can define this method to return a "nudge" to the LLM telling it that it forgot to do a tool-call, e.g. see how it's done in the example script [`chat-multi-extract-local.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-local.py): ```python class LeasePresenterAgent(ChatAgent): def handle_message_fallback( self, msg: str | ChatDocument ) -> str | ChatDocument | None: """Handle scenario where Agent failed to present the Lease JSON""" if isinstance(msg, ChatDocument) and msg.metadata.sender == Entity.LLM: return """ You either forgot to present the information in the JSON format required in `lease_info` JSON specification, or you may have used the wrong name of the tool or fields. Try again. """ return None ``` Note that despite doing all of these, the LLM may still fail to generate a `ToolMessage`. In such cases, you may want to consider using a better LLM, or an up-coming Langroid feature that leverages **strict decoding** abilities of specific LLM providers (e.g. OpenAI, llama.cpp, vllm) that are able to use grammar-constrained decoding to force the output to conform to the specified structure. Langroid also provides a simpler mechanism to specify the action to take when an LLM does not generate a tool, via the `ChatAgentConfig.handle_llm_no_tool` config parameter, see the [docs](https://langroid.github.io/langroid/notes/handle-llm-no-tool/). ## Can I use Langroid to converse with a Knowledge Graph (KG)? Yes, you can use Langroid to "chat with" either a Neo4j or ArangoDB KG, see docs [here](https://langroid.github.io/langroid/notes/knowledge-graphs/) ## How can I improve `DocChatAgent` (RAG) latency? The behavior of `DocChatAgent` can be controlled by a number of settings in the `DocChatAgentConfig` class. The top-level query-answering method in `DocChatAgent` is `llm_response`, which use the `answer_from_docs` method. At a high level, the response to an input message involves the following steps: - **Query to StandAlone:** LLM rephrases the query as a stand-alone query. This can incur some latency. You can turn it off by setting `assistant_mode=True` in the `DocChatAgentConfig`. - **Retrieval:** The most relevant passages (chunks) are retrieved using a collection of semantic/lexical similarity searches and ranking methods. There are various knobs in `DocChatAgentConfig` to control this retrieval. - **Relevance Extraction:** LLM is used to retrieve verbatim relevant portions from the retrieved chunks. This is typically the biggest latency step. You can turn it off by setting the `relevance_extractor_config` to None in `DocChatAgentConfig`. - **Answer Generation:** LLM generates answer based on retrieved passages. See the [`doc-aware-chat.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/doc-aware-chat.py) example script, which illustrates some of these settings. In some scenarios you want to *only* use the **retrieval** step of a `DocChatAgent`. For this you can use the [`RetrievalTool`][langroid.agent.tools.retrieval_tool.RetrievalTool]. See the `test_retrieval_tool` in [`test_doc_chat_agent.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_doc_chat_agent.py). 
to learn how to use it. The above example script uses `RetrievalTool` as well. ## Is there support to run multiple tasks concurrently? Yes, see the `run_batch_tasks` and related functions in [batch.py](https://github.com/langroid/langroid/blob/main/langroid/agent/batch.py). See also: - tests: [test_batch.py](https://github.com/langroid/langroid/blob/main/tests/main/test_batch.py), [test_relevance_extractor.py](https://github.com/langroid/langroid/blob/main/tests/main/test_relevance_extractor.py), - example: [multi-agent-round-table.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-round-table.py) Another example is within [`DocChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/doc_chat_agent.py), which uses batch tasks for relevance extraction, see the `get_verbatim_extracts` method -- when there are k relevant passages, this runs k tasks concurrently, each of which uses an LLM-agent to extract relevant verbatim text from a passage. ## Can I use Langroid in a FastAPI server? Yes, see the [langroid/fastapi-server](https://github.com/langroid/fastapi-server) repo. ## Can a sub-task end all parent tasks and return a result? Yes, there are two ways to achieve this, using [`FinalResultTool`][langroid.agent.tools.orchestration.final_result_tool.FinalResultTool]: From a `ChatAgent`'s tool-handler or `agent_response` method: Your code can return a `FinalResultTool` with arbitrary field types; this ends the current and all parent tasks and this `FinalResultTool` will appear as one of tools in the final `ChatDocument.tool_messages`. See `test_tool_handlers_and_results` in [test_tool_messages.py](https://github.com/langroid/langroid/blob/main/tests/main/test_tool_messages.py), and [examples/basic/chat-tool-function.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tool-function.py) From `ChatAgent`'s `llm_response` method: you can define a subclass of a `FinalResultTool` and enable the agent to use this tool, which means it will become available for the LLM to generate. See [examples/basic/multi-agent-return-result.py](https://github.com/langroid/langroid/blob/main/examples/basic/multi-agent-return-result.py). ## How can I configure a task to retain or discard prior conversation? In some scenarios, you may want to control whether each time you call a task's `run` method, the underlying agent retains the conversation history from the previous run. There are two boolean config parameters that control this behavior: - the `restart` parameter (default `True`) in the `Task` constructor, and - the `restart_as_subtask` (default `False`) parameter in the `TaskConfig` argument of the `Task` constructor. To understand how these work, consider a simple scenario of a task `t` that has a subtask `t1`, e.g., suppose you have the following code with default settings of the `restart` and `restart_as_subtask` parameters: ```python from langroid.agent.task import Task from langroid.agent.task import TaskConfig # default setttings: rs = False r = r1 = True agent = ... task_config = TaskConfig(restart_as_subtask=rs) t = Task(agent, restart=r, config=task_config) agent1 = ... t1 = Task(agent1, restart=r1, config=task_config) t.add_subtask(t1) ``` This default setting works as follows: Since task `t` was constructed with the default `restart=True`, when `t.run()` is called, the conversation histories of the agent underlying `t` as well as all those of all subtasks (such as `t1`) are reset. 
However, if during `t.run()`, there are multiple calls to `t1.run()`, then the conversation history is retained across these calls, even though `t1` was constructed with the default `restart=True` -- this is because the `restart` constructor parameter has no effect on a task's reset behavior **when it is a subtask**. The `TaskConfig.restart_as_subtask` parameter controls the reset behavior of a task's `run` method when invoked as a subtask. It defaults to `False`, which is why in the above example, the conversation history of `t1` is retained across multiple calls to `t1.run()` that may occur during execution of `t.run()`. If you set this parameter to `True` in the above example, then the conversation history of `t1` would be reset each time `t1.run()` is called, during a call to `t.run()`. To summarize, - The `Task` constructor's `restart` parameter controls the reset behavior of the task's `run` method when it is called directly, not as a subtask. - The `TaskConfig.restart_as_subtask` parameter controls the reset behavior of the task's `run` method when it is called as a subtask. These settings can be mixed and matched as needed. Additionally, all reset behavior can be turned off during a specific `run()` invocation by calling it with `allow_restart=False`, e.g., `t.run(..., allow_restart=False)`. ## How can I set up a task to exit as soon as the LLM responds? In some cases you may want the top-level task or a subtask to exit as soon as the LLM responds. You can get this behavior by setting `single_round=True` during task construction, e.g., ```python from langroid.agent.task import Task agent = ... t = Task(agent, single_round=True, interactive=False) result = t.run("What is 4 + 5?") ``` The name `single_round` comes from the fact that the task loop ends as soon as any **one** of the agent's responders return a valid response. Recall that an agent's responders are `llm_response`, `agent_response` (for tool handling), and `user_response` (for user input). In the above example there are no tools and no user interaction (since `interactive=False`), so the task will exit as soon as the LLM responds. More commonly, you may only want this single-round behavior for a subtask, e.g., ```python agent = ... t = Task(agent, single_round=False, interactive=True) agent1 = ... t1 = Task(agent1, single_round=True, interactive=False) t.add_subtask(t1) top_level_query = ... result = t.run(...) ``` See the example script [`chat-2-agent-discuss.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-2-agent-discuss.py) for an example of this, and also search for `single_round` in the rest of the examples. !!! warning "Using `single_round=True` will prevent tool-handling" As explained above, setting `single_round=True` will cause the task to exit as soon as the LLM responds, and thus if it emits a valid tool (which the agent is enabled to handle), this tool will *not* be handled. --- --- title: 'Language Models: Completion and Chat-Completion' draft: false date: 2023-09-19 authors: - pchalasani categories: - langroid - llm - local-llm - chat comments: true --- Transformer-based language models are fundamentally next-token predictors, so naturally all LLM APIs today at least provide a completion endpoint. If an LLM is a next-token predictor, how could it possibly be used to generate a response to a question or instruction, or to engage in a conversation with a human user? This is where the idea of "chat-completion" comes in. 
This post is a refresher on the distinction between completion and chat-completion, and some interesting details on how chat-completion is implemented in practice. ## Language Models as Next-token Predictors A Language Model is essentially a "next-token prediction" model, and so all LLMs today provide a "completion" endpoint, typically something like: `/completions` under the base URL. The endpoint simply takes a prompt and returns a completion (i.e. a continuation). A typical prompt sent to a completion endpoint might look like this: ``` The capital of Belgium is ``` and the LLM will return a completion like this: ``` Brussels. ``` OpenAI's GPT3 is an example of a pure completion LLM. But interacting with a completion LLM is not very natural or useful: you cannot give instructions or ask questions; instead you would always need to formulate your input as a prompt whose natural continuation is your desired output. For example, if you wanted the LLM to highlight all proper nouns in a sentence, you would format it as the following prompt: **Chat-To-Prompt Example:** Chat/Instruction converted to a completion prompt. ``` User: here is a sentence, the Assistant's task is to identify all proper nouns. Jack lives in Bosnia, and Jill lives in Belgium. Assistant: ``` The natural continuation of this prompt would be a response listing the proper nouns, something like: ``` John, Bosnia, Jill, Belgium are all proper nouns. ``` This _seems_ sensible in theory, but a "base" LLM that performs well on completions may _not_ perform well on these kinds of prompts. The reason is that during its training, it may not have been exposed to very many examples of this type of prompt-response pair. So how can an LLM be improved to perform well on these kinds of prompts? ## Instruction-tuned, Aligned LLMs This brings us to the heart of the innovation behind the wildly popular ChatGPT: it uses an enhancement of GPT3 that (besides having a lot more parameters), was _explicitly_ fine-tuned on instructions (and dialogs more generally) -- this is referred to as **instruction-fine-tuning** or IFT for short. In addition to fine-tuning instructions/dialogs, the models behind ChatGPT (i.e., GPT-3.5-Turbo and GPT-4) are further tuned to produce responses that _align_ with human preferences (i.e. produce responses that are more helpful and safe), using a procedure called Reinforcement Learning with Human Feedback (RLHF). See this [OpenAI InstructGPT Paper](https://arxiv.org/pdf/2203.02155.pdf) for details on these techniques and references to the original papers that introduced these ideas. Another recommended read is Sebastian Raschka's post on [RLHF and related techniques](https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives). For convenience, we refer to the combination of IFT and RLHF as **chat-tuning**. A chat-tuned LLM can be expected to perform well on prompts such as the one in the Chat-To-Prompt Example above. These types of prompts are still unnatural, however, so as a convenience, chat-tuned LLM API servers also provide a "chat-completion" endpoint (typically `/chat/completions` under the base URL), which allows the user to interact with them in a natural dialog, which might look like this (the portions in square brackets are indicators of who is generating the text): ``` [User] What is the capital of Belgium? [Assistant] The capital of Belgium is Brussels. ``` or ``` [User] In the text below, find all proper nouns: Jack lives in Bosnia, and Jill lives in Belgium. 
[Assistant] Jack, Bosnia, Jill, Belgium are all proper nouns.
[User] Where does Jack live?
[Assistant] Jack lives in Bosnia.
```

## Chat Completion Endpoints: under the hood

How could this work, given that LLMs are fundamentally next-token predictors? This is a convenience provided by the LLM API service (e.g. from OpenAI or local model server libraries): when a user invokes the chat-completion endpoint (typically at `/chat/completions` under the base URL), under the hood, the server converts the instructions and multi-turn chat history into a single string, with annotations indicating user and assistant turns, and ending with something like "Assistant:" as in the Chat-To-Prompt Example above. Now the subtle detail to note here is this:

> It matters _how_ the dialog (instructions plus chat history) is converted into a single prompt string.

Converting to a single prompt by simply concatenating the instructions and chat history using an "intuitive" format (e.g. indicating user and assistant turns using "User:", "Assistant:", etc.) _can_ work; however, most local LLMs are trained on a _specific_ prompt format. So if we format chats in a different way, we may get odd/inferior results.

## Converting Chats to Prompts: Formatting Rules

For example, the llama2 models are trained on a format where the user's input is bracketed within special strings `[INST]` and `[/INST]`. There are other requirements that we don't go into here, but interested readers can refer to these links:

- A reddit thread on the [llama2 formats](https://www.reddit.com/r/LocalLLaMA/comments/155po2p/get_llama_2_prompt_format_right/)
- Facebook's [llama2 code](https://github.com/facebookresearch/llama/blob/main/llama/generation.py#L44)
- Langroid's [llama2 formatting code](https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/llama2_formatter.py)

A dialog fed to a Llama2 model in its expected prompt format would look like this:

```
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Hi there! [/INST] Hello! How can I help you today?
[INST] In the text below, find all proper nouns:
Jack lives in Bosnia, and Jill lives in Belgium. [/INST] Jack, Bosnia, Jill, Belgium are all proper nouns.
[INST] Where does Jack live? [/INST] Jack lives in Bosnia.
[INST] And Jill? [/INST] Jill lives in Belgium.
[INST] Which are its neighboring countries? [/INST]
```

This means that if an LLM server library wants to provide a chat-completion endpoint for a local model, it needs to provide a way to convert chat history to a single prompt using the specific formatting rules of the model. For example the [`oobabooga/text-generation-webui`](https://github.com/oobabooga/text-generation-webui) library has an extensive set of chat formatting [templates](https://github.com/oobabooga/text-generation-webui/tree/main/instruction-templates) for a variety of models, and their model server auto-detects the format template from the model name.

!!! note "Chat completion model names: look for 'chat' or 'instruct' in the name"
    You can search for a variety of models on the [HuggingFace model hub](https://huggingface.co/models). For example if you see a name `Llama-2-70B-chat-GGUF` you know it is chat-tuned.
Another example of a chat-tuned model is `Llama-2-7B-32K-Instruct` A user of these local LLM server libraries thus has two options when using a local model in chat mode: - use the _chat-completion_ endpoint, and let the underlying library handle the chat-to-prompt formatting, or - first format the chat history according to the model's requirements, and then use the _completion_ endpoint ## Using Local Models in Langroid Local models can be used in Langroid by defining a `LocalModelConfig` object. More details are in this [tutorial](https://langroid.github.io/langroid/blog/2023/09/14/using-langroid-with-local-llms/), but here we briefly discuss prompt-formatting in this context. Langroid provides a built-in [formatter for LLama2 models](https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/llama2_formatter.py), so users looking to use llama2 models with langroid can try either of these options, by setting the `use_completion_for_chat` flag in the `LocalModelConfig` object (See the local-LLM [tutorial](https://langroid.github.io/langroid/blog/2023/09/14/using-langroid-with-local-llms/) for details). When this flag is set to `True`, the chat history is formatted using the built-in Langroid llama2 formatter and the completion endpoint are used. When the flag is set to `False`, the chat history is sent directly to the chat-completion endpoint, which internally converts the chat history to a prompt in the expected llama2 format. For local models other than Llama2, users can either: - write their own formatters by writing a class similar to `Llama2Formatter` and then setting the `use_completion_for_chat` flag to `True` in the `LocalModelConfig` object, or - use an LLM server library (such as the `oobabooga` library mentioned above) that provides a chat-completion endpoint, _and converts chats to single prompts under the hood,_ and set the `use_completion_for_chat` flag to `False` in the `LocalModelConfig` object. You can use a similar approach if you are using an LLM application framework other than Langroid. --- --- title: "Overview of Langroid's Multi-Agent Architecture (prelim)" draft: false date: 2024-08-15 authors: - pchalasani - nils - jihye - someshjha categories: - langroid - multi-agent - llm comments: true --- ## Agent, as an intelligent message transformer A natural and convenient abstraction in designing a complex LLM-powered system is the notion of an *agent* that is instructed to be responsible for a specific aspect of the overall task. In terms of code, an *Agent* is essentially a class representing an intelligent entity that can respond to *messages*, i.e., an agent is simply a *message transformer*. An agent typically encapsulates an (interface to an) LLM, and may also be equipped with so-called *tools* (as described below) and *external documents/data* (e.g., via a vector database, as described below). Much like a team of humans, agents interact by exchanging messages, in a manner reminiscent of the [*actor framework*](https://en.wikipedia.org/wiki/Actor_model) in programming languages. An *orchestration mechanism* is needed to manage the flow of messages between agents, to ensure that progress is made towards completion of the task, and to handle the inevitable cases where an agent deviates from instructions. Langroid is founded on this *multi-agent programming* paradigm, where agents are first-class citizens, acting as message transformers, and communicate by exchanging messages. 
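
To make the "message transformer" view concrete, here is a minimal sketch (using the default OpenAI model config; the input message is arbitrary):

```python
import langroid as lr
import langroid.language_models as lm

# an agent wrapping an LLM: it transforms an incoming message into a response message
agent = lr.ChatAgent(lr.ChatAgentConfig(llm=lm.OpenAIGPTConfig()))
reply = agent.llm_response("Summarize in one line: agents collaborate by exchanging messages.")
```
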
To build useful applications with LLMs, we need to endow them with the ability to trigger actions (such as API calls, computations, database queries, etc) or send structured messages to other agents or downstream processes. *Tools* provide these capabilities, described next. ## Tools, also known as functions An LLM is essentially a text transformer; i.e., in response to some input text, it produces a text response. Free-form text responses are ideal when we want to generate a description, answer, or summary for human consumption, or even a question for another agent to answer. However, in some cases, we would like the responses to be more structured, for example to trigger external *actions* (such as an API call, code execution, or a database query), or for unambiguous/deterministic handling by a downstream process or another agent. In such cases, we would instruct the LLM to produce a *structured* output, typically in JSON format, with various pre-specified fields, such as code, an SQL query, parameters of an API call, and so on. These structured responses have come to be known as *tools*, and the LLM is said to *use* a tool when it produces a structured response corresponding to a specific tool. To elicit a tool response from an LLM, it needs to be instructed on the expected tool format and the conditions under which it should use the tool. To actually use a tool emitted by an LLM, a *tool handler* method must be defined as well. The tool handler for a given tool is triggered when it is recognized in the LLM's response. ### Tool Use: Example As a simple example, a SQL query tool can be specified as a JSON structure with a `sql` field (containing the SQL query) and a `db` field (containing the name of the database). The LLM may be instructed with a system prompt of the form: > When the user asks a question about employees, use the SQLTool described in the below schema, > and the results of this tool will be sent back to you, and you can use these to respond to > the user's question, or correct your SQL query if there is a syntax error. The tool handler would detect this specific tool in the LLM's response, parse this JSON structure, extract the `sql` and `db` fields, run the query on the specified database, and return the result if the query ran successfully, otherwise return an error message. Depending on how the multi-agent system is organized, the query result or error message may be handled by the same agent (i.e., its LLM), which may either summarize the results in narrative form, or revise the query if the error message indicates a syntax error. ## Agent-oriented programming: Function-Signatures If we view an LLM as a function with signature `string -> string`, it is possible to express the concept of an agent, tool, and other constructs in terms of derived function signatures, as shown in the table below. Adding `tool` (or function calling) capability to an LLM requires a parser (that recognizes that the LLM has generated a tool) and a callback that performs arbitrary computation and returns a string. The serialized instances of tools `T` correspond to a language `L`; Since by assumption, the LLM is capable of producing outputs in $L$, this allows the LLM to express the intention to execute a Callback with arbitrary instances of `T`. In the last row, we show how an Agent can be viewed as a function signature involving its state `S`. 

| Function Description | Function Signature |
|----------------------|--------------------|
| LLM | `[Input Query] -> string` <br> `[Input Query]` is the original query. |
| Chat interface | `[Message History] x [Input Query] -> string` <br> `[Message History]` consists of previous messages[^1]. |
| Agent | `[System Message] x [Message History] x [Input Query] -> string` <br> `[System Message]` is the system prompt. |
| Agent with tool | `[System Message] x (string -> T) x (T -> string) x [Message History] x [Input Query] -> string` |
| Parser with type `T` | `string -> T` |
| Callback with type `T` | `T -> string` |
| General Agent with state type `S` | `S x [System Message] x (string -> T) x (S x T -> S x string) x [Message History] x [Input Query] -> S x string` |

[^1]: Note that in reality, separator tokens are added to distinguish messages, and the messages are tagged with metadata indicating the sender, among other things.

## Multi-Agent Orchestration

### An Agent's "Native" Responders

When building an LLM-based multi-agent system, an orchestration mechanism is critical to manage the flow of messages between agents, to ensure task progress, and to handle inevitable LLM deviations from instructions. Langroid provides a simple yet versatile orchestration mechanism that seamlessly handles:

- user interaction,
- tool handling,
- sub-task delegation

We view an agent as a message transformer; it may transform an incoming message using one of its three "native" responder methods, all of which have the same function signature: `string -> string`. These methods are:

- `llm_response` returns the LLM's response to the input message. Whenever this method is invoked, the agent updates its dialog history (typically consisting of alternating user and LLM messages).
- `user_response` prompts the user for input and returns their response.
- `agent_response` by default only handles a `tool message` (i.e., one that contains an llm-generated structured response): it performs any requested actions, and returns the result as a string. An `agent_response` method can have other uses besides handling tool messages, such as handling scenarios where an LLM "forgot" to use a tool, or used a tool incorrectly, and so on.

To see why it is useful to have these responder methods, consider first a simple example of creating a basic chat loop with the user. It is trivial to create such a loop by alternating between `user_response` and `llm_response`. Now suppose we instruct the agent to either directly answer the user's question or perform a web-search. Then it is possible that sometimes the `llm_response` will produce a "tool message", say `WebSearchTool`, which we would handle with the `agent_response` method. This requires a slightly different, and more involved, way of iterating among the agent's responder methods.

### Tasks: Encapsulating Agent Orchestration

From a coding perspective, it is useful to hide the actual iteration logic by wrapping an Agent class in a separate class, called a `Task`, which encapsulates all of the orchestration logic. Users of the Task class can then define the agent, tools, and any sub-tasks, wrap the agent in a task object of class Task, and simply call `task.run()`, letting the Task class deal with the details of orchestrating the agent's responder methods, determining task completion, and invoking sub-tasks.

### Responders in a Task: Agent's native responders and sub-tasks

The orchestration mechanism of a `Task` object works as follows. When a `Task` object is created from an agent, a sequence of eligible responders is created, which includes the agent's three "native" responder methods in the sequence: `agent_response`, `llm_response`, `user_response`. The type signature of the task's run method is `string -> string`, just like the Agent's native responder methods, and this is the key to seamless delegation of tasks to sub-tasks.

A list of subtasks can be added to a `Task` object via `task.add_sub_task([t1, t2, ...])`, where `[t1, t2, ...]` are other `Task` objects. The result of this is that the run method of each sub-task is appended to the sequence of eligible responders in the parent task object.

### Task Orchestration: Updating the Current Pending Message (CPM)

A task always maintains a *current pending message* (CPM), which is the latest message "awaiting" a valid response from a responder, which updates the CPM. At a high level the `run` method of a task attempts to repeatedly find a valid response to the CPM, until the task is done. (Note that this paradigm is somewhat reminiscent of a *Blackboard* architecture, where agents take turns deciding whether they can update the shared message on the "blackboard".) This is achieved by repeatedly invoking the `step` method, which represents a "turn" in the conversation. The `step` method sequentially tries the eligible responders from the beginning of the eligible-responders list, until it finds a valid response, defined as a non-null or terminating message (i.e. one that signals that the task is done). In particular, this `step()` algorithm implies that a Task delegates (or "fails over") to a sub-task only if the task's native responders have no valid response.

There are a few simple rules that govern how `step` works:

- a responder entity (either a sub-task or a native entity -- one of LLM, Agent, or User) cannot respond if it just responded in the previous step (this prevents a responder from "talking to itself").
- when a response signals that the task is done (via a `DoneTool` or a "DONE" string), the task is ready to exit and return the CPM as the result of the task.
- when an entity "in charge" of the task has a null response, the task is considered finished and ready to exit.
- if the response of an entity or subtask is a structured message containing a recipient field, then the specified recipient task or entity will be the only one eligible to respond at the next step.

Once a valid response is found in a step, the CPM is updated to this response, and the next step starts the search for a valid response from the beginning of the eligible-responders list. When a response signals that the task is done, the run method returns the CPM as the result of the task. This is a highly simplified account of the orchestration mechanism, and the actual implementation is more involved. The above simple design is surprisingly powerful and can support a wide variety of task structures, including trees and DAGs.

As a simple illustrative example, tool-handling has a natural implementation. The LLM is instructed to use a certain JSON-structured message as a tool, and thus the `llm_response` method can produce a structured message, such as an SQL query. This structured message is then handled by the `agent_response` method, and the resulting message updates the CPM. The `llm_response` method then becomes eligible to respond again: for example if the agent's response contains an SQL error, the LLM would retry its query, and if the agent's response consists of the query results, the LLM would respond with a summary of the results.

The Figure below depicts the task orchestration and delegation mechanism, showing how iteration among responder methods works when a Task `T` has sub-tasks `[T1, T2]` and `T1` has a sub-task `T3`.
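
In code, such a task tree could be assembled roughly as follows (agent configurations elided to defaults; this is only a sketch of the structure shown in the figure below, not a runnable application):

```python
import langroid as lr

cfg = lr.ChatAgentConfig()  # agent details elided

T = lr.Task(lr.ChatAgent(cfg), name="T")
T1 = lr.Task(lr.ChatAgent(cfg), name="T1")
T2 = lr.Task(lr.ChatAgent(cfg), name="T2")
T3 = lr.Task(lr.ChatAgent(cfg), name="T3")

T.add_sub_task([T1, T2])  # T's eligible responders now include T1.run and T2.run
T1.add_sub_task(T3)       # T1 delegates to T3 when its native responders have no valid response

result = T.run("...")
```
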
![langroid-arch.png](figures/langroid-arch.png) --- --- title: 'Langroid: Harness LLMs with Multi-Agent Programming' draft: false date: 2023-09-03 authors: - pchalasani categories: - langroid - llm comments: true --- # Langroid: Harness LLMs with Multi-Agent Programming ## The LLM Opportunity Given the remarkable abilities of recent Large Language Models (LLMs), there is an unprecedented opportunity to build intelligent applications powered by this transformative technology. The top question for any enterprise is: how best to harness the power of LLMs for complex applications? For technical and practical reasons, building LLM-powered applications is not as simple as throwing a task at an LLM-system and expecting it to do it. ## Langroid's Multi-Agent Programming Framework Effectively leveraging LLMs at scale requires a *principled programming framework*. In particular, there is often a need to maintain multiple LLM conversations, each instructed in different ways, and "responsible" for different aspects of a task. An *agent* is a convenient abstraction that encapsulates LLM conversation state, along with access to long-term memory (vector-stores) and tools (a.k.a functions or plugins). Thus a **Multi-Agent Programming** framework is a natural fit for complex LLM-based applications. > Langroid is the first Python LLM-application framework that was explicitly designed with Agents as first-class citizens, and Multi-Agent Programming as the core design principle. The framework is inspired by ideas from the [Actor Framework](https://en.wikipedia.org/wiki/Actor_model). Langroid allows an intuitive definition of agents, tasks and task-delegation among agents. There is a principled mechanism to orchestrate multi-agent collaboration. Agents act as message-transformers, and take turns responding to (and transforming) the current message. The architecture is lightweight, transparent, flexible, and allows other types of orchestration to be implemented. Besides Agents, Langroid also provides simple ways to directly interact with LLMs and vector-stores. ## Highlights - **Agents as first-class citizens:** The `Agent` class encapsulates LLM conversation state, and optionally a vector-store and tools. Agents are a core abstraction in Langroid; Agents act as _message transformers_, and by default provide 3 _responder_ methods, one corresponding to each entity: LLM, Agent, User. - **Tasks:** A Task class wraps an Agent, gives the agent instructions (or roles, or goals), manages iteration over an Agent's responder methods, and orchestrates multi-agent interactions via hierarchical, recursive task-delegation. The `Task.run()` method has the same type-signature as an Agent's responder's methods, and this is key to how a task of an agent can delegate to other sub-tasks: from the point of view of a Task, sub-tasks are simply additional responders, to be used in a round-robin fashion after the agent's own responders. - **Modularity, Reusability, Loose coupling:** The `Agent` and `Task` abstractions allow users to design Agents with specific skills, wrap them in Tasks, and combine tasks in a flexible way. - **LLM Support**: Langroid supports OpenAI LLMs including GPT-3.5-Turbo, GPT-4. - **Caching of LLM prompts, responses:** Langroid by default uses [Redis](https://redis.com/try-free/) for caching. - **Vector-stores**: [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/), LanceDB, Pinecone, PostgresDB (PGVector), Weaviate are currently supported. 
Vector stores allow for Retrieval-Augmented-Generaation (RAG). - **Grounding and source-citation:** Access to external documents via vector-stores allows for grounding and source-citation. - **Observability, Logging, Lineage:** Langroid generates detailed logs of multi-agent interactions and maintains provenance/lineage of messages, so that you can trace back the origin of a message. - **Tools/Plugins/Function-calling**: Langroid supports OpenAI's recently released [function calling](https://platform.openai.com/docs/guides/gpt/function-calling) feature. In addition, Langroid has its own native equivalent, which we call **tools** (also known as "plugins" in other contexts). Function calling and tools have the same developer-facing interface, implemented using [Pydantic](https://docs.pydantic.dev/latest/), which makes it very easy to define tools/functions and enable agents to use them. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it! --- --- title: 'Langroid: Knolwedge Graph RAG powered by Neo4j' draft: false date: 2024-01-18 authors: - mohannad categories: - langroid - neo4j - rag - knowledge-graph comments: true --- ## "Chat" with various sources of information LLMs are increasingly being used to let users converse in natural language with a variety of types of data sources: - unstructured text documents: a user's query is augmented with "relevant" documents or chunks (retrieved from an embedding-vector store) and fed to the LLM to generate a response -- this is the idea behind Retrieval Augmented Generation (RAG). - SQL Databases: An LLM translates a user's natural language question into an SQL query, which is then executed by another module, sending results to the LLM, so it can generate a natural language response based on the results. - Tabular datasets: similar to the SQL case, except instead of an SQL Query, the LLM generates a Pandas dataframe expression. Langroid has had specialized Agents for the above scenarios: `DocChatAgent` for RAG with unstructured text documents, `SQLChatAgent` for SQL databases, and `TableChatAgent` for tabular datasets. ## Adding support for Neo4j Knowledge Graphs Analogous to the SQLChatAgent, Langroid now has a [`Neo4jChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/neo4j/neo4j_chat_agent.py) to interact with a Neo4j knowledge graph using natural language. This Agent has access to two key tools that enable it to handle a user's queries: - `GraphSchemaTool` to get the schema of a Neo4j knowledge graph. - `CypherRetrievalTool` to generate Cypher queries from a user's query. Cypher is a specialized query language for Neo4j, and even though it is not as widely known as SQL, most LLMs today can generate Cypher Queries. Setting up a basic Neo4j-based RAG chatbot is straightforward. 

First ensure you set these environment variables (or provide them in a `.env` file):

```bash
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
NEO4J_DATABASE=
```

Then you can configure and define a `Neo4jChatAgent` like this:

```python
import langroid as lr
import langroid.language_models as lm
from dotenv import load_dotenv
from langroid.agent.special.neo4j.neo4j_chat_agent import (
    Neo4jChatAgent,
    Neo4jChatAgentConfig,
    Neo4jSettings,
)

load_dotenv()  # read the NEO4J_* variables (and API keys) from the .env file

llm_config = lm.OpenAIGPTConfig()
neo4j_settings = Neo4jSettings()

kg_rag_agent_config = Neo4jChatAgentConfig(
    neo4j_settings=neo4j_settings,
    llm=llm_config,
)
kg_rag_agent = Neo4jChatAgent(kg_rag_agent_config)
kg_rag_task = lr.Task(kg_rag_agent, name="kg_RAG")
kg_rag_task.run()
```

## Example: PyPi Package Dependency Chatbot

In the Langroid-examples repository, there is an example Python [script](https://github.com/langroid/langroid-examples/blob/main/examples/kg-chat/) showcasing tools/Function-calling + RAG using a `DependencyGraphAgent` derived from [`Neo4jChatAgent`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/neo4j/neo4j_chat_agent.py). This agent uses two tools, in addition to the tools available to `Neo4jChatAgent`:

- `GoogleSearchTool` to find package version and type information, as well as to answer other web-based questions after acquiring the required information from the dependency graph.
- `DepGraphTool` to construct a Neo4j knowledge-graph modeling the dependency structure for a specific package, using the API at [DepsDev](https://deps.dev/).

In response to a user's query about dependencies, the Agent decides whether to use a Cypher query or do a web search. Here is what it looks like in action:
![dependency-demo](../../assets/demos/dependency_chatbot.gif)
Chatting with the `DependencyGraphAgent` (derived from Langroid's `Neo4jChatAgent`). When a user specifies a Python package name (in this case "chainlit"), the agent searches the web using `GoogleSearchTool` to find the version of the package, and then uses the `DepGraphTool` to construct the dependency graph as a neo4j knowledge graph. The agent then answers questions by generating Cypher queries to the knowledge graph, or by searching the web.
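
As a rough sketch of giving a Neo4j agent an additional web-search capability along these lines (this assumes the `kg_rag_agent` configured in the earlier snippet; the actual example script's tool setup may differ):

```python
from langroid.agent.tools.google_search_tool import GoogleSearchTool

# kg_rag_agent is the Neo4jChatAgent configured in the earlier snippet;
# enabling the search tool lets its LLM do web searches in addition to Cypher queries
kg_rag_agent.enable_message(GoogleSearchTool)
dep_task = lr.Task(kg_rag_agent, name="DependencyKG")
dep_task.run("What is the latest version of chainlit on PyPi, and what does it depend on?")
```
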
--- --- title: 'Langroid: Multi-Agent Programming Framework for LLMs' draft: true date: 2024-01-10 authors: - pchalasani categories: - langroid - lancedb - rag - vector-database comments: true --- ## Langroid: Multi-Agent Programming framework for LLMs In this era of Large Language Models (LLMs), there is unprecedented demand to create intelligent applications powered by this transformative technology. What is the best way for developers to harness the potential of LLMs in complex application scenarios? For a variety of technical and practical reasons (context length limitations, LLM brittleness, latency, token-costs), this is not as simple as throwing a task at an LLM system and expecting it to get done. What is needed is a principled programming framework, offering the right set of abstractions and primitives to make developers productive when building LLM applications. ## Langroid's Elegant Multi-Agent Paradigm The [Langroid](https://github.com/langroid/langroid) team (ex-CMU/UW-Madison researchers) has a unique take on this – they have built an open source Python framework to simplify LLM application development, using a Multi-Agent Programming paradigm. Langroid’s architecture is founded on Agents as first-class citizens: they are message-transformers, and accomplish tasks collaboratively via messages. Langroid is emerging as a popular LLM framework; developers appreciate its clean design and intuitive, extensible architecture. Programming with Langroid is natural and even fun: you configure Agents and equip them with capabilities ( such as LLMs, vector-databases, Function-calling/tools), connect them and have them collaborate via messages. This is a “Conversational Programming” paradigm, and works with local/open and remote/proprietary LLMs. (Importantly, it does not use LangChain or any other existing LLM framework).
![Langroid-card](../../assets/langroid-card-ossem-rust-1200x630.png){ width="800" }
An Agent serves as a convenient abstraction, encapsulating the state of LLM conversations, access to vector stores, and various tools (functions or plugins). A Multi-Agent Programming framework naturally aligns with the demands of complex LLM-based applications.
## Connecting Agents via Tasks In Langroid, a ChatAgent has a set of “responder” methods, one for each "entity": an LLM, a human, and a tool-handler. However it does not have any way to iterate through these responders. This is where the Task class comes in: A Task wraps an Agent and gives it the ability to loop through its responders, via the `Task.run()` method. A Task loop is organized around simple rules that govern when a responder is eligible to respond, what is considered a valid response, and when the task is complete. The simplest example of a Task loop is an interactive chat with the human user. A Task also enables an Agent to interact with other agents: other tasks can be added to a task as sub-tasks, in a recursive, hierarchical (or DAG) structure. From a Task’s perspective, sub-tasks are just additional responders, and present the same string-to-string message-transformation interface (function signature) as the Agent’s "native" responders. This is the key to composability of tasks in Langroid, since a sub-task can act the same way as an Agent's "native" responders, and is subject to the same rules of task orchestration. The result is that the same task orchestration mechanism seamlessly enables tool handling, retries when LLM deviates, and delegation to sub-tasks. More details are in the Langroid [quick-start guide](https://langroid.github.io/langroid/quick-start/) ## A Taste of Coding with Langroid To get started with Langroid, simply install it from pypi into your virtual environment: ```bash pip install langroid ``` To directly chat with an OpenAI LLM, define the LLM configuration, instantiate a language model object and interact with it: (Langroid works with non-OpenAI local/propreitary LLMs as well, see their [tutorial](https://langroid.github.io/langroid/tutorials/non-openai-llms/)) For the examples below, ensure you have a file `.env` containing your OpenAI API key with this line: `OPENAI_API_KEY=sk-...`. ```python import langroid as lr import langroid.language_models as lm llm_cfg = lm.OpenAIGPTConfig() # default GPT4-Turbo mdl = lm.OpenAIGPT(llm_cfg) mdl.chat("What is 3+4?", max_tokens=10) ``` The mdl does not maintain any conversation state; for that you need a `ChatAgent`: ```python agent_cfg = lr.ChatAgentConfig(llm=llm_cfg) agent = lr.ChatAgent(agent_cfg) agent.llm_response("What is the capital of China?") agent.llm_response("What about France?") # interprets based on previous msg ``` Wrap a ChatAgent in a Task to create a basic interactive loop with the user: ```python task = lr.Task(agent, name="Bot") task.run("Hello") ``` Have a Teacher Agent talk to a Student Agent: ```python teacher = lr.ChatAgent(agent_cfg) teacher_task = lr.Task( teacher, name="Teacher", system_message=""" Ask your student simple number-based questions, and give feedback. Start with a question. """, ) student = lr.ChatAgent(agent_cfg) student_task = lr.Task( student, name="Student", system_message="Concisely answer your teacher's questions." ) teacher_task.add_sub_task(student_task) teacher_task.run() ``` ## Retrieval Augmented Generation (RAG) and Vector Databases One of the most popular LLM applications is question-answering on documents via Retrieval-Augmented Generation (RAG), powered by a vector database. Langroid has a built-in DocChatAgent that incorporates a number of advanced RAG techniques, clearly laid out so they can be easily understood and extended. ### Built-in Support for LanceDB
![Langroid-lance](../../assets/langroid-lance.png){ width="800" }
Langroid uses LanceDB as the default vector store for its DocChatAgent.
Langroid's DocChatAgent uses the LanceDB serverless vector-database by default. Since LanceDB uses file storage, it is easy to set up and use (no need for Docker or cloud services), and due to its use of the Lance columnar format, it is highly performant and scalable. In addition, Langroid has a specialized `LanceDocChatAgent` that leverages LanceDB's unique features such as full-text search, SQL-like filtering, and pandas dataframe interop. Setting up a basic RAG chatbot is as simple as (assume the previous imports):

```python
from langroid.agent.special.lance_doc_chat_agent import (
    LanceDocChatAgent,
    DocChatAgentConfig,
)

llm_config = lm.OpenAIGPTConfig()
rag_agent_config = DocChatAgentConfig(
    llm=llm_config,
    doc_paths=["/path/to/my/docs"],  # files, folders, or URLs
)
rag_agent = LanceDocChatAgent(rag_agent_config)
rag_task = lr.Task(rag_agent, name="RAG")
rag_task.run()
```

For an example showcasing Tools/Function-calling + RAG in a multi-agent setup, see their quick-start [Colab notebook](https://colab.research.google.com/github/langroid/langroid/blob/main/examples/Langroid_quick_start.ipynb), which shows a 2-agent system where one agent is tasked with extracting structured information from a document, and generates questions for the other agent to answer using RAG. In the Langroid-examples repo there is a [script](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py) with the same functionality, and here is what it looks like in action:
![lease-demo](../../assets/demos/lease-extractor-demo.gif){ width="800" }
Extracting structured info from a Commercial Lease using a 2-agent system, with Tool/Function-calling and RAG. The Extractor Agent is told to extract information in a certain structure, and it generates questions for the Document Agent to answer using RAG.
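The wiring behind such a 2-agent extraction system is compact. Below is a minimal, hypothetical sketch, not the actual `chat_multi_extract.py` script; the prompts, agent names, and file path are illustrative. An Extractor `ChatAgent` is wrapped in a Task, and a `DocChatAgent` that answers its questions via RAG is attached as a sub-task:

```python
import langroid as lr
import langroid.language_models as lm
from langroid.agent.special.doc_chat_agent import DocChatAgent, DocChatAgentConfig

llm_config = lm.OpenAIGPTConfig()

# Extractor: asks questions and assembles the structured result
extractor = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="""
        Extract the lease start date, end date, and monthly rent.
        Ask ONE question at a time; when you have all fields,
        say DONE followed by the filled-in structure.
        """,
    )
)
extractor_task = lr.Task(extractor, name="Extractor", interactive=False)

# Document Agent: answers each question using RAG over the lease document
doc_agent = DocChatAgent(
    DocChatAgentConfig(llm=llm_config, doc_paths=["/path/to/lease.pdf"])
)
doc_task = lr.Task(doc_agent, name="DocAgent", interactive=False, single_round=True)

extractor_task.add_sub_task(doc_task)
extractor_task.run()
```

In the full example, the desired structure is additionally defined as a Langroid `ToolMessage`, so the final extraction is validated by Pydantic rather than left as free-form text.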
## Retrieval Augmented Analytics

One of the unique features of LanceDB is its SQL-like filtering and Pandas dataframe interoperability. LLMs are great at generating SQL queries, and also Pandas computation code such as `df.groupby("col").mean()`. This opens up a very interesting possibility, which we call **Retrieval Augmented Analytics**. Suppose a user has a large dataset of movie descriptions with metadata such as rating, year and genre, and wants to ask:

> What is the highest-rated Comedy movie about college students made after 2010?

It is not hard to imagine that an LLM should be able to generate a **Query Plan** to answer this, consisting of:

- A SQL-like filter: `genre = "Comedy" and year > 2010`
- A Pandas computation: `df.loc[df["rating"].idxmax()]`
- A rephrased query given the filter: "Movie about college students" (used for semantic/lexical search)

Langroid's Multi-Agent framework enables exactly this type of application. The [`LanceRAGTaskCreator`](https://github.com/langroid/langroid/blob/main/langroid/agent/special/lance_rag/lance_rag_task.py) takes a `LanceDocChatAgent` and adds two additional agents:

- QueryPlannerAgent: Generates the Query Plan
- QueryPlanCriticAgent: Critiques the Query Plan and Answer received from the RAG Agent, so that the QueryPlanner can generate a better plan if needed.

Check out the [`lance-rag-movies.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/lance-rag-movies.py) script in the langroid-examples repo to try this out.

## Try it out and get involved!

This was just a glimpse of what you can do with Langroid and how your code would look. Give it a shot and learn more about the features and roadmap of Langroid on their [GitHub repo](https://github.com/langroid/langroid). Langroid welcomes contributions, and they have a friendly [Discord](https://discord.gg/ZU36McDgDs) community. If you like it, don't forget to drop a 🌟.

---

---
title: 'Chat formatting in Local LLMs'
draft: true
date: 2024-01-25
authors:
  - pchalasani
categories:
  - langroid
  - prompts
  - llm
  - local-llm
comments: true
---

In an (LLM performance) investigation, details matter! And assumptions kill (your LLM performance). I'm talking about chat/prompt formatting, especially when working with Local LLMs.

TL;DR -- details like chat formatting matter a LOT, and trusting that the local LLM API is doing it correctly may be a mistake, leading to inferior results.

🤔 Curious? Here are some notes from the trenches when we built an app (https://github.com/langroid/langroid/blob/main/examples/docqa/chat-multi-extract-local.py) based entirely on a locally running Mistral-7b-instruct-v0.2 (yes, ONLY 7B parameters, compared to 175B+ for GPT4!) that leverages Langroid Multi-agents, Tools/Function-calling and RAG to reliably extract structured information from a document, where an Agent is given a spec of the desired structure, and it generates questions for another Agent to answer using RAG.

🔵 LLM API types: generate and chat

LLMs are typically served behind two types of API endpoints:

⏺ A "generation" API, which accepts a dialog formatted as a SINGLE string, and
⏺ a "chat" API, which accepts the dialog as a LIST, and as a convenience formats it into a single string before sending to the LLM.

🔵 Proprietary vs Local LLMs

When you use a proprietary LLM API (such as OpenAI or Claude), for convenience you can use their "chat" API, and you can trust that it will format the dialog history correctly (or else they wouldn't be in business!).
But with a local LLM, you have two choices of where to send the dialog history:

⏺ you could send it to the "chat" API and trust that the server will format it correctly,
⏺ or you could format it yourself and send it to the "generation" API.

🔵 Example of prompt formatting?

Suppose your system prompt and dialog look like this:

System Prompt/Instructions: when I give you a number, respond with its double
User (You): 3
Assistant (LLM): 6
User (You): 9

Mistral-instruct models expect this chat to be formatted like this (note that the system message is combined with the first user message):

"[INST] when I give you a number, respond with its double 3 [/INST] 6 [INST] 9 [/INST]"

🔵 Why does it matter?

It matters A LOT -- because each type of LLM (llama2, mistral, etc) has been trained and/or fine-tuned on chats formatted in a SPECIFIC way, and if you deviate from that, you may get odd/inferior results.

🔵 Using Mistral-7b-instruct-v0.2 via oobabooga/text-generation-webui

"Ooba" is a great library (https://github.com/oobabooga/text-generation-webui) that lets you spin up an OpenAI-like API server for local models, such as llama2, mistral, etc. When we used its chat endpoint for a Langroid Agent, we were getting really strange results, with the LLM sometimes thinking it is the user! 😧

Digging in, we found that their internal formatting template was wrong, and it was formatting the system prompt as if it's the first user message -- this leads to the LLM interpreting the first user message as an assistant response, and so on -- no wonder there was role confusion!

💥 Langroid solution: To avoid these issues, in Langroid we now have a formatter (https://github.com/langroid/langroid/blob/main/langroid/language_models/prompt_formatter/hf_formatter.py) that retrieves the HuggingFace tokenizer for the LLM and uses its "apply_chat_template" method to format chats. This gives you control over the chat format, and you can use the "generation" endpoint of the LLM API instead of the "chat" endpoint. Once we switched to this, results improved dramatically 🚀

Be sure to check out Langroid: https://github.com/langroid/langroid

#llm #ai #opensource

---

---
title: 'Using Langroid with Local LLMs'
draft: false
date: 2023-09-14
authors:
  - pchalasani
categories:
  - langroid
  - llm
  - local-llm
comments: true
---

## Why local models?

There are commercial, remotely served models that currently appear to beat all open/local models. So why care about local models? Local models are exciting for a number of reasons:

- **cost**: other than compute/electricity, there is no cost to use them.
- **privacy**: no concerns about sending your data to a remote server.
- **latency**: no network latency due to remote API calls, so faster response times, provided you can get fast enough inference.
- **uncensored**: some local models are not censored to avoid sensitive topics.
- **fine-tunable**: you can fine-tune them on private/recent data, which current commercial models don't have access to.
- **sheer thrill**: having a model running on your machine with no internet connection, and being able to have an intelligent conversation with it -- there is something almost magical about it.

The main appeal of local models is that with sufficiently careful prompting, they may behave well enough to be useful for specific tasks/domains, and bring all of the above benefits.
Some ideas on how you might use local LLMs: - In a multi-agent system, you could have some agents use local models for narrow tasks with a lower bar for accuracy (and fix responses with multiple tries). - You could run many instances of the same or different models and combine their responses. - Local LLMs can act as a privacy layer, to identify and handle sensitive data before passing to remote LLMs. - Some local LLMs have intriguing features, for example llama.cpp lets you constrain its output using a grammar. ## Running LLMs locally There are several ways to use LLMs locally. See the [`r/LocalLLaMA`](https://www.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_install_llama_8bit_and_4bit/) subreddit for a wealth of information. There are open source libraries that offer front-ends to run local models, for example [`oobabooga/text-generation-webui`](https://github.com/oobabooga/text-generation-webui) (or "ooba-TGW" for short) but the focus in this tutorial is on spinning up a server that mimics an OpenAI-like API, so that any code that works with the OpenAI API (for say GPT3.5 or GPT4) will work with a local model, with just a simple change: set `openai.api_base` to the URL where the local API server is listening, typically `http://localhost:8000/v1`. There are a few libraries we recommend for setting up local models with OpenAI-like APIs: - [LiteLLM OpenAI Proxy Server](https://docs.litellm.ai/docs/proxy_server) lets you set up a local proxy server for over 100+ LLM providers (remote and local). - [ooba-TGW](https://github.com/oobabooga/text-generation-webui) mentioned above, for a variety of models, including llama2 models. - [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) (LCP for short), specifically for llama2 models. - [ollama](https://github.com/jmorganca/ollama) We recommend visiting these links to see how to install and run these libraries. ## Use the local model with the OpenAI library Once you have a server running using any of the above methods, your code that works with the OpenAI models can be made to work with the local model, by simply changing the `openai.api_base` to the URL where the local server is listening. If you are using Langroid to build LLM applications, the framework takes care of the `api_base` setting in most cases, and you need to only set the `chat_model` parameter in the LLM config object for the LLM model you are using. See the [Non-OpenAI LLM tutorial](../../tutorials/non-openai-llms.md) for more details. --- --- title: 'MALADE: Multi-Agent Architecture for Pharmacovigilance' draft: false date: 2024-08-12 authors: - jihye - nils - pchalasani - mengelhard - someshjha - anivaryakumar - davidpage categories: - langroid - multi-agent - neo4j - rag comments: true --- # MALADE: Multi-Agent Architecture for Pharmacovigilance [Published in ML for HealthCare 2024](https://www.mlforhc.org/2024-abstracts) [Arxiv](https://arxiv.org/abs/2408.01869) [GitHub](https://github.com/jihyechoi77/malade) ## Summary We introduce MALADE (**M**ultiple **A**gents powered by **L**LMs for **ADE** Extraction), a multi-agent system for Pharmacovigilance. It is the first effective explainable multi-agent LLM system for extracting Adverse Drug Events (ADEs) from FDA drug labels and drug prescription data. 
Given a drug category and an adverse outcome, MALADE produces: - a qualitative label of risk (`increase`, `decrease` or `no-effect`), - confidence in the label (a number in $[0,1]$), - frequency of effect (`rare`, `common`, or `none`), - strength of evidence (`none`, `weak`, or `strong`), and - a justification with citations. This task is challenging for several reasons: - FDA labels and prescriptions are for individual drugs, not drug categories, so representative drugs in a category need to be identified from patient prescription data, and ADE information found for specific drugs in a category needs to be aggregated to make a statement about the category as a whole, - The data is noisy, with variations in the terminologies of drugs and outcomes, and - ADE descriptions are often buried in large amounts of narrative text. The MALADE architecture is LLM-agnostic and leverages the [Langroid](https://github.com/langroid/langroid) multi-agent framework. It consists of a combination of Agents using Retrieval Augmented Generation (RAG), that iteratively improve their answers based on feedback from Critic Agents. We evaluate the quantitative scores against a ground-truth dataset known as the [*OMOP Ground Truth Task*](https://www.niss.org/sites/default/files/Session3-DaveMadigan_PatrickRyanTalk_mar2015.pdf) and find that MALADE achieves state-of-the-art performance. ## Introduction In the era of Large Language Models (LLMs), given their remarkable text understanding and generation abilities, there is an unprecedented opportunity to develop new, LLM-based methods for trustworthy medical knowledge synthesis, extraction and summarization. The focus of this paper is Pharmacovigilance, a critical task in healthcare, where the goal is to monitor and evaluate the safety of drugs. In particular, the identification of Adverse Drug Events (ADEs) is crucial for ensuring patient safety. Consider a question such as this: > What is the effect of **ACE inhibitors** on the risk of developing **angioedema**? Here the **drug category** $C$ is _ACE inhibitors_, and the **outcome** $O$ is _angioedema_. Answering this question involves several steps: - **1(a): Find all drugs** in the ACE inhibitor category $C$, e.g. by searching the FDA [National Drug Code](https://www.fda.gov/drugs/drug-approvals-and-databases/national-drug-code-directory) (NDC) database. This can be done using Elastic-Search, with filters to handle variations in drug/category names and inaccurate classifications. - **1(b): Find the prescription frequency** of each drug in $C$ from patient prescription data, e.g. the [MIMIC-IV](https://physionet.org/content/mimiciv/3.0/) database. This can be done with a SQL query. - **1(c): Identify the representative drugs** $D \subset C$ in this category, based on prescription frequency data from step 2. - **2:** For each drug $d \in D$, **summarize ADE information** about the effect of $d$ on the outcome $O$ of interest, (in this case angioedema) from text-based pharmaceutical sources, e.g. the [OpenFDA Drug Label](https://open.fda.gov/apis/drug/label/) database. - **3: Aggregate** the information from all drugs in $D$ to make a statement about the category $C$ as a whole. 
## The role of LLMs While steps 1(a) and 1(b) can be done by straightforward deterministic algorithms (SQL queries or Elastic-Search), the remaining steps are challenging but ideally suited to LLMs: ### Step 1(c): Identifying representative drugs in a category from prescription frequency data (`DrugFinder` Agent) This is complicated by noise, such as the same drug appearing multiple times under different names, formulations or delivery methods (For example, the ACE inhibitor **Lisinopril** is also known as **Zestril** and **Prinivil**.) Thus a judgment must be made as to whether these are sufficiently different to be considered pharmacologically distinct; and some of these drugs may not actually belong to the category. This task thus requires a grouping operation, related to the task of identifying standardized drug codes from text descriptions, well known to be challenging. This makes it very difficult to explicitly define the algorithm in a deterministic manner that covers all edge cases (unlike the above database tasks), and hence is well-suited to LLMs, particularly those such as GPT-4, Claude3.5, and similar-strength variants which are known to have been trained on vast amounts of general medical texts. In MALADE, this task is handled by the `DrugFinder` agent, which is an Agent/Critic system where the main agent iteratively improves its output in a feedback loop with the Critic agent. For example, the Critic corrects the Agent when it incorrectly classifies drugs as pharmacologically distinct. ### Step 2: Identifying Drug-Outcome Associations (`DrugOutcomeInfoAgent`) The task here is to identify whether a given drug has an established effect on the risk of a given outcome, based on FDA drug label database, and output a summary of relevant information, including the level of identified risk and the evidence for such an effect. Since this task involves extracting information from narrative text, it is well-suited to LLMs using the Retrieval Augmented Generation (RAG) technique. In MALADE, the `DrugOutcomeInfoAgent` handles this task, and is also an Agent/Critic system, where the Critic provides feedback and corrections to the Agent's output. This agent does not have direct access to the FDA Drug Label data, but can receive this information via another agent, `FDAHandler`. FDAHandler is equipped with **tools** (also known as function-calls) to invoke the OpenFDA API for drug label data, and answers questions in the context of information retrieved based on the queries. Information received from this API is ingested into a vector database, so the agent first uses a tool to query this vector database, and only resorts to the OpenFDA API tool if the vector database does not contain the relevant information. An important aspect of this agent is that its responses include specific **citations** and **excerpts** justifying its conclusions. ### Step 3: Labeling Drug Category-Outcome Associations (`CategoryOutcomeRiskAgent`) To identify association between a drug category C and an adverse health outcome $O$, we concurrently run a batch of queries to copies of `DrugOutcomeInfoAgent`, one for each drug $d$ in the representative-list $D$ for the category, of the form: > Does drug $d$ increase or decrease the risk of condition $O$? The results are sent to `CategoryOutcomeRiskAgent`, which is an Agent/Critic system which performs the final classification step; its goal is to generate the qualitative and quantitative outputs mentioned above. 
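Since the per-drug queries in this step are independent of one another, they can be issued as a batch. The snippet below is a hypothetical simplification, not MALADE's actual code: it uses plain Langroid agents with a placeholder drug list and prompt, whereas the real `DrugOutcomeInfoAgent` copies are RAG-equipped Agent/Critic systems and are run concurrently (Langroid also provides batch utilities for running many copies of a task).

```python
import langroid as lr
import langroid.language_models as lm

# Illustrative placeholders; MALADE's real agents are RAG-equipped Agent/Critic systems.
llm_config = lm.OpenAIGPTConfig(chat_model="gpt-4o")
representative_drugs = ["lisinopril", "enalapril", "ramipril"]  # hypothetical set D
outcome = "angioedema"

drug_reports = {}
for drug in representative_drugs:
    # A fresh agent per drug, so each query starts from a clean history
    agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
    task = lr.Task(agent, interactive=False, single_round=True)
    result = task.run(
        f"Does drug {drug} increase or decrease the risk of condition {outcome}?"
    )
    drug_reports[drug] = result.content if result is not None else ""

# drug_reports is then aggregated by the category-level labeling step
```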
## MALADE Architecture The figure below illustrates how the MALADE architecture handles the query, > What is the effect of **ACE inhibitors** on the risk of developing **angioedema**? ![malade-arch.png](figures/malade-arch.png) The query triggers a sequence of subtasks performed by the three Agents described above: `DrugFinder`, `DrugOutcomeInfoAgent`, and `CategoryOutcomeRiskAgent`. Each Agent generates a response and justification, which are validated by a corresponding Critic agent, whose feedback is used by the Agent to revise its response. ## Evaluation ### OMOP Ground Truth We evaluate the results of MALADE against a well-established ground-truth dataset, the [OMOP ADE ground-truth table](https://www.niss.org/sites/default/files/Session3-DaveMadigan_PatrickRyanTalk_mar2015.pdf), shown below. This is a reference dataset within the Observational Medical Outcomes Partnership (OMOP) Common Data Model that contains validated information about known adverse drug events. ![omop-ground-truth.png](figures/omop-ground-truth.png) ### Confusion Matrix Below is a side-by-side comparison of this ground-truth dataset (left) with MALADE's labels (right), ignoring blue cells (see the paper for details): ![omop-results.png](figures/omop-results.png) The resulting confusion-matrix for MALADE is shown below: ![confusion.png](figures/confusion.png) ### AUC Metric Since MALADE produces qualitative and quantitative outputs, the paper explores a variety of ways to evaluate its performance against the OMOP ground-truth dataset. Here we focus on the label output $L$ (i.e. `increase`, `decrease`, or `no-effect`), and its associated confidence score $c$, and use the Area Under the ROC Curve (AUC) as the evaluation metric. The AUC metric is designed for binary classification, so we transform the three-class label output $L$ and confidence score $c$ to a binary classification score $p$ as follows. We treat $L$ = `increase` as the positive class, and $L$ = `decrease` or `no-effect` as the negative class, and we transform the label confidence score $c$ into a probability $p$ of `increase` as follows: - if the label output is `increase`, $p = (2+c)/3$, - if the label output is `no-effect`, $p = (2-c)/3$, and - if the label output is `decrease` , $p = (1-c)/3$. These transformations align with two intuitions: (a) a *higher* confidence in `increase` corresponds to a *higher* probability of `increase`, and a *higher* confidence in `no-effect` or `decrease` corresponds to a *lower* probability of `increase`, and (b) for a given confidence score $c$, the progression of labels `decrease`, `no-effect`, and `increase` corresponds to *increasing* probabilities of `increase`. The above transformations ensure that the probability $p$ is in the range $[0,1]$ and scales linearly with the confidence score $c$. We ran the full MALADE system for all drug-category/outcome pairs in the OMOP ground-truth dataset, and then computed the AUC for the score $p$ against the ground-truth binary classification label. With `GPT-4-Turbo` we obtained an AUC of 0.85, while `GPT-4o` resulted in an AUC of 0.90. These are state-of-the-art results for this specific ADE-extraction task. ### Ablations An important question the paper investigates is whether (and how much) the various components (RAG, critic agents, etc) contribute to MALADE's performance. To answer this, we perform ablations, where we remove one or more components from the MALADE system and evaluate the performance of the resulting system. 
For example, we found that dropping the Critic agents reduces the AUC (using `GPT-4-Turbo`) from 0.85 to 0.82 (see the paper, Appendix D, for more ablation results).

### Variance of LLM-generated Scores

When using an LLM to generate numerical scores, it is important to understand the variance in the scores. For example, if a single "full" run of MALADE (i.e. for all drug-category/outcome pairs in the OMOP ground-truth dataset) produces a certain AUC, was it a "lucky" run, or is the AUC relatively stable across runs? Ideally one would investigate this by repeating the full run of MALADE many times, but given the expense of running a full experiment, we focus on just three representative cells in the OMOP table, one corresponding to each possible ground-truth label, run MALADE 10 times for each cell, and study the distribution of $p$ (the probability of increased risk, translated from the confidence score using the method described above) for each output label.

Encouragingly, we find that the distribution of $p$ shows clear separation between the three labels, as in the figure below (the $x$-axis ranges from 0 to 1, and the three colored groups of bars represent, from left to right, the `decrease`, `no-effect`, and `increase` labels). Full details are in Appendix D of the paper.

![img.png](figures/variance-histogram.png)

---

---
title: 'Multi Agent Debate and Education Platform'
draft: false
date: 2025-02-04
authors:
  - adamshams
categories:
  - langroid
  - llm
  - local-llm
  - chat
comments: true
---

## Introduction

Have you ever imagined a world where we can debate complex issues with Generative AI agents taking a distinct stance and backing their arguments with evidence? Some will change your mind, and some will reveal the societal biases that each Large Language Model (LLM) was trained on. Introducing an [AI-powered debate platform](https://github.com/langroid/langroid/tree/main/examples/multi-agent-debate) that brings this imagination to reality, leveraging diverse LLMs and the Langroid multi-agent programming framework. The system enables users to engage in structured debates with an AI taking the opposite stance (or even two AIs debating each other), using a multi-agent architecture with Langroid's powerful framework, where each agent embodies a specific ethical perspective, creating realistic and dynamic interactions. Agents are prompt-engineered and role-tuned to align with their assigned ethical stance, ensuring thoughtful and structured debates.

My motivations for creating this platform included:

- A debate coach for underserved students without access to traditional resources.
- A tool for research and for generating arguments from authentic sources.
- An adaptable education platform for learning both sides of the coin on any topic.
- Reducing the echo chambers perpetuated by online algorithms by fostering two-sided debates on any topic, promoting education and awareness around misinformation.
- A research tool to study the varieties of biases in LLMs, which are often trained on text reflecting societal biases.
- Identifying a good multi-agent framework designed for programming with LLMs.

## Platform Features:

### Dynamic Agent Generation:

The platform features five types of agents: Pro, Con, Feedback, Research, and Retrieval Augmented Generation (RAG) Q&A. Each agent is dynamically generated using role-tuned and engineered prompts, ensuring diverse and engaging interactions.
#### Pro and Con Agents:

These agents engage in the core debate, arguing for and against the chosen topic. Their prompts are carefully engineered to ensure they stay true to their assigned ethical stance.

#### Feedback Agent:

This agent provides real-time feedback on the arguments and declares a winner. The evaluation criteria are based on the well-known Lincoln–Douglas debate format, and include:

- Clash of Values
- Argumentation
- Cross-Examination
- Rebuttals
- Persuasion
- Technical Execution
- Adherence to Debate Etiquette
- Final Focus

#### Research Agent:

This agent has the following functionalities:

- Utilizes the `MetaphorSearchTool` and the `Metaphor` (now called `Exa`) Search API to conduct web searches, combined with Retrieval Augmented Generation (RAG), to surface relevant web references for user education about the selected topic.
- Produces a summary of arguments for and against the topic.
- RAG-based document chat with the resources identified through Web Search.

#### RAG Q&A Agent:

- Provides Q&A capability using a RAG-based chat interaction with the resources identified through Web Search. The agent utilizes the `DocChatAgent` that is part of the Langroid framework, which orchestrates all LLM interactions.
- Rich chunking parameters allow the user to get optimized relevance results. Check out `config.py` for details.

### Topic Adaptability:

Easily adaptable to any subject by simply adding pro and con system messages. This makes it a versatile tool for exploring diverse topics and fostering critical thinking. Default topics cover ethics and the use of AI for the following:

- Healthcare
- Intellectual property
- Societal biases
- Education

### Autonomous or Interactive:

Engage in a manual debate with a pro or con agent, or watch the agents debate autonomously while adjusting the number of turns.

### Diverse LLM Selection Adaptable per Agent:

Configurable to select from diverse commercial and open-source models (OpenAI, Google, and Mistral) to experiment with responses from diverse perspectives. Users can select a unique LLM for each agent.

### LLM Tool/Function Integration:

Utilizes LLM tool/function features to conduct semantic search using the Metaphor Search API and summarize the pro and con perspectives for education.

### Configurable LLM Parameters:

LLM parameters like temperature and minimum/maximum output tokens are configurable, allowing for customization of the AI's responses. For Q&A with the searched resources, several parameters can be tuned in the `config` to enhance response relevance.

### Modular Design:

Reusable code, modularized for other LLM applications.

## Interaction

1. Decide if you want to use the same LLM for all agents or different ones.
2. Decide if you want an autonomous debate between AI agents, or user vs. AI agent.
3. Select a debate topic.
4. Choose your side (Pro or Con).
5. Engage in a debate by providing arguments and receiving responses from agents.
6. Request feedback at any time by typing `f`.
7. Decide if you want Metaphor Search to run, to find topic-relevant web links and summarize them.
8. Decide if you want to chat with the documents extracted from the URLs found, to learn more about the topic.
9. End the debate manually by typing `done`. If you decide to chat with the documents, you can end that session by typing `x`.

## Why was Langroid chosen?

I chose the Langroid framework because it's a principled multi-agent programming framework inspired by the Actor framework.
Prior to using Langroid, I developed a multi-agent debate system; however, I had to write a lot of tedious code to manage the states of communication between debating agents, and the user's interactions with LLMs. Langroid allowed me to seamlessly integrate multiple LLMs, easily create agents and tasks, and attach sub-tasks.

### Agent Creation Code Example

```python
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig
from langroid.language_models.openai_gpt import OpenAIGPTConfig


def create_chat_agent(
    name: str, llm_config: OpenAIGPTConfig, system_message: str
) -> ChatAgent:
    return ChatAgent(
        ChatAgentConfig(
            llm=llm_config,
            name=name,
            system_message=system_message,
        )
    )
```

#### Sample Pro Topic Agent Creation

```python
pro_agent = create_chat_agent(
    "Pro",
    pro_agent_config,
    system_messages.messages[pro_key].message + DEFAULT_SYSTEM_MESSAGE_ADDITION,
)
```

The `Task` mechanism in Langroid provides a robust way to manage complex interactions within multi-agent systems. `Task` serves as a container for managing the flow of interactions between different agents (such as chat agents) and attached sub-tasks. `Task` also helps with turn-taking, handling responses, and ensuring smooth transitions between dialogue states. Each Task object is responsible for coordinating responses from its assigned agent, deciding the sequence of responder methods (`llm_response`, `user_response`, `agent_response`), and managing transitions between different stages of a conversation or debate. Each agent can focus on its specific role while the task structure handles the overall process's orchestration and flow, allowing a clear separation of concerns. The architecture and code transparency of Langroid's framework make it an incredible candidate for applications like debates, where multiple agents must interact dynamically and responsively based on a mixture of user inputs and automated responses.

### Task creation and Orchestration Example

```python
user_task = Task(user_agent, interactive=interactive_setting, restart=False)
ai_task = Task(ai_agent, interactive=False, single_round=True)
user_task.add_sub_task(ai_task)
if not llm_delegate:
    user_task.run(user_agent.user_message, turns=max_turns)
else:
    user_task.run("get started", turns=max_turns)
```

Tasks can be easily set up as sub-tasks of an orchestrating agent. In this case `user_task` could be Pro or Con, depending on the user's selection. If you want to build custom tools/functions, or use the ones Langroid provides, enabling one is only a line of code using `agent.enable_message`. Here is an example with `MetaphorSearchTool` and `DoneTool`:

```python
metaphor_search_agent.enable_message(MetaphorSearchTool)
metaphor_search_agent.enable_message(DoneTool)
```

Overall, I had a great learning experience using Langroid and recommend it for any project that needs to utilize LLMs. I am already working on a few Langroid-based information retrieval and research systems for use in medicine, and hope to contribute more soon.

### Bio

I'm a high school senior at Khan Lab School in Mountain View, CA, where I host a student-run podcast known as the Khan-Cast. I also enjoy tinkering with interdisciplinary STEM projects. You can reach me on [LinkedIn](https://www.linkedin.com/in/adamshams/).

---

---
draft: true
date: 2022-01-31
authors:
  - pchalasani
categories:
  - test
  - blog
comments: true
---

# Test code snippets

```python
from langroid.language_models.base import LLMMessage, Role

msg = LLMMessage(
    content="What is the capital of Bangladesh?",
    role=Role.USER,
)
```

# Test math notation

A nice equation is $e^{i\pi} + 1 = 0$, which is known as Euler's identity.
Here is a cool equation too, and in display mode: $$ e = mc^2 $$ # Latex with newlines Serious latex with `\\` for newlines renders fine: $$ \begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} $$ or a multi-line equation $$ \begin{aligned} \dot{x} & = \sigma(y-x) \\ \dot{y} & = \rho x - y - xz \\ \dot{z} & = -\beta z + xy \end{aligned} $$ --- # Audience Targeting for a Business Suppose you are a marketer for a business, trying to figure out which audience segments to target. Your downstream systems require that you specify _standardized_ audience segments to target, for example from the [IAB Audience Taxonomy](https://iabtechlab.com/standards/audience-taxonomy/). There are thousands of standard audience segments, and normally you would need to search the list for potential segments that match what you think your ideal customer profile is. This is a tedious, error-prone task. But what if we can leverage an LLM such as GPT-4? We know that GPT-4 has skills that are ideally suited for this task: - General knowledge about businesses and their ideal customers - Ability to recognize which standard segments match an English description of a customer profile - Ability to plan a conversation to get the information it needs to answer a question Once you decide to use an LLM, you still need to figure out how to organize the various components of this task: - **Research:** What are some ideal customer profiles for the business - **Segmentation:** Which standard segments match an English description of a customer profile - **Planning:** how to organize the task to identify a few standard segments ## Using Langroid Agents Langroid makes it intuitive and simple to build an LLM-powered system organized around agents, each responsible for a different task. In less than a day we built a 3-agent system to automate this task: - The `Marketer` Agent is given the Planning role. - The `Researcher` Agent is given the Research role, and it has access to the business description. - The `Segmentor` Agent is given the Segmentation role. It has access to the IAB Audience Taxonomy via a vector database, i.e. its rows have been mapped to vectors via an embedding model, and these vectors are stored in a vector-database. Thus given an English description of a customer profile, the `Segmentor` Agent maps it to a vector using the embedding model, and retrieves the nearest (in vector terms, e.g. cosine similarity) IAB Standard Segments from the vector-database. The Segmentor's LLM further refines this by selecting the best-matching segments from the retrieved list. To kick off the system, the human user describes a business in English, or provides the URL of the business's website. The `Marketer` Agent sends customer profile queries to the `Researcher`, who answers in plain English based on the business description, and the Marketer takes this description and sends it to the Segmentor, who maps it to Standard IAB Segments. The task is done when the Marketer finds 4 Standard segments. The agents are depicted in the diagram below: ![targeting.png](targeting.png) ## An example: Glashutte Watches The human user first provides the URL of the business, in this case: ```text https://www.jomashop.com/glashutte-watches.html ``` From this URL, the `Researcher` agent summarizes its understanding of the business. The `Marketer` agent starts by asking the `Researcher`: ``` Could you please describe the age groups and interests of our typical customer? 
``` The `Researcher` responds with an English description of the customer profile: ```text Our typical customer is a fashion-conscious individual between 20 and 45 years... ``` The `Researcher` forwards this English description to the `Segmentor` agent, who maps it to a standardized segment, e.g.: ```text Interest|Style & Fashion|Fashion Trends ... ``` This conversation continues until the `Marketer` agent has identified 4 standardized segments. Here is what the conversation looks like: ![targeting.gif](targeting.gif) --- # Hierarchical computation with Langroid Agents Here is a simple example showing tree-structured computation where each node in the tree is handled by a separate agent. This is a toy numerical example, and illustrates: - how to have agents organized in a hierarchical structure to accomplish a task - the use of global state accessible to all agents, and - the use of tools/function-calling. ## The Computation We want to carry out the following calculation for a given input number $n$: ```python def Main(n): if n is odd: return (3*n+1) + n else: if n is divisible by 10: return n/10 + n else: return n/2 + n ``` ## Using function composition Imagine we want to do this calculation using a few auxiliary functions: ```python def Main(n): # return non-null value computed by Odd or Even Record n as global variable # to be used by Adder below return Odd(n) or Even(n) def Odd(n): # Handle odd n if n is odd: new = 3*n+1 return Adder(new) else: return None def Even(n): # Handle even n: return non-null value computed by EvenZ or EvenNZ return EvenZ(n) or EvenNZ(n) def EvenZ(n): # Handle even n divisible by 10, i.e. ending in Zero if n is divisible by 10: new = n/10 return Adder(new) else: return None def EvenNZ(n): # Handle even n not divisible by 10, i.e. not ending in Zero if n is not divisible by 10: new = n/2 return Adder(new) else: return None def Adder(new): # Add new to starting number, available as global variable n return new + n ``` ## Mapping to a tree structure This compositional/nested computation can be represented as a tree: ```plaintext Main / \ Even Odd / \ \ EvenZ EvenNZ Adder | | Adder Adder ``` Let us specify the behavior we would like for each node, in a "decoupled" way, i.e. we don't want a node to be aware of the other nodes. As we see later, this decoupled design maps very well onto Langroid's multi-agent task orchestration. To completely define the node behavior, we need to specify how it handles an "incoming" number $n$ (from a parent node or user), and how it handles a "result" number $r$ (from a child node). - `Main`: - incoming $n$: simply send down $n$, record the starting number $n_0 = n$ as a global variable. - result $r$: return $r$. - `Odd`: - incoming $n$: if n is odd, send down $3*n+1$, else return None - result $r$: return $r$ - `Even`: - incoming $n$: if n is even, send down $n$, else return None - result $r$: return $r$ - `EvenZ`: (guaranteed by the tree hierarchy, to receive an even number.) - incoming $n$: if n is divisible by 10, send down $n/10$, else return None - result $r$: return $r$ - `EvenNZ`: (guaranteed by the tree hierarchy, to receive an even number.) - incoming $n$: if n is not divisible by 10, send down $n/2$, else return None - result $r$: return $r$ - `Adder`: - incoming $n$: return $n + n_0$ where $n_0$ is the starting number recorded by Main as a global variable. - result $r$: Not applicable since `Adder` is a leaf node. 
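For reference, here is a small runnable version of the computation above in plain Python. This is a hypothetical sketch: function names are lowercased, the pseudocode's "is odd"/"is divisible by" checks become explicit modulo tests, integer division is assumed, and a module-level variable stands in for the global state.

```python
def main(n: int) -> int:
    global n0
    n0 = n  # record the starting number, used by adder()
    result = odd(n)
    return result if result is not None else even(n)

def odd(n: int):
    # Handle odd n
    return adder(3 * n + 1) if n % 2 == 1 else None

def even(n: int):
    # Handle even n: return the non-null value computed by even_z or even_nz
    result = even_z(n)
    return result if result is not None else even_nz(n)

def even_z(n: int):
    # Handle even n divisible by 10, i.e. ending in zero
    return adder(n // 10) if n % 10 == 0 else None

def even_nz(n: int):
    # Handle even n not divisible by 10
    return adder(n // 2) if n % 10 != 0 else None

def adder(new: int) -> int:
    # Add the new number to the starting number n0
    return new + n0

print(main(12))  # 12 is even, not divisible by 10: 12/2 + 12 = 18
```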
## From tree nodes to Langroid Agents Let us see how we can perform this calculation using multiple Langroid agents, where - we define an agent corresponding to each of the nodes above, namely `Main`, `Odd`, `Even`, `EvenZ`, `EvenNZ`, and `Adder`. - we wrap each Agent into a Task, and use the `Task.add_subtask()` method to connect the agents into the desired hierarchical structure. Below is one way to do this using Langroid. We designed this with the following desirable features: - Decoupling: Each agent is instructed separately, without mention of any other agents (E.g. Even agent does not know about Odd Agent, EvenZ agent, etc). In particular, this means agents will not be "addressing" their message to specific other agents, e.g. send number to Odd agent when number is odd, etc. Allowing addressing would make the solution easier to implement, but would not be a decoupled solution. Instead, we want Agents to simply put the number "out there", and have it handled by an applicable agent, in the task loop (which consists of the agent's responders, plus any sub-task `run` methods). - Simplicity: Keep the agent instructions relatively simple. We would not want a solution where we have to instruct the agents (their LLMs) in convoluted ways. One way naive solutions fail is because agents are not able to distinguish between a number that is being "sent down" the tree as input, and a number that is being "sent up" the tree as a result from a child node. We use a simple trick: we instruct the LLM to mark returned values using the RESULT keyword, and instruct the LLMs on how to handle numbers that come with RESULT keyword, and those that don't In addition, we leverage some features of Langroid's task orchestration: - When `llm_delegate` is `True`, if the LLM says `DONE [rest of msg]`, the task is considered done, and the result of the task is `[rest of msg]` (i.e the part after `DONE`). - In the task loop's `step()` function (which seeks a valid message during a turn of the conversation) when any responder says `DO-NOT-KNOW`, it is not considered a valid message, and the search continues to other responders, in round-robin fashion. See the [`chat-tree.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py) example for an implementation of this solution. You can run that example as follows: ```bash python3 examples/basic/chat-tree.py ``` In the sections below we explain the code in more detail. ## Define the agents Let us start with defining the configuration to be used by all agents: ```python from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig from langroid.language_models.openai_gpt import OpenAIChatModel, OpenAIGPTConfig config = ChatAgentConfig( llm=OpenAIGPTConfig( chat_model=OpenAIChatModel.GPT4o, ), vecdb=None, # no need for a vector database ) ``` Next we define each of the agents, for example: ```python main_agent = ChatAgent(config) ``` and similarly for the other agents. ## Wrap each Agent in a Task To allow agent interactions, the first step is to wrap each agent in a Task. When we define the task, we pass in the instructions above as part of the system message. Recall the instructions for the `Main` agent: - `Main`: - incoming $n$: simply send down $n$, record the starting number $n_0 = n$ as a global variable. - result $r$: return $r$. We include the equivalent of these instructions in the `main_task` that wraps the `main_agent`: ```python from langroid.agent.task import Task main_task = Task( main_agent, name="Main", interactive=False, #(1)! 
system_message=""" You will receive two types of messages, to which you will respond as follows: INPUT Message format: In this case simply write the , say nothing else. RESULT Message format: RESULT In this case simply say "DONE ", e.g.: DONE 19 To start off, ask the user for the initial number, using the `ask_num` tool/function. """, llm_delegate=True, # allow LLM to control end of task via DONE single_round=False, ) ``` 1. Non-interactive: don't wait for user input in each turn There are a couple of points to highlight about the `system_message` value in this task definition: - When the `Main` agent receives just a number, it simply writes out that number, and in the Langroid Task loop, this number becomes the "current pending message" to be handled by one of the sub-tasks, i.e. `Even, Odd`. Note that these sub-tasks are _not_ mentioned in the system message, consistent with the decoupling principle. - As soon as either of these sub-tasks returns a non-Null response, in the format "RESULT ", the `Main` agent is instructed to return this result saying "DONE ". Since `llm_delegate` is set to `True` (meaning the LLM can decide when the task has ended), this causes the `Main` task to be considered finished and the task loop is exited. Since we want the `Main` agent to record the initial number as a global variable, we use a tool/function `AskNum` defined as follows (see [this section](../quick-start/chat-agent-tool.md) in the getting started guide for more details on Tools): ```python from rich.prompt import Prompt from langroid.agent.tool_message import ToolMessage class AskNumTool(ToolMessage): request = "ask_num" purpose = "Ask user for the initial number" def handle(self) -> str: """ This is a stateless tool (i.e. does not use any Agent member vars), so we can define the handler right here, instead of defining an `ask_num` method in the agent. """ num = Prompt.ask("Enter a number") # record this in global state, so other agents can access it MyGlobalState.set_values(number=num) return str(num) ``` We then enable the `main_agent` to use and handle messages that conform to the `AskNum` tool spec: ```python main_agent.enable_message(AskNumTool) ``` !!! tip "Using and Handling a tool/function" "Using" a tool means the agent's LLM _generates_ the function-call (if using OpenAI function-calling) or the JSON structure (if using Langroid's native tools mechanism) corresponding to this tool. "Handling" a tool refers to the Agent's method recognizing the tool and executing the corresponding code. The tasks for other agents are defined similarly. We will only note here that the `Adder` agent needs a special tool `AddNumTool` to be able to add the current number to the initial number set by the `Main` agent. ## Connect the tasks into a tree structure So far, we have wrapped each agent in a task, in isolation, and there is no connection between the tasks. The final step is to connect the tasks to the tree structure we saw earlier: ```python main_task.add_sub_task([even_task, odd_task]) even_task.add_sub_task([evenz_task, even_nz_task]) evenz_task.add_sub_task(adder_task) even_nz_task.add_sub_task(adder_task) odd_task.add_sub_task(adder_task) ``` Now all that remains is to run the main task: ```python main_task.run() ``` Here is what a run starting with $n=12$ looks like: ![chat-tree.png](chat-tree.png) --- # Guide to examples in `langroid-examples` repo !!! warning "Outdated" This guide is from Feb 2024; there have been numerous additional examples since then. 
We recommend you visit the `examples` folder in the core `langroid` repo for the most up-to-date examples. These examples are periodically copied over to the `examples` folder in the `langroid-examples` repo. The [`langroid-examples`](https://github.com/langroid/langroid-examples) repo contains several examples of using the [Langroid](https://github.com/langroid/langroid) agent-oriented programming framework for LLM applications. Below is a guide to the examples. First please ensure you follow the installation instructions in the `langroid-examples` repo README. **At minimum a GPT4-compatible OpenAI API key is required.** As currently set up, many of the examples will _not_ work with a weaker model. Weaker models may require more detailed or different prompting, and possibly a more iterative approach with multiple agents to verify and retry, etc — this is on our roadmap. All the example scripts are meant to be run on the command line. In each script there is a description and sometimes instructions on how to run the script. NOTE: When you run any script, it pauses for “human” input at every step, and depending on the context, you can either hit enter to continue, or in case there is a question/response expected from the human, you can enter your question or response and then hit enter. ### Basic Examples - [`/examples/basic/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat.py) This is a basic chat application. - Illustrates Agent task loop. - [`/examples/basic/autocorrect.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/autocorrect.py) Chat with autocorrect: type fast and carelessly/lazily and the LLM will try its best to interpret what you want, and offer choices when confused. - Illustrates Agent task loop. - [`/examples/basic/chat-search.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat-search.py) This uses a `GoogleSearchTool` function-call/tool to answer questions using a google web search if needed. Try asking questions about facts known after Sep 2021 (GPT4 training cutoff), like `when was llama2 released` - Illustrates Agent + Tools/function-calling + web-search - [`/examples/basic/chat-tree.py`](https://github.com/langroid/langroid-examples/blob/main/examples/basic/chat-tree.py) is a toy example of tree-structured multi-agent computation, see a detailed writeup [here.](https://langroid.github.io/langroid/examples/agent-tree/) - Illustrates multi-agent task collaboration, task delegation. ### Document-chat examples, or RAG (Retrieval Augmented Generation) - [`/examples/docqa/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py) is a document-chat application. Point it to local file, directory or web url, and ask questions - Illustrates basic RAG - [`/examples/docqa/chat-search.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat-search.py): ask about anything and it will try to answer based on docs indexed in vector-db, otherwise it will do a Google search, and index the results in the vec-db for this and later answers. - Illustrates RAG + Function-calling/tools - [`/examples/docqa/chat_multi.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi.py): — this is a 2-agent system that will summarize a large document with 5 bullet points: the first agent generates questions for the retrieval agent, and is done when it gathers 5 key points. 
- Illustrates 2-agent collaboration + RAG to summarize a document - [`/examples/docqa/chat_multi_extract.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py): — extracts structured info from a lease document: Main agent asks questions to a retrieval agent. - Illustrates 2-agent collaboration, RAG, Function-calling/tools, Structured Information Extraction. ### Data-chat examples (tabular, SQL) - [`/examples/data-qa/table_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/table_chat): - point to a URL or local csv file and ask questions. The agent generates pandas code that is run within langroid. - Illustrates function-calling/tools and code-generation - [`/examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/sql-chat/sql_chat.py): — chat with a sql db — ask questions in English, it will generate sql code to answer them. See [tutorial here](https://langroid.github.io/langroid/tutorials/postgresql-agent/) - Illustrates function-calling/tools and code-generation --- # Langroid: Harness LLMs with Multi-Agent Programming ## The LLM Opportunity Given the remarkable abilities of recent Large Language Models (LLMs), there is an unprecedented opportunity to build intelligent applications powered by this transformative technology. The top question for any enterprise is: how best to harness the power of LLMs for complex applications? For technical and practical reasons, building LLM-powered applications is not as simple as throwing a task at an LLM-system and expecting it to do it. ## Langroid's Multi-Agent Programming Framework Effectively leveraging LLMs at scale requires a *principled programming framework*. In particular, there is often a need to maintain multiple LLM conversations, each instructed in different ways, and "responsible" for different aspects of a task. An *agent* is a convenient abstraction that encapsulates LLM conversation state, along with access to long-term memory (vector-stores) and tools (a.k.a functions or plugins). Thus a **Multi-Agent Programming** framework is a natural fit for complex LLM-based applications. > Langroid is the first Python LLM-application framework that was explicitly designed with Agents as first-class citizens, and Multi-Agent Programming as the core design principle. The framework is inspired by ideas from the [Actor Framework](https://en.wikipedia.org/wiki/Actor_model). Langroid allows an intuitive definition of agents, tasks and task-delegation among agents. There is a principled mechanism to orchestrate multi-agent collaboration. Agents act as message-transformers, and take turns responding to (and transforming) the current message. The architecture is lightweight, transparent, flexible, and allows other types of orchestration to be implemented; see the (WIP) [langroid architecture document](blog/posts/langroid-architecture.md). Besides Agents, Langroid also provides simple ways to directly interact with LLMs and vector-stores. See the Langroid [quick-tour](tutorials/langroid-tour.md). ## Highlights - **Agents as first-class citizens:** The `Agent` class encapsulates LLM conversation state, and optionally a vector-store and tools. Agents are a core abstraction in Langroid; Agents act as _message transformers_, and by default provide 3 _responder_ methods, one corresponding to each entity: LLM, Agent, User. 
- **Tasks:** A Task class wraps an Agent, gives the agent instructions (or roles, or goals), manages iteration over an Agent's responder methods, and orchestrates multi-agent interactions via hierarchical, recursive task-delegation. The `Task.run()` method has the same type-signature as an Agent's responder methods, and this is key to how a task of an agent can delegate to other sub-tasks: from the point of view of a Task, sub-tasks are simply additional responders, to be used in a round-robin fashion after the agent's own responders.
- **Modularity, Reusability, Loose coupling:** The `Agent` and `Task` abstractions allow users to design Agents with specific skills, wrap them in Tasks, and combine tasks in a flexible way.
- **LLM Support**: Langroid works with practically any LLM, local/open or remote/proprietary/API-based, via a variety of libraries and providers. See guides to using [local LLMs](tutorials/local-llm-setup.md) and [non-OpenAI LLMs](tutorials/non-openai-llms.md). See [Supported LLMs](tutorials/supported-models.md).
- **Caching of LLM prompts, responses:** Langroid by default uses [Redis](https://redis.com/try-free/) for caching.
- **Vector-stores**: [Qdrant](https://qdrant.tech/), [Chroma](https://www.trychroma.com/) and [LanceDB](https://www.lancedb.com/) are currently supported. Vector stores allow for Retrieval-Augmented-Generation (RAG).
- **Grounding and source-citation:** Access to external documents via vector-stores allows for grounding and source-citation.
- **Observability, Logging, Lineage:** Langroid generates detailed logs of multi-agent interactions and maintains provenance/lineage of messages, so that you can trace back the origin of a message.
- **Tools/Plugins/Function-calling**: Langroid supports OpenAI's recently released [function calling](https://platform.openai.com/docs/guides/gpt/function-calling) feature. In addition, Langroid has its own native equivalent, which we call **tools** (also known as "plugins" in other contexts). Function calling and tools have the same developer-facing interface, implemented using [Pydantic](https://docs.pydantic.dev/latest/), which makes it very easy to define tools/functions and enable agents to use them. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it!

Don't worry if some of these terms are not clear to you. The [Getting Started Guide](quick-start/index.md) and subsequent pages will help you get up to speed.

---

# Suppressing output in async, streaming mode

Available since version 0.18.0

When using an LLM API in streaming + async mode, you may want to suppress output, especially when concurrently running multiple instances of the API. To suppress output in async + stream mode, you can set the `async_stream_quiet` flag in [`LLMConfig`][langroid.language_models.base.LLMConfig] to `True` (this is the default). Note that [`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] inherits from `LLMConfig`, so you can use this flag with `OpenAIGPTConfig` as well:

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    async_stream_quiet=True,
    ...
)
```

---

# Azure OpenAI Models

To use OpenAI models deployed on Azure, first ensure a few environment variables are defined (either in your `.env` file or in your environment):

- `AZURE_OPENAI_API_KEY`, from the value of `API_KEY`
- `AZURE_OPENAI_API_BASE`, from the value of `ENDPOINT`; this typically looks like `https://your_resource.openai.azure.com`.
- For `AZURE_OPENAI_API_VERSION`, you can use the default value in `.env-template`, and the latest version can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#azure-openai-chat-completion-general-availability-ga)
- `AZURE_OPENAI_DEPLOYMENT_NAME` is an OPTIONAL deployment name which may be defined by the user during the model setup.
- `AZURE_OPENAI_CHAT_MODEL`: Azure OpenAI allows specific model names when you select the model for your deployment. You must use exactly the model name that was selected. For example, GPT-3.5 (should be `gpt-35-turbo-16k` or `gpt-35-turbo`) or GPT-4 (should be `gpt-4-32k` or `gpt-4`).
- `AZURE_OPENAI_MODEL_NAME` (Deprecated, use `AZURE_OPENAI_CHAT_MODEL` instead).

This page [Microsoft Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-python#environment-variables) provides more information on how to obtain these values.

To use an Azure-deployed model in Langroid, you can use the `AzureConfig` class:

```python
import langroid.language_models as lm
import langroid as lr

llm_config = lm.AzureConfig(
    chat_model="gpt-4o"
    # the other settings can be provided explicitly here,
    # or are obtained from the environment
)

llm = lm.AzureGPT(config=llm_config)

response = llm.chat(
    messages=[
        lm.LLMMessage(role=lm.Role.SYSTEM, content="You are a helpful assistant."),
        lm.LLMMessage(role=lm.Role.USER, content="3+4=?"),
    ]
)

agent = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="You are a helpful assistant.",
    )
)

response = agent.llm_response("is 4 odd?")
print(response.content)  # "No, 4 is an even number."
response = agent.llm_response("what about 2?")  # follow-up question
```

---

# Document Chunking/Splitting in Langroid

Langroid's [`ParsingConfig`][langroid.parsing.parser.ParsingConfig] provides several document chunking strategies through the `Splitter` enum:

## 1. MARKDOWN (`Splitter.MARKDOWN`) (The default)

**Purpose**: Structure-aware splitting that preserves markdown formatting.

**How it works**:

- Preserves document hierarchy (headers and sections)
- Enriches chunks with header information
- Uses word count instead of token count (with adjustment factor)
- Supports "rollup" to maintain document structure
- Ideal for markdown documents where preserving formatting is important

## 2. TOKENS (`Splitter.TOKENS`)

**Purpose**: Creates chunks of approximately equal token size.

**How it works**:

- Tokenizes the text using tiktoken
- Aims for chunks of size `chunk_size` tokens (default: 200)
- Looks for natural breakpoints like punctuation or newlines
- Prefers splitting at sentence/paragraph boundaries
- Ensures chunks are at least `min_chunk_chars` long (default: 350)

## 3. PARA_SENTENCE (`Splitter.PARA_SENTENCE`)

**Purpose**: Splits documents respecting paragraph and sentence boundaries.
**How it works**:

- Recursively splits documents until chunks are below 1.3× the target size
- Maintains document structure by preserving natural paragraph breaks
- Adjusts chunk boundaries to avoid cutting in the middle of sentences
- Stops when it can't split chunks further without breaking coherence

## 4. SIMPLE (`Splitter.SIMPLE`)

**Purpose**: Basic splitting using predefined separators.

**How it works**:

- Uses a list of separators to split text (default: `["\n\n", "\n", " ", ""]`)
- Splits on the first separator in the list
- Doesn't attempt to balance chunk sizes
- Simplest and fastest splitting method

## Basic Configuration

```python
from langroid.parsing.parser import ParsingConfig, Splitter

config = ParsingConfig(
    splitter=Splitter.MARKDOWN,   # Most feature-rich option
    chunk_size=200,               # Target tokens per chunk
    chunk_size_variation=0.30,    # Allowed variation from target
    overlap=50,                   # Token overlap between chunks
    token_encoding_model="text-embedding-3-small"
)
```

## Format-Specific Configuration

```python
from langroid.parsing.parser import (
    ParsingConfig,
    Splitter,
    PdfParsingConfig,
    GeminiConfig,
)

# Customize PDF parsing
config = ParsingConfig(
    splitter=Splitter.PARA_SENTENCE,
    pdf=PdfParsingConfig(
        library="pymupdf4llm"  # Default PDF parser
    )
)

# Use Gemini for PDF parsing
config = ParsingConfig(
    pdf=PdfParsingConfig(
        library="gemini",
        gemini_config=GeminiConfig(
            model_name="gemini-2.0-flash",
            requests_per_minute=5
        )
    )
)
```

## Setting Up Parsing Config in DocChatAgentConfig

You can configure document parsing when creating a `DocChatAgent` by customizing the `parsing` field within the `DocChatAgentConfig`. Here's how to do it:

```python
from langroid.agent.special.doc_chat_agent import DocChatAgentConfig
from langroid.parsing.parser import ParsingConfig, Splitter, PdfParsingConfig

# Create a DocChatAgent with custom parsing configuration
agent_config = DocChatAgentConfig(
    parsing=ParsingConfig(
        # Choose the splitting strategy
        splitter=Splitter.MARKDOWN,  # Structure-aware splitting with header context

        # Configure chunk sizes
        chunk_size=800,          # Target tokens per chunk
        overlap=150,             # Overlap between chunks

        # Configure chunk behavior
        max_chunks=5000,         # Maximum number of chunks to create
        min_chunk_chars=250,     # Minimum characters when truncating at punctuation
        discard_chunk_chars=10,  # Discard chunks smaller than this

        # Configure context window
        n_neighbor_ids=3,        # Store 3 chunk IDs on either side

        # Configure PDF parsing specifically
        pdf=PdfParsingConfig(
            library="pymupdf4llm",  # Choose PDF parsing library
        )
    )
)
```

---

# Code Injection Protection with full_eval Flag

Available in Langroid since v0.53.15.

Langroid provides a security feature that helps protect against code injection vulnerabilities when evaluating pandas expressions in `TableChatAgent` and `VectorStore`. This protection is controlled by the `full_eval` flag, which defaults to `False` for maximum security, but can be set to `True` when working in trusted environments.

## Background

When executing dynamic pandas expressions within `TableChatAgent` and in `VectorStore.compute_from_docs()`, there is a risk of code injection if malicious input is provided. To mitigate this risk, Langroid implements a command sanitization system that validates and restricts the operations that can be performed.

## How It Works

The sanitization system uses AST (Abstract Syntax Tree) analysis to enforce a security policy that:

1. Restricts DataFrame methods to a safe whitelist
2. Prevents access to potentially dangerous methods and arguments
3. Limits expression depth and method chaining
4. Validates literals and numeric values to be within safe bounds
5. Blocks access to any variables other than the provided DataFrame

When `full_eval=False` (the default), all expressions are run through this sanitization process before evaluation. When `full_eval=True`, the sanitization is bypassed, allowing full access to pandas functionality.

## Configuration Options

### In TableChatAgent

```python
from langroid.agent.special.table_chat_agent import TableChatAgentConfig, TableChatAgent

config = TableChatAgentConfig(
    data=my_dataframe,
    full_eval=False,  # the default; set to True only for trusted input
)

agent = TableChatAgent(config)
```

### In VectorStore

```python
from langroid.vector_store.lancedb import LanceDBConfig, LanceDB

config = LanceDBConfig(
    collection_name="my_collection",
    full_eval=False,  # the default; set to True only for trusted input
)

vectorstore = LanceDB(config)
```

## When to Use full_eval=True

Set `full_eval=True` only when:

1. All input comes from trusted sources (not from users or external systems)
2. You need full pandas functionality that goes beyond the whitelisted methods
3. You're working in a controlled development or testing environment

## Security Considerations

- By default, `full_eval=False` provides a good balance of security and functionality
- The whitelisted operations support most common pandas operations
- Setting `full_eval=True` removes all protection and should be used with caution
- Even with protection, always validate input when possible

## Affected Classes

The `full_eval` flag affects the following components:

1. `TableChatAgentConfig` and `TableChatAgent` - Controls sanitization in the `pandas_eval` method
2. `VectorStoreConfig` and `VectorStore` - Controls sanitization in the `compute_from_docs` method
3. All implementations of `VectorStore` (ChromaDB, LanceDB, MeiliSearch, PineconeDB, PostgresDB, QdrantDB, WeaviateDB)

## Example: Safe Pandas Operations

When `full_eval=False`, the following operations are allowed:

```python
# Allowed operations (non-exhaustive list)
df.head()
df.groupby('column')['value'].mean()
df[df['column'] > 10]
df.sort_values('column', ascending=False)
df.pivot_table(...)
```

Some operations that might be blocked include:

```python
# Potentially blocked operations
df.eval("dangerous_expression")
df.query("dangerous_query")
df.apply(lambda x: dangerous_function(x))
```

## Testing Considerations

When writing tests that use `TableChatAgent` or `VectorStore.compute_from_docs()` with pandas expressions that go beyond the whitelisted operations, you may need to set `full_eval=True` to ensure the tests pass.

---

# Crawl4ai Crawler Documentation

## Overview

The `Crawl4aiCrawler` is a highly advanced and flexible web crawler integrated into Langroid, built on the powerful `crawl4ai` library. It uses a real browser engine (Playwright) to render web pages, making it exceptionally effective at handling modern, JavaScript-heavy websites. This crawler provides a rich set of features for simple page scraping, deep-site crawling, and sophisticated data extraction, making it the most powerful crawling option available in Langroid. It runs locally, so no API keys are needed.

## Installation

To use `Crawl4aiCrawler`, you must install the `crawl4ai` extra dependencies. To install and prepare crawl4ai:

```bash
# Install langroid with crawl4ai support
pip install "langroid[crawl4ai]"

crawl4ai setup
crawl4ai doctor
```

> **Note**: The `crawl4ai setup` command will download Playwright browsers (Chromium, Firefox, WebKit) on first run.
This is a one-time download that can be several hundred MB in size. The browsers are stored locally and used for rendering web pages.

## Key Features

- **Real Browser Rendering**: Accurately processes dynamic content, single-page applications (SPAs), and sites that require JavaScript execution.
- **Simple and Deep Crawling**: Can scrape a list of individual URLs (`simple` mode) or perform a recursive, deep crawl of a website starting from a seed URL (`deep` mode).
- **Powerful Extraction Strategies**:
    - **Structured JSON (No LLM)**: Extract data into a predefined JSON structure using CSS selectors, XPath, or Regex patterns. This is extremely fast, reliable, and cost-effective.
    - **LLM-Based Extraction**: Leverage Large Language Models (like GPT or Gemini) to extract data from unstructured content based on natural language instructions and a Pydantic schema.
- **Advanced Markdown Generation**: Go beyond basic HTML-to-markdown conversion. Apply content filters to prune irrelevant sections (sidebars, ads, footers) or use an LLM to intelligently reformat content for maximum relevance, perfect for RAG pipelines.
- **High-Performance Scraping**: Optionally use an LXML-based scraping strategy for a significant speed boost on large HTML documents.
- **Fine-Grained Configuration**: Offers detailed control over browser behavior (`BrowserConfig`) and individual crawl runs (`CrawlerRunConfig`) for advanced use cases.

## Configuration (`Crawl4aiConfig`)

The `Crawl4aiCrawler` is configured via the `Crawl4aiConfig` object. This class acts as a high-level interface to the underlying `crawl4ai` library's settings. All of the strategies are optional. Learn more about these strategies, `browser_config`, and `run_config` in the [Crawl4AI docs](https://docs.crawl4ai.com/).

```python
from langroid.parsing.url_loader import Crawl4aiConfig

# All parameters are optional and have sensible defaults
config = Crawl4aiConfig(
    crawl_mode="simple",  # or "deep"
    extraction_strategy=...,
    markdown_strategy=...,
    deep_crawl_strategy=...,
    scraping_strategy=...,
    browser_config=...,  # For advanced browser settings
    run_config=...,      # For advanced crawl-run settings
)
```

**Main Parameters:**

- `crawl_mode` (str):
    - `"simple"` (default): Crawls each URL in the provided list individually.
    - `"deep"`: Starts from the first URL in the list and recursively crawls linked pages based on the `deep_crawl_strategy`.
    - Make sure to set `crawl_mode="deep"` whenever you are deep crawling; this is required for deep crawling to work correctly.
- `extraction_strategy` (`ExtractionStrategy`): Defines how to extract structured data from a page. If set, the `Document.content` will be a **JSON string** containing the extracted data.
- `markdown_strategy` (`MarkdownGenerationStrategy`): Defines how to convert HTML to markdown. This is used when `extraction_strategy` is not set. The `Document.content` will be a **markdown string**.
- `deep_crawl_strategy` (`DeepCrawlStrategy`): Configuration for deep crawling, such as `max_depth`, `max_pages`, and URL filters. Only used when `crawl_mode` is `"deep"`.
- `scraping_strategy` (`ContentScrapingStrategy`): Specifies the underlying HTML parsing engine. Useful for performance tuning.
- `browser_config` & `run_config`: For advanced users to pass detailed `BrowserConfig` and `CrawlerRunConfig` objects directly from the `crawl4ai` library.

---

## Usage Examples

These are representative examples.
For runnable examples check the script [`examples/docqa/crawl4ai_examples.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/crawl4ai_examples.py) ### 1. Simple Crawling (Default Markdown) This is the most basic usage. It will fetch the content of each URL and convert it to clean markdown. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig urls = [ "https://pytorch.org/", "https://techcrunch.com/", ] # Use default settings crawler_config = Crawl4aiConfig() loader = URLLoader(urls=urls, crawler_config=crawler_config) docs = loader.load() for doc in docs: print(f"URL: {doc.metadata.source}") print(f"Content (first 200 chars): {doc.content[:200]}") ``` ### 2. Structured JSON Extraction (No LLM) When you need to extract specific, repeated data fields from a page, schema-based extraction is the best choice. It's fast, precise, and free of LLM costs. The result in `Document.content` is a JSON string. #### a. Using CSS Selectors (`JsonCssExtractionStrategy`) This example scrapes titles and links from the Hacker News front page. ```python import json from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.extraction_strategy import JsonCssExtractionStrategy HACKER_NEWS_URL = "https://news.ycombinator.com" HACKER_NEWS_SCHEMA = { "name": "HackerNewsArticles", "baseSelector": "tr.athing", "fields": [ {"name": "title", "selector": "span.titleline > a", "type": "text"}, {"name": "link", "selector": "span.titleline > a", "type": "attribute", "attribute": "href"}, ], } # Create the strategy and pass it to the config css_strategy = JsonCssExtractionStrategy(schema=HACKER_NEWS_SCHEMA) crawler_config = Crawl4aiConfig(extraction_strategy=css_strategy) loader = URLLoader(urls=[HACKER_NEWS_URL], crawler_config=crawler_config) documents = loader.load() # The Document.content will contain the JSON string extracted_data = json.loads(documents[0].content) print(json.dumps(extracted_data[:3], indent=2)) ``` #### b. Using Regex (`RegexExtractionStrategy`) This is ideal for finding common patterns like emails, URLs, or phone numbers. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.extraction_strategy import RegexExtractionStrategy url = "https://www.scrapethissite.com/pages/forms/" # Combine multiple built-in patterns regex_strategy = RegexExtractionStrategy( pattern=( RegexExtractionStrategy.Email | RegexExtractionStrategy.Url | RegexExtractionStrategy.PhoneUS ) ) crawler_config = Crawl4aiConfig(extraction_strategy=regex_strategy) loader = URLLoader(urls=[url], crawler_config=crawler_config) documents = loader.load() print(documents[0].content) ``` ### 3. Advanced Markdown Generation For RAG applications, the quality of the markdown is crucial. These strategies produce highly relevant, clean text. The result in `Document.content` is the filtered markdown (`fit_markdown`). #### a. Pruning Filter (`PruningContentFilter`) This filter heuristically removes boilerplate content based on text density, link density, and common noisy tags. 
```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator from crawl4ai.content_filter_strategy import PruningContentFilter prune_filter = PruningContentFilter(threshold=0.6, min_word_threshold=10) md_generator = DefaultMarkdownGenerator( content_filter=prune_filter, options={"ignore_links": True} ) crawler_config = Crawl4aiConfig(markdown_strategy=md_generator) loader = URLLoader(urls=["https://news.ycombinator.com"], crawler_config=crawler_config) docs = loader.load() print(docs[0].content[:500]) ``` #### b. LLM Filter (`LLMContentFilter`) Use an LLM to semantically understand the content and extract only the relevant parts based on your instructions. This is extremely powerful for creating topic-focused documents. ```python import os from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.async_configs import LLMConfig from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator from crawl4ai.content_filter_strategy import LLMContentFilter # Requires an API key, e.g., OPENAI_API_KEY llm_filter = LLMContentFilter( llm_config=LLMConfig( provider="openai/gpt-4o-mini", api_token=os.getenv("OPENAI_API_KEY"), ), instruction=""" Extract only the main article content. Exclude all navigation, sidebars, comments, and footer content. Format the output as clean, readable markdown. """, chunk_token_threshold=4096, ) md_generator = DefaultMarkdownGenerator(content_filter=llm_filter) crawler_config = Crawl4aiConfig(markdown_strategy=md_generator) loader = URLLoader(urls=["https://www.theverge.com/tech"], crawler_config=crawler_config) docs = loader.load() print(docs[0].content) ``` ### 4. Deep Crawling To crawl an entire website or a specific section, use `deep` mode. Recommended setting is BestFirstCrawlingStrategy ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.deep_crawling import BestFirstCrawlingStrategy from crawl4ai.deep_crawling.filters import FilterChain, URLPatternFilter deep_crawl_strategy = BestFirstCrawlingStrategy( max_depth=2, include_external=False, max_pages=25, # Maximum number of pages to crawl (optional) filter_chain=FilterChain([URLPatternFilter(patterns=["*core*"])]) # Pattern matching for granular control (optional) ) crawler_config = Crawl4aiConfig( crawl_mode="deep", deep_crawl_strategy=deep_crawl_strategy ) loader = URLLoader(urls=["https://docs.crawl4ai.com/"], crawler_config=crawler_config) docs = loader.load() print(f"Crawled {len(docs)} pages.") for doc in docs: print(f"- {doc.metadata.source}") ``` ### 5. High-Performance Scraping (`LXMLWebScrapingStrategy`) For a performance boost, especially on very large, static HTML pages, switch the scraping strategy to LXML. ```python from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig from crawl4ai.content_scraping_strategy import LXMLWebScrapingStrategy crawler_config = Crawl4aiConfig( scraping_strategy=LXMLWebScrapingStrategy() ) loader = URLLoader(urls=["https://www.nbcnews.com/business"], crawler_config=crawler_config) docs = loader.load() print(f"Content Length: {len(docs[0].content)}") ``` ### 6. LLM-Based JSON Extraction (`LLMExtractionStrategy`) When data is unstructured or requires semantic interpretation, use an LLM for extraction. This is slower and more expensive but incredibly flexible. The result in `Document.content` is a JSON string. 
```python
import os
import json
from typing import Optional

from langroid.pydantic_v1 import BaseModel, Field
from langroid.parsing.url_loader import URLLoader, Crawl4aiConfig
from crawl4ai.async_configs import LLMConfig
from crawl4ai.extraction_strategy import LLMExtractionStrategy

# Define the data structure you want to extract
class ArticleData(BaseModel):
    headline: str
    summary: str = Field(description="A short summary of the article")
    author: Optional[str] = None

# Configure the LLM strategy
llm_strategy = LLMExtractionStrategy(
    llm_config=LLMConfig(
        provider="openai/gpt-4o-mini",
        api_token=os.getenv("OPENAI_API_KEY"),
    ),
    schema=ArticleData.schema_json(),
    extraction_type="schema",
    instruction="Extract the headline, summary, and author of the main article.",
)

crawler_config = Crawl4aiConfig(extraction_strategy=llm_strategy)

loader = URLLoader(urls=["https://news.ycombinator.com"], crawler_config=crawler_config)
docs = loader.load()

extracted_data = json.loads(docs[0].content)
print(json.dumps(extracted_data, indent=2))
```

## How It Handles Different Content Types

The `Crawl4aiCrawler` is smart about handling different types of URLs:

- **Web Pages** (e.g., `http://...`, `https://...`): These are processed by the `crawl4ai` browser engine. The output format (`markdown` or `JSON`) depends on the strategy you configure in `Crawl4aiConfig`.
- **Local and Remote Documents** (e.g., URLs ending in `.pdf`, `.docx`): These are automatically detected and delegated to Langroid's internal `DocumentParser`. This ensures that documents are properly parsed and chunked according to your `ParsingConfig`, just like with other Langroid tools.

## Conclusion

The `Crawl4aiCrawler` is a feature-rich, powerful tool for any web-based data extraction task.

- For **simple, clean text**, use the default `Crawl4aiConfig`.
- For **structured data from consistent sites**, use `JsonCssExtractionStrategy` or `RegexExtractionStrategy` for unbeatable speed and reliability.
- To create **high-quality, focused content for RAG**, use `PruningContentFilter` or the `LLMContentFilter` with the `DefaultMarkdownGenerator`.
- To scrape an **entire website**, use `deep_crawl_strategy` with `crawl_mode="deep"`.
- For **complex or unstructured data** that needs AI interpretation, `LLMExtractionStrategy` provides a flexible solution.

---

# Custom Azure OpenAI client

!!! warning "This is only for using a Custom Azure OpenAI client"
    This note is **only** meant for those who are trying to use a custom Azure client, and is NOT needed for typical usage.
    For typical usage of Azure-deployed models with Langroid, see the [docs](https://langroid.github.io/langroid/notes/azure-openai-models/), the [`test_azure_openai.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_azure_openai.py) and [`examples/basic/chat.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat.py)

This page shows how to use Langroid with Azure OpenAI and Entra ID authentication by providing a custom client. By default, Langroid manages the configuration and creation of the Azure OpenAI client (see the [Setup guide](https://langroid.github.io/langroid/quick-start/setup/#microsoft-azure-openai-setupoptional) for details). In most cases, the available configuration options are sufficient, but if you need to manage any options that are not exposed, you instead have the option of providing a custom client, in Langroid v0.29.0 and later.

In order to use a custom client, you must provide a function that returns the configured client.
Depending on whether you need to make synchronous or asynchronous calls, you need to provide the appropriate client. A sketch of how this is done (supporting both sync and async calls) is given below:

```python
from openai import AzureOpenAI, AsyncAzureOpenAI

import langroid.language_models as lm

def get_azure_openai_client():
    return AzureOpenAI(...)

def get_azure_openai_async_client():
    return AsyncAzureOpenAI(...)

lm_config = lm.AzureConfig(
    azure_openai_client_provider=get_azure_openai_client,
    azure_openai_async_client_provider=get_azure_openai_async_client,
)
```

## Microsoft Entra ID Authentication

A key use case for a custom client is [Microsoft Entra ID authentication](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/managed-identity). Here you need to provide an `azure_ad_token_provider` to the client. For examples on this, see [examples/basic/chat-azure-client.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-azure-client.py) and [examples/basic/chat-azure-async-client.py](https://github.com/langroid/langroid/blob/main/examples/basic/chat-azure-async-client.py).

---

# Enriching Chunked Documents for Better Retrieval

Available in Langroid v0.34.0 or later.

When using the `DocChatAgent` for RAG with documents in highly specialized/technical domains, retrieval accuracy may be low, since embeddings are not sufficient to capture relationships between entities. For example, suppose a document-chunk consists of the medical test name "BUN" (Blood Urea Nitrogen), and a retrieval query is looking for tests related to kidney function: the embedding for "BUN" may not be close to the embedding for "kidney function", so the chunk may not be retrieved.

In such cases it is useful to *enrich* the chunked documents with additional keywords (or even "hypothetical questions") to increase the "semantic surface area" of the chunk, so that the chunk is more likely to be retrieved for relevant queries.

As of Langroid v0.34.0, you can provide a `chunk_enrichment_config` of type `ChunkEnrichmentAgentConfig`, in the `DocChatAgentConfig`. This config extends `ChatAgentConfig` and has the following fields:

- `batch_size` (int): The batch size for the chunk enrichment agent. Default is 50.
- `delimiter` (str): The delimiter to use when concatenating the chunk and the enriched text.
- `enrichment_prompt_fn`: function (`str->str`) that creates a prompt from a doc-chunk string `x`

In the above medical test example, suppose we want to augment a chunk containing only the medical test name with the organ system it is related to. We can set up a `ChunkEnrichmentAgentConfig` as follows:

```python
from langroid.agent.special.doc_chat_agent import (
    DocChatAgentConfig,
    ChunkEnrichmentAgentConfig,
)

enrichment_config = ChunkEnrichmentAgentConfig(
    batch_size=10,
    system_message=f"""
    You are an experienced clinical physician, very well-versed in
    medical tests and their names.
    You will be asked to identify WHICH ORGAN(s) Function/Health a test name
    is most closely associated with, to aid in retrieving the medical test names
    more accurately from an embeddings db that contains thousands of such test names.
    The idea is to use the ORGAN NAME(S) provided by you, to make the right test names
    easier to discover via keyword-matching or semantic (embedding) similarity.
    Your job is to generate up to 3 ORGAN NAMES MOST CLOSELY associated with
    the test name shown, ONE PER LINE.
    DO NOT SAY ANYTHING ELSE, and DO NOT BE OBLIGATED to provide 3 organs --
    if there is just one or two that are most relevant, that is fine.
Examples: "cholesterol" -> "heart function", "LDL" -> "artery health", etc, "PSA" -> "prostate health", "TSH" -> "thyroid function", etc. """, enrichment_prompt_fn=lambda test: f""" Which ORGAN(S) Function/Health is the medical test named '{test}' most closely associated with? """, ) doc_agent_config = DocChatAgentConfig( chunk_enrichment_config=enrichment_config, ... ) ``` This works as follows: - Before ingesting document-chunks into the vector-db, a specialized "chunk enrichment" agent is created, configured with the `enrichment_config` above. - For each document-chunk `x`, the agent's `llm_response_forget_async` method is called using the prompt created by `enrichment_prompt_fn(x)`. The resulting response text `y` is concatenated with the original chunk text `x` using the `delimiter`, before storing in the vector-db. This is done in batches of size `batch_size`. - At query time, after chunk retrieval, before generating the final LLM response, the enrichments are stripped from the retrieved chunks, and the original content of the retrieved chunks are passed to the LLM for generating the final response. See the script [`examples/docqa/doc-chunk-enrich.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/doc-chunk-enrich.py) for a complete example. Also see the tests related to "enrichment" in [`test_doc_chat_agent.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_doc_chat_agent.py). --- # PDF Files and Image inputs to LLMs Langroid supports sending PDF files and images (either URLs or local files) directly to Large Language Models with multi-modal capabilities. This feature allows models to "see" files and other documents, and works with most multi-modal models served via an OpenAI-compatible API, e.g.: - OpenAI's GPT-4o series and GPT-4.1 series - Gemini models - Claude series models (via OpenAI-compatible providers like OpenRouter or LiteLLM ) To see example usage, see: - tests: [test_llm.py](https://github.com/langroid/langroid/blob/main/tests/main/test_llm.py), [test_llm_async.py](https://github.com/langroid/langroid/blob/main/tests/main/test_llm_async.py), [test_chat-agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_chat_agent.py). - example script: [pdf-json-no-parse.py](https://github.com/langroid/langroid/blob/main/examples/extract/pdf-json-no-parse.py), which shows how you can directly extract structured information from a document **without having to first parse it to markdown** (which is inherently lossy). ## Basic Usage directly with LLM `chat` and `achat` methods First create a `FileAttachment` object using one of the `from_` methods. For image (`png`, `jpg/jpeg`) files you can use `FileAttachment.from_path(p)` where `p` is either a local file path, or a http/https URL. For PDF files, you can use `from_path` with a local file, or `from_bytes` or `from_io` (see below). In the examples below we show only `pdf` examples. 
```python
from langroid.language_models.base import LLMMessage, Role
from langroid.parsing.file_attachment import FileAttachment
import langroid.language_models as lm

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create messages with attachment
messages = [
    LLMMessage(role=Role.SYSTEM, content="You are a helpful assistant."),
    LLMMessage(
        role=Role.USER,
        content="What's the title of this document?",
        files=[attachment]
    )
]

# Set up LLM with model that supports attachments
llm = lm.OpenAIGPT(lm.OpenAIGPTConfig(chat_model=lm.OpenAIChatModel.GPT4o))

# Get response
response = llm.chat(messages=messages)
```

## Supported File Formats

Currently the OpenAI API supports:

- PDF files (including image-based PDFs)
- image files and URLs

## Creating Attachments

There are multiple ways to create file attachments:

```python
# From a file path
attachment = FileAttachment.from_path("path/to/file.pdf")

# From bytes
with open("path/to/file.pdf", "rb") as f:
    attachment = FileAttachment.from_bytes(f.read(), filename="document.pdf")

# From a file-like object
from io import BytesIO

file_obj = BytesIO(pdf_bytes)  # pdf_bytes: raw bytes of a PDF read earlier
attachment = FileAttachment.from_io(file_obj, filename="document.pdf")
```

## Follow-up Questions

You can continue the conversation with follow-up questions that reference the attached files:

```python
messages.append(LLMMessage(role=Role.ASSISTANT, content=response.message))
messages.append(LLMMessage(role=Role.USER, content="What is the main topic?"))
response = llm.chat(messages=messages)
```

## Multiple Attachments

Langroid allows multiple files to be sent in a single message, but as of 16 Apr 2025, sending multiple PDF files does not appear to be properly supported by the APIs (they seem to use only the last attached file), although sending multiple images does work.

```python
messages = [
    LLMMessage(
        role=Role.USER,
        content="Compare these documents",
        files=[attachment1, attachment2]
    )
]
```

## Using File Attachments with Agents

Agents can process file attachments as well, in the `llm_response` method, which takes a `ChatDocument` object as input. To pass in file attachments, include the `files` field in the `ChatDocument`, in addition to the content:

```python
import langroid as lr
from langroid.agent.chat_document import ChatDocument, ChatDocMetaData
from langroid.mytypes import Entity

agent = lr.ChatAgent(lr.ChatAgentConfig())

user_input = ChatDocument(
    content="What is the title of this document?",
    files=[attachment],
    metadata=ChatDocMetaData(
        sender=Entity.USER,
    )
)
# or more simply, use the agent's `create_user_response` method:
# user_input = agent.create_user_response(
#     content="What is the title of this document?",
#     files=[attachment],
# )

response = agent.llm_response(user_input)
```

## Using File Attachments with Tasks

In Langroid, `Task.run()` can take a `ChatDocument` object as input, and as mentioned above, it can contain attached files in the `files` field. To ensure proper orchestration, you'd also want to set various `metadata` fields, such as `sender`. Langroid provides a convenient `create_user_response` method to create a `ChatDocument` object with the necessary metadata, so you only need to specify the `content` and `files` fields:

```python
from langroid.parsing.file_attachment import FileAttachment
from langroid.agent.task import Task

agent = ...
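# (assumed: `agent` above is a configured lr.ChatAgent whose underlying LLM
# is multi-modal, e.g. GPT-4o, so it can "see" the attached file)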
# Create task
task = Task(agent, interactive=True)

# Create a file attachment
attachment = FileAttachment.from_path("path/to/document.pdf")

# Create input with attachment
input_message = agent.create_user_response(
    content="Extract data from this document",
    files=[attachment]
)

# Run task with file attachment
result = task.run(input_message)
```

See the script [`pdf-json-no-parse.py`](https://github.com/langroid/langroid/blob/main/examples/extract/pdf-json-no-parse.py) for a complete example of using file attachments with tasks.

## Practical Applications

- PDF document analysis and data extraction
- Report summarization
- Structured information extraction from documents
- Visual content analysis

For more complex applications, consider using the Task and Agent infrastructure in Langroid to orchestrate multi-step document processing workflows.

---

# Gemini LLMs & Embeddings via OpenAI client (without LiteLLM)

As of Langroid v0.21.0 you can use Langroid with Gemini LLMs directly via the OpenAI client, without using adapter libraries like LiteLLM. See details [here](https://langroid.github.io/langroid/tutorials/non-openai-llms/)

You can also use Google AI Studio embeddings or Gemini embeddings directly; these use the `google-generativeai` client under the hood.

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import GeminiEmbeddingsConfig

# Configure Gemini embeddings
embed_cfg = GeminiEmbeddingsConfig(
    model_type="gemini",
    model_name="models/text-embedding-004",
    dims=768,
)

# Configure the DocChatAgent
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model="gemini/" + lr.language_models.GeminiModel.GEMINI_1_5_FLASH_8B,
    ),
    vecdb=lr.vector_store.QdrantDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

# Support for Open LLMs hosted on glhf.chat

Available since v0.23.0.

If you're looking to use Langroid with one of the recent performant Open LLMs, such as `Qwen2.5-Coder-32B-Instruct`, you can do so using our glhf.chat integration. See [glhf.chat](https://glhf.chat/chat/create) for a list of available models.

To run with one of these models, set the `chat_model` in the `OpenAIGPTConfig` to `"glhf/<model_name>"`, where `<model_name>` is `hf:` followed by the HuggingFace repo path, e.g. `Qwen/Qwen2.5-Coder-32B-Instruct`, so the full `chat_model` would be `"glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct"`.

Also many of the example scripts in the main repo (under the `examples` directory) can be run with this and other LLMs using the model-switch CLI arg `-m <model>`, e.g.

```bash
python3 examples/basic/chat.py -m glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct
```

Additionally, you can run many of the tests in the `tests` directory with this model instead of the default OpenAI `GPT4o` using `--m <model>`, e.g.

```bash
pytest tests/main/test_chat_agent.py --m glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct
```

For more info on running langroid with Open LLMs via other providers/hosting services, see our [guide to using Langroid with local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/#local-llms-hosted-on-glhfchat).

---

# Handling a non-tool LLM message

A common scenario is to define a `ChatAgent`, enable it to use some tools (i.e.
`ToolMessage`s), wrap it in a Task, and call `task.run()`, e.g.

```python
import langroid as lr

class MyTool(lr.ToolMessage):
    ...

config = lr.ChatAgentConfig(...)
agent = lr.ChatAgent(config)
agent.enable_message(MyTool)
task = lr.Task(agent, interactive=False)
task.run("Hello")
```

Consider what happens when you invoke `task.run()`. When the agent's `llm_response` returns a valid tool-call, the sequence of steps looks like this:

- `llm_response` -> tool $T$
- `agent_response` handles $T$ -> returns results $R$
- `llm_response` responds to $R$ -> returns msg $M$
- and so on

If the LLM's response $M$ contains a valid tool, then this cycle continues with another tool-handling round. However, if the LLM's response $M$ does _not_ contain a tool-call, it is unclear whether:

- (1) the LLM "forgot" to generate a tool (or generated it wrongly, hence it was not recognized by Langroid as a tool), or
- (2) the LLM's response $M$ is an "answer" meant to be shown to the user to continue the conversation, or
- (3) the LLM's response $M$ is intended to be a "final" response, ending the task.

Internally, when the `ChatAgent`'s `agent_response` method sees a message that does not contain a tool, it invokes the `handle_message_fallback` method, which by default does nothing (returns `None`). However you can override this method by deriving from `ChatAgent`, as described in this [FAQ](https://langroid.github.io/langroid/FAQ/#how-can-i-handle-an-llm-forgetting-to-generate-a-toolmessage). As in that FAQ, in this fallback method, you would typically have code that checks whether the message is a `ChatDocument` and whether it came from the LLM, and if so, you would have the method return an appropriate message or tool (e.g. a reminder to the LLM, or an orchestration tool such as [`AgentDoneTool`][langroid.agent.tools.orchestration.AgentDoneTool]).

To simplify the developer experience, as of version 0.39.2 Langroid also provides an easier way to specify what this fallback method should return, via the `ChatAgentConfig.handle_llm_no_tool` parameter, for example:

```python
config = lr.ChatAgentConfig(
    # ... other params
    handle_llm_no_tool="done",  # terminate task if LLM sends non-tool msg
)
```

The `handle_llm_no_tool` parameter can have the following possible values:

- A special value from the [`NonToolAction`][langroid.mytypes.NonToolAction] Enum, e.g.:
    - `"user"` or `NonToolAction.USER` - this is interpreted by langroid to return `ForwardTool(agent="user")`, meaning the message is passed to the user to await their next input.
    - `"done"` or `NonToolAction.DONE` - this is interpreted by langroid to return `AgentDoneTool(content=msg.content, tools=msg.tool_messages)`, meaning the task is ended, and any content and tools in the current message will appear in the returned `ChatDocument`.
- A callable, specifically a function that takes a `ChatDocument` and returns any value. This can be useful when you want the fallback action to return a value based on the current message, e.g. `lambda msg: AgentDoneTool(content=msg.content)`, or it could be a more elaborate function, or a prompt that contains the content of the current message.
- Any `ToolMessage` (typically an [Orchestration](https://github.com/langroid/langroid/blob/main/langroid/agent/tools/orchestration.py) tool like `AgentDoneTool` or `ResultTool`)
- Any string, meant to be handled by the LLM.
Typically this would be a reminder to the LLM, something like:

```python
"""Your intent is not clear --
- if you forgot to use a Tool such as `ask_tool` or `search_tool`, try again.
- or if you intended to return your final answer, use the Tool named `done_tool`,
  with `content` set to your answer.
"""
```

A simple example is in the [`chat-search.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-search.py) script, and in the `test_handle_llm_no_tool` test in [`test_tool_messages.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_tool_messages.py).

---

# HTML Logger

The HTML logger creates interactive, self-contained HTML files that make it easy to navigate complex multi-agent conversations in Langroid.

## Enabling the HTML Logger

The HTML logger is **enabled by default** in `TaskConfig`:

```python
import langroid as lr

# HTML logging is automatically enabled
task = lr.Task(agent)

# To disable HTML logging
task = lr.Task(agent, config=lr.TaskConfig(enable_html_logging=False))

# To change the log directory (default is "logs/")
task = lr.Task(agent, config=lr.TaskConfig(logs_dir="my_logs"))
```

## Log Files

Langroid creates three types of log files in the `logs/` directory:

1. **HTML Log**: `<name>.html` - Interactive, collapsible view
2. **Plain Text Log**: `<name>.log` - Traditional text log with colors
3. **TSV Log**: `<name>.tsv` - Tab-separated values for data analysis

The `<name>` is determined by:

- The task name (if specified)
- Otherwise, the agent name
- Falls back to "root" if neither is specified

When a task starts, you'll see a clickable `file://` link in the console:

```
WARNING - 📊 HTML Log: file:///path/to/logs/task-name.html
```

## Key Features

### Collapsible Entries

Each log entry can be expanded/collapsed to show different levels of detail:

- **Collapsed**: Shows only the entity type (USER, LLM, AGENT) and preview
- **Expanded**: Shows full message content, tools, and sub-sections

### Visual Hierarchy

- **Important responses** are shown at full opacity
- **Intermediate steps** are faded (0.4 opacity)
- Color-coded entities: USER (blue), LLM (green), AGENT (orange), SYSTEM (gray)

### Tool Visibility

Tools are clearly displayed with:

- Tool name and parameters
- Collapsible sections showing raw tool calls
- Visual indicators for tool results

### Auto-Refresh

The HTML page automatically refreshes every 2 seconds to show new log entries as they're written.

### Persistent UI State

Your view preferences are preserved across refreshes:

- Expanded/collapsed entries remain in their state
- Filter settings are remembered

## Example

Here's what the HTML logger looks like for a planner workflow:

![HTML Logger Screenshot](../screenshots/planner-workflow-html-logs.png)

In this example from `examples/basic/planner-workflow-simple.py`, you can see:

- The planner agent orchestrating multiple tool calls
- Clear visibility of `IncrementTool` and `DoublingTool` usage
- The filtered view showing only important responses
- Collapsible tool sections with parameters

## Benefits

1. **Easy Navigation**: Quickly expand/collapse entries to focus on what matters
2. **Tool Clarity**: See exactly which tools were called with what parameters
3. **Real-time Updates**: Watch logs update automatically as your task runs
4. **Filtered Views**: Use "Show only important responses" to hide intermediate steps

---

# Knowledge-graph support

Langroid can be used to set up natural-language conversations with knowledge graphs.
Currently the two most popular knowledge graphs are supported:

## Neo4j

- [implementation](https://github.com/langroid/langroid/tree/main/langroid/agent/special/neo4j)
- test: [test_neo4j_chat_agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_neo4j_chat_agent.py)
- examples: [chat-neo4j.py](https://github.com/langroid/langroid/blob/main/examples/kg-chat/chat-neo4j.py)

## ArangoDB

Available with Langroid v0.20.1 and later. Uses the [python-arango](https://github.com/arangodb/python-arango) library.

- [implementation](https://github.com/langroid/langroid/tree/main/langroid/agent/special/arangodb)
- tests: [test_arangodb.py](https://github.com/langroid/langroid/blob/main/tests/main/test_arangodb.py), [test_arangodb_chat_agent.py](https://github.com/langroid/langroid/blob/main/tests/main/test_arangodb_chat_agent.py)
- example: [chat-arangodb.py](https://github.com/langroid/langroid/blob/main/examples/kg-chat/chat-arangodb.py)

---

# LangDB with Langroid

## Introduction

[LangDB](https://langdb.ai/) is an AI gateway that provides OpenAI-compatible APIs to access 250+ LLMs. It offers cost control, observability, and performance benchmarking while enabling seamless switching between models.

Langroid has a simple integration with LangDB's API service, so there are no dependencies to install. (LangDB also has a self-hosted version, which is not yet supported in Langroid).

## Setup environment variables

At minimum, ensure you have these env vars in your `.env` file:

```
LANGDB_API_KEY=your_api_key_here
LANGDB_PROJECT_ID=your_project_id_here
```

## Using LangDB with Langroid

### Configure LLM and Embeddings

In `OpenAIGPTConfig`, when you specify the `chat_model` with a `langdb/` prefix, langroid uses the API key, `project_id` and other LangDB-specific parameters from the `langdb_params` field; if any of these are specified in the `.env` file or in the environment explicitly, they will override the values in `langdb_params`.

For example, to use Anthropic's Claude-3.7-Sonnet model, set `chat_model="langdb/anthropic/claude-3.7-sonnet"`, as shown below. You can entirely omit the `langdb_params` field if you have already set up the fields as environment variables in your `.env` file, e.g. the `api_key` and `project_id` are read from the environment variables `LANGDB_API_KEY` and `LANGDB_PROJECT_ID` respectively, and similarly for the other fields (which are optional).

```python
import os
import uuid
from langroid.language_models.openai_gpt import OpenAIGPTConfig, LangDBParams
from langroid.embedding_models.models import OpenAIEmbeddingsConfig

# Generate tracking IDs (optional)
thread_id = str(uuid.uuid4())
run_id = str(uuid.uuid4())

# Configure LLM
llm_config = OpenAIGPTConfig(
    chat_model="langdb/anthropic/claude-3.7-sonnet",
    # omit the langdb_params field if you're not using custom tracking,
    # or if all its fields are provided in env vars, like
    # LANGDB_API_KEY, LANGDB_PROJECT_ID, LANGDB_RUN_ID, LANGDB_THREAD_ID, etc.
    langdb_params=LangDBParams(
        label='my-app',
        thread_id=thread_id,
        run_id=run_id,
        # api_key, project_id are read from .env or environment variables
        # LANGDB_API_KEY, LANGDB_PROJECT_ID respectively.
    )
)
```

Similarly, you can configure the embeddings using `OpenAIEmbeddingsConfig`, which also has a `langdb_params` field that works the same way as in `OpenAIGPTConfig` (i.e. it uses the API key and project ID from the environment if provided, otherwise uses the default values in `langdb_params`).
Again the `langdb_params` does not need to be specified explicitly, if you've already set up the environment variables in your `.env` file. ```python # Configure embeddings embedding_config = OpenAIEmbeddingsConfig( model_name="langdb/openai/text-embedding-3-small", ) ``` ## Tracking and Observability LangDB provides special headers for request tracking: - `x-label`: Tag requests for filtering in the dashboard - `x-thread-id`: Track conversation threads (UUID format) - `x-run-id`: Group related requests together ## Examples The `langroid/examples/langdb/` directory contains examples demonstrating: 1. **RAG with LangDB**: `langdb_chat_agent_docs.py` 2. **LangDB with Function Calling**: `langdb_chat_agent_tool.py` 3. **Custom Headers**: `langdb_custom_headers.py` ## Viewing Results Visit the [LangDB Dashboard](https://dashboard.langdb.com) to: - Filter requests by label, thread ID, or run ID - View detailed request/response information - Analyze token usage and costs For more information, visit [LangDB Documentation](https://docs.langdb.com). See example scripts [here](https://github.com/langroid/langroid/tree/main/examples/langdb) --- # Handling large tool results Available since Langroid v0.22.0. In some cases, the result of handling a `ToolMessage` could be very large, e.g. when the Tool is a database query that returns a large number of rows, or a large schema. When used in a task loop, this large result may then be sent to the LLM to generate a response, which in some scenarios may not be desirable, as it increases latency, token-cost and distractions. Langroid allows you to set two optional parameters in a `ToolMessage` to handle this situation: - `_max_result_tokens`: *immediately* truncate the result to this number of tokens. - `_max_retained_tokens`: *after* a responder (typically the LLM) responds to this tool result (which optionally may already have been truncated via `_max_result_tokens`), edit the message history to truncate the result to this number of tokens. You can set one, both or none of these parameters. If you set both, you would want to set `_max_retained_tokens` to a smaller number than `_max_result_tokens`. See the test `test_reduce_raw_tool_result` in `test_tool_messages.py` for an example. Here is a conceptual example. Suppose there is a Tool called `MyTool`, with parameters `_max_result_tokens=20` and `_max_retained_tokens=10`. Imagine a task loop where the user says "hello", and then LLM generates a call to `MyTool`, and the tool handler (i.e. `agent_response`) generates a result of 100 tokens. This result is immediately truncated to 20 tokens, and then the LLM responds to it with a message `response`. The agent's message history looks like this: ``` 1. System msg. 2. user: hello 3. LLM: MyTool 4. Agent (Tool handler): 100-token result => reduced to 20 tokens 5. LLM: response ``` Immediately after the LLM's response at step 5, the message history is edited so that the message contents at position 4 are truncated to 10 tokens, as specified by `_max_retained_tokens`. 
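Here is a minimal sketch of how such a tool might be defined, assuming the two parameters are simply set as class-level attributes on the `ToolMessage` subclass; the `request`, `purpose`, and handler body below are illustrative, not taken from the test:

```python
import langroid as lr

class MyTool(lr.ToolMessage):
    request: str = "my_tool"
    purpose: str = "Illustrate truncation of large tool results."

    # truncate the handler's result to roughly this many tokens, immediately
    _max_result_tokens = 20
    # after the LLM has responded to the (truncated) result, further truncate
    # the copy retained in the message history to this many tokens
    _max_retained_tokens = 10

    def handle(self) -> str:
        # imagine this returns a very large result, e.g. many database rows
        return " ".join(f"row{i}" for i in range(100))
```

With this definition, the 100-token result at step 4 above would be cut to roughly 20 tokens before the LLM sees it, and to roughly 10 tokens in the retained history after step 5.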
--- # Using LiteLLM Proxy with OpenAIGPTConfig You can easily configure Langroid to use LiteLLM proxy for accessing models with a simple prefix `litellm-proxy/` in the `chat_model` name: ## Using the `litellm-proxy/` prefix When you specify a model with the `litellm-proxy/` prefix, Langroid automatically uses the LiteLLM proxy configuration: ```python from langroid.language_models.openai_gpt import OpenAIGPTConfig config = OpenAIGPTConfig( chat_model="litellm-proxy/your-model-name" ) ``` ## Setting LiteLLM Proxy Parameters When using the `litellm-proxy/` prefix, Langroid will read connection parameters from either: 1. The `litellm_proxy` config object: ```python from langroid.language_models.openai_gpt import OpenAIGPTConfig, LiteLLMProxyConfig config = OpenAIGPTConfig( chat_model="litellm-proxy/your-model-name", litellm_proxy=LiteLLMProxyConfig( api_key="your-litellm-proxy-api-key", api_base="http://your-litellm-proxy-url" ) ) ``` 2. Environment variables (which take precedence): ```bash export LITELLM_API_KEY="your-litellm-proxy-api-key" export LITELLM_API_BASE="http://your-litellm-proxy-url" ``` This approach makes it simple to switch between using LiteLLM proxy and other model providers by just changing the model name prefix, without needing to modify the rest of your code or tweaking env variables. ## Note: LiteLLM Proxy vs LiteLLM Library **Important distinction:** Using the `litellm-proxy/` prefix connects to a LiteLLM proxy server, which is different from using the `litellm/` prefix. The latter utilizes the LiteLLM adapter library directly without requiring a proxy server. Both approaches are supported in Langroid, but they serve different use cases: - Use `litellm-proxy/` when connecting to a deployed LiteLLM proxy server - Use `litellm/` when you want to use the LiteLLM library's routing capabilities locally Choose the approach that best fits your infrastructure and requirements. --- # Local embeddings provision via llama.cpp server As of Langroid v0.30.0, you can use llama.cpp as provider of embeddings to any of Langroid's vector stores, allowing access to a wide variety of GGUF-compatible embedding models, e.g. [nomic-ai's Embed Text V1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF). ## Supported Models llama.cpp can generate embeddings from: **Dedicated embedding models (RECOMMENDED):** - [nomic-embed-text-v1.5](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5-GGUF) (768 dims) - [nomic-embed-text-v2-moe](https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe-GGUF) - [nomic-embed-code](https://huggingface.co/nomic-ai/nomic-embed-code-GGUF) - Other GGUF embedding models **Regular LLMs (also supported):** - gpt-oss-20b, gpt-oss-120b - Llama models - Other language models Note: Dedicated embedding models are recommended for best performance in retrieval and semantic search tasks. ## Configuration When defining a VecDB, you can provide an instance of `LlamaCppServerEmbeddingsConfig` to the VecDB config to instantiate the llama.cpp embeddings server handler. 
To configure the `LlamaCppServerEmbeddingsConfig`, there are several parameters that should be adjusted: ```python from langroid.embedding_models.models import LlamaCppServerEmbeddingsConfig from langroid.vector_store.qdrantdb import QdrantDBConfig embed_cfg = LlamaCppServerEmbeddingsConfig( api_base="http://localhost:8080", # IP + Port dims=768, # Match the dimensions of your embedding model context_length=2048, # Match the config of the model batch_size=2048, # Safest to ensure this matches context_length ) vecdb_config = QdrantDBConfig( collection_name="my-collection", embedding=embed_cfg, storage_path=".qdrant/", ) ``` ## Running llama-server The llama.cpp server must be started with the `--embeddings` flag to enable embedding generation. ### For dedicated embedding models (RECOMMENDED): ```bash ./llama-server -ngl 100 -c 2048 \ -m ~/nomic-embed-text-v1.5.Q8_0.gguf \ --host localhost --port 8080 \ --embeddings -b 2048 -ub 2048 ``` ### For LLM-based embeddings (e.g., gpt-oss): ```bash ./llama-server -ngl 99 \ -m ~/.cache/llama.cpp/gpt-oss-20b.gguf \ --host localhost --port 8080 \ --embeddings ``` ## Response Format Compatibility Langroid automatically handles multiple llama.cpp response formats: - Native `/embedding`: `{"embedding": [floats]}` - OpenAI `/v1/embeddings`: `{"data": [{"embedding": [floats]}]}` - Array formats: `[{"embedding": [floats]}]` - Nested formats: `{"embedding": [[floats]]}` You don't need to worry about which endpoint or format your llama.cpp server uses - Langroid will automatically detect and handle the response correctly. ## Example Usage An example setup can be found inside [examples/docqa/chat.py](https://github.com/langroid/langroid/blob/main/examples/docqa/chat.py). For a complete example using local embeddings with llama.cpp: ```python from langroid.agent.special.doc_chat_agent import ( DocChatAgent, DocChatAgentConfig, ) from langroid.embedding_models.models import LlamaCppServerEmbeddingsConfig from langroid.language_models.openai_gpt import OpenAIGPTConfig from langroid.parsing.parser import ParsingConfig from langroid.vector_store.qdrantdb import QdrantDBConfig # Configure local embeddings via llama.cpp embed_cfg = LlamaCppServerEmbeddingsConfig( api_base="http://localhost:8080", dims=768, # nomic-embed-text-v1.5 dimensions context_length=8192, batch_size=1024, ) # Configure vector store with local embeddings vecdb_config = QdrantDBConfig( collection_name="doc-chat-local", embedding=embed_cfg, storage_path=".qdrant/", ) # Create DocChatAgent config = DocChatAgentConfig( vecdb=vecdb_config, llm=OpenAIGPTConfig( chat_model="gpt-4o", # or use local LLM ), ) agent = DocChatAgent(config) ``` ## Troubleshooting **Error: "Failed to connect to embedding provider"** - Ensure llama-server is running with the `--embeddings` flag - Check that the `api_base` URL is correct - Verify the server is accessible from your machine **Error: "Unsupported embedding response format"** - This error includes the first 500 characters of the response to help debug - Check your llama-server logs for any errors - Ensure you're using a compatible llama.cpp version **Embeddings seem low quality:** - Use a dedicated embedding model instead of an LLM - Ensure the `dims` parameter matches your model's output dimensions - Try different GGUF quantization levels (Q8_0 generally works well) ## Additional Resources - [llama.cpp GitHub](https://github.com/ggml-org/llama.cpp) - [llama.cpp server documentation](https://github.com/ggml-org/llama.cpp/blob/master/examples/server/README.md) - 
[nomic-embed models on Hugging Face](https://huggingface.co/nomic-ai)
- [Issue #919 - Implementation details](https://github.com/langroid/langroid/blob/main/issues/issue-919-llamacpp-embeddings.md)

---

# Using the LLM-based PDF Parser

- Converts PDF content into Markdown format using multimodal models.
- Uses multimodal models to describe images within PDFs.
- Supports page-wise or chunk-based processing for optimized performance.

---

### Initializing the LLM-based PDF Parser

Make sure you have set up your API key for whichever model you specify in `model_name` below. You can initialize the LLM PDF parser as follows:

```python
from langroid.parsing.parser import (
    LLMPdfParserConfig,
    ParsingConfig,
    PdfParsingConfig,
)

parsing_config = ParsingConfig(
    n_neighbor_ids=2,
    pdf=PdfParsingConfig(
        library="llm-pdf-parser",
        llm_parser_config=LLMPdfParserConfig(
            model_name="gemini-2.0-flash",
            split_on_page=True,
            max_tokens=7000,
            requests_per_minute=5,
            timeout=60,  # increase this for large documents
        ),
    ),
)
```

---

## Parameters

### `model_name`

Specifies the model to use for PDF conversion.

**Default:** `gemini/gemini-2.0-flash`

---

### `max_tokens`

Limits the number of tokens in the input. The model's output limit is **8192 tokens**.

- **Default:** 7000 tokens (leaving room for generated captions)
- _Optional parameter_

---

### `split_on_page`

Determines whether to process the document **page by page**.

- **Default:** `True`
- If set to `False`, the parser will create chunks based on `max_tokens` while respecting page boundaries.
- When `False`, the parser will send chunks containing multiple pages (e.g., `[11,12,13,14,15]`).

**Advantages of `False`:**

- Reduces API calls to the LLM.
- Lowers token usage since system prompts are not repeated per page.

**Disadvantages of `False`:**

- You will not get per-page splitting; groups of pages are treated as a single unit.

> If your use case does **not** require strict page-by-page parsing, consider setting this to `False`.

---

### `requests_per_minute`

Limits API request frequency to avoid rate limits.

- If you encounter rate limits, set this to **1 or 2**.

---

# **Using `marker` as a PDF Parser in `langroid`**

## **Installation**

### **Standard Installation**

To use [`marker`](https://github.com/VikParuchuri/marker) as a PDF parser in `langroid`, install it with the `marker-pdf` extra:

```bash
pip install "langroid[marker-pdf]"
```

or in combination with other extras as needed, e.g.:

```bash
pip install "langroid[marker-pdf,hf-embeddings]"
```

Note, however, that `marker` has a version **incompatibility with `docling`**: if you install `langroid` with the `all` extra (or another extra such as `doc-chat` or `pdf-parsers` that also includes `docling`), e.g. `pip install "langroid[all]"` or `pip install "langroid[doc-chat]"`, you will get an **older** version of `marker-pdf`, which does not work with Langroid. This may not matter if you do not intend to use `marker`, but if you do, install langroid with the `marker-pdf` extra (in combination with other extras as needed), as shown above.

#### **For Intel-Mac Users**

If you are on an **Intel Mac**, `docling` and `marker` cannot be installed together with langroid as extras, due to a **transformers version conflict**. To resolve this, manually install `marker-pdf` with:

```bash
pip install "marker-pdf[full]"
```

Make sure to install this within your `langroid` virtual environment.
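Because the `all`, `doc-chat`, and `pdf-parsers` extras can pull in an older `marker-pdf` without any warning, it can be worth confirming which version actually ended up in your environment. A minimal check (assuming the package is installed under the distribution name `marker-pdf`):

```python
# Print the installed marker-pdf version, to confirm you got a recent release
# rather than the older one pinned by the docling-compatible extras.
from importlib.metadata import version

print(version("marker-pdf"))
```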
---

## **Example: Parsing a PDF with `marker` in `langroid`**

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import MarkerConfig, ParsingConfig, PdfParsingConfig
from dotenv import load_dotenv
import os

# Load environment variables
load_dotenv()
gemini_api_key = os.environ.get("GEMINI_API_KEY")

# Path to your PDF file
path = ""

# Define parsing configuration
parsing_config = ParsingConfig(
    n_neighbor_ids=2,  # Number of neighboring sections to keep
    pdf=PdfParsingConfig(
        library="marker",  # Use `marker` as the PDF parsing library
        marker_config=MarkerConfig(
            config_dict={
                "use_llm": True,  # Enable high-quality LLM processing
                "gemini_api_key": gemini_api_key,  # API key for Gemini LLM
            }
        )
    ),
)

# Create the parser and extract the document
marker_parser = DocumentParser.create(path, parsing_config)
doc = marker_parser.get_doc()
```

---

## **Explanation of Configuration Options**

If you want to use the default configuration, you can omit `marker_config` entirely.

### **Key Parameters in `MarkerConfig`**

| Parameter | Description |
|-----------------|-------------|
| `use_llm` | Set to `True` to enable higher-quality processing using LLMs. |
| `gemini_api_key` | Google Gemini API key for LLM-enhanced parsing. |

You can further customize `config_dict` by referring to [`marker_pdf`'s documentation](https://github.com/VikParuchuri/marker/blob/master/README.md). Alternatively, run the following command to view available options:

```sh
marker_single --help
```

This will display all supported parameters, which you can pass as needed in `config_dict`.

---

# Markitdown Document Parsers

Langroid integrates with Microsoft's Markitdown library to provide conversion of Microsoft Office documents to markdown format. Three specialized parsers are available, for `docx`, `xlsx`, and `pptx` files.

## Prerequisites

To use these parsers, install Langroid with the required extras:

```bash
pip install "langroid[markitdown]"   # Just Markitdown parsers
# or
pip install "langroid[doc-parsers]"  # All document parsers
```

## Available Parsers

Once you set up a `parser` for the appropriate document type, you can get the entire document with `parser.get_doc()`, or get automatically chunked content with `parser.get_doc_chunks()`.

### 1. `MarkitdownDocxParser`

Converts Word documents (`*.docx`) to markdown, preserving structure, formatting, and tables.

See the tests

- [`test_docx_parser.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_docx_parser.py)
- [`test_markitdown_parser.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_markitdown_parser.py)

for examples of how to use these parsers.

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import DocxParsingConfig, ParsingConfig

parser = DocumentParser.create(
    "path/to/document.docx",
    ParsingConfig(
        docx=DocxParsingConfig(library="markitdown-docx"),
        # ... other parsing config options
    ),
)
```

### 2. `MarkitdownXLSXParser`

Converts Excel spreadsheets (`*.xlsx`/`*.xls`) to markdown tables, preserving data and sheet structure.

```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import ParsingConfig, MarkitdownXLSParsingConfig

parser = DocumentParser.create(
    "path/to/spreadsheet.xlsx",
    ParsingConfig(xls=MarkitdownXLSParsingConfig())
)
```

### 3. `MarkitdownPPTXParser`

Converts PowerPoint presentations (`*.pptx`) to markdown, preserving slide content and structure.
```python
from langroid.parsing.document_parser import DocumentParser
from langroid.parsing.parser import ParsingConfig, MarkitdownPPTXParsingConfig

parser = DocumentParser.create(
    "path/to/presentation.pptx",
    ParsingConfig(pptx=MarkitdownPPTXParsingConfig())
)
```

---

# Langroid MCP Integration

Langroid provides seamless integration with Model Context Protocol (MCP) servers via two methods, both of which involve creating Langroid `ToolMessage` subclasses corresponding to the MCP tools:

1. Programmatic creation of Langroid tools using `get_tool_async` or `get_tools_async`, based on the tool definitions on an MCP server.
2. Declarative creation of Langroid tools using the **`@mcp_tool` decorator**, which allows customizing the tool-handling behavior beyond what is provided by the MCP server.

This integration allows _any_ LLM (that is good enough to do function-calling via prompts) to use any MCP server. See the following to understand the integration better:

- example Python scripts under [`examples/mcp`](https://github.com/langroid/langroid/tree/main/examples/mcp)
- [`tests/main/test_mcp_tools.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_mcp_tools.py)

---

## 1. Connecting to an MCP server via transport specification

Before creating Langroid tools, we first need to define and connect to an MCP server via a [FastMCP](https://gofastmcp.com/getting-started/welcome) client. There are several ways to connect to a server, depending on how it is defined. Each of these uses a different type of [transport](https://gofastmcp.com/clients/transports).

The typical pattern to use with Langroid is as follows:

- define an MCP server transport
- create a `ToolMessage` subclass using the `@mcp_tool` decorator or `get_tool_async()` function, with the transport as the first argument

Langroid's MCP integration works with any of the [transports](https://gofastmcp.com/clients/transports) supported by FastMCP. Below we go over some common ways to define transports and extract tools from the servers.

1. **Local Python script**
2. **In-memory FastMCP server** - useful for testing and for simple in-memory servers that don't need to be run as a separate process.
3. **NPX stdio transport**
4. **UVX stdio transport**
5. **Generic stdio transport** – launch any CLI-based MCP server via stdin/stdout
6. **Network SSE transport** – connect to HTTP/S MCP servers via `SSETransport`

All examples below use the async helpers to create Langroid tools (`ToolMessage` subclasses):

```python
from langroid.agent.tools.mcp import (
    get_tools_async,
    get_tool_async,
)
```

---

#### Path to a Python Script

Point at your MCP server entrypoint, e.g., the `weather.py` script in the langroid repo (based on the [Anthropic quick-start guide](https://modelcontextprotocol.io/quickstart/server)):

```python
async def example_script_path() -> None:
    server = "tests/main/mcp/weather-server-python/weather.py"
    tools = await get_tools_async(server)  # all tools available
    AlertTool = await get_tool_async(server, "get_alerts")  # specific tool
    # instantiate the tool with a specific input
    msg = AlertTool(state="CA")
    # Call the tool via handle_async()
    alerts = await msg.handle_async()
    print(alerts)
```

---

#### In-Memory FastMCP Server

Define your server with `FastMCP(...)` and pass the instance:

```python
from fastmcp.server import FastMCP
from pydantic import BaseModel, Field


class CounterInput(BaseModel):
    start: int = Field(...)

def make_server() -> FastMCP: server = FastMCP("CounterServer") @server.tool() def increment(data: CounterInput) -> int: """Increment start by 1.""" return data.start + 1 return server async def example_in_memory() -> None: server = make_server() tools = await get_tools_async(server) IncTool = await get_tool_async(server, "increment") result = await IncTool(start=41).handle_async() print(result) # 42 ``` See the [`mcp-file-system.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/mcp-file-system.py) script for a working example of this. --- #### NPX stdio Transport Use any npm-installed MCP server via `npx`, e.g., the [Exa web-search MCP server](https://docs.exa.ai/examples/exa-mcp): ```python from fastmcp.client.transports import NpxStdioTransport transport = NpxStdioTransport( package="exa-mcp-server", env_vars={"EXA_API_KEY": "…"}, ) async def example_npx() -> None: tools = await get_tools_async(transport) SearchTool = await get_tool_async(transport, "web_search_exa") results = await SearchTool( query="How does Langroid integrate with MCP?" ).handle_async() print(results) ``` For a fully working example, see the script [`exa-web-search.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/exa-web-search.py). --- #### UVX stdio Transport Connect to a UVX-based MCP server, e.g., the [Git MCP Server](https://github.com/modelcontextprotocol/servers/tree/main/src/git) ```python from fastmcp.client.transports import UvxStdioTransport transport = UvxStdioTransport(tool_name="mcp-server-git") async def example_uvx() -> None: tools = await get_tools_async(transport) GitStatus = await get_tool_async(transport, "git_status") status = await GitStatus(path=".").handle_async() print(status) ``` --- #### Generic stdio Transport Use `StdioTransport` to run any MCP server as a subprocess over stdio: ```python from fastmcp.client.transports import StdioTransport from langroid.agent.tools.mcp import get_tools_async, get_tool_async async def example_stdio() -> None: """Example: any CLI‐based MCP server via StdioTransport.""" transport: StdioTransport = StdioTransport( command="uv", args=["run", "--with", "biomcp-python", "biomcp", "run"], ) tools: list[type] = await get_tools_async(transport) BioTool = await get_tool_async(transport, "tool_name") result: str = await BioTool(param="value").handle_async() print(result) ``` See the full example in [`examples/mcp/biomcp.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/biomcp.py). --- #### Network SSE Transport Use `SSETransport` to connect to a FastMCP server over HTTP/S: ```python from fastmcp.client.transports import SSETransport from langroid.agent.tools.mcp import ( get_tools_async, get_tool_async, ) async def example_sse() -> None: """Example: connect to an HTTP/S MCP server via SSETransport.""" url: str = "https://localhost:8000/sse" transport: SSETransport = SSETransport( url=url, headers={"Authorization": "Bearer TOKEN"} ) tools: list[type] = await get_tools_async(transport) ExampleTool = await get_tool_async(transport, "tool_name") result: str = await ExampleTool(param="value").handle_async() print(result) ``` --- With these patterns you can list tools, generate Pydantic-backed `ToolMessage` classes, and invoke them via `.handle_async()`, all with zero boilerplate client setup. As the `FastMCP` library adds other types of transport (e.g., `StreamableHTTPTransport`), the pattern of usage with Langroid will remain the same. 
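Note that all of the helper examples above are `async` functions; from a standalone script you can drive any of them with the standard `asyncio.run()`. A minimal sketch, using the `example_in_memory()` function defined in the in-memory section above:

```python
import asyncio

# Run one of the async examples above from a regular (non-async) script.
if __name__ == "__main__":
    asyncio.run(example_in_memory())
```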
---

## Best Practice: Use a server factory for stdio transports

Starting with fastmcp 2.13 and mcp 1.21, stdio transports (e.g., `StdioTransport`, `NpxStdioTransport`, `UvxStdioTransport`) are effectively single-use. Reusing the same transport instance across multiple connections can lead to errors such as `anyio.ClosedResourceError` during session initialization.

To make your code robust and future-proof, pass a zero-argument server factory to Langroid's MCP helpers. A "server factory" is simply a `lambda` or function that returns a fresh server spec or transport each time.

Benefits:

- Fresh, reliable connections on every call (no reuse of closed transports).
- Works across fastmcp/mcp versions without subtle lifecycle issues.
- Enables concurrent calls safely (each call uses its own subprocess/session).
- Keeps your decorator ergonomics and `handle_async` overrides unchanged.

You can use a factory with both the decorator and the async helpers:

```python
import langroid as lr
from fastmcp.client.transports import StdioTransport
from langroid.agent.tools.mcp import mcp_tool, get_tool_async

# 1) Decorator style
@mcp_tool(lambda: StdioTransport(command="claude", args=["mcp", "serve"], env={}), "Grep")
class GrepTool(lr.ToolMessage):
    async def handle_async(self) -> str:
        # pre/post-process around the raw MCP call
        result = await self.call_tool_async()
        return f"\n{result}\n"

# 2) Programmatic style (inside an async function)
BaseGrep = await get_tool_async(
    lambda: StdioTransport(command="claude", args=["mcp", "serve"], env={}),
    "Grep",
)
```

Notes:

- Passing a concrete transport instance still works: Langroid will try to clone it internally; however, a factory is the most reliable across environments.
- For network transports (e.g., `SSETransport`), a factory is optional; you can continue passing the transport instance directly.

---

## Output-schema validation: return structured content when required

Newer `mcp` clients validate tool outputs against the tool's output schema. If a tool declares a structured output, returning plain text may raise a runtime error. Some servers (for example, Claude Code's Grep) expose an argument like `output_mode` that controls the shape of the response.

Recommendations:

- Prefer structured modes when a tool declares an output schema.
- If available, set options like `output_mode="structured"` (or a documented structured variant such as `"files_with_matches"`) in your tool's `handle_async` before calling `await self.call_tool_async()`.

Example tweak in a decorator-based tool:

```python
@mcp_tool(lambda: StdioTransport(command="claude", args=["mcp", "serve"]), "Grep")
class GrepTool(lr.ToolMessage):
    async def handle_async(self) -> str:
        # Ensure a structured response if the server supports it
        if hasattr(self, "output_mode"):
            self.output_mode = "structured"
        return await self.call_tool_async()
```

If the server does not provide such a switch, follow its documentation for returning data that matches its declared output schema.

---

## 2. Create Langroid Tools declaratively using the `@mcp_tool` decorator

The above examples showed how you can create Langroid tools programmatically using the helper functions `get_tool_async()` and `get_tools_async()`, with the first argument being the transport to the MCP server. The `@mcp_tool` decorator works in the same way:

- **Arguments to the decorator**
    1. `server_spec`: path/URL/`FastMCP`/`ClientTransport`, as mentioned above.
    2. `tool_name`: name of a specific MCP tool
- **Behavior**
    - Generates a `ToolMessage` subclass with all input fields typed.
    - Provides a `call_tool_async()` method under the hood -- this is the "raw" MCP tool call, returning a string.
    - If you define your own `handle_async()`, it overrides the default. Typically, you would override it to customize either the input or the output of the tool call, or both.
    - If you don't define your own `handle_async()`, it defaults to just returning the value of the `call_tool_async()` method.

Here is a simple example of using the `@mcp_tool` decorator to create a Langroid tool:

```python
from fastmcp.server import FastMCP
from langroid.agent.tools.mcp import mcp_tool
import langroid as lr

# Define your MCP server (pydantic v2 for schema)
server = FastMCP("MyServer")

@mcp_tool(server, "greet")
class GreetTool(lr.ToolMessage):
    """Say hello to someone."""
    async def handle_async(self) -> str:
        # Customize post-processing
        raw = await self.call_tool_async()
        return f"💬 {raw}"
```

Using the decorator method allows you to customize the tool's `handle_async` method, or add additional fields to the `ToolMessage`; for example, you may want to adjust the input to the tool, or post-process the tool result before it is sent back to the LLM. If you don't override `handle_async`, the default behavior is simply to return the value of the "raw" MCP tool call `await self.call_tool_async()`.

```python
@mcp_tool(server, "calculate")
class CalcTool(lr.ToolMessage):
    """Perform complex calculation."""
    async def handle_async(self) -> str:
        result = await self.call_tool_async()
        # Add context or emojis, etc.
        return f"🧮 Result is *{result}*"
```

---

## 3. Enabling Tools in Your Agent

Once you've created a Langroid `ToolMessage` subclass from an MCP server, you can enable it on a `ChatAgent`, just like you normally would. Below is an example of using the [Exa MCP server](https://docs.exa.ai/examples/exa-mcp) to create a Langroid web search tool, enable a `ChatAgent` to use it, and then set up a `Task` to run the agent loop.

First we must define the appropriate `ClientTransport` for the MCP server:

```python
import os
from fastmcp.client.transports import NpxStdioTransport

# define the transport
transport = NpxStdioTransport(
    package="exa-mcp-server",
    env_vars=dict(EXA_API_KEY=os.getenv("EXA_API_KEY")),
)
```

Then we use the `@mcp_tool` decorator to create a `ToolMessage` subclass representing the web search tool. Note that one reason to use the decorator to define our tool is so we can specify a custom `handle_async` method that controls what is sent to the LLM after the actual raw MCP tool-call (the `call_tool_async` method) is made.

```python
# the second arg specifically refers to the `web_search_exa` tool available
# on the server defined by the `transport` variable.
@mcp_tool(transport, "web_search_exa")
class ExaSearchTool(lr.ToolMessage):
    async def handle_async(self):
        result: str = await self.call_tool_async()
        return f"""
        Below are the results of the web search:

        {result}

        Use these results to answer the user's original question.
        """
```

If we did not want to override the `handle_async` method, we could simply have created the `ExaSearchTool` class programmatically via the `get_tool_async` function as shown above, i.e.:

```python
from langroid.agent.tools.mcp import get_tool_async

ExaSearchTool = await get_tool_async(transport, "web_search_exa")
```

We can now define our main function where we create our `ChatAgent`, attach the `ExaSearchTool` to it, define the `Task`, and run the task loop.
```python
async def main():
    agent = lr.ChatAgent(
        lr.ChatAgentConfig(
            # forward to user when LLM doesn't use a tool
            handle_llm_no_tool=NonToolAction.FORWARD_USER,
            llm=lm.OpenAIGPTConfig(
                max_output_tokens=1000,
                # this defaults to True, but we set it to False so we can see output
                async_stream_quiet=False,
            ),
        )
    )
    # enable the agent to use the web-search tool
    agent.enable_message(ExaSearchTool)
    # make task with interactive=False =>
    # waits for user only when LLM doesn't use a tool
    task = lr.Task(agent, interactive=False)
    await task.run_async()
```

See [`exa-web-search.py`](https://github.com/langroid/langroid/blob/main/examples/mcp/exa-web-search.py) for a full working example of this.

---

# OpenAI Client Caching

## Overview

Langroid implements client caching for OpenAI and compatible APIs (Groq, Cerebras, etc.) to improve performance and prevent resource exhaustion issues.

## Configuration

### The `use_cached_client` Option

Set `use_cached_client` in your `OpenAIGPTConfig`:

```python
from langroid.language_models import OpenAIGPTConfig

config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=True  # Default
)
```

### Default Behavior

- `use_cached_client=True` (enabled by default)
- Clients with identical configurations share the same underlying HTTP connection pool
- Different configurations (API key, base URL, headers, etc.) get separate client instances

## Benefits

- **Connection Pooling**: Reuses TCP connections, reducing latency and overhead
- **Resource Efficiency**: Prevents "too many open files" errors when creating many agents
- **Performance**: Eliminates connection handshake overhead on subsequent requests
- **Thread Safety**: Shared clients are safe to use across threads

## When to Disable Client Caching

Set `use_cached_client=False` in these scenarios:

1. **Multiprocessing**: Each process should have its own client instance
2. **Client Isolation**: When you need complete isolation between different agent instances
3. **Debugging**: To rule out client sharing as a source of issues
4. **Legacy Compatibility**: If your existing code depends on unique client instances

## Example: Disabling Client Caching

```python
config = OpenAIGPTConfig(
    chat_model="gpt-4",
    use_cached_client=False  # Each instance gets its own client
)
```

## Technical Details

- Uses SHA256-based cache keys to identify unique configurations
- Implements singleton pattern with lazy initialization
- Automatically cleans up clients on program exit via atexit hooks
- Compatible with both sync and async OpenAI clients

---

# OpenAI HTTP Client Configuration

When using OpenAI models through Langroid in corporate environments or behind proxies, you may encounter SSL certificate verification errors. Langroid provides three flexible options to configure the HTTP client used for OpenAI API calls.

## Configuration Options

### 1. Simple SSL Verification Bypass

The quickest solution for development or trusted environments:

```python
import langroid.language_models as lm

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_verify_ssl=False  # Disables SSL certificate verification
)
llm = lm.OpenAIGPT(config)
```

!!! warning "Security Notice"
    Disabling SSL verification makes your connection vulnerable to man-in-the-middle attacks. Only use this in trusted environments.

### 2. HTTP Client Configuration Dictionary
For common scenarios like proxies or custom certificates, use a configuration dictionary:

```python
import langroid.language_models as lm

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_config={
        "verify": False,  # Or path to CA bundle: "/path/to/ca-bundle.pem"
        "proxy": "http://proxy.company.com:8080",
        "timeout": 30.0,
        "headers": {
            "User-Agent": "MyApp/1.0"
        }
    }
)
llm = lm.OpenAIGPT(config)
```

**Benefits**: This approach enables client caching, improving performance when creating multiple agents.

### 3. Custom HTTP Client Factory

For advanced scenarios requiring dynamic behavior or custom authentication:

```python
import langroid.language_models as lm
from httpx import Client

def create_custom_client():
    """Factory function to create a custom HTTP client."""
    client = Client(
        verify="/path/to/corporate-ca-bundle.pem",
        proxies={
            "http": "http://proxy.corp.com:8080",
            "https": "http://proxy.corp.com:8080"
        },
        timeout=30.0
    )

    # Add custom event hooks for logging
    def log_request(request):
        print(f"API Request: {request.method} {request.url}")

    client.event_hooks = {"request": [log_request]}
    return client

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_factory=create_custom_client
)
llm = lm.OpenAIGPT(config)
```

**Note**: Custom factories bypass client caching. Each `OpenAIGPT` instance creates a new client.

## Priority Order

When multiple options are specified, they are applied in this order:

1. `http_client_factory` (highest priority)
2. `http_client_config`
3. `http_verify_ssl` (lowest priority)

## Common Use Cases

### Corporate Proxy with Custom CA Certificate

```python
config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_config={
        "verify": "/path/to/corporate-ca-bundle.pem",
        "proxies": {
            "http": "http://proxy.corp.com:8080",
            "https": "https://proxy.corp.com:8443"
        }
    }
)
```

### Debugging API Calls

```python
def debug_client_factory():
    from httpx import Client
    client = Client(verify=False)

    def log_response(response):
        print(f"Status: {response.status_code}")
        print(f"Headers: {response.headers}")

    client.event_hooks = {
        "response": [log_response]
    }
    return client

config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    http_client_factory=debug_client_factory
)
```

### Local Development with Self-Signed Certificates

```python
# For local OpenAI-compatible APIs
config = lm.OpenAIGPTConfig(
    chat_model="gpt-4",
    api_base="https://localhost:8443/v1",
    http_verify_ssl=False
)
```

## Best Practices

1. **Use the simplest option that meets your needs**:
    - Development/testing: `http_verify_ssl=False`
    - Corporate environments: `http_client_config` with proper CA bundle
    - Complex requirements: `http_client_factory`

2. **Prefer configuration over factories for better performance** - configured clients are cached and reused

3. **Always use proper CA certificates in production** instead of disabling SSL verification

4. **Test your configuration** with a simple API call before deploying:

    ```python
    llm = lm.OpenAIGPT(config)
    response = llm.chat("Hello")
    print(response.message)
    ```

## Troubleshooting

### SSL Certificate Errors

```
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED]
```

**Solution**: Use one of the three configuration options above.
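Before wiring a corporate CA bundle into `http_client_config` or a client factory, it can help to first confirm that the bundle works for a direct HTTPS request. A minimal sketch using `httpx` (the HTTP library the OpenAI Python client uses under the hood); the bundle path is a placeholder:

```python
import httpx

# If the CA bundle is valid for your network/TLS setup, this should return an
# HTTP status code (e.g. 401 without an API key) rather than raising an
# SSL verification error.
client = httpx.Client(verify="/path/to/corporate-ca-bundle.pem")
response = client.get("https://api.openai.com/v1/models")
print(response.status_code)
```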
### Proxy Connection Issues

- Verify proxy URL format: `http://proxy:port` or `https://proxy:port`
- Check if proxy requires authentication
- Ensure proxy allows connections to `api.openai.com`

## See Also

- [OpenAI API Reference](https://platform.openai.com/docs/api-reference) - Official OpenAI documentation

---

## **Setup PostgreSQL with pgvector using Docker**

To quickly get a PostgreSQL instance with pgvector running, the easiest method is to use Docker. Follow the steps below:

### **1. Run PostgreSQL with Docker**

Use the `ankane/pgvector` Docker image to set up PostgreSQL with the pgvector extension. Run the following command:

```bash
docker run --name pgvector -e POSTGRES_USER=your_postgres_user -e POSTGRES_PASSWORD=your_postgres_password -e POSTGRES_DB=your_database_name -p 5432:5432 ankane/pgvector
```

This will pull the `ankane/pgvector` image and run it as a PostgreSQL container on your local machine. The database will be accessible at `localhost:5432`.

### **2. Include `.env` file with PostgreSQL credentials**

These environment variables should be the same as those set when spinning up the Docker container. Add the following environment variables to a `.env` file for configuring your PostgreSQL connection:

```dotenv
POSTGRES_USER=your_postgres_user
POSTGRES_PASSWORD=your_postgres_password
POSTGRES_DB=your_database_name
```

## **If you want to use a cloud offering of Postgres**

We use **Tembo** here for demonstration purposes.

### **Steps to Set Up Tembo**

Follow this [quickstart guide](https://tembo.io/docs/getting-started/getting_started) to get your Tembo credentials.

1. Sign up at [Tembo.io](https://cloud.tembo.io/).
2. While selecting a stack, choose **VectorDB** as your option.
3. Click on **Deploy Free**.
4. Wait until your database is fully provisioned.
5. Click on **Show Connection String** to get your connection string.

### **If you have a connection string, there is no need to set up Docker**

Make sure your connection string starts with `postgres://` or `postgresql://`. Add this to your `.env`:

```dotenv
POSTGRES_CONNECTION_STRING=your-connection-string
```

---

## **Installation**

If you are using `uv` or `pip` for package management, install Langroid with the `postgres` extra:

```bash
uv add "langroid[postgres]"
# or
pip install "langroid[postgres]"
```

---

## **Code Example**

Here's an example of how to use Langroid with PostgreSQL:

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import OpenAIEmbeddingsConfig

# Configure OpenAI embeddings
embed_cfg = OpenAIEmbeddingsConfig(
    model_type="openai",
)

# Configure the DocChatAgent with PostgresDB
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    vecdb=lr.vector_store.PostgresDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

## **Create and Ingest Documents**

Define documents with their content and metadata for ingestion into the vector store.

### **Code Example**

```python
documents = [
    lr.Document(
        content="""
        In the year 2050, GPT10 was released.
        In 2057, paperclips were seen all over the world.
        Global warming was solved in 2060.
        In 2061, the world was taken over by paperclips.
        In 2045, the Tour de France was still going on.
        They were still using bicycles.
        There was one more ice age in 2040.
        """,
        metadata=lr.DocMetaData(source="wikipedia-2063", id="dkfjkladfjalk"),
    ),
    lr.Document(
        content="""
        We are living in an alternate universe where Germany has occupied the USA,
        and the capital of USA is Berlin.
        Charlie Chaplin was a great comedian.
        In 2050, all Asian countries merged into Indonesia.
        """,
        metadata=lr.DocMetaData(source="Almanac", id="lkdajfdkla"),
    ),
]
```

### **Ingest Documents**

```python
agent.ingest_docs(documents)
```

---

## **Get an Answer from the LLM**

Now that documents are ingested, you can query the agent to get an answer.

### **Code Example**

```python
answer = agent.llm_response("When will the new ice age begin?")
```

---

# How to set up Langroid and Pinecone Serverless

This document is a quick tutorial on using [Pinecone](https://www.pinecone.io/) Serverless Indexes with Langroid. We will go over some quickstart links and some code snippets for setting up a conversation with an LLM using Langroid.

# Setting up Pinecone

Here are some reference links if you'd like to read a bit more on Pinecone's model definitions and API:

- https://docs.pinecone.io/guides/get-started/overview
- https://docs.pinecone.io/guides/get-started/glossary
- https://docs.pinecone.io/guides/indexes/manage-indexes
- https://docs.pinecone.io/reference/api/introduction

## Signing up for Pinecone

To get started, you'll need to have an account. [Here's](https://www.pinecone.io/pricing/) where you can review the pricing options for Pinecone. Once you have an account, you'll need to procure an API key. Make sure to save the key you are given on initial login in a secure location. If you were unable to save it when your account was created, you can always [create a new API key](https://docs.pinecone.io/guides/projects/manage-api-keys) in the Pinecone console.

## Setting up your local environment

For the purposes of this example, we will use OpenAI to generate our embeddings. As such, alongside a Pinecone API key, you'll also want an OpenAI key. You can find a quickstart guide on getting started with OpenAI [here](https://platform.openai.com/docs/quickstart).

Once you have both API keys handy, add them to your `.env` file. You should have something like the following:

```env
...
OPENAI_API_KEY=
PINECONE_API_KEY=
...
```

# Using Langroid with Pinecone Serverless

Once you have completed signing up for an account and have added your API keys to your local environment, you can start using Langroid with Pinecone.
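If you keep the keys in a `.env` file as shown above, load them at the top of your script before constructing any Langroid configs. This sketch assumes the `python-dotenv` package (also used by the `marker` example earlier in these notes):

```python
# Load OPENAI_API_KEY and PINECONE_API_KEY from the .env file into the environment
from dotenv import load_dotenv

load_dotenv()
```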
## Setting up an Agent Here's some example code setting up an agent: ```python from langroid import Document, DocMetaData from langroid.agent.special import DocChatAgent, DocChatAgentConfig from langroid.embedding_models import OpenAIEmbeddingsConfig from langroid.language_models import OpenAIGPTConfig, OpenAIChatModel from langroid.parsing.parser import ParsingConfig, Splitter from langroid.vector_store import PineconeDBConfig agent_embed_cfg = OpenAIEmbeddingsConfig( model_type="openai" ) agent_config = DocChatAgentConfig( llm=OpenAIGPTConfig( chat_model=OpenAIChatModel.GPT4o_MINI ), vecdb=PineconeDBConfig( # note, Pinecone indexes must be alphanumeric lowercase characters or "-" collection_name="pinecone-serverless-example", replace_collection=True, embedding=agent_embed_cfg, ), parsing=ParsingConfig( separators=["\n"], splitter=Splitter.SIMPLE, ), n_similar_chunks=2, n_relevant_chunks=2, ) agent = DocChatAgent(config=agent_config) ################### # Once we have created an agent, we can start loading # some docs into our Pinecone index: ################### documents = [ Document( content="""Max Verstappen was the Formula 1 World Drivers' Champion in 2024. Lewis Hamilton was the Formula 1 World Drivers' Champion in 2020. Nico Rosberg was the Formula 1 World Drivers' Champion in 2016. Sebastian Vettel was the Formula 1 World Drivers' Champion in 2013. Jenson Button was the Formula 1 World Drivers' Champion in 2009. Kimi Räikkönen was the Formula 1 World Drivers' Champion in 2007. """, metadata=DocMetaData( source="wikipedia", id="formula-1-facts", ) ), Document( content="""The Boston Celtics won the NBA Championship for the 2024 NBA season. The MVP for the 2024 NBA Championship was Jaylen Brown. The Denver Nuggets won the NBA Championship for the 2023 NBA season. The MVP for the 2023 NBA Championship was Nikola Jokić. The Golden State Warriors won the NBA Championship for the 2022 NBA season. The MVP for the 2022 NBA Championship was Stephen Curry. The Milwaukee Bucks won the NBA Championship for the 2021 NBA season. The MVP for the 2021 NBA Championship was Giannis Antetokounmpo. The Los Angeles Lakers won the NBA Championship for the 2020 NBA season. The MVP for the 2020 NBA Championship was LeBron James. The Toronto Raptors won the NBA Championship for the 2019 NBA season. The MVP for the 2019 NBA Championship was Kawhi Leonard. """, metadata=DocMetaData( source="wikipedia", id="nba-facts" ) ) ] agent.ingest_docs(documents) ################### # With the documents now loaded, we can now prompt our agent ################### formula_one_world_champion_2007 = agent.llm_response( message="Who was the Formula 1 World Drivers' Champion in 2007?" ) try: assert "Kimi Räikkönen" in formula_one_world_champion_2007.content except AssertionError as e: print(f"Did not resolve Kimi Räikkönen as the answer, document content: {formula_one_world_champion_2007.content} ") nba_champion_2023 = agent.llm_response( message="Who won the 2023 NBA Championship?" ) try: assert "Denver Nuggets" in nba_champion_2023.content except AssertionError as e: print(f"Did not resolve the Denver Nuggets as the answer, document content: {nba_champion_2023.content}") nba_mvp_2023 = agent.llm_response( message="Who was the MVP for the 2023 NBA Championship?" 
) try: assert "Nikola Jokić" in nba_mvp_2023.content except AssertionError as e: print(f"Did not resolve Nikola Jokić as the answer, document content: {nba_mvp_2023.content}") ``` --- # Portkey Integration Langroid provides seamless integration with [Portkey](https://portkey.ai), a powerful AI gateway that enables you to access multiple LLM providers through a unified API with advanced features like caching, retries, fallbacks, and comprehensive observability. ## What is Portkey? Portkey is an AI gateway that sits between your application and various LLM providers, offering: - **Unified API**: Access 200+ models from different providers through one interface - **Reliability**: Automatic retries, fallbacks, and load balancing - **Observability**: Detailed logging, tracing, and analytics - **Performance**: Intelligent caching and request optimization - **Security**: Virtual keys and advanced access controls - **Cost Management**: Usage tracking and budget controls For complete documentation, visit the [Portkey Documentation](https://docs.portkey.ai). ## Quick Start ### 1. Setup First, sign up for a Portkey account at [portkey.ai](https://portkey.ai) and get your API key. Set up your environment variables, either explicitly or in your `.env` file as usual: ```bash # Required: Portkey API key export PORTKEY_API_KEY="your-portkey-api-key" # Required: Provider API keys (for the models you want to use) export OPENAI_API_KEY="your-openai-key" export ANTHROPIC_API_KEY="your-anthropic-key" export GOOGLE_API_KEY="your-google-key" # ... other provider keys as needed ``` ### 2. Basic Usage ```python import langroid as lr import langroid.language_models as lm from langroid.language_models.provider_params import PortkeyParams # Create an LLM config to use Portkey's OpenAI-compatible API # (Note that the name `OpenAIGPTConfig` does NOT imply it only works with OpenAI models; # the name reflects the fact that the config is meant to be used with an # OpenAI-compatible API, which Portkey provides for multiple LLM providers.) llm_config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( api_key="your-portkey-api-key", # Or set PORTKEY_API_KEY env var ) ) # Create LLM instance llm = lm.OpenAIGPT(llm_config) # Use normally response = llm.chat("What is the smallest prime number?") print(response.message) ``` ### 3. 
Multiple Providers Switch between providers seamlessly: ```python # OpenAI config_openai = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", ) # Anthropic config_anthropic = lm.OpenAIGPTConfig( chat_model="portkey/anthropic/claude-3-5-sonnet-20241022", ) # Google Gemini config_gemini = lm.OpenAIGPTConfig( chat_model="portkey/google/gemini-2.0-flash-lite", ) ``` ## Advanced Features ### Virtual Keys Use virtual keys to abstract provider management: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", portkey_params=PortkeyParams( virtual_key="vk-your-virtual-key", # Configured in Portkey dashboard ) ) ``` ### Caching and Performance Enable intelligent caching to reduce costs and improve performance: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={ "enabled": True, "ttl": 3600, # 1 hour cache "namespace": "my-app" }, cache_force_refresh=False, ) ) ``` ### Retry Strategies Configure automatic retries for better reliability: ```python config = lm.OpenAIGPTConfig( chat_model="portkey/anthropic/claude-3-haiku-20240307", portkey_params=PortkeyParams( retry={ "max_retries": 3, "backoff": "exponential", "jitter": True } ) ) ``` ### Observability and Tracing Add comprehensive tracking for production monitoring: ```python import uuid config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o", portkey_params=PortkeyParams( trace_id=f"trace-{uuid.uuid4().hex[:8]}", metadata={ "user_id": "user-123", "session_id": "session-456", "app_version": "1.2.3" }, user="user-123", organization="my-org", custom_headers={ "x-request-source": "langroid", "x-feature": "chat-completion" } ) ) ``` ## Configuration Reference The `PortkeyParams` class supports all Portkey features: ```python from langroid.language_models.provider_params import PortkeyParams params = PortkeyParams( # Authentication api_key="pk-...", # Portkey API key virtual_key="vk-...", # Virtual key (optional) # Observability trace_id="trace-123", # Request tracing metadata={"key": "value"}, # Custom metadata user="user-id", # User identifier organization="org-id", # Organization identifier # Performance cache={ # Caching configuration "enabled": True, "ttl": 3600, "namespace": "my-app" }, cache_force_refresh=False, # Force cache refresh # Reliability retry={ # Retry configuration "max_retries": 3, "backoff": "exponential", "jitter": True }, # Custom headers custom_headers={ # Additional headers "x-custom": "value" }, # Base URL (usually not needed) base_url="https://api.portkey.ai" # Portkey API endpoint ) ``` ## Supported Providers Portkey supports 200+ models from various providers. Common ones include: ```python # OpenAI "portkey/openai/gpt-4o" "portkey/openai/gpt-4o-mini" # Anthropic "portkey/anthropic/claude-3-5-sonnet-20241022" "portkey/anthropic/claude-3-haiku-20240307" # Google "portkey/google/gemini-2.0-flash-lite" "portkey/google/gemini-1.5-pro" # Cohere "portkey/cohere/command-r-plus" # Meta "portkey/meta/llama-3.1-405b-instruct" # And many more... ``` Check the [Portkey documentation](https://docs.portkey.ai/docs/integrations/models) for the complete list. ## Examples Langroid includes comprehensive Portkey examples in `examples/portkey/`: 1. **`portkey_basic_chat.py`** - Basic usage with multiple providers 2. **`portkey_advanced_features.py`** - Caching, retries, and observability 3. 
**`portkey_multi_provider.py`** - Comparing responses across providers

Run any example:

```bash
cd examples/portkey
python portkey_basic_chat.py
```

## Best Practices

### 1. Use Environment Variables

Never hardcode API keys:

```bash
# .env file
PORTKEY_API_KEY=your_portkey_key
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
```

### 2. Implement Fallback Strategies

Use multiple providers for reliability:

```python
import langroid.language_models as lm

providers = [
    ("openai", "gpt-4o-mini"),
    ("anthropic", "claude-3-haiku-20240307"),
    ("google", "gemini-2.0-flash-lite")
]

def chat_with_fallback(question: str):
    for provider, model in providers:
        try:
            config = lm.OpenAIGPTConfig(
                chat_model=f"portkey/{provider}/{model}"
            )
            llm = lm.OpenAIGPT(config)
            return llm.chat(question)
        except Exception:
            continue  # Try next provider
```

### 3. Add Meaningful Metadata

Include context for better observability:

```python
params = PortkeyParams(
    metadata={
        "user_id": user.id,
        "feature": "document_qa",
        "document_type": "pdf",
        "processing_stage": "summary"
    }
)
```

### 4. Use Caching Wisely

Enable caching for deterministic queries:

```python
# Good for caching
params = PortkeyParams(
    cache={"enabled": True, "ttl": 3600}
)

# Use with deterministic prompts
response = llm.chat("What is the capital of France?")
```

### 5. Monitor Performance

Use trace IDs to track request flows:

```python
import uuid

trace_id = f"trace-{uuid.uuid4().hex[:8]}"
params = PortkeyParams(
    trace_id=trace_id,
    metadata={"operation": "document_processing"}
)
# Use the same trace_id for related requests
```

## Monitoring and Analytics

### Portkey Dashboard

View detailed analytics at [app.portkey.ai](https://app.portkey.ai):

- Request/response logs
- Token usage and costs
- Performance metrics (latency, errors)
- Provider comparisons
- Custom filters by metadata

### Custom Filtering

Use metadata and headers to filter requests:

```python
# Tag requests by feature
params = PortkeyParams(
    metadata={"feature": "chat", "version": "v2"},
    custom_headers={"x-request-type": "production"}
)
```

Then filter in the dashboard by:

- `metadata.feature = "chat"`
- `headers.x-request-type = "production"`

## Troubleshooting

### Common Issues

1. **Authentication Errors**
    ```
    Error: Unauthorized (401)
    ```
    - Check `PORTKEY_API_KEY` is set correctly
    - Verify API key is active in Portkey dashboard

2. **Provider API Key Missing**
    ```
    Error: Missing API key for provider
    ```
    - Set provider API key (e.g., `OPENAI_API_KEY`)
    - Or use virtual keys in Portkey dashboard

3. **Model Not Found**
    ```
    Error: Model not supported
    ```
    - Check model name format: `portkey/provider/model`
    - Verify model is available through Portkey

4. 
**Rate Limiting** ``` Error: Rate limit exceeded ``` - Configure retry parameters - Use virtual keys for better rate limit management ### Debug Mode Enable detailed logging: ```python import logging logging.getLogger("langroid").setLevel(logging.DEBUG) ``` ### Test Configuration Verify your setup: ```python # Test basic connection config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", max_output_tokens=50 ) llm = lm.OpenAIGPT(config) response = llm.chat("Hello") print("✅ Portkey integration working!") ``` ## Migration Guide ### From Direct Provider Access If you're currently using providers directly: ```python # Before: Direct OpenAI config = lm.OpenAIGPTConfig( chat_model="gpt-4o-mini" ) # After: Through Portkey config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini" ) ``` ### Adding Advanced Features Gradually Start simple and add features as needed: ```python # Step 1: Basic Portkey config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini" ) # Step 2: Add caching config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={"enabled": True, "ttl": 3600} ) ) # Step 3: Add observability config = lm.OpenAIGPTConfig( chat_model="portkey/openai/gpt-4o-mini", portkey_params=PortkeyParams( cache={"enabled": True, "ttl": 3600}, metadata={"app": "my-app", "user": "user-123"}, trace_id="trace-abc123" ) ) ``` ## Resources - **Portkey Website**: [https://portkey.ai](https://portkey.ai) - **Portkey Documentation**: [https://docs.portkey.ai](https://docs.portkey.ai) - **Portkey Dashboard**: [https://app.portkey.ai](https://app.portkey.ai) - **Supported Models**: [https://docs.portkey.ai/docs/integrations/models](https://docs.portkey.ai/docs/integrations/models) - **Langroid Examples**: `examples/portkey/` directory - **API Reference**: [https://docs.portkey.ai/docs/api-reference](https://docs.portkey.ai/docs/api-reference) --- # Pydantic v2 Migration Guide ## Overview Langroid has fully migrated to Pydantic v2! All internal code now uses Pydantic v2 patterns and imports directly from `pydantic`. This guide will help you update your code to work with the new version. ## Compatibility Layer (Deprecated) If your code currently imports from `langroid.pydantic_v1`: ```python # OLD - Deprecated from langroid.pydantic_v1 import BaseModel, Field, BaseSettings ``` You'll see a deprecation warning. This compatibility layer now imports from Pydantic v2 directly, so your code may continue to work, but you should update your imports: ```python # NEW - Correct from pydantic import BaseModel, Field from pydantic_settings import BaseSettings # Note: BaseSettings moved to pydantic_settings in v2 ``` !!! note "BaseSettings Location Change" In Pydantic v2, `BaseSettings` has moved to a separate `pydantic_settings` package. You'll need to install it separately: `pip install pydantic-settings` !!! warning "Compatibility Layer Removal" The `langroid.pydantic_v1` module will be removed in a future version. Update your imports now to avoid breaking changes. ## Key Changes to Update ### 1. All Fields Must Have Type Annotations !!! danger "Critical Change" In Pydantic v2, fields without type annotations are completely ignored! ```python # WRONG - Fields without annotations are ignored in v2 class MyModel(BaseModel): name = "John" # ❌ This field is IGNORED! age = 25 # ❌ This field is IGNORED! 
role: str = "user" # ✅ This field works # CORRECT - All fields must have type annotations class MyModel(BaseModel): name: str = "John" # ✅ Type annotation required age: int = 25 # ✅ Type annotation required role: str = "user" # ✅ Already correct ``` This is one of the most common issues when migrating to v2. Always ensure every field has an explicit type annotation, even if it has a default value. #### Special Case: Overriding Fields in Subclasses !!! danger "Can Cause Errors" When overriding fields from parent classes without type annotations, you may get actual errors, not just ignored fields! This is particularly important when creating custom Langroid agent configurations: ```python # WRONG - This can cause errors! from langroid import ChatAgentConfig from langroid.language_models import OpenAIGPTConfig class MyAgentConfig(ChatAgentConfig): # ❌ ERROR: Missing type annotation when overriding parent field llm = OpenAIGPTConfig(chat_model="gpt-4") # ❌ ERROR: Even with Field, still needs type annotation system_message = Field(default="You are a helpful assistant") # CORRECT - Always include type annotations when overriding class MyAgentConfig(ChatAgentConfig): # ✅ Type annotation required when overriding llm: OpenAIGPTConfig = OpenAIGPTConfig(chat_model="gpt-4") # ✅ Type annotation with Field system_message: str = Field(default="You are a helpful assistant") ``` Without type annotations on overridden fields, you may see errors like: - `ValueError: Field 'llm' requires a type annotation` - `TypeError: Field definitions should be annotated` - Validation errors when the model tries to use the parent's field definition ### 2. Stricter Type Validation for Optional Fields !!! danger "Breaking Change" Pydantic v2 is much stricter about type validation. Fields that could accept `None` in v1 now require explicit `Optional` type annotations. ```python # WRONG - This worked in v1 but fails in v2 class CloudSettings(BaseSettings): private_key: str = None # ❌ ValidationError: expects string, got None api_host: str = None # ❌ ValidationError: expects string, got None # CORRECT - Explicitly mark fields as optional from typing import Optional class CloudSettings(BaseSettings): private_key: Optional[str] = None # ✅ Explicitly optional api_host: Optional[str] = None # ✅ Explicitly optional # Or using Python 3.10+ union syntax client_email: str | None = None # ✅ Also works ``` This commonly affects: - Configuration classes using `BaseSettings` - Fields with `None` as default value - Environment variable loading where the var might not be set If you see errors like: ``` ValidationError: Input should be a valid string [type=string_type, input_value=None, input_type=NoneType] ``` The fix is to add `Optional[]` or `| None` to the type annotation. ### 3. Model Serialization Methods ```python # OLD (Pydantic v1) data = model.dict() json_str = model.json() new_model = MyModel.parse_obj(data) new_model = MyModel.parse_raw(json_str) # NEW (Pydantic v2) data = model.model_dump() json_str = model.model_dump_json() new_model = MyModel.model_validate(data) new_model = MyModel.model_validate_json(json_str) ``` ### 4. Model Configuration ```python # OLD (Pydantic v1) class MyModel(BaseModel): name: str class Config: extra = "forbid" validate_assignment = True # NEW (Pydantic v2) from pydantic import BaseModel, ConfigDict class MyModel(BaseModel): model_config = ConfigDict( extra="forbid", validate_assignment=True ) name: str ``` ### 5. 
Field Validators ```python # OLD (Pydantic v1) from pydantic import validator class MyModel(BaseModel): name: str @validator('name') def name_must_not_be_empty(cls, v): if not v.strip(): raise ValueError('Name cannot be empty') return v # NEW (Pydantic v2) from pydantic import field_validator class MyModel(BaseModel): name: str @field_validator('name') def name_must_not_be_empty(cls, v): if not v.strip(): raise ValueError('Name cannot be empty') return v ``` ### 6. Custom Types and Validation ```python # OLD (Pydantic v1) from pydantic import parse_obj_as from typing import List data = [{"name": "Alice"}, {"name": "Bob"}] users = parse_obj_as(List[User], data) # NEW (Pydantic v2) from pydantic import TypeAdapter from typing import List data = [{"name": "Alice"}, {"name": "Bob"}] users = TypeAdapter(List[User]).validate_python(data) ``` ## Common Patterns in Langroid When working with Langroid's agents and tools: ### Tool Messages ```python from pydantic import BaseModel, Field from langroid.agent.tool_message import ToolMessage class MyTool(ToolMessage): request: str = "my_tool" purpose: str = "Process some data" # Use Pydantic v2 patterns data: str = Field(..., description="The data to process") def handle(self) -> str: # Tool logic here return f"Processed: {self.data}" ``` ### Agent Configuration ```python from pydantic import ConfigDict from langroid import ChatAgentConfig class MyAgentConfig(ChatAgentConfig): model_config = ConfigDict(extra="forbid") custom_param: str = "default_value" ``` ## Troubleshooting ### Import Errors If you see `ImportError` or `AttributeError` after updating imports: - Make sure you're using the correct v2 method names (e.g., `model_dump` not `dict`) - Check that field validators use `@field_validator` not `@validator` - Ensure `ConfigDict` is used instead of nested `Config` classes ### Validation Errors Pydantic v2 has stricter validation in some cases: - Empty strings are no longer coerced to `None` for optional fields - Type coercion is more explicit - Extra fields handling may be different ### Performance Pydantic v2 is generally faster, but if you notice any performance issues: - Use `model_validate` instead of creating models with `**dict` unpacking - Consider using `model_construct` for trusted data (skips validation) ## Need Help? If you encounter issues during migration: 1. Check the [official Pydantic v2 migration guide](https://docs.pydantic.dev/latest/migration/) 2. Review Langroid's example code for v2 patterns 3. Open an issue on the [Langroid GitHub repository](https://github.com/langroid/langroid/issues) --- # QdrantDB Resource Cleanup When using QdrantDB with local storage, it's important to properly release resources to avoid file lock conflicts. QdrantDB uses a `.lock` file to prevent concurrent access to the same storage directory. ## The Problem Without proper cleanup, you may encounter this warning: ``` Error connecting to local QdrantDB at ./qdrant_data: Storage folder ./qdrant_data is already accessed by another instance of Qdrant client. If you require concurrent access, use Qdrant server instead. Switching to ./qdrant_data.new ``` This happens when a QdrantDB instance isn't properly closed, leaving the lock file in place. 
## Solutions

### Method 1: Explicit `close()` Method

Always call `close()` when done with a QdrantDB instance:

```python
from langroid.vector_store.qdrantdb import QdrantDB, QdrantDBConfig

config = QdrantDBConfig(
    cloud=False,
    collection_name="my_collection",
    storage_path="./qdrant_data",
)
vecdb = QdrantDB(config)

# ... use the vector database ...
vecdb.clear_all_collections(really=True)

# Important: Release the lock
vecdb.close()
```

### Method 2: Context Manager (Recommended)

Use QdrantDB as a context manager for automatic cleanup:

```python
from langroid.vector_store.qdrantdb import QdrantDB, QdrantDBConfig

config = QdrantDBConfig(
    cloud=False,
    collection_name="my_collection",
    storage_path="./qdrant_data",
)

with QdrantDB(config) as vecdb:
    # ... use the vector database ...
    vecdb.clear_all_collections(really=True)
# Automatically closed when exiting the context
```

The context manager ensures cleanup even if an exception occurs.

## When This Matters

This is especially important in scenarios where:

1. You create temporary QdrantDB instances for maintenance (e.g., clearing collections)
2. Your application restarts frequently during development
3. Multiple parts of your code need to access the same storage path sequentially

## Note for Cloud Storage

This only affects local storage (`cloud=False`). When using the Qdrant cloud service, the lock file mechanism is not used.

---

# Suppressing LLM output: quiet mode

In some scenarios we want to suppress LLM streaming output -- e.g. when doing some type of processing as part of a workflow, or when using an LLM-agent to generate code via tools. We are more interested in the results of the workflow, and don't want to see streaming output in the terminal.

Langroid provides a `quiet_mode` context manager that can be used to suppress LLM output, even in streaming mode (in fact, streaming is disabled in quiet mode). For example, we can use it like this:

```python
from langroid.utils.configuration import quiet_mode, settings

# directly with LLM
llm = ...
with quiet_mode(True):
    response = llm.chat(...)

# or, using an agent
agent = ...
with quiet_mode(True):
    response = agent.llm_response(...)

# or, using a task
task = Task(agent, ...)
with quiet_mode(True):
    result = task.run(...)

# we can explicitly set quiet mode, and this is globally recognized throughout langroid.
settings.quiet = True

# we can also condition quiet mode on another custom cmd line option/flag, such as "silent":
with quiet_mode(silent):
    ...
```

---

# Stream and capture reasoning content, in addition to the final answer, from Reasoning LLMs

As of v0.35.0, when using certain Reasoning LLM APIs (e.g. `deepseek/deepseek-reasoner`):

- You can see both the reasoning (dim green) and final answer (bright green) text in the streamed output.
- When directly calling the LLM (without using an Agent), the `LLMResponse` object will now contain a `reasoning` field, in addition to the earlier `message` field.
- When using `ChatAgent.llm_response`, extract the reasoning text from the `ChatDocument` object's `reasoning` field (in addition to extracting the final answer, as usual, from the `content` field).

Below is a simple example, also in this [script](https://github.com/langroid/langroid/blob/main/examples/reasoning/agent-reasoning.py). Some notes:

- To get the reasoning trace from Deepseek-R1 via OpenRouter, you must include the `extra_body` parameter with `include_reasoning`, as shown below.
- When using the OpenAI `o3-mini` model, you can set the `reasoning_effort` parameter to "high", "medium" or "low" to control the reasoning effort.
- As of Feb 9, 2025, OpenAI reasoning models (o1, o1-mini, o3-mini) do *not* expose the reasoning trace in the API response.

```python
import langroid as lr
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="deepseek/deepseek-reasoner",
    # inapplicable params are automatically removed by Langroid
    params=lm.OpenAICallParams(
        reasoning_effort="low",  # only supported by o3-mini
        # below lets you get reasoning when using openrouter/deepseek/deepseek-r1
        extra_body=dict(include_reasoning=True),
    ),
)

# (1) Direct LLM interaction
llm = lm.OpenAIGPT(llm_config)

response = llm.chat("Is 9.9 bigger than 9.11?")

# extract reasoning
print(response.reasoning)
# extract answer
print(response.message)

# (2) Using an agent
agent = lr.ChatAgent(
    lr.ChatAgentConfig(
        llm=llm_config,
        system_message="Solve the math problem given by the user",
    )
)

response = agent.llm_response(
    """
    10 years ago, Jack's dad was 5 times as old as Jack.
    Today, Jack's dad is 40 years older than Jack.
    How old is Jack today?
    """
)

# extract reasoning
print(response.reasoning)
# extract answer
print(response.content)
```

---

# Structured Output

Available in Langroid since v0.24.0.

On supported LLMs, including recent OpenAI LLMs (GPT-4o and GPT-4o mini) and local LLMs served by compatible inference servers, in particular [vLLM](https://github.com/vllm-project/vllm) and [llama.cpp](https://github.com/ggerganov/llama.cpp), the decoding process can be constrained to ensure that the model's output adheres to a provided schema. This improves the reliability of tool-call generation and, in general, ensures that the output can be reliably parsed and processed by downstream applications. See [here](../tutorials/local-llm-setup.md/#setup-llamacpp-with-a-gguf-model-from-huggingface) for instructions for usage with `llama.cpp` and [here](../tutorials/local-llm-setup.md/#setup-vllm-with-a-model-from-huggingface) for `vLLM`.

Given a `ChatAgent` `agent` and a type `type`, we can define a strict copy of the agent as follows:

```python
strict_agent = agent[type]
```

We can use this to allow reliable extraction of typed values from an LLM with minimal prompting.
For example, to generate typed values given `agent`'s current context, we can define the following:

```python
from typing import Any

def typed_agent_response(
    prompt: str,
    output_type: type,
) -> Any:
    response = agent[output_type].llm_response_forget(prompt)
    return agent.from_ChatDocument(response, output_type)
```

We apply this in [test_structured_output.py](https://github.com/langroid/langroid/blob/main/tests/main/test_structured_output.py), in which we define types that describe countries and their presidents:

```python
from typing import List
from pydantic import BaseModel, Field

class Country(BaseModel):
    """Info about a country"""

    name: str = Field(..., description="Name of the country")
    capital: str = Field(..., description="Capital of the country")


class President(BaseModel):
    """Info about a president of a country"""

    country: Country = Field(..., description="Country of the president")
    name: str = Field(..., description="Name of the president")
    election_year: int = Field(..., description="Year of election of the president")


class PresidentList(BaseModel):
    """List of presidents of various countries"""

    presidents: List[President] = Field(..., description="List of presidents")
```

and show that `typed_agent_response("Show me an example of two Presidents", PresidentList)` correctly returns a list of two presidents with *no* prompting describing the desired output format.

In addition to Pydantic models, `ToolMessage`s and simple Python types are supported. For instance, `typed_agent_response("What is the value of pi?", float)` correctly returns $\pi$ to several decimal places.

The following two detailed examples show how structured output can be used to improve the reliability of the [chat-tree example](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree.py): [this](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree-structured.py) shows how we can use output formats to force the agent to make the correct tool call in each situation, and [this](https://github.com/langroid/langroid/blob/main/examples/basic/chat-tree-structured-simple.py) shows how we can simplify by using structured outputs to extract typed intermediate values and expressing the control flow between LLM calls and agents explicitly.

---

# Task Termination in Langroid

## Why Task Termination Matters

When building agent-based systems, one of the most critical yet challenging aspects is determining when a task should complete. Unlike traditional programs with clear exit points, agent conversations can meander, loop, or continue indefinitely. Getting termination wrong leads to two equally problematic scenarios:

**Terminating too early** means missing crucial information or cutting off an agent mid-process. Imagine an agent that searches for information, finds it, but terminates before it can process or summarize the results. The task completes "successfully" but fails to deliver value.

**Terminating too late** wastes computational resources, frustrates users, and can lead to repetitive loops where agents keep responding without making progress. We've all experienced chatbots that won't stop talking or systems that keep asking "Is there anything else?" long after the conversation should have ended. Even worse, agents can fall into infinite loops—repeatedly exchanging the same messages, calling the same tools, or cycling through states without making progress. These loops not only waste resources but can rack up significant costs when using paid LLM APIs.

The challenge is that the "right" termination point depends entirely on context.
A customer service task might complete after resolving an issue and confirming satisfaction. A research task might need to gather multiple sources, synthesize them, and present findings. A calculation task should end after computing and presenting the result. Each scenario requires different termination logic. Traditionally, developers would subclass `Task` and override the `done()` method with custom logic. While flexible, this approach scattered termination logic across multiple subclasses, making systems harder to understand and maintain. It also meant that common patterns—like "complete after tool use" or "stop when the user says goodbye"—had to be reimplemented repeatedly. This guide introduces Langroid's declarative approach to task termination, culminating in the powerful `done_sequences` feature. Instead of writing imperative code, you can now describe *what* patterns should trigger completion, and Langroid handles the *how*. This makes your agent systems more predictable, maintainable, and easier to reason about. ## Table of Contents - [Overview](#overview) - [Basic Termination Methods](#basic-termination-methods) - [Done Sequences: Event-Based Termination](#done-sequences-event-based-termination) - [Concept](#concept) - [DSL Syntax (Recommended)](#dsl-syntax-recommended) - [Full Object Syntax](#full-object-syntax) - [Event Types](#event-types) - [Examples](#examples) - [Implementation Details](#implementation-details) - [Best Practices](#best-practices) - [Reference](#reference) ## Overview In Langroid, a `Task` wraps an `Agent` and manages the conversation flow. Controlling when a task terminates is crucial for building reliable agent systems. Langroid provides several methods for task termination, from simple flags to sophisticated event sequence matching. ## Basic Termination Methods ### 1. Turn Limits ```python # Task runs for exactly 5 turns result = task.run("Start conversation", turns=5) ``` ### 2. Single Round Mode ```python # Task completes after one exchange config = TaskConfig(single_round=True) task = Task(agent, config=config) ``` ### 3. Done If Tool ```python # Task completes when any tool is generated config = TaskConfig(done_if_tool=True) task = Task(agent, config=config) ``` ### 4. Done If Response/No Response ```python # Task completes based on response from specific entities config = TaskConfig( done_if_response=[Entity.LLM], # Done if LLM responds done_if_no_response=[Entity.USER] # Done if USER doesn't respond ) ``` ### 5. String Signals ```python # Task completes when special strings like "DONE" are detected # (enabled by default with recognize_string_signals=True) ``` ### 6. Orchestration Tools ```python # Using DoneTool, FinalResultTool, etc. from langroid.agent.tools.orchestration import DoneTool agent.enable_message(DoneTool) ``` ## Done Sequences: Event-Based Termination ### Concept The `done_sequences` feature allows you to specify sequences of events that trigger task completion. This provides fine-grained control over task termination based on conversation patterns. 
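For instance, here is a minimal sketch (assuming you have already built a `ChatAgent` named `agent`) of a task that is declared done as soon as its LLM generates any tool call and the agent handles it; the `"T, A"` notation is explained in the DSL section below:

```python
from langroid.agent.task import Task, TaskConfig

# Declarative termination: the task ends once any tool ("T") has been
# generated by the LLM and then handled by the agent ("A").
config = TaskConfig(done_sequences=["T, A"])
task = Task(agent, config=config)  # `agent` is assumed to be an existing ChatAgent
result = task.run("Use a tool to answer the question")
```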
**Key Features:** - Specify multiple termination sequences - Use convenient DSL syntax or full object syntax - Strict consecutive matching (no skipping events) - Efficient implementation using message parent pointers ### DSL Syntax (Recommended) The DSL (Domain Specific Language) provides a concise way to specify sequences: ```python from langroid.agent.task import Task, TaskConfig config = TaskConfig( done_sequences=[ "T, A", # Tool followed by agent response "T[calculator], A", # Specific calculator tool by name "T[CalculatorTool], A", # Specific tool by class reference (NEW!) "L, T, A, L", # LLM, tool, agent, LLM sequence "C[quit|exit|bye]", # Content matching regex "U, L, A", # User, LLM, agent sequence ] ) task = Task(agent, config=config) ``` #### DSL Pattern Reference | Pattern | Description | Event Type | |---------|-------------|------------| | `T` | Any tool | `TOOL` | | `T[name]` | Specific tool by name | `SPECIFIC_TOOL` | | `T[ToolClass]` | Specific tool by class (NEW!) | `SPECIFIC_TOOL` | | `A` | Agent response | `AGENT_RESPONSE` | | `L` | LLM response | `LLM_RESPONSE` | | `U` | User response | `USER_RESPONSE` | | `N` | No response | `NO_RESPONSE` | | `C[pattern]` | Content matching regex | `CONTENT_MATCH` | **Examples:** - `"T, A"` - Any tool followed by agent handling - `"T[search], A, T[calculator], A"` - Search tool, then calculator tool - `"T[CalculatorTool], A"` - Specific tool class followed by agent handling (NEW!) - `"L, C[complete|done|finished]"` - LLM response containing completion words - `"TOOL, AGENT"` - Full words also supported ### Full Object Syntax For more control, use the full object syntax: ```python from langroid.agent.task import ( Task, TaskConfig, DoneSequence, AgentEvent, EventType ) config = TaskConfig( done_sequences=[ DoneSequence( name="tool_handled", events=[ AgentEvent(event_type=EventType.TOOL), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), DoneSequence( name="specific_tool_pattern", events=[ AgentEvent( event_type=EventType.SPECIFIC_TOOL, tool_name="calculator", # Can also use tool_class for type-safe references (NEW!): # tool_class=CalculatorTool ), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), ] ) ``` ### Event Types The following event types are available: | EventType | Description | Additional Parameters | |-----------|-------------|----------------------| | `TOOL` | Any tool message generated | - | | `SPECIFIC_TOOL` | Specific tool by name or class | `tool_name`, `tool_class` (NEW!) | | `LLM_RESPONSE` | LLM generates a response | - | | `AGENT_RESPONSE` | Agent responds (e.g., handles tool) | - | | `USER_RESPONSE` | User provides input | - | | `CONTENT_MATCH` | Response matches regex pattern | `content_pattern` | | `NO_RESPONSE` | No valid response from entity | - | ### Examples #### Example 1: Tool Completion Task completes after any tool is used and handled: ```python config = TaskConfig(done_sequences=["T, A"]) ``` This is equivalent to `done_if_tool=True` but happens after the agent handles the tool. 
#### Example 2: Multi-Step Process Task completes after a specific conversation pattern: ```python config = TaskConfig( done_sequences=["L, T[calculator], A, L"] ) # Completes after: LLM response → calculator tool → agent handles → LLM summary ``` #### Example 3: Multiple Exit Conditions Different ways to complete the task: ```python config = TaskConfig( done_sequences=[ "C[quit|exit|bye]", # User says quit "T[calculator], A", # Calculator used "T[search], A, T[search], A", # Two searches performed ] ) ``` #### Example 4: Tool Class References (NEW!) Use actual tool classes instead of string names for type safety: ```python from langroid.agent.tool_message import ToolMessage class CalculatorTool(ToolMessage): request: str = "calculator" # ... tool implementation class SearchTool(ToolMessage): request: str = "search" # ... tool implementation # Enable tools on the agent agent.enable_message([CalculatorTool, SearchTool]) # Use tool classes in done sequences config = TaskConfig( done_sequences=[ "T[CalculatorTool], A", # Using class name "T[SearchTool], A, T[CalculatorTool], A", # Multiple tools ] ) ``` **Benefits of tool class references:** - **Type-safe**: IDE can validate tool class names - **Refactoring-friendly**: Renaming tool classes automatically updates references - **No string typos**: Compiler/linter catches invalid class names - **Better IDE support**: Autocomplete and go-to-definition work #### Example 5: Mixed Syntax Combine DSL strings and full objects: ```python config = TaskConfig( done_sequences=[ "T, A", # Simple DSL "T[CalculatorTool], A", # Tool class reference (NEW!) DoneSequence( # Full control name="complex_check", events=[ AgentEvent( event_type=EventType.SPECIFIC_TOOL, tool_name="database_query", tool_class=DatabaseQueryTool, # Can use class directly (NEW!) responder="DatabaseAgent" ), AgentEvent(event_type=EventType.AGENT_RESPONSE), ] ), ] ) ``` ## Implementation Details ### How Done Sequences Work Done sequences operate at the **task level** and are based on the **sequence of valid responses** generated during a task's execution. When a task runs, it maintains a `response_sequence` that tracks each message (ChatDocument) as it's processed. **Key points:** - Done sequences are checked only within a single task's scope - They track the temporal order of responses within that task - The response sequence is built incrementally as the task processes each step - Only messages that represent valid responses are added to the sequence ### Response Sequence Building The task builds its response sequence during execution: ```python # In task.run(), after each step: if self.pending_message is not None: if (not self.response_sequence or self.pending_message.id() != self.response_sequence[-1].id()): self.response_sequence.append(self.pending_message) ``` ### Message Chain Retrieval Done sequences are checked against the response sequence: ```python def _get_message_chain(self, msg: ChatDocument, max_depth: Optional[int] = None): """Get the chain of messages from response sequence""" if max_depth is None: max_depth = 50 # default if self._parsed_done_sequences: max_depth = max(len(seq.events) for seq in self._parsed_done_sequences) # Simply return the last max_depth elements from response_sequence return self.response_sequence[-max_depth:] ``` **Note:** The response sequence used for done sequences is separate from the parent-child pointer system. 
Parent pointers track causal relationships and lineage across agent boundaries (important for debugging and understanding delegation patterns), while response sequences track temporal order within a single task for termination checking. ### Strict Matching Events must occur consecutively without intervening messages: ```python # This sequence: [TOOL, AGENT_RESPONSE] # Matches: USER → LLM(tool) → AGENT # Does NOT match: USER → LLM(tool) → USER → AGENT ``` ### Performance - Efficient O(n) traversal where n is sequence length - No full history scan needed - Early termination on first matching sequence ## Best Practices 1. **Use DSL for Simple Cases** ```python # Good: Clear and concise done_sequences=["T, A"] # Avoid: Verbose for simple patterns done_sequences=[DoneSequence(events=[...])] ``` 2. **Name Your Sequences** ```python DoneSequence( name="calculation_complete", # Helps with debugging events=[...] ) ``` 3. **Order Matters** - Put more specific sequences first - General patterns at the end 4. **Test Your Sequences** ```python # Use MockLM for testing agent = ChatAgent( ChatAgentConfig( llm=MockLMConfig(response_fn=lambda x: "test response") ) ) ``` 5. **Combine with Other Methods** ```python config = TaskConfig( done_if_tool=True, # Quick exit on any tool done_sequences=["L, L, L"], # Or after 3 LLM responses max_turns=10, # Hard limit ) ``` ## Reference ### Code Examples - **Basic example**: [`examples/basic/done_sequences_example.py`](../../examples/basic/done_sequences_example.py) - **Test cases**: [`tests/main/test_done_sequences.py`](../../tests/main/test_done_sequences.py) (includes tool class tests) - **DSL tests**: [`tests/main/test_done_sequences_dsl.py`](../../tests/main/test_done_sequences_dsl.py) - **Parser tests**: [`tests/main/test_done_sequence_parser.py`](../../tests/main/test_done_sequence_parser.py) ### Core Classes - `TaskConfig` - Configuration including `done_sequences` - `DoneSequence` - Container for event sequences - `AgentEvent` - Individual event in a sequence - `EventType` - Enumeration of event types ### Parser Module - `langroid.agent.done_sequence_parser` - DSL parsing functionality ### Task Methods - `Task.done()` - Main method that checks sequences - `Task._matches_sequence_with_current()` - Sequence matching logic - `Task._classify_event()` - Event classification - `Task._get_message_chain()` - Message traversal ## Migration Guide If you're currently overriding `Task.done()`: ```python # Before: Custom done() method class MyTask(Task): def done(self, result=None, r=None): if some_complex_logic(result): return (True, StatusCode.DONE) return super().done(result, r) # After: Use done_sequences config = TaskConfig( done_sequences=["T[my_tool], A, L"] # Express as sequence ) task = Task(agent, config=config) # No subclassing needed ``` **NEW: Using Tool Classes Instead of Strings** If you have tool classes defined, you can now reference them directly: ```python # Before: Using string names (still works) config = TaskConfig( done_sequences=["T[calculator], A"] # String name ) # After: Using tool class references (recommended) config = TaskConfig( done_sequences=["T[CalculatorTool], A"] # Class name ) ``` This provides better type safety and makes refactoring easier. 
## Troubleshooting **Sequence not matching?** - Check that events are truly consecutive (no intervening messages) - Use logging to see the actual message chain - Verify tool names match exactly **Type errors with DSL?** - Ensure you're using strings for DSL patterns - Check that tool names in `T[name]` don't contain special characters **Performance concerns?** - Sequences only traverse as deep as needed - Consider shorter sequences for better performance - Use specific tool names to avoid unnecessary checks ## Summary The `done_sequences` feature provides a powerful, declarative way to control task termination based on conversation patterns. The DSL syntax makes common cases simple while the full object syntax provides complete control when needed. This approach eliminates the need to subclass `Task` and override `done()` for most use cases, leading to cleaner, more maintainable code. --- # TaskTool: Spawning Sub-Agents for Task Delegation ## Overview `TaskTool` allows agents to **spawn sub-agents** to handle specific tasks. When an agent encounters a task that requires specialized tools or isolated execution, it can spawn a new sub-agent with exactly the capabilities needed for that task. This enables agents to dynamically create a hierarchy of specialized workers, each focused on their specific subtask with only the tools they need. ## When to Use TaskTool TaskTool is useful when: - Different parts of a task require different specialized tools - You want to isolate tool access for specific operations - A task involves recursive or nested operations - You need different LLM models for different subtasks ## How It Works 1. The parent agent decides to spawn a sub-agent and specifies: - A system message defining the sub-agent's role - A prompt for the sub-agent to process - Which tools the sub-agent should have access to - Optional model and iteration limits 2. TaskTool spawns the new sub-agent, runs the task, and returns the result to the parent. ## Async Support TaskTool fully supports both synchronous and asynchronous execution. The tool automatically handles async contexts when the parent task is running asynchronously. ## Usage Example ```python from langroid.agent.tools.task_tool import TaskTool # Enable TaskTool for your agent agent.enable_message([TaskTool, YourCustomTool], use=True, handle=True) # Agent can now spawn sub-agents for tasks when the LLM generates a task_tool request: response = { "request": "task_tool", "system_message": "You are a calculator. Use the multiply_tool to compute products.", "prompt": "Calculate 5 * 7", "tools": ["multiply_tool"], "model": "gpt-4o-mini", # optional "max_iterations": 5, # optional "agent_name": "calculator-agent" # optional } ``` ## Field Reference **Required fields:** - `system_message`: Instructions for the sub-agent's role and behavior - `prompt`: The specific task/question for the sub-agent - `tools`: List of tool names. Special values: `["ALL"]` or `["NONE"]` **Optional fields:** - `model`: LLM model name (default: "gpt-4o-mini") - `max_iterations`: Task iteration limit (default: 10) - `agent_name`: Name for the sub-agent (default: auto-generated as "agent-{uuid}") ## Example: Nested Operations Consider computing `Nebrowski(10, Nebrowski(3, 2))` where Nebrowski is a custom operation. 
The main agent spawns sub-agents to handle each operation:

```python
# Main agent spawns first sub-agent for inner operation:
{
    "request": "task_tool",
    "system_message": "Compute Nebrowski operations using the nebrowski_tool.",
    "prompt": "Compute Nebrowski(3, 2)",
    "tools": ["nebrowski_tool"]
}

# Then spawns another sub-agent for outer operation:
{
    "request": "task_tool",
    "system_message": "Compute Nebrowski operations using the nebrowski_tool.",
    "prompt": "Compute Nebrowski(10, 11)",  # where 11 is the previous result
    "tools": ["nebrowski_tool"]
}
```

## Working Examples

See [`tests/main/test_task_tool.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_task_tool.py) for complete examples including:

- Basic task delegation with mock agents
- Nested operations with custom tools
- Both sync and async usage patterns

## Important Notes

- Spawned sub-agents run non-interactively (no human input)
- `DoneTool` is automatically enabled for all sub-agents
- Results are returned as `ChatDocument` objects. The Langroid framework takes care of converting them to a suitable format for the parent agent's LLM to consume and respond to.
- Sub-agents can be given custom names via the `agent_name` parameter, which helps with logging and debugging. If not specified, a unique name is auto-generated in the format "agent-{uuid}"
- Only tools "known" to the parent agent can be enabled for sub-agents. This is an important aspect of the current mechanism. The `TaskTool` handler method in the sub-agent only has access to tools that are known to the parent agent. If there are tools that are only relevant to the sub-agent but not the parent, you must still enable them in the parent agent, but you can set `use=False` and `handle=False` when you enable them, e.g.:

```python
agent.enable_message(MySubAgentTool, use=False, handle=False)
```

Since we are letting the main agent's LLM "decide" when to spawn a sub-agent, the system message of the main agent should contain instructions clarifying that it can decide which tools to enable for the sub-agent, as well as a list of all tools that might possibly be relevant to the sub-agent. This is particularly important for tools that have been enabled with `use=False`, since instructions for such tools would not be auto-inserted into the agent's system message.

## Best Practices

1. **Clear Instructions**: Provide specific system messages that explain the sub-agent's role and tool usage
2. **Tool Availability**: Ensure delegated tools are enabled for the parent agent
3. **Appropriate Models**: Use simpler/faster models for simple subtasks
4. **Iteration Limits**: Set reasonable limits based on task complexity

---

---

# **Using Tavily Search with Langroid**

---

## **1. Set Up Tavily**

1. **Access Tavily Platform**
   Go to the [Tavily Platform](https://tavily.com/).

2. **Sign Up or Log In**
   Create an account or log in if you already have one.

3. **Get Your API Key**
    - Navigate to your dashboard
    - Copy your API key

4. **Set Environment Variable**
   Add the following variable to your `.env` file:

    ```env
    TAVILY_API_KEY=
    ```

---

## **2. Use Tavily Search with Langroid**
### **Installation**

```bash
uv add tavily-python
# or
pip install tavily-python
```

### **Code Example**

```python
import langroid as lr
from langroid.agent.chat_agent import ChatAgent, ChatAgentConfig
from langroid.agent.tools.tavily_search_tool import TavilySearchTool

# Configure the ChatAgent
config = ChatAgentConfig(
    name="search-agent",
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    use_tools=True
)

# Create the agent
agent = ChatAgent(config)

# Enable Tavily search tool
agent.enable_message(TavilySearchTool)
```

---

## **3. Perform Web Searches**

Use the agent to perform web searches using Tavily's AI-powered search.

```python
# Simple search query
response = agent.llm_response(
    "What are the latest developments in quantum computing?"
)
print(response)

# Search with specific number of results
response = agent.llm_response(
    "Find 5 recent news articles about artificial intelligence."
)
print(response)
```

---

## **4. Custom Search Requests**

You can also customize the search behavior by creating a TavilySearchTool instance directly:

```python
from langroid.agent.tools.tavily_search_tool import TavilySearchTool

# Create a custom search request
search_request = TavilySearchTool(
    query="Latest breakthroughs in fusion energy",
    num_results=3
)

# Get search results
results = search_request.handle()
print(results)
```

---

---

# Tool Message Handlers in Langroid

## Overview

Langroid provides flexible ways to define handlers for `ToolMessage` classes. When a tool is used by an LLM, the framework needs to know how to handle it. This can be done either by defining a handler method in the `Agent` class or within the `ToolMessage` class itself.

## Enabling Tools with `enable_message`

Before an agent can use or handle a tool, it must be explicitly enabled using the `enable_message` method. This method takes two important arguments:

- **`use`** (bool): Whether the LLM is allowed to generate this tool
- **`handle`** (bool): Whether the agent is allowed to handle this tool

```python
# Enable both generation and handling (default)
agent.enable_message(MyTool, use=True, handle=True)

# Enable only handling (agent can handle but LLM won't generate)
agent.enable_message(MyTool, use=False, handle=True)

# Enable only generation (LLM can generate but agent won't handle)
agent.enable_message(MyTool, use=True, handle=False)
```

When `handle=True` and the `ToolMessage` has a `handle` method defined, this method is inserted into the agent with a name matching the tool's `request` field value. This insertion only happens when `enable_message` is called.

## Default Handler Mechanism

By default, a `ToolMessage` is handled by a method on the `Agent` instance whose name is identical to the tool's `request` attribute; this method either already exists on the Agent, or is inserted into it when the tool is enabled.

### Agent-based Handlers

If a tool `MyTool` has the `request` attribute `my_tool`, you can define a method `my_tool` in your `Agent` class that will handle this tool when the LLM generates it:

```python
class MyTool(ToolMessage):
    request: str = "my_tool"
    param: str

class MyAgent(ChatAgent):
    def my_tool(self, msg: MyTool) -> str:
        return f"Handled: {msg.param}"

# Enable the tool
agent = MyAgent()
agent.enable_message(MyTool)
```

### ToolMessage-based Handlers

Alternatively, if a tool is "stateless" (i.e. does not require the Agent's state), you can define a `handle` method within the `ToolMessage` class itself.
When you call `enable_message` with `handle=True`, Langroid will insert this method into the `Agent` with the name matching the `request` field value: ```python class MyTool(ToolMessage): request = "my_tool" param: str def handle(self) -> str: return f"Handled: {self.param}" # Enable the tool agent = MyAgent() agent.enable_message(MyTool) # The handle method is now inserted as "my_tool" in the agent ``` ## Flexible Handler Signatures Handler methods (`handle()` or `handle_async()`) support multiple signature patterns to access different levels of context: ### 1. No Arguments (Simple Handler) This is the typical pattern for stateless tools that do not require any context from the agent or current chat document. ```python class MyTool(ToolMessage): request = "my_tool" def handle(self) -> str: return "Simple response" ``` ### 2. Agent Parameter Only Use this pattern when you need access to the `Agent` instance, but not the current chat document. ```python from langroid.agent.base import Agent class MyTool(ToolMessage): request = "my_tool" def handle(self, agent: Agent) -> str: return f"Response from {agent.name}" ``` ### 3. ChatDocument Parameter Only Use this pattern when you need access to the current `ChatDocument`, but not the `Agent` instance. ```python from langroid.agent.chat_document import ChatDocument class MyTool(ToolMessage): request = "my_tool" def handle(self, chat_doc: ChatDocument) -> str: return f"Responding to: {chat_doc.content}" ``` ### 4. Both Agent and ChatDocument Parameters This is the most flexible pattern, allowing access to both the `Agent` instance and the current `ChatDocument`. The order of parameters does not matter, but as noted below, it is highly recommended to always use type annotations. ```python class MyTool(ToolMessage): request = "my_tool" def handle(self, agent: Agent, chat_doc: ChatDocument) -> ChatDocument: return agent.create_agent_response( content="Response with full context", files=[...] # Optional file attachments ) ``` ## Parameter Detection The framework automatically detects handler parameter types through: 1. **Type annotations** (recommended): The framework uses type hints to determine which parameters to pass 2. **Parameter names** (fallback): If no type annotations are present, it looks for parameters named `agent` or `chat_doc` It is highly recommended to always use type annotations for clarity and reliability. ### Example with Type Annotations (Recommended) ```python def handle(self, agent: Agent, chat_doc: ChatDocument) -> str: # Framework knows to pass both agent and chat_doc return "Handled" ``` ### Example without Type Annotations (Not Recommended) ```python def handle(self, agent, chat_doc): # Works but not recommended # Framework uses parameter names to determine what to pass return "Handled" ``` ## Async Handlers All the above patterns also work with async handlers: ```python class MyTool(ToolMessage): request = "my_tool" async def handle_async(self, agent: Agent) -> str: # Async operations here result = await some_async_operation() return f"Async result: {result}" ``` See the quick-start [Tool section](https://langroid.github.io/langroid/quick-start/chat-agent-tool/) for more details. ## Custom Handler Names In some use-cases it may be beneficial to separate the *name of a tool* (i.e. the value of `request` attribute) from the *name of the handler method*. For example, you may be dynamically creating tools based on some data from external data sources. Or you may want to use the same "handler" method for multiple tools. 
This may be done by adding a `_handler` attribute to the `ToolMessage` class, which defines the name of the tool-handler method on the `Agent` instance. The underscore `_` prefix ensures that the `_handler` attribute does not appear in the Pydantic-based JSON schema of the `ToolMessage` class, and so the LLM would not be instructed to generate it.

!!! note "`_handler` and `handle`"
    A `ToolMessage` may have a `handle` method defined within the class itself, as mentioned above, and this should not be confused with the `_handler` attribute.

For example:

```python
class MyToolMessage(ToolMessage):
    request: str = "my_tool"
    _handler: str = "tool_handler"


class MyAgent(ChatAgent):
    def tool_handler(
        self,
        message: ToolMessage,
    ) -> str:
        if message.request == "my_tool":
            # do something with the tool, then return a result string
            return "handled my_tool"
        return ""
```

Refer to [examples/basic/tool-custom-handler.py](https://github.com/langroid/langroid/blob/main/examples/basic/tool-custom-handler.py) for a detailed example.

---

# Firecrawl and Trafilatura Crawlers Documentation

`URLLoader` uses `TrafilaturaCrawler` by default, if no crawler is explicitly specified.

## Overview

* **`FirecrawlCrawler`**: Leverages the Firecrawl API for efficient web scraping and crawling. It offers built-in document processing capabilities, and **produces non-chunked markdown output** from web-page content. Requires the `FIRECRAWL_API_KEY` environment variable to be set in your `.env` file or environment.
* **`TrafilaturaCrawler`**: Utilizes the Trafilatura library and Langroid's parsing tools for extracting and processing web content - this is the default crawler, and does not require setting up an external API key. Also produces **chunked markdown output** from web-page content.
* **`ExaCrawler`**: Integrates with the Exa API for high-quality content extraction. Requires the `EXA_API_KEY` environment variable to be set in your `.env` file or environment. This crawler also produces **chunked markdown output** from web-page content.

## Installation

`TrafilaturaCrawler` comes with Langroid. To use `FirecrawlCrawler`, install the `firecrawl` extra:

```bash
pip install langroid[firecrawl]
```

## Exa Crawler Documentation

### Overview

`ExaCrawler` integrates with the Exa API to extract high-quality content from web pages. It provides efficient content extraction with the simplicity of API-based processing.

### Parameters

Obtain an Exa API key from [Exa](https://exa.ai/) and set it in your environment variables, e.g. in your `.env` file as:

```env
EXA_API_KEY=your_api_key_here
```

* **config (ExaCrawlerConfig)**: An `ExaCrawlerConfig` object.
    * **api_key (str)**: Your Exa API key.

### Usage

```python
from langroid.parsing.url_loader import URLLoader, ExaCrawlerConfig

# Create an ExaCrawlerConfig object
exa_config = ExaCrawlerConfig(
    # Typically omitted, since it's loaded from the EXA_API_KEY environment variable
    api_key="your-exa-api-key"
)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org"
    ],
    crawler_config=exa_config
)

docs = loader.load()
print(docs)
```

### Benefits

* Simple API integration requiring minimal configuration
* Efficient handling of complex web pages
* For plain HTML content, the `exa` API produces high-quality extraction as clean HTML, which we then convert to markdown using the `markdownify` library.
* For "document" content (e.g., `pdf`, `doc`, `docx`), the content is downloaded via the `exa` API and Langroid's document-processing tools are used to produce **chunked output** in a format controlled by the `Parser` configuration (defaults to markdown in most cases).
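Since `TrafilaturaCrawler` is the default crawler, the simplest usage needs no crawler configuration at all. Here is a minimal sketch of that default path (the URL is just an illustration):

```python
from langroid.parsing.url_loader import URLLoader

# No crawler_config supplied => the default TrafilaturaCrawler is used,
# producing chunked markdown Documents.
loader = URLLoader(urls=["https://pytorch.org"])
docs = loader.load()
print(docs[0].content[:200])
```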
## Trafilatura Crawler Documentation

### Overview

`TrafilaturaCrawler` is a web crawler that uses the Trafilatura library for content extraction and Langroid's parsing capabilities for further processing.

### Parameters

* **config (TrafilaturaConfig)**: A `TrafilaturaConfig` object that specifies parameters related to scraping or output format.
    * `threads` (int): The number of threads to use for downloading web pages.
    * `format` (str): one of `"markdown"` (default), `"xml"` or `"txt"`; in the case of `xml`, the extracted output is in HTML format.

Similar to the `ExaCrawler`, the `TrafilaturaCrawler` works differently depending on the type of web-page content:

- for "document" content (e.g., `pdf`, `doc`, `docx`), the content is downloaded, and Langroid's document-processing tools are used to produce **chunked output** in a format controlled by the `Parser` configuration (defaults to markdown in most cases).
- for plain-html content, the output format is based on the `format` parameter;
    - if this parameter is `markdown` (default), the library extracts content in markdown format, and the final output is a list of chunked markdown documents.
    - if this parameter is `xml`, content is extracted in `html` format, which langroid then converts to markdown using the `markdownify` library, and the final output is a list of chunked markdown documents.
    - if this parameter is `txt`, the content is extracted in plain text format, and the final output is a list of plain text documents.

### Usage

```python
from langroid.parsing.url_loader import URLLoader, TrafilaturaConfig

# Create a TrafilaturaConfig instance
# (you can also set format="xml" or format="txt"; the default is "markdown")
trafilatura_config = TrafilaturaConfig(threads=4)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org",
        "https://ai.google.dev/gemini-api/docs",
        "https://books.toscrape.com/"
    ],
    crawler_config=trafilatura_config,
)

docs = loader.load()
print(docs)
```

### Langroid Parser Integration

`TrafilaturaCrawler` relies on a Langroid `Parser` to handle document processing. The `Parser` uses the default parsing methods, or a configuration that can be adjusted to suit the current use case.

## Firecrawl Crawler Documentation

### Overview

`FirecrawlCrawler` is a web crawling utility class that uses the Firecrawl API to scrape or crawl web pages efficiently. It offers two modes:

* **Scrape Mode (default)**: Extracts content from a list of specified URLs.
* **Crawl Mode**: Recursively follows links from a starting URL, gathering content from multiple pages, including subdomains, while bypassing blockers.

**Note:** `crawl` mode accepts only ONE URL as a list.

### Parameters

Obtain a Firecrawl API key from [Firecrawl](https://firecrawl.dev/) and set it in your environment variables, e.g. in your `.env` file as:

```env
FIRECRAWL_API_KEY=your_api_key_here
```

* **config (FirecrawlConfig)**: A `FirecrawlConfig` object.
* **timeout (int, optional)**: Time in milliseconds (ms) to wait for a response. Default is `30000ms` (30 seconds). In crawl mode, this applies per URL.
* **limit (int, optional)**: Maximum number of pages to scrape in crawl mode. Helps control API usage.
* **params (dict, optional)**: Additional parameters to customize the request. See the [scrape API](https://docs.firecrawl.dev/api-reference/endpoint/scrape) and [crawl API](https://docs.firecrawl.dev/api-reference/endpoint/crawl-post) for details.
### Usage

#### Scrape Mode (Default)

Fetch content from multiple URLs:

```python
from langroid.parsing.url_loader import URLLoader, FirecrawlConfig

# create a FirecrawlConfig object
firecrawl_config = FirecrawlConfig(
    # typical/best practice is to omit the api_key, and
    # we leverage Pydantic BaseSettings to load it from the environment variable
    # FIRECRAWL_API_KEY in your .env file
    api_key="your-firecrawl-api-key",
    timeout=15000,  # Timeout per request (15 sec)
    mode="scrape",
)

loader = URLLoader(
    urls=[
        "https://pytorch.org",
        "https://www.tensorflow.org",
        "https://ai.google.dev/gemini-api/docs",
        "https://books.toscrape.com/"
    ],
    crawler_config=firecrawl_config
)

docs = loader.load()
print(docs)
```

#### Crawl Mode

Fetch content from multiple pages starting from a single URL:

```python
from langroid.parsing.url_loader import URLLoader, FirecrawlConfig

# create a FirecrawlConfig object
firecrawl_config = FirecrawlConfig(
    timeout=30000,  # 30 sec per page
    mode="crawl",
    params={
        "limit": 5,
    }
)

loader = URLLoader(
    urls=["https://books.toscrape.com/"],
    crawler_config=firecrawl_config
)

docs = loader.load()
print(docs)
```

### Output

Results are stored in the `firecrawl_output` directory.

### Best Practices

* Set `limit` in crawl mode to avoid excessive API usage.
* Adjust `timeout` based on network conditions and website responsiveness.
* Use `params` to customize scraping behavior based on Firecrawl API capabilities.

### Firecrawl's Built-In Document Processing

`FirecrawlCrawler` benefits from Firecrawl's built-in document processing, which automatically extracts and structures content from web pages (including pdf, doc, docx). This reduces the need for complex parsing logic within Langroid. Unlike the `Exa` and `Trafilatura` crawlers, the resulting documents are *non-chunked* markdown documents.

## Choosing a Crawler

* Use `FirecrawlCrawler` when you need efficient, API-driven scraping with built-in document processing. This is often the simplest and most effective choice, but incurs a cost due to the paid API.
* Use `TrafilaturaCrawler` when you want local, non-API-based scraping (less accurate).
* Use `ExaCrawler` as a sort of middle-ground between the two: it offers high-quality content extraction for plain HTML content, but relies on Langroid's document-processing tools for document content. This will cost significantly less than Firecrawl.

## Example script

See the script [`examples/docqa/chat_search.py`](https://github.com/langroid/langroid/blob/main/examples/docqa/chat_search.py) which shows how to use a Langroid agent to search the web and scrape URLs to answer questions.

---

---

# **Using WeaviateDB as a Vector Store with Langroid**

---

## **1. Set Up Weaviate**

You can refer to the Weaviate [quickstart](https://weaviate.io/developers/weaviate/quickstart) guide for details.

1. **Access Weaviate Cloud Console**
   Go to the [Weaviate Cloud Console](https://console.weaviate.cloud/).

2. **Sign Up or Log In**
   Create an account or log in if you already have one.

3. **Create a Cluster**
   Set up a new cluster in the cloud console.

4. **Get Your REST Endpoint and API Key**
    - Retrieve the REST endpoint URL.
    - Copy an API key with admin access.

5. **Set Environment Variables**
   Add the following variables to your `.env` file:

    ```env
    WEAVIATE_API_URL=
    WEAVIATE_API_KEY=
    ```

---

## **2. Use WeaviateDB with Langroid**
Here's an example of how to configure and use WeaviateDB in Langroid:

### **Installation**

If you are using `uv` or `pip` for package management, install Langroid with the `weaviate` extra:

```bash
uv add langroid[weaviate]
# or
pip install langroid[weaviate]
```

### **Code Example**

```python
import langroid as lr
from langroid.agent.special import DocChatAgent, DocChatAgentConfig
from langroid.embedding_models import OpenAIEmbeddingsConfig

# Configure OpenAI embeddings
embed_cfg = OpenAIEmbeddingsConfig(
    model_type="openai",
)

# Configure the DocChatAgent with WeaviateDB
config = DocChatAgentConfig(
    llm=lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o
    ),
    vecdb=lr.vector_store.WeaviateDBConfig(
        collection_name="quick_start_chat_agent_docs",
        replace_collection=True,
        embedding=embed_cfg,
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE,
    ),
    n_similar_chunks=2,
    n_relevant_chunks=2,
)

# Create the agent
agent = DocChatAgent(config)
```

---

## **3. Create and Ingest Documents**

Define documents with their content and metadata for ingestion into the vector store.

### **Code Example**

```python
documents = [
    lr.Document(
        content="""
            In the year 2050, GPT10 was released.

            In 2057, paperclips were seen all over the world.

            Global warming was solved in 2060.

            In 2061, the world was taken over by paperclips.

            In 2045, the Tour de France was still going on.
            They were still using bicycles.

            There was one more ice age in 2040.
            """,
        metadata=lr.DocMetaData(source="wikipedia-2063", id="dkfjkladfjalk"),
    ),
    lr.Document(
        content="""
            We are living in an alternate universe
            where Germany has occupied the USA, and the capital of USA is Berlin.

            Charlie Chaplin was a great comedian.

            In 2050, all Asian countries merged into Indonesia.
            """,
        metadata=lr.DocMetaData(source="Almanac", id="lkdajfdkla"),
    ),
]
```

### **Ingest Documents**

```python
agent.ingest_docs(documents)
```

---

## **4. Get an Answer from the LLM**

Create a task and start interacting with the agent.

### **Code Example**

```python
answer = agent.llm_response("When will the new ice age begin?")
```

---

---

# XML-based Tools

Available in Langroid since v0.17.0.

[`XMLToolMessage`][langroid.agent.xml_tool_message.XMLToolMessage] is an abstract class for tools formatted using XML instead of JSON. It has been mainly tested with non-nested tool structures.

For example, in [test_xml_tool_message.py](https://github.com/langroid/langroid/blob/main/tests/main/test_xml_tool_message.py) we define a CodeTool as follows (slightly simplified here):

```python
class CodeTool(XMLToolMessage):
    request: str = "code_tool"
    purpose: str = "Tool for writing code to a file"

    filepath: str = Field(
        ..., description="The path to the file to write the code to"
    )
    code: str = Field(
        ..., description="The code to write to the file", verbatim=True
    )
```

Especially note how the `code` field has `verbatim=True` set in the `Field` metadata. This will ensure that the LLM receives instructions to:

- enclose `code` field contents in a CDATA section, and
- leave the `code` contents intact, without any escaping or other modifications.

Contrast this with a JSON-based tool, where newlines, quotes, etc. need to be escaped. LLMs (especially weaker ones) often "forget" to do the right escaping, which leads to incorrect JSON, and creates a burden on us to "repair" the resulting JSON, a fraught process at best.
Moreover, studies have shown that requiring that an LLM return this type of carefully escaped code within a JSON string can lead to a significant drop in the quality of the code generated[^1].

[^1]: [LLMs are bad at returning code in JSON.](https://aider.chat/2024/08/14/code-in-json.html)

Note that tools/functions in OpenAI and related APIs are exclusively JSON-based, so in Langroid, when enabling an agent to use a tool derived from `XMLToolMessage`, we set these flags in `ChatAgentConfig`:

- `use_functions_api=False` (disables OpenAI functions/tools)
- `use_tools=True` (enables Langroid-native prompt-based tools)

See also the [`WriteFileTool`][langroid.agent.tools.file_tools.WriteFileTool] for a concrete example of a tool derived from `XMLToolMessage`. This tool enables an LLM to write content (code or text) to a file.

If you are using an existing Langroid `ToolMessage`, e.g. `SendTool`, you can define your own subclass of `SendTool`, say `XMLSendTool`, inheriting from both `SendTool` and `XMLToolMessage`; see this [example](https://github.com/langroid/langroid/blob/main/examples/basic/xml_tool.py).

---

# Augmenting Agents with Retrieval

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent-docs.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent-docs.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent-docs.py).

## Why is this important?

Until now in this guide, agents have not used external data. Although LLMs already have enormous amounts of knowledge "hard-wired" into their weights during training (and this is after all why ChatGPT has exploded in popularity), for practical enterprise applications there are a few reasons it is critical to augment LLMs with access to specific, external documents:

- **Private data**: LLMs are trained on public data, but in many applications we want to use private data that is not available to the public. For example, a company may want to extract useful information from its private knowledge-base.
- **New data**: LLMs are trained on data that was available at the time of training, and so they may not be able to answer questions about new topics.
- **Constrained responses, or Grounding**: LLMs are trained to generate text that is consistent with the distribution of text in the training data. However, in many applications we want to constrain the LLM's responses to be consistent with the content of a specific document. For example, if we want to use an LLM to generate a response to a customer support ticket, we want the response to be consistent with the content of the ticket. In other words, we want to reduce the chances that the LLM _hallucinates_ a response that is not consistent with the ticket.

In all these scenarios, we want to augment the LLM with access to a specific set of documents, and use _retrieval augmented generation_ (RAG) to generate more relevant, useful, accurate responses. Langroid provides a simple, flexible mechanism for RAG using vector-stores, thus ensuring **grounded responses** constrained to specific documents. Another key feature of Langroid is that retrieval lineage is maintained, and responses based on documents are always accompanied by **source citations**.
## `DocChatAgent` for Retrieval-Augmented Generation

Langroid provides a special type of agent called [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent], which is a [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] augmented with a vector-store, and some special methods that enable the agent to ingest documents into the vector-store, and answer queries based on these documents.

The [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] provides many ways to ingest documents into the vector-store, including from URLs and local file-paths. Given a collection of document paths, ingesting their content into the vector-store involves the following steps:

1. Split the document into shards (in a configurable way)
2. Map each shard to an embedding vector using an embedding model. The default embedding model is OpenAI's `text-embedding-3-small` model, but users can instead use `all-MiniLM-L6-v2` from the HuggingFace `sentence-transformers` library.[^1]
3. Store embedding vectors in the vector-store, along with the shard's content and any document-level meta-data (this ensures Langroid knows which document a shard came from when it retrieves it to augment an LLM query)

[^1]: To use this embedding model, install langroid via `pip install langroid[hf-embeddings]`. Note that this will install the `torch` and `sentence-transformers` libraries.

[`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent]'s `llm_response` overrides the default [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] method, by augmenting the input message with relevant shards from the vector-store, along with instructions to the LLM to respond based on the shards.

## Define some documents

Let us see how [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] helps with retrieval-augmented generation (RAG). For clarity, rather than ingest documents from paths or URLs, let us just set up some simple documents in the code itself, using Langroid's [`Document`][langroid.mytypes.Document] class:

```py
documents = [
    lr.Document(
        content="""
            In the year 2050, GPT10 was released.

            In 2057, paperclips were seen all over the world.

            Global warming was solved in 2060.

            In 2061, the world was taken over by paperclips.

            In 2045, the Tour de France was still going on.
            They were still using bicycles.

            There was one more ice age in 2040.
            """,
        metadata=lr.DocMetaData(source="wikipedia-2063"),
    ),
    lr.Document(
        content="""
            We are living in an alternate universe
            where Germany has occupied the USA, and the capital of USA is Berlin.

            Charlie Chaplin was a great comedian.

            In 2050, all Asian countries merged into Indonesia.
            """,
        metadata=lr.DocMetaData(source="Almanac"),
    ),
]
```

There are two text documents. We will split them by double-newlines (`\n\n`), as we see below.

## Configure the DocChatAgent and ingest documents

Following the pattern in Langroid, we first set up a [`DocChatAgentConfig`][langroid.agent.special.doc_chat_agent.DocChatAgentConfig] object and then instantiate a [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] from it.

```py
from langroid.agent.special import DocChatAgent, DocChatAgentConfig

config = DocChatAgentConfig(
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    vecdb=lr.vector_store.QdrantDBConfig(
        collection_name="quick-start-chat-agent-docs",
        replace_collection=True, #(1)!
    ),
    parsing=lr.parsing.parser.ParsingConfig(
        separators=["\n\n"],
        splitter=lr.parsing.parser.Splitter.SIMPLE, #(2)!
    ),
    n_similar_chunks=2, #(3)!
    n_relevant_chunks=2, #(3)!
)
agent = DocChatAgent(config)
```

1. Specifies that each time we run the code, we create a fresh collection, rather than re-use the existing one with the same name.
2. Specifies to split all text content by the first separator in the `separators` list.
3. Specifies that, for a query, we want to retrieve at most 2 similar chunks from the vector-store.

Now that the [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] is configured, we can ingest the documents into the vector-store:

```py
agent.ingest_docs(documents)
```

## Set up the task and run it

As before, all that remains is to set up the task and run it:

```py
task = lr.Task(agent)
task.run()
```

And that is all there is to it! Feel free to try out the [`chat-agent-docs.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent-docs.py) script in the `langroid-examples` repository. Here is a screenshot of the output:

![chat-docs.png](chat-docs.png)

Notice how follow-up questions correctly take the preceding dialog into account, and every answer is accompanied by a source citation.

## Answer questions from a set of URLs

Instead of having in-code documents as above, what if you had a set of URLs instead -- how do you use Langroid to answer questions based on the content of those URLs?

[`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] makes it very simple to do this. First include the URLs in the [`DocChatAgentConfig`][langroid.agent.special.doc_chat_agent.DocChatAgentConfig] object:

```py
config = DocChatAgentConfig(
    doc_paths = [
        "https://cthiriet.com/articles/scaling-laws",
        "https://www.jasonwei.net/blog/emergence",
    ]
)
```

Then, call the `ingest()` method of the [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent] object:

```py
agent.ingest()
```

And the rest of the code remains the same.

## See also

In the `langroid-examples` repository, you can find full working examples of document question-answering:

- [`examples/docqa/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat.py) an app that takes a list of URLs or document paths from a user, and answers questions on them.
- [`examples/docqa/chat-qa-summarize.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat-qa-summarize.py) a two-agent app where the `WriterAgent` is tasked with writing 5 key points about a topic, and takes the help of a `DocAgent` that answers its questions based on a given set of documents.

## Next steps

This Getting Started guide walked you through the core features of Langroid. If you want to see full working examples combining these elements, have a look at the [`examples`](https://github.com/langroid/langroid-examples/tree/main/examples) folder in the `langroid-examples` repo.

---

# A chat agent, equipped with a tool/function-call

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent-tool.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent-tool.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent-tool.py).

## Tools, plugins, function-calling

An LLM normally generates unstructured text in response to a prompt (or sequence of prompts). However there are many situations where we would like the LLM to generate _structured_ text, or even _code_, that can be handled by specialized functions outside the LLM, for further processing.
In these situations, we want the LLM to "express" its "intent" unambiguously, and we achieve this by instructing the LLM on how to format its output (typically in JSON) and under what conditions it should generate such output. This mechanism has become known by various names over the last few months (tools, plugins, or function-calling), and is extremely useful in numerous scenarios, such as:

- **Extracting structured information** from a document: for example, we can use the tool/functions mechanism to have the LLM present the key terms in a lease document in a structured JSON format, to simplify further processing. See an [example](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py) of this in the `langroid-examples` repo.
- **Specialized computation**: the LLM can request a units conversion, or request scanning a large file (which wouldn't fit into its context) for a specific pattern.
- **Code execution**: the LLM can generate code that is executed in a sandboxed environment, and the results of the execution are returned to the LLM.
- **API Calls**: the LLM can generate a JSON containing params for an API call, which the tool handler uses to make the call and return the results to the LLM.

For LLM developers, Langroid provides a clean, uniform interface for the recently released OpenAI [Function-calling](https://platform.openai.com/docs/guides/gpt/function-calling) as well as Langroid's own native "tools" mechanism. The native tools mechanism is meant to be used when working with non-OpenAI LLMs that do not have a "native" function-calling facility. You can choose which to enable by setting the `use_tools` and `use_functions_api` flags in the `ChatAgentConfig` object. (Or you can omit setting these, and langroid auto-selects the best mode depending on the LLM).

The implementation leverages the excellent [Pydantic](https://docs.pydantic.dev/latest/) library. Benefits of using Pydantic are that you never have to write complex JSON specs for function calling, and when the LLM hallucinates malformed JSON, the Pydantic error message is sent back to the LLM so it can fix it!

## Example: find the smallest number in a list

Again we will use a simple number-game as a toy example to quickly and succinctly illustrate the ideas without spending too much on token costs. This is a modification of the `chat-agent.py` example we saw in an earlier [section](chat-agent.md).

The idea of this single-agent game is that the agent has in "mind" a list of numbers between 1 and 100, and the LLM has to find out the smallest number from this list. The LLM has access to a `probe` tool (think of it as a function) that takes an argument `number`. When the LLM "uses" this tool (i.e. outputs a message in the format required by the tool), the agent handles this structured message and responds with the number of values in its list that are at most equal to the `number` argument.

## Define the tool as a `ToolMessage`

The first step is to define the tool, which we call `ProbeTool`, as a subclass of the `ToolMessage` class, which is itself derived from Pydantic's `BaseModel`.

Essentially the `ProbeTool` definition specifies

- the name of the Agent method that handles the tool, in this case `probe`
- the fields that must be included in the tool message, in this case `number`
- the "purpose" of the tool, i.e. under what conditions it should be used, and what it does
under what conditions it should be used, and what it does.

Here is what the `ProbeTool` definition looks like:

```py
class ProbeTool(lr.agent.ToolMessage):
    request: str = "probe" #(1)!
    purpose: str = """
        To find which number in my list is closest to the <number> you specify
        """ #(2)!
    number: int #(3)!

    @classmethod
    def examples(cls): #(4)!
        # Compiled to few-shot examples sent along with the tool instructions.
        return [
            cls(number=10),
            (
                "To find which number is closest to 20",
                cls(number=20),
            )
        ]
```

1. This indicates that the agent's `probe` method will handle this tool-message.
2. The `purpose` is used behind the scenes to instruct the LLM.
3. `number` is a required argument of the tool-message (function).
4. You can optionally include a class method that returns a list containing examples, of two types: either a class instance, or a tuple consisting of a description and a class instance, where the description is the "thought" that leads the LLM to use the tool. In some scenarios this can help with LLM tool-generation accuracy.

!!! note "Stateless tool handlers"
    The above `ProbeTool` is "stateful", i.e. handling it requires access to a variable in the Agent instance (the `numbers` variable). This is why handling this tool-message requires subclassing the `ChatAgent` and defining a special method in the Agent, with a name matching the value of the `request` field of the Tool (`probe` in this case). However, you may often define "stateless tools" which don't require access to the Agent's state. For such tools, you can define a handler method named `handle` right in the `ToolMessage` itself (see the brief sketch further below). Langroid looks for such a method in the `ToolMessage` and automatically inserts it into the Agent as a method with a name matching the `request` field of the Tool. Examples of stateless tools include tools for numerical computation (e.g., in [this example](https://langroid.github.io/langroid/examples/agent-tree/)), or API calls (e.g. for internet search, see [DuckDuckGoSearch Tool][langroid.agent.tools.duckduckgo_search_tool.DuckduckgoSearchTool]).

## Define the ChatAgent, with the `probe` method

As before, we first create a `ChatAgentConfig` object:

```py
config = lr.ChatAgentConfig(
    name="Spy",
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    use_tools=True, #(1)!
    use_functions_api=False, #(2)!
    vecdb=None,
)
```

1. Whether to use Langroid's native tools mechanism.
2. Whether to use OpenAI's function-calling mechanism.

Next we define the Agent class itself, which we call `SpyGameAgent`, with a member variable to hold its "secret" list of numbers. We also add a `probe` method (to handle the `ProbeTool` message) to this class, and instantiate it:

```py
class SpyGameAgent(lr.ChatAgent):
    def __init__(self, config: lr.ChatAgentConfig):
        super().__init__(config)
        self.numbers = [3, 4, 8, 11, 15, 25, 40, 80, 90]

    def probe(self, msg: ProbeTool) -> str: #(1)!
        # return how many values in self.numbers are less than or equal to msg.number
        return str(len([n for n in self.numbers if n <= msg.number]))

spy_game_agent = SpyGameAgent(config)
```

1. Note that this method name exactly matches the value of the `request` field in the `ProbeTool` definition. This ensures that this method is called when the LLM generates a valid `ProbeTool` message.
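As an aside, here is a minimal sketch of the stateless-tool pattern described in the note above; the `SquareTool` name and its behavior are invented purely for illustration. Since the handler needs no agent state, it lives in the `ToolMessage` itself:

```py
class SquareTool(lr.agent.ToolMessage):
    request: str = "square"
    purpose: str = "To compute the square of a given <number>"
    number: int

    def handle(self) -> str:
        # No agent state needed: when this tool is enabled on an agent,
        # Langroid auto-inserts this handler as an agent method named `square`.
        return str(self.number * self.number)
```

A tool like this can be enabled on any `ChatAgent` via `enable_message(SquareTool)`, with no need to subclass the agent.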
## Enable the `spy_game_agent` to handle the `probe` tool

The final step in setting up the tool is to enable the `spy_game_agent` to handle the `probe` tool:

```py
spy_game_agent.enable_message(ProbeTool)
```

## Set up the task and instructions

We set up the task for the `spy_game_agent` and run it:

```py
task = lr.Task(
    spy_game_agent,
    system_message="""
        I have a list of numbers between 1 and 100.
        Your job is to find the smallest of them.
        To help with this, you can give me a number and I will
        tell you how many of my numbers are less than or equal to your number.
        Once you have found the smallest number,
        you can say DONE and report your answer.
    """
)
task.run()
```

Notice that in the task setup we have _not_ explicitly instructed the LLM to use the `probe` tool. But this is done "behind the scenes", either by the OpenAI API (when we use function-calling by setting the `use_functions_api` flag to `True`), or by Langroid's native tools mechanism (when we set the `use_tools` flag to `True`).

!!! note "Asynchronous tool handlers"
    If you run the task asynchronously -- i.e. via `await task.run_async()` -- you may provide an asynchronous tool handler by implementing a `probe_async` method.

See the [`chat-agent-tool.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent-tool.py) script in the `langroid-examples` repo for a working example, which you can run as follows:

```sh
python3 examples/quick-start/chat-agent-tool.py
```

Here is a screenshot of the chat in action, using Langroid's tools mechanism:

![chat-agent-tool.png](chat-agent-tool.png)

And if we run it with the `-f` flag (to switch to using OpenAI function-calling):

![chat-agent-fn.png](chat-agent-fn.png)

## See also

One of the uses of tools/function-calling is to **extract structured information** from a document. In the `langroid-examples` repo, there are two examples of this:

- [`examples/extract/chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/extract/chat.py), which shows how to extract Machine Learning model quality information from a description of a solution approach on Kaggle.
- [`examples/docqa/chat_multi_extract.py`](https://github.com/langroid/langroid-examples/blob/main/examples/docqa/chat_multi_extract.py), which extracts key terms from a commercial lease document, in a nested JSON format.

## Next steps

In the [3-agent chat example](three-agent-chat-num.md), recall that the `processor_agent` did not have to bother with specifying who should handle the current number. In the [next section](three-agent-chat-num-router.md) we add a twist to this game, so that the `processor_agent` has to decide who should handle the current number.

---

# A simple chat agent

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `chat-agent.py` script in the `langroid-examples` repo:
    [`examples/quick-start/chat-agent.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/chat-agent.py).

## Agents

A [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] is an abstraction that wraps a few components, including:

- an LLM (`ChatAgent.llm`), possibly equipped with tools/function-calling. The `ChatAgent` class maintains LLM conversation history.
- optionally a vector-database (`ChatAgent.vecdb`)

## Agents as message transformers

In Langroid, a core function of `ChatAgents` is _message transformation_. There are three special message transformation methods, which we call **responders**. Each of these takes a message and returns a message.
More specifically, their function signature is (simplified somewhat):

```py
str | ChatDocument -> ChatDocument
```

where `ChatDocument` is a class that wraps a message's content (text) and its metadata. There are three responder methods in `ChatAgent`, one corresponding to each [responding entity][langroid.mytypes.Entity] (`LLM`, `USER`, or `AGENT`):

- `llm_response`: returns the LLM response to the input message. (The input message is added to the LLM history, and so is the subsequent response.)
- `agent_response`: a method that can be used to implement a custom agent response. Typically, an `agent_response` is used to handle messages containing a "tool" or "function-calling" (more on this later). Another use of `agent_response` is _message validation_.
- `user_response`: get input from the user. Useful to allow a human user to intervene or quit.

Creating an agent is easy. First define a `ChatAgentConfig` object, and then instantiate a `ChatAgent` object with that config:

```py
import langroid as lr

config = lr.ChatAgentConfig( #(1)!
    name="MyAgent", # note there should be no spaces in the name!
    llm = lr.language_models.OpenAIGPTConfig(
        chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    ),
    system_message="You are a helpful assistant" #(2)!
)
agent = lr.ChatAgent(config)
```

1. This agent only has an LLM, and no vector-store. Examples of agents with vector-stores will be shown later.
2. The `system_message` is used when invoking the agent's `llm_response` method; it is passed to the LLM API as the first message (with role `"system"`), followed by the alternating series of user, assistant messages. Note that a `system_message` can also be specified when initializing a `Task` object (as seen below); in this case the `Task` `system_message` overrides the agent's `system_message`.

We can now use the agent's responder methods, for example:

```py
response = agent.llm_response("What is 2 + 4?")
if response is not None:
    print(response.content)
response = agent.user_response("add 3 to this")
...
```

The `ChatAgent` conveniently accumulates message history, so you don't have to do it yourself as you did in the [previous section](llm-interaction.md) with direct LLM usage. However, to create an interactive loop involving the human user, you still need to write your own loop. The `Task` abstraction frees you from this, as we see below.

## Task: orchestrator for agents

In order to do anything useful with a `ChatAgent`, we need a principled way to sequentially invoke its responder methods. For example, in the simple chat loop we saw in the [previous section](llm-interaction.md) (the [`try-llm.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/try-llm.py) script), we had a loop that alternated between getting human input and an LLM response. This is one of the simplest possible loops, but in more complex applications, we need a general way to orchestrate the agent's responder methods.

The [`Task`][langroid.agent.task.Task] class is an abstraction around a `ChatAgent`, responsible for iterating over the agent's responder methods, as well as orchestrating delegation and hand-offs among multiple tasks. A `Task` is initialized with a specific `ChatAgent` instance, and some optional arguments, including an initial message to "kick off" the agent. The `Task.run()` method is the main entry point for `Task` objects, and works as follows:

- it first calls the `Task.init()` method to initialize the `pending_message`, which represents the latest message that needs a response.
- it then repeatedly calls `Task.step()` until `Task.done()` is True, and returns `Task.result()` as the final result of the task.

`Task.step()` is where all the action happens. It represents a "turn" in the "conversation": in the case of a single `ChatAgent`, the conversation involves only the three responders mentioned above, but when a `Task` has sub-tasks, it can involve other tasks as well (we will see this in a [later section](two-agent-chat-num.md), but ignore this for now). `Task.step()` loops over the `ChatAgent`'s responders (plus sub-tasks if any) until it finds a _valid_ response[^1] to the current `pending_message`, i.e. a "meaningful" response, something other than `None` for example. Once `Task.step()` finds a valid response, it updates the `pending_message` with this response, and the next invocation of `Task.step()` will search for a valid response to this updated message, and so on. `Task.step()` incorporates mechanisms to ensure proper handling of messages, e.g. the USER gets a chance to respond after each non-USER response (to avoid infinite runs without human intervention), and an entity is prevented from responding if it has just responded.

[^1]: To customize a Task's behavior you can subclass it and override methods like `valid()`, `done()`, `result()`, or even `step()`.

!!! note "`Task.run()` has the same signature as the agent's responder methods."
    The key to composability of tasks is that `Task.run()` *has exactly the same type-signature as any of the agent's responder methods*, i.e. `str | ChatDocument -> ChatDocument`. This means that a `Task` can be used as a responder in another `Task`, and so on recursively. We will see this in action in the [Two Agent Chat section](two-agent-chat-num.md).

The above details were only provided to give you a glimpse into how Agents and Tasks work. Unless you are creating a custom orchestration mechanism, you do not need to be aware of these details. In fact, our basic human + LLM chat loop can be trivially implemented with a `Task`, in a couple of lines of code:

```py
task = lr.Task(
    agent,
    name="Bot", #(1)!
    system_message="You are a helpful assistant", #(2)!
)
```

1. If specified, overrides the agent's `name`. (Note that the agent's name is displayed in the conversation shown in the console.) However, typical practice is to just define the `name` in the `ChatAgentConfig` object, as we did above.
2. If specified, overrides the agent's `system_message`. Typical practice is to just define the `system_message` in the `ChatAgentConfig` object, as we did above.

We can then run the task:

```py
task.run() #(1)!
```

1. Note how this hides all of the complexity of constructing and updating a sequence of `LLMMessage`s.

Note that the agent's `agent_response()` method always returns `None` (since the default implementation of this method looks for a tool/function-call, and these never occur in this task). So the calls to `task.step()` result in alternating responses from the LLM and the user.

See [`chat-agent.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/chat-agent.py) for a working example that you can run with:

```sh
python3 examples/quick-start/chat-agent.py
```

Here is a screenshot of the chat in action:[^2]

![chat.png](chat.png)

## Next steps

In the [next section](multi-agent-task-delegation.md) you will learn some general principles on how to have multiple agents collaborate on a task using Langroid.
[^2]: In the screenshot, the numbers in parentheses indicate how many messages have accumulated in the LLM's message history. This is only provided for informational and debugging purposes, and you can ignore it for now. --- In these sections we show you how to use the various components of `langroid`. To follow along, we recommend you clone the [`langroid-examples`](https://github.com/langroid/langroid-examples) repo. !!! tip "Consult the tests as well" As you get deeper into Langroid, you will find it useful to consult the [tests](https://github.com/langroid/langroid/tree/main/tests/main) folder under `tests/main` in the main Langroid repo. Start with the [`Setup`](setup.md) section to install Langroid and get your environment set up. --- !!! tip "Script in `langroid-examples`" A full working example for the material in this section is in the `try-llm.py` script in the `langroid-examples` repo: [`examples/quick-start/try-llm.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/try-llm.py). Let's start with the basics -- how to directly interact with an OpenAI LLM using Langroid. ### Configure, instantiate the LLM class First define the configuration for the LLM, in this case one of the OpenAI GPT chat models: ```py import langroid as lr cfg = lr.language_models.OpenAIGPTConfig( chat_model=lr.language_models.OpenAIChatModel.GPT4o, ) ``` !!! info inline end "About Configs" A recurring pattern you will see in Langroid is that for many classes, we have a corresponding `Config` class (an instance of a Pydantic `BaseModel`), and the class constructor takes this `Config` class as its only argument. This lets us avoid having long argument lists in constructors, and brings flexibility since adding a new argument to the constructor is as simple as adding a new field to the corresponding `Config` class. For example the constructor for the `OpenAIGPT` class takes a single argument, an instance of the `OpenAIGPTConfig` class. Now that we've defined the configuration of the LLM, we can instantiate it: ```py mdl = lr.language_models.OpenAIGPT(cfg) ``` We will use OpenAI's GPT4 model's [chat completion API](https://platform.openai.com/docs/guides/gpt/chat-completions-api). ### Messages: The `LLMMessage` class This API takes a list of "messages" as input -- this is typically the conversation history so far, consisting of an initial system message, followed by a sequence of alternating messages from the LLM ("Assistant") and the user. Langroid provides an abstraction [`LLMMessage`][langroid.language_models.base.LLMMessage] to construct messages, e.g. ```py from langroid.language_models import Role, LLMMessage msg = LLMMessage( content="what is the capital of Bangladesh?", role=Role.USER ) ``` ### LLM response to a sequence of messages To get a response from the LLM, we call the mdl's `chat` method, and pass in a list of messages, along with a bound on how long (in tokens) we want the response to be: ```py messages = [ LLMMessage(content="You are a helpful assistant", role=Role.SYSTEM), #(1)! LLMMessage(content="What is the capital of Ontario?", role=Role.USER), #(2)! ] response = mdl.chat(messages, max_tokens=200) ``` 1. :man_raising_hand: With a system message, you can assign a "role" to the LLM 2. :man_raising_hand: Responses from the LLM will have role `Role.ASSISTANT`; this is done behind the scenes by the `response.to_LLMMessage()` call below. 
The response is an object of class [`LLMResponse`][langroid.language_models.base.LLMResponse], which we can convert to an [`LLMMessage`][langroid.language_models.base.LLMMessage] to append to the conversation history:

```py
messages.append(response.to_LLMMessage())
```

You can put the above in a loop to get a simple command-line chat interface:

```py
from rich import print
from rich.prompt import Prompt #(1)!

messages = [
    LLMMessage(role=Role.SYSTEM, content="You are a helpful assistant"),
]
while True:
    message = Prompt.ask("[blue]Human")
    if message in ["x", "q"]:
        print("[magenta]Bye!")
        break
    messages.append(LLMMessage(role=Role.USER, content=message))
    response = mdl.chat(messages=messages, max_tokens=200)
    messages.append(response.to_LLMMessage())
    print("[green]Bot: " + response.message)
```

1. Rich is a Python library for rich text and beautiful formatting in the terminal. We use it here to get a nice prompt for the user's input. You can install it with `pip install rich`.

See [`examples/quick-start/try-llm.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/try-llm.py) for a complete example that you can run using:

```bash
python3 examples/quick-start/try-llm.py
```

Here is a screenshot of what it looks like:

![try-llm.png](try-llm.png)

### Next steps

You might be thinking: "_It is tedious to keep track of the LLM conversation history and set up a loop. Does Langroid provide any abstractions to make this easier?_" We're glad you asked! And this leads to the notion of an `Agent`. The [next section](chat-agent.md) will show you how to use the `ChatAgent` class to set up a simple chat Agent in a couple of lines of code.

---

# Multi-Agent Collaboration via Task Delegation

## Why multiple agents?

Let's say we want to develop a complex LLM-based application, for example an application that reads a legal contract, extracts structured information, cross-checks it against some taxonomy, gets some human input, and produces clear summaries. In _theory_ it may be possible to solve this in a monolithic architecture using an LLM API and a vector-store. But this approach quickly runs into problems -- you would need to maintain multiple LLM conversation histories and states, multiple vector-store instances, and coordinate all of the interactions between them.

Langroid's `ChatAgent` and `Task` abstractions provide a natural and intuitive way to decompose a solution approach into multiple tasks, each requiring different skills and capabilities. Some of these tasks may need access to an LLM, others may need access to a vector-store, and yet others may need tools/plugins/function-calling capabilities, or any combination of these. It may also make sense to have some tasks that manage the overall solution process.

From an architectural perspective, this type of modularity has numerous benefits:

- **Reusability**: We can reuse the same agent/task in other contexts.
- **Scalability**: We can scale up the solution by adding more agents/tasks.
- **Flexibility**: We can easily change the solution by adding/removing agents/tasks.
- **Maintainability**: We can maintain the solution by updating individual agents/tasks.
- **Testability**: We can test/debug individual agents/tasks in isolation.
- **Composability**: We can compose agents/tasks to create new agents/tasks.
- **Extensibility**: We can extend the solution by adding new agents/tasks.
- **Interoperability**: We can integrate the solution with other systems by adding new agents/tasks.
- **Security/Privacy**: We can secure the solution by isolating sensitive agents/tasks.
- **Performance**: We can improve performance by isolating performance-critical agents/tasks.

## Task collaboration via sub-tasks

Langroid currently provides a mechanism for hierarchical (i.e. tree-structured) task delegation: a `Task` object can add other `Task` objects as sub-tasks, as shown in this pattern:

```py
from langroid import ChatAgent, ChatAgentConfig, Task

main_agent = ChatAgent(ChatAgentConfig(...))
main_task = Task(main_agent, ...)
helper_agent1 = ChatAgent(ChatAgentConfig(...))
helper_agent2 = ChatAgent(ChatAgentConfig(...))
helper_task1 = Task(helper_agent1, ...)
helper_task2 = Task(helper_agent2, ...)

main_task.add_sub_task([helper_task1, helper_task2])
```

What happens when we call `main_task.run()`? Recall from the [previous section](chat-agent.md) that `Task.run()` works by repeatedly calling `Task.step()` until `Task.done()` is True. When the `Task` object has no sub-tasks, `Task.step()` simply tries to get a valid response from the `Task`'s `ChatAgent`'s "native" responders, in this sequence:

```py
[self.agent_response, self.llm_response, self.user_response] #(1)!
```

1. This is the default sequence in Langroid, but it can be changed by overriding [`ChatAgent.entity_responders()`][langroid.agent.base.Agent.entity_responders].

When a `Task` object has sub-tasks, the sequence of responders tried by `Task.step()` consists of the above "native" responders, plus the sequence of `Task.run()` calls on the sub-tasks, in the order in which they were added to the `Task` object. For the example above, this means that `main_task.step()` will seek a valid response in this sequence:

```py
[self.agent_response, self.llm_response, self.user_response,
 helper_task1.run(), helper_task2.run()]
```

Fortunately, as noted in the [previous section](chat-agent.md), `Task.run()` has the same type signature as that of the `ChatAgent`'s "native" responders, so this works seamlessly. Of course, each of the sub-tasks can have its own sub-tasks, and so on, recursively. One way to think of this type of task delegation is that `main_task` "fails over" to `helper_task1` and `helper_task2` when it cannot respond to the current `pending_message` on its own.

## **Or Else** logic vs **And Then** logic

It is important to keep in mind how `step()` works: as each responder in the sequence is tried, when there is a valid response, the next call to `step()` _restarts its search_ at the beginning of the sequence (with the only exception being that the human User is given a chance to respond after each non-human response). In this sense, the semantics of the responder sequence is similar to **Or Else** logic, as opposed to **And Then** logic. If we want a sequence of sub-tasks that behaves more like **And Then** logic, we can achieve this by recursively adding sub-tasks. In the above example, suppose we wanted the `main_task` to trigger `helper_task1` and `helper_task2` in sequence; then we could set it up like this:

```py
helper_task1.add_sub_task(helper_task2) #(1)!
main_task.add_sub_task(helper_task1)
```

1. When adding a single sub-task, we do not need to wrap it in a list.

## Next steps

In the [next section](two-agent-chat-num.md) we will see how this mechanism can be used to set up a simple collaboration between two agents.

---

# Setup

## Install

Ensure you are using Python 3.11. It is best to work in a virtual environment:

```bash
# go to your repo root (which may be langroid-examples)
cd <your-repo-root>
python3 -m venv .venv
. ./.venv/bin/activate
```

To see how to use Langroid in your own repo, you can take a look at the [`langroid-examples`](https://github.com/langroid/langroid-examples) repo as a starting point, or use the [`langroid-template`](https://github.com/langroid/langroid-template) repo. These repos contain a `pyproject.toml` file suitable for use with the [`uv`](https://docs.astral.sh/uv/) dependency manager. After installing `uv`, you can set up your virtual env, activate it, and install Langroid into your venv like this:

```bash
uv venv --python 3.11
. ./.venv/bin/activate
uv sync
```

Alternatively, use `pip` to install `langroid` into your virtual environment:

```bash
pip install langroid
```

The core Langroid package lets you use OpenAI embedding models via their API. If you instead want to use the `sentence-transformers` embedding models from HuggingFace, install Langroid like this:

```bash
pip install "langroid[hf-embeddings]"
```

For many practical scenarios, you may need additional optional dependencies:

- To use various document-parsers, install langroid with the `doc-chat` extra:
```bash
pip install "langroid[doc-chat]"
```
- For "chat with databases", use the `db` extra:
```bash
pip install "langroid[db]"
```
- You can specify multiple extras by separating them with commas, e.g.:
```bash
pip install "langroid[doc-chat,db]"
```
- To simply install _all_ optional dependencies, use the `all` extra (but note that this will result in longer load/startup times and a larger install size):
```bash
pip install "langroid[all]"
```

??? note "Optional Installs for using SQL Chat with a PostgreSQL DB"
    If you are using `SQLChatAgent` (e.g. the script [`examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid/blob/main/examples/data-qa/sql-chat/sql_chat.py)) with a Postgres DB, you will need to:

    - Install PostgreSQL dev libraries for your platform, e.g.
        - `sudo apt-get install libpq-dev` on Ubuntu,
        - `brew install postgresql` on Mac, etc.
    - Install langroid with the `postgres` extra, e.g. `pip install langroid[postgres]` or `uv add "langroid[postgres]"` or `uv pip install --extra postgres -r pyproject.toml`. If this gives you an error, try `uv pip install psycopg2-binary` in your virtualenv.

!!! tip "Work in a nice terminal, such as iTerm2, rather than a notebook"
    All of the examples we will go through are command-line applications. For the best experience we recommend you work in a nice terminal that supports colored output, such as [iTerm2](https://iterm2.com/).

!!! note "mysqlclient errors"
    If you get strange errors involving `mysqlclient`, try doing `pip uninstall mysqlclient` followed by `pip install mysqlclient`.

## Set up tokens/keys

To get started, all you need is an OpenAI API Key. If you don't have one, see [this OpenAI Page](https://platform.openai.com/docs/quickstart). (Note that while this is the simplest way to get started, Langroid works with practically any LLM, not just those from OpenAI. See the guides to using [Open/Local LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/), and other [non-OpenAI](https://langroid.github.io/langroid/tutorials/non-openai-llms/) proprietary LLMs.)

In the root of the repo, copy the `.env-template` file to a new file `.env`:

```bash
cp .env-template .env
```

Then insert your OpenAI API Key.
Your `.env` file should look like this: ```bash OPENAI_API_KEY=your-key-here-without-quotes ``` Alternatively, you can set this as an environment variable in your shell (you will need to do this every time you open a new shell): ```bash export OPENAI_API_KEY=your-key-here-without-quotes ``` All of the following environment variable settings are optional, and some are only needed to use specific features (as noted below). - **Qdrant** Vector Store API Key, URL. This is only required if you want to use Qdrant cloud. Langroid uses LanceDB as the default vector store in its `DocChatAgent` class (for RAG). Alternatively [Chroma](https://docs.trychroma.com/) is also currently supported. We use the local-storage version of Chroma, so there is no need for an API key. - **Redis** Password, host, port: This is optional, and only needed to cache LLM API responses using Redis Cloud. Redis [offers](https://redis.com/try-free/) a free 30MB Redis account which is more than sufficient to try out Langroid and even beyond. If you don't set up these, Langroid will use a pure-python Redis in-memory cache via the [Fakeredis](https://fakeredis.readthedocs.io/en/latest/) library. - **GitHub** Personal Access Token (required for apps that need to analyze git repos; token-based API calls are less rate-limited). See this [GitHub page](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens). - **Google Custom Search API Credentials:** Only needed to enable an Agent to use the `GoogleSearchTool`. To use Google Search as an LLM Tool/Plugin/function-call, you'll need to set up [a Google API key](https://developers.google.com/custom-search/v1/introduction#identify_your_application_to_google_with_api_key), then [setup a Google Custom Search Engine (CSE) and get the CSE ID](https://developers.google.com/custom-search/docs/tutorial/creatingcse). (Documentation for these can be challenging, we suggest asking GPT4 for a step-by-step guide.) After obtaining these credentials, store them as values of `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` in your `.env` file. Full documentation on using this (and other such "stateless" tools) is coming soon, but in the meantime take a peek at the test [`tests/main/test_web_search_tools.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_web_search_tools.py) to see how to use it. If you add all of these optional variables, your `.env` file should look like this: ```bash OPENAI_API_KEY=your-key-here-without-quotes GITHUB_ACCESS_TOKEN=your-personal-access-token-no-quotes CACHE_TYPE=redis REDIS_PASSWORD=your-redis-password-no-quotes REDIS_HOST=your-redis-hostname-no-quotes REDIS_PORT=your-redis-port-no-quotes QDRANT_API_KEY=your-key QDRANT_API_URL=https://your.url.here:6333 # note port number must be included GOOGLE_API_KEY=your-key GOOGLE_CSE_ID=your-cse-id ``` ### Microsoft Azure OpenAI setup[Optional] This section applies only if you are using Microsoft Azure OpenAI. When using Azure OpenAI, additional environment variables are required in the `.env` file. This page [Microsoft Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-services/openai/chatgpt-quickstart?tabs=command-line&pivots=programming-language-python#environment-variables) provides more information, and you can set each environment variable as follows: - `AZURE_OPENAI_API_KEY`, from the value of `API_KEY` - `AZURE_OPENAI_API_BASE` from the value of `ENDPOINT`, typically looks like `https://your_resource.openai.azure.com`. 
- For `AZURE_OPENAI_API_VERSION`, you can use the default value in `.env-template`; the latest version can be found [here](https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new#azure-openai-chat-completion-general-availability-ga).
- `AZURE_OPENAI_DEPLOYMENT_NAME` is an OPTIONAL deployment name, which may be defined by the user during model setup.
- `AZURE_OPENAI_CHAT_MODEL`: Azure OpenAI allows specific model names when you select the model for your deployment. You need to use exactly the model name that was selected. For example, GPT-3.5 (should be `gpt-35-turbo-16k` or `gpt-35-turbo`) or GPT-4 (should be `gpt-4-32k` or `gpt-4`).
- `AZURE_OPENAI_MODEL_NAME` (Deprecated, use `AZURE_OPENAI_CHAT_MODEL` instead).

!!! note "For Azure-based models use `AzureConfig` instead of `OpenAIGPTConfig`"
    In most of the docs you will see that LLMs are configured using `OpenAIGPTConfig`. However, if you want to use Azure-deployed models, you should replace `OpenAIGPTConfig` with `AzureConfig`. See [`test_azure_openai.py`](https://github.com/langroid/langroid/blob/main/tests/main/test_azure_openai.py) and [`example/basic/chat.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat.py) for examples.

## Next steps

Now you should be ready to use Langroid! As a next step, you may want to see how you can use Langroid to [interact directly with the LLM](llm-interaction.md) (OpenAI GPT models only for now).

---

# Three-Agent Collaboration, with Message Routing

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `three-agent-chat-num-router.py` script in the `langroid-examples` repo:
    [`examples/quick-start/three-agent-chat-num-router.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num-router.py).

Let's change the number game from the [three agent chat example](three-agent-chat-num.md) slightly. In that example, when the `even_agent`'s LLM receives an odd number, it responds with `DO-NOT-KNOW`, and similarly for the `odd_agent` when it receives an even number. The `step()` method of the `processor_task` considers `DO-NOT-KNOW` to be an _invalid_ response and _continues_ to look for a valid response from any remaining sub-tasks. Thus there was no need for the `processor_agent` to specify who should handle the current number.

But what if there is a scenario where the `even_agent` and `odd_agent` might return a legitimate-looking but "wrong" answer? In this section we add this twist -- when the `even_agent` receives an odd number, it responds with -10, and similarly for the `odd_agent` when it receives an even number. We tell the `processor_agent` to avoid getting a negative number.

The goal we have set for the `processor_agent` implies that it must specify the intended recipient of the number it is sending. We can enforce this using a special Langroid Tool, [`RecipientTool`][langroid.agent.tools.recipient_tool.RecipientTool]. So when setting up the `processor_task` we include instructions to use this tool (whose name is `recipient_message`, the value of `RecipientTool.request`):

```py
processor_agent = lr.ChatAgent(config)
processor_task = lr.Task(
    processor_agent,
    name = "Processor",
    system_message="""
        You will receive a list of numbers from me (the user).
        Your goal is to apply a transformation to each number.
        However you do not know how to do this transformation.
        You can take the help of two people to perform the transformation.
        If the number is even, send it to EvenHandler,
        and if it is odd, send it to OddHandler.
        IMPORTANT: send the numbers ONE AT A TIME.
        The handlers will transform the number and give you a new number.
        If you send it to the wrong person, you will receive a negative value.
        Your aim is to never get a negative number, so you must clearly
        specify who you are sending the number to, using the
        `recipient_message` tool/function-call, where the `content` field
        is the number you want to send, and the `recipient` field is the name
        of the intended recipient, either "EvenHandler" or "OddHandler".
        Once all numbers in the given list have been transformed,
        say DONE and show me the result.
        Start by asking me for the list of numbers.
    """,
    llm_delegate=True,
    single_round=False,
)
```

We must also enable the `processor_agent` to use this tool:

```py
processor_agent.enable_message(lr.agent.tools.RecipientTool)
```

The rest of the code remains the same as in the [previous section](three-agent-chat-num.md), i.e., we simply add the two handler tasks as sub-tasks of the `processor_task`, like this:

```python
processor_task.add_sub_task([even_task, odd_task])
```

One of the benefits of using the `RecipientTool` is that it contains mechanisms to remind the LLM to specify a recipient for its message, when it forgets to do so (this does happen once in a while, even with GPT-4).

Feel free to try the working example script `three-agent-chat-num-router.py` in the `langroid-examples` repo:
[`examples/quick-start/three-agent-chat-num-router.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num-router.py):

```bash
python3 examples/quick-start/three-agent-chat-num-router.py
```

Below is a screenshot of what this might look like, using the OpenAI function-calling mechanism with the `recipient_message` tool:

![three-agent-router-func.png](three-agent-router-func.png)

And here is what it looks like using Langroid's built-in tools mechanism (use the `-t` option when running the script):

![three-agent-router.png](three-agent-router.png)

## Next steps

In the [next section](chat-agent-docs.md) you will learn how to use Langroid with external documents.

---

# Three-Agent Collaboration

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `three-agent-chat-num.py` script in the `langroid-examples` repo:
    [`examples/quick-start/three-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num.py).

Let us set up a simple numbers exercise between 3 agents. The `Processor` agent receives a number $n$, and its goal is to apply a transformation to it. However, it does not know how to apply the transformation, and takes the help of two other agents to do so. Given a number $n$,

- The `EvenHandler` returns $n/2$ if $n$ is even, otherwise says `DO-NOT-KNOW`.
- The `OddHandler` returns $3n+1$ if $n$ is odd, otherwise says `DO-NOT-KNOW`.

We'll first define a shared LLM config:

```py
llm_config = lr.language_models.OpenAIGPTConfig(
    chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    # or, e.g., "ollama/qwen2.5-coder:latest", or "gemini/gemini-2.0-flash-exp"
)
```

Next define the config for the `Processor` agent:

```py
processor_config = lr.ChatAgentConfig(
    name="Processor",
    llm = llm_config,
    system_message="""
        You will receive a number from the user.
        Simply repeat that number, DO NOT SAY ANYTHING else,
        and wait for a TRANSFORMATION of the number to be returned to you.
        Once you have received the RESULT, simply say "DONE",
        do not say anything else.
    """,
    vecdb=None,
)
```

Then set up the `processor_agent`, along with the corresponding task:

```py
processor_agent = lr.ChatAgent(processor_config)
processor_task = lr.Task(
    processor_agent,
    llm_delegate=True, #(1)!
    interactive=False, #(2)!
    single_round=False, #(3)!
)
```

1. Setting the `llm_delegate` option to `True` means that the `processor_task` is delegated to the LLM (as opposed to the User), in the sense that the LLM is the one "seeking" a response to the latest number. Specifically, this means that in `processor_task.step()`, when a sub-task returns `DO-NOT-KNOW`, it is _not_ considered a valid response, and the search for a valid response continues to the next sub-task, if any.
2. `interactive=False` means the task loop will not wait for user input.
3. `single_round=False` means that the `processor_task` should _not_ terminate after a valid response from a responder.

Set up the other two agents and tasks:

```py
NO_ANSWER = lr.utils.constants.NO_ANSWER

even_config = lr.ChatAgentConfig(
    name="EvenHandler",
    llm = llm_config,
    system_message=f"""
        You will be given a number N. Respond as follows:
        - If N is even, divide N by 2 and show the result,
          in the format: RESULT = <result>, and say NOTHING ELSE.
        - If N is odd, say {NO_ANSWER}
    """,
)
even_agent = lr.ChatAgent(even_config)
even_task = lr.Task(
    even_agent,
    single_round=True,  # task done after 1 step() with valid response
)

odd_config = lr.ChatAgentConfig(
    name="OddHandler",
    llm = llm_config,
    system_message=f"""
        You will be given a number N. Respond as follows:
        - If N is odd, return the result (N*3+1),
          in the format: RESULT = <result>, and say NOTHING ELSE.
        - If N is even, say {NO_ANSWER}
    """,
)
odd_agent = lr.ChatAgent(odd_config)
odd_task = lr.Task(
    odd_agent,
    single_round=True,  # task done after 1 step() with valid response
)
```

Now add the `even_task` and `odd_task` as subtasks of the `processor_task`, and then run it with a number as input:

```python
processor_task.add_sub_task([even_task, odd_task])
processor_task.run(13)
```

The input number will be passed to the `Processor` agent as the user input.

Feel free to try the working example script `three-agent-chat-num.py` in the `langroid-examples` repo:
[`examples/quick-start/three-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/three-agent-chat-num.py):

```bash
python3 examples/quick-start/three-agent-chat-num.py
```

Here's a screenshot of what it looks like:

![three-agent-num.png](three-agent-num.png)

## Next steps

In the [next section](chat-agent-tool.md) you will learn how to use Langroid to equip a `ChatAgent` with tools or function-calling.

---

# Two-Agent Collaboration

!!! tip "Script in `langroid-examples`"
    A full working example for the material in this section is in the `two-agent-chat-num.py` script in the `langroid-examples` repo:
    [`examples/quick-start/two-agent-chat-num.py`](https://github.com/langroid/langroid-examples/tree/main/examples/quick-start/two-agent-chat-num.py).

To illustrate these ideas, let's look at a toy example[^1] where a `Student` agent receives a list of numbers to add. We set up this agent with an instruction that it does not know how to add, and that it can ask for help adding pairs of numbers. To add pairs of numbers, we set up an `Adder` agent.
[^1]: Toy numerical examples are perfect to illustrate the ideas without incurring too much token cost from LLM API calls.

First define a common `llm_config` to use for both agents:

```python
llm_config = lr.language_models.OpenAIGPTConfig(
    chat_model=lr.language_models.OpenAIChatModel.GPT4o,
    # or, e.g., "ollama/qwen2.5-coder:latest", or "gemini/gemini-2.0-flash-exp"
)
```

Next, set up a config for the student agent, then create the agent and the corresponding task:

```py
student_config = lr.ChatAgentConfig(
    name="Student",
    llm=llm_config,
    vecdb=None, #(1)!
    system_message="""
        You will receive a list of numbers from me (the User),
        and your goal is to calculate their sum.
        However you do not know how to add numbers.
        I can help you add numbers, two at a time, since
        I only know how to add pairs of numbers.
        Send me a pair of numbers to add, one pair at a time,
        and I will tell you their sum.
        For each question, simply ask me the sum in math notation,
        e.g., say "1 + 2", etc., and say nothing else.
        Once you have added all the numbers in the list, say DONE
        and give me the final sum.
        Start by asking me for the list of numbers.
    """,
)
student_agent = lr.ChatAgent(student_config)
student_task = lr.Task(
    student_agent,
    name = "Student",
    llm_delegate = True, #(2)!
    single_round=False,  # (3)!
)
```

1. We don't need access to external docs, so we set `vecdb=None` to avoid the overhead of loading a vector-store.
2. Whenever we "flip roles" and assign the LLM the role of generating questions, we set `llm_delegate=True`. In effect this ensures that the LLM "decides" when the task is done.
3. This setting means the task is not a single-round task, i.e. it is _not_ done after one `step()` with a valid response.

Next, set up the Adder agent config, create the Adder agent and the corresponding Task:

```py
adder_config = lr.ChatAgentConfig(
    name = "Adder", #(1)!
    llm=llm_config,
    vecdb=None,
    system_message="""
        You are an expert on addition of numbers.
        When given numbers to add, simply return their sum, say nothing else.
    """,
)
adder_agent = lr.ChatAgent(adder_config)
adder_task = lr.Task(
    adder_agent,
    interactive=False, #(2)!
    single_round=True,  # task done after 1 step() with valid response (3)!
)
```

1. The Agent name is displayed in the conversation shown in the console.
2. Does not wait for user input.
3. We set `single_round=True` to ensure that the expert task is done after one `step()` with a valid response.

Finally, we add the `adder_task` as a sub-task of the `student_task`, and run the `student_task`:

```py
student_task.add_sub_task(adder_task) #(1)!
student_task.run()
```

1. When adding just one sub-task, we don't need to use a list.

For a full working example, see the [`two-agent-chat-num.py`](https://github.com/langroid/langroid-examples/blob/main/examples/quick-start/two-agent-chat-num.py) script in the `langroid-examples` repo. You can run this using:

```bash
python3 examples/quick-start/two-agent-chat-num.py
```

Here is an example of the conversation that results:

![two-agent-num.png](two-agent-num.png)

## Logs of multi-agent interactions

!!! note "For advanced users"
    This section is for advanced users who want more visibility into the internals of multi-agent interactions.

When running a multi-agent chat, e.g. using `task.run()`, two types of logs are generated:

- plain-text logs in `logs/<task-name>.log`
- tsv logs in `logs/<task-name>.tsv`

It is important to realize that the logs show _every iteration of the loop in `Task.step()`, i.e.
every **attempt** at responding to the current pending message, even those that are not allowed_. The ones marked with an asterisk (*) are the ones that are considered valid responses for a given `step()` (which is a "turn" in the conversation).

The plain-text logs have ANSI color-coding to make them easier to read by doing `less <log-file>`. The format is (subject to change):

```
(TaskName) Responder SenderEntity (EntityName) (=> Recipient) TOOL Content
```

The structure of the `tsv` logs is similar. A great way to view these is to install and use the excellent [`visidata`](https://www.visidata.org/) tool:

```bash
vd logs/<task-name>.tsv
```

## Next steps

As a next step, look at how to set up a collaboration among three agents for a simple [numbers game](three-agent-chat-num.md).

---

# A quick tour of Langroid

This is a quick tour of some Langroid features. For a more detailed guide, see the [Getting Started guide](https://langroid.github.io/langroid/quick-start/). There are many more features besides the ones shown here. To explore Langroid further, see the sections of the main [docs](https://langroid.github.io/langroid/), and a [Colab notebook](https://colab.research.google.com/github/langroid/langroid/blob/main/examples/Langroid_quick_start.ipynb) you can try yourself.

## Chat directly with LLM

Imports:

```python
import langroid as lr
import langroid.language_models as lm
```

Set up the LLM; note how you can specify the chat model -- if omitted, it defaults to OpenAI `GPT4o`. See the guide to using Langroid with [local/open LLMs](https://langroid.github.io/langroid/tutorials/local-llm-setup/), and with [non-OpenAI LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/).

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="gpt-5-mini"
)
llm = lm.OpenAIGPT(llm_config)
```

Chat with the bare LLM -- no chat accumulation, i.e. follow-up responses will *not* be aware of prior conversation history (you need an Agent for that, see below).

```python
llm.chat("1 2 4 7 11 ?")  # ==> answers 16, with some explanation
```

## Agent

Make a [`ChatAgent`][langroid.agent.chat_agent.ChatAgent] and chat with it; it now accumulates conversation history:

```python
agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
agent.llm_response("Find the next number: 1 2 4 7 11 ?")  # => responds 16
agent.llm_response("and then?")  # => answers 22
```

## Task

Make a [`Task`][langroid.agent.task.Task] and create a chat loop with the user:

```python
task = lr.Task(agent, interactive=True)
task.run()
```

## Tools/Functions/Structured outputs

Define a [`ToolMessage`][langroid.agent.tool_message.ToolMessage] using Pydantic (v1) -- this gets transpiled into system-message instructions to the LLM, so you never have to deal with writing a JSON schema. (Besides JSON-based tools, Langroid also supports [XML-based tools](https://langroid.github.io/langroid/notes/xml-tools/), which are far more reliable when having the LLM return code in a structured output.)

```python
from pydantic import BaseModel

class CityTemperature(BaseModel):
    city: str
    temp: float

class WeatherTool(lr.ToolMessage):
    request: str = "weather_tool" #(1)!
    purpose: str = "To extract info from text" #(2)!
    city_temp: CityTemperature

    # tool handler
    def handle(self) -> CityTemperature:
        return self.city_temp
```

1. When this tool is enabled for an agent, a method named `weather_tool` gets auto-inserted in the agent class, with its body being the `handle` method -- this method handles the LLM's generation of this tool.
2.
The value of the `purpose` field is used to populate the system message to the LLM, along with the Tool's schema derived from its Pydantic-based definition. Enable the Agent to use the `ToolMessage`, and set a system message describing the agent's task: ```python agent.enable_message(WeatherTool) agent.config.system_message = """ Your job is to extract city and temperature info from user input and return it using the `weather_tool`. """ ``` Create specialized task that returns a `CityTemperature` object: ```python # configure task to terminate after (a) LLM emits a tool, (b) tool is handled by Agent task_config = lr.TaskConfig(done_sequences=["T,A"]) # create a task that returns a CityTemperature object task = lr.Task(agent, interactive=False, config=task_config)[CityTemperature] # run task, with built-in tool-handling loop data = task.run("It is 45 degrees F in Boston") assert data.city == "Boston" assert int(data.temp) == 45 ``` ## Chat with a document (RAG) Create a [`DocChatAgent`][langroid.agent.special.doc_chat_agent.DocChatAgent]. ```python doc_agent_config = lr.agent.special.DocChatAgentConfig(llm=llm_config) doc_agent = lr.agent.special.DocChatAgent(doc_agent_config) ``` Ingest the contents of a web page into the agent (this involves chunking, indexing into a vector-database, etc.): ```python doc_agent.ingest_doc_paths("https://en.wikipedia.org/wiki/Ludwig_van_Beethoven") ``` Ask a question: ``` result = doc_agent.llm_response("When did Beethoven move from Bonn to Vienna?") ``` You should see the streamed response with citations like this: ![langroid-tour-beethoven.png](langroid-tour-beethoven.png) ## Two-agent interaction Set up a teacher agent: ```python from langroid.agent.tools.orchestration import DoneTool teacher = lr.ChatAgent( lr.ChatAgentConfig( llm=llm_config, system_message=f""" Ask a numbers-based question, and your student will answer. You can then provide feedback or hints to the student to help them arrive at the right answer. Once you receive the right answer, use the `{DoneTool.name()}` tool to end the session. """ ) ) teacher.enable_message(DoneTool) teacher_task = lr.Task(teacher, interactive=False) ``` Set up a student agent: ```python student = lr.ChatAgent( lr.ChatAgentConfig( llm=llm_config, system_message=f""" You will receive a numbers-related question. Answer to the best of your ability. If your answer is wrong, you will receive feedback or hints, and you can revise your answer, and repeat this process until you get the right answer. """ ) ) student_task = lr.Task(student, interactive=False, single_round=True) ``` Make the `student_task` a subtask of the `teacher_task`: ```python teacher_task.add_sub_task(student_task) ``` Run the teacher task: ```python teacher_task.run() ``` You should then see this type of interaction: ![langroid-tour-teacher.png](langroid-tour-teacher.png) --- # Options for accessing LLMs > This is a work-in-progress document. It will be updated frequently. The variety of ways to access the power of Large Language Models (LLMs) is growing rapidly, and there are a bewildering array of options. This document is an attempt to categorize and describe some of the most popular and useful ways to access LLMs, via these 2x2x2 combinations: - Websites (non-programmatic) or APIs (programmatic) - Open-source or Proprietary - Chat-based interface or integrated assistive tools. We will go into some of these combinations below. More will be added over time. 
## Chat-based Web (non-API) access to Proprietary LLMs This is best for *non-programmatic* use of LLMs: you go to a website and interact with the LLM via a chat interface -- you write prompts and/or upload documents, and the LLM responds with plain text or can create artifacts (e.g. reports, code, charts, podcasts, etc) that you can then copy into your files, workflow or codebase. They typically allow you to upload text-based documents of various types, and some let you upload images, screen-shots, etc and ask questions about them. Most of them are capable of doing *internet search* to inform their responses. !!! note "Chat Interface vs Integrated Tools" Note that when using a chat-based interaction, you have to copy various artifacts from the web-site into another place, like your code editor, document, etc. AI-integrated tools relieve you of this burden by bringing the LLM power into your workflow directly. More on this in a later section. **Pre-requisites:** - *Computer*: Besides having a modern web browser (Chrome, Firefox, etc) and internet access, there are no other special requirements, since the LLM is running on a remote server. - *Coding knowledge*: Where (typically Python) code is produced, you will get best results if you are conversant with Python so that you can understand and modify the code as needed. In this category you do not need to know how to interact with an LLM API via code. Here are some popular options in this category: ### OpenAI ChatGPT Free access at [https://chatgpt.com/](https://chatgpt.com/) With a ChatGPT-Plus monthly subscription ($20/month), you get additional features like: - access to more powerful models - access to [OpenAI canvas](https://help.openai.com/en/articles/9930697-what-is-the-canvas-feature-in-chatgpt-and-how-do-i-use-it) - this offers a richer interface than just a chat window, e.g. it automatically creates windows for code snippets, and shows results of running code (e.g. output, charts etc). Typical use: Since there is fixed monthly subscription (i.e. not metered by amount of usage), this is a cost-effective way to non-programmatically access a top LLM such as `GPT-4o` or `o1` (so-called "reasoning/thinking" models). Note however that there are limits on how many queries you can make within a certain time period, but usually the limit is fairly generous. What you can create, besides text-based artifacts: - produce Python (or other language) code which you can copy/paste into notebooks or files - SQL queries that you can copy/paste into a database tool - Markdown-based tables - You can't get diagrams, but you can get *code for diagrams*, e.g. python code for plots, [mermaid](https://github.com/mermaid-js/mermaid) code for flowcharts. - images in some cases. ### OpenAI Custom GPTs (simply known as "GPTs") [https://chatgpt.com/gpts/editor](https://chatgpt.com/gpts/editor) Here you can conversationally interact with a "GPT Builder" that will create a version of ChatGPT that is *customized* to your needs, i.e. with necessary background instructions, context, and/or documents. The end result is a specialized GPT that you can then use for your specific purpose and share with others (all of this is non-programmatic). E.g. [here](https://chatgpt.com/share/67153a4f-ea2c-8003-a6d3-cbc2412d78e5) is a "Knowledge Graph Builder" GPT !!! note "Private GPTs requires an OpenAI Team Account" To share a custom GPT within a private group, you need an OpenAI Team account, see pricing [here](https://openai.com/chatgpt/pricing). 
Without a Team account, any shared GPT is public and can be accessed by anyone.

### Anthropic/Claude

[https://claude.ai](https://claude.ai)

The Claude basic web-based interface is similar to OpenAI ChatGPT, powered by Anthropic's proprietary LLMs.
Anthropic's equivalent of ChatGPT-Plus is called "Claude Pro", which is also a $20/month subscription,
giving you access to advanced models (e.g. `Claude-3.5-Sonnet`) and features.
Anthropic's equivalent of Custom GPTs is called [Projects](https://www.anthropic.com/news/projects),
where you can create an LLM-powered interface that is augmented with your custom context and data.

Whichever product you are using, the interface auto-creates **artifacts** as needed -- these are stand-alone
documents (code, text, images, web-pages, etc) that you may want to copy and paste into your own codebase,
documents, etc. For example, you can prompt Claude to create full working interactive applications,
and copy the code, polish it and deploy it for others to use.
See examples [here](https://simonwillison.net/2024/Oct/21/claude-artifacts/).

### Microsoft Copilot Lab

!!! note
    Microsoft's "Copilot" is an overloaded term that can refer to many different AI-powered tools.
    Here we are referring to the one that is a collaboration between Microsoft and OpenAI,
    is based on OpenAI's GPT-4o LLM, and is powered by Bing's search engine.

Accessible via [https://copilot.cloud.microsoft.com/](https://copilot.cloud.microsoft.com/)

The basic capabilities are similar to OpenAI's and Anthropic's offerings, but come with so-called
"enterprise grade" security and privacy features, which purportedly make it suitable for use in educational
and corporate settings. Read more on what you can do with Copilot Lab
[here](https://www.microsoft.com/en-us/microsoft-copilot/learn/?form=MA13FV).

Like the other proprietary offerings, Copilot can:

- perform internet search to inform its responses
- generate/run code and show results including charts

### Google Gemini

Accessible at [gemini.google.com](https://gemini.google.com).

## AI-powered productivity tools

These tools "bring the AI to your workflow", which is a massive productivity boost, compared to repeatedly
context-switching, e.g. copying/pasting between a chat-based AI web-app and your workflow.

- [**Cursor**](https://www.cursor.com/): AI Editor/Integrated Dev Environment (IDE). This is a fork of VSCode.
- [**Zed**](https://zed.dev/): built in Rust; can be customized to use JetBrains/PyCharm keyboard shortcuts.
- [**Google Colab Notebooks with Gemini**](https://colab.research.google.com).
- [**Google NotebookLM**](https://notebooklm.google.com/): allows you to upload a set of text-based documents,
  and create artifacts such as a study guide, FAQ, summary, podcasts, etc.

## APIs for Proprietary LLMs

Using an API key allows *programmatic* access to the LLMs, meaning you can make invocations to the LLM
from within your own code, and receive back the results. This is useful for building applications involving
more complex workflows where LLMs are used within a larger codebase, to access "intelligence" as needed.

E.g. suppose you are writing code that handles queries from a user, and you want to classify the user's
_intent_ into one of 3 types: Information, Action, or Done. Pre-LLMs, you would have had to write a bunch
of rules or train a custom "intent classifier" that maps, for example:

- "What is the weather in Pittsburgh?" -> Information
- "Set a timer for 10 minutes" -> Action
- "Ok I have no more questions" -> Done

But using an LLM API, this is almost trivially easy -- you instruct the LLM to classify the intent into one
of these 3 types, send it the user query, and receive back the intent.
(You can use Tools to make this robust, but that is outside the scope of this document.)
The most popular proprietary LLMs available via API are from OpenAI (or via its partner Microsoft),
Anthropic, and Google:

- [OpenAI](https://platform.openai.com/docs/api-reference/introduction), to interact with the `GPT-4o` family
  of models, and the `o1` family of "thinking/reasoning" models.
- [Anthropic](https://docs.anthropic.com/en/home) to use the `Claude` series of models.
- [Google](https://ai.google.dev/gemini-api/docs) to use the `Gemini` family of models.

These LLM providers are home to some of the most powerful LLMs available today, specifically OpenAI's
`GPT-4o`, Anthropic's `Claude-3.5-Sonnet`, and Google's `Gemini 1.5 Pro` (as of Oct 2024).

**Billing:** Unlike the fixed monthly subscriptions of ChatGPT, Claude and others, LLM usage via API is
typically billed by *token usage*, i.e. you pay for the total number of input and output "tokens"
(a slightly technical term, but think of it as a word for now).

Using an LLM API involves these steps:

- create an account on the provider's website as a "developer" or organization,
- get an API key,
- use the API key in your code to make requests to the LLM.

**Prerequisites**:

- *Computer:* again, since the API is served over the internet, there are no special requirements for your computer.
- *Programming skills:* Using an LLM API involves either:
    - directly making REST API calls from your code, or
    - using a scaffolding library (like [Langroid](https://github.com/langroid/langroid)) that abstracts away
      the details of the API calls.

  In either case, you must be highly proficient in (Python) programming to use this option.

## Web-interfaces to Open LLMs

!!! note "Open LLMs"
    These are LLMs that have been publicly released, i.e. their parameters ("weights") are publicly
    available -- we refer to these as *open-weight* LLMs. If, in addition, the training datasets and
    data-preprocessing and training code are also available, we would call these *open-source* LLMs.
    But lately there is a looser usage of the term "open-source", referring to just the weights being
    available. For our purposes we will just refer to all of these models as **Open LLMs**.

There are many options here, but some popular ones are below. Note that some of these are front-ends that
allow you to interact with not only Open LLMs but also proprietary LLM APIs.

- [LMStudio](https://lmstudio.ai/)
- [OpenWebUI](https://github.com/open-webui/open-webui)
- [Msty](https://msty.app/)
- [AnythingLLM](https://anythingllm.com/)
- [LibreChat](https://www.librechat.ai/)

## API Access to Open LLMs

This is a good option if you are fairly proficient in (Python) coding. There are in fact two possibilities here:

- The LLM is hosted remotely, and you make REST API calls to the remote server.
  This is a good option when you want to run large LLMs and you don't have the resources (GPU and memory)
  to run them locally.
    - [groq](https://groq.com/): amazingly, it is free, and you can run `llama-3.1-70b`
    - [cerebras](https://cerebras.ai/)
    - [open-router](https://openrouter.ai/)
- The LLM is running on your computer.
  This is a good option if your machine has sufficient RAM to accommodate the LLM you are trying to run,
  and if you are concerned about data privacy. The most user-friendly option is
  [Ollama](https://github.com/ollama/ollama); see more below.

Note that all of the above options provide an **OpenAI-Compatible API** to interact with the LLM,
which is a huge convenience: you can write code to interact with OpenAI's LLMs (e.g. `GPT-4o`)
and then easily switch to one of the above options, typically by changing a simple config
(see the respective websites for instructions).

Of course, directly working with the raw LLM API quickly becomes tedious. This is where a scaffolding
library like [langroid](https://github.com/langroid/langroid) comes in very handy -- it abstracts away the
details of the API calls, and provides a simple programmatic interface to the LLM, and higher-level
abstractions like Agents, Tasks, etc. Working with such a library is going to be far more productive than
directly working with the raw API. Below are instructions on how to use langroid with some of the above
Open/Local LLM options.

See [here](https://langroid.github.io/langroid/tutorials/local-llm-setup/) for a guide to using Langroid with Open LLMs.

---

# Setting up a Local/Open LLM to work with Langroid

!!! tip "Example scripts in [`examples/`](https://github.com/langroid/langroid/tree/main/examples) directory."
    There are numerous examples of scripts that can be run with local LLMs, in the
    [`examples/`](https://github.com/langroid/langroid/tree/main/examples) directory of the main `langroid` repo.
    These examples are also in the [`langroid-examples`](https://github.com/langroid/langroid-examples/tree/main/examples)
    repo, although the latter repo may contain some examples that are not in the `langroid` repo.

Most of these example scripts allow you to specify an LLM in the format `-m <model>`,
where the specification of `<model>` is described in the guide below for local/open LLMs, or in the
[Non-OpenAI LLM](https://langroid.github.io/langroid/tutorials/non-openai-llms/) guide.
Scripts that have the string `local` in their name have been especially designed to work with certain
local LLMs, as described in the respective scripts.

If you want a pointer to a specific script that illustrates a 2-agent chat, have a look at
[`chat-search-assistant.py`](https://github.com/langroid/langroid/blob/main/examples/basic/chat-search-assistant.py).
This specific script, originally designed for GPT-4/GPT-4o, works well with `llama3-70b`
(tested via Groq, mentioned below).

## Easiest: with Ollama

As of version 0.1.24, Ollama provides an OpenAI-compatible API server for the LLMs it supports,
which massively simplifies running these LLMs with Langroid. Example below.

```
ollama pull mistral:7b-instruct-v0.2-q8_0
```

This provides an OpenAI-compatible server for the `mistral:7b-instruct-v0.2-q8_0` model. You can run any
Langroid script using this model, by setting the `chat_model` in the `OpenAIGPTConfig` to
`ollama/mistral:7b-instruct-v0.2-q8_0`, e.g.
```python
import langroid.language_models as lm
import langroid as lr

llm_config = lm.OpenAIGPTConfig(
    chat_model="ollama/mistral:7b-instruct-v0.2-q8_0",
    chat_context_length=16_000,  # adjust based on model
)

agent_config = lr.ChatAgentConfig(
    llm=llm_config,
    system_message="You are helpful but concise",
)

agent = lr.ChatAgent(agent_config)

# directly invoke agent's llm_response method
# response = agent.llm_response("What is the capital of Russia?")

task = lr.Task(agent, interactive=True)
task.run()  # for an interactive chat loop
```

## Setup Ollama with a GGUF model from HuggingFace

Some models are not directly supported by Ollama out of the box. To serve a GGUF model with Ollama,
you can download the model from HuggingFace and set up a custom Modelfile for it.

E.g. download the GGUF version of `dolphin-mixtral` from
[here](https://huggingface.co/TheBloke/dolphin-2.7-mixtral-8x7b-GGUF)
(specifically, download this file `dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf`).

To set up a custom ollama model based on this:

- Save this model at a convenient place, e.g. `~/.ollama/models/`
- Create a modelfile for this model. First see what an existing modelfile for a similar model looks like,
  e.g. by running:
```
ollama show --modelfile dolphin-mixtral:latest
```
  You will notice this file has a FROM line followed by a prompt template and other settings.
  Create a new file with these contents. Only change the `FROM ...` line, replacing it with the path to the
  model you downloaded, e.g.
```
FROM /Users/blah/.ollama/models/dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf
```
- Save this modelfile somewhere, e.g. `~/.ollama/modelfiles/dolphin-mixtral-gguf`
- Create a new ollama model based on this file:
```
ollama create dolphin-mixtral-gguf -f ~/.ollama/modelfiles/dolphin-mixtral-gguf
```
- Run this new model using `ollama run dolphin-mixtral-gguf`

To use this model with Langroid, you can then specify `ollama/dolphin-mixtral-gguf` as the `chat_model`
param in the `OpenAIGPTConfig` as in the previous section. When a script supports it, you can also pass in
the model name via `-m ollama/dolphin-mixtral-gguf`.

## Local LLMs using LMStudio

LMStudio is one of the simplest ways to download and run open-weight LLMs locally.
See their docs at [lmstudio.ai](https://lmstudio.ai/docs) for installation and usage instructions.
Once you download a model, you can use the "server" option to have it served via an OpenAI-compatible API
at a local address like `http://127.0.0.1:1234/v1`.
As with any other scenario of running a local LLM, you can use this with Langroid by setting `chat_model`
as follows (note you should not include the `http://` part):

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="local/127.0.0.1:1234/v1",
    ...
)
```

## Setup llama.cpp with a GGUF model from HuggingFace

See `llama.cpp`'s [GitHub page](https://github.com/ggerganov/llama.cpp/tree/master) for build and
installation instructions. After installation, begin as above with downloading a GGUF model from
HuggingFace; for example, the quantized `Qwen2.5-Coder-7B` from
[here](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF); specifically,
[this file](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF/blob/main/qwen2.5-coder-7b-instruct-q2_k.gguf).

Now, the server can be started with `llama-server -m qwen2.5-coder-7b-instruct-q2_k.gguf`.
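Once the server is up, you can point Langroid at it. Here is a minimal sketch, assuming the server is
listening on the default port 8080 (adjust the context length to the model you are actually serving):

```python
import langroid.language_models as lm

# sketch: assumes llama-server is running locally on the default port 8080
llm_config = lm.OpenAIGPTConfig(
    chat_model="llamacpp/localhost:8080",
    chat_context_length=32_000,  # adjust based on the model you are serving
)
```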
In addition, your `llama.cpp` may be built with support for simplified management of HuggingFace models
(specifically, `libcurl` support is required); in this case, `llama.cpp` will download HuggingFace models
to a cache directory, and the server may be run with:

```bash
llama-server \
    --hf-repo Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
    --hf-file qwen2.5-coder-7b-instruct-q2_k.gguf
```

In either case, to use the model with Langroid, specify `llamacpp/localhost:{port}` as the `chat_model`;
the default port is 8080.

## Setup vLLM with a model from HuggingFace

See [the vLLM docs](https://docs.vllm.ai/en/stable/getting_started/installation.html) for installation and
configuration options. To run a HuggingFace model with vLLM, use `vllm serve`, which provides an
OpenAI-compatible server. For example, to run `Qwen2.5-Coder-32B`, run `vllm serve Qwen/Qwen2.5-Coder-32B`.
If the model is not publicly available, set the environment variable `HF_TOKEN` to your HuggingFace token
with read access to the model repo.

To use the model with Langroid, specify `vllm/Qwen/Qwen2.5-Coder-32B` as the `chat_model` and,
if a port other than the default 8000 was used, set `api_base` to `localhost:{port}`.

## Setup vLLM with a GGUF model from HuggingFace

`vLLM` supports running quantized models from GGUF files; however, this is currently an experimental feature.
To run a quantized `Qwen2.5-Coder-32B`, download the model from
[the repo](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF), specifically
[this file](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct-GGUF/blob/main/qwen2.5-coder-32b-instruct-q4_0.gguf).

The model can now be run with
`vllm serve qwen2.5-coder-32b-instruct-q4_0.gguf --tokenizer Qwen/Qwen2.5-Coder-32B`
(the tokenizer of the base model rather than the quantized model should be used).

To use the model with Langroid, specify `vllm/qwen2.5-coder-32b-instruct-q4_0.gguf` as the `chat_model` and,
if a port other than the default 8000 was used, set `api_base` to `localhost:{port}`.

## "Local" LLMs hosted on Groq

In this scenario, an open-source LLM (e.g. `llama3.1-8b-instant`) is hosted on a Groq server which provides
an OpenAI-compatible API. Using this with Langroid is exactly analogous to the Ollama scenario above:
you can set the `chat_model` in the `OpenAIGPTConfig` to `groq/<model>`, e.g. `groq/llama3.1-8b-instant`.
For this to work, ensure you have a `GROQ_API_KEY` environment variable set in your `.env` file.
See [groq docs](https://console.groq.com/docs/quickstart).

## "Local" LLMs hosted on Cerebras

This works exactly like with Groq, except you set up a `CEREBRAS_API_KEY` environment variable,
and specify the `chat_model` as `cerebras/<model>`, e.g. `cerebras/llama3.1-8b`.
See the Cerebras [docs](https://inference-docs.cerebras.ai/introduction) for details on which LLMs are supported.

## Open/Proprietary LLMs via OpenRouter

OpenRouter is a **paid service** that provides an OpenAI-compatible API for practically any LLM,
open or proprietary. Using this with Langroid is similar to the `groq` scenario above:

- Ensure you have an `OPENROUTER_API_KEY` set up in your environment (or `.env` file), and
- Set the `chat_model` in the `OpenAIGPTConfig` to `openrouter/<model>`, where `<model>` is the name of the
  model on the [OpenRouter](https://openrouter.ai/) website, e.g. `qwen/qwen-2.5-7b-instruct`.

This is a good option if you want to use larger open LLMs without having to download them locally
(especially if your local machine does not have the resources to run them).
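For example, a config for one of the open models listed on OpenRouter might look like the following sketch
(the model name is illustrative; use any model name listed on the OpenRouter site):

```python
import langroid.language_models as lm

# sketch: assumes OPENROUTER_API_KEY is set in your environment or .env file
llm_config = lm.OpenAIGPTConfig(
    chat_model="openrouter/qwen/qwen-2.5-7b-instruct",
)
```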
Besides giving access to specific LLMs, OpenRouter also offers smart routing/load-balancing,
and it is a convenient way to use proprietary LLMs (e.g. Gemini, Amazon) via a single API.

## "Local" LLMs hosted on GLHF.chat

See [glhf.chat](https://glhf.chat/chat/create) for a list of available models. To run with one of these
models, set the `chat_model` in the `OpenAIGPTConfig` to `"glhf/<model_name>"`, where `<model_name>` is `hf:`
followed by the HuggingFace repo path, e.g. `Qwen/Qwen2.5-Coder-32B-Instruct`, so the full `chat_model`
would be `"glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct"`.

## DeepSeek LLMs

As of 26-Dec-2024, DeepSeek models are available via their [api](https://platform.deepseek.com).
To use them with Langroid:

- set up your `DEEPSEEK_API_KEY` environment variable in the `.env` file or as an explicit export in your shell
- set the `chat_model` in the `OpenAIGPTConfig` to `deepseek/deepseek-chat` to use the `DeepSeek-V3` model,
  or `deepseek/deepseek-reasoner` to use the full (i.e. non-distilled) `DeepSeek-R1` "reasoning" model.

The DeepSeek models are also available via OpenRouter (see the corresponding instructions in the OpenRouter
section above) or Ollama (see those instructions). E.g. you can use the DeepSeek R1 or its distilled variants
by setting `chat_model` to `openrouter/deepseek/deepseek-r1` or `ollama/deepseek-r1:8b`.

## Other non-OpenAI LLMs supported by LiteLLM

For other scenarios of running local/remote LLMs, it is possible that the `LiteLLM` library supports an
"OpenAI adaptor" for these models (see their [docs](https://litellm.vercel.app/docs/providers)).
Depending on the specific model, the `litellm` docs may say you need to specify a model in the form
`<provider>/<model>`, e.g. `palm/chat-bison`. To use the model with Langroid, simply prepend `litellm/` to
this string, e.g. `litellm/palm/chat-bison`, when you specify the `chat_model` in the `OpenAIGPTConfig`.
To use `litellm`, ensure you have the `litellm` extra installed, via `pip install langroid[litellm]` or equivalent.

## Harder: with oobabooga

Like Ollama, [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides
an OpenAI-API-compatible API server, but the setup is significantly more involved.
See their GitHub page for installation and model-download instructions.

Once you have finished the installation, you can spin up the server for an LLM using something like this:

```
python server.py --api --model mistral-7b-instruct-v0.2.Q8_0.gguf --verbose --extensions openai --nowebui
```

This will show a message saying that the OpenAI-compatible API is running at `http://127.0.0.1:5000`.
Then in your Langroid code you can specify the LLM config using `chat_model="local/127.0.0.1:5000/v1"`
(the `v1` is the API version, which is required).

As with Ollama, you can use the `-m` arg in many of the example scripts, e.g.

```
python examples/docqa/rag-local-simple.py -m local/127.0.0.1:5000/v1
```

Recommended: to ensure accurate chat formatting (and not use the defaults from ooba),
append the appropriate HuggingFace model name to the `-m` arg, separated by `//`, e.g.

```
python examples/docqa/rag-local-simple.py -m local/127.0.0.1:5000/v1//mistral-instruct-v0.2
```

(no need to include the full model name, as long as you include enough to uniquely identify the model's
chat formatting template).

## Other local LLM scenarios

There may be scenarios where the above `local/...` or `ollama/...` syntactic shorthand does not work
(e.g. when using vLLM to spin up a local LLM at an OpenAI-compatible endpoint).
For these scenarios, you will have to explicitly create an instance of `lm.OpenAIGPTConfig` and set *both*
the `chat_model` and `api_base` parameters. For example, suppose you are able to get responses from this
endpoint using something like:

```bash
curl http://192.168.0.5:5078/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Mistral-7B-Instruct-v0.2",
        "messages": [
            {"role": "user", "content": "Who won the world series in 2020?"}
        ]
    }'
```

To use this endpoint with Langroid, you would create an `OpenAIGPTConfig` like this:

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="Mistral-7B-Instruct-v0.2",
    api_base="http://192.168.0.5:5078/v1",
)
```

## Quick testing with local LLMs

As mentioned [here](https://langroid.github.io/langroid/tutorials/non-openai-llms/#quick-testing-with-non-openai-models),
you can run many of the [tests](https://github.com/langroid/langroid/tree/main/tests/main) in the main
langroid repo (which by default run against an OpenAI model) against a local LLM, by specifying the model as
`--m <model>`, where `<model>` follows the syntax described in the previous sections. Here's an example:

```bash
pytest tests/main/test_chat_agent.py --m ollama/mixtral
```

Of course, bear in mind that the tests may not pass due to weaknesses of the local LLM.

---

# Using Langroid with Non-OpenAI LLMs

Langroid was initially written to work with OpenAI models via their API.
This may sound limiting, but fortunately:

- Many open-source LLMs can be served via OpenAI-compatible endpoints.
  See the [Local LLM Setup](https://langroid.github.io/langroid/tutorials/local-llm-setup/) guide for details.
- There are tools like [LiteLLM](https://github.com/BerriAI/litellm/tree/main/litellm) that provide an
  OpenAI-like API for _hundreds_ of non-OpenAI LLM providers (e.g. Anthropic's Claude, Google's Gemini).
- AI gateways like [LangDB](https://langdb.ai/), [Portkey](https://portkey.ai), and
  [OpenRouter](https://openrouter.ai/) provide unified access to multiple LLM providers with additional
  features like cost control, observability, caching, and fallback strategies.

Below we show how you can use these various options with Langroid.

## Create an `OpenAIGPTConfig` object with `chat_model = "litellm/..."`

!!! note "Install `litellm` extra"
    To use `litellm` you need to install Langroid with the `litellm` extra, e.g.:
    `pip install "langroid[litellm]"`

Next, look up the instructions in the LiteLLM docs for the specific model you are interested in.
Here we take the example of Anthropic's `claude-instant-1` model.
Set up the necessary environment variables as specified in the LiteLLM docs,
e.g. for the `claude-instant-1` model, you will need to set the `ANTHROPIC_API_KEY`:

```bash
export ANTHROPIC_API_KEY=my-api-key
```

Now you are ready to create an instance of `OpenAIGPTConfig` with the `chat_model` set to
`litellm/<model_spec>`, where you should set `<model_spec>` based on the LiteLLM docs.
For example, for the `claude-instant-1` model, you would set `chat_model` to `litellm/claude-instant-1`.
But if you are using the model via a 3rd-party provider (e.g. Amazon Bedrock), you may also need to have a
`provider` part in the `<model_spec>`, e.g. `litellm/bedrock/anthropic.claude-instant-v1`.
In general, you can see which of these to use from the LiteLLM docs.
```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="litellm/claude-instant-1",
    chat_context_length=8000,  # adjust according to model
)
```

A similar process works for the `Gemini 1.5 Pro` LLM:

- get the API key [here](https://aistudio.google.com/)
- set the `GEMINI_API_KEY` environment variable in your `.env` file or shell
- set `chat_model="litellm/gemini/gemini-1.5-pro-latest"` in the `OpenAIGPTConfig` object

For other Gemini models supported by LiteLLM, see [their docs](https://litellm.vercel.app/docs/providers/gemini).

## Gemini LLMs via OpenAI client, without LiteLLM

This is now the recommended way to use Gemini LLMs with Langroid, where you don't need to use LiteLLM.
As of 11/20/2024, these models are
[available via the OpenAI client](https://developers.googleblog.com/en/gemini-is-now-accessible-from-the-openai-library/).
To use Langroid with Gemini LLMs, all you have to do is:

- set the `GEMINI_API_KEY` environment variable in your `.env` file or shell
- set `chat_model="gemini/<model>"` in the `OpenAIGPTConfig` object, where `<model>` is one of
  `gemini-1.5-flash`, `gemini-1.5-flash-8b`, or `gemini-1.5-pro`

See [here](https://ai.google.dev/gemini-api/docs/models/gemini) for details on Gemini models.
For example, you can use this `llm_config`:

```python
llm_config = lm.OpenAIGPTConfig(
    chat_model="gemini/" + lm.OpenAIChatModel.GEMINI_1_5_FLASH,
)
```

In most tests you can switch to a Gemini model using the `--m` option, e.g.:

```bash
pytest -xvs tests/main/test_llm.py --m gemini/gemini-1.5-flash
```

Many of the example scripts allow switching the model using `-m` or `--model`, e.g.:

```bash
python3 examples/basic/chat.py -m gemini/gemini-1.5-flash
```

## AI Gateways for Multiple LLM Providers

In addition to LiteLLM, Langroid integrates with AI gateways that provide unified access to multiple LLM
providers with additional enterprise features:

### LangDB

[LangDB](https://langdb.ai/) is an AI gateway offering OpenAI-compatible APIs to access 250+ LLMs with cost
control, observability, and performance benchmarking. LangDB enables seamless model switching while
providing detailed analytics and usage tracking.

To use LangDB with Langroid:

- Set up your `LANGDB_API_KEY` and `LANGDB_PROJECT_ID` environment variables
- Set `chat_model="langdb/<provider>/<model>"` in the `OpenAIGPTConfig`
  (e.g., `"langdb/anthropic/claude-3.7-sonnet"`)

For detailed setup and usage instructions, see the [LangDB integration guide](../notes/langdb.md).

### Portkey

[Portkey](https://portkey.ai) is a comprehensive AI gateway that provides access to 200+ models from various
providers through a unified API. It offers advanced features like intelligent caching, automatic retries,
fallback strategies, and comprehensive observability tools for production deployments.

To use Portkey with Langroid:

- Set up your `PORTKEY_API_KEY` environment variable (plus provider API keys like `OPENAI_API_KEY`)
- Set `chat_model="portkey/<provider>/<model>"` in the `OpenAIGPTConfig` (e.g., `"portkey/openai/gpt-4o-mini"`)

For detailed setup and usage instructions, see the [Portkey integration guide](../notes/portkey.md).
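For instance, a gateway-routed config might look like the following sketch, reusing the Portkey example
model name above (this assumes `PORTKEY_API_KEY`, and the underlying provider key such as `OPENAI_API_KEY`,
are set):

```python
import langroid.language_models as lm

# sketch: assumes PORTKEY_API_KEY (and the underlying provider's key,
# e.g. OPENAI_API_KEY) are set in your environment or .env file
llm_config = lm.OpenAIGPTConfig(
    chat_model="portkey/openai/gpt-4o-mini",
)
```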
### OpenRouter

[OpenRouter](https://openrouter.ai/) provides access to a wide variety of both open and proprietary LLMs
through a unified API. It features automatic routing and load balancing, making it particularly useful for
accessing larger open LLMs without local resources and for using multiple providers through a single interface.

To use OpenRouter with Langroid:

- Set up your `OPENROUTER_API_KEY` environment variable
- Set `chat_model="openrouter/<model>"` in the `OpenAIGPTConfig`

For more details, see the [Local LLM Setup guide](local-llm-setup.md#local-llms-available-on-openrouter).

## Working with the created `OpenAIGPTConfig` object

From here you can proceed as usual, creating instances of `OpenAIGPT`, `ChatAgentConfig`, `ChatAgent`,
and `Task`. E.g. you can create an object of class `OpenAIGPT` (which represents any LLM with an
OpenAI-compatible API) and interact with it directly:

```python
import langroid.language_models as lm
from langroid.language_models import LLMMessage, Role

llm = lm.OpenAIGPT(llm_config)

messages = [
    LLMMessage(content="You are a helpful assistant", role=Role.SYSTEM),
    LLMMessage(content="What is the capital of Ontario?", role=Role.USER),
]

response = llm.chat(messages, max_tokens=50)
```

When you interact directly with the LLM, you are responsible for maintaining the dialog history.
Also, you would often want an LLM to have access to tools/functions and external data/documents
(e.g. a vector DB or traditional DB). An Agent class simplifies managing all of this.
For example, you can create an Agent powered by the above LLM, wrap it in a Task and have it run as an
interactive chat app:

```python
import langroid as lr

agent_config = lr.ChatAgentConfig(llm=llm_config, name="my-llm-agent")
agent = lr.ChatAgent(agent_config)

task = lr.Task(agent, name="my-llm-task")
task.run()
```

## Example: Simple Chat script with a non-OpenAI proprietary model

Many of the Langroid example scripts have a convenient `-m` flag that lets you easily switch to a different
model. For example, you can run the `chat.py` script in the `examples/basic` folder with the
`litellm/claude-instant-1` model:

```bash
python3 examples/basic/chat.py -m litellm/claude-instant-1
```

## Quick testing with non-OpenAI models

There are numerous tests in the main [Langroid repo](https://github.com/langroid/langroid) that involve
LLMs, and once you set up the dev environment as described in the README of the repo, you can run any of
those tests (which run against the default GPT4 model) against local/remote models that are proxied by
`liteLLM` (or served locally via the options mentioned above, such as `oobabooga`, `ollama` or
`llama-cpp-python`), using the `--m <model-name>` option, where `<model-name>` takes one of the forms above.
Some examples of tests are:

```bash
pytest -s tests/main/test_llm.py --m local/localhost:8000
pytest -s tests/main/test_llm.py --m litellm/claude-instant-1
```

When the `--m` option is omitted, the default OpenAI GPT4 model is used.

!!! note "`chat_context_length` is not affected by `--m`"
    Be aware that the `--m` option only switches the model, but does not affect the `chat_context_length`
    parameter in the `OpenAIGPTConfig` object, which you may need to adjust for different models.
    So this option is only meant for quickly testing against different models, and not meant as a way to
    switch between models in a production environment.

---

# Chat with a PostgreSQL DB using SQLChatAgent

The [`SQLChatAgent`](../reference/agent/special/sql/sql_chat_agent.md) is designed to facilitate
interactions with an SQL database using natural language.

A ready-to-use script based on the `SQLChatAgent` is available in the `langroid-examples` repo at
[`examples/data-qa/sql-chat/sql_chat.py`](https://github.com/langroid/langroid-examples/blob/main/examples/data-qa/sql-chat/sql_chat.py)
(and also in a similar location in the main `langroid` repo).
This tutorial walks you through how you might use the `SQLChatAgent` if you were to write your own script from scratch.
We also show some of the internal workings of this Agent. The agent uses the schema context to generate SQL
queries based on a user's input. Here is a tutorial on how to set up an agent with your PostgreSQL database.
The steps for other databases are similar. Since the agent implementation relies on SQLAlchemy, it should
work with any SQL DB that SQLAlchemy supports. It offers enhanced functionality for MySQL and PostgreSQL by
automatically extracting schemas from the database.

## Before you begin

!!! note "Data Privacy Considerations"
    Since the `SQLChatAgent` uses OpenAI GPT-4 as the underlying language model, be aware that database
    information processed by the agent may be sent to OpenAI's API; make sure you are comfortable with this
    before proceeding.

1. Install PostgreSQL dev libraries for your platform, e.g.
    - `sudo apt-get install libpq-dev` on Ubuntu,
    - `brew install postgresql` on Mac, etc.
2. Follow the general [setup guide](../quick-start/setup.md) to get started with Langroid
   (mainly, install `langroid` into your virtual env, and set up suitable values in the `.env` file).
   Note that to use the SQLChatAgent with a PostgreSQL database, you need to install the `langroid[postgres]`
   extra, e.g.:
    - `pip install "langroid[postgres]"` or
    - `poetry add "langroid[postgres]"` or `uv add "langroid[postgres]"`
    - `poetry install -E postgres` or `uv pip install --extra postgres -r pyproject.toml`

   If this gives you an error, try `pip install psycopg2-binary` in your virtualenv.

## Initialize the agent

```python
from langroid.agent.special.sql.sql_chat_agent import (
    SQLChatAgent,
    SQLChatAgentConfig,
)

agent = SQLChatAgent(
    config=SQLChatAgentConfig(
        database_uri="postgresql://example.db",
    )
)
```

## Configuration

The following components of `SQLChatAgentConfig` are optional but strongly recommended for improved results:

* `context_descriptions`: A nested dictionary that specifies the schema context for the agent to use when
  generating queries, for example:

```json
{
    "table1": {
        "description": "description of table1",
        "columns": {
            "column1": "description of column1 in table1",
            "column2": "description of column2 in table1"
        }
    },
    "employees": {
        "description": "The 'employees' table contains information about the employees. It relates to the 'departments' and 'sales' tables via foreign keys.",
        "columns": {
            "id": "A unique identifier for an employee. This ID is used as a foreign key in the 'sales' table.",
            "name": "The name of the employee.",
            "department_id": "The ID of the department the employee belongs to. This is a foreign key referencing the 'id' in the 'departments' table."
        }
    }
}
```

  > By default, if no context description JSON file is provided in the config, the agent will automatically
  > generate the file using the built-in Postgres table/column comments.

* `schema_tools`: When set to `True`, activates a retrieval mode where the agent systematically requests
  only the parts of the schemas relevant to the current query. When this option is enabled, the agent
  performs the following steps:
    1. Asks for table names.
    2. Asks for table descriptions and column names from possibly relevant table names.
    3. Asks for column descriptions from possibly relevant columns.
    4. Writes the SQL query.

  Setting `schema_tools=True` is especially useful for large schemas where it is costly or impossible to
  include the entire schema in a query context.
By selectively using only the relevant parts of the context descriptions, this mode reduces token usage,
though it may result in 1-3 additional OpenAI API calls before the final SQL query is generated.

## Putting it all together

In the code below, we will allow the agent to generate the context descriptions from table comments by
excluding the `context_descriptions` config option. We set `schema_tools` to `True` to enable the retrieval mode.

```python
from langroid.agent.special.sql.sql_chat_agent import (
    SQLChatAgent,
    SQLChatAgentConfig,
)
from langroid.agent.task import Task

# Initialize SQLChatAgent with a PostgreSQL database URI and enable schema_tools
agent = SQLChatAgent(
    config=SQLChatAgentConfig(
        database_uri="postgresql://example.db",
        schema_tools=True,
    )
)

# Run the task to interact with the SQLChatAgent
task = Task(agent)
task.run()
```

By following these steps, you should now be able to set up an `SQLChatAgent` that interacts with a
PostgreSQL database, making querying a seamless experience.

In the `langroid` repo we have provided a ready-to-use script
[`sql_chat.py`](https://github.com/langroid/langroid/blob/main/examples/data-qa/sql-chat/sql_chat.py)
based on the above, that you can use right away to interact with your PostgreSQL database:

```bash
python3 examples/data-qa/sql-chat/sql_chat.py
```

This script will prompt you for the database URI, and then start the agent.

---

# Langroid Supported LLMs and Providers

Langroid supports a wide range of Language Model providers through its
[`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] class.

!!! note "OpenAIGPTConfig is not just for OpenAI models!"
    The `OpenAIGPTConfig` class is a generic configuration class that can be used to configure any LLM
    provider that is OpenAI API-compatible. This includes both local and remote models.

You would typically set up the `OpenAIGPTConfig` class with the `chat_model` parameter, which specifies the
model you want to use, and other parameters such as `max_output_tokens`, `temperature`, etc.
(see the [`OpenAIGPTConfig`][langroid.language_models.openai_gpt.OpenAIGPTConfig] class and its parent class
[`LLMConfig`][langroid.language_models.base.LLMConfig] for full parameter details):

```python
import langroid.language_models as lm

llm_config = lm.OpenAIGPTConfig(
    chat_model="<model>",  # possibly includes a provider prefix
    api_key="api-key",  # optional, prefer setting in environment variables
    # ... other params such as max_tokens, temperature, etc.
)
```

Below are `chat_model` examples for each supported provider. For more details see the guides on setting up
Langroid with [local](https://langroid.github.io/langroid/tutorials/local-llm-setup/) and
[non-OpenAI LLMs](https://langroid.github.io/langroid/tutorials/non-openai-llms/).

Once you set up the `OpenAIGPTConfig`, you can then directly interact with the LLM, or set up an Agent with
this LLM, and use it by itself, or in a multi-agent setup, as shown in the
[Langroid quick tour](https://langroid.github.io/langroid/tutorials/langroid-tour/).

Although we support specifying the `api_key` directly in the config (not recommended for security reasons),
more typically you would set the `api_key` in your environment variables.
Below is a table showing, for each provider, an example `chat_model` setting and which environment variable
to set for the API key.
| Provider | `chat_model` Example | API Key Environment Variable | |---------------|----------------------------------------------------------|----------------------------| | OpenAI | `gpt-4o` | `OPENAI_API_KEY` | | Groq | `groq/llama3.3-70b-versatile` | `GROQ_API_KEY` | | Cerebras | `cerebras/llama-3.3-70b` | `CEREBRAS_API_KEY` | | Gemini | `gemini/gemini-2.0-flash` | `GEMINI_API_KEY` | | DeepSeek | `deepseek/deepseek-reasoner` | `DEEPSEEK_API_KEY` | | GLHF | `glhf/hf:Qwen/Qwen2.5-Coder-32B-Instruct` | `GLHF_API_KEY` | | OpenRouter | `openrouter/deepseek/deepseek-r1-distill-llama-70b:free` | `OPENROUTER_API_KEY` | | Ollama | `ollama/qwen2.5` | `OLLAMA_API_KEY` (usually `ollama`) | | VLLM | `vllm/mistral-7b-instruct` | `VLLM_API_KEY` | | LlamaCPP | `llamacpp/localhost:8080` | `LLAMA_API_KEY` | | Generic Local | `local/localhost:8000/v1` | No specific env var required | | LiteLLM | `litellm/anthropic/claude-3-7-sonnet` | Depends on provider | | | `litellm/mistral-small` | Depends on provider | | HF Template | `local/localhost:8000/v1//mistral-instruct-v0.2` | Depends on provider | | | `litellm/ollama/mistral//hf` | | ## HuggingFace Chat Template Formatting For models requiring specific prompt formatting: ```python import langroid.language_models as lm # Specify formatter directly llm_config = lm.OpenAIGPTConfig( chat_model="local/localhost:8000/v1//mistral-instruct-v0.2", formatter="mistral-instruct-v0.2" ) # Using HF formatter auto-detection llm_config = lm.OpenAIGPTConfig( chat_model="litellm/ollama/mistral//hf", ) ```
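Whichever provider you choose, the resulting `llm_config` is used the same way downstream.
For example, here is a minimal sketch that wraps one of the configs above in an agent and gets a single
response (the model name is illustrative; substitute any entry from the table):

```python
import langroid as lr
import langroid.language_models as lm

# sketch: any provider/chat_model combination from the table above can be used here,
# as long as the corresponding API key environment variable is set
llm_config = lm.OpenAIGPTConfig(chat_model="gemini/gemini-2.0-flash")

agent = lr.ChatAgent(lr.ChatAgentConfig(llm=llm_config))
response = agent.llm_response("What is the capital of France?")
print(response.content)
```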